XSearch-Service-API

From Gcube Wiki
Revision as of 17:45, 11 March 2014 by Pavlos.fafalios (Talk | contribs) (Main Service A: Post-processing of search results)


Objective

  • The objective of this activity is to provide an API for the semantic post-processing services of search results that were developed in the context of T10.4.

Requirements

  • The API should allow someone to build an application (web application, mobile application) that exploits the services of T10.4 (and their deployment in the infrastructure).
  • The functionality of T10.4 is generic: one can configure the underlying search system, the desired categories, and the SPARQL endpoints to be used. The API should offer the same generality.

Decisions

  • We plan to develop a REST API, since such an API can be used easily in various contexts.
  • We will comply with OpenSearch because it is a standard and allows the underlying search system to be configured.

Services

Main Service A: Post-processing of search results

This service post-processes search results by performing textual clustering, entity mining, or both on the top results returned by the given search system. The API accepts as input an underlying search system and a set of parameters (e.g. the query and the number of results to analyze). The underlying source can be an OpenSearch-compliant search engine. XSearch sends the query (with the given parameters) to the search engine and retrieves the results. Additional parameters control the entity mining and textual clustering tasks.

Input Parameters:

  • searchsystem: The name of a supported search system (see Support Service S3) - required
  • query: The query string - required unless searchsystem=gcube
  • descrdoc: The URL of an OpenSearch Description Document (considered only when searchsystem=opensearch)
  • locator: The gCube ResultSet locator (considered only when searchsystem=gcube)
  • numofresults: Number of results to analyze (integer). Default: 100
  • mining: Whether to enable Entity Mining. Valid values: {true, false}. Default: true
  • categories: Active categories of entities (semicolon-separated names of categories) (see Support Service S1). Default: X-Search's current active categories.
  • clustering: Whether to enable Textual Clustering. Valid values: {true, false}. Default: true
  • numofclusters: Number of clusters to return (integer). Default: 15
  • clusteringalg: ID of clustering algorithm (see Support Service S2). Valid values: {cl1, cl2, cl3, cl4, cl5}. Default: cl3
  • typeofresults: Type of results to analyze. Valid values: {snippets, contents}. Default: snippets
  • format: The output format. Valid values: {json, xml}. Default: json
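
The parameter list above can be checked on the client side before a request is sent. The following is a minimal sketch in Python, not part of XSearch itself: the helper name build_query is hypothetical, and it only produces the query string, since the path of this service is not given here. Values are expected as strings or integers exactly as they should appear in the URL.

```python
from urllib.parse import urlencode

# Allowed values copied from the parameter list above.
VALID = {
    "mining": {"true", "false"},
    "clustering": {"true", "false"},
    "clusteringalg": {"cl1", "cl2", "cl3", "cl4", "cl5"},
    "typeofresults": {"snippets", "contents"},
    "format": {"json", "xml"},
}
# Parameters that accept free-form values.
FREE_FORM = {"descrdoc", "locator", "numofresults", "categories", "numofclusters"}


def build_query(searchsystem, query=None, **options):
    """Return a percent-encoded query string (hypothetical client helper)."""
    if searchsystem != "gcube" and not query:
        raise ValueError("query is required unless searchsystem is gcube")
    params = {"searchsystem": searchsystem}
    if query:
        params["query"] = query
    for name, value in options.items():
        if name in VALID:
            if str(value) not in VALID[name]:
                raise ValueError(f"invalid value for {name}: {value!r}")
        elif name not in FREE_FORM:
            raise ValueError(f"unknown parameter: {name}")
        params[name] = value
    # urlencode escapes reserved characters such as ':' '/' and ';'.
    return urlencode(params)


print(build_query("opensearch", "tuna", categories="Species;Country"))
# prints: searchsystem=opensearch&query=tuna&categories=Species%3BCountry
```

Omitted parameters are simply left out, so the service-side defaults listed above apply.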

Output:

  • Detected Entities (and for each entity, the results in which it lies)
  • Produced Clusters (and for each cluster, the results in which it lies)
  • Results
  • The input parameters
  • Date of request

Examples:

REST call:

Part of the CSV result:

Main Service B: Match an identified entity

This service links the name of an entity with a resource in a Knowledge Base. For example, the service can match the name "yellowfin tuna" with the DBpedia resource "http://dbpedia.org/resource/Yellowfin_tuna".

Input Parameters:

  • name - The name of the entity (string) - required
  • category - The category of the entity (string) - required
  • endpoint - The SPARQL endpoint to use (url). Default: the endpoint that has been configured in X-Search for the given category name.
  • tquery - The SPARQL template query to use (string). Default: the template query that has been configured in X-Search for the given category name.

Output:

  • The semantic information that is returned by running the SPARQL template query for the given name and category at the SPARQL endpoint.

Examples:

Consider that for the category "Species", the specified SPARQL endpoint is: http://virtuoso.i-marine.d4science.org:8890/sparql and the specified template query is:

define input:inference <http://www.ics.forth.gr/isl/Schema>
select distinct ?URI ?label as ?Value  
FROM <http://www.ics.forth.gr/isl/SameAs>
FROM <http://www.ics.forth.gr/isl/Ecoscope>
FROM <http://www.ics.forth.gr/isl/DBpedia>
FROM <http://www.ics.forth.gr/isl/Fishbase>
FROM <http://www.ics.forth.gr/isl/FLOD>
FROM <http://www.ics.forth.gr/isl/Worms>
FROM <http://www.ics.forth.gr/isl/Schema>
where  
{
 ?URI a <http://ics.forth.gr/Ontology/MarineTLO/imarine#MarineSpecies> .
 ?URI rdfs:label ?label FILTER(regex(str(?label),'<ENTITY>','i')) 
} 
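
To illustrate how such a template could be instantiated for a given entity name, here is a minimal Python sketch. It assumes that the service replaces the '<ENTITY>' placeholder textually with the name; this is an assumption about the implementation, and the function fill_template is hypothetical. The sketch escapes only what is needed to keep the SPARQL string literal well formed; regular-expression metacharacters in the name are passed through unchanged.

```python
PLACEHOLDER = "<ENTITY>"


def fill_template(template, name):
    """Substitute the entity name into a template query (hypothetical helper)."""
    # Escape backslashes first, then single quotes, so the name cannot
    # terminate the single-quoted SPARQL string literal early.
    literal = name.replace("\\", "\\\\").replace("'", "\\'")
    return template.replace(PLACEHOLDER, literal)


filter_clause = "FILTER(regex(str(?label),'<ENTITY>','i'))"
print(fill_template(filter_clause, "salmon"))
# prints: FILTER(regex(str(?label),'salmon','i'))
```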


REST call:

http://.../xsearch-service-2.0.0/api/link?name=salmon&category=Species


CSV result:

URI=http://dbpedia.org/resource/Salmonidae	Value=Salmonidae
URI=http://dbpedia.org/resource/Sockeye_salmon	Value=Sockeye salmon
URI=http://dbpedia.org/resource/Chinook_salmon	Value=Chinook salmon
URI=http://dbpedia.org/resource/Salmon_shark	Value=Salmon shark
URI=http://dbpedia.org/resource/Pink_salmon	Value=Pink salmon
URI=http://dbpedia.org/resource/Atlantic_salmon	Value=Atlantic salmon
URI=http://dbpedia.org/resource/King-of-the-salmon	Value=King-of-the-salmon
URI=http://dbpedia.org/resource/Beaked_salmon	Value=Beaked salmon
URI=http://dbpedia.org/resource/Chum_salmon	Value=Chum salmon
URI=http://dbpedia.org/resource/Coho_salmon	Value=Coho salmon
URI=http://dbpedia.org/resource/Giant_salmon_carp	Value=Giant Salmon Carp
URI=http://dbpedia.org/resource/Giant_salmon_carp	Value=Giant salmon carp
URI=http://dbpedia.org/resource/Lake_Salmon	Value=Lake Salmon
URI=http://dbpedia.org/resource/Satsukimasu_salmon	Value=Satsukimasu salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/b254c0d9-fec2-4243-b697-1a9b76a54074	Value=salmonids nei
URI=http://www.fao.org/figis/flod/entities/codedentity/b254c0d9-fec2-4243-b697-1a9b76a54074	Value=salmonidae
URI=http://www.fao.org/figis/flod/entities/codedentity/2b45d5f6-9e19-4f89-949b-de591d63723b	Value=australian salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/cdf01cf9-388e-4c10-ad22-8b68bd6f9614	Value=salmon shrimp
URI=http://www.fao.org/figis/flod/entities/codedentity/bdb5ca11-f594-4255-b6ca-5ecd792318a2	Value=salmon catfish
URI=http://www.fao.org/figis/flod/entities/codedentity/3b4729fe-4931-45dd-8683-79649d6fccc8	Value=smallmouthed salmon catfish
URI=http://www.fao.org/figis/flod/entities/codedentity/cc3bf81f-c84e-4d7e-b391-4ad90415d1d5	Value=salmon horse conch
URI=http://www.fao.org/figis/flod/entities/codedentity/86021df0-f907-478b-a677-98c079466b8f	Value=sockeye(=red)salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/ebf2099e-53cf-4b60-be70-bedb82394055	Value=salmonetes	etc. nep
URI=http://www.fao.org/figis/flod/entities/codedentity/d6ea3fb9-aeb2-44a7-9c9a-89a7492fc9d2	Value=salmonete de roca
URI=http://www.fao.org/figis/flod/entities/codedentity/2af070a8-c25c-4c70-a3e5-5560f071e7c3	Value=beachsalmon
URI=http://www.fao.org/figis/flod/entities/codedentity/b82bf0e5-1bbc-4f3c-b400-0392fe4c36f7	Value=salmonete de fango
URI=http://www.fao.org/figis/flod/entities/codedentity/46b53229-0926-42d4-b1b3-e2c233c76b9a	Value=pink(=humpback)salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/942d2502-c8f1-4319-af20-5d8818643147	Value=salmonete vanicolense
URI=http://www.fao.org/figis/flod/entities/codedentity/9752057e-827d-40db-92fa-92bebdb53fd2	Value=chinook(=spring=king)salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/4490feaa-41b5-44c8-b6bc-c3c7ee83caae	Value=salmonete índico
URI=http://www.fao.org/figis/flod/entities/codedentity/86b2959f-58f8-4999-b7c3-4830626754b7	Value=masu(=cherry) salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/79bb60a9-f930-4f4a-b010-144c8f3b7819	Value=beaked salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/10371267-8726-4982-9155-edada05a1aab	Value=salmonete barbudo
URI=http://www.fao.org/figis/flod/entities/codedentity/fb47b744-42e3-49cd-b657-4c6d7ba0753f	Value=chum(=keta=dog)salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/5c909af3-c57e-4723-bef6-df6ab24fa240	Value=atlantic salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/8002f8ba-4a02-4a5f-b6f3-09dd6393ae7d	Value=coho(=silver)salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/0f7489b8-5a2c-4ca6-b3cc-aec9832e8312	Value=salmon shark
URI=http://www.fao.org/figis/flod/entities/codedentity/e415e0f6-4b6a-45c3-8e0f-4e1fedab81ab	Value=lake salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/63f4c750-c41c-4bd0-a1a2-20f33941ba57	Value=salmonoids nei
URI=http://www.fao.org/figis/flod/entities/codedentity/63f4c750-c41c-4bd0-a1a2-20f33941ba57	Value=salmonoideos nep
URI=http://www.fao.org/figis/flod/entities/codedentity/63f4c750-c41c-4bd0-a1a2-20f33941ba57	Value=salmonoidei
URI=http://www.fao.org/figis/flod/entities/codedentity/63f4c750-c41c-4bd0-a1a2-20f33941ba57	Value=salmonoidés nca
URI=http://www.fao.org/figis/flod/entities/codedentity/2396f893-2ef9-40ba-89f6-499a1d71d15c	Value=pacific salmons nei
URI=http://www.fao.org/figis/flod/entities/codedentity/2396f893-2ef9-40ba-89f6-499a1d71d15c	Value=salmones del pacífico nep
URI=http://www.fao.org/figis/flod/entities/codedentity/ab76430b-3f6b-45da-8591-dfeae6d3848f	Value=salmonetes
URI=http://www.fao.org/figis/flod/entities/codedentity/8b88d794-e083-4f82-8fc4-c14ee20a4805	Value=salmonetes nep
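
A client will often want to group the returned labels by resource, because the same URI can appear with several labels. The following Python sketch assumes the line layout shown above (a "URI=" field, a tab, then a "Value=" field); the function names are hypothetical. It splits on the first tab only, so a label that itself contains a tab is kept whole.

```python
def parse_link_result(text):
    """Parse 'URI=...<TAB>Value=...' lines into (uri, label) pairs."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first tab only; later tabs belong to the label.
        uri_field, _, value_field = line.partition("\t")
        if not (uri_field.startswith("URI=") and value_field.startswith("Value=")):
            raise ValueError(f"unexpected line: {line!r}")
        pairs.append((uri_field[len("URI="):], value_field[len("Value="):]))
    return pairs


def labels_by_uri(pairs):
    """Collect all labels of each URI, preserving the order of appearance."""
    grouped = {}
    for uri, label in pairs:
        grouped.setdefault(uri, []).append(label)
    return grouped
```

For the result above, labels_by_uri(parse_link_result(text)) would map http://dbpedia.org/resource/Giant_salmon_carp to both spellings of its label.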

Main Service C: Enrich an identified entity

This service enriches an entity with semantic information. For example, for the entity "yellowfin tuna" (which has been linked to the resource "http://dbpedia.org/resource/Yellowfin_tuna"), the service can return its incoming and outgoing properties.

Input Parameters:

  • uri - The URI of the entity (url) - required
  • type - Type of properties to retrieve (incoming|outgoing|both) - required
  • category - The category of the entity (string). If no value is given, the parameter endpoint must be specified.
  • endpoint - The SPARQL endpoint to use (url). If no value is given, the parameter category must be specified. In this case, the endpoint that has been configured in X-Search for the given category name is considered.
  • lang - The language code for retrieving literals (string)
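
The rule that at least one of category and endpoint must be given can be enforced before the request is sent. The Python sketch below is a hypothetical client helper, not part of the service; it returns only the query string. Because the uri parameter is itself a URL, the sketch percent-encodes it so that characters such as '&' or '#' inside the entity URI cannot be mistaken for part of the request.

```python
from urllib.parse import urlencode


def build_enrich_query(uri, type_, category=None, endpoint=None, lang=None):
    """Return the query string for an enrich request (hypothetical helper)."""
    if not category and not endpoint:
        raise ValueError("either category or endpoint must be specified")
    params = {"uri": uri, "type": type_}
    # Optional parameters are added only when a value is given.
    for key, value in (("category", category), ("endpoint", endpoint), ("lang", lang)):
        if value:
            params[key] = value
    return urlencode(params)


print(build_enrich_query("http://dbpedia.org/resource/Yellowfin_tuna", "both",
                         category="Species"))
# prints: uri=http%3A%2F%2Fdbpedia.org%2Fresource%2FYellowfin_tuna&type=both&category=Species
```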

Output:

  • A list of RDF triples

Example:

Consider that for the category "Species", the specified SPARQL endpoint is: http://virtuoso.i-marine.d4science.org:8890/sparql

REST call:

http://.../xsearch-service-2.0.0/api/enrich?uri=http://dbpedia.org/resource/Yellowfin_tuna&category=Species&type=outgoing

Part of the CSV result:

http://dbpedia.org/resource/Yellowfin_tuna	http://www.w3.org/1999/02/22-rdf-syntax-ns#type	http://dbpedia.org/ontology/Species
http://dbpedia.org/resource/Yellowfin_tuna	http://www.w3.org/1999/02/22-rdf-syntax-ns#type	http://dbpedia.org/ontology/Eukaryote
http://dbpedia.org/resource/Yellowfin_tuna	http://www.w3.org/1999/02/22-rdf-syntax-ns#type	http://dbpedia.org/ontology/Fish
http://dbpedia.org/resource/Yellowfin_tuna	http://dbpedia.org/ontology/family	http://dbpedia.org/resource/Scombridae
http://dbpedia.org/resource/Yellowfin_tuna	http://dbpedia.org/ontology/genus	http://dbpedia.org/resource/Thunnus
http://dbpedia.org/resource/Yellowfin_tuna	http://dbpedia.org/ontology/kingdom	http://dbpedia.org/resource/Animal
http://dbpedia.org/resource/Yellowfin_tuna	http://dbpedia.org/ontology/order	http://dbpedia.org/resource/Perciformes
http://dbpedia.org/resource/Yellowfin_tuna	http://dbpedia.org/ontology/phylum	http://dbpedia.org/resource/Chordate

Main Service D: Identify entities in a Web Document

This service retrieves the contents of a Web document (e.g. a Web page or a PDF file) and performs entity mining on these contents.

Input Parameters:

  • url: The URL of the Web page/document - required
  • categories: Active categories of entities (semicolon-separated names of categories) (see Support Service S1). If no value is given, then X-Search's current active categories are considered.

Output:

  • A list with the detected entities together with their corresponding category

Example:

REST call:

http://.../xsearch-service-2.0.0/api/processdocument?url=http://en.wikipedia.org/wiki/Yellowfin_tuna&categories=Species;Country

Part of the CSV result:

"ENTITY_NAME"	"CATEGORY_NAME"
yellowfin tuna	Species
blackfin tuna	Species
wahoo	Species
striped marlin	Species
Pacific bluefin tuna	Species
Albacore	Species
Bigeye tuna	Species
Mexico	Country
Panama	Country

Support Service S1: Get the supported categories of entities

This service returns the categories that are currently supported by XSearch-Service. The service also returns the SPARQL endpoint and the SPARQL template query that have been defined for each category.

Input Parameters:

-

Output:

  • The names of the supported categories and for each category the corresponding SPARQL endpoint and template query


Example:

REST call:

http://..../xsearch-service-2.0.0/api/getsupportedcategories

CSV Result:

"CATEGORY_NAME"	"SPARQL_ENDPOINT"	"SPARQL_LINKING_TEMPLATE_QUERY"
Species	http://virtuoso.i-marine.d4science.org:8890/sparql	SELECT distinct ?URI FROM.....
Country	http://virtuoso.i-marine.d4science.org:8890/sparql	SELECT distinct ?URI FROM.....
Water Area	http://virtuoso.i-marine.d4science.org:8890/sparql	SELECT distinct ?URI FROM.....
Regional Fisheris Bodies	http://virtuoso.i-marine.d4science.org:8890/sparql	SELECT distinct ?URI FROM .....

Support Service S2: Get the supported clustering algorithms

This service returns the clustering algorithms that are supported by XSearch-Service.

Input Parameters:

-

Output:

  • The ID, the name and a short description of each supported clustering algorithm


Example:

REST call:

http://.../xsearch-service-2.0.0/api/getsupportedclusteringalgs

CSV Result:

"CLUSTERING_ALGORITHM_ID"	"CLUSTERING_ALGORITHM_NAME"	"CLUSTERING_ALGORITHM_DESCRIPTION"
cl1	STC	Suffix Tree Clustering Algorithm
cl2	STC+	Variation of STC which differs in the way the clusters are scored and in the way base clusters are merged
cl3	NM-STC	No Merge Suffix Tree Clustering
cl4	STC++	Variation of STC+
cl5	NM-STC+	Variation of NM-STC

Support Service S3: Get the supported search systems

This service returns the search systems that are currently supported by XSearch-Service.

Input Parameters:

-

Output:

  • The name and a short description of each supported search system

Example:

REST call:

http://.../xsearch-service-2.0.0/api/getsupportedsearchsystems

CSV Result:

"SEARCH_SYSTEM_NAME"	"SEARCH_SYSTEM_DESCRIPTION"
opensearch	OpenSearch (http://www.opensearch.org/). The OpenSearch Description Document must be provided.
ecoscope	Ecoscope Search System (http://www.ecoscopebc.ird.fr/)
gcube	gCube Infrastructure Search System (https://i-marine.d4science.org/web/guest/about-gcube). The ResultSet locator must be provided.