XSearch-Service-API
Contents
- 1 Objective
- 2 Requirements
- 3 Decisions
- 4 Services
- 4.1 Main Service A: Post-processing of search results
- 4.2 Main Service B: Match an identified entity
- 4.3 Main Service C: Enrich an identified entity
- 4.4 Main Service D: Identify entities in a Web Document
- 4.5 Support Service S1: Get the supported categories of entities
- 4.6 Support Service S2: Get the supported clustering algorithms
- 4.7 Support Service S3: Get the supported search systems
Objective
- The objective of this activity is to provide an API for the semantic post-processing services of search results that were developed in the context of T10.4.
Requirements
- The API should allow someone to build an application (web application, mobile application) that exploits the services of T10.4 (and their deployment in the infrastructure).
- The functionality of T10.4 is generic, i.e. one can configure the underlying search system, the desired categories, and the SPARQL endpoints to be used. It is desirable to offer this generality at the API level as well.
Decisions
- We plan to develop a REST API, since such an API can be used easily in various contexts.
- We will comply with OpenSearch because it is a standard and makes it possible to configure which underlying search system to use.
Services
Main Service A: Post-processing of search results
This service is responsible for the post-processing of search results: it performs textual clustering, entity mining, or both on the top search results returned by the given search system. The API accepts as input an underlying search system and a set of parameters (e.g. query, number of results to analyze). The underlying source can be any OpenSearch-compliant search engine. XSearch is then responsible for sending the query (with the given parameters) to the search engine and retrieving the results. Additional parameters related to entity mining and textual clustering can also be provided to the API.
Input Parameters:
- searchsystem: The name of a supported search system (see Support Service S3) - required
- query: The query string - required in case searchsystem != gcube
- descrdoc: The URL of an OpenSearch Description Document (it is considered only in case searchsystem=opensearch)
- locator: The gCube ResultSet locator (it is considered only in case searchsystem=gcube)
- numofresults: Number of results to analyze (integer). Default: 100
- mining: Whether to enable Entity Mining. Valid values: {true, false}. Default: true
- categories: Active categories of entities (semicolon-separated names of categories) (see Support Service S1). Default: X-Search's current active categories.
- clustering: Whether to enable Textual Clustering. Valid values: {true, false}. Default: true
- numofclusters: Number of clusters to return (integer). Default: 15
- clusteringalg: ID of the clustering algorithm (see Support Service S2). Valid values: {cl1, cl2, cl3, cl4, cl5}. Default: cl3
- typeofresults: Type of results to analyze. Valid values: {snippets, contents}. Default: snippets
- format: The output format. Valid values: {json, xml}. Default: json
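As a minimal sketch of how a client might assemble these parameters, the Python helper below fills in the defaults listed above, checks the enumerated values, and returns a URL-encoded query string. The function name is hypothetical, and because this page does not give the resource path of Service A, only the query string is built. Sending the defaults explicitly is a design choice of the sketch, not a requirement stated here.

```python
from urllib.parse import urlencode

# Defaults and valid values copied from the parameter list above.
DEFAULTS = {
    "numofresults": 100,
    "mining": "true",
    "clustering": "true",
    "numofclusters": 15,
    "clusteringalg": "cl3",
    "typeofresults": "snippets",
    "format": "json",
}
VALID = {
    "mining": {"true", "false"},
    "clustering": {"true", "false"},
    "clusteringalg": {"cl1", "cl2", "cl3", "cl4", "cl5"},
    "typeofresults": {"snippets", "contents"},
    "format": {"json", "xml"},
}


def build_service_a_query(searchsystem, query=None, **options):
    """Return the query string for a Service A request (hypothetical helper)."""
    # The query string is required for every search system except gcube.
    if query is None and searchsystem != "gcube":
        raise ValueError("query is required unless searchsystem is gcube")
    params = {"searchsystem": searchsystem}
    if query is not None:
        params["query"] = query
    params.update(DEFAULTS)   # start from the documented defaults
    params.update(options)    # caller-supplied values override them
    for name, allowed in VALID.items():
        if str(params[name]) not in allowed:
            raise ValueError(f"invalid value for {name}: {params[name]!r}")
    return urlencode(params)
```

For example, build_service_a_query("ecoscope", "tuna", numofresults=50) yields a string that can be appended after the "?" of the service URL; an unsupported value such as clusteringalg="cl9" raises ValueError before any request is sent.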
Output:
- Detected Entities (and for each entity, the results in which it occurs)
- Produced Clusters (and for each cluster, the results that belong to it)
- Results
- The input parameters
- Date of request
Examples:
REST call:
Part of the CSV result:
Main Service B: Match an identified entity
This service links the name of an entity with a resource in a Knowledge Base. For example, the service can match the name "yellowfin tuna" with the DBpedia resource "http://dbpedia.org/resource/Yellowfin_tuna".
Input Parameters:
- name - The name of the entity (string) - required
- category - The category of the entity (string) - required
- endpoint - The SPARQL endpoint to use (url). Default: the endpoint that has been configured in X-Search for the given category name.
- tquery - The SPARQL template query to use (string). Default: the template query that has been configured in X-Search for the given category name.
Output:
- The semantic information that is returned by running the SPARQL template query for the given name and category at the SPARQL endpoint.
Examples:
Consider that for the category "Species", the specified SPARQL endpoint is: http://virtuoso.i-marine.d4science.org:8890/sparql and the specified template query is:
define input:inference <http://www.ics.forth.gr/isl/Schema>
select distinct ?URI ?label as ?Value
FROM <http://www.ics.forth.gr/isl/SameAs>
FROM <http://www.ics.forth.gr/isl/Ecoscope>
FROM <http://www.ics.forth.gr/isl/DBpedia>
FROM <http://www.ics.forth.gr/isl/Fishbase>
FROM <http://www.ics.forth.gr/isl/FLOD>
FROM <http://www.ics.forth.gr/isl/Worms>
FROM <http://www.ics.forth.gr/isl/Schema>
where {
  ?URI a <http://ics.forth.gr/Ontology/MarineTLO/imarine#MarineSpecies> .
  ?URI rdfs:label ?label
  FILTER(regex(str(?label),'<ENTITY>','i'))
}
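The template query contains the placeholder <ENTITY>. How X-Search fills it is not described on this page; the sketch below assumes a plain textual substitution of the entity name, with backslashes and single quotes escaped so that the name stays inside the quoted literal. The function name is hypothetical, and regex metacharacters in the name are deliberately left untouched.

```python
def fill_template(template: str, entity_name: str) -> str:
    """Replace the <ENTITY> placeholder with an escaped entity name (assumed behaviour)."""
    # Escape backslashes first, then single quotes, so the value cannot
    # terminate the single-quoted SPARQL string literal early.
    escaped = entity_name.replace("\\", "\\\\").replace("'", "\\'")
    return template.replace("<ENTITY>", escaped)


# Shortened fragment of the template above, used only for illustration.
fragment = "FILTER(regex(str(?label),'<ENTITY>','i'))"
print(fill_template(fragment, "salmon"))
```

With name=salmon, the filter becomes a case-insensitive regex match on "salmon", which is consistent with the labels returned in the result of this example.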
REST call:
http://.../xsearch-service-2.0.0/api/link?name=salmon&category=Species
CSV result:
URI=http://dbpedia.org/resource/Salmonidae Value=Salmonidae
URI=http://dbpedia.org/resource/Sockeye_salmon Value=Sockeye salmon
URI=http://dbpedia.org/resource/Chinook_salmon Value=Chinook salmon
URI=http://dbpedia.org/resource/Salmon_shark Value=Salmon shark
URI=http://dbpedia.org/resource/Pink_salmon Value=Pink salmon
URI=http://dbpedia.org/resource/Atlantic_salmon Value=Atlantic salmon
URI=http://dbpedia.org/resource/King-of-the-salmon Value=King-of-the-salmon
URI=http://dbpedia.org/resource/Beaked_salmon Value=Beaked salmon
URI=http://dbpedia.org/resource/Chum_salmon Value=Chum salmon
URI=http://dbpedia.org/resource/Coho_salmon Value=Coho salmon
URI=http://dbpedia.org/resource/Giant_salmon_carp Value=Giant Salmon Carp
URI=http://dbpedia.org/resource/Giant_salmon_carp Value=Giant salmon carp
URI=http://dbpedia.org/resource/Lake_Salmon Value=Lake Salmon
URI=http://dbpedia.org/resource/Satsukimasu_salmon Value=Satsukimasu salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/b254c0d9-fec2-4243-b697-1a9b76a54074 Value=salmonids nei
URI=http://www.fao.org/figis/flod/entities/codedentity/b254c0d9-fec2-4243-b697-1a9b76a54074 Value=salmonidae
URI=http://www.fao.org/figis/flod/entities/codedentity/2b45d5f6-9e19-4f89-949b-de591d63723b Value=australian salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/cdf01cf9-388e-4c10-ad22-8b68bd6f9614 Value=salmon shrimp
URI=http://www.fao.org/figis/flod/entities/codedentity/bdb5ca11-f594-4255-b6ca-5ecd792318a2 Value=salmon catfish
URI=http://www.fao.org/figis/flod/entities/codedentity/3b4729fe-4931-45dd-8683-79649d6fccc8 Value=smallmouthed salmon catfish
URI=http://www.fao.org/figis/flod/entities/codedentity/cc3bf81f-c84e-4d7e-b391-4ad90415d1d5 Value=salmon horse conch
URI=http://www.fao.org/figis/flod/entities/codedentity/86021df0-f907-478b-a677-98c079466b8f Value=sockeye(=red)salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/ebf2099e-53cf-4b60-be70-bedb82394055 Value=salmonetes etc. nep
URI=http://www.fao.org/figis/flod/entities/codedentity/d6ea3fb9-aeb2-44a7-9c9a-89a7492fc9d2 Value=salmonete de roca
URI=http://www.fao.org/figis/flod/entities/codedentity/2af070a8-c25c-4c70-a3e5-5560f071e7c3 Value=beachsalmon
URI=http://www.fao.org/figis/flod/entities/codedentity/b82bf0e5-1bbc-4f3c-b400-0392fe4c36f7 Value=salmonete de fango
URI=http://www.fao.org/figis/flod/entities/codedentity/46b53229-0926-42d4-b1b3-e2c233c76b9a Value=pink(=humpback)salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/942d2502-c8f1-4319-af20-5d8818643147 Value=salmonete vanicolense
URI=http://www.fao.org/figis/flod/entities/codedentity/9752057e-827d-40db-92fa-92bebdb53fd2 Value=chinook(=spring=king)salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/4490feaa-41b5-44c8-b6bc-c3c7ee83caae Value=salmonete índico
URI=http://www.fao.org/figis/flod/entities/codedentity/86b2959f-58f8-4999-b7c3-4830626754b7 Value=masu(=cherry) salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/79bb60a9-f930-4f4a-b010-144c8f3b7819 Value=beaked salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/10371267-8726-4982-9155-edada05a1aab Value=salmonete barbudo
URI=http://www.fao.org/figis/flod/entities/codedentity/fb47b744-42e3-49cd-b657-4c6d7ba0753f Value=chum(=keta=dog)salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/5c909af3-c57e-4723-bef6-df6ab24fa240 Value=atlantic salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/8002f8ba-4a02-4a5f-b6f3-09dd6393ae7d Value=coho(=silver)salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/0f7489b8-5a2c-4ca6-b3cc-aec9832e8312 Value=salmon shark
URI=http://www.fao.org/figis/flod/entities/codedentity/e415e0f6-4b6a-45c3-8e0f-4e1fedab81ab Value=lake salmon
URI=http://www.fao.org/figis/flod/entities/codedentity/63f4c750-c41c-4bd0-a1a2-20f33941ba57 Value=salmonoids nei
URI=http://www.fao.org/figis/flod/entities/codedentity/63f4c750-c41c-4bd0-a1a2-20f33941ba57 Value=salmonoideos nep
URI=http://www.fao.org/figis/flod/entities/codedentity/63f4c750-c41c-4bd0-a1a2-20f33941ba57 Value=salmonoidei
URI=http://www.fao.org/figis/flod/entities/codedentity/63f4c750-c41c-4bd0-a1a2-20f33941ba57 Value=salmonoidés nca
URI=http://www.fao.org/figis/flod/entities/codedentity/2396f893-2ef9-40ba-89f6-499a1d71d15c Value=pacific salmons nei
URI=http://www.fao.org/figis/flod/entities/codedentity/2396f893-2ef9-40ba-89f6-499a1d71d15c Value=salmones del pacífico nep
URI=http://www.fao.org/figis/flod/entities/codedentity/ab76430b-3f6b-45da-8591-dfeae6d3848f Value=salmonetes
URI=http://www.fao.org/figis/flod/entities/codedentity/8b88d794-e083-4f82-8fc4-c14ee20a4805 Value=salmonetes nep
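A client will usually want this result as a list of (URI, label) pairs. The sketch below assumes the response arrives as whitespace-separated "URI=... Value=..." pairs, exactly as rendered in this example; the real delimiter is not specified on this page, and the function name is hypothetical. A label that itself contains the text "URI=" would be split incorrectly.

```python
import re

# A pair starts at "URI=", and its value runs up to the next "URI=" or the end of the text.
PAIR = re.compile(r"URI=(\S+)\s+Value=(.*?)(?=\s+URI=|\s*$)", re.DOTALL)


def parse_link_result(text):
    """Return [(uri, label), ...] from the rendered link result (assumed layout)."""
    # Collapse internal whitespace so labels wrapped across lines are rejoined.
    return [(uri, " ".join(value.split())) for uri, value in PAIR.findall(text)]


sample = (
    "URI=http://dbpedia.org/resource/Salmonidae Value=Salmonidae "
    "URI=http://dbpedia.org/resource/Sockeye_salmon Value=Sockeye salmon"
)
for uri, label in parse_link_result(sample):
    print(label, "->", uri)
```

Because the same URI can appear with several labels (for instance labels in different languages), the pairs are kept as a list rather than collapsed into a dictionary keyed by URI.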
Main Service C: Enrich an identified entity
This service enriches an entity with semantic information. For example, for the entity "yellowfin tuna" (which has been linked to the resource "http://dbpedia.org/resource/Yellowfin_tuna"), the service can return its incoming and outgoing properties.
Input Parameters:
- uri - The URI of the entity (url) - required
- type - Type of properties to retrieve (incoming|outcoming|both) - required
- category - The category of the entity (string). If no value is given, the parameter endpoint must be specified.
- endpoint - The SPARQL endpoint to use (url). If no value is given, the parameter category must be specified. In this case, the endpoint that has been configured in X-Search for the given category name is considered.
- lang - The language code for retrieving literals (string)
Output:
- A list of RDF triples
Example:
Consider that for the category "Species", the specified SPARQL endpoint is: http://virtuoso.i-marine.d4science.org:8890/sparql
REST call:
http://.../xsearch-service-2.0.0/api/enrich?uri=http://dbpedia.org/resource/Yellowfin_tuna&category=Species&type=outgoing
Part of the CSV result:
http://dbpedia.org/resource/Yellowfin_tuna http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Species
http://dbpedia.org/resource/Yellowfin_tuna http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Eukaryote
http://dbpedia.org/resource/Yellowfin_tuna http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://dbpedia.org/ontology/Fish
http://dbpedia.org/resource/Yellowfin_tuna http://dbpedia.org/ontology/family http://dbpedia.org/resource/Scombridae
http://dbpedia.org/resource/Yellowfin_tuna http://dbpedia.org/ontology/genus http://dbpedia.org/resource/Thunnus
http://dbpedia.org/resource/Yellowfin_tuna http://dbpedia.org/ontology/kingdom http://dbpedia.org/resource/Animal
http://dbpedia.org/resource/Yellowfin_tuna http://dbpedia.org/ontology/order http://dbpedia.org/resource/Perciformes
http://dbpedia.org/resource/Yellowfin_tuna http://dbpedia.org/ontology/phylum http://dbpedia.org/resource/Chordate
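The REST call of this example could be assembled programmatically as sketched below. The helper name and the localhost base URL are hypothetical; the /api/enrich path and the parameter names come from this section. The sketch enforces the rule that either category or endpoint must be given, and percent-encodes the uri value, on the assumption that the service decodes query parameters in the standard way (the example call above passes the URI unencoded).

```python
from urllib.parse import urlencode


def build_enrich_url(base_url, uri, prop_type, category=None, endpoint=None, lang=None):
    """Build an enrich request URL (hypothetical helper, not part of the service)."""
    # The parameter list requires at least one of category / endpoint.
    if category is None and endpoint is None:
        raise ValueError("either category or endpoint must be given")
    params = {"uri": uri, "type": prop_type}
    for name, value in (("category", category), ("endpoint", endpoint), ("lang", lang)):
        if value is not None:
            params[name] = value
    # urlencode percent-encodes reserved characters such as ':' and '/'.
    return f"{base_url.rstrip('/')}/api/enrich?{urlencode(params)}"


print(build_enrich_url(
    "http://localhost:8080/xsearch-service-2.0.0",  # placeholder deployment URL
    "http://dbpedia.org/resource/Yellowfin_tuna",
    "outgoing",
    category="Species",
))
```

Encoding the entity URI keeps any "&" or "#" inside it from being interpreted as part of the request's own query string.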
Main Service D: Identify entities in a Web Document
This service retrieves the contents of a Web document (e.g. a Web page or a PDF file) and performs entity mining on them.
Input Parameters:
- url: The URL of the Web page/document - required
- categories: Active categories of entities (semicolon-separated names of categories) (see Support Service S1). If no value is given, then X-Search's current active categories are considered.
Output:
- A list with the detected entities together with their corresponding category
Example:
REST call:
http://.../xsearch-service-2.0.0/api/processdocument?url=http://en.wikipedia.org/wiki/Yellowfin_tuna&categories=Species;Country
Part of the CSV result:
"ENTITY_NAME"           "CATEGORY_NAME"
yellowfin tuna          Species
blackfin tuna           Species
wahoo                   Species
striped marlin          Species
Pacific bluefin tuna    Species
Albacore                Species
Bigeye tuna             Species
Mexico                  Country
Panama                  Country
Support Service S1: Get the supported categories of entities
This service returns the categories that are currently supported by XSearch-Service. The service also returns the SPARQL endpoint and the SPARQL template query that have been defined for each category.
Input Parameters:
- None
Output:
- The names of the supported categories and for each category the corresponding SPARQL endpoint and template query
Example:
REST call:
http://.../xsearch-service-2.0.0/api/getsupportedcategories
CSV Result:
"CATEGORY_NAME"            "SPARQL_ENDPOINT"                                     "SPARQL_LINKING_TEMPLATE_QUERY"
Species                    http://virtuoso.i-marine.d4science.org:8890/sparql    SELECT distinct ?URI FROM.....
Country                    http://virtuoso.i-marine.d4science.org:8890/sparql    SELECT distinct ?URI FROM.....
Water Area                 http://virtuoso.i-marine.d4science.org:8890/sparql    SELECT distinct ?URI FROM.....
Regional Fisheris Bodies   http://virtuoso.i-marine.d4science.org:8890/sparql    SELECT distinct ?URI FROM .....
Support Service S2: Get the supported clustering algorithms
This service returns the clustering algorithms that are supported by XSearch-Service.
Input Parameters:
- None
Output:
- The ID, the name and a short description of each supported clustering algorithm
Example:
REST call:
http://.../xsearch-service-2.0.0/api/getsupportedclusteringalgs
CSV Result:
"CLUSTERING_ALGORITHM_ID"   "CLUSTERING_ALGORITHM_NAME"   "CLUSTERING_ALGORITHM_DESCRIPTION"
cl1                         STC                           Suffix Tree Clustering Algorithm
cl2                         STC+                          Variation of STC which differs in the way the clusters are scored and in the way base clusters are merged
cl3                         NM-STC                        No Merge Suffix Tree Clustering
cl4                         STC++                         Variation of STC+
cl5                         NM-STC+                       Variation of NM-STC
Support Service S3: Get the supported search systems
This service returns the search systems that are currently supported by XSearch-Service.
Input Parameters:
- None
Output:
- The name and a short description of each supported search system
Example:
REST call:
http://.../xsearch-service-2.0.0/api/getsupportedsearchsystems
CSV Result:
"SEARCH_SYSTEM_NAME"   "SEARCH_SYSTEM_DESCRIPTION"
opensearch             OpenSearch (http://www.opensearch.org/). The OpenSearch Description Document must be provided.
ecoscope               Ecoscope Search System (http://www.ecoscopebc.ird.fr/)
gcube                  gCube Infrastructure Search System (https://i-marine.d4science.org/web/guest/about-gcube). The ResultSet locator must be provided.