References
- P. Fafalios, M. Baritakis and Y. Tzitzikas, Configuring Named Entity Extraction through Real-Time Exploitation of Linked Data, 4th International Conference on Web Intelligence, Mining and Semantics (WIMS'14), Thessaloniki, Greece, June 2014 (pdf: http://www.ics.forth.gr/isl/X-Link/files/fafalios_2014_wims.pdf | slides: http://users.ics.forth.gr/~fafalios/files/ppts/fafalios_2014_xlink.pdf)
Latest revision as of 13:38, 17 November 2014
Contents
- 1 Overview
- 2 Key features
- 3 Functionality
- 3.1 Load the entity mining component, get the currently supported categories of entities and print their names
- 3.2 Create an instance of X-Link, set the accepted (active) categories of entities and perform entity mining in a Web page
- 3.3 Link the identified entities with semantic resources (URIs)
- 3.4 Enrich the entity URIs with semantic information (RDF triples)
- 3.5 Infer the connectivity of the identified entities
- 3.6 Stop the entity mining component and print the identified entities, together with the matching URIs
- 3.7 Export the results
- 4 Configurability
- 5 Configuring (programmatically) X-Link
- 5.1 Adding a new category of entities
- 5.2 Updating the entities of a category
- 5.3 Replacing the entities of a category
- 5.4 Renaming a category
- 5.5 Removing a category
- 5.6 Specifying how to link the identified entities with semantic resources (i.e. URIs)
- 5.7 Specifying how to enrich the identified entities with semantic information (i.e. RDF triples)
- 6 References
Overview
X-Link is a fully configurable (Linked Data-based) Named Entity Extraction (NEE) tool which allows the user/developer to easily define the categories of entities that are interesting for the application at hand by exploiting one or more (online) Semantic Knowledge Bases (Linked Data). The user is also able to update a category and specify how to semantically link and enrich the identified entities. This enhanced configurability allows X-Link to be easily configured for different contexts, for building domain-specific applications (e.g. for identifying drugs in a medical search system, for annotating and exploring fish species in a marine-related web page, etc.).
X-Link is based on Gate ANNIE and supports both gazetteers (lists of names) and natural language processing functions. Gate ANNIE is a ready-made information extraction system which contains several components (e.g. Tokeniser, Gazetteer, Sentence Splitter, Orthographic Coreference, etc.). We have extended Gate ANNIE so that a new supported category can be created and an existing one updated (using gazetteers) by exploiting Linked Data. Currently, X-Link exports the results in XML and CSV.
X-Link is available in the Maven Repository of iMarine.
Key features
Currently X-Link supports the analysis of plain text files, HTML pages, Microsoft Word and PowerPoint files (.doc, .docx, .ppt and .pptx), PDF files, and XML-based files (e.g. XML and RDF files). First, it reads the contents of the corresponding document and performs a "cleaning" task, i.e. it removes useless text (e.g. HTML tags in a Web page or Meta elements in a Microsoft Word file). Then, it applies NEE to the cleaned contents of the document.
X-Link starts by reading an initial configuration which is stored in a properties file. It also implements functions that allow the user/developer to configure the system, e.g. through an administrator API. Specifically, the following functions are currently supported:
- Add a new category, using one or more lists of entities, one or more resource classes, or one or more SPARQL queries.
- Update an existing category, using one or more lists of entities, one or more resource classes, or one or more SPARQL queries. The user can either totally replace a category (i.e. remove the old entities and add the new ones) or just add the new entities.
- Remove an existing category.
- Change the displayed name of an existing category (i.e. rename).
- Set/change the underlying Knowledge Bases.
- Set/change how to query the underlying Knowledge Bases for linking the identified entities.
- Set/change how to query the underlying Knowledge Bases for enriching the identified entities.
- Set/change the active categories, i.e. the categories for which X-Link identifies entities.
Functionality
Load the entity mining component, get the currently supported categories of entities and print their names
    EntityMiningComponent emc = new GateEntityMiningComponent("C:/XLinkGateComponent");
    emc.startup();

    ArrayList<Category> availableCategories = emc.getAvailableCategories();

    System.out.println("# Available categories: ");
    for (Category category : availableCategories) {
        System.out.println(category.getName());
    }
Create an instance of X-Link, set the accepted (active) categories of entities and perform entity mining in a Web page
    XLink xlink = new XLink();
    xlink.setEntityMiningComponent(emc);

    HashSet<String> acceptedCategoryNames = new HashSet<String>();
    acceptedCategoryNames.add("species");

    TextExtractor extractor = new WebPageTextExtractor("http://en.wikipedia.org/wiki/Yellowfin_tuna");
    xlink.retrieveEntities(extractor, acceptedCategoryNames);
Link the identified entities with semantic resources (URIs)
xlink.matchEntities();
Enrich the entity URIs with semantic information (RDF triples)
- Enrich all entity URIs with their outgoing properties:
xlink.enrich(PropertiesType.OUTGOING);
- Enrich the entity URIs of the category 'Species' with their outgoing properties:
xlink.enrich(emc.getCategory("species"), PropertiesType.OUTGOING);
Infer the connectivity of the identified entities
- Infer the connectivity of all the identified entities:
xlink.connect();
- Infer the connectivity of the entities belonging to the category 'Species':
xlink.connect("Species");
Stop the entity mining component and print the identified entities, together with the matching URIs
    emc.shutdown();

    // Get the detected entities (together with all their information).
    ArrayList<Entity> entities = xlink.getEntities();

    System.out.println("# Detected entities: ");
    for (Entity entity : entities) {
        // Print the main characteristics of the detected entities.
        System.out.println("Entity name: " + entity.getName());
        System.out.println("Category: " + entity.getCategoryName());
        System.out.println("Matching URIs: " + entity.getMatchingURIs());
        System.out.println("-----");
    }
Export the results
    ResultExporter exp1 = new XMLExporter("C:/x-link/results/results.xml", entities);
    exp1.exportResults();

    ResultExporter exp2 = new TXTExporter("C:/x-link/results/results.txt", entities);
    exp2.exportResults();

    ResultExporter exp3 = new CSVExporter("C:/x-link/results/results.csv", entities);
    exp3.exportResults();
Configurability
The 'GATE Annie' Entity Mining Component
Gate ANNIE is a ready-made information extraction system which contains several components (e.g. Tokeniser, Gazetteer, Sentence Splitter, Orthographic Coreference, etc.). X-Link extends Gate ANNIE in order to be able to create a new supported category and update an existing one (using gazetteers). This gives us the opportunity to adapt its functionality according to our needs, making X-Link configurable and extendible.
The 'x-link.properties' file
An example of the properties file:
    ### X-Link Properties ###

    # The categories that are currently supported by the entity mining component
    gr.forth.ics.xlink.resources.categories=Species;Country

    # The SPARQL endpoints that are used for the supported categories
    gr.forth.ics.xlink.resources.categories.Species.endpoint=http\://factforge.net/sparql?query\=
    gr.forth.ics.xlink.resources.categories.Country.endpoint=http\://dbpedia.org/sparql?query\=

    # The SPARQL template queries that are used for LINKING the identified entities
    gr.forth.ics.xlink.resources.categories.Species.linkingSparqlQuery=C\:/sparql/examples_of_queries/speciesLink.template
    gr.forth.ics.xlink.resources.categories.Country.linkingSparqlQuery=C\:/sparql/examples_of_queries/countriesLink.template

    # The SPARQL template queries that are used for ENRICHING the identified entities
    gr.forth.ics.xlink.resources.categories.Species.enrichingSparqlQuery=C\:/sparql/examples_of_queries/speciesEnrich.template
    gr.forth.ics.xlink.resources.categories.Country.enrichingSparqlQuery=C\:/sparql/examples_of_queries/countriesEnrich.template

    # The RESOURCE CLASSES that are used for creating/updating the category 'Species'
    gr.forth.ics.xlink.resources.categories.Species.resourceclass=http\://dbpedia.org/ontology/Fish

    # The SPARQL QUERIES that are used for creating/updating the category 'Species'
    gr.forth.ics.xlink.resources.categories.Species.sparqlqueryofentities=C\:/sparql/examples_of_queries/speciesEntitiesQuery.sparql

    # The template query that is used for retrieving all the instances of a resource class
    gr.forth.ics.xlink.resources.categories.addcategory.getinstancestemplatequery=C\:/sparql/examples_of_queries/getInstancesQuery.template

    # The parameter that is used in the LINKING template queries. Specifically,
    # when trying to match URIs for a particular entity, each occurrence of the
    # string <ENTITY> is replaced by the entity's name.
    gr.forth.ics.xlink.resources.parameters.entity=<ENTITY>

    # The parameter that is used in the INSTANCES template queries. Specifically,
    # when trying to get all the instances of a resource class, each occurrence of the
    # string THE_URI is replaced by the resource class of the corresponding category.
    gr.forth.ics.xlink.resources.parameters.resourceclass=THE_URI

    # Detect entities that do not match exactly an entity in the lists of GATE
    # by giving an edit distance (allowance) value.
    # Edit distance value 0.2 means that for an entity with 10 characters,
    # we can match entities with edit distance 2.
    # For example, if a list contains the entity "Rhodes" but not the entity "Rhodos"
    # and the document that we analyze contains only the word "Rhodos", then with edit
    # distance value 0.2 the entity "Rhodos" will be detected (since the edit distance from "Rhodes" is 1).
    gr.forth.ics.xlink.gate.editDistance = false
    gr.forth.ics.xlink.gate.editDistance.value = 0.2

    # The radius for inspecting the connectivity of the identified entities
    gr.forth.ics.xlink.connect.radius = 0

    # The SPARQL query that is used for retrieving the properties and related entities of the identified entities
    # This query is used for inspecting the connectivity of the identified entities
    gr.forth.ics.xlink.resources.entities.connectingtemplatequery=C\:/sparql/examples_of_queries/connecting.template
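The edit distance allowance described in the properties file can be illustrated with a small, self-contained sketch. It is not X-Link's actual implementation: the class and method names are hypothetical, and it assumes that the allowed number of edits is the floor of the configured value multiplied by the length of the gazetteer entry, which is consistent with the "10 characters, edit distance 2" example but not confirmed by the source.

```java
public class EditDistanceSketch {

    // Classic dynamic-programming Levenshtein distance between two strings.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumption: allowed edits = floor(value * length of the gazetteer entry).
    static boolean matches(String listEntry, String candidate, double value) {
        int allowed = (int) Math.floor(value * listEntry.length());
        return levenshtein(listEntry, candidate) <= allowed;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("Rhodes", "Rhodos"));
        System.out.println(matches("Rhodes", "Rhodos", 0.2));
        System.out.println(matches("Rhodes", "Rhodos", 0.0));
    }
}
```

Under this assumption, "Rhodos" is accepted for the list entry "Rhodes" with the value 0.2 (one edit is allowed for a six-character entry) and rejected with the value 0.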
Configuring (programmatically) X-Link
X-Link starts by reading an initial configuration which is stored in a properties file. It also offers functions (e.g. through an API) for changing this configuration, either in a preprocessing step or while a corresponding service is running.
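To show how such an initial configuration can be read, here is a minimal sketch that uses only java.util.Properties. It is not X-Link's own loading code; the class name and the readEndpoints helper are hypothetical, and only the property keys are taken from the example file above. Note that the standard Properties parser already unescapes sequences such as "\:" and "\=".

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

public class XLinkPropertiesSketch {

    static final String PREFIX = "gr.forth.ics.xlink.resources.categories";

    // Returns category name -> SPARQL endpoint, in the order of the categories key.
    static Map<String, String> readEndpoints(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> endpoints = new LinkedHashMap<>();
        // The supported categories are given as a semicolon-separated list.
        for (String name : props.getProperty(PREFIX, "").split(";")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                endpoints.put(trimmed, props.getProperty(PREFIX + "." + trimmed + ".endpoint"));
            }
        }
        return endpoints;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "gr.forth.ics.xlink.resources.categories=Species;Country",
            "gr.forth.ics.xlink.resources.categories.Species.endpoint=http\\://factforge.net/sparql?query\\=",
            "gr.forth.ics.xlink.resources.categories.Country.endpoint=http\\://dbpedia.org/sparql?query\\=");
        System.out.println(readEndpoints(text));
    }
}
```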
Adding a new category of entities
Giving a list of entity names
    Category categoryToAdd = new Category("North Aegean Greek Islands");

    TreeSet<String> names = new TreeSet<String>();
    names.add("Lesvos");
    names.add("Chios");
    names.add("Samos");
    names.add("Limnos");
    names.add("Ikaria");
    names.add("Samothraki");
    names.add("Agios Eustratios");
    names.add("Psara");
    names.add("Fournoi");
    names.add("Oinouses");
    names.add("Thymaina");
    names.add("Antipsara");
    names.add("Pasas");
    names.add("Agios Minas");
    names.add("Samiopoula");

    categoryToAdd.setNamedEntities(names);
    emc.addNewCategory(categoryToAdd);
Giving a resource class
    Category categoryToAdd = new Category("Fish Species");
    categoryToAdd.setResourceClass("http://dbpedia.org/ontology/Fish");
    categoryToAdd.setEndpoint("http://dbpedia.org/sparql");
    categoryToAdd.retrieveNamedEntitiesByResourceClass();
    emc.addNewCategory(categoryToAdd);
Giving a SPARQL query
    Category categoryToAdd = new Category("Fish Species");
    categoryToAdd.setEndpoint("http://dbpedia.org/sparql");
    categoryToAdd.setSparqlQuery("select distinct str(?name) where { ?uri a <http://dbpedia.org/ontology/Fish> . ?uri rdfs:label ?name }");
    categoryToAdd.retrieveNamedEntitiesByQuery();
    emc.addNewCategory(categoryToAdd);
or (in case the SPARQL query exists in a file)
    Category categoryToAdd = new Category("Fish Species");
    categoryToAdd.setEndpoint("http://dbpedia.org/sparql");
    categoryToAdd.setSparqlQueryFilepathOfEntities("C:/sparql/getFishSpeciesQuery.sparql");
    categoryToAdd.retrieveNamedEntitiesByQuery();
    emc.addNewCategory(categoryToAdd);
Updating the entities of a category
Giving a list of entity names
    Set<String> newEntities = new TreeSet<String>();
    newEntities.add("Patmos");
    newEntities.add("Crete");
    newEntities.add("Karpathos");

    emc.updateCategoryBySet("Greek Islands", newEntities);
Giving a resource class
    String endpoint = "http://lod.openlinksw.com/sparql";
    String resourceClass = "http://umbel.org/umbel/rc/Fish";

    emc.updateCategoryByResourceClass("Fish Species", endpoint, resourceClass);
Giving a SPARQL query
    String endpoint = "http://dbpedia.org/sparql";
    String sparqlQuery = "select distinct str(?name) where { ?uri a <http://dbpedia.org/ontology/Fish> . ?uri rdfs:label ?name FILTER(lang(?name)='es') }";

    emc.updateCategoryByQuery("water areas", endpoint, sparqlQuery);
or (in case the SPARQL query exists in a file)
    String endpoint = "http://dbpedia.org/sparql";
    emc.updateCategoryByQuery("water areas", endpoint, "C:/sparql/getWaterAreas.sparql");
Replacing the entities of a category
Giving a list of entity names
    Set<String> newEntities = new TreeSet<String>();
    newEntities.add("Patmos");
    newEntities.add("Crete");
    newEntities.add("Karpathos");

    emc.replaceCategoryBySet("Greek Islands", newEntities);
Giving a resource class
    String endpoint = "http://lod.openlinksw.com/sparql";
    String resourceClass = "http://umbel.org/umbel/rc/Fish";

    emc.replaceCategoryByResourceClass("Fish Species", endpoint, resourceClass);
Giving a SPARQL query
    String endpoint = "http://dbpedia.org/sparql";
    String sparqlQuery = "select distinct str(?name) where { ?uri a <http://dbpedia.org/ontology/Fish> . ?uri rdfs:label ?name FILTER(lang(?name)='es') }";

    emc.replaceCategoryByQuery("water areas", endpoint, sparqlQuery);
or (in case the SPARQL query exists in a file)
    String endpoint = "http://dbpedia.org/sparql";
    emc.replaceCategoryByQuery("water areas", endpoint, "C:/sparql/getWaterAreas.sparql");
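The difference between updating and replacing a category, as described under Key features (updating only adds the new entities, replacing removes the old ones first), can be pictured with plain Java sets. This is only an illustration of the semantics; the class and its two helper methods are hypothetical and are not part of the X-Link API.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class UpdateVsReplaceSketch {

    // "Update": keep the existing entity names and add the new ones.
    static Set<String> update(Set<String> current, Set<String> incoming) {
        Set<String> result = new TreeSet<>(current);
        result.addAll(incoming);
        return result;
    }

    // "Replace": discard the existing entity names and keep only the new ones.
    static Set<String> replace(Set<String> current, Set<String> incoming) {
        return new TreeSet<>(incoming);
    }

    public static void main(String[] args) {
        Set<String> greekIslands = new TreeSet<>(Arrays.asList("Rhodes", "Crete"));
        Set<String> newEntities = new TreeSet<>(Arrays.asList("Patmos", "Crete", "Karpathos"));

        System.out.println("After update:  " + update(greekIslands, newEntities));
        System.out.println("After replace: " + replace(greekIslands, newEntities));
    }
}
```

After an update the old entity "Rhodes" is still part of the category, while after a replace only the three new names remain.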
Renaming a category
    String newName = "Fish";
    emc.renameCategory("Fish Species", newName);
Removing a category
emc.removeCategory("Fish Species");
Specifying how to link the identified entities with semantic resources (i.e. URIs)
SPARQL linking template query
For each supported category of entities, X-Link retains a SPARQL linking template query. The SPARQL linking template query describes the way X-Link must query the corresponding SPARQL endpoint for retrieving resources (i.e. URIs) that match a particular entity name. Specifically, a SPARQL linking template query contains the character sequence <ENTITY> (including the < and >). At request time, the system reads the endpoint and the corresponding template query of the category to which the identified entity belongs, replaces each occurrence of <ENTITY> in the template query with the entity's name, and finally runs the query. For example, the following template query tries to find a resource of type (rdf:type) Fish whose label (rdfs:label) contains the name of the identified entity (ignoring case):
    SELECT DISTINCT ?uri WHERE {
      ?uri rdf:type <http://dbpedia.org/ontology/Fish> .
      ?uri rdfs:label ?label FILTER(regex(str(?label), '<ENTITY>', 'i'))
    }
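The substitution step described above can be sketched in a few lines of Java. This is not X-Link's internal code: the class and method names are hypothetical, the quote escaping is an added precaution that the source does not mention, and appending the URL-encoded query to the endpoint is an assumption based on the "?query=" suffix of the endpoints in the example properties file.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class LinkingTemplateSketch {

    // The parameter string configured in the properties file.
    static final String ENTITY_PARAMETER = "<ENTITY>";

    // Replaces each occurrence of <ENTITY> with the entity's name.
    static String fillTemplate(String template, String entityName) {
        // Precaution (assumption): escape backslashes and single quotes so that
        // the name cannot terminate the SPARQL string literal early.
        String safe = entityName.replace("\\", "\\\\").replace("'", "\\'");
        return template.replace(ENTITY_PARAMETER, safe);
    }

    // Assumption: the endpoint already ends with "?query=", as in the properties file.
    static String buildRequestUrl(String endpoint, String query) {
        return endpoint + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String template = "SELECT DISTINCT ?uri WHERE { "
            + "?uri rdf:type <http://dbpedia.org/ontology/Fish> . "
            + "?uri rdfs:label ?label FILTER(regex(str(?label), '<ENTITY>', 'i')) }";
        String query = fillTemplate(template, "Yellowfin tuna");
        System.out.println(query);
        System.out.println(buildRequestUrl("http://dbpedia.org/sparql?query=", query));
    }
}
```

Because the example template passes the name to regex(), names that contain regular expression metacharacters would need additional escaping; the sketch does not handle that case.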
The following code sets a SPARQL endpoint and a SPARQL template query for linking the identified entities:
    Category category = emc.getCategory("Species");
    category.setEndpoint("http://dbpedia.org/sparql");
    category.setLinkingTemplateQuery("C:/sparql/templates/specieisLink.sparql");
Specifying how to enrich the identified entities with semantic information (i.e. RDF triples)
    Category category = emc.getCategory("Species");
    category.setEndpoint("http://dbpedia.org/sparql");
    category.setEnrichingTemplateQuery("C:/sparql/templates/specieisEnrich.sparql");