Difference between revisions of "X-Search"

From Gcube Wiki

Revision as of 13:35, 30 October 2013

Overview

X-Search is a meta-search engine that reads the description of an underlying search source, queries that source, analyzes the returned results in various ways, and exploits available semantic repositories.

Key features

  • Provision of results clustering over any search system
      • Performs clustering over the textual snippets or textual contents of the returned results.
  • Provision of snippet or contents-based entity recognition
      • Generic as well as vertical: based on predetermined entity categories and lists, which can be obtained by querying SPARQL endpoints.
  • Provision of gradual faceted (session-based) search
      • Ability to gradually restrict the answer based on the selected entities and/or clusters.
  • Ability to fetch and display the semantic information of an identified entity
      • Achieved by querying appropriate SPARQL endpoints.
  • Ability to apply these services on any web page through a web browser
      • Using the functionality of bookmarklets.

Design

Philosophy

X-Search has been designed to offer its functionality on top of other search systems. In particular it offers:

  • textual clustering of the results
  • provision of extracted textual entities
  • provision of gradual faceted search
  • ability to fetch semantic information about extracted entities
  • exploitation of the offered services in any web page

Architecture

X-Search is composed of several components. These are:

  • Text Clustering component, which is responsible for clustering the results of the underlying search system. Clustering is performed on the textual snippets of the returned results, although clustering of the full contents is also supported. The identified clusters are also ranked.
  • Text Entity Mining component, which is responsible for entity mining of the textual content. Like the Text Clustering component, it can operate either over the textual snippets or over the entire content, and it supports ranking of the identified entities.
  • Search System Mediator, which mediates between X-Search and the underlying search system. Its role is to read an OpenSearch description document describing the underlying search system (e.g. location of the search system, query format, response format), parse the results of the search system, and fetch the contents of a hit upon user request (e.g. when a user wants to perform entity mining on the whole content of a hit).
  • Caching component, which is used to improve performance. In particular, it caches the results of the most frequent queries.
  • Linked Open Data Query component, which builds appropriate SPARQL queries and sends them to particular SPARQL endpoints in order to provide useful information about the mined entities. Different SPARQL endpoints can be used for different purposes (e.g. GeoNames for locations, DBpedia for persons or organizations).
  • Bookmarklet for Dynamic Semantic Annotation, which allows entity mining of the textual contents of any web page.
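As an illustration of the Search System Mediator's first task, the sketch below reads an OpenSearch 1.1 description document and fills its URL template with a query. It is a minimal sketch, not the X-Search implementation: the description document, the host search.example.org, and the function names read_template and build_query_url are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

# Namespace of OpenSearch 1.1 description documents.
OSD_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical description document; the host and parameters are illustrative only.
DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example source</ShortName>
  <Url type="application/rss+xml"
       template="http://search.example.org/find?q={searchTerms}&amp;n={count?}"/>
</OpenSearchDescription>"""


def read_template(description_xml, response_type):
    """Return the URL template declared for the given response MIME type."""
    root = ET.fromstring(description_xml)
    for url in root.findall("os:Url", OSD_NS):
        if url.get("type") == response_type:
            return url.get("template")
    raise ValueError("no Url element for type " + response_type)


def build_query_url(template, terms, count=None):
    """Substitute the search terms and the optional result count into the template."""
    url = template.replace("{searchTerms}", quote_plus(terms))
    # {count?} is an optional parameter; it is left empty when no count is given.
    return url.replace("{count?}", "" if count is None else str(count))


template = read_template(DESCRIPTION, "application/rss+xml")
print(build_query_url(template, "Mediterranean Tuna", count=10))
```

Only the two template parameters used here are handled; a fuller mediator would also have to cover the remaining OpenSearch parameters and parse the response format declared in the type attribute.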

Deployment

The following component diagram shows the components that constitute X-Search and how they are connected to provide the desired functionality. (Figure: XSearchComponents.jpg)

The next component diagram depicts the gCube components with which X-Search will communicate. The diagram is not exhaustive: it includes only the elements that are necessary for the functionality of X-Search, without explicitly adding the elements that those use directly or indirectly. (Figure: XSearchComponentModel within gCube.jpg)

Use Cases

Well-suited Use Cases

The user wants to search documents using an Information Retrieval system, so X-Search is “parameterized” to use that system as its underlying search system. For example, suppose the user wants to search the fisheries domain (through the FIGIS search component) for publications about “Mediterranean Tuna”. Apart from getting the results, the user also wants to exploit available semantic sources (e.g. the FLOD dataset) to annotate the responses at query time. The five applicable sub use cases are described below:

  • UC1: Getting advanced search results using X-Search
  • UC2: Restricting (gradually) the search results
  • UC3: Mine (on-demand) all named entities of a hit
  • UC4: Exploit other sources for semantically annotating the resulting entities
  • UC5: Enrich web browsing with semantic search facilities


UC1: Getting advanced search results using X-Search

The user uses X-Search to search for “Mediterranean Tuna” in the context of FAO publications about fisheries and aquaculture, so X-Search redirects the query to the FIGIS search component. Before the answers are presented to the user, they undergo post-search processing that enriches them: real-time clustering and entity mining are performed on the top-K hits.
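The entity mining step over the top-K hits can be sketched with a simple list-based (gazetteer) matcher. This is a toy under stated assumptions rather than the algorithm X-Search uses: the categories, the entity lists, the sample snippets, and the choice to rank entities by occurrence count are all made up for this example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical entity lists per category; in practice such lists could be
# obtained by querying SPARQL endpoints.
GAZETTEER = {
    "Species": ["bluefin tuna", "albacore", "swordfish"],
    "Location": ["Mediterranean", "Adriatic Sea"],
}

# Hypothetical snippets standing in for the hits returned by the search system.
SNIPPETS = [
    "Bluefin tuna catches in the Mediterranean declined.",
    "Albacore and bluefin tuna are fished in the Adriatic Sea.",
    "Mediterranean swordfish stock assessment.",
]


def mine_entities(snippets, gazetteer, top_k=10):
    """Count gazetteer entities in the first top_k snippets, per category."""
    counts = defaultdict(Counter)
    for snippet in snippets[:top_k]:
        for category, names in gazetteer.items():
            for name in names:
                # Whole-word, case-insensitive match of the entity name.
                pattern = r"\b" + re.escape(name) + r"\b"
                hits = len(re.findall(pattern, snippet, flags=re.IGNORECASE))
                if hits:
                    counts[category][name] += hits
    # Rank the entities of each category by descending occurrence count.
    return {category: counter.most_common() for category, counter in counts.items()}


for category, ranked in mine_entities(SNIPPETS, GAZETTEER).items():
    print(category, ranked)
```

Restricting the work to the first top_k snippets is what keeps this kind of post-processing cheap enough to run at query time.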


UC2: Restricting (gradually) the search results

After performing a search, the user has the search results, the clustering results and a set of mined entities (grouped into categories). Instead of scanning the hits one by one, the user can select the entities of interest, and the answers are restricted to those containing the selected entities. In this way the user gradually narrows the answer space.
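The gradual restriction can be pictured as repeated filtering of the current answer by the selected entities. The sketch below assumes conjunctive semantics (a hit must contain every selected entity), which the description above does not specify; the hit structure and the function names restrict and facet_counts are likewise assumptions made for this example.

```python
from collections import Counter

# Hypothetical hits, each carrying the set of entities mined from it.
HITS = [
    {"title": "A", "entities": {"tuna", "Mediterranean"}},
    {"title": "B", "entities": {"tuna", "Atlantic"}},
    {"title": "C", "entities": {"swordfish", "Mediterranean"}},
]


def restrict(hits, selected):
    """Keep only the hits that contain every selected entity (assumed AND semantics)."""
    return [hit for hit in hits if selected <= hit["entities"]]


def facet_counts(hits):
    """Count in how many of the remaining hits each entity appears."""
    counts = Counter()
    for hit in hits:
        counts.update(hit["entities"])
    return counts


# Session: the user first selects "tuna", then adds "Mediterranean".
step1 = restrict(HITS, {"tuna"})
step2 = restrict(step1, {"tuna", "Mediterranean"})
print([hit["title"] for hit in step1], facet_counts(step1))
print([hit["title"] for hit in step2])
```

Recomputing the facet counts after each step is what lets the interface show only the entities that can still narrow the current answer.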


UC3: Mine (on-demand) all named entities of a hit

After performing a search, the user finds a hit of interest and can mine all the entities of that document. This gives a quick view of the content of the document in terms of its entities, which is particularly helpful when the document is large.


UC4: Exploit other sources for semantically annotating the resulting entities

The user can ask for more information about the resulting entities. Upon user request, a SPARQL query is built and sent to an appropriate Linked Open Data SPARQL endpoint. For publications in the fisheries domain the FLOD dataset can be exploited; several other datasets can be used for other entity categories, including DBpedia, GeoNames, Freebase and WordNet, among others.
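Building such a query can be sketched as follows. The query shape (look an entity up by its rdfs:label and return its properties), the function names, and the use of the public DBpedia endpoint are assumptions made for illustration; the queries X-Search actually issues are not given here. The sketch only builds the request URL and does not send it.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration: the public DBpedia SPARQL endpoint.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def escape_literal(text):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_entity_query(label, lang="en", limit=20):
    """Build a query returning the properties of resources with the given label."""
    literal = escape_literal(label)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?resource ?property ?value WHERE {\n"
        f'  ?resource rdfs:label "{literal}"@{lang} .\n'
        "  ?resource ?property ?value .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(endpoint, query):
    """Encode the query as the 'query' parameter of a SPARQL protocol GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = build_entity_query("Thunnus")
print(query)
print(build_request_url(DBPEDIA_ENDPOINT, query))
```

A client would normally send this URL with an Accept header naming the desired result format, for example application/sparql-results+json.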


UC5: Enrich web browsing with semantic search facilities

The user is browsing a web page of interest that contains information from the fisheries domain and wants to quickly identify the entities in it. Clicking a bookmark in the browser redisplays the page with its entities highlighted. This is not a simple bookmark but a bookmarklet, which adds one-click functionality to a web page or browser.
