Difference between revisions of "X-Search"

From Gcube Wiki
(FORTH - XSearch description (1st draft))

Revision as of 10:59, 26 April 2012

Overview

X-Search is a meta-search engine that reads the description of an underlying search source, queries that source, analyzes the returned results in various ways, and exploits the availability of semantic repositories.

Key features

  • Provision of results clustering over any search system
      • Applies to any system that returns textual snippets and for which there is an OpenSearch description
  • Provision of snippet-based or contents-based entity recognition
      • Generic as well as vertical, based on predetermined entity categories and lists which can be obtained by querying SPARQL endpoints
  • Provision of gradual faceted (session-based) search
      • Ability to gradually restrict the answer based on the selected entities and/or clusters
  • Ability to fetch and display the semantic information of an identified entity
      • Achieved by querying appropriate SPARQL endpoints
  • Ability to apply these services on any web page through a web browser
      • Using the functionality of bookmarklets
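
The gradual, session-based restriction listed above can be illustrated with a minimal sketch. It is not taken from the X-Search code base: the `Hit` and `FacetedSession` names, and the representation of entities as (category, name) pairs, are assumptions made only for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One search result (hypothetical structure, for illustration only)."""
    title: str
    entities: frozenset  # set of (category, name) pairs mined from the hit
    cluster: str         # label of the cluster the hit was assigned to


@dataclass
class FacetedSession:
    """Holds the current answer and narrows it with each user selection."""
    hits: list

    def restrict_by_entity(self, category, name):
        # Keep only hits in which the selected entity was identified.
        self.hits = [h for h in self.hits if (category, name) in h.entities]
        return self.hits

    def restrict_by_cluster(self, label):
        # Keep only hits that belong to the selected cluster.
        self.hits = [h for h in self.hits if h.cluster == label]
        return self.hits

    def entity_counts(self):
        # Facet counts for the current (already restricted) answer.
        return Counter(e for h in self.hits for e in h.entities)


if __name__ == "__main__":
    session = FacetedSession([
        Hit("a", frozenset({("Species", "tuna"), ("Location", "Greece")}), "fishing"),
        Hit("b", frozenset({("Species", "tuna")}), "recipes"),
        Hit("c", frozenset({("Location", "Greece")}), "fishing"),
    ])
    session.restrict_by_entity("Species", "tuna")
    session.restrict_by_cluster("fishing")
    print([h.title for h in session.hits])
```

Each selection filters the answer left by the previous one, which is what makes the search gradual; the facet counts are recomputed over the restricted answer so that only still-applicable entities are offered.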

Design

Philosophy

X-Search has been designed to offer its functionality on top of other search systems. In particular it offers:

  • clustering of the results
  • provision of extracted textual entities
  • provision of gradual faceted search
  • ability to fetch semantic information about extracted entities
  • exploitation of the offered services in any web page

Architecture

X-Search is composed of several components. These are:

  • Text Clustering component, which is responsible for clustering the results of the underlying search system. Clustering is performed on the textual snippets of the returned results; clustering over the full contents is also supported. The identified clusters are also ranked.
  • Text Entity Mining component, which is responsible for mining entities from the textual content. As with the Text Clustering component, mining can be performed either over the textual snippets or over the entire content, and the identified entities are ranked.
  • Search System Mediator, which mediates between X-Search and the underlying search system. Its role is to read an OpenSearch description document describing the underlying search system (e.g. location of the search system, query format, response format), to parse the results of the search system, and to fetch the contents of a hit upon user request (for instance, when a user wants to perform entity mining on the whole content of a hit).
  • Caching component, which is used to improve performance. In particular, it caches the results of the most frequent queries.
  • Linked Open Data Query component, which builds appropriate SPARQL queries and sends them to particular SPARQL endpoints in order to provide useful information about the mined entities. Several SPARQL endpoints can be used for different purposes (e.g. GeoNames for locations, DBpedia for persons or organizations).
  • Bookmarklet for Dynamic Semantic Annotation, which allows entity mining to be performed on the textual contents of any web page.
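
As a rough illustration of what the Search System Mediator and the Caching component do, the sketch below reads the URL template from an OpenSearch 1.1 description document, fills it in for a query, and caches results per query. It is a sketch under stated assumptions, not the X-Search implementation: the function names, the example description and the `search.example.org` address are hypothetical, and `functools.lru_cache` evicts the least recently used entry, which only approximates "most frequent queries".

```python
import re
import xml.etree.ElementTree as ET
from functools import lru_cache
from urllib.parse import quote_plus

# Namespace defined by the OpenSearch 1.1 specification.
OS_NS = {"os": "http://a9.com/-/spec/opensearch/1.1/"}

# Hypothetical description document of an underlying search system.
EXAMPLE_DESCRIPTION = """\
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example</ShortName>
  <Url type="application/rss+xml"
       template="http://search.example.org/?q={searchTerms}&amp;page={startPage?}"/>
</OpenSearchDescription>"""


def read_url_template(description_xml, response_type):
    """Return the URL template declared for the given response MIME type."""
    root = ET.fromstring(description_xml)
    for url in root.findall("os:Url", OS_NS):
        if url.get("type") == response_type:
            return url.get("template")
    raise ValueError(f"no Url element of type {response_type!r}")


def build_search_url(template, query, **params):
    """Fill {name} placeholders; optional {name?} ones may be left empty."""
    values = {"searchTerms": query, **params}

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote_plus(str(values[name]))
        if optional:
            return ""
        raise KeyError(name)

    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", substitute, template)


def make_cached_search(fetch, template, max_entries=128):
    """Wrap a fetch(url) callable so that repeated queries hit the cache."""
    @lru_cache(maxsize=max_entries)
    def search(query):
        return fetch(build_search_url(template, query))
    return search


if __name__ == "__main__":
    template = read_url_template(EXAMPLE_DESCRIPTION, "application/rss+xml")
    # A stand-in for an HTTP request, so the sketch runs without a network.
    search = make_cached_search(lambda url: f"<results for {url}>", template)
    print(search("tuna fish"))
    print(search("tuna fish"))  # served from the cache
```

Passing `fetch` in as a parameter keeps the mediator logic independent of how the request is actually sent, and makes the caching behaviour easy to check in isolation.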

Deployment

Usually, a subsystem consists of a number of components. This section describes the setting governing the deployment of these components, e.g. the hardware components where the software components are expected to be deployed. In particular, two deployment scenarios should be discussed where appropriate: Large deployment and Small deployment. If this distinction is not appropriate, a single deployment diagram has to be produced.

Large deployment

A deployment diagram suggesting the deployment schema that maximizes scalability should be described here.

Small deployment

A deployment diagram suggesting the "minimal" deployment schema should be described here.

Use Cases

The subsystem has been conceived to support a number of use cases; moreover, it will be used to serve a number of scenarios. This area will collect these "success stories".

Well suited Use Cases

Describe here scenarios where the subsystem proves to outperform other approaches.

Less well suited Use Cases

Describe here scenarios where the subsystem only partially satisfies the expectations.