GCQLParser

From Gcube Wiki

GCQLParser is the component of Search_Framework_2.0 that parses the queries submitted to the framework. It uses the Contextual Query Language (CQL) with some additional enhancements and is implemented with JavaCC.


Contextual Query Language

Contextual Query Language (CQL), previously known as Common Query Language, is a formal language for representing queries to information retrieval systems such as search engines. Its design objective is that queries be human readable and writable, and that the language be intuitive while maintaining the expressiveness of more complex query languages.

The specification of the language as well as usage examples can be found here.

CQL enhancements

In the context of the gCube framework, certain enhancements have been applied to CQL in order to enable functionality required by Search_Framework_2.0. Specifically, two additional keywords are used:

  • project: lets the user choose which fields are selected for projection in the results. The fields are listed after the keyword; an asterisk (*) selects all fields.
  • fuse: when present, enables the fusion of results coming from different Data Sources.
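As a rough illustration of the project keyword, the following Java sketch pulls the projected field names out of a query string. It is not the GCQLParser implementation: the class and method names are hypothetical, and it assumes that the project clause runs to the end of the query and that fields are separated by whitespace. The fuse keyword is left out because its placement in a query is not described here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of GCQLParser: reads the fields listed after "project". */
class ProjectClauseSketch {

    /**
     * Returns the tokens that follow the "project" keyword.
     * A single "*" entry means all fields; an empty list means no project clause.
     */
    static List<String> projectedFields(String query) {
        // Assumption: tokens are whitespace-separated and the clause runs to the end of the query.
        // Quoted search terms that contain the word "project" are not handled by this sketch.
        String[] tokens = query.trim().split("\\s+");
        List<String> fields = new ArrayList<>();
        boolean inProject = false;
        for (String token : tokens) {
            if (inProject) {
                fields.add(token);
            } else if (token.equalsIgnoreCase("project")) {
                inProject = true;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(projectedFields("title = fish project title author")); // [title, author]
        System.out.println(projectedFields("title = fish project *"));            // [*]
    }
}
```

A real grammar would recognise the keyword as a token rather than by string splitting; the sketch only shows the intended meaning of the clause.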

Implementation Details

GCQLParser has been implemented with the Java Compiler Compiler (JavaCC), a parser generator for Java. JJTree, a preprocessor for JavaCC, is used to generate the code that constructs the parse tree nodes. Apart from the classes that JavaCC generates, additional classes have been implemented to extend the behavior of the parser.
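To give an idea of what a parse tree looks like, the sketch below builds one by hand for the query title = fish and author = smith and prints it with indentation. The QueryNode class is a hand-written stand-in, not a class generated by JJTree, and the node labels are assumptions; the actual node types depend on the GCQLParser grammar, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-written stand-in for a parse tree node; names and labels are illustrative only. */
class QueryNode {
    private final String label;
    private final List<QueryNode> children = new ArrayList<>();

    QueryNode(String label) {
        this.label = label;
    }

    /** Appends a child and returns this node so that calls can be chained. */
    QueryNode add(QueryNode child) {
        children.add(child);
        return this;
    }

    /** Renders the subtree with one node per line, indented two spaces per level. */
    String dump(String prefix) {
        StringBuilder out = new StringBuilder(prefix).append(label).append('\n');
        for (QueryNode child : children) {
            out.append(child.dump(prefix + "  "));
        }
        return out.toString();
    }
}

class ParseTreeSketch {

    /** Builds the tree a parser might produce for: title = fish and author = smith */
    static QueryNode example() {
        return new QueryNode("and")
            .add(new QueryNode("title = fish"))
            .add(new QueryNode("author = smith"));
    }

    public static void main(String[] args) {
        System.out.print(example().dump(""));
    }
}
```

The boolean operator becomes the root and each search clause becomes a child, which is the usual shape for a tree built from a boolean query.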

The character set accepted by the parser comprises the Latin alphabet, the digits 0-9, and Unicode characters from "\u00bf" to "\uffff". Certain punctuation marks are also allowed, and this set is slightly extended when the search term is enclosed in quotation marks. Quotation marks indicate that the user wants results that match her search terms exactly.
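The letter, digit and Unicode ranges above can be expressed as a simple character check. The following Java sketch is an assumption-laden illustration, not the parser's token definition: the class and method names are hypothetical, "Latin alphabet" is taken to mean the basic letters a-z and A-Z, and the permitted punctuation marks are omitted because they are not enumerated here.

```java
/** Illustrative check only; the parser's real token definitions live in its JavaCC grammar. */
class TermCharsetSketch {

    /** True for basic Latin letters, digits 0-9, and characters from U+00BF to U+FFFF. */
    static boolean isBaseChar(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || (c >= '\u00bf' && c <= '\uffff');
    }

    /** True if the term is non-empty and every character passes isBaseChar. */
    static boolean usesOnlyBaseChars(String term) {
        if (term.isEmpty()) {
            return false;
        }
        for (int i = 0; i < term.length(); i++) {
            if (!isBaseChar(term.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(usesOnlyBaseChars("fish2024"));  // true
        System.out.println(usesOnlyBaseChars("caf\u00e9")); // true, U+00E9 lies in the Unicode range
        System.out.println(usesOnlyBaseChars("two words")); // false, the space is outside the base set
    }
}
```

Punctuation, including the extended set allowed inside quotation marks, would need additional cases taken from the actual grammar.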