Posted to dev@chemistry.apache.org by Jens Hübel <jh...@opentext.com> on 2011/07/05 16:30:24 UTC

Text Search Parser added

Hi, Chemistries

Just a quick note: yesterday I checked in the code for parsing text search queries in a CONTAINS statement. Please check whether it breaks anything in your servers.

The text search parser is implemented as a completely separate parser and lexer with its own grammar, and using it is optional. You can configure the parser so that you get either the CONTAINS string literal, as before, or a parsed tree. There are also some new support methods that help with unescaping. The text search parser is integrated with our parsing framework to make query integration simpler.
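
To make the two configuration options concrete, here is a minimal Java sketch (Java 16+ for records). It is not the OpenCMIS implementation: the class `TextSearchSketch`, its node types, the `fullText` flag and the simplified grammar (words, quoted phrases, a leading `-` for negation, whitespace as implicit AND, explicit `OR`, backslash escapes) are all assumptions for illustration. A small hand-written recursive-descent parser stands in for a parser generated from a grammar file.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a parser for the text inside CONTAINS('...'). Assumed, simplified grammar:
 *   or := and {"OR" and};  and := term {term};  term := ["-"] (word | '"' phrase '"')
 * Whitespace between terms means AND; a backslash escapes the next character.
 */
public class TextSearchSketch {
    public interface Node {}
    public record Raw(String literal) implements Node {} // compatibility mode: unparsed literal
    public record Term(String text, boolean phrase) implements Node {
        @Override public String toString() { return phrase ? "\"" + text + "\"" : text; }
    }
    public record Not(Node child) implements Node {
        @Override public String toString() { return "NOT(" + child + ")"; }
    }
    public record And(List<Node> children) implements Node {
        @Override public String toString() { return "AND" + children; }
    }
    public record Or(List<Node> children) implements Node {
        @Override public String toString() { return "OR" + children; }
    }

    private final String s;
    private int pos;

    private TextSearchSketch(String s) { this.s = s; }

    /** fullText == false mimics compatibility mode: the literal is returned untouched. */
    public static Node parse(String literal, boolean fullText) {
        return fullText ? new TextSearchSketch(literal).parseOr() : new Raw(literal);
    }

    private Node parseOr() {
        List<Node> parts = new ArrayList<>();
        parts.add(parseAnd());
        while (atKeywordOr()) {
            pos += 2; // consume "OR"
            parts.add(parseAnd());
        }
        return parts.size() == 1 ? parts.get(0) : new Or(parts);
    }

    private Node parseAnd() {
        List<Node> parts = new ArrayList<>();
        parts.add(parseTerm());
        while (!atKeywordOr() && pos < s.length()) parts.add(parseTerm());
        return parts.size() == 1 ? parts.get(0) : new And(parts);
    }

    private Node parseTerm() {
        skipSpaces();
        boolean negated = pos < s.length() && s.charAt(pos) == '-';
        if (negated) pos++;
        boolean phrase = pos < s.length() && s.charAt(pos) == '"';
        if (phrase) pos++; // opening quote
        String text = read(phrase);
        if (phrase) {
            if (pos >= s.length()) throw new IllegalArgumentException("Unterminated phrase");
            pos++; // closing quote
        } else if (text.isEmpty()) {
            throw new IllegalArgumentException("Term expected at " + pos);
        }
        Node term = new Term(text, phrase);
        return negated ? new Not(term) : term;
    }

    /** Reads up to an unescaped terminator ('"' or whitespace) and unescapes along the way. */
    private String read(boolean phrase) {
        StringBuilder sb = new StringBuilder();
        while (pos < s.length()) {
            char c = s.charAt(pos);
            if (c == '\\' && pos + 1 < s.length()) {
                sb.append(s.charAt(++pos)); // keep the escaped character literally
            } else if (phrase ? c == '"' : Character.isWhitespace(c)) {
                break;
            } else {
                sb.append(c);
            }
            pos++;
        }
        return sb.toString();
    }

    private boolean atKeywordOr() {
        skipSpaces();
        return s.startsWith("OR", pos)
                && (pos + 2 == s.length() || Character.isWhitespace(s.charAt(pos + 2)));
    }

    private void skipSpaces() {
        while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) pos++;
    }
}
```

With this sketch, `parse("apple \"green tea\" -pear OR kiwi", true)` yields a tree that prints as `OR[AND[apple, "green tea", NOT(pear)], kiwi]`, and an escaped hyphen as in `\-x` is read as the plain word `-x` rather than a negation. `parse(..., false)` just wraps the literal in `Raw`, which roughly mirrors what choosing the old string-literal behaviour means for a server.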

One component that needs review is the JCR connector. Integrating the parser broke some tests, so I changed the code to use the compatibility mode. In case the JCR connector can benefit from it, I added a code template showing how to integrate the full-text parser; this still needs to be completed. If it does not make sense for the JCR connector, please remove the code I added.

The InMemory server now uses the full-text parser and can do a (very simplistic) full-text search. It does not do any preprocessing, so it only makes sense for plain text files: if you store HTML content and search for 'body', every document will be a hit. It does not build any kind of index but uses a grep-like search, so don't expect great performance. Ranking is not implemented yet. See the unit tests for details.
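
A grep-like search of this kind can be sketched as follows. This is a hypothetical Java illustration, not the InMemory server code: `GrepSearchSketch`, its methods and the choice of case-insensitive substring matching are assumptions. Query nodes are composed from `java.util.function.Predicate`, every query scans the raw content of every stored document, and hits come back in storage order because nothing is ranked.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Predicate;

/**
 * Hypothetical grep-like full-text search: no preprocessing, no index, no ranking.
 * Each query is a Predicate over the raw content and is evaluated against every document.
 */
public class GrepSearchSketch {
    private final Map<String, String> contentById = new LinkedHashMap<>();

    // Leaf: case-insensitive substring match on the raw text (markup is not stripped).
    public static Predicate<String> term(String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        return text -> text.toLowerCase(Locale.ROOT).contains(needle);
    }

    public static Predicate<String> and(Predicate<String> a, Predicate<String> b) { return a.and(b); }
    public static Predicate<String> or(Predicate<String> a, Predicate<String> b) { return a.or(b); }
    public static Predicate<String> not(Predicate<String> a) { return a.negate(); }

    public void store(String id, String content) { contentById.put(id, content); }

    // Linear scan over all stored content; hits are returned in storage order, unranked.
    public List<String> search(Predicate<String> query) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : contentById.entrySet()) {
            if (query.test(e.getValue())) hits.add(e.getKey());
        }
        return hits;
    }

    public static void main(String[] args) {
        GrepSearchSketch repo = new GrepSearchSketch();
        repo.store("a", "<html><body>Apache Chemistry</body></html>");
        repo.store("b", "<html><body>JCR connector</body></html>");
        repo.store("c", "plain text about chemistry");
        System.out.println(repo.search(term("body")));                               // [a, b]
        System.out.println(repo.search(and(term("chemistry"), not(term("html")))));  // [c]
    }
}
```

Searching for `body` returns both HTML documents because the markup is searched just like the text, which is exactly the limitation of skipping preprocessing. The cost of every query grows linearly with the total amount of stored content, since there is no index to narrow the candidates.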

Jens