Posted to user@lucenenet.apache.org by Floyd Wu <fl...@gmail.com> on 2011/10/07 06:08:57 UTC

[Lucene.Net] How to obtain Query string AST

Hi,
I want to write my own query expander. I may need to obtain the AST
(abstract syntax tree) of an already parsed query string, navigate to
certain parts of it (words) and make logical phrases of those words by
adding to the AST - where necessary.

And finally transform this AST into a Lucene query string (or query
object), then send it to the Lucene searcher to get the result.

This cannot be done on the raw string, because the query logic (e.g.
AND, OR, parentheses) must not be semantically altered, so the string
must be parsed first.

How can this be done with Lucene.Net, or in combination with a
third-party library?

Thanks for any tips.
Floyd

PS: For example, a user enters a query string in the front-end interface like
(A OR B) AND (C OR D)
I want my application to rewrite this query to
(A OR Y OR B OR T) AND (C OR Z OR D OR F)

A, B, Y, T, C, Z, D and F are CJK words (terms), each surrounded by
double quotes. The reason I want to do this is that I basically want
synonym queries, but Lucene.Net's synonym support seems to have some
problems in my tests (Solr's too), especially when processing CJK.

Re: [Lucene.Net] How to obtain Query string AST

Posted by Floyd Wu <fl...@gmail.com>.
Hi Ben,

Thanks!
In my plan, the query is parsed before it is executed; I am just doing
something like query expansion.

You are absolutely right, I want to build an AST in parallel with
Lucene's, but I'm not familiar with ANTLR or JavaCC, and after a day of
searching and studying I found it difficult for me. :)

One guy is doing a project that is very similar to my idea here:
https://github.com/romanchyla/montysolr/tree/master/src/java/org/apache/lucene/queryParser/aqp/parser

It seems that he translated 100% of Lucene's syntax into an ANTLR
grammar. In my case, I don't need 100% Lucene syntax compatibility,
especially for the operators related to fuzzy search and boosting.

I'm trying to figure out what to do next, so please kindly guide me if
you have any ideas.

Many thanks.

Floyd
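
A minimal sketch of the kind of standalone rewrite described above, assuming only the reduced syntax from the example (terms, AND, OR, parentheses; no fuzzy or boost operators). It does not use the Lucene.Net API at all: MiniQueryParser, QueryRewriter and the synonym table are hypothetical names made up for illustration, and the rule that AND binds tighter than OR is a simplifying assumption of this sketch, not Lucene's own precedence. The rewritten string could then be handed to the regular QueryParser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// AST: a node is either a term or an AND/OR group of child nodes.
abstract class Node { }
sealed class TermNode : Node { public string Text; }
sealed class GroupNode : Node { public string Op; public List<Node> Children = new List<Node>(); }

// Hypothetical mini parser: terms, AND, OR and parentheses only.
sealed class MiniQueryParser
{
    readonly List<string> _tokens;
    int _pos;

    // Tokens are quoted terms (quotes kept for now), "(", ")" and bare words such as AND/OR.
    public MiniQueryParser(string s)
    {
        _tokens = Regex.Matches(s, @"""[^""]*""|[()]|[^\s()""]+")
                       .Cast<Match>().Select(m => m.Value).ToList();
    }

    string Peek() { return _pos < _tokens.Count ? _tokens[_pos] : null; }

    public Node Parse()
    {
        Node root = ParseOr();
        if (Peek() != null) throw new FormatException("Unexpected token " + Peek());
        return root;
    }

    // Assumption of this sketch: AND binds tighter than OR.
    Node ParseOr() { return ParseChain("OR", () => ParseChain("AND", ParseAtom)); }

    Node ParseChain(string op, Func<Node> next)
    {
        var children = new List<Node> { next() };
        while (Peek() == op) { _pos++; children.Add(next()); }
        if (children.Count == 1) return children[0];
        return new GroupNode { Op = op, Children = children };
    }

    Node ParseAtom()
    {
        string tok = Peek();
        if (tok == null || tok == ")" || tok == "AND" || tok == "OR")
            throw new FormatException("Expected a term or '('");
        _pos++;
        if (tok != "(") return new TermNode { Text = tok.Trim('"') };
        Node inner = ParseOr();
        if (Peek() != ")") throw new FormatException("Expected ')'");
        _pos++;
        return inner;
    }
}

static class QueryRewriter
{
    // Turns each term that has synonyms into an OR group; an OR nested in an OR is merged.
    public static Node Expand(Node n, IDictionary<string, string[]> synonyms)
    {
        var t = n as TermNode;
        if (t != null)
        {
            string[] syns;
            if (!synonyms.TryGetValue(t.Text, out syns)) return t;
            var alt = new GroupNode { Op = "OR" };
            alt.Children.Add(t);
            alt.Children.AddRange(syns.Select(s => (Node)new TermNode { Text = s }));
            return alt;
        }
        var g = (GroupNode)n;
        var result = new GroupNode { Op = g.Op };
        foreach (Node child in g.Children)
        {
            Node e = Expand(child, synonyms);
            var eg = e as GroupNode;
            if (eg != null && eg.Op == g.Op) result.Children.AddRange(eg.Children);
            else result.Children.Add(e);
        }
        return result;
    }

    // Serializes the AST back to a query string, quoting every term.
    public static string Render(Node n, bool top = true)
    {
        var t = n as TermNode;
        if (t != null) return "\"" + t.Text + "\"";
        var g = (GroupNode)n;
        string body = string.Join(" " + g.Op + " ", g.Children.Select(c => Render(c, false)));
        return top ? body : "(" + body + ")";
    }

    static void Main()
    {
        var syn = new Dictionary<string, string[]> {
            { "A", new[] { "Y" } }, { "B", new[] { "T" } }, { "C", new[] { "Z" } }, { "D", new[] { "F" } } };
        Node ast = new MiniQueryParser("(\"A\" OR \"B\") AND (\"C\" OR \"D\")").Parse();
        Console.WriteLine(Render(Expand(ast, syn)));
    }
}
```

With the synonym table above, the program prints ("A" OR "Y" OR "B" OR "T") AND ("C" OR "Z" OR "D" OR "F"). Terms are always written back with double quotes, so CJK terms are passed on as phrases rather than being split by the next parser.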


2011/10/12 Ben West <bw...@yahoo.com>:
> Hey Floyd,
>
> I'm not sure what you mean. The query is parsed before it is executed, no?
>
> The code which generates the standard QueryParser class is QueryParser.JJ (it's a JavaCC lex file). I don't think the community has found any need to port this to a .NET-based parser/lexer, but that is certainly one route you could take if your query's grammar is very far away from what Lucene uses.
>
> It seems to me though that you could just build your own AST in parallel with Lucene's, if you need to traverse it in some weird way which QueryParser isn't flexible enough to implement:
>
> public override Query GetFieldQuery(string field, string queryText)
>
> {
>   MyAST.AddNode(field,queryText);
>   return base.GetFieldQuery(field,queryText);
> }
>
> And so forth. You would probably need to keep a stack or something in order to handle the recursion, but it may be easier than writing your own lexer.
>
> -Ben

Re: [Lucene.Net] How to obtain Query string AST

Posted by Ben West <bw...@yahoo.com>.
Hey Floyd,

I'm not sure what you mean. The query is parsed before it is executed, no? 

The code which generates the standard QueryParser class is QueryParser.JJ (it's a JavaCC lex file). I don't think the community has found any need to port this to a .NET-based parser/lexer, but that is certainly one route you could take if your query's grammar is very far away from what Lucene uses.

It seems to me though that you could just build your own AST in parallel with Lucene's, if you need to traverse it in some weird way which QueryParser isn't flexible enough to implement:

public override Query GetFieldQuery(string field, string queryText)
{
  MyAST.AddNode(field,queryText);
  return base.GetFieldQuery(field,queryText);
}

And so forth. You would probably need to keep a stack or something in order to handle the recursion, but it may be easier than writing your own lexer.

-Ben
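
The "keep a stack" idea above could look roughly like the sketch below. It is self-contained and does not call Lucene.Net: AstNode, AstRecorder, OnFieldQuery and OnGroup are hypothetical names, and the sketch assumes the parser reports leaves before the group that contains them (bottom-up), for example from a GetFieldQuery override and from a group-level override such as GetBooleanQuery if your Lucene.Net version exposes one. It records only the tree shape, not the AND/OR operators, and it assumes every clause of a group was reported as a node, which may not hold when the parser drops a clause (e.g. a stop word).

```csharp
using System;
using System.Collections.Generic;

// A recorded node: a leaf (Field/Text set) or a group (Children filled).
sealed class AstNode
{
    public string Field, Text;
    public List<AstNode> Children = new List<AstNode>();
}

// Hypothetical recorder; the On* methods stand in for calls made from parser overrides.
sealed class AstRecorder
{
    readonly Stack<AstNode> _stack = new Stack<AstNode>();

    // Call when a term or phrase is parsed, e.g. from a GetFieldQuery override.
    public void OnFieldQuery(string field, string text)
    {
        _stack.Push(new AstNode { Field = field, Text = text });
    }

    // Call when a group of clauseCount sub-queries is complete: the children
    // were pushed left to right, so popping and inserting at 0 restores order.
    public void OnGroup(int clauseCount)
    {
        if (clauseCount > _stack.Count)
            throw new InvalidOperationException("More clauses than recorded nodes");
        var group = new AstNode();
        for (int i = 0; i < clauseCount; i++) group.Children.Insert(0, _stack.Pop());
        _stack.Push(group);
    }

    // The finished tree, valid once every group has been closed.
    public AstNode Root { get { return _stack.Peek(); } }
}

static class Demo
{
    static void Main()
    {
        // Call order a bottom-up parser would produce for (A OR B) AND (C OR D).
        var rec = new AstRecorder();
        rec.OnFieldQuery("body", "A"); rec.OnFieldQuery("body", "B"); rec.OnGroup(2);
        rec.OnFieldQuery("body", "C"); rec.OnFieldQuery("body", "D"); rec.OnGroup(2);
        rec.OnGroup(2);
        Console.WriteLine(rec.Root.Children[1].Children[0].Text); // prints C
    }
}
```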


----- Original Message -----
From: Floyd Wu <fl...@gmail.com>
To: lucene-net-user@lucene.apache.org; Ben West <bw...@yahoo.com>
Cc: 
Sent: Monday, October 10, 2011 8:59 PM
Subject: Re: [Lucene.Net] How to obtain Query string AST

Hi Ben,

Thanks for your kind reply.
I'm not using CJKAnalyzer but StandardAnalyzer. StandardAnalyzer seems
to have some problems dealing with CJK synonyms.
Your code shows how to write a QueryParser that is used by Lucene, but
what I actually want to do happens outside Lucene, which means I need to
complete the query rewrite before the query is executed by Lucene (or Solr).

Any ideas?

Floyd




Re: [Lucene.Net] How to obtain Query string AST

Posted by Floyd Wu <fl...@gmail.com>.
Hi Ben,

Thanks for your kind reply.
I'm not using CJKAnalyzer but StandardAnalyzer. StandardAnalyzer seems
to have some problems dealing with CJK synonyms.
Your code shows how to write a QueryParser that is used by Lucene, but
what I actually want to do happens outside Lucene, which means I need to
complete the query rewrite before the query is executed by Lucene (or Solr).

Any ideas?

Floyd


2011/10/8 Ben West <bw...@yahoo.com>:
> Hey Floyd,
>
> Have you tried: http://lucene.472066.n3.nabble.com/CJKAnalyzer-and-Synonyms-td2510104.html
>
> If you go the AST route, here is a code snippet for a query parser which replaces all term queries with the term + prefix query (e.g. "foo" -> "foo foo*"). This sounds approximately like what you need. (I apologize in advance for the formatting which I'm sure will be lost):
>
> class WildcardQueryParser : MultiFieldQueryParser
>
> {
>         /// <summary>
>         /// Gets a field (term or phrase) query.
>         /// </summary>
>         /// <param name="field"></param>
>         /// <param name="queryText"></param>
>         /// <returns></returns>
>         public override Query GetFieldQuery(string field, string queryText)
>         {
>             Query origQuery = base.GetFieldQuery(field, queryText);
>
>             // The base query parser might decide that the query is null, e.g. if
>             // they search for a word like "and"
>             if (origQuery == null)
>             {
>                 return null;
>             }
>
>             // Since both term and phrase queries call this method though, we need to check
>             // to make sure it's a term query we're rewriting, and not a phrase query.
>             if (origQuery.GetType() != typeof(PhraseQuery))
>             {
>                 BooleanQuery bq = new BooleanQuery(false);
>
>                 // Note that base query parser handles analysis, so we don't need to
>                 bq.Add(origQuery, BooleanClause.Occur.SHOULD);
>                 bq.Add(base.GetPrefixQuery(field, queryText), BooleanClause.Occur.SHOULD);
>                 return bq;
>             }
>             else
>             {
>                 return origQuery;
>             }
>         }
> }

Re: [Lucene.Net] How to obtain Query string AST

Posted by Ben West <bw...@yahoo.com>.
Hey Floyd,

Have you tried: http://lucene.472066.n3.nabble.com/CJKAnalyzer-and-Synonyms-td2510104.html

If you go the AST route, here is a code snippet for a query parser which replaces all term queries with the term + prefix query (e.g. "foo" -> "foo foo*"). This sounds approximately like what you need. (I apologize in advance for the formatting which I'm sure will be lost):

class WildcardQueryParser : MultiFieldQueryParser
{
        /// <summary>
        /// Gets a field (term or phrase) query.
        /// </summary>
        /// <param name="field"></param>
        /// <param name="queryText"></param>
        /// <returns></returns>
        public override Query GetFieldQuery(string field, string queryText)
        {
            Query origQuery = base.GetFieldQuery(field, queryText);

            // The base query parser might decide that the query is null, e.g. if
            // they search for a word like "and"
            if (origQuery == null)
            {
                return null;
            }

            // Since both term and phrase queries call this method though, we need to check
            // to make sure it's a term query we're rewriting, and not a phrase query.
            if (origQuery.GetType() != typeof(PhraseQuery))
            {
                BooleanQuery bq = new BooleanQuery(false);

                // Note that base query parser handles analysis, so we don't need to
                bq.Add(origQuery, BooleanClause.Occur.SHOULD);     
                bq.Add(base.GetPrefixQuery(field, queryText), BooleanClause.Occur.SHOULD);
                return bq;
            }
            else
            {
                return origQuery;
            }
        }
}

