Posted to java-user@lucene.apache.org by Philip Puffinburger <pp...@tlcdelivers.com> on 2009/07/21 20:47:27 UTC

Strange(?) behaviour using MultiFieldQueryParser

We have code (using Lucene 2.4.1) that builds a query that looks like:

fielda:"ruz an"~2 OR fieldb:"ruz an"~2 OR fieldc:"ruz an"~2

When this is passed to a MultiFieldQueryParser and parsed, it comes back
looking like:

fielda:"ruz an"~2 fieldb:"ruz an"~2 fieldc:ruz

It seems that whenever the input phrase has a stop word in it, the last
field (fieldc) doesn't keep the phrase intact. If there is no stop word,
it works as expected and just removes the ORs.

The code is pretty simple:

MultiFieldQueryParser parser =
    new MultiFieldQueryParser(getCrossfields(), new MyAnalyzer());
parser.setDefaultOperator(QueryParser.Operator.AND);

Query crossFieldQuery = parser.parse(searchExpression);

getCrossfields() returns a String[] of all the fields we want to look at.
MyAnalyzer is a pretty basic analyzer that adds a filter to remove
diacritics. searchExpression is whatever comes into the search function,
in this case the pre-built query.

Is there a reason that the parse method changes the last field instead of
keeping the phrase intact?

Re: Analysis Question

Posted by Ian Lea <ia...@gmail.com>.

You could write your own analyzer that works out a boost as it
analyzes the document fields and has a getBoost() method that you
call to get the value to add to the document as a separate field.
If you write your own, you can pass it whatever you like and it can
do whatever you want.
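
A minimal sketch of the bookkeeping such an analyzer might do, written in
plain Java with no Lucene dependency. The class name VocabularyBoostCounter,
the observe() and reset() methods, and the weighting formula are all
assumptions made for illustration; only the idea of a getBoost() accessor
comes from the suggestion above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not a Lucene class): counts vocabulary hits as
// tokens are seen and turns the count into a boost value.
final class VocabularyBoostCounter {
    private final Set<String> vocabulary = new HashSet<String>();
    private int totalTokens;
    private int vocabularyHits;

    VocabularyBoostCounter(Set<String> terms) {
        for (String term : terms) {
            vocabulary.add(term.toLowerCase(Locale.ROOT));
        }
    }

    // Call once per token, for example from a custom TokenFilter.
    void observe(String token) {
        totalTokens++;
        if (vocabulary.contains(token.toLowerCase(Locale.ROOT))) {
            vocabularyHits++;
        }
    }

    // Assumed weighting: 1.0 (neutral) plus the fraction of tokens that
    // are vocabulary terms. Any other formula could be substituted.
    float getBoost() {
        if (totalTokens == 0) {
            return 1.0f;
        }
        return 1.0f + (float) vocabularyHits / totalTokens;
    }

    // Clear the counters before analyzing the next document.
    void reset() {
        totalTokens = 0;
        vocabularyHits = 0;
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<String>(Arrays.asList("lucene", "analyzer"));
        VocabularyBoostCounter counter = new VocabularyBoostCounter(terms);
        for (String token : "Lucene has an analyzer".split(" ")) {
            counter.observe(token);
        }
        System.out.println(counter.getBoost()); // 2 of 4 tokens match: prints 1.5
    }
}
```

The caller would read getBoost() once a document's text has been tokenized,
then either store the value in a separate field or pass it to
Document.setBoost(float), and call reset() before the next document. Lucene
normally consumes the token stream inside IndexWriter.addDocument, so the
text would probably have to be tokenized once up front to have the value
available before the document is added.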


--
Ian.

On Thu, Aug 6, 2009 at 8:37 PM, Christopher Condit<co...@sdsc.edu> wrote:
> Hi Anshum-
>> You might want to look at writing a custom analyzer or something and
>> add a
>> document boost (while indexing) for documents containing those terms.
>
> Do you know how to access the document from an analyzer? It seems to only have access to the field...
>
> Thanks,
> -Chris

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: Analysis Question

Posted by Christopher Condit <co...@sdsc.edu>.

Hi Anshum-
> You might want to look at writing a custom analyzer or something and
> add a
> document boost (while indexing) for documents containing those terms.

Do you know how to access the document from an analyzer? It seems to only have access to the field...

Thanks,
-Chris



Re: Analysis Question

Posted by Anshum <an...@gmail.com>.

Hi Christopher,
You might want to look at writing a custom analyzer or something and add a
document boost (while indexing) for documents containing those terms.

--
Anshum Gupta
Naukri Labs!
http://ai-cafe.blogspot.com

The facts expressed here belong to everybody, the opinions to me. The
distinction is yours to draw............


On Thu, Aug 6, 2009 at 2:41 AM, Christopher Condit <co...@sdsc.edu> wrote:

> Perhaps a better question: let's say I have a few thousand terms or
> phrases. I want to prefer documents with these phrases in my search results
> over documents that do not have these terms or phrases. What's the best way
> to accomplish this?
> Thanks,
> -Chris
>
> > -----Original Message-----
> > From: Christopher Condit [mailto:condit@sdsc.edu]
> > Sent: Tuesday, July 21, 2009 2:48 PM
> > To: java-user@lucene.apache.org
> > Subject: Analysis Question
> >
> > I'm trying to implement an analyzer that will compute a score based on
> > vocabulary terms in the indexed content (i.e., a document field with more
> > terms in the vocabulary will score higher). Although I can see the
> > tokens, I can't seem to access the document from the analyzer to set a
> > new field on it after I compute the value. Is there a way to do this
> > from an Analyzer? Or is there another preferred way to do this?
> > Thanks,
> > -Chris

RE: Analysis Question

Posted by Christopher Condit <co...@sdsc.edu>.

Perhaps a better question: let's say I have a few thousand terms or phrases. I want to prefer documents with these phrases in my search results over documents that do not have these terms or phrases. What's the best way to accomplish this?
Thanks,
-Chris

> -----Original Message-----
> From: Christopher Condit [mailto:condit@sdsc.edu]
> Sent: Tuesday, July 21, 2009 2:48 PM
> To: java-user@lucene.apache.org
> Subject: Analysis Question
> 
> I'm trying to implement an analyzer that will compute a score based on
> vocabulary terms in the indexed content (i.e., a document field with more
> terms in the vocabulary will score higher). Although I can see the
> tokens, I can't seem to access the document from the analyzer to set a
> new field on it after I compute the value. Is there a way to do this
> from an Analyzer? Or is there another preferred way to do this?
> Thanks,
> -Chris

Analysis Question

Posted by Christopher Condit <co...@sdsc.edu>.

I'm trying to implement an analyzer that will compute a score based on vocabulary terms in the indexed content (i.e., a document field with more terms in the vocabulary will score higher). Although I can see the tokens, I can't seem to access the document from the analyzer to set a new field on it after I compute the value. Is there a way to do this from an Analyzer? Or is there another preferred way to do this?
Thanks,
-Chris
