Posted to java-user@lucene.apache.org by Richard Gunderson <Ri...@honda-eu.com> on 2006/03/27 17:56:18 UTC

Phrase Query query

Hi

I'm using PhraseQuery in conjunction with WhitespaceAnalyzer but it's
giving me slightly unusual results. If I have a text file containing the
text (quotes are just for clarity):

"Hello this is some text"

I don't find any results when I search.

But if I put spaces before and after the phrase:

" Hello this is some text "

then it does work. I'm breaking the phrase down into Terms, and setting the
slop to '0' by the way.
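
For reference, here is a minimal sketch in plain Java (not Lucene code) of what an exact phrase match with slop 0 amounts to: each phrase term must equal an indexed token exactly, and the terms must sit at consecutive token positions. The class and method names are hypothetical, and a regular-expression split stands in for the analyzer. Whether this explains the behaviour described in this message cannot be told from the message alone.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-alone sketch; it does not use or reproduce Lucene classes. */
final class ExactPhraseSketch {

    /** Splits on runs of whitespace; punctuation stays attached to its token. */
    static List<String> tokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** True if the phrase terms occur at consecutive positions (the slop 0 case). */
    static boolean containsPhrase(List<String> docTokens, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phrase.size() <= docTokens.size(); start++) {
            if (docTokens.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = tokens("Hello this is some text\n");
        System.out.println(containsPhrase(doc, tokens("this is some")));   // adjacent terms
        System.out.println(containsPhrase(doc, tokens("this some")));      // gap between terms
        // "(Hello" and "Hello" are different tokens under whitespace splitting.
        System.out.println(containsPhrase(tokens("(Hello this)"), tokens("Hello this")));
    }
}
```

In this sketch a term only matches when the whole token is identical, so a character attached to a word (a bracket, a quote, a tag delimiter) is enough to prevent a match.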

I can kind of see that this makes sense, given the name: WhitespaceAnalyzer.
But aren't newlines, carriage-returns etc also treated as whitespace?

Thanks for your help!

Regards

Richard Gundersen
Honda UK - ISD
Tel: +44 (0)1753 590681
**************************************************************************
This email is confidential and intended solely for the use of the
individual to whom it is addressed. Any views or opinions presented are
solely those of the author and do not necessarily represent those of Honda
Motor Europe Ltd. or any of its group of companies.

If you are not the intended recipient, be advised that you have received
this email in error and that any use, dissemination, forwarding, printing
or copying of this email is strictly prohibited.

Visit our website: http://www.honda.co.uk
**************************************************************************


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: BooleanQuery containing SpanNearQuery throws ArrayOutOfBoundsException .

Posted by Paul Elschot <pa...@xs4all.nl>.
Jelda,

I have just added a patch for DisjunctionSumScorer.java to this issue:
https://issues.apache.org/jira/browse/LUCENE-413

Could you try that patch and report the results at the jira issue?

In case you need help using the patch could you move the
discussion to the java-dev list?

Regards,
Paul Elschot



RE: BooleanQuery containing SpanNearQuery throws ArrayOutOfBoundsException .

Posted by Ramana Jelda <ra...@ciao-group.com>.
Thanks for your reply.
With a smaller index it works fine.
I will keep trying to reproduce the exception.

Please let me know if there is a quick fix I can apply locally.

Thanks & Regards,
Jelda

> -----Original Message-----
> From: Paul Elschot [mailto:paul.elschot@xs4all.nl] 
> Sent: Tuesday, March 28, 2006 11:12 PM
> To: java-user@lucene.apache.org
> Subject: Re: BooleanQuery containing SpanNearQuery throws 
> ArrayOutOfBoundsException .




Re: BooleanQuery containing SpanNearQuery throws ArrayOutOfBoundsException .

Posted by Paul Elschot <pa...@xs4all.nl>.
Comments inline below.

On Tuesday 28 March 2006 18:29, Ramana Jelda wrote:
> 
> Hi,
> I have got a strange problem.
> My searchterm : "mp3 player"
> Lucene Query : 
> +(
>   +(
>     spanNear([productName:mp, productName:3], 3, true) 
>     spanNear([subName:mp, subName:3], 3, true)
>    ) 
>  +(productName:player subName:player)
> )
> 
> Throws following lucene BooleanScorer2 exception.
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
> 	at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:54)
> 	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:328)
> 	at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:82)
> 	at org.apache.lucene.search.BooleanScorer2$2.score(BooleanScorer2.java:186)
> 	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:327)
> 	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:291)
> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
> 	at org.apache.lucene.search.Searcher.search(Searcher.java:76)
> 
> 
> I looked through the forums and JIRA issues. It seems to be related to
> https://issues.apache.org/jira/browse/LUCENE-413.

That seems to be the case indeed.
Would it be possible for you to provide a (preferably small) lucene index that
shows this problem?
If so, could you post it at the jira issue?

> At the same time, if I search for "gx3 minolta", which produces the Lucene query
> +(
>  +(
>    spanNear([productName:gx, productName:3], 3, true) 
>    spanNear([subName:gx, subName:3], 3, true)
>   )
>  +(productName:minolta subName:minolta)
> )
> Works fine without any problems.

Similar strange behaviour occurred on the previous occasion.

> Has anyone encountered a similar problem?
> Should I drop span queries entirely and switch back to phrase queries (which
> are of course not ordered, a drawback for our search)?

You might try the alternative implementation of span queries that is available
at the jira issue. However, even with that, the problem persisted on the 
previous occasion, so the source of the problem seems to be somewhere else.
This is also why a test index would be most welcome.

Regards,
Paul Elschot



BooleanQuery containing SpanNearQuery throws ArrayOutOfBoundsException .

Posted by Ramana Jelda <ra...@ciao-group.com>.
Hi,
I have got a strange problem.
My searchterm : "mp3 player"
Lucene Query : 
+(
  +(
    spanNear([productName:mp, productName:3], 3, true) 
    spanNear([subName:mp, subName:3], 3, true)
   ) 
 +(productName:player subName:player)
)

This throws the following Lucene BooleanScorer2 exception:
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
	at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:54)
	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:328)
	at org.apache.lucene.search.ConjunctionScorer.score(ConjunctionScorer.java:82)
	at org.apache.lucene.search.BooleanScorer2$2.score(BooleanScorer2.java:186)
	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:327)
	at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:291)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:132)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:110)
	at org.apache.lucene.search.Searcher.search(Searcher.java:76)
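
The trace ends in a method named coordFactor failing on array index 3. As a purely illustrative sketch, and not Lucene's actual source, the stand-alone Java class below shows how a lookup table sized by the expected number of clauses fails in this way when the count of matching clauses passed to it is larger than expected. The class name, the field, and the linear factor are all assumptions made for illustration.

```java
/** Hypothetical illustration only; this is not the BooleanScorer2 implementation. */
final class CoordTableSketch {

    // One entry for each possible number of matching clauses: 0..maxCoord.
    private final float[] coordFactors;

    CoordTableSketch(int maxCoord) {
        coordFactors = new float[maxCoord + 1];
        for (int i = 0; i <= maxCoord; i++) {
            coordFactors[i] = (float) i / maxCoord;   // assumed: fraction of clauses matched
        }
    }

    /** Looks up the factor; an over-counted nrMatchers indexes past the end of the table. */
    float coordFactor(int nrMatchers) {
        return coordFactors[nrMatchers];
    }

    public static void main(String[] args) {
        CoordTableSketch table = new CoordTableSketch(2);   // table has indexes 0, 1, 2
        System.out.println(table.coordFactor(2));
        try {
            table.coordFactor(3);                           // one more matcher than expected
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds: " + e.getMessage());
        }
    }
}
```

If something like this is going on, the table itself is fine and the thing to look for is whatever counts the matching clauses.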


I looked through the forums and JIRA issues. It seems to be related to
https://issues.apache.org/jira/browse/LUCENE-413.
At the same time, if I search for "gx3 minolta", which produces the Lucene query
+(
 +(
   spanNear([productName:gx, productName:3], 3, true) 
   spanNear([subName:gx, subName:3], 3, true)
  )
 +(productName:minolta subName:minolta)
)
Works fine without any problems.

Has anyone encountered a similar problem?
Should I drop span queries entirely and switch back to phrase queries (which
are of course not ordered, a drawback for our search)?


Jelda




Re: Phrase Query query

Posted by Richard Gunderson <Ri...@honda-eu.com>.
Hi Otis

Thanks for the information. I'm actually writing something to search files
containing code (such as JSP files) so I do expect there will be a few
problems like this because I guess Lucene's out-of-the-box analyzers are
really suited to natural languages. But I was wondering if you could
recommend an Analyzer that would be good for my purposes. I tried
StandardAnalyzer and KeywordAnalyzer with limited success, although I admit
I might not be using them 100% correctly as I'm new to Lucene.

FYI I've added an option to bypass Lucene so I just do a String.indexOf on
my search terms, which works well (embarrassingly crude though!) but it
would be nice to be able to use the power of real text searching within
Lucene ;-)
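
The String.indexOf fallback mentioned above might look roughly like the following. This is a sketch under assumptions, not the code actually in use: the class and method names are hypothetical, and it simply collects the offset of every literal occurrence of a search term in a file's content.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a literal substring search that bypasses Lucene. */
final class IndexOfSearchSketch {

    /** Returns the start offset of every occurrence of term in content. */
    static List<Integer> findAll(String content, String term) {
        List<Integer> offsets = new ArrayList<>();
        if (term.isEmpty()) {
            return offsets;              // an empty term would match everywhere
        }
        int from = 0;
        while (true) {
            int hit = content.indexOf(term, from);
            if (hit < 0) {
                break;                   // no further occurrences
            }
            offsets.add(hit);
            from = hit + 1;              // step by one so overlapping hits are found
        }
        return offsets;
    }

    public static void main(String[] args) {
        String jsp = "<jsp:include page=\"a.jsp\"/> <jsp:include page=\"b.jsp\"/>";
        System.out.println(findAll(jsp, "jsp:include"));
    }
}
```

A search like this matches text exactly as written, including punctuation, which is convenient for code but gives no ranking, no analysis, and a full scan of every file per query.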

Regards

Richard Gundersen
Honda UK - ISD
Tel: +44 (0)1753 590681


Re: Phrase Query query

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Richard,

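WhitespaceTokenizer (the tokenizer that WhitespaceAnalyzer uses) really just tokenizes on whitespace characters, meaning any character for which Character.isWhitespace returns true (newlines, tabs and carriage returns as well as spaces):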

  /** Collects only characters which do not satisfy
   * {@link Character#isWhitespace(char)}.*/
  protected boolean isTokenChar(char c) {
    return !Character.isWhitespace(c);
  }
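
To see what that test does with newlines and carriage returns, here is a small stand-alone sketch that applies the same Character.isWhitespace check to a string. It does not use Lucene; the class and method names are hypothetical and the loop is only an approximation of what a character-based tokenizer does.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone demo; not part of Lucene. */
final class WhitespaceSplitDemo {

    /** Collects runs of characters that are not whitespace, as isTokenChar above does. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!Character.isWhitespace(c)) {
                current.append(c);                  // token character: keep collecting
            } else if (current.length() > 0) {
                tokens.add(current.toString());     // whitespace ends the current token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());         // flush the last token
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Space, tab, CR+LF and LF all separate tokens.
        System.out.println(tokenize("Hello this\tis\r\nsome\ntext"));
        // prints [Hello, this, is, some, text]
    }
}
```

One caveat: Character.isWhitespace returns false for the non-breaking space (U+00A0), so text containing it would not be split at that character.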

Otis
