Posted to java-user@lucene.apache.org by Lokeya <lo...@gmail.com> on 2007/04/11 06:37:34 UTC

Issue with : Searcher.search() returning Hits of same length for different searches

I am following all the points which are mentioned in the following link:

http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71

I am having the following issues:

1. For different queries, I always get a Hits object containing 21
documents: a different set each time, but the same number every time. Also,
not all of the returned documents contain the query term I set.

I have extended Analyzer with a new class, StopStemmingAnalyzer.java. But
that should not be the issue, because I use the same analyzer again for
searching. I also use Field.Store.YES and Field.Index.TOKENIZED for
indexing.

Help Appreciated. 
-- 
View this message in context: http://www.nabble.com/Issue-with-%3A-Searcher.search%28%29-returning-Hits-of-same-length-for-different-searches-tf3557277.html#a9933089
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Issue with : Searcher.search() returning Hits of same length for different searches

Posted by Lokeya <lo...@gmail.com>.
Thanks for your suggestion. I used Luke to debug and found the issue.

I have one million records to index, each of which has "Title",
"Description" and "Identifier" fields. When I took each record and indexed
these fields as its own document, my program was very slow. So I took
100,000 records, got the values of these fields, added them all to one
document, and used the IndexWriter to write that document with
addDocument(). It looks like this creates only one document id that holds
all of the contents. I repeated this for 700,000 records, so 70 doc ids
were created in total. Until then there was no issue (at least, so I
assumed).

Then I tried to search for some value. I was getting a Hits object whose
length was some number, say 21, and when I retrieved the documents,
assuming all 21 had matches, many of them actually did not. What is
happening is that a hit just returns everything stored under the same
document id. Luke was helpful in finding this issue. Later I took just
around 20 records, indexed them separately, and retrieval worked fine.

Now my major issue is that opening the index 700,000 times will be really
slow. I am wondering what the ideal way to do this is.

Thanks in Advance.
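
For readers hitting the same symptom, here is a minimal sketch of why merging
many records into one document makes different queries return the same hit
count. It is a toy in-memory inverted index in plain Java, not Lucene: the
names ToyIndex, indexPerRecord and indexPerBatch are hypothetical, and the
sample records are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy in-memory inverted index; a stand-in for Lucene, not its API. */
public class BatchingDemo {

    /** Maps each lowercase term to the ids of the documents containing it. */
    static class ToyIndex {
        private final Map<String, Set<Integer>> postings = new HashMap<>();
        private int nextDocId = 0;

        /** Adds ONE document built from any number of text values; returns its id. */
        int addDocument(List<String> values) {
            int docId = nextDocId++;
            for (String value : values) {
                for (String term : value.toLowerCase().split("\\W+")) {
                    if (!term.isEmpty()) {
                        postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
                    }
                }
            }
            return docId;
        }

        /** Returns the ids of all documents that contain the term. */
        Set<Integer> search(String term) {
            return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
        }

        int docCount() {
            return nextDocId;
        }
    }

    /** One document per record: the index is created once, then fed record by record. */
    static ToyIndex indexPerRecord(List<String> records) {
        ToyIndex index = new ToyIndex();
        for (String record : records) {
            index.addDocument(Collections.singletonList(record));
        }
        return index;
    }

    /** The problematic layout: each batch of records is merged into a single document. */
    static ToyIndex indexPerBatch(List<String> records, int batchSize) {
        ToyIndex index = new ToyIndex();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            index.addDocument(records.subList(start, end));
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "lucene index basics", "stemming algorithms", "stop word lists",
            "lucene query parsing", "lucene scoring", "index merging");

        ToyIndex perRecord = indexPerRecord(records);
        ToyIndex perBatch = indexPerBatch(records, 3);

        for (String term : Arrays.asList("lucene", "index")) {
            System.out.println(term + ": per-record hits=" + perRecord.search(term)
                + ", per-batch hits=" + perBatch.search(term));
        }
    }
}
```

With one document per record, "lucene" and "index" match three and two
documents respectively. With two batches of three records, both queries match
the same two documents, and each hit contains records that never mention the
term. In Lucene itself the usual approach is to open one IndexWriter, call
addDocument() once per record, and close the writer at the end, rather than
reopening the index for every record.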




Re: Issue with : Searcher.search() returning Hits of same length for different searches

Posted by Daniel Naber <lu...@danielnaber.de>.
On Wednesday 11 April 2007 18:51, Lokeya wrote:

> Thanks for your reply. I should have given more information and will
> keep in mind this for my future queries.

If nothing else helps, please write a small, standalone test-case that 
shows the problem. This can then easily be debugged by someone else (but 
often you find the problem yourself when writing the test case).

Regards
 Daniel

-- 
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Issue with : Searcher.search() returning Hits of same length for different searches

Posted by Lokeya <lo...@gmail.com>.
Thanks for your reply. I should have given more information and will keep
this in mind for my future queries.
Regarding this one, I have already done most of the things you asked, like:

1. I am confirming which query is getting executed by using query.toString().
2. I read a lot of posts in the forum, particularly:
http://www.nabble.com/Search-matching-tf2030477.html#a5585431
http://wiki.apache.org/lucene-java/LuceneFAQ#head-3558e5121806fb4fce80fc022d889484a9248b71

I checked all of the above, except that the analyzer I use is not a simple
one; I will now test with a simple one. I would like to point out that my
issue seems unique, as I didn't see any posts about it in the forum:
getting the same number of documents in the Hits when I test with a
different set of queries.



Re: Issue with : Searcher.search() returning Hits of same length for different searches

Posted by Erick Erickson <er...@gmail.com>.
Well, there's nothing here to help you with, since you haven't provided
any information to diagnose. Like:

What queries are actually produced in the different cases?
Use query.toString().

I'm immediately suspicious of any statement that "my custom
code shouldn't be the problem". Try the test again using
one of the simplest analyzers you can.

Have you used Luke to query your index interactively and see
what the results are? Or how the queries parse?

Please, when asking for help, try looking at the question
you're asking from the perspective of someone who knows
nothing about your code. Imagine that a coworker had asked
you such a question.

Best
Erick
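
As a rough illustration of the two checks suggested above (look at the terms
the query really contains, and compare against the simplest analyzer you
can), here is a small sketch in plain Java. It does not use Lucene classes:
simpleAnalyze and stopStemAnalyze are hypothetical helpers, and the stop
list and the plural-stripping rule are deliberately crude assumptions rather
than a real stemmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy analyzers for inspecting what terms a query really contains; not Lucene classes. */
public class AnalyzerCheck {

    // Assumed stop list, for illustration only.
    private static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "the", "of", "and"));

    /** Simplest possible analysis: lowercase and split on non-word characters. */
    static List<String> simpleAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms;
    }

    /** Crude stand-in for a custom stop-and-stem analyzer: drops stop words, strips a plural "s". */
    static List<String> stopStemAnalyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : simpleAnalyze(text)) {
            if (STOP_WORDS.contains(token)) {
                continue;
            }
            boolean plural = token.length() > 3 && token.endsWith("s");
            terms.add(plural ? token.substring(0, token.length() - 1) : token);
        }
        return terms;
    }

    public static void main(String[] args) {
        String query = "The History of Search Engines";
        // Printing the analyzed terms plays the role that query.toString() plays in Lucene.
        System.out.println("simple:    " + simpleAnalyze(query));
        System.out.println("stop+stem: " + stopStemAnalyze(query));
    }
}
```

If the two term lists differ in a way you did not expect, the custom analyzer
is the first place to look; if the problem persists with the simple analysis
too, the analyzer is probably not the cause.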
