Posted to java-user@lucene.apache.org by Mikhail Khludnev <mk...@apache.org> on 2017/06/20 08:23:38 UTC

Re: Correction: SpanNearQuery Class issue through spans object (Not through Searcher.search() method)

Hello Ranganath,

I guess you need to loop through the LeafReaderContexts and create a
scorer/Spans for each of them to reach the 7th crore of documents (past the
first 60 million) and beyond.
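
To illustrate why visiting a single leaf hides later matches, here is a
minimal Lucene-free sketch. It assumes the usual Lucene layout in which
IndexReader.leaves() returns one LeafReaderContext per segment, each with a
docBase, and SpanWeight.getSpans() is called once per leaf and yields
segment-local document IDs. The class LeafLoopSketch, the type Leaf, and both
methods are hypothetical stand-ins, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Lucene-free sketch of per-segment iteration; every name here is hypothetical. */
final class LeafLoopSketch {

    /** Stand-in for one index segment (what Lucene exposes as a LeafReaderContext). */
    static final class Leaf {
        final int docBase;         // global ID of this segment's first document
        final int[] matchingDocs;  // segment-local IDs of matching documents

        Leaf(int docBase, int... matchingDocs) {
            this.docBase = docBase;
            this.matchingDocs = matchingDocs;
        }
    }

    /** Visits only the first segment, so matches in later segments are never seen. */
    static List<Integer> firstLeafOnly(List<Leaf> leaves) {
        return collect(leaves.subList(0, Math.min(1, leaves.size())));
    }

    /** Visits every segment and converts segment-local IDs to global IDs. */
    static List<Integer> collect(List<Leaf> leaves) {
        List<Integer> globalIds = new ArrayList<>();
        for (Leaf leaf : leaves) {
            for (int localDoc : leaf.matchingDocs) {
                // A local ID is only meaningful together with its segment's docBase.
                globalIds.add(leaf.docBase + localDoc);
            }
        }
        return globalIds;
    }
}
```

In real code the inner loop would be replaced by iterating the Spans obtained
for that leaf, and a leaf with no matches would simply be skipped.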

On Tue, Jun 20, 2017 at 10:59 AM, Ranganath B N <ra...@huawei.com>
wrote:

> Hi,
>
>
> This is regarding the search limit of the SpanNearQuery class. I created
> a Lucene index consisting of 2 billion documents. I then obtain a Spans
> object through the getSpans method of a SpanWeight object created from a
> SpanNearQuery object, and get the matching documents by iterating with
> spans.nextDoc(). But searching through the Spans object returns results
> only if the search terms are within the first 6 crore (60 million)
> inserted documents. Am I missing anything during initialization that
> restricts the search, or is this a limitation of the SpanNearQuery class?
> I am using Apache Lucene version 6.5.0. Please let me know, since I am
> using this for a critical project.
>
> Thanks,
> Ranganath B. N.
>
>


-- 
Sincerely yours
Mikhail Khludnev

RE: Correction: SpanNearQuery Class issue through spans object (Not through Searcher.search() method)

Posted by Ranganath B N <ra...@huawei.com>.

Hi Allison and Mikhail Khludnev,

Great, your suggestion works. But I have one more question in the same context: if my SpanNearQuery matches 10,000 documents, it takes 20 seconds to retrieve the results (the time is spent in the getSpans method). How can I reduce this time? I also need to obtain the positions of the matches (for this, I am passing SpanWeight.Postings.POSITIONS to the getSpans method).

Thanks,
Ranganath B. N. 

-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org] 
Sent: Tuesday, June 20, 2017 5:35 PM
To: java-user@lucene.apache.org
Subject: RE: Correction: SpanNearQuery Class issue through spans object (Not through Searcher.search() method)

As an example of Mikhail's suggestion:
https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/main/java/org/apache/lucene/search/concordance/charoffsets/SpansCrawler.java

If you are trying to build a concordance, see ConcordanceSearcher in that package.
 
See examples on how to run the ConcordanceSearcher here: https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/test/java/org/apache/lucene/search/concordance/TestConcordanceSearcher.java 

Let me know if you have any questions/if the bad behavior still exists.

Finally, be aware of this potential deal-breaker with SpanQueries: https://issues.apache.org/jira/browse/LUCENE-7398 ("Nested SpanQueries are buggy")

Cheers,

            Tim

