Posted to java-user@lucene.apache.org by Mikhail Khludnev <mk...@apache.org> on 2017/06/20 08:23:38 UTC
Re: Correction: SpanNearQuery Class issue through spans object (Not
through Searcher.search() method)
Hello Ranganath,
I guess you need to loop through the LeafReaderContexts and create a
scorer/Spans for each of them to get to the 7th crore and beyond.
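A minimal sketch of the per-segment loop suggested above, written with plain Java stand-ins rather than Lucene classes so that it runs on its own. The names `Leaf`, `collectGlobalDocIds` and `collectFromFirstLeafOnly` are hypothetical, and the 6-crore docBase is an assumed figure chosen only to mirror the question. In Lucene itself the corresponding pieces would be `IndexReader.leaves()`, `LeafReaderContext.docBase`, `SpanWeight.getSpans(LeafReaderContext, SpanWeight.Postings)` and `Spans.nextDoc()`, which returns segment-local doc IDs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LeafLoopSketch {

    /** Hypothetical stand-in for one index segment (a "leaf"). */
    public static final class Leaf {
        public final int docBase;        // global ID of this segment's first document
        public final int[] matchingDocs; // segment-local IDs of matching documents

        public Leaf(int docBase, int... matchingDocs) {
            this.docBase = docBase;
            this.matchingDocs = matchingDocs;
        }
    }

    /** Visits every leaf and converts segment-local doc IDs to global ones. */
    public static List<Integer> collectGlobalDocIds(List<Leaf> leaves) {
        List<Integer> hits = new ArrayList<>();
        for (Leaf leaf : leaves) {
            for (int localDoc : leaf.matchingDocs) {
                hits.add(leaf.docBase + localDoc);
            }
        }
        return hits;
    }

    /** The suspected mistake: only the first leaf is ever consulted. */
    public static List<Integer> collectFromFirstLeafOnly(List<Leaf> leaves) {
        return collectGlobalDocIds(leaves.subList(0, Math.min(1, leaves.size())));
    }

    public static void main(String[] args) {
        List<Leaf> leaves = Arrays.asList(
                new Leaf(0, 5, 42),          // first segment
                new Leaf(60_000_000, 0, 7)); // second segment, assumed to start at 6 crore
        System.out.println(collectFromFirstLeafOnly(leaves)); // [5, 42]
        System.out.println(collectGlobalDocIds(leaves));      // [5, 42, 60000000, 60000007]
    }
}
```

If only one leaf is asked for its spans, matches in later segments never appear, which would look like a cut-off at the first segment's size. Adding docBase to each local ID gives the index-wide document number.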
On Tue, Jun 20, 2017 at 10:59 AM, Ranganath B N <ra...@huawei.com>
wrote:
> Hi,
>
>
> This is regarding the search limit of the SpanNearQuery class. I created
> a Lucene index consisting of 2 billion documents.
> I then obtain a Spans object through the getSpans method of the SpanWeight
> object created from the SpanNearQuery object, and get the
> matching documents by iterating with spans.nextDoc(). But searching
> through the Spans object returns results only if the search terms are
> within the first 6 crore inserted documents. Am I missing anything
> during initialization so that the search is getting restricted, or is this a
> limitation of the SpanNearQuery class?
> I am using Apache Lucene version 6.5.0. Please let me know about this,
> since I am using this for a critical project.
>
> Thanks,
> Ranganath B. N.
>
>
--
Sincerely yours
Mikhail Khludnev
RE: Correction: SpanNearQuery Class issue through spans object (Not
through Searcher.search() method)
Posted by Ranganath B N <ra...@huawei.com>.
Hi Allison and Mikhail Khludnev,
Great, your suggestion works. But I have one more question. In the same context, if my SpanNearQuery matches 10,000 documents, it takes 20 seconds to retrieve the results (the time is spent in the getSpans method). How can I reduce this time? I also need the positions of the matches (for this, I am passing SpanWeight.Postings.POSITIONS to the getSpans method).
Thanks,
Ranganath B. N.
-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Tuesday, June 20, 2017 5:35 PM
To: java-user@lucene.apache.org
Subject: RE: Correction: SpanNearQuery Class issue through spans object (Not through Searcher.search() method)
As an example of Mikhail's suggestion:
https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/main/java/org/apache/lucene/search/concordance/charoffsets/SpansCrawler.java
If you are trying to build a concordance, see ConcordanceSearcher in that package.
See examples on how to run the ConcordanceSearcher here: https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/test/java/org/apache/lucene/search/concordance/TestConcordanceSearcher.java
Let me know if you have any questions/if the bad behavior still exists.
Finally, be aware of this potential deal-breaker with SpanQueries: https://issues.apache.org/jira/browse/LUCENE-7398 ("Nested SpanQueries are buggy")
Cheers,
Tim
RE: Correction: SpanNearQuery Class issue through spans object (Not
through Searcher.search() method)
Posted by "Allison, Timothy B." <ta...@mitre.org>.
As an example of Mikhail's suggestion:
https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/main/java/org/apache/lucene/search/concordance/charoffsets/SpansCrawler.java
If you are trying to build a concordance, see ConcordanceSearcher in that package.
See examples on how to run the ConcordanceSearcher here: https://github.com/tballison/lucene-addons/blob/master/lucene-5317/src/test/java/org/apache/lucene/search/concordance/TestConcordanceSearcher.java
Let me know if you have any questions/if the bad behavior still exists.
Finally, be aware of this potential deal-breaker with SpanQueries: https://issues.apache.org/jira/browse/LUCENE-7398 ("Nested SpanQueries are buggy")
Cheers,
Tim