Posted to java-user@lucene.apache.org by Alexander Filipchik <af...@gmail.com> on 2017/07/01 18:11:03 UTC

Issue with range queries on Lucene 6.6 using IntPoint

Not sure if I'm doing something wrong or if there is a bug somewhere, but:

I was trying to create a test index with one document for every second in a year and then query it (it doesn't have to be time; I'm only using time to explain the problem).

Each example document consists of 7 fields:
document.add(new IntPoint("year", year));
document.add(new IntPoint("month", month));
document.add(new IntPoint("hour", hour));
document.add(new IntPoint("day", day));
document.add(new IntPoint("minute", minute));
document.add(new IntPoint("second", second));
document.add(new StoredField("date", "y=" + year + "/m=" + month + "/d=" + day + "/h=" + hour + "/m=" + minute + "/s=" + second));
Then I tried to run a range query like this:
BooleanQuery.Builder booleanQueryBuilder = new BooleanQuery.Builder()
        .add(IntPoint.newRangeQuery("year", 2016, 2020), BooleanClause.Occur.FILTER)
        .add(IntPoint.newRangeQuery("month", 1, 10), BooleanClause.Occur.FILTER)
        .add(IntPoint.newExactQuery("day", 1), BooleanClause.Occur.FILTER)
        .add(IntPoint.newExactQuery("hour", 1), BooleanClause.Occur.FILTER)
        .add(IntPoint.newExactQuery("minute", 1), BooleanClause.Occur.FILTER)
        .add(IntPoint.newExactQuery("second", 1), BooleanClause.Occur.FILTER);
The goal is to get the first second of the first minute, hour, and day of each month from 1 to 10. While the number of results is correct, I'm getting the wrong stored fields:
y=2017/m=2/d=1/h=1/m=1/s=1
y=2017/m=3/d=1/h=1/m=1/s=1
y=2017/m=1/d=2/h=1/m=26/s=42
y=2017/m=2/d=2/h=1/m=26/s=42
y=2017/m=3/d=2/h=1/m=26/s=42
y=2017/m=1/d=3/h=1/m=52/s=23
y=2017/m=2/d=3/h=1/m=52/s=23
y=2017/m=3/d=3/h=1/m=52/s=23
y=2017/m=1/d=5/h=1/m=18/s=4

As you can see, the months repeat and the results are incorrect: only the first two results match the query.
If I take seconds out of the equation, everything works fine. Am I doing something wrong, or am I hitting some limitation?

Here is the test code:
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.IntPoint;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SimpleCollector;
import org.apache.lucene.store.RAMDirectory;

import java.io.IOException;



public class Test {
    public Test() throws IOException {
        final RAMDirectory directory = new RAMDirectory();
        IndexWriter iwriter = null;
        final IndexWriterConfig config = new IndexWriterConfig(new StandardAnalyzer());
        iwriter = new IndexWriter(directory, config);

        //Indexing every second of hour 1 for every day (1..31) of every month in 2017
        for (int year = 2017; year <= 2017; year++) {
            for (int month = 1; month <= 12; month++) {
                for (int day = 1; day <= 31; day++) {
                    for (int hour = 1; hour <= 1; hour++) {
                        for (int minute = 1; minute <= 60; minute++) {
                            for (int second = 1; second <= 60; second++) {
                                Document document = new Document();
                                document.add(new IntPoint("year", year));
                                document.add(new IntPoint("month", month));
                                document.add(new IntPoint("hour", hour));
                                document.add(new IntPoint("day", day));
                                document.add(new IntPoint("minute", minute));
                                document.add(new IntPoint("second", second));
                                document.add(new StoredField("date", "y=" + year + "/m=" + month + "/d=" + day + "/h=" + hour + "/m=" + minute + "/s=" + second));
                                iwriter.addDocument(document);
                            }
                        }
                    }
                }
            }
        }

        iwriter.close();

        BooleanQuery.Builder booleanQueryBuilder = new BooleanQuery.Builder()
                .add(IntPoint.newRangeQuery("year", 2016, 2020), BooleanClause.Occur.FILTER)
                .add(IntPoint.newRangeQuery("month", 1, 10), BooleanClause.Occur.FILTER)
                .add(IntPoint.newExactQuery("day", 1), BooleanClause.Occur.FILTER)
                .add(IntPoint.newExactQuery("hour", 1), BooleanClause.Occur.FILTER)
                .add(IntPoint.newExactQuery("minute", 1), BooleanClause.Occur.FILTER)
                .add(IntPoint.newExactQuery("second", 1), BooleanClause.Occur.FILTER);

        final IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(directory));
        searcher.search(booleanQueryBuilder.build(), new SimpleCollector() {
            @Override
            public void collect(int doc) throws IOException {
                Document document = searcher.getIndexReader().document(doc);
                System.out.println(document.get("date"));
            }

            public boolean needsScores() {
                return false;
            }
        });

    }
}

Thank you,
Alex

Re: Issue with range queries on Lucene 6.6 using IntPoint

Posted by Michael McCandless <lu...@mikemccandless.com>.
I think there is a bug in your collector: the "int doc" that is passed to
the collect method is per-segment (relative to the current leaf), but you
are passing it to the top-level reader, which expects an index-wide doc ID.

You should override the doSetNextReader method in SimpleCollector (the
Lucene 6.x counterpart of the older setNextReader), hold onto the "int
docBase" of the LeafReaderContext that's passed in, then add docBase and
doc in your collect method before passing the result to the top-level
reader.
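
To make the docBase arithmetic concrete, here is a toy model in plain Java. It is only a sketch and not Lucene code: Segment, segmentsOf, document, buggyLookup and fixedLookup are made-up names, and the two-segment layout is an assumption chosen for illustration. It shows why a segment-relative doc ID resolves to the wrong stored document when handed to a top-level lookup, and how adding docBase corrects it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model (NOT Lucene code) of per-segment doc IDs versus index-wide doc IDs. */
class DocBaseSketch {

    /** Hypothetical stand-in for one segment: its docBase plus its stored values. */
    static final class Segment {
        final int docBase;
        final String[] stored;

        Segment(int docBase, String[] stored) {
            this.docBase = docBase;
            this.stored = stored;
        }
    }

    /** Builds segments so that each docBase equals the doc count of all earlier segments. */
    static List<Segment> segmentsOf(String[]... perSegmentStored) {
        List<Segment> segments = new ArrayList<>();
        int docBase = 0;
        for (String[] stored : perSegmentStored) {
            segments.add(new Segment(docBase, stored));
            docBase += stored.length;
        }
        return segments;
    }

    /** Models the top-level reader: it expects an index-wide doc ID. */
    static String document(List<Segment> segments, int globalDoc) {
        for (Segment s : segments) {
            int local = globalDoc - s.docBase;
            if (local >= 0 && local < s.stored.length) {
                return s.stored[local];
            }
        }
        throw new IllegalArgumentException("doc out of range: " + globalDoc);
    }

    /** Buggy pattern: the segment-relative doc is passed straight to the top-level lookup. */
    static String buggyLookup(List<Segment> segments, int segmentIndex, int doc) {
        return document(segments, doc);
    }

    /** Fixed pattern: add the current segment's docBase before the top-level lookup. */
    static String fixedLookup(List<Segment> segments, int segmentIndex, int doc) {
        return document(segments, segments.get(segmentIndex).docBase + doc);
    }

    public static void main(String[] args) {
        List<Segment> segments = segmentsOf(
                new String[] {"a0", "a1", "a2"},
                new String[] {"b0", "b1"});
        // The hit is local doc 1 of the second segment, i.e. the document "b1".
        System.out.println(buggyLookup(segments, 1, 1)); // resolves inside the first segment
        System.out.println(fixedLookup(segments, 1, 1)); // docBase 3 + doc 1 = global doc 4
    }
}
```

In a real SimpleCollector on Lucene 6.x, the same addition would happen in collect(), using the docBase captured from the LeafReaderContext passed to doSetNextReader. Small indexes often end up as a single segment with docBase 0, which would be consistent with the problem disappearing once the seconds field (and with it most of the documents) is removed.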

Does that fix it?

Mike McCandless

http://blog.mikemccandless.com
