Posted to dev@lucene.apache.org by "Steve Rowe (JIRA)" <ji...@apache.org> on 2015/08/08 17:12:46 UTC

[jira] [Reopened] (LUCENE-6697) Use 1D KD tree for alternative to postings based numeric range filters

     [ https://issues.apache.org/jira/browse/LUCENE-6697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Rowe reopened LUCENE-6697:
--------------------------------

Seeing a 100% reproducible failure on branch_5x:

{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestRangeTree -Dtests.method=testMultiValued -Dtests.seed=FD1D848DDE038459 -Dtests.slow=true -Dtests.locale=hr_HR -Dtests.timezone=Europe/Madrid -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
   [junit4] ERROR   0.05s J8 | TestRangeTree.testMultiValued <<<
   [junit4]    > Throwable #1: java.lang.IllegalArgumentException: maxValuesSortInHeap must be >= maxValuesInLeafNode; got 1250 vs maxValuesInLeafNode=2013
   [junit4]    > 	at __randomizedtesting.SeedInfo.seed([FD1D848DDE038459:293DE0BF10C1C411]:0)
   [junit4]    > 	at org.apache.lucene.rangetree.RangeTreeWriter.verifyParams(RangeTreeWriter.java:114)
   [junit4]    > 	at org.apache.lucene.rangetree.RangeTreeDocValuesFormat.<init>(RangeTreeDocValuesFormat.java:98)
   [junit4]    > 	at org.apache.lucene.rangetree.TestRangeTree.testMultiValued(TestRangeTree.java:128)
   [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
   [junit4]   2> NOTE: test params are: codec=Asserting(Lucene53): {}, docValues:{}, sim=DefaultSimilarity, locale=hr_HR, timezone=Europe/Madrid
   [junit4]   2> NOTE: Linux 4.1.0-custom2-amd64 amd64/Oracle Corporation 1.8.0_45 (64-bit)/cpus=16,threads=1,free=441475104,total=504365056
   [junit4]   2> NOTE: All tests run in this JVM: [TestRangeTree]
{noformat}
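
Reading the trace: the exception is thrown by RangeTreeWriter.verifyParams while TestRangeTree.testMultiValued constructs a RangeTreeDocValuesFormat, so the test fails before any values are indexed. With this seed (the parameters are presumably randomized, since the failure depends on the seed) it asked for maxValuesSortInHeap=1250 with maxValuesInLeafNode=2013, and the check requires the first to be at least the second. The sketch below restates that invariant and shows one way a randomized test could draw only valid pairs. It is a hypothetical illustration, not the Lucene code: RangeTreeParamsSketch, randomParams and the numeric ranges are made up; only the two parameter names and the message text come from the log.

```java
import java.util.Random;

// Hypothetical sketch: parameter names and message text follow the log above,
// but this is not the real RangeTreeWriter.verifyParams implementation.
class RangeTreeParamsSketch {
    // Enforces the invariant named in the error message.
    static void verifyParams(int maxValuesInLeafNode, int maxValuesSortInHeap) {
        if (maxValuesSortInHeap < maxValuesInLeafNode) {
            throw new IllegalArgumentException(
                "maxValuesSortInHeap must be >= maxValuesInLeafNode; got " + maxValuesSortInHeap
                    + " vs maxValuesInLeafNode=" + maxValuesInLeafNode);
        }
    }

    // Draws a random (leaf size, heap budget) pair that always passes verifyParams:
    // pick the leaf size first, then a heap budget at least that large.
    // The ranges used here are arbitrary assumptions.
    static int[] randomParams(Random random) {
        int maxValuesInLeafNode = 2 + random.nextInt(4096);
        int maxValuesSortInHeap = maxValuesInLeafNode + random.nextInt(100_000);
        return new int[] {maxValuesInLeafNode, maxValuesSortInHeap};
    }

    public static void main(String[] args) {
        try {
            verifyParams(2013, 1250); // the combination drawn by the failing seed
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        Random random = new Random(42);
        for (int i = 0; i < 10_000; i++) {
            int[] p = randomParams(random);
            verifyParams(p[0], p[1]); // never throws for constrained draws
        }
        System.out.println("all constrained draws passed");
    }
}
```

Choosing the leaf size first and the heap budget relative to it is only one option; this message does not say how the test or the check was actually changed.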

> Use 1D KD tree for alternative to postings based numeric range filters
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-6697
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6697
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.3, Trunk
>
>         Attachments: LUCENE-6697.patch, LUCENE-6697.patch, LUCENE-6697.patch
>
>
> Today Lucene uses postings to index a numeric value at multiple
> precision levels for fast range searching.  It's somewhat costly: each
> numeric value is indexed with multiple terms (4 terms by default)
> ... I think a dedicated 1D BKD tree should be more compact and perform
> better.
>
> It should also easily generalize beyond 64 bits to arbitrary byte[],
> e.g. for LUCENE-5596, but I haven't explored that here.
>
> A 1D BKD tree just sorts all values and then indexes adjacent leaf
> blocks of 512-1024 values each (by default), along with their
> docIDs, into a fully balanced binary tree.  Building the range filter
> is then just a recursive walk through this tree.
>
> It's the same structure we use for the 2D lat/lon BKD tree, just in
> 1D.  I implemented it as a DocValuesFormat that also writes the
> numeric tree on the side.
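
To make the "multiple precision levels" point in the description concrete: a trie-style encoding indexes each value once per precision level, keeping progressively fewer high-order bits. The sketch below assumes a 64-bit long and a precision step of 16 bits, which gives 64 / 16 = 4 terms, matching the default count quoted. TrieTermsSketch and trieTerms are hypothetical names; the real encoding packs each (shift, prefix) pair into bytes differently, so this only illustrates the idea.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of multi-precision ("trie") terms for one 64-bit value;
// it shows the idea only, not Lucene's actual byte encoding.
class TrieTermsSketch {
    // Returns {shift, prefix} pairs: one indexed term per precision level.
    static List<long[]> trieTerms(long value, int precisionStep) {
        long sortable = value ^ Long.MIN_VALUE; // flip sign bit so unsigned order matches signed order
        List<long[]> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += precisionStep) {
            // Coarser levels drop more low-order bits, so many nearby values share a term.
            terms.add(new long[] {shift, sortable >>> shift});
        }
        return terms;
    }

    public static void main(String[] args) {
        for (long[] term : trieTerms(1_234_567_890_123L, 16)) {
            System.out.printf("shift=%2d prefix=%x%n", term[0], term[1]);
        }
    }
}
```

A range query can then match a few coarse terms for the middle of the range and finer terms only near its endpoints; the cost the description points to is that every document pays for all of these extra terms at index time.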

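On generalizing to arbitrary byte[]: the tree only needs to sort and compare values, so it can work on byte strings as long as their unsigned lexicographic order matches the intended numeric order. A common way to get that for a signed long, assumed here rather than taken from the patch, is to write it big-endian with the sign bit flipped. SortableBytesSketch, encode and compare are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers: an order-preserving long -> byte[] encoding plus an
// unsigned lexicographic comparator. Not a Lucene API.
class SortableBytesSketch {
    // Big-endian bytes with the sign bit flipped, so unsigned byte order matches signed long order.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    // Unsigned lexicographic comparison; a shorter array sorts first if it is a prefix of the other.
    static int compare(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int cmp = Integer.compare(Byte.toUnsignedInt(a[i]), Byte.toUnsignedInt(b[i]));
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -1L, 0L, 1L, 255L, Long.MAX_VALUE};
        for (int i = 1; i < samples.length; i++) {
            // Each encoded value should compare greater than the previous one.
            System.out.println(compare(encode(samples[i - 1]), encode(samples[i])) < 0);
        }
    }
}
```

The same comparator works unchanged for wider fixed-width values (for example 16-byte ones), which is what makes the byte[] generalization straightforward in principle.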

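The build-and-walk description in the quoted issue can be sketched in memory as follows. This is a simplified, hypothetical illustration (OneDimTreeSketch and rangeQuery are made-up names): it sorts (value, docID) pairs, treats each run of leafSize sorted values as a leaf block, and answers a range by recursively halving the list of leaves, i.e. walking an implicit balanced binary tree. Unlike the real format, which writes the tree alongside the doc values, it keeps everything on the heap and reads each node's min/max from the sorted array instead of storing split values. The demo uses 2-value leaves rather than the 512-1024 default so the tree has several levels.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory sketch of a 1D BKD-style tree; not Lucene's RangeTree code.
class OneDimTreeSketch {
    private final long[] values; // all values, sorted ascending
    private final int[] docIDs;  // docIDs, aligned with values
    private final int leafSize;  // max values per leaf block
    private final int leafCount;

    OneDimTreeSketch(long[] vals, int[] docs, int leafSize) {
        Integer[] order = new Integer[vals.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Step 1: sort all (value, docID) pairs by value (ties broken by docID).
        Arrays.sort(order, Comparator.<Integer>comparingLong(i -> vals[i]).thenComparingInt(i -> docs[i]));
        values = new long[vals.length];
        docIDs = new int[vals.length];
        for (int i = 0; i < order.length; i++) {
            values[i] = vals[order[i]];
            docIDs[i] = docs[order[i]];
        }
        this.leafSize = leafSize;
        // Step 2: each run of leafSize adjacent sorted values is one leaf block.
        leafCount = (vals.length + leafSize - 1) / leafSize;
    }

    // Returns the docIDs whose value lies in [min, max], in value order.
    List<Integer> rangeQuery(long min, long max) {
        List<Integer> hits = new ArrayList<>();
        if (leafCount > 0) walk(0, leafCount, min, max, hits);
        return hits;
    }

    // Step 3: recursive walk; each node covers the leaf blocks [lo, hi).
    private void walk(int lo, int hi, long min, long max, List<Integer> hits) {
        int start = lo * leafSize;
        int end = Math.min(hi * leafSize, values.length);
        long nodeMin = values[start];
        long nodeMax = values[end - 1];
        if (nodeMax < min || nodeMin > max) return;          // node entirely outside the range
        if (nodeMin >= min && nodeMax <= max) {             // node entirely inside: take all docIDs
            for (int i = start; i < end; i++) hits.add(docIDs[i]);
            return;
        }
        if (hi - lo == 1) {                                  // boundary leaf: check each value
            for (int i = start; i < end; i++) {
                if (values[i] >= min && values[i] <= max) hits.add(docIDs[i]);
            }
            return;
        }
        int mid = (lo + hi) >>> 1;                           // split the leaves in half
        walk(lo, mid, min, max, hits);
        walk(mid, hi, min, max, hits);
    }

    public static void main(String[] args) {
        long[] vals = {42, -7, 1000, 3, 42, 17, 99, 5};
        int[] docs = {0, 1, 2, 3, 4, 5, 6, 7};
        OneDimTreeSketch tree = new OneDimTreeSketch(vals, docs, 2); // tiny leaves for the demo
        System.out.println(tree.rangeQuery(5, 42)); // prints [7, 5, 0, 4]
    }
}
```

A node whose bounds fall entirely inside the query range contributes all of its docIDs without per-value checks; since the values are sorted, at most two leaves (those straddling the range endpoints) are filtered value by value.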

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)