Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2014/05/05 15:21:19 UTC
[jira] [Resolved] (LUCENE-5609) Should we revisit the default
numeric precision step?
[ https://issues.apache.org/jira/browse/LUCENE-5609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless resolved LUCENE-5609.
----------------------------------------
Resolution: Fixed
> Should we revisit the default numeric precision step?
> -----------------------------------------------------
>
> Key: LUCENE-5609
> URL: https://issues.apache.org/jira/browse/LUCENE-5609
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5609.patch
>
>
> Right now it's 4 for both 8-byte (long/double) and 4-byte (int/float)
> numeric fields, but this is a pretty big hit on indexing speed and
> disk usage, especially for tiny documents, because it creates many
> terms for each value (8 for int/float, 16 for long/double).
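
The "8 or 16" terms per value can be reproduced with a little arithmetic. The sketch below is not Lucene code: the class and method names are made up for illustration, and it assumes one term is indexed per precisionStep-sized shift of the value, i.e. ceil(valueBits / precisionStep) terms, which matches the counts quoted above for a step of 4.

```java
class PrecisionStepTerms {
    // Assumed model (not taken from the Lucene sources): one indexed term per
    // shift of precisionStep bits, so ceil(valueBits / precisionStep) terms.
    static int termsPerValue(int valueBits, int precisionStep) {
        return (valueBits + precisionStep - 1) / precisionStep;
    }

    public static void main(String[] args) {
        int[] steps = {4, 8, 16};
        for (int step : steps) {
            System.out.println("precisionStep=" + step
                + ": int/float=" + termsPerValue(32, step) + " terms"
                + ", long/double=" + termsPerValue(64, step) + " terms");
        }
    }
}
```

Under this model, raising the step from 4 to 8 halves the number of terms indexed per value, and going to 16 halves it again.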
> Since we originally set these defaults, a lot has changed... e.g. we
> now rewrite MTQs per-segment, we have a faster (BlockTree) terms dict,
> a faster postings format, etc.
> Index size is important because it limits how much of the index will
> be hot (fit in the OS's IO cache). And more apps are using Lucene for
> tiny docs where the overhead of individual fields is sizable.
> I used the Geonames corpus to run a simple benchmark (all sources are
> committed to luceneutil). It has 8.6 M tiny docs, each with 23 fields,
> with these numeric fields:
> * lat/lng (double)
> * modified time, elevation, population (long)
> * dem (int)
> I tested 4, 8 and 16 precision steps:
> {noformat}
> indexing:
> PrecStep   Size        IndexTime
>        4   1812.7 MB   651.4 sec
>        8   1203.0 MB   443.2 sec
>       16    894.3 MB   361.6 sec
>
> searching:
> Field       PrecStep   QueryTime   TermCount
> geoNameID          4   2872.5 ms       20306
> geoNameID          8   2903.3 ms      104856
> geoNameID         16   3371.9 ms     5871427
> latitude           4   2160.1 ms       36805
> latitude           8   2249.0 ms      240655
> latitude          16   2725.9 ms     4649273
> modified           4   2038.3 ms       13311
> modified           8   2029.6 ms       58344
> modified          16   2060.5 ms       77763
> longitude          4   3468.5 ms       33818
> longitude          8   3629.9 ms      214863
> longitude         16   4060.9 ms     4532032
> {noformat}
> Index time is with 1 thread (for identical index structure).
> Query time is the time to run 100 random ranges for that field,
> averaged over 20 iterations. TermCount is the total number of terms
> the MTQ rewrote to across all 100 queries and all segments; it gets
> higher as expected as precStep gets higher, but the search time is not
> that heavily impacted: negligible going from 4 to 8, and then some
> impact from 8 to 16.
> Maybe we should increase the int/float default precision step to 8 and
> long/double to 16? Or both to 16?
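
To weigh the proposed defaults, it can help to express the benchmark figures as changes relative to the current precision step of 4. The sketch below only does arithmetic on numbers copied from the tables in the description; the class and method names are illustrative and not part of Lucene or luceneutil.

```java
import java.util.Locale;

class PrecisionStepTradeoff {
    // Percentage change from baseline to value; negative means a reduction.
    static double percentChange(double baseline, double value) {
        return (value - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Copied from the indexing table: {precStep, index size in MB, index time in sec}
        double[][] indexing = {{4, 1812.7, 651.4}, {8, 1203.0, 443.2}, {16, 894.3, 361.6}};
        // Copied from the searching table, geoNameID rows: {precStep, query time in ms}
        double[][] geoNameId = {{4, 2872.5}, {8, 2903.3}, {16, 3371.9}};

        for (int i = 1; i < indexing.length; i++) {
            System.out.println(String.format(Locale.ROOT,
                "precStep %.0f vs 4: size %+.1f%%, index time %+.1f%%, geoNameID query time %+.1f%%",
                indexing[i][0],
                percentChange(indexing[0][1], indexing[i][1]),
                percentChange(indexing[0][2], indexing[i][2]),
                percentChange(geoNameId[0][1], geoNameId[i][1])));
        }
    }
}
```

On these figures, a step of 8 cuts index size and indexing time by roughly a third at about a 1% cost in geoNameID query time, while a step of 16 cuts index size roughly in half at a cost of about 17% for the same queries.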