Posted to java-user@lucene.apache.org by Jack Krupansky <ja...@basetechnology.com> on 2012/12/14 23:48:14 UTC
precisionStep for days in TrieDate
If I specify a precisionStep of 26 for a TrieDate field, what rough impact should this have on both performance and index size?
The input data has time in it, but the time of day (the milliseconds within each day) is not needed for the app. Will Lucene store only the top 64 minus 26 bits of data and discard the low 26 bits?
I’ve read that a higher precisionStep will lower performance. Will a precisionStep of 26 have dramatically lower performance when referencing days (without time of day)?
I suppose that the piece of information I am missing is whether trie precisionStep simply affects some extra index table that trie keeps beyond the raw data values or the data values themselves.
-- Jack Krupansky
Re: precisionStep for days in TrieDate
Posted by Jack Krupansky <ja...@basetechnology.com>.
Thanks, you answered the main question: 26 doesn't simply lop off the time
of day. I still don't completely follow how trie works, though (without
reading the paper itself).
-- Jack Krupansky
-----Original Message-----
From: Uwe Schindler
Sent: Friday, December 14, 2012 5:58 PM
To: java-user@lucene.apache.org
Subject: RE: precisionStep for days in TrieDate
Hi,
> If I specify a precisionStep of 26 for a TrieDate field, what rough impact
> should this have on both performance and index size?
This value is mostly useless; anything > 8 slows the queries down to the
speed of TermRangeQuery.
> The input data has time in it, but the milliseconds per day is not needed
> for the app. Will Lucene store only the top 64 minus 26 bits of data and
> discard the low 26 bits?
No, you may need to read the Javadocs of NumericRangeQuery, now updated with
formulas: http://goo.gl/nyXQR
The precisionStep is the number of bits of the indexed value after which a
new term starts. The original value is always indexed in full precision.
A precision step of 4 for a 32-bit value (integer) means terms with these
bit counts:
All 32, left 28, left 24, left 20, left 16, left 12, left 8, left 4 bits of
the value (total 8 terms/value). A precision step of 26 would index 2 terms:
all 32 bits, and a single term with the remaining 6 bits from the left.
> I’ve read that a higher precisionStep will lower performance. Will a
> precisionStep of 26 have dramatically lower performance when referencing
> days (without time of day)?
See above. The assumption that 26 will limit precision to days is wrong.
> I suppose that the piece of information I am missing is whether trie
> precisionStep simply affects some extra index table that trie keeps beyond
> the raw data values or the data values themselves.
It only affects how the value is indexed (how many terms), but not the
value.
Uwe
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
RE: precisionStep for days in TrieDate
Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,
> If I specify a precisionStep of 26 for a TrieDate field, what rough impact
> should this have on both performance and index size?
This value is mostly useless; anything > 8 slows the queries down to the speed of TermRangeQuery.
> The input data has time in it, but the milliseconds per day is not needed for
> the app. Will Lucene store only the top 64 minus 26 bits of data and discard
> the low 26 bits?
No, you may need to read the Javadocs of NumericRangeQuery, now updated with formulas: http://goo.gl/nyXQR
The precisionStep is the number of bits of the indexed value after which a new term starts. The original value is always indexed in full precision. A precision step of 4 for a 32-bit value (integer) means terms with these bit counts:
All 32, left 28, left 24, left 20, left 16, left 12, left 8, left 4 bits of the value (total 8 terms/value). A precision step of 26 would index 2 terms: all 32 bits, and a single term with the remaining 6 bits from the left.
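A small sketch of that rule, in plain Java with no Lucene dependency. The class and method names are made up for illustration, and the loop only mirrors the description above (one term per shift of 0, precisionStep, 2 * precisionStep, ... while the shift is smaller than the value size); it is not Lucene's own code:

```java
import java.util.ArrayList;
import java.util.List;

public class TrieTermSketch {

    /**
     * Hypothetical helper: for one indexed value, lists how many leading
     * bits each generated term keeps, following the rule described above.
     */
    public static List<Integer> keptBits(int valueBits, int precisionStep) {
        List<Integer> kept = new ArrayList<>();
        // The first term (shift 0) is always the full-precision value.
        for (int shift = 0; shift < valueBits; shift += precisionStep) {
            kept.add(valueBits - shift);
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(keptBits(32, 4));   // [32, 28, 24, 20, 16, 12, 8, 4]
        System.out.println(keptBits(32, 26));  // [32, 6]
        System.out.println(keptBits(64, 26));  // [64, 38, 12]
    }
}
```

Under the same rule, a 64-bit value such as a date with a precision step of 26 would get three terms per value rather than two, and the first of them is still the full 64-bit value.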
> I’ve read that a higher precisionStep will lower performance. Will a
> precisionStep of 26 have dramatically lower performance when referencing
> days (without time of day)?
See above. The assumption that 26 will limit precision to days is wrong.
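The arithmetic also does not line up: 2^26 milliseconds is about 18.6 hours, not one day. If the app really only needs day granularity, one option is to round the value before it is indexed. The following is a rough sketch of that idea, assuming the dates are epoch milliseconds in UTC; the class and the truncateToDayUtc helper are hypothetical and not part of Lucene or Solr:

```java
import java.util.concurrent.TimeUnit;

public class DayPrecisionSketch {

    // 24 * 60 * 60 * 1000 = 86,400,000 milliseconds in one day.
    public static final long MILLIS_PER_DAY = TimeUnit.DAYS.toMillis(1);

    /**
     * Hypothetical helper: rounds an epoch-millisecond timestamp down to
     * midnight UTC of the same day. floorDiv keeps the rounding direction
     * correct for timestamps before 1970 as well.
     */
    public static long truncateToDayUtc(long epochMillis) {
        return Math.floorDiv(epochMillis, MILLIS_PER_DAY) * MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        long low26BitsSpan = 1L << 26;
        System.out.println(low26BitsSpan);   // 67108864 ms, roughly 18.6 hours
        System.out.println(MILLIS_PER_DAY);  // 86400000 ms

        // 2012-12-14 23:48:14 UTC as epoch milliseconds.
        long posted = 1355528894000L;
        System.out.println(truncateToDayUtc(posted)); // 1355443200000
    }
}
```

Rounding this way changes the value itself, which precisionStep never does; the trie terms are then built from the already rounded value.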
> I suppose that the piece of information I am missing is whether trie
> precisionStep simply affects some extra index table that trie keeps beyond
> the raw data values or the data values themselves.
It only affects how the value is indexed (how many terms), but not the value.
Uwe