Posted to java-user@lucene.apache.org by Jack Krupansky <ja...@basetechnology.com> on 2012/12/14 23:48:14 UTC

precisionStep for days in TrieDate

If I specify a precisionStep of 26 for a TrieDate field, what rough impact should this have on both performance and index size?

The input data has time in it, but the milliseconds per day is not needed for the app. Will Lucene store only the top 64 minus 26 bits of data and discard the low 26 bits?

I’ve read that a higher precisionStep will lower performance. Will a precisionStep of 26 have dramatically lower performance when referencing days (without time of day)?

I suppose that the piece of information I am missing is whether trie precisionStep simply affects some extra index table that trie keeps beyond the raw data values or the data values themselves.

-- Jack Krupansky

Re: precisionStep for days in TrieDate

Posted by Jack Krupansky <ja...@basetechnology.com>.
Thanks, you answered the main question: 26 doesn't simply lop off the time 
of day. I still don't completely follow how trie works, though (without 
having read the paper itself).

-- Jack Krupansky

-----Original Message----- 
From: Uwe Schindler
Sent: Friday, December 14, 2012 5:58 PM
To: java-user@lucene.apache.org
Subject: RE: precisionStep for days in TrieDate

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org 




RE: precisionStep for days in TrieDate

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

> If I specify a precisionStep of 26 for a TrieDate field, what rough impact
> should this have on both performance and index size?

This value is mostly useless; everything > 8 slows the queries down to the speed of TermRangeQuery.

> The input data has time in it, but the milliseconds per day is not needed for
> the app. Will Lucene store only the top 64 minus 26 bits of data and discard
> the low 26 bits?

No. You may want to read the Javadocs of NumericRangeQuery, now updated with formulas: http://goo.gl/nyXQR
The precisionStep is the number of bits after which a new term of the indexed value starts. The original value is always indexed in full precision. A precision step of 4 for a 32-bit value (integer) produces terms with these bit counts:
all 32, left 28, left 24, left 20, left 16, left 12, left 8, left 4 bits of the value (8 terms per value in total). A precision step of 26 would index 2 terms: all 32 bits, and one single term with the remaining 6 bits from the left.
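
To make the counting rule above concrete, here is a minimal sketch in plain Java (no Lucene dependency). The class and method names are made up for illustration; the code only reproduces the arithmetic described above, assuming one term per shift 0, precisionStep, 2*precisionStep, ... as long as the shift stays below the width of the value. It is not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists how many leading bits each
// indexed term keeps for a given value width and precisionStep.
final class TriePrefixSketch {

    static List<Integer> prefixBits(int valueBits, int precisionStep) {
        if (valueBits < 1 || precisionStep < 1) {
            throw new IllegalArgumentException("valueBits and precisionStep must be >= 1");
        }
        List<Integer> widths = new ArrayList<>();
        // Shift 0 is the full-precision term; each further term drops
        // another precisionStep low-order bits.
        for (int shift = 0; shift < valueBits; shift += precisionStep) {
            widths.add(valueBits - shift);
        }
        return widths;
    }

    public static void main(String[] args) {
        System.out.println(prefixBits(32, 4));  // [32, 28, 24, 20, 16, 12, 8, 4]
        System.out.println(prefixBits(32, 26)); // [32, 6]
        System.out.println(prefixBits(64, 26)); // [64, 38, 12]
    }
}
```

The last call applies the same rule to a 64-bit value, which is what a date field holds: under that assumption a step of 26 would give three terms per value, and the full 64-bit term is still among them.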

> I’ve read that a higher precisionStep will lower performance. Will a
> precisionStep of 26 have dramatically lower performance when referencing
> days (without time of day)?

See above. The assumption that 26 will limit precision to days is wrong.
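
As a side note on the arithmetic: 2^26 milliseconds is roughly 18.6 hours, so even if the low 26 bits were dropped, the result would not line up with day boundaries. If day resolution is all the application needs, one option is to round the value before it is indexed. The sketch below assumes timestamps are epoch milliseconds and that UTC day boundaries are acceptable; the class and method names are hypothetical and not part of Lucene or Solr.

```java
// Hypothetical helper: truncates a timestamp to the start of its UTC day
// before it is handed to the indexer.
final class DayTruncationSketch {

    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000; // 86,400,000

    static long truncateToUtcDay(long epochMillis) {
        // floorDiv rounds toward negative infinity, so timestamps before
        // 1970 also land on the start of their day.
        return Math.floorDiv(epochMillis, MILLIS_PER_DAY) * MILLIS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println(1L << 26);       // 67108864 ms, about 18.6 hours
        System.out.println(MILLIS_PER_DAY); // 86400000 ms

        long posted = 1355528894000L;       // 2012-12-14T23:48:14Z
        System.out.println(truncateToUtcDay(posted)); // 1355443200000
    }
}
```

Truncating this way changes the stored value itself, which is exactly what precisionStep does not do.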

> I suppose that the piece of information I am missing is whether trie
> precisionStep simply affects some extra index table that trie keeps beyond
> the raw data values or the data values themselves.

It only affects how the value is indexed (how many terms are generated), not the value itself.

Uwe

