Posted to solr-user@lucene.apache.org by jmlucjav <jm...@gmail.com> on 2012/12/14 19:16:46 UTC

optimum precisionStep for DAY granularity in a TrieDateField

Hi

I have a TrieDateField in my index, where I will index dates (range
2000-2020). I am only interested in DAY granularity; that is, I don't
care about the time of day (I'll index everything in the same time zone).

Is there an optimum value for precisionStep that I can use so I don't index
info I will never use? I have looked but have not found any info on
which values of precisionStep map to year/month/.../day/hour (not sure if
the mapping is straightforward anyway).

thanks for the help.





--
View this message in context: http://lucene.472066.n3.nabble.com/optimun-precisionStep-for-DAY-granularity-in-a-TrieDateField-tp4027078.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: optimum precisionStep for DAY granularity in a TrieDateField

Posted by jmlucjav <jm...@gmail.com>.
Thanks, Lance.

I knew about rounding in the request params, but I want to know if there is
something to tweak at indexing time (by changing precisionStep in
schema.xml) in order to store only the needed information. At query time, yes, I
would round to /DAY.




Re: optimum precisionStep for DAY granularity in a TrieDateField

Posted by Lance Norskog <go...@gmail.com>.
Do you use rounding in your dates? You can index a date rounded to the
nearest minute, N minutes, hour or day. This way a range query has to
look at such a small number of terms that you may not need to tune the
precision step. Search for NOW/DAY or 5DAYS in the example queries here:

http://wiki.apache.org/solr/SimpleFacetParameters



Re: optimum precisionStep for DAY granularity in a TrieDateField

Posted by Erick Erickson <er...@gmail.com>.
Well, it's just a long so unless you have a humongous number of documents
in your shards, there are probably bigger fish to fry.

So I'd just try indexing with date math (e.g. /DAY) until I could
demonstrate a problem.

Best
Erick


On Sat, Dec 15, 2012 at 10:27 AM, Jack Krupansky <ja...@basetechnology.com> wrote:

> Maybe we're at the stage of raising the issue of whether the significant
> extra storage for time of day warrants a storage format that is optimized
> for day only, call it TrieDay (or TrieDateTimeless.)
>
>
> -- Jack Krupansky

Re: optimum precisionStep for DAY granularity in a TrieDateField

Posted by Jack Krupansky <ja...@basetechnology.com>.
Maybe we're at the stage of raising the issue of whether the significant 
extra storage for time of day warrants a storage format that is optimized 
for day only, call it TrieDay (or TrieDateTimeless.)

-- Jack Krupansky



Re: optimum precisionStep for DAY granularity in a TrieDateField

Posted by jmlucjav <jm...@gmail.com>.
Without going through such rigorous testing, maybe for my case (interested
only in DAY) I could just index trielong values such as 20121010,
20110101, etc.

This would take less space than trieDate (I guess), and I would still have a
date-looking number (for easier handling). I could even base the days on
2000/01/01 and just index a single int (1..365, 366,...), but I don't think
it's worth it for now; I prefer to keep an easier-to-understand number.
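
A minimal sketch of the two encodings mentioned above, assuming plain
calendar dates with no time zone handling. The function names are made up
for this example and are not part of Solr.

```python
from datetime import date

EPOCH = date(2000, 1, 1)  # base day suggested in the message above


def as_yyyymmdd(d):
    """Encode a date as a date-looking integer, e.g. 2012-10-10 -> 20121010."""
    return d.year * 10000 + d.month * 100 + d.day


def as_day_offset(d, epoch=EPOCH):
    """Encode a date as a 1-based day count from the epoch (2000-01-01 -> 1)."""
    return (d - epoch).days + 1


d = date(2012, 10, 10)
print(as_yyyymmdd(d))                      # 20121010
print(as_day_offset(d))                    # 4667
print(as_day_offset(date(2020, 12, 31)))   # 7671
```

The whole 2000-2020 range is 7671 days, and the largest date-looking value
(20201231) is well below 2^31, so either encoding would fit in a 32-bit int
field.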

thanks




Re: optimum precisionStep for DAY granularity in a TrieDateField

Posted by Shawn Heisey <so...@elyograg.org>.
On 12/14/2012 4:15 PM, Jack Krupansky wrote:
> And the "official" answer when I posed the question on the Lucene User
> list is that the time of day bits would still be stored in the index
> in spite of the precisionStep. So, it doesn't really matter very much
> at all what precisionStep you use for trie date fields.

As I understand it, the precisionStep parameter is all about indexing 
additional data (taking more memory and disk space) in order to give a 
major speed boost to range queries and date faceting.

More info:
http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/search/NumericRangeQuery.html#precisionStepDesc
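
As a rough illustration of that trade-off, the sketch below counts how many
terms would be indexed per value for a given precisionStep. It assumes the
scheme described in the NumericRangeQuery documentation linked above: one
term at full precision, plus one more for each further right-shift by
precisionStep bits, for as long as the shift stays below the 64-bit value
size. The function name is made up for this example and is not a Lucene or
Solr API.

```python
def terms_per_value(precision_step, value_bits=64):
    """Count the shifted terms indexed for a single numeric value.

    Assumption: shifts run 0, step, 2*step, ... while shift < value_bits.
    """
    if precision_step < 1:
        raise ValueError("precision_step must be >= 1")
    return len(range(0, value_bits, precision_step))


for step in (4, 6, 8, 26, 64):
    print(step, terms_per_value(step))
```

Under that assumption the tdate default of 6 gives 11 terms per date and the
tlong default of 8 gives 8. A larger step indexes fewer terms per value but
leaves a range query with more terms to visit.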

It will not result in any rounding of the actual dates contained in your 
index.  Unless you have a special component or DIH transformer, you'd 
have to do any rounding before the data made it to Solr's update handler.
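
A minimal sketch of rounding on the client before the document is sent,
assuming timezone-aware input and that truncating to midnight UTC is the
wanted behaviour. The function name is hypothetical; the output uses the
YYYY-MM-DDThh:mm:ssZ form that Solr date fields accept.

```python
from datetime import datetime, timezone


def to_solr_day(dt):
    """Truncate a timezone-aware datetime to midnight UTC as a Solr date string."""
    utc = dt.astimezone(timezone.utc)
    midnight = utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight.strftime("%Y-%m-%dT%H:%M:%SZ")


posted = datetime(2012, 12, 14, 19, 16, 46, tzinfo=timezone.utc)
print(to_solr_day(posted))  # 2012-12-14T00:00:00Z
```

With every value truncated this way, the field holds one distinct value per
day instead of one per millisecond.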

The default precisionStep value for the tdate fieldType in the example 
schema is 6.  TrieDateField is a 64-bit data type. For contrast, the 
default precisionStep for tlong, another 64-bit value, is 8.

My guess is that unless you expect to have a very low number of unique
values in your tdate field, you'll want to just leave it at the
default of 6.  Usually a date field, especially one accurate to
milliseconds like TrieDateField, ends up with a lot of unique values.  If
your data source is only accurate to one second, it might be a good idea
to bump the precisionStep a little, perhaps to 8-10.  If the data source is
less accurate than that, you can make it even higher.  Finding the
optimum value would require rigorous performance testing.

Thanks,
Shawn


Re: optimum precisionStep for DAY granularity in a TrieDateField

Posted by Jack Krupansky <ja...@basetechnology.com>.
And the "official" answer when I posed the question on the Lucene User list
is that the time of day bits would still be stored in the index in spite of
the precisionStep. So, it doesn't really matter very much at all what
precisionStep you use for trie date fields.

-- Jack Krupansky



Re: optimum precisionStep for DAY granularity in a TrieDateField

Posted by Jack Krupansky <ja...@basetechnology.com>.
I've tried to figure this out and haven't fully resolved it. I mean, sure,
you can set the precisionStep to 26, which may ignore the milliseconds per
day, but supposedly it makes lookups much slower and may not actually
throw away those 26 bits.
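
The arithmetic behind the figure of 26 can be checked with a short sketch.
It assumes only that the date is held as a count of milliseconds; the
variable names are made up for this example.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000   # 86,400,000 milliseconds in a day
MS_PER_HOUR = 60 * 60 * 1000

# Bits needed to tell apart every millisecond within one day.
bits_for_one_day = (MS_PER_DAY - 1).bit_length()

# Time span covered by the low 26 bits of a millisecond count.
hours_in_26_bits = 2 ** 26 / MS_PER_HOUR

print(bits_for_one_day)            # 27
print(round(hours_in_26_bits, 2))  # 18.64
```

So 26 bits cover roughly 18.6 hours, a little less than a day, and a day is
not a power-of-two number of milliseconds. Bit shifts therefore cannot line
up exactly with day boundaries, which is one reason precisionStep values do
not map cleanly onto year/month/day/hour.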

-- Jack Krupansky
