Posted to solr-dev@lucene.apache.org by "Peter Sturge (JIRA)" <ji...@apache.org> on 2010/01/08 01:20:54 UTC

[jira] Created: (SOLR-1709) Distributed Date Faceting

Distributed Date Faceting
-------------------------

                 Key: SOLR-1709
                 URL: https://issues.apache.org/jira/browse/SOLR-1709
             Project: Solr
          Issue Type: Improvement
          Components: SearchComponents - other
    Affects Versions: 1.4
            Reporter: Peter Sturge
            Priority: Minor


This patch adds support for date facets in distributed searches.

Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of:
Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time).
The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in.
This means that if subsequent shards' facet_dates are skewed in relation to the first by >1 'gap', these 'earlier' or 'later' facets will not be merged in.
There are several reasons for this:
  * Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards
  * If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested
        (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data)
    This could be dealt with if timezone and skew information were added, and the dates were normalized.
One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, and so multiple shards' time data can be normalized.
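
A minimal sketch of the merge rule described above, assuming each shard's facet_dates for a single field is reduced to a map from bucket start time to count (any other entries in the real response are ignored here). The class and method names are hypothetical and are not taken from the attached FacetComponent.java.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the code from the attached FacetComponent.java.
final class FacetDateMergeSketch {

    /**
     * Merges one shard's date-facet counts into the basis map built from the
     * first shard's response. Buckets that the basis does not contain are
     * skipped, so the merged range never grows beyond what was requested.
     * Returns the number of skipped buckets.
     */
    static int mergeInto(Map<String, Integer> basis, Map<String, Integer> shard) {
        int skipped = 0;
        for (Map.Entry<String, Integer> e : shard.entrySet()) {
            Integer current = basis.get(e.getKey());
            if (current == null) {
                skipped++; // skewed 'earlier' or 'later' bucket: not merged in
            } else {
                basis.put(e.getKey(), current + e.getValue());
            }
        }
        return skipped;
    }

    /** Convenience helper: copies the first shard's counts to start a basis. */
    static Map<String, Integer> newBasis(Map<String, Integer> firstShard) {
        return new LinkedHashMap<>(firstShard);
    }
}
```

Because each shard is only compared against the single basis map, the cost per shard is linear in its number of buckets, which is the performance argument made above.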

The patch affects 2 files in the Solr core:
  org.apache.solr.handler.component.FacetComponent.java
  org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage.
One possible enhancement is to make distributed date faceting optional via a parameter, but really, if facet.date parameters are specified, it is assumed that date facets are desired.
Comments & suggestions welcome.

One favour to ask: if anyone could take my 2 source files and create a patch file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based OS company).


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


RE: [jira] Commented: (SOLR-1709) Distributed Date Faceting

Posted by Peter S <pe...@hotmail.com>.
The time skew/TZ is really the 'other half' of what the patch would/should ultimately be.

Since the current patch only deals with distributed responses, it will be perfectly happy to receive facet_dates that have been generated in sync with the requester.

 

I'm not really familiar with the distributed sending part of the code, but I would suspect that whatever component is delegated the task of fanning out shard requests would be a good candidate for 'owning' the marking of 'NOW' and adding the appropriate parameters to send to the shards (might this be the very same FacetComponent in distributedProcess()?).

 

Then there's the task of the remote shard digesting the new parameters and adjusting its dates accordingly. Presumably this would be handled by SimpleFacets?

 

For facet.date.start/facet.date.end, I guess if these are/can only be relative times (is it allowed to set an explicit start/end time?), then the remote shard can simply interpret NOW as the passed-in NOW, rather than its own NOW. Are there any options for facet.date.start/end that don't involve NOW at all?
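
A rough sketch of a shard interpreting NOW as the passed-in value rather than its own clock. Everything here is an assumption for illustration: the parameter name "NOW", its encoding as epoch milliseconds, the plain Map standing in for the request parameters, and the class itself. None of it is taken from the patch or from SimpleFacets.

```java
import java.util.Date;
import java.util.Map;

// Hypothetical sketch; the parameter name and class are illustrative only.
final class RequestNowSketch {
    static final String NOW_PARAM = "NOW"; // assumed name, value in epoch milliseconds

    private final long nowMillis;

    /**
     * Fixes NOW once for the lifetime of the request: the passed-in value
     * wins, and the local clock is only used when no value was sent.
     */
    RequestNowSketch(Map<String, String> params, long localClockMillis) {
        String passedIn = params.get(NOW_PARAM);
        this.nowMillis = (passedIn != null) ? Long.parseLong(passedIn) : localClockMillis;
    }

    /** Every reference to NOW within the same request sees the same instant. */
    Date getNow() {
        return new Date(nowMillis); // fresh Date each call, since Date is mutable
    }
}
```

A shard would construct this once per request (for example with System.currentTimeMillis() as the fallback) and hand the same instance to anything that evaluates date expressions.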

 

Peter

 

 

 

> Date: Fri, 8 Jan 2010 20:35:54 +0000
> From: jira@apache.org
> To: solr-dev@lucene.apache.org
> Subject: [jira] Commented: (SOLR-1709) Distributed Date Faceting
> 
> 
> [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798170#action_12798170 ] 
> 
> Yonik Seeley commented on SOLR-1709:
> ------------------------------------
> 
> I haven't checked the patch, but it seems like we should take a generic approach to NOW...
> The first time NOW is used anywhere in the request (and is not passed in as a request argument), either a thread local or something in the request context should be set to the current time. Subsequent references to NOW would yield the first value set.
> This would allow NOW to be referenced more than once in the same request with consistent results.
> 
> Passing in "NOW" as a request parameter would simply set it explicitly... the question is, who (which solr component) should be responsible for that?
> 

[jira] Commented: (SOLR-1709) Distributed Date Faceting

Posted by "Hoss Man (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798163#action_12798163 ] 

Hoss Man commented on SOLR-1709:
--------------------------------

bq. Requesters would include an optional parameter that told remote shards what time to use as 'NOW', and which TZ to use for date faceting. This would avoid having to translate loads of time strings at merge time.

I was thinking the same thing ... as long as the "coordinator" evaluated any DateMath in the facet.date.start and facet.date.end params before executing the sub-requests to the shards, the ranges coming back from the individual shards should all be in sync.
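
A minimal sketch of that idea, assuming the coordinator rewrites relative expressions into absolute UTC timestamps before fanning out. The evaluator below is hypothetical and deliberately tiny: it understands only NOW, /HOUR and /DAY rounding, and +N or -N offsets in HOUR(S) and DAY(S). It is not Solr's own date math parser.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a tiny subset of date math, evaluated once on the coordinator.
final class CoordinatorDateMathSketch {
    private static final Pattern OP = Pattern.compile("(/|[+-]\\d+)(HOURS?|DAYS?)");

    /** Turns e.g. "NOW/DAY-1DAY" into an absolute ISO-8601 UTC timestamp. */
    static String resolve(String expr, Instant now) {
        if (!expr.startsWith("NOW")) {
            return expr; // already absolute: pass through unchanged
        }
        ZonedDateTime t = now.atZone(ZoneOffset.UTC);
        String rest = expr.substring(3);
        Matcher m = OP.matcher(rest);
        int pos = 0;
        while (pos < rest.length()) {
            m.region(pos, rest.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unsupported date math: " + expr);
            }
            ChronoUnit unit = m.group(2).startsWith("HOUR") ? ChronoUnit.HOURS : ChronoUnit.DAYS;
            String op = m.group(1);
            if (op.equals("/")) {
                t = t.truncatedTo(unit);              // round down to the unit
            } else {
                t = t.plus(Long.parseLong(op), unit); // signed offset, e.g. -1 or +2
            }
            pos = m.end();
        }
        return t.toInstant().toString();
    }
}
```

The coordinator would call resolve once per facet.date.start and facet.date.end value with a single Instant, then send the resulting absolute strings to every shard, so all shards build their ranges from the same endpoints.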



[jira] Updated: (SOLR-1709) Distributed Date Faceting

Posted by "Thomas Hammerl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Hammerl updated SOLR-1709:
---------------------------------

    Attachment: solr-1.4.0-solr-1709.patch

Hi Peter!

Thanks for your advice! I have simply removed the introduced _termsHelper member variable in ResponseBuilder.java from the patch since it was not used anywhere. Now everything compiles fine with the code base from http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.0/. I have attached a patch file in unified diff format.



[jira] Updated: (SOLR-1709) Distributed Date Faceting

Posted by "Peter Sturge (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1709:
-------------------------------

    Attachment: ResponseBuilder.java
                FacetComponent.java

Sorry, guys, I can't get svn to create a patch file correctly on Windows, so I'm attaching the source files here. With some time, which at the moment I don't have, I'm sure I could get svn working. Rather than have anyone wait for me to get the patch file created, I thought it best to get the source uploaded, so people can start using it.
Thanks, Peter




[jira] Updated: (SOLR-1709) Distributed Date Faceting

Posted by "Peter Sturge (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Sturge updated SOLR-1709:
-------------------------------

    Attachment: FacetComponent.java

Updated version of FacetComponent.java after more testing and sync with FacetParams.FACET_DATE_NOW (see SOLR-1729).
For use with the 1.4 trunk (along with the existing ResponseBuilder.java in this patch).




[jira] Commented: (SOLR-1709) Distributed Date Faceting

Posted by "Jason Rutherglen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797898#action_12797898 ] 

Jason Rutherglen commented on SOLR-1709:
----------------------------------------

Tim,

Thanks for the patch...

bq. as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company).

TortoiseSVN works well on Windows, even for creating patches.  Have you tried it?  





[jira] Commented: (SOLR-1709) Distributed Date Faceting

Posted by "Peter Sturge (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798411#action_12798411 ] 

Peter Sturge commented on SOLR-1709:
------------------------------------

Yonik,

Yes, I can see what you mean: NOW will of course affect anything date-related in a given query.
I'm wondering whether the passing of 'NOW' to shards should be a separate issue/patch from this one (e.g. something like 'Time Sync to Remote Shards'), as its scope and ramifications go far beyond simply distributed date faceting.
The whole area of code relating to date math is one that I'm not familiar with, but do let me know if there's anything you'd like me to look at.




[jira] Commented: (SOLR-1709) Distributed Date Faceting

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798243#action_12798243 ] 

Yonik Seeley commented on SOLR-1709:
------------------------------------

Seems useful enough that setting NOW should be advertised (i.e. not just an internal call).  For example, it would be a convenient way to keep the rest of your request the same, but check how the current date affects your date boosting strategies.  NOW isn't just for date faceting, but for anything that uses date math.

As for the format, 20091231 is ambiguous if you want flexible dates... is it a date or milliseconds?
I first thought of a prefix (ms:123456789) but it makes it look like a field query.
It might be safest to make it unambiguous somehow... postfix with ms?  123456789ms
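
A small sketch of the "postfix with ms" idea, under the assumption that a value ending in ms is epoch milliseconds and anything else must be a full ISO-8601 UTC timestamp. The class and method are hypothetical and do not correspond to an existing Solr API.

```java
import java.time.Instant;

// Hypothetical sketch of an unambiguous NOW value format.
final class NowValueSketch {

    /** Accepts "123456789ms" (epoch millis) or "2009-12-31T00:00:00Z" (ISO-8601). */
    static Instant parse(String value) {
        if (value.endsWith("ms")) {
            String digits = value.substring(0, value.length() - 2);
            return Instant.ofEpochMilli(Long.parseLong(digits));
        }
        // Bare digit strings such as 20091231 are rejected here rather than guessed at.
        return Instant.parse(value);
    }
}
```

With this rule a bare 20091231 is an error instead of being silently read as either a date or a millisecond count, which is the ambiguity raised above.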


> Distributed Date Faceting
> -------------------------
>
>                 Key: SOLR-1709
>                 URL: https://issues.apache.org/jira/browse/SOLR-1709
>             Project: Solr
>          Issue Type: Improvement
>          Components: SearchComponents - other
>    Affects Versions: 1.4
>            Reporter: Peter Sturge
>            Priority: Minor
>         Attachments: FacetComponent.java, ResponseBuilder.java
>
>
> This patch is for adding support for date facets when using distributed searches.
> Date faceting across multiple machines exposes some time-based issues that anyone interested in this behaviour should be aware of:
> Any time and/or time-zone differences are not accounted for in the patch (i.e. merged date facets are at a time-of-day, not necessarily at a universal 'instant-in-time', unless all shards are time-synced to the exact same time).
> The implementation uses the first encountered shard's facet_dates as the basis for subsequent shards' data to be merged in.
> This means that if subsequent shards' facet_dates are skewed in relation to the first by >1 'gap', these 'earlier' or 'later' facets will not be merged in.
> There are several reasons for this:
>   * Performance: It's faster to check facet_date lists against a single map's data, rather than against each other, particularly if there are many shards
>   * If 'earlier' and/or 'later' facet_dates are added in, this will make the time range larger than that which was requested
>         (e.g. a request for one hour's worth of facets could bring back 2, 3 or more hours of data)
>     This could be dealt with if timezone and skew information was added, and the dates were normalized.
> One possibility for adding such support is to [optionally] add 'timezone' and 'now' parameters to the 'facet_dates' map. This would tell requesters what time and TZ the remote server thinks it is, and so multiple shards' time data can be normalized.
> The patch affects 2 files in the Solr core:
>   org.apache.solr.handler.component.FacetComponent.java
>   org.apache.solr.handler.component.ResponseBuilder.java
> The main changes are in FacetComponent - ResponseBuilder is just to hold the completed SimpleOrderedMap until the finishStage.
> One possible enhancement would be to make this an optional parameter, but really, if facet.date parameters are specified, it is assumed they are desired.
> Comments & suggestions welcome.
> As a favour to ask, if anyone could take my 2 source files and create a PATCH file from them, it would be greatly appreciated, as I'm having a bit of trouble with svn (don't shoot me, but my environment is a Redmond-based os company).
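
The merge behaviour described in the issue text can be sketched roughly as follows. This is not the code from the attached FacetComponent.java; it only illustrates the "first shard is the basis" idea, assuming each shard's facet_dates for one field arrive as a map from bucket label to count. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration of first-shard-as-basis merging; not the attached patch. */
public class FacetDateMergeSketch {

    /**
     * Merges one shard's date-facet counts into the running result.
     * The first non-empty shard seen defines the set of buckets. Buckets
     * from later shards that are not already present (for example because
     * that shard's clock is skewed by more than one gap) are dropped.
     */
    public static void mergeShard(Map<String, Integer> merged, Map<String, Integer> shard) {
        if (merged.isEmpty()) {
            merged.putAll(shard);
            return;
        }
        for (Map.Entry<String, Integer> e : shard.entrySet()) {
            // computeIfPresent never adds a key, so unknown buckets are ignored.
            merged.computeIfPresent(e.getKey(), (label, count) -> count + e.getValue());
        }
    }
}
```

Each later shard costs one lookup per bucket against a single map, which is the performance argument made above. The cost is that counts in skewed buckets are silently lost.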

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (SOLR-1709) Distributed Date Faceting

Posted by "Peter Sturge (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834222#action_12834222 ] 

Peter Sturge commented on SOLR-1709:
------------------------------------

Hi Thomas,

Hmmm...TermsHelper is an inner class inside TermsComponent.
In the code base that I have, this class exists within TermsComponent. I've just had a look on the http://mirrors.dedipower.com/ftp.apache.org/lucene/solr/1.4.0/ mirror, and the TermsComponent *doesn't* have this inner class.

Not sure where the difference is, as I would have got my codebase from the same set of mirrors as you (unless some mirrors are out-of-sync?). 

TermsComponent hasn't changed in this patch, so I don't know much about this class. One thing to try is to diff the 2 files above against your 1.4 codebase, and merge the changes into your codebase. The differences should be very easy to see.

This does highlight the very good policy of putting patch files as attachments rather than source files. This is my fault, as we don't use svn in our (win) environment, and Tortoise SVN crashes explorer64, so I'm not able to make compatible diff files - sorry.

If you do create a couple of diff files, it would be very kind of you if you could post them up on this issue for others?

Thanks!


[jira] Commented: (SOLR-1709) Distributed Date Faceting

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798170#action_12798170 ] 

Yonik Seeley commented on SOLR-1709:
------------------------------------

I haven't checked the patch, but it seems like we should take a generic approach to NOW...
The first time NOW is used anywhere in the request (and is not passed in as a request argument), either a thread local or something in the request context should be set to the current time.  Subsequent references to NOW would yield the first value set.
This would allow NOW to be referenced more than once in the same request with consistent results.

Passing in "NOW" as a request parameter would simply set it explicitly... the question is, who (which solr component) should be responsible for that?
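
A minimal sketch of the thread-local variant of this idea, assuming one thread handles a request from start to finish. The class and method names are hypothetical and are not part of Solr.

```java
/** Hypothetical per-request holder for NOW; illustrative only, not Solr API. */
public class RequestNowSketch {

    private static final ThreadLocal<Long> NOW = new ThreadLocal<>();

    /** Sets NOW explicitly, e.g. from a value passed in as a request parameter. */
    public static void setNow(long epochMillis) {
        NOW.set(epochMillis);
    }

    /** Returns the request's NOW, fixing it to the current time on first use. */
    public static long getNow() {
        Long now = NOW.get();
        if (now == null) {
            now = System.currentTimeMillis();
            NOW.set(now);
        }
        return now;
    }

    /** Call when the request finishes so a pooled thread does not reuse a stale value. */
    public static void clear() {
        NOW.remove();
    }
}
```

Every reference to NOW within the same request then sees the same value, whichever component asks first. Which component would call setNow() and clear() is exactly the open question raised above.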


[jira] Commented: (SOLR-1709) Distributed Date Faceting

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798203#action_12798203 ] 

Yonik Seeley commented on SOLR-1709:
------------------------------------

Date formatting and parsing also tend to be surprisingly expensive.
So *if* we support passing NOW as a date string, it would be nice to also support standard milliseconds.  That can also be easier for clients to generate than working out the correct date format.  Perhaps that should even be an addition to the standard datemath syntax.
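
A rough sketch of accepting either form, assuming the string form is an ISO-8601 UTC instant such as 2010-01-08T01:20:54Z. The class and method names are hypothetical, and this is not an existing Solr parser.

```java
import java.time.Instant;

/** Hypothetical parser for a NOW value given as epoch millis or as a date string. */
public class NowParamSketch {

    /**
     * Returns NOW as epoch milliseconds.
     * A value made only of digits (with an optional leading minus) is taken
     * as milliseconds directly; anything else is parsed as an ISO-8601 instant.
     */
    public static long parseNow(String value) {
        String v = value.trim();
        if (v.matches("-?\\d+")) {
            return Long.parseLong(v); // cheap path: no date parsing at all
        }
        return Instant.parse(v).toEpochMilli(); // throws DateTimeParseException if malformed
    }
}
```

The milliseconds path avoids date formatting and parsing entirely, which is the cost concern raised here.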


[jira] Commented: (SOLR-1709) Distributed Date Faceting

Posted by "Peter Sturge (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798233#action_12798233 ] 

Peter Sturge commented on SOLR-1709:
------------------------------------

Definitely true! -- messing about with Date strings isn't great for performance.

As the NOW parameter would be for internal request use only (i.e. not for the indexer, not for human consumption), could it not just be an epoch long? The adjustment math should then be nice and quick (no string/date parsing/formatting; at worst just one Date.getTime() call if the time is stored locally as a string).
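
To illustrate how cheap the long arithmetic is, here is a small sketch. It assumes a fixed-length gap in milliseconds, which does not hold for calendar gaps such as +1MONTH. All names are hypothetical.

```java
/** Hypothetical helpers showing date-facet adjustments done purely on epoch longs. */
public class EpochGapSketch {

    /** Difference between the requester's NOW and this shard's clock, in milliseconds. */
    public static long skewMillis(long requesterNow, long localNow) {
        return requesterNow - localNow;
    }

    /**
     * Start of the fixed-length bucket containing t, counted from rangeStart.
     * floorDiv keeps the result correct for timestamps before rangeStart.
     */
    public static long bucketStart(long t, long rangeStart, long gapMillis) {
        long index = Math.floorDiv(t - rangeStart, gapMillis);
        return rangeStart + index * gapMillis;
    }
}
```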


[jira] Commented: (SOLR-1709) Distributed Date Faceting

Posted by "Peter Sturge (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797957#action_12797957 ] 

Peter Sturge commented on SOLR-1709:
------------------------------------

I've heard of Tortoise, I'll give that a try, thanks.

On the time-zone/skew issue, perhaps a more efficient approach would be a 'push' rather than 'pull' - i.e.:

Requesters would include an optional parameter that told remote shards what time to use as 'NOW', and which TZ to use for date faceting.
This would avoid having to translate loads of time strings at merge time.
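
A sketch of what the 'push' could look like on the requesting side. The parameter names NOW and TZ are assumptions for illustration only, as are the class and method names; nothing here is existing Solr API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for the parameters a requester sends to each shard. */
public class ShardParamsSketch {

    /**
     * Copies the original request parameters and adds the requester's idea of
     * NOW (epoch millis) and its time zone, unless the caller already set them.
     */
    public static Map<String, String> forShard(Map<String, String> original,
                                               long nowMillis, String tzId) {
        Map<String, String> params = new LinkedHashMap<>(original);
        params.putIfAbsent("NOW", Long.toString(nowMillis));
        params.putIfAbsent("TZ", tzId);
        return params;
    }
}
```

Every shard would then compute its facet ranges from the same NOW, so the merge step would have no skewed buckets to reconcile.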

Thanks,
Peter


[jira] Commented: (SOLR-1709) Distributed Date Faceting

Posted by "Thomas Hammerl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833861#action_12833861 ] 

Thomas Hammerl commented on SOLR-1709:
--------------------------------------

Unfortunately, I am not able to apply this patch. I get the following compile error:

{noformat}
[javac] /home/systemone/Desktop/solr-1.4/src/java/org/apache/solr/handler/component/ResponseBuilder.java:138: cannot find symbol
[javac] symbol  : class TermsHelper
[javac] location: class org.apache.solr.handler.component.TermsComponent
[javac]   TermsComponent.TermsHelper _termsHelper;
[javac]
{noformat}

What I've done basically is download the attached sources and apply the following commands:

{noformat}
svn co http://svn.apache.org/repos/asf/lucene/solr/tags/release-1.4.0/ solr-1.4.0
cp FacetComponent.java solr-1.4.0/src/java/org/apache/solr/handler/component/
cp ResponseBuilder.java solr-1.4.0/src/java/org/apache/solr/handler/component/
cd solr-1.4.0
ant package
{noformat}

I also tried to apply the patch to the 1.4 branch at http://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/ resulting in the same compile error.

Any help would be much appreciated.
