You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Hoss Man (JIRA)" <ji...@apache.org> on 2014/05/06 02:15:26 UTC

[jira] [Updated] (SOLR-2894) Implement distributed pivot faceting

     [ https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man updated SOLR-2894:
---------------------------

    Attachment: SOLR-2894.patch

I'm going to spend some time this week reviewing the state of things.

First up, some minor tweaks to the latest patch...

* fixed a typo in TestDistributedSearch ("facet.fiedl" -> "facet.field")
** this is my biggest anoyance about most of our existing distributed serach tests -- they just assert that queries return the same thing as single node tests, but don't assert anything about the response, so mistakes in the input, or mistakes in indexing hte docs, resulting in a useless test aren't caught)
** this also relates to marks comment about removing "ndate" since that field no longer exists in the test configs - using tdate_a & tdate_b here should be fine
* removed the "nocommit" mark mentioned regarding DateField - that method moved to TrieDateField so his fix is correct.


Some comments/questions based on what I've reviewed so far (note: many of these comments/questions come from a place of genuine ignorance since i've only reviewed about 30% of the patch so far)...

* even at a glance, it's obvious the SimpleFacets changes are a simple refactoring and totally fine.
* In FacetComponent - Setting asside the core pivot facet changes...
** Most of the other changes in seem like straight forward (and much needed!) variable renaming (+1) to help eliminate ambiguity between the existing field faceting refinement and the new pivot faceting refinement
** the new "fieldsToOverRequestOn" Map confuses me in a few ways...
*** As is, i don't understand why this is a Map and not a Set.
*** Some odd conditional logic is used when iterating over this "Set" to determine the overrequest limit - i'm still trying to wrap my head arround this but in particular the comment {{// dff has the info, the params have been scrubbed}} confuses me -- where are these params "scrubbed" ?
*** I like these new explict overrequest count/ratio params, and i get that the end-game here is that they can be used to affect the amount of overrequest done for both fact.field and facet.pivot -- but i'm not understanding the value of building up this "fieldsToOverRequestOn" set of names (for every shard request) and then iterating over it and consulting *either* the DFF or the params to decide which limit value to use on the shard requests, and then (conditionally?) removing the limit/offset/overrequest params from the shard requests.  Wouldn't it be simplier to have modifyRequest *always* remove the limit/offset/overrequest params from the shard params, and then have the individual code paths (for both facet.field & facet.pivot) take responsibility for adding back the new limit params based on the overrequest calculations using the *original* request params (ie: {{rb.req.getParams()}}).
*** My chief concern here being that (at first glance) this change seems like it adds a small amount of overhead to the overrequest limit calculations, and makes this bit of code more confusing, w/o any obvious (to me) advantage.
* I don't yet understand the need for the new "PURPOSE_REFINE_PIVOT_FACETS" stage of shard requests? ... can someone clarify why  pivot facets can't just be refined during the existing "PURPOSE_REFINE_FACETS" stage?
* I notice that the new DistributedFacetPivotTest directly extends BaseDistributedSearchTestCase and uses a fixed shard count, and indexes some docs directly to certain clients
** is there something about the functionality (or about the test) that requires certain data locality (ie: certain docs on same shard) to work?
** if not: is there any other reason we can't switch this over to a Cloud based test with a variable numbers of shards and compairons against the control collection?



> Implement distributed pivot faceting
> ------------------------------------
>
>                 Key: SOLR-2894
>                 URL: https://issues.apache.org/jira/browse/SOLR-2894
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Erik Hatcher
>             Fix For: 4.9, 5.0
>
>         Attachments: SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, dateToObject.patch
>
>
> Following up on SOLR-792, pivot faceting currently only supports undistributed mode.  Distributed pivot faceting needs to be implemented.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org