Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2015/02/05 06:33:34 UTC

[jira] [Commented] (LUCENE-5735) Faceting for DateRangePrefixTree

    [ https://issues.apache.org/jira/browse/LUCENE-5735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14306666#comment-14306666 ] 

David Smiley commented on LUCENE-5735:
--------------------------------------

The PrefixTreeFacetCounter utility is good; if it doesn't get committed to 5x as part of this issue first, it will be committed as part of the heatmap issue.

There's a bug in NumberRangePrefixTreeStrategy.calcFacets: all cells above the parent are counted as topLeaves, when really that can only be done if the leaf cell _contains_ the facet range.  I have a fix in progress that detects this case; if the cell doesn't contain the facet range, I walk its sub-cells and increment the counters on the parent facet cells.  _There's a rare-ish bug I still need to debug._  So far, these changes are pending in my local checkout:
* Make TreeCellIterator public (still lucene.internal) and allow the 'cell' to be a cell other than the top world cell.  Probably also add a constructor-like reset() method so an instance can be re-used.
* NRCell has an optimization when getting subCells that seems to work fine in the normal code paths so far, but the in-progress faceting code has shown it to be faulty.  I just removed it, as I don't think it was worth trying to make it work.
* NRCell sometimes can't get subCells if it was initialized from a short-length shape/bytes; it should instead always initialize its array to maxLevels.  Again, this apparently never happens in normal code paths, but I triggered it in some toy test code.
* Refactor the two main date range tests to share a random calendar utility (RandomCalHelper).
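
To illustrate the contains-vs-intersects distinction behind the calcFacets bug, here is a minimal, self-contained sketch.  It is not the actual NumberRangePrefixTreeStrategy code: the names FacetCountSketch, Range and countBuckets are hypothetical, cells and the facet range are modeled as closed integer intervals, and each facet bucket is assumed to be one integer wide.  A coarse leaf cell is only added to the shared topLeaves counter when it contains the whole facet range; otherwise only the buckets it actually overlaps are incremented (standing in for walking the sub-cells).

```java
import java.util.Arrays;

public class FacetCountSketch {

    /** Closed integer interval [start, end]; a stand-in for a prefix-tree cell or a facet range. */
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(Range other) {
            return start <= other.start && other.end <= end;
        }

        boolean intersects(Range other) {
            return start <= other.end && other.start <= end;
        }
    }

    /**
     * Returns one count per unit-wide bucket of facetRange, given leaf cells
     * that are coarser than the facet level (hypothetical simplification).
     */
    static int[] countBuckets(Range facetRange, Range[] coarseLeaves) {
        int[] perBucket = new int[facetRange.end - facetRange.start + 1];
        int topLeaves = 0;
        for (Range leaf : coarseLeaves) {
            if (leaf.contains(facetRange)) {
                // Safe shortcut: this leaf applies to every bucket.
                topLeaves++;
            } else if (leaf.intersects(facetRange)) {
                // Partial overlap: increment only the buckets the leaf covers.
                int from = Math.max(leaf.start, facetRange.start);
                int to = Math.min(leaf.end, facetRange.end);
                for (int v = from; v <= to; v++) {
                    perBucket[v - facetRange.start]++;
                }
            }
            // Leaves that miss the facet range entirely contribute nothing.
        }
        for (int i = 0; i < perBucket.length; i++) {
            perBucket[i] += topLeaves;
        }
        return perBucket;
    }

    public static void main(String[] args) {
        Range facet = new Range(10, 14);
        Range[] leaves = { new Range(0, 31), new Range(12, 20), new Range(40, 50) };
        System.out.println(Arrays.toString(countBuckets(facet, leaves))); // prints [1, 1, 2, 2, 2]
    }
}
```

In this toy model, treating the (12, 20) leaf as a topLeaf would over-count the first two buckets, which is the kind of error described above.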

> Faceting for DateRangePrefixTree
> --------------------------------
>
>                 Key: LUCENE-5735
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5735
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.x
>
>         Attachments: LUCENE-5735.patch, LUCENE-5735.patch, LUCENE-5735__PrefixTreeFacetCounter.patch
>
>
> The newly added DateRangePrefixTree (DRPT) encodes terms in a fashion amenable to faceting by meaningful time buckets. The motivation for this feature is to efficiently populate a calendar bar chart or [heat-map|http://bl.ocks.org/mbostock/4063318]. It's not hard if you have date instances, as many do, but it's challenging for date ranges.
> Internally this is going to iterate over the terms using seek/next with TermsEnum as appropriate.  It should be quite efficient; it won't need any special caches. I should be able to re-use SPT traversal code in AbstractVisitingPrefixTreeFilter.  If this goes especially well, the underlying implementation will be re-usable for geospatial heat-map faceting.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
