Posted to dev@lucene.apache.org by "David Smiley (JIRA)" <ji...@apache.org> on 2015/03/10 15:37:39 UTC

[jira] [Commented] (LUCENE-6191) Spatial 2D faceting (heatmaps)

    [ https://issues.apache.org/jira/browse/LUCENE-6191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14354971#comment-14354971 ] 

David Smiley commented on LUCENE-6191:
--------------------------------------

FYI I'm adding an advisory to the javadocs of PrefixTreeFacetCounter noting that double-counting can occur in certain avoidable situations:
{code:java}
 * <em>NOTE:</em> If for a given document and a given field using 
 * {@link org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy}
 * multiple values are indexed (i.e. multi-valued) and at least one of them is a non-point, then there is a possibility
 * of double-counting the document in the facet results.  Since each shape is independently turned into grid cells at
 * a resolution chosen by the shape's size, it's possible they will be indexed at different resolutions.  This means
 * the document could be present in the postings of BOTH the prefix and the leaf variants of the same cell.  To avoid this,
 * use a single-valued field with a {@link com.spatial4j.core.shape.ShapeCollection} (or WKT equivalent).  Or
 * calculate a suitable level/distErr for indexing all of the shapes and call
 * {@link org.apache.lucene.spatial.prefix.PrefixTreeStrategy#createIndexableFields(com.spatial4j.core.shape.Shape, int)}
 * with the same value for all shapes for a given document/field.
{code}
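To see why mixing resolutions within one document/field can double-count, here is a deliberately simplified model in Java. It is hypothetical and is not Lucene's postings format or the PrefixTreeFacetCounter code: the class DoubleCountDemo and its methods are invented for illustration, and the rule that a leaf's proper ancestors (but not the leaf itself) get the doc in their prefix-variant postings is an assumption of the model.

{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical, simplified model of prefix-tree postings (not Lucene's actual code).
 * Cells are quad-tree tokens over the digits 0..3; each cell has a prefix variant
 * and a leaf variant, each with its own set of doc ids.
 */
class DoubleCountDemo {
    static final Set<Integer> NONE = Set.of();
    final Map<String, Set<Integer>> prefixPostings = new HashMap<>();
    final Map<String, Set<Integer>> leafPostings = new HashMap<>();

    /** Index one shape of a document, given the leaf cells it was gridded into. */
    void index(int doc, Collection<String> leafCells) {
        for (String leaf : leafCells) {
            leafPostings.computeIfAbsent(leaf, k -> new HashSet<>()).add(doc);
            // Model assumption: every proper ancestor of a leaf gets the doc in its prefix variant.
            for (int len = 1; len < leaf.length(); len++) {
                prefixPostings.computeIfAbsent(leaf.substring(0, len), k -> new HashSet<>()).add(doc);
            }
        }
    }

    /** Sums both posting list sizes; only correct if no doc appears in both. */
    int naiveCount(String cell) {
        return prefixPostings.getOrDefault(cell, NONE).size()
                + leafPostings.getOrDefault(cell, NONE).size();
    }

    /** Unions both posting lists so each doc is counted once. */
    int exactCount(String cell) {
        Set<Integer> docs = new HashSet<>(prefixPostings.getOrDefault(cell, NONE));
        docs.addAll(leafPostings.getOrDefault(cell, NONE));
        return docs.size();
    }

    /** Expands a cell into all of its descendants at the given level (4 children per level). */
    static List<String> refine(String cell, int level) {
        List<String> cells = List.of(cell);
        while (cells.get(0).length() < level) {
            List<String> next = new ArrayList<>();
            for (String c : cells) {
                for (char q = '0'; q <= '3'; q++) next.add(c + q);
            }
            cells = next;
        }
        return cells;
    }

    public static void main(String[] args) {
        // Doc 1 has two shapes: a large one gridded as leaf "0", a small one as leaf "03".
        DoubleCountDemo mixed = new DoubleCountDemo();
        mixed.index(1, List.of("0"));
        mixed.index(1, List.of("03"));
        System.out.println(mixed.naiveCount("0") + " " + mixed.exactCount("0")); // prints "2 1"

        // Same doc, both shapes indexed at level 2: no leaf "0" exists, so no overlap.
        DoubleCountDemo same = new DoubleCountDemo();
        same.index(1, refine("0", 2));
        same.index(1, refine("03", 2));
        System.out.println(same.naiveCount("0")); // prints "1"
    }
}
{code}

In this model, exactCount removes the overlap with a per-cell set union at the cost of tracking doc ids, whereas indexing every shape of a document at one level (the remedy the javadoc suggests) removes it at index time.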

> Spatial 2D faceting (heatmaps)
> ------------------------------
>
>                 Key: LUCENE-6191
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6191
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.1
>
>         Attachments: LUCENE-6191__Spatial_heatmap.patch, LUCENE-6191__Spatial_heatmap.patch, LUCENE-6191__Spatial_heatmap.patch
>
>
> Lucene spatial's PrefixTree (grid) based strategies index data in a way highly amenable to faceting on grid cells to compute a so-called _heatmap_. The underlying code in this patch uses the PrefixTreeFacetCounter utility class, which was recently refactored out of faceting for NumberRangePrefixTree (LUCENE-5735).  At a low level, the terms (== grid cells) are navigated per-segment, forward-only with TermsEnum.seek, so it's pretty quick and furthermore requires no extra caches & no docvalues.  Ideally you should use QuadPrefixTree (or Flex once it comes out) to maximize the number of grid levels, which in turn maximizes the fidelity of choices when you ask for a grid covering a region.  Conveniently, the provided capability returns the data in a 2-D grid of counts, so the caller needn't know a thing about how the data is encoded in the prefix tree.  Well, almost... at this point they need to provide a grid level, but I'll soon provide a means of deriving the grid level based on a min/max cell count.
> I recommend QuadPrefixTree with geo=false so that you can provide square world bounds (360x360 degrees), which yields square grid cells; these are more desirable to display than rectangular ones.
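The gridding arithmetic behind a quad-tree heatmap, and why square world bounds give square cells, can be sketched as below. This is a hypothetical, self-contained illustration rather than the patch's API: QuadHeatmapDemo, heatmap, cellsCovering and chooseLevel are invented names, points are bucketed directly instead of via TermsEnum over indexed cells, and chooseLevel is only one guess at how a level might be derived from a maximum cell count.

{code:java}
import java.util.Arrays;

/**
 * Hypothetical toy heatmap over a quad grid (not Lucene's API).
 * Assumes geo=false-style square world bounds of 360x360 units, so every cell is square.
 * Inputs are assumed to lie within the world bounds.
 */
class QuadHeatmapDemo {
    static final double MIN_X = -180, MAX_X = 180, MIN_Y = -180, MAX_Y = 180;

    /** Side length of one cell at a level; each quad level halves it in both x and y. */
    static double cellSize(int level) {
        return (MAX_X - MIN_X) / (1 << level);
    }

    /** Column or row index of coordinate v, clamped so the max edge falls in the last cell. */
    static int idx(double v, double min, int level) {
        int side = 1 << level;
        return Math.min(side - 1, (int) ((v - min) / cellSize(level)));
    }

    /** Counts points into a 2^level x 2^level grid: counts[row][col], row 0 = minimum y. */
    static int[][] heatmap(double[][] points, int level) {
        int side = 1 << level;
        int[][] counts = new int[side][side];
        for (double[] p : points) {
            counts[idx(p[1], MIN_Y, level)][idx(p[0], MIN_X, level)]++;
        }
        return counts;
    }

    /** Number of cells at this level touched by the rectangle [x0,x1] x [y0,y1]. */
    static long cellsCovering(double x0, double y0, double x1, double y1, int level) {
        long cols = idx(x1, MIN_X, level) - idx(x0, MIN_X, level) + 1;
        long rows = idx(y1, MIN_Y, level) - idx(y0, MIN_Y, level) + 1;
        return cols * rows;
    }

    /** Deepest level, up to maxLevel, whose grid over the rectangle has at most maxCells cells. */
    static int chooseLevel(double x0, double y0, double x1, double y1, long maxCells, int maxLevel) {
        int best = 0;
        for (int level = 1; level <= maxLevel; level++) {
            if (cellsCovering(x0, y0, x1, y1, level) > maxCells) break; // finer levels only add cells
            best = level;
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] points = { {10, 10}, {-100, -50}, {170, -10} };
        int[][] grid = heatmap(points, 1);
        for (int row = grid.length - 1; row >= 0; row--) { // print the top row first
            System.out.println(Arrays.toString(grid[row]));
        }
        // prints "[0, 1]", then "[1, 1]"
        System.out.println(chooseLevel(-180, -90, 180, 90, 100, 10)); // prints "3"
    }
}
{code}

With real latitudes in [-90, 90], only the middle half of the rows is ever populated under these bounds (at level 3 the whole world touches 8 columns but only 5 rows, 40 cells in total); that is the price of keeping every cell square.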



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
