Posted to user@accumulo.apache.org by "THORMAN, ROBERT D" <rt...@att.com> on 2014/07/24 16:10:47 UTC

Z-Curve/Hilbert Curve

Can anyone share a Java method to convert lat/lon (decimal degrees) to Z-Curve (string)?  I need to order my geo-spatial data into lexical order.

v/r
Bob Thorman
Principal Big Data Engineer
AT&T Big Data CoE
2900 W. Plano Parkway
Plano, TX 75075
972-658-1714



Re: Z-Curve/Hilbert Curve

Posted by Russ Weeks <rw...@newbrightidea.com>.
Does it have to be a Z-curve? The excellent Accumulo recipes project has a
very easy-to-use geospatial index based on a quadtree.

https://github.com/calrissian/accumulo-recipes/tree/master/store/geospatial-store/src/main/java/org/calrissian/accumulorecipes/geospatialstore/support

-Russ


On Thu, Jul 24, 2014 at 10:17 AM, William Slacum <
wilhelm.von.cloud@accumulo.net> wrote:

> Quick google search yielded:
>
>
> https://github.com/GeoLatte/geolatte-geom/blob/master/src/main/java/org/geolatte/geom/curve/MortonCode.java
>
>
> On Thu, Jul 24, 2014 at 10:10 AM, THORMAN, ROBERT D <rt...@att.com>
> wrote:
>
>>  Can anyone share a Java method to convert lat/lon (decimal degrees) to
>> Z-Curve (string)?  I need to order my geo-spatial data into lexical order.
>>
>>  v/r
>> Bob Thorman
>> Principal Big Data Engineer
>> AT&T Big Data CoE
>> 2900 W. Plano Parkway
>> Plano, TX 75075
>> 972-658-1714
>>
>>
>>
>

Re: Z-Curve/Hilbert Curve

Posted by William Slacum <wi...@accumulo.net>.
Quick google search yielded:

https://github.com/GeoLatte/geolatte-geom/blob/master/src/main/java/org/geolatte/geom/curve/MortonCode.java


On Thu, Jul 24, 2014 at 10:10 AM, THORMAN, ROBERT D <rt...@att.com> wrote:

>  Can anyone share a Java method to convert lat/lon (decimal degrees) to
> Z-Curve (string)?  I need to order my geo-spatial data into lexical order.
>
>  v/r
> Bob Thorman
> Principal Big Data Engineer
> AT&T Big Data CoE
> 2900 W. Plano Parkway
> Plano, TX 75075
> 972-658-1714
>
>
>

Re: Z-Curve/Hilbert Curve

Posted by Chris Bennight <ch...@slowcar.net>.
So I put together a quick mockup @
https://github.com/chrisbennight/wgs84lexicoder of how this might work.
It's functional, but I haven't done more than a cursory test, so
caveat emptor.

On the performance side, a full decomposition (no slop) (on a Hilbert
curve) crossed the 1 second mark at between 18 and 19 bits of precision.
This is on an i7-920.  At 18 bits, the average error (where error is the
deviation after an encode/decode cycle due to quantization effects) for
longitude was 0.0006639 degrees, or roughly 73 meters at the equator.  This
should be the worst error you will see; latitude was on average ~1/2 the
error in degrees (since it has a smaller range for the same number of
bits: -90 to +90 vs. -180 to +180).
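For reference, a minimal sketch (not the code in the linked repo; the
18-bit precision and example value are just illustrative assumptions) of
where errors in that ballpark come from - quantize a longitude to an
integer cell, decode back to the cell midpoint, and compare; half a cell
width is roughly the average quantization error:

// Hypothetical illustration, not the wgs84lexicoder implementation: round-trip
// a longitude through fixed-point quantization and report the error.
public class LonQuantization {

    static long encodeLon(double lon, int bits) {
        double normalized = (lon + 180.0) / 360.0;        // map [-180, 180) to [0, 1)
        return (long) (normalized * (1L << bits));        // integer cell index
    }

    static double decodeLon(long cell, int bits) {
        double normalized = (cell + 0.5) / (1L << bits);  // midpoint of the cell
        return normalized * 360.0 - 180.0;
    }

    public static void main(String[] args) {
        int bits = 18;
        double lon = -96.7970;                            // arbitrary example value
        double roundTrip = decodeLon(encodeLon(lon, bits), bits);
        System.out.printf("error at %d bits: %.7f degrees%n", bits, Math.abs(lon - roundTrip));
        System.out.printf("half cell width:  %.7f degrees%n", 360.0 / (1L << bits) / 2.0);
    }
}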

Anyway, this was just a simplistic stab to ground the discussion.  The
performance of the full decomposition was worse than I expected, so that
probably places some constraints on what defaults should be.  Here's the
output of the test I wrote:


----------------------------------------------------
2 bits took 0.030 seconds for  1 ranges
Actual latitude error:         30.7720588
Worst case avg longitude error: 60.3973510
----------------------------------------------------
3 bits took 0.003 seconds for  3 ranges
Actual latitude error:         12.8098739
Worst case avg longitude error: 25.8845790
----------------------------------------------------
4 bits took 0.002 seconds for  6 ranges
Actual latitude error:         5.3602941
Worst case avg longitude error: 10.8079470
----------------------------------------------------
5 bits took 0.003 seconds for  13 ranges
Actual latitude error:         2.8071632
Worst case avg longitude error: 5.6910916
----------------------------------------------------
6 bits took 0.012 seconds for  46 ranges
Actual latitude error:         1.3392857
Worst case avg longitude error: 2.7625355
----------------------------------------------------
7 bits took 0.025 seconds for  73 ranges
Actual latitude error:         0.6852131
Worst case avg longitude error: 1.3516191
----------------------------------------------------
8 bits took 0.074 seconds for  313 ranges
Actual latitude error:         0.3153114
Worst case avg longitude error: 0.6264122
----------------------------------------------------
9 bits took 0.037 seconds for  419 ranges
Actual latitude error:         0.1702976
Worst case avg longitude error: 0.3452521
----------------------------------------------------
10 bits took 0.040 seconds for  910 ranges
Actual latitude error:         0.0850656
Worst case avg longitude error: 0.1724573
----------------------------------------------------
11 bits took 0.040 seconds for  1906 ranges
Actual latitude error:         0.0438051
Worst case avg longitude error: 0.0861865
----------------------------------------------------
12 bits took 0.035 seconds for  6335 ranges
Actual latitude error:         0.0144635
Worst case avg longitude error: 0.0343498
----------------------------------------------------
13 bits took 0.034 seconds for  7957 ranges
Actual latitude error:         0.0090083
Worst case avg longitude error: 0.0215387
----------------------------------------------------
14 bits took 0.070 seconds for  18518 ranges
Actual latitude error:         0.0056349
Worst case avg longitude error: 0.0117874
----------------------------------------------------
15 bits took 0.132 seconds for  33670 ranges
Actual latitude error:         0.0027366
Worst case avg longitude error: 0.0055297
----------------------------------------------------
16 bits took 0.333 seconds for  83506 ranges
Actual latitude error:         0.0012269
Worst case avg longitude error: 0.0024738
----------------------------------------------------
17 bits took 0.651 seconds for  167546 ranges
Actual latitude error:         0.0006639
Worst case avg longitude error: 0.0013460
----------------------------------------------------
18 bits took 0.773 seconds for  198597 ranges
Actual latitude error:         0.0003219
Worst case avg longitude error: 0.0006639
----------------------------------------------------
19 bits took 1.329 seconds for  335470 ranges
Actual latitude error:         0.0001660
Worst case avg longitude error: 0.0003274
----------------------------------------------------
20 bits took 5.268 seconds for  1049209 ranges
Actual latitude error:         0.0000767
Worst case avg longitude error: 0.0001523
----------------------------------------------------
21 bits took 6.735 seconds for  1433559 ranges
Actual latitude error:         0.0000415
Worst case avg longitude error: 0.0000841
----------------------------------------------------
22 bits took 29.073 seconds for  5371949 ranges
Actual latitude error:         0.0000207
Worst case avg longitude error: 0.0000421
----------------------------------------------------
23 bits took 44.222 seconds for  7681223 ranges
Actual latitude error:         0.0000107
Worst case avg longitude error: 0.0000210
----------------------------------------------------
24 bits took 72.172 seconds for  11490434 ranges
Actual latitude error:         0.0000035
Worst case avg longitude error: 0.0000084
----------------------------------------------------



On Mon, Jul 28, 2014 at 7:59 PM, Chris Bennight <ch...@slowcar.net> wrote:

> Anthony -
> So that sounds reasonable - it's sounding like the consensus is a basic
> WGS84 (lat/long) type (point only), range query (lower left, upper right),
> and a decompose fully method  (i.e. only intersecting ranges).  (caveat
> here that we probably want to cap the precision here, as a full
> decomposition of, say, a 31/31 bit z-curve might take a long time).
>
> What's the performance of your query like for a 24bit/24bit (x,y) z-curve
> on a full decomposition over a significant portion of the range?  I'm
> thinking in the interest of keeping it simple/easy it might be best just to
> fix the precision and document it.  24 bits at the equator would still give
> us a resolution of ~2m (should be worst case) which should be good enough
> for most things. (basically 0.00002 would be the max granularity for
> longitude).
>
> If it's fast enough I would just cap at 64 (well, maybe 63 for signed
> issues) total bits to keep the key as a long; 31 bits/31 bits gives a
> longitudinal granularity of ~0.00000017 - or 1.9cm at the equator.  I think
> a full decomposition on a 31/31bit grid might be expensive.
>
> (the values above assume we normalize the input values - which
> I think is a given for this implementation?)
>
> If we can fix the precision then that takes out most of the variables and
> it's pretty straightforward - I think decomp performance is the main
> limiting factor.
>
> Jared -
> Yep, exactly that - which would be unintuitive behaviour (Getting back
> ranges that didn't really match).  That said, an inclusive only range set
> can be generated; you can model it as a quadtree decomposition and proceed
> recursively if your current region ("quad") isn't fully covered by your
> selection window - otherwise return the range of your current "quad".
>  Which, I think, is essentially what Anthony described in his
> implementation (though with stack collection instead of recursive; probably
> for the best there).
>
>
> Corey -
> So you think the lexicoder would make sense/be reasonable if there is a
> performant  range query tool to go with it (one that resulted in 0 false
> positives)?
>
>
>
> On Mon, Jul 28, 2014 at 5:10 PM, Corey Nolet <cj...@gmail.com> wrote:
>
>> On Calrissian, we had once considered making a lexicoder for lat/long
>> that really transformed the two-dimensions (lat/lon) down into a geohash
>> based on the z-curve.
>>
>> The reason we decided against a first class data-type for this is exactly
>> the same reason that Anthony brings up in his previous comment- the geohash
>> only makes sense as a query tool when you have a good way to structure (at
>> least as many as possible) the non-sequential ranges you would need in
>> order to find all the overlapping regions (with minimal as possible amount
>> of false-positives).
>>
>>
>> On Mon, Jul 28, 2014 at 2:00 PM, Jared Winick <ja...@gmail.com>
>> wrote:
>>
>>> As several people have commented, a single range for a query can produce
>>> a lot of false positives that need to be filtered out. I had made this
>>> visualization a while back that lets you interactively (click-drag a
>>> bounding box) see this behavior.
>>>
>>> http://bl.ocks.org/jaredwinick/5073432
>>>
>>>
>>>
>>>
>>> On Sun, Jul 27, 2014 at 2:15 PM, Anthony Fox <an...@ccri.com>
>>> wrote:
>>>
>>>> My first thought was just something simple for a first pass - lat/lon
>>>> -> a single lexicographic dimension -  as it would cover most basic use
>>>> cases.  Precision (number of bits in encoding) could be an arg or a config
>>>> variable.  For WITHIN/INTERSECTS topological predicates, we need to
>>>> decompose the query geometry into the (possibly/probably non-contiguous) 1D
>>>> ranges that cover the region in question.  GeoMesa has an algorithm to
>>>> quickly perform this decomposition that computes the minimum number of
>>>> geohash prefixes at different resolutions to fully cover the query
>>>> polygon.  Basically, it recursively traverses through levels of geohash
>>>> resolutions, prioritizing rectangles that intersect the region and
>>>> discarding non-intersecting rectangles at the lowest precisions, to produce
>>>> a sequence of ranges to scan over.  Fully contained rectangles are
>>>> discovered at their lowest resolution at which point the algorithm pops the
>>>> stack and searches the next prioritized prefix.  I think something like
>>>> this would definitely need to be ported over and included in a lexicoder
>>>> implementation to make it useful.  Also, rather than materialize the entire
>>>> set of ranges in memory, we can either return a lazy iterator of prefixes
>>>> that can be fed into a scanner in batches or we can have a short-circuit
>>>> config that tunes the amount of slop that's tolerable and cuts off the
>>>> traversal at a certain level of precision.  GeoMesa uses something like the
>>>> former on attribute indexes to coordinate parallel scanners on separate
>>>> index and record tables.
>>>>
>>>> Thoughts?  I'm inclined to keep the implementation to the bare minimum
>>>> necessary for the basic use cases (lat/lon and bbox queries) though I do
>>>> think a general dimensionality reducing lexicoder would be very useful.
>>>>
>>>>
>>>>
>>>> On Fri, Jul 25, 2014 at 6:51 PM, Chris Bennight <ch...@slowcar.net>
>>>> wrote:
>>>>
>>>>> A couple of related issues come up when considering implementing a
>>>>> dimensionality reducing encoding -- just want to toss those out to see what
>>>>> people think the interface might look like.
>>>>>
>>>>> There are a couple of aspects that could be brought in here, but let's
>>>>> keep it simple and consider the original question: (lat/lon) -> number.
>>>>>
>>>>> --Desired precision of the binning process
>>>>> The more bits we add to the z-curve, the more precise our comparison -
>>>>> i.e. a 63 bit key would have more "locations" to sort by than a 24 bit key.
>>>>>
>>>>> Would you see a reasonable default getting picked, make this user
>>>>> configurable, or both? (i.e. default to a value, extended options with a
>>>>> new constructor?)
>>>>>
>>>>> --Semantics for turning two lat/long pairs into a range
>>>>> I'm extrapolating here, but the only reason I see that locality
>>>>> matters is if we want to preserve locality for range searches.  The
>>>>> internal implementation of the encoding/lexicoding process is going to
>>>>> directly impact the implementation of the range query.
>>>>> Now sure, someone could encode the lower left point, encode the upper
>>>>> right point, and construct a range out of that to pass for a scan, but
>>>>> that's going to be wildly inefficient in most cases. See:
>>>>> https://dl.dropboxusercontent.com/u/6649380/bbox.png
>>>>> If we just lexicode the lower left and upper right we traverse across
>>>>> the entire curve - hitting lots of areas that aren't actually in the
>>>>> original range.
>>>>> Now we can turn a single 2D range into a set of 1D ranges.   There is
>>>>> some potential tuning here now, as the algorithm has a tradeoff on time to
>>>>> compute the ranges (and number of ranges) vs.  "slop"  (or inclusion of
>>>>> ranges which aren't actually in the original query).
>>>>> Would you see a static method perhaps on the z-curve lexicoder that
>>>>> returns a series of ranges based on an input window?  Some other mechanism?
>>>>> And in the case of "slop" - would we just document that the ranges
>>>>> could actually include values not expected - or would we always fully
>>>>> decompose?
>>>>>
>>>>> ------------------
>>>>>
>>>>> The other, "making it more complicated" questions would probably
>>>>> resolve around generalizing this to a multi-dimensional lexicoder.   I
>>>>> would expect the WGS84 (lat/long) encoder would just be a 2D instance with
>>>>> -180/+180 -90/+90 bounds.   There are probably completely different cases
>>>>> where it would be useful to have a locality sensitive hash.
>>>>> But with the big caveat that this requires more methods - I now need a
>>>>> way of defining a range for each of the dimensions (which before were
>>>>> defined as lat/lon), and I need an interface/class to pass those dimensions
>>>>> to the encoder - and should be able to use those same definitions to decode
>>>>> my values on the way out.
>>>>>
>>>>> I'm not trying to make a mountain out of a molehill - one could
>>>>> definitely pick some sane defaults, document behaviour, and put in a WGS84
>>>>> specific implementation.  I'm just thinking what's the minimum viable bit
>>>>> that's needed for this to be useful - and I suspect it's also the range
>>>>> decomposition piece - as I suspect *everyone* would really need that (if
>>>>> they cared about locality in the first place).
>>>>>
>>>>> My main question here would be to what extent would you see this
>>>>> going?  A one off for WGS84, or a more generic multi-dimensional lexicoder;
>>>>>  and in either case, what would the thoughts/consensus be for
>>>>> implementation of the additional methods (and possibly classes) required?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 25, 2014 at 12:37 PM, Sean Busbey <bu...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On Fri, Jul 25, 2014 at 8:45 AM, Anthony F <af...@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> GeoMesa plans to update to 1.6 as soon as we get past a 1.0 RELEASE
>>>>>>> (which is waiting on approval by the Eclipse Foundation's IP review process
>>>>>>> and should happen in a month or so).  I think we could break out a z-curve
>>>>>>> lexicoder and contribute before then though.  I'll poke around at the
>>>>>>> lexicoder stuff in 1.6 and see what it would take to use that interface now.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> That sounds great Anthony. Let us know if we can help with anything!
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Z-Curve/Hilbert Curve

Posted by Chris Bennight <ch...@slowcar.net>.
Anthony -
So that sounds reasonable - it's sounding like the consensus is a basic
WGS84 (lat/long) type (point only), range query (lower left, upper right),
and a full-decomposition method (i.e. only intersecting ranges).  (One
caveat: we probably want to cap the precision, as a full decomposition of,
say, a 31/31 bit z-curve might take a long time.)

What's the performance of your query like for a 24bit/24bit (x,y) z-curve
on a full decomposition over a significant portion of the range?  I'm
thinking in the interest of keeping it simple/easy it might be best just to
fix the precision and document it.  24 bits at the equator would still give
us a resolution of ~2m (should be worst case) which should be good enough
for most things. (basically 0.00002 would be the max granularity for
longitude).

If it's fast enough I would just cap at 64 (well, maybe 63 for signed
issues) total bits to keep the key as a long; 31 bits/31 bits gives a
longitudinal granularity of ~0.00000017 - or 1.9cm at the equator.  I think
a full decomposition on a 31/31bit grid might be expensive.

(the values above assume we normalize the input values - which
I think is a given for this implementation?)
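For reference, a quick back-of-the-envelope sketch of the ~2m and ~1.9cm
figures above (assumptions: normalized inputs and roughly 111.32 km per
degree of longitude at the equator):

// Rough check of the per-axis granularity at 24 and 31 bits; the meters figure
// uses the approximate length of one degree of longitude at the equator.
public class Granularity {
    public static void main(String[] args) {
        double metersPerDegreeAtEquator = 111_320.0;
        for (int bits : new int[] {24, 31}) {
            double degreesPerCell = 360.0 / (1L << bits);
            System.out.printf("%d bits per axis: %.8f degrees/cell ~ %.3f m at the equator%n",
                    bits, degreesPerCell, degreesPerCell * metersPerDegreeAtEquator);
        }
    }
}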

If we can fix the precision then that takes out most of the variables and
it's pretty straightforward - I think decomp performance is the main
limiting factor.

Jared -
Yep, exactly that - which would be unintuitive behaviour (getting back
ranges that didn't really match).  That said, an inclusive-only range set
can be generated; you can model it as a quadtree decomposition and proceed
recursively if your current region ("quad") isn't fully covered by your
selection window - otherwise return the range of your current "quad",
which, I think, is essentially what Anthony described in his
implementation (though with a stack collection instead of recursion;
probably for the best there).


Corey -
So you think the lexicoder would make sense/be reasonable if there is a
performant  range query tool to go with it (one that resulted in 0 false
positives)?



On Mon, Jul 28, 2014 at 5:10 PM, Corey Nolet <cj...@gmail.com> wrote:

> On Calrissian, we had once considered making a lexicoder for lat/long that
> really transformed the two-dimensions (lat/lon) down into a geohash based
> on the z-curve.
>
> The reason we decided against a first class data-type for this is exactly
> the same reason that Anthony brings up in his previous comment- the geohash
> only makes sense as a query tool when you have a good way to structure (at
> least as many as possible) the non-sequential ranges you would need in
> order to find all the overlapping regions (with minimal as possible amount
> of false-positives).
>
>
> On Mon, Jul 28, 2014 at 2:00 PM, Jared Winick <ja...@gmail.com>
> wrote:
>
>> As several people have commented, a single range for a query can produce
>> a lot of false positives that need to be filtered out. I had made this
>> visualization a while back that lets you interactively (click-drag a
>> bounding box) see this behavior.
>>
>> http://bl.ocks.org/jaredwinick/5073432
>>
>>
>>
>>
>> On Sun, Jul 27, 2014 at 2:15 PM, Anthony Fox <an...@ccri.com> wrote:
>>
>>> My first thought was just something simple for a first pass - lat/lon ->
>>> a single lexicographic dimension -  as it would cover most basic use
>>> cases.  Precision (number of bits in encoding) could be an arg or a config
>>> variable.  For WITHIN/INTERSECTS topological predicates, we need to
>>> decompose the query geometry into the (possibly/probably non-contiguous) 1D
>>> ranges that cover the region in question.  GeoMesa has an algorithm to
>>> quickly perform this decomposition that computes the minimum number of
>>> geohash prefixes at different resolutions to fully cover the query
>>> polygon.  Basically, it recursively traverses through levels of geohash
>>> resolutions, prioritizing rectangles that intersect the region and
>>> discarding non-intersecting rectangles at the lowest precisions, to produce
>>> a sequence of ranges to scan over.  Fully contained rectangles are
>>> discovered at their lowest resolution at which point the algorithm pops the
>>> stack and searches the next prioritized prefix.  I think something like
>>> this would definitely need to be ported over and included in a lexicoder
>>> implementation to make it useful.  Also, rather than materialize the entire
>>> set of ranges in memory, we can either return a lazy iterator of prefixes
>>> that can be fed into a scanner in batches or we can have a short-circuit
>>> config that tunes the amount of slop that's tolerable and cuts off the
>>> traversal at a certain level of precision.  GeoMesa uses something like the
>>> former on attribute indexes to coordinate parallel scanners on separate
>>> index and record tables.
>>>
>>> Thoughts?  I'm inclined to keep the implementation to the bare minimum
>>> necessary for the basic use cases (lat/lon and bbox queries) though I do
>>> think a general dimensionality reducing lexicoder would be very useful.
>>>
>>>
>>>
>>> On Fri, Jul 25, 2014 at 6:51 PM, Chris Bennight <ch...@slowcar.net>
>>> wrote:
>>>
>>>> A couple of related issues come up when considering implementing a
>>>> dimensionality reducing encoding -- just want to toss those out to see what
>>>> people think the interface might look like.
>>>>
>>>> There are a couple of aspects that could be brought in here, but let's
>>>> keep it simple and consider the original question: (lat/lon) -> number.
>>>>
>>>> --Desired precision of the binning process
>>>> The more bits we add to the z-curve, the more precise our comparison -
>>>> i.e. a 63 bit key would have more "locations" to sort by than a 24 bit key.
>>>>
>>>> Would you see a reasonable default getting picked, make this user
>>>> configurable, or both? (i.e. default to a value, extended options with a
>>>> new constructor?)
>>>>
>>>> --Semantics for turning two lat/long pairs into a range
>>>> I'm extrapolating here, but the only reason I see that locality matters
>>>> is if we want to preserve locality for range searches.  The internal
>>>> implementation of the encoding/lexicoding process is going to directly
>>>> impact the implementation of the range query.
>>>> Now sure, someone could encode the lower left point, encode the upper
>>>> right point, and construct a range out of that to pass for a scan, but
>>>> that's going to be wildly inefficient in most cases. See:
>>>> https://dl.dropboxusercontent.com/u/6649380/bbox.png
>>>> If we just lexicode the lower left and upper right we traverse across
>>>> the entire curve - hitting lots of areas that aren't actually in the
>>>> original range.
>>>> Now we can turn a single 2D range into a set of 1D ranges.   There is
>>>> some potential tuning here now, as the algorithm has a tradeoff on time to
>>>> compute the ranges (and number of ranges) vs.  "slop"  (or inclusion of
>>>> ranges which aren't actually in the original query).
>>>> Would you see a static method perhaps on the z-curve lexicoder that
>>>> returns a series of ranges based on an input window?  Some other mechanism?
>>>> And in the case of "slop" - would we just document that the ranges
>>>> could actually include values not expected - or would we always fully
>>>> decompose?
>>>>
>>>> ------------------
>>>>
>>>> The other, "making it more complicated" questions would probably
>>>> resolve around generalizing this to a multi-dimensional lexicoder.   I
>>>> would expect the WGS84 (lat/long) encoder would just be a 2D instance with
>>>> -180/+180 -90/+90 bounds.   There are probably completely different cases
>>>> where it would be useful to have a locality sensitive hash.
>>>> But with the big caveat that this requires more methods - I now need a
>>>> way of defining a range for each of the dimensions (which before were
>>>> defined as lat/lon), and I need an interface/class to pass those dimensions
>>>> to the encoder - and should be able to use those same definitions to decode
>>>> my values on the way out.
>>>>
>>>> I'm not trying to make a mountain out of a molehill - one could
>>>> definitely pick some sane defaults, document behaviour, and put in a WGS84
>>>> specific implementation.  I'm just thinking what's the minimum viable bit
>>>> that's needed for this to be useful - and I suspect it's also the range
>>>> decomposition piece - as I suspect *everyone* would really need that (if
>>>> they cared about locality in the first place).
>>>>
>>>> My main question here would be to what extent would you see this going?
>>>>  A one off for WGS84, or a more generic multi-dimensional lexicoder;  and
>>>> in either case, what would the thoughts/consensus be for implementation of
>>>> the additional methods (and possibly classes) required?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jul 25, 2014 at 12:37 PM, Sean Busbey <bu...@cloudera.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On Fri, Jul 25, 2014 at 8:45 AM, Anthony F <af...@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>> GeoMesa plans to update to 1.6 as soon as we get past a 1.0 RELEASE
>>>>>> (which is waiting on approval by the Eclipse Foundation's IP review process
>>>>>> and should happen in a month or so).  I think we could break out a z-curve
>>>>>> lexicoder and contribute before then though.  I'll poke around at the
>>>>>> lexicoder stuff in 1.6 and see what it would take to use that interface now.
>>>>>>
>>>>>>
>>>>>
>>>>> That sounds great Anthony. Let us know if we can help with anything!
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Sean
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Z-Curve/Hilbert Curve

Posted by Corey Nolet <cj...@gmail.com>.
On Calrissian, we had once considered making a lexicoder for lat/long that
really transformed the two dimensions (lat/lon) down into a geohash based
on the z-curve.

The reason we decided against a first-class data type for this is exactly
the same reason that Anthony brings up in his previous comment - the geohash
only makes sense as a query tool when you have a good way to structure (at
least as many as possible of) the non-sequential ranges you would need in
order to find all the overlapping regions (with as few false positives as
possible).


On Mon, Jul 28, 2014 at 2:00 PM, Jared Winick <ja...@gmail.com> wrote:

> As several people have commented, a single range for a query can produce a
> lot of false positives that need to be filtered out. I had made this
> visualization a while back that lets you interactively (click-drag a
> bounding box) see this behavior.
>
> http://bl.ocks.org/jaredwinick/5073432
>
>
>
>
> On Sun, Jul 27, 2014 at 2:15 PM, Anthony Fox <an...@ccri.com> wrote:
>
>> My first thought was just something simple for a first pass - lat/lon ->
>> a single lexicographic dimension -  as it would cover most basic use
>> cases.  Precision (number of bits in encoding) could be an arg or a config
>> variable.  For WITHIN/INTERSECTS topological predicates, we need to
>> decompose the query geometry into the (possibly/probably non-contiguous) 1D
>> ranges that cover the region in question.  GeoMesa has an algorithm to
>> quickly perform this decomposition that computes the minimum number of
>> geohash prefixes at different resolutions to fully cover the query
>> polygon.  Basically, it recursively traverses through levels of geohash
>> resolutions, prioritizing rectangles that intersect the region and
>> discarding non-intersecting rectangles at the lowest precisions, to produce
>> a sequence of ranges to scan over.  Fully contained rectangles are
>> discovered at their lowest resolution at which point the algorithm pops the
>> stack and searches the next prioritized prefix.  I think something like
>> this would definitely need to be ported over and included in a lexicoder
>> implementation to make it useful.  Also, rather than materialize the entire
>> set of ranges in memory, we can either return a lazy iterator of prefixes
>> that can be fed into a scanner in batches or we can have a short-circuit
>> config that tunes the amount of slop that's tolerable and cuts off the
>> traversal at a certain level of precision.  GeoMesa uses something like the
>> former on attribute indexes to coordinate parallel scanners on separate
>> index and record tables.
>>
>> Thoughts?  I'm inclined to keep the implementation to the bare minimum
>> necessary for the basic use cases (lat/lon and bbox queries) though I do
>> think a general dimensionality reducing lexicoder would be very useful.
>>
>>
>>
>> On Fri, Jul 25, 2014 at 6:51 PM, Chris Bennight <ch...@slowcar.net>
>> wrote:
>>
>>> A couple of related issues come up when considering implementing a
>>> dimensionality reducing encoding -- just want to toss those out to see what
>>> people think the interface might look like.
>>>
>>> There are a couple of aspects that could be brought in here, but let's
>>> keep it simple and consider the original question: (lat/lon) -> number.
>>>
>>> --Desired precision of the binning process
>>> The more bits we add to the z-curve, the more precise our comparison -
>>> i.e. a 63 bit key would have more "locations" to sort by than a 24 bit key.
>>>
>>> Would you see a reasonable default getting picked, make this user
>>> configurable, or both? (i.e. default to a value, extended options with a
>>> new constructor?)
>>>
>>> --Semantics for turning two lat/long pairs into a range
>>> I'm extrapolating here, but the only reason I see that locality matters
>>> is if we want to preserve locality for range searches.  The internal
>>> implementation of the encoding/lexicoding process is going to directly
>>> impact the implementation of the range query.
>>> Now sure, someone could encode the lower left point, encode the upper
>>> right point, and construct a range out of that to pass for a scan, but
>>> that's going to be wildly inefficient in most cases. See:
>>> https://dl.dropboxusercontent.com/u/6649380/bbox.png
>>> If we just lexicode the lower left and upper right we traverse across
>>> the entire curve - hitting lots of areas that aren't actually in the
>>> original range.
>>> Now we can turn a single 2D range into a set of 1D ranges.   There is
>>> some potential tuning here now, as the algorithm has a tradeoff on time to
>>> compute the ranges (and number of ranges) vs.  "slop"  (or inclusion of
>>> ranges which aren't actually in the original query).
>>> Would you see a static method perhaps on the z-curve lexicoder that
>>> returns a series of ranges based on an input window?  Some other mechanism?
>>> And in the case of "slop" - would we just document that the ranges could
>>> actually include values not expected - or would we always fully decompose?
>>>
>>> ------------------
>>>
>>> The other, "making it more complicated" questions would probably resolve
>>> around generalizing this to a multi-dimensional lexicoder.   I would expect
>>> the WGS84 (lat/long) encoder would just be a 2D instance with -180/+180
>>> -90/+90 bounds.   There are probably completely different cases where it
>>> would be useful to have a locality sensitive hash.
>>> But with the big caveat that this requires more methods - I now need a
>>> way of defining a range for each of the dimensions (which before were
>>> defined as lat/lon), and I need an interface/class to pass those dimensions
>>> to the encoder - and should be able to use those same definitions to decode
>>> my values on the way out.
>>>
>>> I'm not trying to make a mountain out of a molehill - one could
>>> definitely pick some sane defaults, document behaviour, and put in a WGS84
>>> specific implementation.  I'm just thinking what's the minimum viable bit
>>> that's needed for this to be useful - and I suspect it's also the range
>>> decomposition piece - as I suspect *everyone* would really need that (if
>>> they cared about locality in the first place).
>>>
>>> My main question here would be to what extent would you see this going?
>>>  A one off for WGS84, or a more generic multi-dimensional lexicoder;  and
>>> in either case, what would the thoughts/consensus be for implementation of
>>> the additional methods (and possibly classes) required?
>>>
>>>
>>>
>>>
>>>
>>> On Fri, Jul 25, 2014 at 12:37 PM, Sean Busbey <bu...@cloudera.com>
>>> wrote:
>>>
>>>>
>>>> On Fri, Jul 25, 2014 at 8:45 AM, Anthony F <af...@gmail.com> wrote:
>>>>
>>>>>
>>>>> GeoMesa plans to update to 1.6 as soon as we get past a 1.0 RELEASE
>>>>> (which is waiting on approval by the Eclipse Foundation's IP review process
>>>>> and should happen in a month or so).  I think we could break out a z-curve
>>>>> lexicoder and contribute before then though.  I'll poke around at the
>>>>> lexicoder stuff in 1.6 and see what it would take to use that interface now.
>>>>>
>>>>>
>>>>
>>>> That sounds great Anthony. Let us know if we can help with anything!
>>>>
>>>>
>>>>
>>>> --
>>>> Sean
>>>>
>>>
>>>
>>
>

Re: Z-Curve/Hilbert Curve

Posted by Jared Winick <ja...@gmail.com>.
As several people have commented, a single range for a query can produce a
lot of false positives that need to be filtered out. I had made this
visualization a while back that lets you interactively (click-drag a
bounding box) see this behavior.

http://bl.ocks.org/jaredwinick/5073432




On Sun, Jul 27, 2014 at 2:15 PM, Anthony Fox <an...@ccri.com> wrote:

> My first thought was just something simple for a first pass - lat/lon -> a
> single lexicographic dimension -  as it would cover most basic use cases.
> Precision (number of bits in encoding) could be an arg or a config
> variable.  For WITHIN/INTERSECTS topological predicates, we need to
> decompose the query geometry into the (possibly/probably non-contiguous) 1D
> ranges that cover the region in question.  GeoMesa has an algorithm to
> quickly perform this decomposition that computes the minimum number of
> geohash prefixes at different resolutions to fully cover the query
> polygon.  Basically, it recursively traverses through levels of geohash
> resolutions, prioritizing rectangles that intersect the region and
> discarding non-intersecting rectangles at the lowest precisions, to produce
> a sequence of ranges to scan over.  Fully contained rectangles are
> discovered at their lowest resolution at which point the algorithm pops the
> stack and searches the next prioritized prefix.  I think something like
> this would definitely need to be ported over and included in a lexicoder
> implementation to make it useful.  Also, rather than materialize the entire
> set of ranges in memory, we can either return a lazy iterator of prefixes
> that can be fed into a scanner in batches or we can have a short-circuit
> config that tunes the amount of slop that's tolerable and cuts off the
> traversal at a certain level of precision.  GeoMesa uses something like the
> former on attribute indexes to coordinate parallel scanners on separate
> index and record tables.
>
> Thoughts?  I'm inclined to keep the implementation to the bare minimum
> necessary for the basic use cases (lat/lon and bbox queries) though I do
> think a general dimensionality reducing lexicoder would be very useful.
>
>
>
> On Fri, Jul 25, 2014 at 6:51 PM, Chris Bennight <ch...@slowcar.net> wrote:
>
>> A couple of related issues come up when considering implementing a
>> dimensionality reducing encoding -- just want to toss those out to see what
>> people think the interface might look like.
>>
>> There are a couple of aspects that could be brought in here, but let's
>> keep it simple and consider the original question: (lat/lon) -> number.
>>
>> --Desired precision of the binning process
>> The more bits we add to the z-curve, the more precise our comparison -
>> i.e. a 63 bit key would have more "locations" to sort by than a 24 bit key.
>>
>> Would you see a reasonable default getting picked, make this user
>> configurable, or both? (i.e. default to a value, extended options with a
>> new constructor?)
>>
>> --Semantics for turning two lat/long pairs into a range
>> I'm extrapolating here, but the only reason I see that locality matters
>> is if we want to preserve locality for range searches.  The internal
>> implementation of the encoding/lexicoding process is going to directly
>> impact the implementation of the range query.
>> Now sure, someone could encode the lower left point, encode the upper
>> right point, and construct a range out of that to pass for a scan, but
>> that's going to be wildly inefficient in most cases. See:
>> https://dl.dropboxusercontent.com/u/6649380/bbox.png
>> If we just lexicode the lower left and upper right we traverse across the
>> entire curve - hitting lots of areas that aren't actually in the original
>> range.
>> Now we can turn a single 2D range into a set of 1D ranges.   There is
>> some potential tuning here now, as the algorithm has a tradeoff on time to
>> compute the ranges (and number of ranges) vs.  "slop"  (or inclusion of
>> ranges which aren't actually in the original query).
>> Would you see a static method perhaps on the z-curve lexicoder that
>> returns a series of ranges based on an input window?  Some other mechanism?
>> And in the case of "slop" - would we just document that the ranges could
>> actually include values not expected - or would we always fully decompose?
>>
>> ------------------
>>
>> The other, "making it more complicated" questions would probably resolve
>> around generalizing this to a multi-dimensional lexicoder.   I would expect
>> the WGS84 (lat/long) encoder would just be a 2D instance with -180/+180
>> -90/+90 bounds.   There are probably completely different cases where it
>> would be useful to have a locality sensitive hash.
>> But with the big caveat that this requires more methods - I now need a
>> way of defining a range for each of the dimensions (which before were
>> defined as lat/lon), and I need an interface/class to pass those dimensions
>> to the encoder - and should be able to use those same definitions to decode
>> my values on the way out.
>>
>> I'm not trying to make a mountain out of a molehill - one could
>> definitely pick some sane defaults, document behaviour, and put in a WGS84
>> specific implementation.  I'm just thinking what's the minimum viable bit
>> that's needed for this to be useful - and I suspect it's also the range
>> decomposition piece - as I suspect *everyone* would really need that (if
>> they cared about locality in the first place).
>>
>> My main question here would be to what extent would you see this going?
>>  A one off for WGS84, or a more generic multi-dimensional lexicoder;  and
>> in either case, what would the thoughts/consensus be for implementation of
>> the additional methods (and possibly classes) required?
>>
>>
>>
>>
>>
>> On Fri, Jul 25, 2014 at 12:37 PM, Sean Busbey <bu...@cloudera.com>
>> wrote:
>>
>>>
>>> On Fri, Jul 25, 2014 at 8:45 AM, Anthony F <af...@gmail.com> wrote:
>>>
>>>>
>>>> GeoMesa plans to update to 1.6 as soon as we get past a 1.0 RELEASE
>>>> (which is waiting on approval by the Eclipse Foundation's IP review process
>>>> and should happen in a month or so).  I think we could break out a z-curve
>>>> lexicoder and contribute before then though.  I'll poke around at the
>>>> lexicoder stuff in 1.6 and see what it would take to use that interface now.
>>>>
>>>>
>>>
>>> That sounds great Anthony. Let us know if we can help with anything!
>>>
>>>
>>>
>>> --
>>> Sean
>>>
>>
>>
>

Re: Z-Curve/Hilbert Curve

Posted by Anthony Fox <an...@ccri.com>.
My first thought was just something simple for a first pass - lat/lon -> a
single lexicographic dimension -  as it would cover most basic use cases.
Precision (number of bits in encoding) could be an arg or a config
variable.  For WITHIN/INTERSECTS topological predicates, we need to
decompose the query geometry into the (possibly/probably non-contiguous) 1D
ranges that cover the region in question.  GeoMesa has an algorithm to
quickly perform this decomposition that computes the minimum number of
geohash prefixes at different resolutions to fully cover the query
polygon.  Basically, it recursively traverses through levels of geohash
resolutions, prioritizing rectangles that intersect the region and
discarding non-intersecting rectangles at the lowest precisions, to produce
a sequence of ranges to scan over.  Fully contained rectangles are
discovered at their lowest resolution, at which point the algorithm pops the
stack and searches the next prioritized prefix.  I think something like
this would definitely need to be ported over and included in a lexicoder
implementation to make it useful.  Also, rather than materialize the entire
set of ranges in memory, we can either return a lazy iterator of prefixes
that can be fed into a scanner in batches or we can have a short-circuit
config that tunes the amount of slop that's tolerable and cuts off the
traversal at a certain level of precision.  GeoMesa uses something like the
former on attribute indexes to coordinate parallel scanners on separate
index and record tables.

Thoughts?  I'm inclined to keep the implementation to the bare minimum
necessary for the basic use cases (lat/lon and bbox queries) though I do
think a general dimensionality reducing lexicoder would be very useful.
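To make the shape of that traversal concrete, here is a minimal sketch
(not GeoMesa's implementation; the class names, one-character-per-level
prefixes, and depth cutoff are illustrative assumptions, Java 16+ for the
record) of the decomposition described above - discard non-intersecting
quads, emit fully covered quads as whole ranges, and recurse into
partially covered quads up to a depth limit:

import java.util.ArrayList;
import java.util.List;

public class BBoxDecomposer {

    record BBox(double minLon, double minLat, double maxLon, double maxLat) {
        boolean contains(BBox o) {
            return minLon <= o.minLon && minLat <= o.minLat
                    && maxLon >= o.maxLon && maxLat >= o.maxLat;
        }
        boolean intersects(BBox o) {
            return minLon < o.maxLon && o.minLon < maxLon
                    && minLat < o.maxLat && o.minLat < maxLat;
        }
    }

    // Collect covering cells as prefix strings, one character per quadtree level.
    static void decompose(BBox cell, String prefix, BBox query, int maxDepth, List<String> out) {
        if (!cell.intersects(query)) {
            return;                              // discard non-intersecting quads early
        }
        if (query.contains(cell) || prefix.length() >= maxDepth) {
            out.add(prefix);                     // fully covered (or depth limit hit): one range
            return;
        }
        double midLon = (cell.minLon() + cell.maxLon()) / 2;
        double midLat = (cell.minLat() + cell.maxLat()) / 2;
        decompose(new BBox(cell.minLon(), cell.minLat(), midLon, midLat), prefix + "0", query, maxDepth, out);
        decompose(new BBox(midLon, cell.minLat(), cell.maxLon(), midLat), prefix + "1", query, maxDepth, out);
        decompose(new BBox(cell.minLon(), midLat, midLon, cell.maxLat()), prefix + "2", query, maxDepth, out);
        decompose(new BBox(midLon, midLat, cell.maxLon(), cell.maxLat()), prefix + "3", query, maxDepth, out);
    }

    public static void main(String[] args) {
        List<String> prefixes = new ArrayList<>();
        BBox world = new BBox(-180, -90, 180, 90);
        BBox query = new BBox(-98, 32, -96, 34);   // example bbox, roughly north Texas
        decompose(world, "", query, 8, prefixes);
        System.out.println(prefixes.size() + " covering prefixes, e.g. " + prefixes.get(0));
    }
}

Sorting the emitted prefixes lexicographically gives the scan ranges; the
depth cutoff plays the role of the short-circuit/slop config mentioned
above.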


On Fri, Jul 25, 2014 at 6:51 PM, Chris Bennight <ch...@slowcar.net> wrote:

> A couple of related issues come up when considering implementing a
> dimensionality reducing encoding -- just want to toss those out to see what
> people think the interface might look like.
>
> There are a couple of aspects that could be brought in here, but let's
> keep it simple and consider the original question: (lat/lon) -> number.
>
> --Desired precision of the binning process
> The more bits we add to the z-curve, the more precise our comparison -
> i.e. a 63 bit key would have more "locations" to sort by than a 24 bit key.
>
> Would you see a reasonable default getting picked, make this user
> configurable, or both? (i.e. default to a value, extended options with a
> new constructor?)
>
> --Semantics for turning two lat/long pairs into a range
> I'm extrapolating here, but the only reason I see that locality matters is
> if we want to preserve locality for range searches.  The internal
> implementation of the encoding/lexicoding process is going to directly
> impact the implementation of the range query.
> Now sure, someone could encode the lower left point, encode the upper
> right point, and construct a range out of that to pass for a scan, but
> that's going to be wildly inefficient in most cases. See:
> https://dl.dropboxusercontent.com/u/6649380/bbox.png
> If we just lexicode the lower left and upper right we traverse across the
> entire curve - hitting lots of areas that aren't actually in the original
> range.
> Now we can turn a single 2D range into a set of 1D ranges.   There is some
> potential tuning here now, as the algorithm has a tradeoff on time to
> compute the ranges (and number of ranges) vs.  "slop"  (or inclusion of
> ranges which aren't actually in the original query).
> Would you see a static method perhaps on the z-curve lexicoder that
> returns a series of ranges based on an input window?  Some other mechanism?
> And in the case of "slop" - would we just document that the ranges could
> actually include values not expected - or would we always fully decompose?
>
> ------------------
>
> The other, "making it more complicated" questions would probably resolve
> around generalizing this to a multi-dimensional lexicoder.   I would expect
> the WGS84 (lat/long) encoder would just be a 2D instance with -180/+180
> -90/+90 bounds.   There are probably completely different cases where it
> would be useful to have a locality sensitive hash.
> But with the big caveat that this requires more methods - I now need a way
> of defining a range for each of the dimensions (which before were defined
> as lat/lon), and I need an interface/class to pass those dimensions to the
> encoder - and should be able to use those same definitions to decode my
> values on the way out.
>
> I'm not trying to make a mountain out of a molehill - one could definitely
> pick some sane defaults, document behaviour, and put in a WGS84 specific
> implementation.  I'm just thinking what's the minimum viable bit that's
> needed for this to be useful - and I suspect it's also the range
> decomposition piece - as I suspect *everyone* would really need that (if
> they cared about locality in the first place).
>
> My main question here would be to what extent would you see this going?  A
> one off for WGS84, or a more generic multi-dimensional lexicoder;  and in
> either case, what would the thoughts/consensus be for implementation of the
> additional methods (and possibly classes) required?
>
>
>
>
>
> On Fri, Jul 25, 2014 at 12:37 PM, Sean Busbey <bu...@cloudera.com> wrote:
>
>>
>> On Fri, Jul 25, 2014 at 8:45 AM, Anthony F <af...@gmail.com> wrote:
>>
>>>
>>> GeoMesa plans to update to 1.6 as soon as we get past a 1.0 RELEASE
>>> (which is waiting on approval by the Eclipse Foundation's IP review process
>>> and should happen in a month or so).  I think we could break out a z-curve
>>> lexicoder and contribute before then though.  I'll poke around at the
>>> lexicoder stuff in 1.6 and see what it would take to use that interface now.
>>>
>>>
>>
>> That sounds great Anthony. Let us know if we can help with anything!
>>
>>
>>
>> --
>> Sean
>>
>
>

Re: Z-Curve/Hilbert Curve

Posted by Chris Bennight <ch...@slowcar.net>.
A couple of related issues come up when considering implementing a
dimensionality reducing encoding -- just want to toss those out to see what
people think the interface might look like.

There are a couple of aspects that could be brought in here, but let's keep
it simple and consider the original question: (lat/lon) -> number.

--Desired precision of the binning process
The more bits we add to the z-curve, the more precise our comparison - i.e.
a 63 bit key would have more "locations" to sort by than a 24 bit key.
Would you see a reasonable default getting picked, make this user
configurable, or both? (i.e. default to a value, extended options with a
new constructor?)

--Semantics for turning two lat/long pairs into a range
I'm extrapolating here, but the only reason I see that locality matters is
if we want to preserve locality for range searches.  The internal
implementation of the encoding/lexicoding process is going to directly
impact the implementation of the range query.
Now sure, someone could encode the lower left point, encode the upper right
point, and construct a range out of that to pass for a scan, but that's
going to be wildly inefficient in most cases. See:
https://dl.dropboxusercontent.com/u/6649380/bbox.png
If we just lexicode the lower left and upper right we traverse across the
entire curve - hitting lots of areas that aren't actually in the original
range.
Now we can turn a single 2D range into a set of 1D ranges.  There is some
potential tuning here, as the algorithm has a tradeoff between the time to
compute the ranges (and the number of ranges) and "slop" (the inclusion of
ranges which aren't actually in the original query).
Would you see a static method perhaps on the z-curve lexicoder that returns
a series of ranges based on an input window?  Some other mechanism?
And in the case of "slop" - would we just document that the ranges could
actually include values not expected - or would we always fully decompose?

------------------

The other, "making it more complicated" questions would probably resolve
around generalizing this to a multi-dimensional lexicoder.   I would expect
the WGS84 (lat/long) encoder would just be a 2D instance with -180/+180
-90/+90 bounds.   There are probably completely different cases where it
would be useful to have a locality sensitive hash.
But with the big caveat that this requires more methods - I now need a way
of defining a range for each of the dimensions (which before were defined
as lat/lon), and I need an interface/class to pass those dimensions to the
encoder - and should be able to use those same definitions to decode my
values on the way out.
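As a strawman for that interface (purely hypothetical names - not an
existing Accumulo or GeoMesa API; Java 16+ for the record): each dimension
carries its own bounds and bit budget, and the encoder takes a fixed list
of dimensions, normalizes each value, and interleaves the cell bits:

// Hypothetical sketch only. A dimension definition knows how to quantize and
// de-quantize its own values; the encoder contract works on one value per dimension.
public interface MultiDimensionalEncoder {

    record Dimension(String name, double min, double max, int bits) {
        /** Normalize a raw value into an integer cell index in [0, 2^bits). */
        long toCell(double value) {
            long cells = 1L << bits;
            long cell = (long) ((value - min) / (max - min) * cells);
            return Math.max(0, Math.min(cells - 1, cell));   // clamp at the edges
        }
        /** Recover the quantized midpoint value for a cell index. */
        double fromCell(long cell) {
            return min + (cell + 0.5) / (1L << bits) * (max - min);
        }
    }

    /** One value per dimension, in the order the dimensions were defined. */
    byte[] encode(double... values);

    /** Approximate (quantized) values, one per dimension. */
    double[] decode(byte[] key);
}

A WGS84 point lexicoder would then just be the 2D instance built from
lon (-180, +180) and lat (-90, +90).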

I'm not trying to make a mountain out of a molehill - one could definitely
pick some sane defaults, document behaviour, and put in a WGS84 specific
implementation.  I'm just thinking what's the minimum viable bit that's
needed for this to be useful - and I suspect it's also the range
decomposition piece - as I suspect *everyone* would really need that (if
they cared about locality in the first place).

My main question here would be to what extent would you see this going?  A
one off for WGS84, or a more generic multi-dimensional lexicoder;  and in
either case, what would the thoughts/consensus be for implementation of the
additional methods (and possibly classes) required?





On Fri, Jul 25, 2014 at 12:37 PM, Sean Busbey <bu...@cloudera.com> wrote:

>
> On Fri, Jul 25, 2014 at 8:45 AM, Anthony F <af...@gmail.com> wrote:
>
>>
>> GeoMesa plans to update to 1.6 as soon as we get past a 1.0 RELEASE
>> (which is waiting on approval by the Eclipse Foundation's IP review process
>> and should happen in a month or so).  I think we could break out a z-curve
>> lexicoder and contribute before then though.  I'll poke around at the
>> lexicoder stuff in 1.6 and see what it would take to use that interface now.
>>
>>
>
> That sounds great Anthony. Let us know if we can help with anything!
>
>
>
> --
> Sean
>

Re: Z-Curve/Hilbert Curve

Posted by Sean Busbey <bu...@cloudera.com>.
On Fri, Jul 25, 2014 at 8:45 AM, Anthony F <af...@gmail.com> wrote:

>
> GeoMesa plans to update to 1.6 as soon as we get past a 1.0 RELEASE (which
> is waiting on approval by the Eclipse Foundation's IP review process and
> should happen in a month or so).  I think we could break out a z-curve
> lexicoder and contribute before then though.  I'll poke around at the
> lexicoder stuff in 1.6 and see what it would take to use that interface now.
>
>

That sounds great Anthony. Let us know if we can help with anything!



-- 
Sean

Re: Z-Curve/Hilbert Curve

Posted by Anthony F <af...@gmail.com>.
Sean,

GeoMesa plans to update to 1.6 as soon as we get past a 1.0 RELEASE (which
is waiting on approval by the Eclipse Foundation's IP review process and
should happen in a month or so).  I think we could break out a z-curve
lexicoder and contribute before then though.  I'll poke around at the
lexicoder stuff in 1.6 and see what it would take to use that interface now.

Thanks,
Anthony


On Thu, Jul 24, 2014 at 10:14 PM, Sean Busbey <bu...@cloudera.com> wrote:

> Hi Hunter!
>
> Any plans to update GeoMesa for 1.6?
>
> Could we convince y'all to contribute a Z-curve lexicoder for the core
> project?
>
> http://accumulo.apache.org/1.6/accumulo_user_manual.html#_lexicoders
>
> --
> Sean
> On Jul 24, 2014 2:06 PM, "Hunter Provyn" <fh...@ccri.com> wrote:
>
>>  Bob,
>>
>> There are a number of open source packages / collections of useful things
>> out there you could use to generate Z-curves for accumulo:
>>
>> GeoMesa - http://geomesa.github.io/,
>> https://github.com/locationtech/geomesa/
>> (Disclaimer: I'm a committer to GeoMesa)
>> Calrissian -
>> https://github.com/calrissian/accumulo-recipes/tree/master/store/geospatial-store
>> GeoWave - https://github.com/ngageoint/geowave
>>
>> GeoMesa currently supports accumulo 1.4 and 1.5. It has a GeoServer
>> plugin to facilitate analysis.
>>
>> Hunter
>> On 07/24/2014 10:10 AM, THORMAN, ROBERT D wrote:
>>
>> Can anyone share a Java method to convert lat/lon (decimal degrees) to
>> Z-Curve (string)?  I need to order my geo-spatial data into lexical order.
>>
>>  v/r
>> Bob Thorman
>> Principal Big Data Engineer
>> AT&T Big Data CoE
>> 2900 W. Plano Parkway
>> Plano, TX 75075
>> 972-658-1714
>>
>>
>>
>>

Re: Z-Curve/Hilbert Curve

Posted by Sean Busbey <bu...@cloudera.com>.
Hi Hunter!

Any plans to update GeoMesa for 1.6?

Could we convince y'all to contribute a Z-curve lexicoder for the core
project?

http://accumulo.apache.org/1.6/accumulo_user_manual.html#_lexicoders

-- 
Sean
On Jul 24, 2014 2:06 PM, "Hunter Provyn" <fh...@ccri.com> wrote:

>  Bob,
>
> There are a number of open source packages / collections of useful things
> out there you could use to generate Z-curves for accumulo:
>
> GeoMesa - http://geomesa.github.io/,
> https://github.com/locationtech/geomesa/
> (Disclaimer: I'm a committer to GeoMesa)
> Calrissian -
> https://github.com/calrissian/accumulo-recipes/tree/master/store/geospatial-store
> GeoWave - https://github.com/ngageoint/geowave
>
> GeoMesa currently supports accumulo 1.4 and 1.5. It has a GeoServer plugin
> to facilitate analysis.
>
> Hunter
> On 07/24/2014 10:10 AM, THORMAN, ROBERT D wrote:
>
> Can anyone share a Java method to convert lat/lon (decimal degrees) to
> Z-Curve (string)?  I need to order my geo-spatial data into lexical order.
>
>  v/r
> Bob Thorman
> Principal Big Data Engineer
> AT&T Big Data CoE
> 2900 W. Plano Parkway
> Plano, TX 75075
> 972-658-1714
>
>
>
>

Re: Z-Curve/Hilbert Curve

Posted by Hunter Provyn <fh...@ccri.com>.
Bob,

There are a number of open source packages / collections of useful
things out there you could use to generate Z-curves for Accumulo:

GeoMesa - http://geomesa.github.io/,
https://github.com/locationtech/geomesa/
(Disclaimer: I'm a committer to GeoMesa)
Calrissian -
https://github.com/calrissian/accumulo-recipes/tree/master/store/geospatial-store
GeoWave - https://github.com/ngageoint/geowave

GeoMesa currently supports Accumulo 1.4 and 1.5. It has a GeoServer
plugin to facilitate analysis.

Hunter
On 07/24/2014 10:10 AM, THORMAN, ROBERT D wrote:
> Can anyone share a Java method to convert lat/lon (decimal degrees) to 
> Z-Curve (string)?  I need to order my geo-spatial data into lexical 
> order.
>
> v/r
> Bob Thorman
> Principal Big Data Engineer
> AT&T Big Data CoE
> 2900 W. Plano Parkway
> Plano, TX 75075
> 972-658-1714
>
>


Re: Z-Curve/Hilbert Curve

Posted by Kurt Christensen <ho...@hoodel.com>.
The first thing that comes to my mind is bit shuffling. Scale and shift
lat and lon into [0,1) ranges, that is, from 0.000... to 0.111... in binary.
Then, starting just past the binary point, take the first lon bit (lon
first, since it has double the range of lat), then the first lat bit. Then
just alternate - second bits, third bits - until you have as much precision
as you like. There are discontinuities in the resulting curve, but any
point is easy to encode and decode, and it provides a 1-dimensional sort.
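A minimal sketch of that shuffle (an illustration only; the 24-bit
precision, WGS84 bounds, and binary-string output are assumptions, and
edge values like lon = 180 would need clamping):

// Illustration of the bit shuffling described above: quantize normalized lon/lat
// and interleave their bits (lon first) into a single sortable value.
public class ZCurve {

    /** Interleave `bits` bits of lon and lat (lon taking the higher bit of each pair). */
    public static long interleave(double lon, double lat, int bits) {
        long x = (long) ((lon + 180.0) / 360.0 * (1L << bits));  // lon -> [0, 2^bits)
        long y = (long) ((lat + 90.0) / 180.0 * (1L << bits));   // lat -> [0, 2^bits)
        long z = 0;
        for (int i = bits - 1; i >= 0; i--) {
            z = (z << 1) | ((x >> i) & 1);   // lon bit
            z = (z << 1) | ((y >> i) & 1);   // lat bit
        }
        return z;
    }

    /** Zero-padded binary string so the values also sort correctly as text. */
    public static String encodeToString(double lon, double lat, int bits) {
        String binary = Long.toBinaryString(interleave(lon, lat, bits));
        return "0".repeat(2 * bits - binary.length()) + binary;
    }

    public static void main(String[] args) {
        // Example: Plano, TX (approx). 24 bits per axis keeps the result in a long.
        System.out.println(encodeToString(-96.7970, 33.0198, 24));
    }
}

Because the string is fixed width and zero padded, plain lexicographic
ordering of the strings matches the numeric Z-order.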

I hope that helps.

Kurt

On 7/24/14 10:10 AM, THORMAN, ROBERT D wrote:
> Can anyone share a Java method to convert lat/lon (decimal degrees) to 
> Z-Curve (string)?  I need to order my geo-spatial data into lexical 
> order.
>
> v/r
> Bob Thorman
> Principal Big Data Engineer
> AT&T Big Data CoE
> 2900 W. Plano Parkway
> Plano, TX 75075
> 972-658-1714
>
>

-- 

Kurt Christensen
P.O. Box 811
Westminster, MD 21158-0811

------------------------------------------------------------------------
If you can't explain it simply, you don't understand it well enough. -- 
Albert Einstein