Posted to solr-dev@lucene.apache.org by Yonik Seeley <yo...@lucidimagination.com> on 2010/01/04 22:19:59 UTC

Re: svn commit: r895750 - /lucene/solr/trunk/src/java/org/apache/solr/search/function/distance/DistanceUtils.java

On Mon, Jan 4, 2010 at 2:29 PM,  <gs...@apache.org> wrote:
> +  public static final double KM_TO_MILES = 0.621371192;
> +  public static final double MILES_TO_KM = 1.609344;

I don't care if these exist, but what are your plans for actually using them?

For spatial search, it seems like we should simply standardize on
something, probably either meters or kilometers and be done with it.
It's trivial for clients to convert (and clients aren't end-users),
and will reduce confusion about how to specify units, etc.

Likewise for points/locations - they should simply be lat,lon in
degrees.  No need to specify if it's in radians or degrees when
degrees is more of an external standard and it's as simple for a
client to convert as it is to specify.
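
To illustrate how little the client has to do, here is a minimal sketch of client-side conversion, assuming the server standardizes on kilometers and degrees. The class and method names (ClientUnits, milesToKm, kmToMiles, radiansToDegrees) are hypothetical and not part of Solr; only the two constant values come from the commit quoted above.

```java
// Hypothetical client-side helper; nothing here is a Solr API.
final class ClientUnits {
    // Conversion factors, copied from the DistanceUtils commit quoted above.
    static final double KM_TO_MILES = 0.621371192;
    static final double MILES_TO_KM = 1.609344;

    private ClientUnits() {}

    // Convert a user-supplied distance in miles to the assumed standard unit (km).
    static double milesToKm(double miles) {
        return miles * MILES_TO_KM;
    }

    // Convert a distance returned in km back to miles for display.
    static double kmToMiles(double km) {
        return km * KM_TO_MILES;
    }

    // Convert a coordinate held in radians to degrees before sending it.
    static double radiansToDegrees(double radians) {
        return Math.toDegrees(radians);
    }
}
```

A client that thinks in miles would call ClientUnits.milesToKm(10.0) before building its request and ClientUnits.kmToMiles(...) on any distance it gets back, so no units parameter is needed on the wire.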

-Yonik
http://www.lucidimagination.com

Re: svn commit: r895750 - /lucene/solr/trunk/src/java/org/apache/solr/search/function/distance/DistanceUtils.java

Posted by Grant Ingersoll <gs...@apache.org>.
On Jan 4, 2010, at 5:30 PM, Yonik Seeley wrote:

> On Mon, Jan 4, 2010 at 5:07 PM, Grant Ingersoll <gs...@apache.org> wrote:
>> 
>> On Jan 4, 2010, at 4:19 PM, Yonik Seeley wrote:
>> 
>>> On Mon, Jan 4, 2010 at 2:29 PM,  <gs...@apache.org> wrote:
>>>> +  public static final double KM_TO_MILES = 0.621371192;
>>>> +  public static final double MILES_TO_KM = 1.609344;
>>> 
>>> I don't care if these exist, but what are your plans for actually using them?
>> 
>> Probably premature to commit on my part; I was working on SOLR-1568 and was allowing the user to pass in the units for the distance value.
> 
> I still think it's no simpler for a client, and more complex overall.
> You either must require units to be passed in (yuck) or decide on
> default units.  Once you have decided on default units, extra
> parameters for different units is just increased complexity that is
> just as trivial for the client to implement.  They either have to know
> the code for what units they are using or they have to know how to
> convert to the standard units - about the same amount of complexity.
> 
>>> For spatial search, it seems like we should simply standardize on
>>> something, probably either meters or kilometers and be done with it.
>>> It's trivial for clients to convert (and clients aren't end-users),
>>> and will reduce confusion about how to specify units, etc.
>>> 
>>> Likewise for points/locations - they should simply be lat,lon in
>>> degrees.  No need to specify if it's in radians or degrees when
>>> degrees is more of an external standard and it's as simple for a
>>> client to convert as it is to specify.
>> 
>> Possibly, except you can save a few operations per document if you just store radians when using haversine.
> 
> A single multiply (~3cycles?).  If that's worth saving, we should just
> index it that way for the user...

Sure, point type could have an init parameter, I suppose, that specified whether to convert.  Or, the user can just send it in radians to begin with.  What I want as a designer is to specify it up front based on the type of accuracy I want out of my distances.  To me, that's what it all comes back to.  The app designer doing spatial says:  how accurate do I need my calculations to be?  Then, they make decisions about data structures based on that.

> but given the computational cost of
> haversine, it's really in the noise... we should figure out other ways
> to speed things up.
> 

Times 20-100 million records to score/filter?  Not a huge amount of savings, but still could be worthwhile for some applications under high load and w/ lots of docs without costing anyone else anything different. 


> A location in the xml, when using our built-in field types should be
> unambiguously degrees in lat,lon format.  How it's indexed to increase
> speed, save space, etc, is up to the field type and its
> configuration.

Actually, it is unambiguous as x,y(,z...).  We have points in an n-dimensional space, as of now, but we can add lat/lon specifically if that helps.

> 
>> I'm just not sure I see this as a big deal.  Technically, we could hide all the complexity of numerics from the user too, but yet we offer ints, floats and doubles (we could parse them on our side and figure out which is what).
> 
> But we do hide the complexity of numerics from the user (clients) as
> much as we can.  popularity:10 popularity:[5 TO 10] all work without
> the client knowing what kind of numeric field is being used (with the
> exception of plain numerics which are offered only for compatibility
> with existing lucene indexes).

I just don't get why normal spatial calculations are any different from other function queries, with the exception, right now, of tiles.  Perhaps if in the future we have other complex types that require one-offs, then we can unify on hiding all of this, but for now the _only_ thing that doesn't work out of the box is tiles.  All the rest can be handled through function queries and the FunctionRangeQParser.  I don't see much benefit in writing/maintaining code that is very marginally more readable than using function queries and will be templated in an application anyway and then left to do its job.

-Grant

Re: svn commit: r895750 - /lucene/solr/trunk/src/java/org/apache/solr/search/function/distance/DistanceUtils.java

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Mon, Jan 4, 2010 at 5:07 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
> On Jan 4, 2010, at 4:19 PM, Yonik Seeley wrote:
>
>> On Mon, Jan 4, 2010 at 2:29 PM,  <gs...@apache.org> wrote:
>>> +  public static final double KM_TO_MILES = 0.621371192;
>>> +  public static final double MILES_TO_KM = 1.609344;
>>
>> I don't care if these exist, but what are your plans for actually using them?
>
> Probably premature to commit on my part; I was working on SOLR-1568 and was allowing the user to pass in the units for the distance value.

I still think it's no simpler for a client, and more complex overall.
You either must require units to be passed in (yuck) or decide on
default units.  Once you have decided on default units, extra
parameters for different units is just increased complexity that is
just as trivial for the client to implement.  They either have to know
the code for what units they are using or they have to know how to
convert to the standard units - about the same amount of complexity.

>> For spatial search, it seems like we should simply standardize on
>> something, probably either meters or kilometers and be done with it.
>> It's trivial for clients to convert (and clients aren't end-users),
>> and will reduce confusion about how to specify units, etc.
>>
>> Likewise for points/locations - they should simply be lat,lon in
>> degrees.  No need to specify if it's in radians or degrees when
>> degrees is more of an external standard and it's as simple for a
>> client to convert as it is to specify.
>
> Possibly, except you can save a few operations per document if you just store radians when using haversine.

A single multiply (~3cycles?).  If that's worth saving, we should just
index it that way for the user... but given the computational cost of
haversine, it's really in the noise... we should figure out other ways
to speed things up.
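
For reference, a minimal haversine sketch that shows where the degrees-to-radians conversion sits relative to the rest of the work. This is not the DistanceUtils code; the class name HaversineSketch, both method names, and the 6371 km mean Earth radius are assumptions made for illustration.

```java
// Illustrative only; not the Solr DistanceUtils implementation.
final class HaversineSketch {
    // Assumed mean Earth radius in kilometers.
    static final double EARTH_RADIUS_KM = 6371.0;

    private HaversineSketch() {}

    // Great-circle distance for coordinates already expressed in radians.
    static double haversineRadians(double lat1, double lon1,
                                   double lat2, double lon2, double radius) {
        double sinHalfLat = Math.sin((lat2 - lat1) / 2.0);
        double sinHalfLon = Math.sin((lon2 - lon1) / 2.0);
        double a = sinHalfLat * sinHalfLat
                 + Math.cos(lat1) * Math.cos(lat2) * sinHalfLon * sinHalfLon;
        // Clamp to 1.0 to guard against rounding pushing sqrt(a) above 1.
        return 2.0 * radius * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Same computation for coordinates in degrees: the only extra work
    // is one multiply per coordinate inside Math.toRadians.
    static double haversineDegrees(double lat1, double lon1,
                                   double lat2, double lon2, double radius) {
        return haversineRadians(Math.toRadians(lat1), Math.toRadians(lon1),
                                Math.toRadians(lat2), Math.toRadians(lon2), radius);
    }
}
```

The degrees variant adds only the toRadians multiplies on top of several trigonometric calls, a square root and an arcsine, which is the sense in which the conversion is "in the noise" here.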

A location in the xml, when using our built-in field types should be
unambiguously degrees in lat,lon format.  How it's indexed to increase
speed, save space, etc, is up to the field type and its
configuration.

> I'm just not sure I see this as a big deal.  Technically, we could hide all the complexity of numerics from the user too, but yet we offer ints, floats and doubles (we could parse them on our side and figure out which is what).

But we do hide the complexity of numerics from the user (clients) as
much as we can.  popularity:10 popularity:[5 TO 10] all work without
the client knowing what kind of numeric field is being used (with the
exception of plain numerics which are offered only for compatibility
with existing lucene indexes).

>  I'm more of the mindset that I think the app designer should be able to make the choice, but possibly with some guidance from us as to what is appropriate for each situation, just as we do with other field types.

Yes, absolutely.  The app *designer* can make the choices and use the
appropriate field types and config, and we should isolate clients from
those choices (and changes in those choices) to the degree that it's
practical.  That's what we currently do.

-Yonik
http://www.lucidimagination.com

Re: svn commit: r895750 - /lucene/solr/trunk/src/java/org/apache/solr/search/function/distance/DistanceUtils.java

Posted by Grant Ingersoll <gs...@apache.org>.
On Jan 4, 2010, at 4:19 PM, Yonik Seeley wrote:

> On Mon, Jan 4, 2010 at 2:29 PM,  <gs...@apache.org> wrote:
>> +  public static final double KM_TO_MILES = 0.621371192;
>> +  public static final double MILES_TO_KM = 1.609344;
> 
> I don't care if these exist, but what are your plans for actually using them?

Probably premature to commit on my part; I was working on SOLR-1568 and was allowing the user to pass in the units for the distance value.

> 
> For spatial search, it seems like we should simply standardize on
> something, probably either meters or kilometers and be done with it.
> It's trivial for clients to convert (and clients aren't end-users),
> and will reduce confusion about how to specify units, etc.
> 
> Likewise for points/locations - they should simply be lat,lon in
> degrees.  No need to specify if it's in radians or degrees when
> degrees is more of an external standard and it's as simple for a
> client to convert as it is to specify.

Possibly, except you can save a few operations per document if you just store radians when using haversine. 

I'm just not sure I see this as a big deal.  Technically, we could hide all the complexity of numerics from the user too, but yet we offer ints, floats and doubles (we could parse them on our side and figure out which is what).  I'm more of the mindset that I think the app designer should be able to make the choice, but possibly with some guidance from us as to what is appropriate for each situation, just as we do with other field types.