Posted to dev@lucene.apache.org by Yonik Seeley <yo...@lucidimagination.com> on 2010/05/02 19:32:31 UTC

BytesRef comparable

Any objections to making BytesRef comparable?  It would make it much
easier to use with containers that don't take comparators as
parameters.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague
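
The convenience being asked for can be sketched as follows. This is a minimal illustration using a hypothetical `Bytes` stand-in, not Lucene's actual `BytesRef`; the unsigned byte-wise ordering is an assumption taken from later messages in this thread. A `java.util.TreeSet` constructed without a comparator only works when its elements implement `Comparable`.

```java
import java.util.TreeSet;

public class ComparableBytesDemo {
    /** Hypothetical stand-in for BytesRef; not the Lucene class. */
    static final class Bytes implements Comparable<Bytes> {
        final byte[] data;

        Bytes(byte... data) {
            this.data = data;
        }

        /** Compares byte by byte, treating each byte as unsigned (0..255). */
        @Override
        public int compareTo(Bytes other) {
            int n = Math.min(data.length, other.data.length);
            for (int i = 0; i < n; i++) {
                int diff = (data[i] & 0xFF) - (other.data[i] & 0xFF);
                if (diff != 0) {
                    return diff;
                }
            }
            // An array that is a strict prefix of the other sorts first.
            return data.length - other.data.length;
        }
    }

    /** Builds a comparator-less TreeSet and returns the smallest element's first byte. */
    static int smallestFirstByte() {
        TreeSet<Bytes> set = new TreeSet<>(); // no Comparator passed
        set.add(new Bytes((byte) 0x80));
        set.add(new Bytes((byte) 0x7F));
        set.add(new Bytes((byte) 0x01));
        return set.first().data[0] & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(smallestFirstByte());
    }
}
```

Without `implements Comparable<Bytes>`, the first `add` call would fail with a `ClassCastException`, which is the friction the proposal aims to remove.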

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: BytesRef comparable

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Mon, May 3, 2010 at 9:02 AM, Robert Muir <rc...@gmail.com> wrote:

> In my opinion, BytesRef should be comparable, with the default being
> 'natural' (unsigned/codepoint order).

+1, once we cutover standard codec to this too.

> And you can still specify a crazy comparator in your codec, but not all
> things would work (it's an expert customization)

Right.

Mike

Re: BytesRef comparable

Posted by Robert Muir <rc...@gmail.com>.

On Mon, May 3, 2010 at 8:55 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

>
> I thought we were going to be changing lucene's index order to the
> natural byte order (same as the unicode code point order)?
>
> Solr's BoundedTreeSet doesn't take a comparator.  I could change it so
> that it could of course... but it just seemed natural to "fix"
> BytesRef.
>
>
In my opinion, BytesRef should be comparable, with the default being
'natural' (unsigned/codepoint order).
And you can still specify a crazy comparator in your codec, but not all
things would work (it's an expert customization).

I mean, things like automaton won't work if you change the sort order to
something wacky anyway... so there's nothing strange about Solr's
BoundedTreeSet not using a custom codec comparator?

-- 
Robert Muir
rcmuir@gmail.com

Re: BytesRef comparable

Posted by Robert Muir <rc...@gmail.com>.

On Mon, May 3, 2010 at 9:07 AM, Uwe Schindler <uw...@thetaphi.de> wrote:

> In current flex, the TermsEnum has a method called getComparator() that
> returns the comparator used in this TermsEnum. This is the reason behind it.
> If we change this to only support natural byte[] ordering (of course
> unsigned, die, die Java's signedness of byte!), and codecs should not
> support other ordering in future (In my opinion this would break most MTQs
> depending on the order like range, fuzzy, wildcard,*), it's obsolete.
>

 Yeah even simpler ones like PrefixQuery will break if you were to change
the sort order to say, sort backwards... so I think this is a very expert
customization.

-- 
Robert Muir
rcmuir@gmail.com
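
Why a prefix-style scan depends on the term order can be sketched with plain strings. The example below is an illustration only: it uses a `TreeSet<String>` as a stand-in for a sorted term dictionary and a hypothetical `prefixScan` helper, not Lucene's `PrefixQuery` or `TermsEnum`. The scan seeks to the prefix and reads forward until a term stops matching, which is only correct when every matching term sorts at or after the prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class PrefixScanDemo {
    /**
     * Seeks to the prefix and reads forward until a term no longer matches.
     * Correct only if all matching terms sort at or after the prefix and
     * are contiguous, as they are in natural order.
     */
    static List<String> prefixScan(TreeSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    /** Builds the same small term set under the given sort order. */
    static TreeSet<String> build(Comparator<String> order) {
        TreeSet<String> terms = new TreeSet<>(order);
        terms.addAll(Arrays.asList("aa", "ab", "abc", "abd", "b"));
        return terms;
    }

    public static void main(String[] args) {
        // Natural order finds every term starting with "ab".
        System.out.println(prefixScan(build(Comparator.naturalOrder()), "ab"));
        // Reversed order puts "abc" and "abd" before "ab", so the scan misses them.
        System.out.println(prefixScan(build(Comparator.reverseOrder()), "ab"));
    }
}
```

Under the reversed order the scan returns only the exact term `ab`, because the longer matches sort before the seek position.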

Re: BytesRef comparable

Posted by Michael McCandless <lu...@mikemccandless.com>.

Yeah it can return null -- eg MultiTermsEnum does this if it has 0 subs.

I suppose we could have it return a fake comparator, since if there
are no terms how would it matter... but that makes me nervous...

Mike
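
One way a caller might guard against a null comparator is sketched below. This is an assumption-laden illustration: `TermSource` and `comparatorOrDefault` are hypothetical names, not part of the flex API, and the sketch deliberately leaves the fallback choice to the caller rather than hiding a "fake" comparator inside the enum.

```java
import java.util.Comparator;

public class NullComparatorDemo {
    /** Hypothetical stand-in for a term source whose comparator may be null. */
    interface TermSource {
        Comparator<String> getComparator(); // may return null, e.g. when there are no terms
    }

    /** Returns the source's comparator, or the given fallback if it is null. */
    static Comparator<String> comparatorOrDefault(TermSource source, Comparator<String> fallback) {
        Comparator<String> c = source.getComparator();
        return c != null ? c : fallback;
    }

    public static void main(String[] args) {
        TermSource empty = () -> null; // models a source with zero sub-enums
        Comparator<String> c = comparatorOrDefault(empty, Comparator.naturalOrder());
        System.out.println(c.compare("a", "b") < 0);
    }
}
```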

On Mon, May 3, 2010 at 9:46 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> Is it allowed to return null? - oh no. In Lucene's Codecs it does not do this, maybe it’s a Javadoc relic! Mike?
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
>> Seeley
>> Sent: Monday, May 03, 2010 3:31 PM
>> To: dev@lucene.apache.org
>> Subject: Re: BytesRef comparable
>>
>> On Mon, May 3, 2010 at 9:07 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
>> > In current flex, the TermsEnum has a method called getComparator()
>> that returns the comparator used in this TermsEnum.
>>
>> Gah.... more code that "may return null".
>> Anyway, I made a mistake... Solr's BoundedTreeSet does accept a
>> comparator.
>>
>> -Yonik
>> Apache Lucene Eurocon 2010
>> 18-21 May 2010 | Prague

RE: BytesRef comparable

Posted by Uwe Schindler <uw...@thetaphi.de>.

Is it allowed to return null? - oh no. In Lucene's Codecs it does not do this, maybe it’s a Javadoc relic! Mike?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
> Seeley
> Sent: Monday, May 03, 2010 3:31 PM
> To: dev@lucene.apache.org
> Subject: Re: BytesRef comparable
> 
> On Mon, May 3, 2010 at 9:07 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> > In current flex, the TermsEnum has a method called getComparator()
> that returns the comparator used in this TermsEnum.
> 
> Gah.... more code that "may return null".
> Anyway, I made a mistake... Solr's BoundedTreeSet does accept a
> comparator.
> 
> -Yonik
> Apache Lucene Eurocon 2010
> 18-21 May 2010 | Prague

Re: BytesRef comparable

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Mon, May 3, 2010 at 9:07 AM, Uwe Schindler <uw...@thetaphi.de> wrote:
> In current flex, the TermsEnum has a method called getComparator() that returns the comparator used in this TermsEnum.

Gah.... more code that "may return null".
Anyway, I made a mistake... Solr's BoundedTreeSet does accept a comparator.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague

RE: BytesRef comparable

Posted by Uwe Schindler <uw...@thetaphi.de>.

In current flex, the TermsEnum has a method called getComparator() that returns the comparator used in this TermsEnum. This is the reason behind it. If we change this to only support natural byte[] ordering (of course unsigned, die, die Java's signedness of byte!), and codecs should not support other orderings in future (in my opinion this would break most MTQs depending on the order like range, fuzzy, wildcard,*), it's obsolete.

But with current code the BoundedTreeSet should take the comparator provided by the TermsEnum.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
> Seeley
> Sent: Monday, May 03, 2010 2:56 PM
> To: dev@lucene.apache.org
> Subject: Re: BytesRef comparable
> 
> On Mon, May 3, 2010 at 6:30 AM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
> > The problem is BytesRef is not really a concrete object.  It can't
> > know how the terms it's representing are supposed to sort.
> > Yet nearly all the time this sort will be lucene's default term sort
> > (only custom codecs can change this), so I'm +1 on making BytesRef
> > sort according to that (note that this is not actually natural byte[]
> > order, because we must interp the UTF8 bytes as unsigned to sort in
> > unicode code point order).
> 
> I thought we were going to be changing lucene's index order to the
> natural byte order (same as the unicode code point order)?
> 
> Solr's BoundedTreeSet doesn't take a comparator.  I could change it so
> that it could of course... but it just seemed natural to "fix"
> BytesRef.
> 
> -Yonik
> Apache Lucene Eurocon 2010
> 18-21 May 2010 | Prague

Re: BytesRef comparable

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Mon, May 3, 2010 at 6:30 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> The problem is BytesRef is not really a concrete object.  It can't
> know how the terms it's representing are supposed to sort.
> Yet nearly all the time this sort will be lucene's default term sort
> (only custom codecs can change this), so I'm +1 on making BytesRef
> sort according to that (note that this is not actually natural byte[]
> order, because we must interp the UTF8 bytes as unsigned to sort in
> unicode code point order).

I thought we were going to be changing lucene's index order to the
natural byte order (same as the unicode code point order)?

Solr's BoundedTreeSet doesn't take a comparator.  I could change it so
that it could of course... but it just seemed natural to "fix"
BytesRef.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague

RE: BytesRef comparable

Posted by Uwe Schindler <uw...@thetaphi.de>.

4.0 will not be backwards compatible, so if you do not convert the fields with numerics, they can no longer be queried with NRQ; people then have to reindex, or we provide a tool to convert.

If we change stopwords there is no way around reindexing, but what we can change we should. Indexes should at least be readable with newer versions, so also the numeric values in older indexes.

But as Mike said, we should give the index conversion utility some options to turn on specific conversions.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
> Seeley
> Sent: Monday, May 03, 2010 7:25 PM
> To: dev@lucene.apache.org
> Subject: Re: BytesRef comparable
> 
> On Mon, May 3, 2010 at 1:19 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> > We will provide it for 4.0 for the index order change and possibly to
> convert NumericFields to full-width byte[] (only problem: how to detect
> them without strange heuristics?)!
> 
> I don't think numerics should be changed (by default at least)...
> people could legitimately want to use the old version.  Just like if
> stemming or stopwords or anything else changed, that's more up to the
> user.
> 
> -Yonik
> Apache Lucene Eurocon 2010
> 18-21 May 2010 | Prague

Re: BytesRef comparable

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Mon, May 3, 2010 at 1:19 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> We will provide it for 4.0 for the index order change and possibly to convert NumericFields to full-width byte[] (only problem: how to detect them without strange heuristics?)!

I don't think numerics should be changed (by default at least)...
people could legitimately want to use the old version.  Just like if
stemming or stopwords or anything else changed, that's more up to the
user.

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague

Re: BytesRef comparable

Posted by Michael McCandless <lu...@mikemccandless.com>.

I think the tool would require you to specify which [numeric] fields
need conversion?

And maybe also for collated fields?

Mike

On Mon, May 3, 2010 at 1:19 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
> We will provide it for 4.0 for the index order change and possibly to convert NumericFields to full-width byte[] (only problem: how to detect them without strange heuristics?)!
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
>> -----Original Message-----
>> From: Shai Erera [mailto:serera@gmail.com]
>> Sent: Monday, May 03, 2010 6:09 PM
>> To: dev@lucene.apache.org
>> Subject: Re: BytesRef comparable
>>
>> Trunk matters here only if this changes within the same major release.
>> If sort order changes between 4.0 and 5.0, we'll need to provide an
>> index migration tool or something, because such a change does not
>> justify reindexing.
>>
>> Shai
>>
>> On Monday, May 3, 2010, Michael McCandless <lu...@mikemccandless.com>
>> wrote:
>> > On Mon, May 3, 2010 at 10:37 AM, Yonik Seeley
>> > <yo...@lucidimagination.com> wrote:
>> >> On Mon, May 3, 2010 at 6:30 AM, Michael McCandless
>> >> <lu...@mikemccandless.com> wrote:
>> >>> But... we probably should do this after we switch to sorting terms
>> by
>> >>> unicode code point order?  (LUCENE-2426)
>> >>
>> >> So perhaps we should define this in terms of Lucene's default index
>> >> order instead... so it could be UTF16 order for now, and switch to
>> >> code-point / unsigned byte order when the index format does?
>> >
>> > This seems a bit dangerous (first sorts by UTF16 order and then by
>> > codepoint order, when we switch)... ie, I'd rather make BytesRef
>> > compare by code point order, after we've cutover to that order by
>> > default.
>> >
>> > Though it is "trunk", so, we could do that I guess... (Erik: this is
>> > what "unstable" means ;) ).
>> >
>> > Mike

RE: BytesRef comparable

Posted by Uwe Schindler <uw...@thetaphi.de>.

We will provide it for 4.0 for the index order change and possibly to convert NumericFields to full-width byte[] (only problem: how to detect them without strange heuristics?)!

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Shai Erera [mailto:serera@gmail.com]
> Sent: Monday, May 03, 2010 6:09 PM
> To: dev@lucene.apache.org
> Subject: Re: BytesRef comparable
> 
> Trunk matters here only if this changes within the same major release.
> If sort order changes between 4.0 and 5.0, we'll need to provide an
> index migration tool or something, because such a change does not
> justify reindexing.
> 
> Shai
> 
> On Monday, May 3, 2010, Michael McCandless <lu...@mikemccandless.com>
> wrote:
> > On Mon, May 3, 2010 at 10:37 AM, Yonik Seeley
> > <yo...@lucidimagination.com> wrote:
> >> On Mon, May 3, 2010 at 6:30 AM, Michael McCandless
> >> <lu...@mikemccandless.com> wrote:
> >>> But... we probably should do this after we switch to sorting terms
> by
> >>> unicode code point order?  (LUCENE-2426)
> >>
> >> So perhaps we should define this in terms of Lucene's default index
> >> order instead... so it could be UTF16 order for now, and switch to
> >> code-point / unsigned byte order when the index format does?
> >
> > This seems a bit dangerous (first sorts by UTF16 order and then by
> > codepoint order, when we switch)... ie, I'd rather make BytesRef
> > compare by code point order, after we've cutover to that order by
> > default.
> >
> > Though it is "trunk", so, we could do that I guess... (Erik: this is
> > what "unstable" means ;) ).
> >
> > Mike

Re: BytesRef comparable

Posted by Shai Erera <se...@gmail.com>.

Trunk matters here only if this changes within the same major release.
If sort order changes between 4.0 and 5.0, we'll need to provide an
index migration tool or something, because such a change does not
justify reindexing.

Shai

On Monday, May 3, 2010, Michael McCandless <lu...@mikemccandless.com> wrote:
> On Mon, May 3, 2010 at 10:37 AM, Yonik Seeley
> <yo...@lucidimagination.com> wrote:
>> On Mon, May 3, 2010 at 6:30 AM, Michael McCandless
>> <lu...@mikemccandless.com> wrote:
>>> But... we probably should do this after we switch to sorting terms by
>>> unicode code point order?  (LUCENE-2426)
>>
>> So perhaps we should define this in terms of Lucene's default index
>> order instead... so it could be UTF16 order for now, and switch to
>> code-point / unsigned byte order when the index format does?
>
> This seems a bit dangerous (first sorts by UTF16 order and then by
> codepoint order, when we switch)... ie, I'd rather make BytesRef
> compare by code point order, after we've cutover to that order by
> default.
>
> Though it is "trunk", so, we could do that I guess... (Erik: this is
> what "unstable" means ;) ).
>
> Mike

Re: BytesRef comparable

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Mon, May 3, 2010 at 10:37 AM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Mon, May 3, 2010 at 6:30 AM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>> But... we probably should do this after we switch to sorting terms by
>> unicode code point order?  (LUCENE-2426)
>
> So perhaps we should define this in terms of Lucene's default index
> order instead... so it could be UTF16 order for now, and switch to
> code-point / unsigned byte order when the index format does?

This seems a bit dangerous (first sorts by UTF16 order and then by
codepoint order, when we switch)... ie, I'd rather make BytesRef
compare by code point order, after we've cutover to that order by
default.

Though it is "trunk", so, we could do that I guess... (Erik: this is
what "unstable" means ;) ).

Mike

Re: BytesRef comparable

Posted by Yonik Seeley <yo...@lucidimagination.com>.

On Mon, May 3, 2010 at 6:30 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> But... we probably should do this after we switch to sorting terms by
> unicode code point order?  (LUCENE-2426)

So perhaps we should define this in terms of Lucene's default index
order instead... so it could be UTF16 order for now, and switch to
code-point / unsigned byte order when the index format does?

-Yonik
Apache Lucene Eurocon 2010
18-21 May 2010 | Prague

Re: BytesRef comparable

Posted by Michael McCandless <lu...@mikemccandless.com>.

I agree -- objects that directly impl Comparable are great.

The problem is BytesRef is not really a concrete object.  It can't
know how the terms it's representing are supposed to sort.

Yet nearly all the time this sort will be lucene's default term sort
(only custom codecs can change this), so I'm +1 on making BytesRef
sort according to that (note that this is not actually natural byte[]
order, because we must interp the UTF8 bytes as unsigned to sort in
unicode code point order).  The expert users of custom codecs that
alter their sort order can be expected to pass their own
Comparators...

But... we probably should do this after we switch to sorting terms by
unicode code point order?  (LUCENE-2426)

Mike
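
The signedness point above can be sketched in a few lines. This is an illustration under stated assumptions, not Lucene code: it uses `Arrays.compare` and `Arrays.compareUnsigned` from the JDK (Java 9 or later), and the helper names are hypothetical. Java's `byte` is signed, so any UTF-8 byte of 0x80 or above is negative and would sort before ASCII unless the bytes are compared as unsigned values.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class UnsignedOrderDemo {
    /** Signed comparison: bytes >= 0x80 are negative in Java and sort first. */
    static int signedCompare(String a, String b) {
        return Arrays.compare(utf8(a), utf8(b));
    }

    /** Unsigned comparison: matches Unicode code point order for valid UTF-8. */
    static int unsignedCompare(String a, String b) {
        return Arrays.compareUnsigned(utf8(a), utf8(b));
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = "z";          // U+007A, UTF-8 byte 0x7A
        String accented = "\u00e9";  // U+00E9, UTF-8 bytes 0xC3 0xA9
        // Signed: 0x7A (122) compares greater than 0xC3 (-61), so "z" sorts last.
        System.out.println(signedCompare(ascii, accented) > 0);
        // Unsigned: 0x7A is less than 0xC3, which agrees with code point order.
        System.out.println(unsignedCompare(ascii, accented) < 0);
    }
}
```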

On Mon, May 3, 2010 at 6:15 AM, Shai Erera <se...@gmail.com> wrote:
> I don't know what Yonik's specific use case is, but I generally like objects
> that are Comparable, rather than passing around Comparators. For example, I
> think that if ScoreDoc was comparable, fewer people would need to extend PQ
> as well as ScoreDoc. They could just impl their ScoreDocExt sort logic ...
>
> Comparators however give you more flexibility, if e.g. you want to compare
> the same objects using different criteria.
>
> You can still do both - the object controls its "natural" order while
> external comparators can sort differently.
>
> It also depends whether you can pass the Comparator down the call stack or
> not.
>
> I myself am for having objects implement Comparable (when it makes sense)
> and also open up the use of Comparator if that really is needed.
>
> Shai
>
> On Mon, May 3, 2010 at 12:36 PM, Michael McCandless
> <lu...@mikemccandless.com> wrote:
>>
>> It used to implement Comparable (hardwired to natural byte[] order),
>> but I removed it, so that all comparisons are forced to be explicit.
>> The problem is... it's dangerous to assume comparison in natural order
>> is always correct.  Eg Lucene today sorts terms using
>> UTF8SortedAsUTF16Comparator (which is not natural byte[] ordering).
>>
>> Of course we are moving away from this, so terms will by default be
>> sorted in Unicode code point order (which matches UTF8 byte[] natural
>> order), under LUCENE-2426, but a codec could still customize the sort
>> order.
>>
>> We could still put it back, hardwired to natural byte order?  And
>> javadoc the dangers...
>>
>> Or I guess we could tell each BytesRef the comparator it should
>> delegate to, but that's rather inefficient.
>>
>> Where are you needing to compare BytesRefs?  And which container won't
>> accept Comparator...?
>>
>> Mike
>>
>> On Sun, May 2, 2010 at 1:32 PM, Yonik Seeley <yo...@lucidimagination.com>
>> wrote:
>> > Any objections to making BytesRef comparable?  It would make it much
>> > easier to use with containers that don't take comparators as
>> > parameters.
>> >
>> > -Yonik
>> > Apache Lucene Eurocon 2010
>> > 18-21 May 2010 | Prague

Re: BytesRef comparable

Posted by Shai Erera <se...@gmail.com>.

I don't know what Yonik's specific use case is, but I generally like objects
that are Comparable, rather than passing around Comparators. For example, I
think that if ScoreDoc was comparable, fewer people would need to extend PQ
as well as ScoreDoc. They could just impl their ScoreDocExt sort logic ...

Comparators however give you more flexibility, if e.g. you want to compare
the same objects using different criteria.

You can still do both - the object controls its "natural" order while
external comparators can sort differently.

It also depends whether you can pass the Comparator down the call stack or
not.

I myself am for having objects implement Comparable (when it makes sense)
and also open up the use of Comparator if that really is needed.

Shai
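
The "do both" idea above can be sketched as follows. The `Hit` class is a hypothetical stand-in and not Lucene's `ScoreDoc`; the particular natural order (higher score first) is an assumption chosen for illustration. The object defines its natural order through `Comparable`, while an external `Comparator` sorts the same objects by a different criterion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NaturalAndExternalOrderDemo {
    /** Hypothetical hit type; not Lucene's ScoreDoc. */
    static final class Hit implements Comparable<Hit> {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }

        /** Natural order: higher score first, ties broken by lower doc id. */
        @Override
        public int compareTo(Hit other) {
            int byScore = Float.compare(other.score, score);
            return byScore != 0 ? byScore : Integer.compare(doc, other.doc);
        }
    }

    /** External comparator: order by doc id only, ignoring score. */
    static final Comparator<Hit> BY_DOC = Comparator.comparingInt(h -> h.doc);

    /** Sorts a fixed list of hits and returns the doc ids in the resulting order. */
    static List<Integer> docsInOrder(Comparator<Hit> order) {
        List<Hit> hits = new ArrayList<>(Arrays.asList(
            new Hit(3, 0.5f), new Hit(1, 0.9f), new Hit(2, 0.1f)));
        hits.sort(order); // a null comparator means "use the natural order"
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) {
            docs.add(h.doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(docsInOrder(null));   // natural order, by score
        System.out.println(docsInOrder(BY_DOC)); // external order, by doc id
    }
}
```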

On Mon, May 3, 2010 at 12:36 PM, Michael McCandless <
lucene@mikemccandless.com> wrote:

> It used to implement Comparable (hardwired to natural byte[] order),
> but I removed it, so that all comparisons are forced to be explicit.
> The problem is... it's dangerous to assume comparison in natural order
> is always correct.  Eg Lucene today sorts terms using
> UTF8SortedAsUTF16Comparator (which is not natural byte[] ordering).
>
> Of course we are moving away from this, so terms will by default be
> sorted in Unicode code point order (which matches UTF8 byte[] natural
> order), under LUCENE-2426, but a codec could still customize the sort
> order.
>
> We could still put it back, hardwired to natural byte order?  And
> javadoc the dangers...
>
> Or I guess we could tell each BytesRef the comparator it should
> delegate to, but that's rather inefficient.
>
> Where are you needing to compare BytesRefs?  And which container won't
> accept Comparator...?
>
> Mike
>
> On Sun, May 2, 2010 at 1:32 PM, Yonik Seeley <yo...@lucidimagination.com>
> wrote:
> > Any objections to making BytesRef comparable?  It would make it much
> > easier to use with containers that don't take comparators as
> > parameters.
> >
> > -Yonik
> > Apache Lucene Eurocon 2010
> > 18-21 May 2010 | Prague

Re: BytesRef comparable

Posted by Michael McCandless <lu...@mikemccandless.com>.

It used to implement Comparable (hardwired to natural byte[] order),
but I removed it, so that all comparisons are forced to be explicit.
The problem is... it's dangerous to assume comparison in natural order
is always correct.  Eg Lucene today sorts terms using
UTF8SortedAsUTF16Comparator (which is not natural byte[] ordering).

Of course we are moving away from this, so terms will by default be
sorted in Unicode code point order (which matches UTF8 byte[] natural
order), under LUCENE-2426, but a codec could still customize the sort
order.

We could still put it back, hardwired to natural byte order?  And
javadoc the dangers...

Or I guess we could tell each BytesRef the comparator it should
delegate to, but that's rather inefficient.

Where are you needing to compare BytesRefs?  And which container won't
accept Comparator...?

Mike
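
The difference between UTF-16 order and code point order only shows up for characters outside the Basic Multilingual Plane. The sketch below is an illustration using JDK calls only (`String.compareTo` and `Arrays.compareUnsigned`, the latter requiring Java 9 or later); it does not reproduce `UTF8SortedAsUTF16Comparator`, and the helper names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16VsCodePointDemo {
    /** Order by UTF-16 code units, which is what String.compareTo uses. */
    static int utf16Order(String a, String b) {
        return a.compareTo(b);
    }

    /** Order by unsigned UTF-8 bytes, which equals Unicode code point order. */
    static int utf8Order(String a, String b) {
        return Arrays.compareUnsigned(
            a.getBytes(StandardCharsets.UTF_8),
            b.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String bmp = "\uFFFD";                 // U+FFFD, a single UTF-16 code unit
        String supplementary = "\uD800\uDC00"; // U+10000, encoded as a surrogate pair
        // UTF-16: 0xFFFD is greater than the high surrogate 0xD800.
        System.out.println(utf16Order(bmp, supplementary) > 0);
        // UTF-8: lead byte 0xEF is less than lead byte 0xF0, as in code point order.
        System.out.println(utf8Order(bmp, supplementary) < 0);
    }
}
```

The two orders disagree on this pair, which is why switching the default term order is an index format change rather than a cosmetic one.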

On Sun, May 2, 2010 at 1:32 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
> Any objections to making BytesRef comparable?  It would make it much
> easier to use with containers that don't take comparators as
> parameters.
>
> -Yonik
> Apache Lucene Eurocon 2010
> 18-21 May 2010 | Prague

Re: BytesRef comparable

Posted by Simon Willnauer <si...@googlemail.com>.

+1 sounds good

On Sun, May 2, 2010 at 7:59 PM, Shai Erera <se...@gmail.com> wrote:

> +1
>
> Shai
>
>
> On Sun, May 2, 2010 at 8:58 PM, Uwe Schindler <uw...@thetaphi.de> wrote:
>
>> When we do the refactoring in flex to remove the comparator, this would be
>> the first that comes to my mind.
>>
>> +1
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>> > -----Original Message-----
>> > From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
>> > Seeley
>> > Sent: Sunday, May 02, 2010 7:33 PM
>> > To: java-dev@lucene.apache.org
>> > Subject: BytesRef comparable
>> >
>> > Any objections to making BytesRef comparable?  It would make it much
>> > easier to use with containers that don't take comparators as
>> > parameters.
>> >
>> > -Yonik
>> > Apache Lucene Eurocon 2010
>> > 18-21 May 2010 | Prague

Re: BytesRef comparable

Posted by Shai Erera <se...@gmail.com>.

+1

Shai

On Sun, May 2, 2010 at 8:58 PM, Uwe Schindler <uw...@thetaphi.de> wrote:

> When we do the refactoring in flex to remove the comparator, this would be
> the first that comes to my mind.
>
> +1
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
> > Seeley
> > Sent: Sunday, May 02, 2010 7:33 PM
> > To: java-dev@lucene.apache.org
> > Subject: BytesRef comparable
> >
> > Any objections to making BytesRef comparable?  It would make it much
> > easier to use with containers that don't take comparators as
> > parameters.
> >
> > -Yonik
> > Apache Lucene Eurocon 2010
> > 18-21 May 2010 | Prague

RE: BytesRef comparable

Posted by Uwe Schindler <uw...@thetaphi.de>.

When we do the refactoring in flex to remove the comparator, this would be the first that comes to my mind.

+1

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: yseeley@gmail.com [mailto:yseeley@gmail.com] On Behalf Of Yonik
> Seeley
> Sent: Sunday, May 02, 2010 7:33 PM
> To: java-dev@lucene.apache.org
> Subject: BytesRef comparable
> 
> Any objections to making BytesRef comparable?  It would make it much
> easier to use with containers that don't take comparators as
> parameters.
> 
> -Yonik
> Apache Lucene Eurocon 2010
> 18-21 May 2010 | Prague