Posted to java-user@lucene.apache.org by Victor Podberezski <vp...@cms-medios.com> on 2015/01/17 01:00:10 UTC

Help with a fieldcomparator!

I need a hand with a custom comparator.

I have a field containing words separated by spaces. Some of the words
have numbers embedded in them.

I need to extract those numbers and sort the documents by them. If a
document contains more than one number, the lowest one should be used.

For example:

doc1 "val2 aaaa val3" --> 2, 3 --> 2
doc2 "val5 aaaa val1" --> 5, 1 --> 1
doc3 "val7 bbbbb val5" --> 7, 5 --> 5

The sorted results should be:

doc2
doc1
doc3

How can I achieve this?
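To pin down the intended ordering independently of Lucene, here is a minimal plain-Java sketch of the extraction rule described above. It assumes that a "number inside a word" is any run of ASCII digits that fits in an `int`; the class and method names are hypothetical and not part of any Lucene API.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not Lucene code): the sort key each document should get.
class MinNumberKey {
    // Assumption: a number is any run of ASCII digits; it must fit in an int.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Returns the smallest number embedded in any word of the text,
    // or an empty OptionalInt when the text contains no digits at all.
    static OptionalInt minEmbeddedNumber(String text) {
        Matcher m = DIGITS.matcher(text);
        OptionalInt min = OptionalInt.empty();
        while (m.find()) {
            int n = Integer.parseInt(m.group());
            if (!min.isPresent() || n < min.getAsInt()) {
                min = OptionalInt.of(n);
            }
        }
        return min;
    }
}
```

For the three example documents this yields 2, 1 and 5, so an ascending sort on that key gives doc2, doc1, doc3.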

I'm having trouble migrating a working solution from Lucene 2.4 to Lucene
3.9 or higher (that is, from ScoreDocComparator to FieldComparator).

I tried this:

    public void setNextReader(IndexReader reader, int docBase) throws IOException {
      // Parse each value of the field into an int sort key
      currentReaderValues = FieldCache.DEFAULT.getInts(reader, field,
          new FieldCache.IntParser() {
            public final int parseInt(final String val) {
              return extractNumber(val);
            }
          });
    }

The rest is the same as IntComparator, but it is not working.
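One likely reason the snippet above does not work: as far as I understand `FieldCache` in Lucene 2.9/3.x, `getInts` never hands the parser the field's full text. It walks the field's indexed terms in sorted order, calls `parseInt` once per term, and writes that value into every document containing the term, so with a tokenized field each document keeps whichever term was visited last rather than its smallest number. This is also why, as far as I know, FieldCache-based sorting expects an untokenized field with one term per document. The plain-Java model below illustrates that behaviour; it assumes whitespace tokenization and a hypothetical `extractNumber` that returns `Integer.MAX_VALUE` for terms without digits, and it is not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToIntFunction;

// Plain-Java model (not Lucene code) of how FieldCache.getInts appears to fill
// its per-document array in Lucene 2.9/3.x. Tokenization is assumed to be a
// simple whitespace split; a real analyzer may produce different terms.
class FieldCacheModel {

    // Hypothetical stand-in for extractNumber: digits of a single term,
    // or Integer.MAX_VALUE when the term has none (e.g. "aaaa").
    static int extractNumber(String term) {
        String digits = term.replaceAll("[^0-9]", "");
        return digits.isEmpty() ? Integer.MAX_VALUE : Integer.parseInt(digits);
    }

    static int[] uninvert(List<String> docTexts, ToIntFunction<String> parser) {
        // Inverted index: term -> ids of the documents containing it, in term order.
        TreeMap<String, List<Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < docTexts.size(); doc++) {
            for (String term : docTexts.get(doc).split("\\s+")) {
                postings.computeIfAbsent(term, t -> new ArrayList<>()).add(doc);
            }
        }
        int[] values = new int[docTexts.size()];
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            // The parser only ever sees one term, never the whole field text.
            int termValue = parser.applyAsInt(entry.getKey());
            for (int doc : entry.getValue()) {
                values[doc] = termValue; // a later term overwrites an earlier one
            }
        }
        return values;
    }
}
```

With the three example texts this model yields 3, 5 and 7 (the last term in sort order for each document) instead of 2, 1 and 5, which would produce the order doc1, doc2, doc3.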

Does anybody have an idea of how to solve this problem?
Thanks,

Víctor Podberezski

Re: Help with a fieldcomparator!

Posted by Victor Podberezski <vp...@cms-medios.com>.
Erick:

(Sorry, I misspelled your name in my last email.)

I have tried a bunch of solutions; none of them worked as I expected.

Basically, none of them sorts the documents by the pattern the way I expect.

This is my simplified code:

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.FieldComparator;
import org.apache.lucene.search.FieldComparatorSource;

public class PatternFieldComparatorSource extends FieldComparatorSource {

  private String pattern;
  private boolean ascending = false;

  public PatternFieldComparatorSource(String pattern, boolean ascending) {
    this.ascending = ascending;
    this.pattern = pattern;
  }

  public FieldComparator newComparator(String fieldname, int numHits, int sortPos,
      boolean reversed) throws IOException {
    return new PatternFieldComparator(numHits, fieldname);
  }

  class PatternFieldComparator extends FieldComparator {

    private final int[] values;          // one value per slot in the priority queue
    private int[] currentReaderValues;   // one value per document of the current reader
    private final String field;
    private int bottom;                  // value of the bottom of the queue

    PatternFieldComparator(int numHits, String field) {
      values = new int[numHits];
      this.field = field;
    }

    public int compare(int slot1, int slot2) {
      // Cannot return values[slot1] - values[slot2] because that may overflow
      final int v1 = values[slot1];
      final int v2 = values[slot2];
      if (v1 > v2) {
        return 1;
      } else if (v1 < v2) {
        return -1;
      } else {
        return 0;
      }
    }

    public int compareBottom(int doc) {
      // Cannot return bottom - v2 because that may overflow
      final int v2 = currentReaderValues[doc];
      if (bottom > v2) {
        return 1;
      } else if (bottom < v2) {
        return -1;
      } else {
        return 0;
      }
    }

    public void copy(int slot, int doc) {
      values[slot] = currentReaderValues[doc];
    }

    public void setNextReader(IndexReader reader, int docBase) throws IOException {
      currentReaderValues = FieldCache.DEFAULT.getInts(reader, field,
          new FieldCache.IntParser() {
            public final int parseInt(final String val) {
              return getValueByPattern(val);
            }
          });
    }

    public void setBottom(final int bottom) {
      this.bottom = values[bottom];
    }

    public Comparable value(int slot) {
      return values[slot];
    }
  }

  private int getValueByPattern(String text) {
    // If the pattern is not present, return the max or min possible value
    // (depending on whether the sort is ascending or descending).
    int value = !ascending ? Integer.MAX_VALUE : Integer.MIN_VALUE;
    // If the pattern is present...
    if (text.contains(pattern)) {
      value = Integer.parseInt(...); // extract the value and return it
    }
    return value;
  }
}

My code does not sort correctly, and I can't find an explanation for why.
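Since `getValueByPattern` above leaves the extraction elided, here is one hypothetical way to fill it, as plain Java. It assumes a value is the run of digits immediately following `pattern` at the start of a word (so "val" matches "val2" but not "interval9") and returns the smallest such value; the class name, method name and regex are assumptions. It needs the whole field text (for example a stored value), not the single terms a `FieldCache` parser is given.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical replacement for the elided extraction in getValueByPattern.
class PatternValue {

    // Smallest number that directly follows `prefix` at the start of a word in `text`,
    // e.g. prefix "val" in "val2 aaaa val3" gives 2. Returns `missing` if nothing matches.
    static int minValueAfterPrefix(String text, String prefix, int missing) {
        // (?<!\S) = start of text or preceded by whitespace; Pattern.quote escapes the prefix.
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(prefix) + "([0-9]+)");
        Matcher m = p.matcher(text);
        boolean found = false;
        int min = 0;
        while (m.find()) {
            int n = Integer.parseInt(m.group(1)); // assumes the digits fit in an int
            if (!found || n < min) {
                min = n;
                found = true;
            }
        }
        return found ? min : missing;
    }
}
```

The sentinel is passed in as `missing` on purpose. The comparator never consults `ascending`; the direction normally comes from the `SortField`'s reverse flag. If `ascending` matches that flag, the original choice (`MIN_VALUE` when ascending, `MAX_VALUE` when descending) puts documents without the pattern first in both directions; if they should come last, the two constants would need to be swapped.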

Thanks
Víctor

On Sat, Jan 17, 2015 at 9:04 PM, Erick Erickson <er...@gmail.com>
wrote:

> Ah, OK. H.L. Mencken wrote something like:
> "For every complex problem there is a solution
> that is simple, elegant, and wrong". I specialize in these...
>
> I don't have a good answer for your question then. How
> is what you're trying failing?
>
> Best,
> Erick

Re: Help with a fieldcomparator!

Posted by Erick Erickson <er...@gmail.com>.
Ah, OK. H.L. Mencken wrote something like:
"For every complex problem there is a solution
that is simple, elegant, and wrong". I specialize in these...

I don't have a good answer for your question then. How
is what you're trying failing?

Best,
Erick

On Fri, Jan 16, 2015 at 4:59 PM, Victor Podberezski
<vp...@cms-medios.com> wrote:
> Erik, Thanks for your reply.
>
> I wrote a simplification of the problem. Not only the values in the field
> that can be sorted are "val1, val2,..." . they can also be "patternX1,
> patternX2", etc.
>
> and in that case I need to sort according to different criteria. They're a
> lot of differents patterns but not to much documents as result of the query
> filter
> For that reason I think the best way is a custom FieldComparator.
>
> Thanks
> Víctor Podberezski

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Help with a fieldcomparator!

Posted by Victor Podberezski <vp...@cms-medios.com>.
Erik, thanks for your reply.

What I described was a simplified version of the problem. The sortable
values in the field are not only "val1", "val2", ...; they can also be
"patternX1", "patternX2", etc.

In those cases I need to sort according to different criteria. There are a
lot of different patterns, but the query filter returns relatively few
documents. For that reason I think a custom FieldComparator is the best approach.

Thanks
Víctor Podberezski

On Fri, Jan 16, 2015 at 9:31 PM, Erick Erickson <er...@gmail.com>
wrote:

> Personally I would do this on the ingestion side with a new field.
> That is, analyze the input field when you were indexing the doc,
> extract the min value from any numbers, and put that in a
> new field. Then it's simply sorting by the new field. This is likely
> to be much more performant than reprocessing this at query
> time in a comparator.
>
> FWIW,
> Erick

Re: Help with a fieldcomparator!

Posted by Erick Erickson <er...@gmail.com>.
Personally, I would do this on the ingestion side with a new field.
That is, analyze the input field when you index the doc,
extract the minimum of any numbers it contains, and put that in a
new field. Then it's simply a matter of sorting by the new field. This is
likely to perform much better than reprocessing the text at query
time in a comparator.
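The index-time idea can be illustrated with a plain-Java model: the minimum is computed once per document when it is ingested and kept in its own single-valued field, so the query side only compares precomputed ints. In Lucene 3.x that field would presumably be something like a `NumericField` sorted with `SortField.INT`; those classes are only mentioned in comments here, and `Doc`, `minNumber` and the other names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java model of the index-time approach (not Lucene code).
class IngestTimeSortKey {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // A document as ingested: the original text plus an extra, precomputed sort field.
    // In Lucene 3.x the extra field would presumably be a single-valued, untokenized
    // field (e.g. a NumericField) used with SortField.INT.
    static final class Doc {
        final String id;
        final String text;
        final int minNumber; // computed once, at ingestion time

        Doc(String id, String text) {
            this.id = id;
            this.text = text;
            this.minNumber = minEmbeddedNumber(text);
        }
    }

    // Smallest digit run in the text, or Integer.MAX_VALUE so documents without numbers sort last.
    static int minEmbeddedNumber(String text) {
        int min = Integer.MAX_VALUE;
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            min = Math.min(min, Integer.parseInt(m.group()));
        }
        return min;
    }

    // Query time: no text processing, just an int comparison on the precomputed field.
    static List<String> idsSortedByMin(List<Doc> docs) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingInt((Doc d) -> d.minNumber));
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }
}
```

The catch, when there are many distinct patterns, is that each pattern would need its own precomputed field.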

FWIW,
Erick
