Posted to java-user@lucene.apache.org by Victor Podberezski <vp...@cms-medios.com> on 2015/01/17 01:00:10 UTC
Help with a fieldcomparator!
I need a hand with a custom comparator.
I have a field filled with words separated by spaces. Some words have
numbers in them.
I need to extract those numbers and sort the documents by them. If a
document contains more than one number, I need the lowest one.
For example:
doc1 "val2 aaaa val3" --> 2, 3 --> 2
doc2 "val5 aaaa val1" --> 5, 1 --> 1
doc3 "val7 bbbbb val5" --> 7, 5 --> 5
The sorted results have to be:
doc2
doc1
doc3
How can I achieve this?
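The expected ordering above can be reproduced outside Lucene with a small sketch. It assumes that "number" means any run of digits in the field text and that a document without digits should sort last; the class and method names (MinNumberSortSketch, minNumber, sortedIds) are hypothetical and only illustrate the rule, not a Lucene API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MinNumberSortSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Smallest digit run in the text, or Integer.MAX_VALUE if there is none (assumption). */
    static int minNumber(String text) {
        int min = Integer.MAX_VALUE;
        Matcher m = DIGITS.matcher(text);
        while (m.find()) {
            // Note: parseInt throws NumberFormatException for digit runs that exceed the int range.
            min = Math.min(min, Integer.parseInt(m.group()));
        }
        return min;
    }

    /** Document ids ordered by the smallest number found in their field text. */
    static List<String> sortedIds(Map<String, String> docs) {
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator.comparingInt((String id) -> minNumber(docs.get(id))));
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("doc1", "val2 aaaa val3");
        docs.put("doc2", "val5 aaaa val1");
        docs.put("doc3", "val7 bbbbb val5");
        System.out.println(sortedIds(docs)); // [doc2, doc1, doc3]
    }
}
```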
I am having trouble migrating a working solution from Lucene 2.4 to Lucene
3.9 or higher (migrating from ScoreDocComparator to FieldComparator).
I tried this:
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        currentReaderValues = FieldCache.DEFAULT.getInts(reader, field, new FieldCache.IntParser() {
            public final int parseInt(final String val) {
                return extractNumber(val);
            }
        });
    }
The rest is the same as IntComparator, but this is not working.
Does anybody have an idea of how to resolve this problem?
Thanks,
Víctor Podberezski
Re: Help with a fieldcomparator!
Posted by Victor Podberezski <vp...@cms-medios.com>.
Erick:
(Sorry, I misspelled your name in my last email.)
I tried a bunch of solutions, but none worked as I expected.
Basically, none of them sorts the documents by the pattern the way I expect.
This is my simplified code:
    import java.io.IOException;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.FieldComparator;
    import org.apache.lucene.search.FieldComparatorSource;

    public class PatternFieldComparatorSource extends FieldComparatorSource {

        private String pattern;
        private boolean ascending = false;

        public PatternFieldComparatorSource(String pattern, boolean ascending) {
            this.ascending = ascending;
            this.pattern = pattern;
        }

        public FieldComparator newComparator(String fieldname, int numHits, int sortPos, boolean reversed)
                throws IOException {
            return new PatternFieldComparator(numHits, fieldname);
        }

        class PatternFieldComparator extends FieldComparator {
            private final int[] values;
            private int[] currentReaderValues;
            private final String field;
            private int bottom; // Value of bottom of queue

            PatternFieldComparator(int numHits, String field) {
                values = new int[numHits];
                this.field = field;
            }

            public int compare(int slot1, int slot2) {
                // TODO: there are sneaky non-branch ways to compute
                // -1/+1/0 sign
                // Cannot return values[slot1] - values[slot2] because that
                // may overflow
                final int v1 = values[slot1];
                final int v2 = values[slot2];
                if (v1 > v2) {
                    return 1;
                } else if (v1 < v2) {
                    return -1;
                } else {
                    return 0;
                }
            }

            public int compareBottom(int doc) {
                // TODO: there are sneaky non-branch ways to compute
                // -1/+1/0 sign
                // Cannot return bottom - values[slot2] because that
                // may overflow
                final int v2 = currentReaderValues[doc];
                if (bottom > v2) {
                    return 1;
                } else if (bottom < v2) {
                    return -1;
                } else {
                    return 0;
                }
            }

            public void copy(int slot, int doc) {
                values[slot] = currentReaderValues[doc];
            }

            public void setNextReader(IndexReader reader, int docBase) throws IOException {
                currentReaderValues = FieldCache.DEFAULT.getInts(reader, field, new FieldCache.IntParser() {
                    public final int parseInt(final String val) {
                        return getValueByPattern(val);
                    }
                });
            }

            public void setBottom(final int bottom) {
                this.bottom = values[bottom];
            }

            public Comparable value(int slot) {
                return Integer.valueOf(values[slot]);
            }
        }

        private int getValueByPattern(String text) {
            // If the pattern is not present, return the max or min possible value
            // (depending on whether the sort is ascending or descending).
            int value = !ascending ? Integer.MAX_VALUE : Integer.MIN_VALUE;
            // If the pattern is present...
            if (text.contains(pattern)) {
                value = Integer.parseInt(...); // extract the value and return
            }
            return value;
        }
    }
My code does not sort correctly, and I can't find an explanation why.
Thanks
Víctor
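The comparator comments above warn that returning a difference may overflow. A minimal sketch of why that matters once a marker such as Integer.MIN_VALUE is used for "pattern not found": the subtraction wraps around and reports the wrong sign, while an explicit comparison does not. The class and method names here are hypothetical, and Integer.compare assumes Java 7 or later; the if/else chain in the code above is the equivalent for older JVMs.

```java
class SafeIntCompareSketch {
    /** Naive comparison by subtraction; wraps around when the operands are far apart. */
    static int compareBySubtraction(int a, int b) {
        return a - b;
    }

    /** Overflow-safe comparison, equivalent to the if/else chain in the comparator. */
    static int compareSafely(int a, int b) {
        return Integer.compare(a, b);
    }

    public static void main(String[] args) {
        int found = 5;                    // a value extracted from a document
        int missing = Integer.MIN_VALUE;  // marker for a document without the pattern

        // 5 - Integer.MIN_VALUE does not fit in an int, so the result is negative,
        // which wrongly claims that 5 sorts before the marker.
        System.out.println(compareBySubtraction(found, missing) < 0); // true

        // The explicit comparison correctly reports that 5 is the larger value.
        System.out.println(compareSafely(found, missing) > 0);        // true
    }
}
```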
On Sat, Jan 17, 2015 at 9:04 PM, Erick Erickson <er...@gmail.com>
wrote:
> Ah, OK. H.L. Mencken wrote something like:
> "For every complex problem there is a solution
> that is simple, elegant, and wrong". I specialize in these...
>
> I don't have a good answer for your question then. How
> is what you're trying failing?
>
> Best,
> Erick
Re: Help with a fieldcomparator!
Posted by Erick Erickson <er...@gmail.com>.
Ah, OK. H.L. Mencken wrote something like:
"For every complex problem there is a solution
that is simple, elegant, and wrong". I specialize in these...
I don't have a good answer for your question then. How
is what you're trying failing?
Best,
Erick
On Fri, Jan 16, 2015 at 4:59 PM, Victor Podberezski
<vp...@cms-medios.com> wrote:
> Erik, Thanks for your reply.
>
> I wrote a simplification of the problem. The sortable values in the field
> are not only "val1, val2, ..."; they can also be "patternX1,
> patternX2", etc.
>
> In that case I need to sort according to different criteria. There are a
> lot of different patterns, but not too many documents in the result of the
> query filter.
> For that reason I think the best way is a custom FieldComparator.
>
> Thanks
> Víctor Podberezski
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Help with a fieldcomparator!
Posted by Victor Podberezski <vp...@cms-medios.com>.
Erik, Thanks for your reply.
I wrote a simplification of the problem. The sortable values in the field
are not only "val1, val2, ..."; they can also be "patternX1,
patternX2", etc.
In that case I need to sort according to different criteria. There are a
lot of different patterns, but not too many documents in the result of the
query filter.
For that reason I think the best way is a custom FieldComparator.
Thanks
Víctor Podberezski
On Fri, Jan 16, 2015 at 9:31 PM, Erick Erickson <er...@gmail.com>
wrote:
> Personally I would do this on the ingestion side with a new field.
> That is, analyze the input field when you were indexing the doc,
> extract the min value from any numbers, and put that in a
> new field. Then it's simply sorting by the new field. This is likely
> to be much more performant than reprocessing this at query
> time in a comparator.
>
> FWIW,
> Erick
Re: Help with a fieldcomparator!
Posted by Erick Erickson <er...@gmail.com>.
Personally I would do this on the ingestion side with a new field.
That is, analyze the input field when you were indexing the doc,
extract the min value from any numbers, and put that in a
new field. Then it's simply sorting by the new field. This is likely
to be much more performant than reprocessing this at query
time in a comparator.
FWIW,
Erick
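A rough sketch of the extraction step that the index-time approach would need, kept independent of any Lucene version. It assumes each sortable token is a fixed prefix (such as "val" or "patternX") followed directly by digits, and that one derived value per prefix would be stored in its own field; the class and method names are hypothetical, and how the value is written to the index is not shown.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexTimeSortKeySketch {
    /**
     * Smallest number attached to the given prefix in a whitespace-separated text.
     * Returns an empty OptionalInt when no token has the form prefix + digits.
     */
    static OptionalInt minForPrefix(String prefix, String text) {
        // Lookarounds make sure the match is a whole whitespace-delimited token.
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(prefix) + "(\\d+)(?!\\S)");
        Matcher m = p.matcher(text);
        OptionalInt min = OptionalInt.empty();
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            if (!min.isPresent() || n < min.getAsInt()) {
                min = OptionalInt.of(n);
            }
        }
        return min;
    }

    public static void main(String[] args) {
        // The derived value would be computed once per document while indexing.
        System.out.println(minForPrefix("val", "val5 aaaa val1"));                // OptionalInt[1]
        System.out.println(minForPrefix("patternX", "val2 patternX7 patternX4")); // OptionalInt[4]
        System.out.println(minForPrefix("val", "aaaa bbbb"));                     // OptionalInt.empty
    }
}
```

An empty result leaves the choice of how to treat documents without the pattern (skip the field, or store a marker value) to the indexing code.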