Posted to solr-user@lucene.apache.org by Uwe Reh <re...@hebis.uni-frankfurt.de> on 2012/08/30 07:57:43 UTC

Sorting on multivalued fields still impossible?

Hi,
just to be sure.

There is still no way to sort by multivalued fields?
"...&sort=max(datefield) desc&...."

Is there no smarter option than creating additional single-valued
fields just for sorting?
(e.g. datefield_max and datefield_min)

Uwe

Re: Sorting on multivalued fields still impossible?

Posted by Chris Hostetter <ho...@fucit.org>.
: My question is, why do I need two redundant fields to sort a multivalued field
: ('date_max' and 'date_min' for 'date')?
: For me it's just a waste of space, poisoning the fieldcache.

How do two fields "poison the fieldcache"? ... If there were a function
that could find the "min" or "max" value of a multi-valued field, it would
need to construct an UnInvertedField of all N of the field values of each
doc in order to find the min/max at query time -- by pre-computing a
min_field and max_field at indexing time you only need FieldCaches for
those 2 fields (where 2 <= N, and N may be very big).
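
To illustrate the index-time approach, here is a minimal Python sketch that
derives the two single-valued sort fields before a document is sent to Solr.
It assumes documents are plain dicts and that the derived fields are named by
appending _min and _max to the source field; the helper name add_sort_fields
is hypothetical and not part of any Solr API.

```python
from datetime import date


def add_sort_fields(doc, source="date"):
    """Return a copy of doc with <source>_min and <source>_max added.

    Hypothetical helper: the dict-based document shape and the derived
    field names are assumptions made for this illustration.
    """
    values = doc.get(source) or []
    enriched = dict(doc)
    if values:
        # One pass over the N values at indexing time; only these two
        # single-valued fields are needed for sorting afterwards.
        enriched[source + "_min"] = min(values)
        enriched[source + "_max"] = max(values)
    return enriched


doc = {"id": "1", "date": [date(2012, 8, 30), date(2013, 1, 7), date(2012, 9, 5)]}
indexed = add_sort_fields(doc)
```

With the two extra fields in place, a request can sort on date_max or
date_min like on any other single-valued field.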

Generally speaking: most Solr use cases are willing to pay a slightly
higher indexing "cost" (time/cpu) to have faster searches -- which answers
your earlier question...

>> Now, four months later, I still wonder why there is no pluggable
>> function to map multivalued fields into a single value.

...because no one has written/contributed these functions (because most 
people would rather pay that cost at indexing time)



-Hoss

Re: Sorting on multivalued fields still impossible?

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Hi,

As I just wrote in my reply to the similar suggestion from Jack,
I'm not looking for a way to preprocess my data.

My question is, why do I need two redundant fields to sort a multivalued
field ('date_max' and 'date_min' for 'date')?
For me it's just a waste of space, poisoning the fieldcache.

There is also another class of problems where a filter function like
'mapMultipleToOne' may be helpful. In the thread 'theory of sets' (this
list) I described a hack using the function strdist, a custom class, and a
mapping of multiple values to a CSV list in a single-valued field.

Uwe




On 07.01.2013 14:54, Alexandre Rafalovitch wrote:
> If the multiple-to-one mapping is stable (e.g. independent of the
> query), why not implement it as a custom update.chain processor with a copy
> to a separate field? There are already a couple of implementations
> under FieldValueMutatingUpdateProcessor (first, last, max, min).
>
> Regards,
>     Alex.
>


Re: Sorting on multivalued fields still impossible?

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
If the multiple-to-one mapping is stable (e.g. independent of the
query), why not implement it as a custom update.chain processor with a copy
to a separate field? There are already a couple of implementations
under FieldValueMutatingUpdateProcessor (first, last, max, min).

Regards,
   Alex.



On Mon, Jan 7, 2013 at 8:19 AM, Uwe Reh <re...@hebis.uni-frankfurt.de> wrote:

> On 31.08.2012 13:35, Erick Erickson wrote:
>
>> ... what would the correct behavior
>> be for "sorting on a multivalued field"
>
> Hi Erick,
>
> In general you are right: the question with multivalued fields is which
> value is the reference. But there are thousands of cases where this
> question is answered implicitly. See my example "...&sort=max(datefield)
> desc&....": it is obvious that the newest date should win. I see no reason
> why simple filters like max can't handle multivalued fields.
>
> Now, four months later, I still wonder why there is no pluggable
> function to map multivalued fields into a single value,
> e.g. "...&sort=sqrt(mapMultipleToOne(FQN, fieldname)) asc&..."
>
> Uwe
> (Sorry for the late reaction)
>
>
>

Re: Sorting on multivalued fields still impossible?

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
On 31.08.2012 13:35, Erick Erickson wrote:
> ... what would the correct behavior
> be for "sorting on a multivalued field"

Hi Erick,

In general you are right: the question with multivalued fields is which
value is the reference. But there are thousands of cases where this
question is answered implicitly. See my example "...&sort=max(datefield)
desc&....": it is obvious that the newest date should win. I see no
reason why simple filters like max can't handle multivalued fields.
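
For illustration only, the ordering that "sort=max(datefield) desc" is meant
to express can be sketched in Python as follows. The sample documents and the
helper newest are made up; the sketch reduces each multivalued field at sort
time and says nothing about how Solr would implement such a function.

```python
from datetime import date

# Made-up documents, each with a multivalued "datefield".
docs = [
    {"id": "a", "datefield": [date(2010, 5, 1), date(2012, 8, 30)]},
    {"id": "b", "datefield": [date(2013, 1, 7)]},
    {"id": "c", "datefield": [date(2011, 3, 15), date(2011, 4, 1)]},
]


def newest(doc):
    # Reduce the multivalued field to a single sort key. Documents without
    # any date get date.min, so they end up last in a descending sort.
    return max(doc.get("datefield", []), default=date.min)


ranked = sorted(docs, key=newest, reverse=True)
order = [d["id"] for d in ranked]
```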

Now, four months later, I still wonder why there is no pluggable
function to map multivalued fields into a single value,
e.g. "...&sort=sqrt(mapMultipleToOne(FQN, fieldname)) asc&..."

Uwe
(Sorry for the late reaction)



Re: Sorting on multivalued fields still impossible?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Fri, 2012-09-07 at 06:55 +0200, Erick Erickson wrote:
> I may prefer the first, and you may prefer the second. Neither is
> necessarily more "correct" IMO, it depends on the problem
> space. Choosing either one will be unpopular with anyone
> who likes the other....

Sorry, I did not make myself clear: If we decide that there are only a
few "obvious" (that's a loaded word. Maybe "common"?) solutions, my idea
was to implement them all. Especially if they can be reduced to the same
underlying algorithm with a few tweaks for each case.

> And I suspect that 99 times out of 100, someone wanting to sort on
> fields with multiple tokens hasn't thought the problem through
> carefully.

That might very well be the case. I must admit that I have mostly seen
the issue as "User asks for X, how do we implement X?", instead of "User
asks for X, would user be better off with Y?".

> And duplicate entries in the result set get ugly. Say a user sorts
> on a field containing 10,000 tokens. Now one doc is repeated
> 10,000 times in the result set. How many docs are set for
> numFound? Faceting? Grouping?

I don't see the difference between 2 and 10,000 tokens for this, but I
concede that there is no clear answer and that choosing by setup would
require the user to have a fairly deep understanding.

I accept that there is no clear need for the functionality at this point
in time and defer hacking on it.

Thank you for your input,
Toke Eskildsen


Re: Sorting on multivalued fields still impossible?

Posted by Erick Erickson <er...@gmail.com>.
And you've illustrated my viewpoint I think by saying
"two obvious choices".

I may prefer the first, and you may prefer the second. Neither is
necessarily more "correct" IMO, it depends on the problem
space. Choosing either one will be unpopular with anyone
who likes the other....

And I suspect that 99 times out of 100, someone wanting to sort on
fields with multiple tokens hasn't thought the problem through
carefully. So I favor forcing the person with the use case where this
is actually _desired_ behavior to do the work of implementing it rather
than have to deal with "surprising" orderings.

And duplicate entries in the result set get ugly. Say a user sorts
on a field containing 10,000 tokens. Now one doc is repeated
10,000 times in the result set. How many docs are set for
numFound? Faceting? Grouping?

I think your first option is at least easy to explain, but I don't see
it as compelling enough to put the work into it, although I confess
I don't know the guts of how much work it would take to find the
first (and last, don't forget specifying desc) token for each doc....

Anyway, that's my story and I'm sticking to it <G>...

Best
Erick

On Wed, Sep 5, 2012 at 12:54 AM, Toke Eskildsen <te...@statsbiblioteket.dk> wrote:
> On Fri, 2012-08-31 at 13:35 +0200, Erick Erickson wrote:
>> Imagine you have two entries, aardvark and emu in your
>> multiValued field. How should that document sort relative to
>> another doc with camel and zebra? Any heuristic
>> you apply will be wrong for someone else....
>
> I see two obvious choices here:
>
> 1) Sort by the value that is ordered first by the comparator function.
> Doc1: aardvark, (emu)
> Doc2: camel, (zebra)
> This is what Uwe wants to do and it is normally done by preprocessing
> and collapsing to a single value.
> It could be implemented with an ordered multi-valued field cache by
> comparing on the first (or last, in the case of reverse sort) entry for
> each matching document.
>
> 2) Make duplicate entries in the result set, one for each value.
> Doc1: aardvark, (emu)
> Doc2: camel, (zebra)
> Doc1: (aardvark), emu
> Doc2: (camel), zebra
> I have a hard time coming up with a real world use case for this.
> It could be implemented by using a multi-valued field cache as above and
> putting the same document ID into the sliding window sorter once for
> each field value.
>
> Collapsing this into a single algorithm:
> Step through all IDs. For each ID, give access to the list of field
> values and provide a callback for adding one or more (value, ID)-pairs
> to the sliding window sorter.
>
>
> Are there some other realistic heuristics that I have missed?
>

Re: Sorting on multivalued fields still impossible?

Posted by Toke Eskildsen <te...@statsbiblioteket.dk>.
On Fri, 2012-08-31 at 13:35 +0200, Erick Erickson wrote:
> Imagine you have two entries, aardvark and emu in your
> multiValued field. How should that document sort relative to
> another doc with camel and zebra? Any heuristic
> you apply will be wrong for someone else....

I see two obvious choices here:

1) Sort by the value that is ordered first by the comparator function.
Doc1: aardvark, (emu)
Doc2: camel, (zebra)
This is what Uwe wants to do and it is normally done by preprocessing
and collapsing to a single value.
It could be implemented with an ordered multi-valued field cache by
comparing on the first (or last, in the case of reverse sort) entry for
each matching document.

2) Make duplicate entries in the result set, one for each value.
Doc1: aardvark, (emu)
Doc2: camel, (zebra)
Doc1: (aardvark), emu
Doc2: (camel), zebra
I have a hard time coming up with a real world use case for this.
It could be implemented by using a multi-valued field cache as above and
putting the same document ID into the sliding window sorter once for
each field value.

Collapsing this into a single algorithm:
Step through all IDs. For each ID, give access to the list of field
values and provide a callback for adding one or more (value, ID)-pairs
to the sliding window sorter.
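
The single algorithm above can be sketched roughly as follows in Python. This
is an illustration under assumptions, not Solr or Lucene code: documents are a
dict of ID to fields, heapq.nsmallest stands in for the sliding window sorter,
and the names sort_multivalued, first_value and every_value are hypothetical.

```python
import heapq


def sort_multivalued(docs, field, strategy, k):
    """Return the k smallest (value, doc_id) pairs chosen by `strategy`."""
    pairs = []

    def add(value, doc_id):
        # Callback handed to the strategy for submitting (value, ID) pairs.
        pairs.append((value, doc_id))

    # Step through all IDs and let the strategy see each document's values.
    for doc_id, fields in docs.items():
        strategy(doc_id, fields.get(field, []), add)

    # Stand-in for the sliding window sorter: keep only the k best pairs.
    return heapq.nsmallest(k, pairs)


def first_value(doc_id, values, add):
    # Choice 1: submit only the value that orders first.
    if values:
        add(min(values), doc_id)


def every_value(doc_id, values, add):
    # Choice 2: submit the document once per field value.
    for value in values:
        add(value, doc_id)


docs = {
    "Doc1": {"animal": ["aardvark", "emu"]},
    "Doc2": {"animal": ["camel", "zebra"]},
}
option1 = sort_multivalued(docs, "animal", first_value, k=10)
option2 = sort_multivalued(docs, "animal", every_value, k=10)
```

Note that this sketch buffers all pairs before selecting the top k; a real
sliding window would discard pairs as it goes.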


Are there some other realistic heuristics that I have missed?


Re: Sorting on multivalued fields still impossible?

Posted by Erick Erickson <er...@gmail.com>.
In addition to Jack's comment, what would the correct behavior
be for "sorting on a multivalued field"? The reason this is disallowed
is that there is no correct behavior in the general case.

Imagine you have two entries, aardvark and emu in your
multiValued field. How should that document sort relative to
another doc with camel and zebra? Any heuristic
you apply will be wrong for someone else....

Best
Erick



Re: Sorting on multivalued fields still impossible?

Posted by Uwe Reh <re...@hebis.uni-frankfurt.de>.
Hi Jack,

thank you for the hint.
Since I already have a SolrJ client to do the preprocessing, mapping to
sort fields isn't my problem. I will try to explain better in my reply
to Erick.

Uwe
(Sorry for the late reaction)


On 30.08.2012 16:04, Jack Krupansky wrote:
> You can also use a "Field Mutating Update Processor" to do a "smart"
> copy of a multi-valued field to a sortable single-valued field.
>
> See:
> http://wiki.apache.org/solr/UpdateRequestProcessor#Field_Mutating_Update_Processors
>
>
> Such as using the maximum value via MaxFieldValueUpdateProcessorFactory.
>
> See:
> http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/update/processor/MaxFieldValueUpdateProcessorFactory.html
>
>
> Which value of a multi-valued field do you wish to sort by?
>
> -- Jack Krupansky


Re: Sorting on multivalued fields still impossible?

Posted by Jack Krupansky <ja...@basetechnology.com>.
You can also use a "Field Mutating Update Processor" to do a "smart" copy of 
a multi-valued field to a sortable single-valued field.

See:
http://wiki.apache.org/solr/UpdateRequestProcessor#Field_Mutating_Update_Processors

Such as using the maximum value via MaxFieldValueUpdateProcessorFactory.

See:
http://lucene.apache.org/solr/api-4_0_0-BETA/org/apache/solr/update/processor/MaxFieldValueUpdateProcessorFactory.html

Which value of a multi-valued field do you wish to sort by?

-- Jack Krupansky
