Posted to java-user@lucene.apache.org by Dragan Jotanovic <dr...@diosphere.com> on 2010/05/18 18:01:34 UTC

How to achieve this kind of document ordering

Hi, I need to sort results by two fields. The first one is numeric and should be sorted in ascending order.
The second one should be ordered in a "levels" structure.
Here is an example:

Unsorted:
DocId      SortFieldA      SortFieldB
    1      101             A
    2      102             B
    3      102             A
    4      101             C
    5      102             B
    6      101             A
    7      101             B
    8      101             C
    9      101             B
   10      101             A



Sorted:
DocId      SortFieldA      SortFieldB
    1      101             A
    7      101             B
    4      101             C
    6      101             A
    9      101             B
    8      101             C
   10      101             A
    3      102             A
    2      102             B
    5      102             B

First, all results are ordered by SortFieldA in ascending order. Then, within each group of documents that share the same SortFieldA value, they are ordered by SortFieldB in a "levels" structure. Each level consists of documents with distinct SortFieldB values.
So, the requirement is to show the documents from the first level first, then the second level, and so on.
It will not be possible to order the documents while indexing, so I will need search-time ordering.
Is this achievable with Lucene? What would be the best approach to solve this without a huge performance impact on a multimillion-document index?


Re: How to achieve this kind of document ordering

Posted by Dragan Jotanovic <dr...@diosphere.com>.
Thanks Frank,
the idea of preparing a set of structured lists is what I initially
thought I would have to do, but I'm afraid there will be a serious
performance penalty, because I would have to traverse the documents
until I find all distinct values of SortFieldB. But I guess there is no
other way.
Could you point me to the source code of oal.search.TopFieldCollector so
I could take a look?

Dragan


On 5/19/2010 9:53 PM, Frank Wesemann wrote:
> Hi Dragan,
> You might want to have a look at org.apache.lucene.search.TopFieldCollector
> and the accompanying comparator classes.
> For a similar problem (I had to sort by the quantity on a certain
> field) I borrowed a lot of ideas from these classes.
> From the basic principles therein I derived DocumentComparators, which
> compare not only field values but whole documents.
> The outcome is not one sorted list, but a structure of lists which I
> combine in nested
> while( list<SortFieldA>.hasNext() ) {
>      overallResult.add ( listOfSortFieldA.get( listOfSortFieldB).next() )
> }
> loops.
>
> Performance is of course affected by the number of documents you want to return.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: How to achieve this kind of document ordering

Posted by Frank Wesemann <f....@fotofinder.net>.
Hi Dragan,
>
>
> First, all results are ordered by SortFieldA in ascending order. Then
> by SortFieldB so that all documents with the same SortFieldA value are
> ordered in a "levels" structure. Each level consists of documents with
> distinct SortFieldB values.
> So, the requirement is to show documents from the first level first, then
> the second level, and so on.
> It will not be possible to order documents while indexing, so I will
> need search-time ordering.
> Is this achievable with Lucene? What would be the best approach to
> solve this without a huge performance impact on a multimillion-document
> index?
You might want to have a look at org.apache.lucene.search.TopFieldCollector
and the accompanying comparator classes.
For a similar problem (I had to sort by the quantity on a certain
field) I borrowed a lot of ideas from these classes.
From the basic principles therein I derived DocumentComparators, which
compare not only field values but whole documents.
The outcome is not one sorted list, but a structure of lists which I
combine in nested
while( list<SortFieldA>.hasNext() ) {
    overallResult.add ( listOfSortFieldA.get( listOfSortFieldB).next() )
}
loops.

Performance is of course affected by the number of documents you want to return.
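
The "structure of lists" idea can be sketched in plain Java, outside of Lucene. This is only an illustration under stated assumptions, not the DocumentComparators implementation mentioned above: the Doc class, the sample() data (copied from the example in the original question) and the interleave() method are hypothetical names, and the hits are assumed to have been collected into memory already.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;

class ListInterleave {
    /** Hypothetical stand-in for a collected search hit; not a Lucene class. */
    static final class Doc {
        final int id;
        final int a;      // SortFieldA
        final String b;   // SortFieldB
        Doc(int id, int a, String b) { this.id = id; this.a = a; this.b = b; }
    }

    /** The ten documents from the example in the original question. */
    static List<Doc> sample() {
        return Arrays.asList(
            new Doc(1, 101, "A"), new Doc(2, 102, "B"), new Doc(3, 102, "A"),
            new Doc(4, 101, "C"), new Doc(5, 102, "B"), new Doc(6, 101, "A"),
            new Doc(7, 101, "B"), new Doc(8, 101, "C"), new Doc(9, 101, "B"),
            new Doc(10, 101, "A"));
    }

    /** Returns doc ids ordered by SortFieldA, then level by level over SortFieldB. */
    static List<Integer> interleave(List<Doc> docs) {
        // Outer map: SortFieldA ascending. Inner map: one queue per SortFieldB value.
        TreeMap<Integer, TreeMap<String, Deque<Doc>>> byA = new TreeMap<>();
        for (Doc d : docs) {
            byA.computeIfAbsent(d.a, k -> new TreeMap<>())
               .computeIfAbsent(d.b, k -> new ArrayDeque<>())
               .add(d);
        }
        List<Integer> ids = new ArrayList<>();
        for (TreeMap<String, Deque<Doc>> byB : byA.values()) {
            boolean tookAny = true;
            while (tookAny) {                      // one pass per level
                tookAny = false;
                for (Deque<Doc> queue : byB.values()) {
                    Doc next = queue.poll();       // null once this value is exhausted
                    if (next != null) {
                        ids.add(next.id);
                        tookAny = true;
                    }
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(interleave(sample()));
        // prints [1, 7, 4, 6, 9, 8, 10, 3, 2, 5]
    }
}
```

Each pass over the inner map emits one level, so the cost grows with the number of hits that have to be bucketed before the first page can be returned.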

-- 
Kind regards,

Frank Wesemann
Fotofinder GmbH         VAT ID No. DE812854514
Software Development    Web: http://www.fotofinder.com/
Potsdamer Str. 96       Tel: +49 30 25 79 28 90
10785 Berlin            Fax: +49 30 25 79 28 999

Registered office: Berlin
District Court Berlin Charlottenburg (HRB 73099)
Managing Director: Ali Paczensky






Re: How to achieve this kind of document ordering

Posted by Dragan Jotanovic <dr...@diosphere.com>.
Thanks for trying to help, but that is not the answer I was looking for.
The requirement is a bit more specific. When sorting by the second field, I
don't want the order A,A,A,B,B,C,C but A,B,C,A,B,C,A.
I'm not even sure this is possible using the Sort API. Maybe custom
scoring, document boosting, or some other approach would be more suitable?
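
One way to express the A,B,C,A,B,C,A order is to derive a "level" for every document (how many earlier documents share the same SortFieldA and SortFieldB values) and sort by SortFieldA, then level, then SortFieldB. The sketch below does this in plain Java over hits already held in memory. It does not use the Lucene Sort API, and the Doc class, sample() and levelOrder() are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LevelSort {
    /** Hypothetical stand-in for a collected search hit; not a Lucene class. */
    static final class Doc {
        final int id;
        final int a;      // SortFieldA
        final String b;   // SortFieldB
        Doc(int id, int a, String b) { this.id = id; this.a = a; this.b = b; }
    }

    /** The ten documents from the example in the original question. */
    static List<Doc> sample() {
        return Arrays.asList(
            new Doc(1, 101, "A"), new Doc(2, 102, "B"), new Doc(3, 102, "A"),
            new Doc(4, 101, "C"), new Doc(5, 102, "B"), new Doc(6, 101, "A"),
            new Doc(7, 101, "B"), new Doc(8, 101, "C"), new Doc(9, 101, "B"),
            new Doc(10, 101, "A"));
    }

    /** Returns doc ids sorted by (SortFieldA, level, SortFieldB). */
    static List<Integer> levelOrder(List<Doc> docs) {
        Map<String, Integer> seen = new HashMap<>();   // occurrences per (a, b) pair
        int[] level = new int[docs.size()];
        List<Integer> order = new ArrayList<>();       // positions into docs
        for (int i = 0; i < docs.size(); i++) {
            Doc d = docs.get(i);
            // merge returns the updated count, so the first occurrence gets level 0
            level[i] = seen.merge(d.a + "|" + d.b, 1, Integer::sum) - 1;
            order.add(i);
        }
        order.sort(Comparator.comparingInt((Integer i) -> docs.get(i).a)
                .thenComparingInt(i -> level[i])
                .thenComparing(i -> docs.get(i).b));
        List<Integer> ids = new ArrayList<>();
        for (int i : order) {
            ids.add(docs.get(i).id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(levelOrder(sample()));
        // prints [1, 7, 4, 6, 9, 8, 10, 3, 2, 5]
    }
}
```

The level of a document depends on the other documents in the result set, which is why it cannot be looked up as a fixed per-document field value and has to be computed at search time.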


On 5/19/2010 1:12 AM, Erick Erickson wrote:
> I just skimmed your message, but Lucene provides
> for multiple sorts. You can construct a Sort object
> from an arbitrary number of fields, and any documents
> that all sort equally for fields 1..k will be resolved
> by considering field k+1.
>
> The performance impact when searching is mostly
> upon the very first sort when the caches are filled.
> If you have a HUGE number of unique values for
> a sort field, that may have heavy memory demands.
>
> As I said, I just skimmed your message, so if I'm off
> base let me know...
>
> HTH
> Erick




Re: How to achieve this kind of document ordering

Posted by Erick Erickson <er...@gmail.com>.
I just skimmed your message, but Lucene provides
for multiple sorts. You can construct a Sort object
from an arbitrary number of fields, and any documents
that all sort equally for fields 1..k will be resolved
by considering field k+1.
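
The tie-breaking rule can be illustrated with a chained comparator in plain Java. This is a sketch of the semantics only, not Lucene code: the Doc class, sample() (the data from the original question) and sortedIds() are hypothetical names, and java.util.Comparator stands in for a Sort built from two fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class MultiKeySort {
    /** Hypothetical stand-in for an indexed document; not a Lucene class. */
    static final class Doc {
        final int id;
        final int a;      // SortFieldA
        final String b;   // SortFieldB
        Doc(int id, int a, String b) { this.id = id; this.a = a; this.b = b; }
    }

    /** The ten documents from the example in the original question. */
    static List<Doc> sample() {
        return Arrays.asList(
            new Doc(1, 101, "A"), new Doc(2, 102, "B"), new Doc(3, 102, "A"),
            new Doc(4, 101, "C"), new Doc(5, 102, "B"), new Doc(6, 101, "A"),
            new Doc(7, 101, "B"), new Doc(8, 101, "C"), new Doc(9, 101, "B"),
            new Doc(10, 101, "A"));
    }

    /** Sorts by SortFieldA, using SortFieldB only to break ties on SortFieldA. */
    static List<Integer> sortedIds(List<Doc> docs) {
        List<Doc> copy = new ArrayList<>(docs);
        // List.sort is stable, so documents equal on both keys keep their input order.
        copy.sort(Comparator.comparingInt((Doc d) -> d.a)
                .thenComparing((Doc d) -> d.b));
        List<Integer> ids = new ArrayList<>();
        for (Doc d : copy) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(sortedIds(sample()));
        // prints [1, 6, 10, 7, 9, 4, 8, 3, 2, 5]
    }
}
```

On the sample data this groups equal SortFieldB values together within each SortFieldA value (A,A,A,B,B,C,C for 101).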

The performance impact when searching is mostly
upon the very first sort when the caches are filled.
If you have a HUGE number of unique values for
a sort field, that may have heavy memory demands.

As I said, I just skimmed your message, so if I'm off
base let me know...

HTH
Erick
