Posted to solr-user@lucene.apache.org by Mike Sokolov <so...@ifactory.com> on 2011/08/01 15:47:12 UTC

ideas for versioning query?

A customer has an interesting problem: some documents will have multiple 
versions. In search results, only the most recent version of a given 
document should be shown. The trick is that each user has access to a 
different set of document versions, and each user should see only the 
most recent version of a document that they have access to.

Is this something that can reasonably be solved with grouping?  In 3.x? 
I haven't followed the grouping discussions closely: would someone point 
me in the right direction please?

-- 
Michael Sokolov
Engineering Director
www.ifactory.com


Re: ideas for versioning query?

Posted by Mike Sokolov <so...@ifactory.com>.
I think a 30% increase is acceptable. Yes, I think we'll try it.  
Although our case is more like # groups ~  # documents / N, where N is a 
smallish number (~1-5?).  We are planning for a variety of different 
index sizes, but aiming for a sweet spot around a few M docs.

-Mike
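
To make the scale difference concrete, here is a rough arithmetic sketch. The 30M docs and 8000 groups come from the benchmark quoted below; the 3M index size is only an assumed stand-in for "a few M docs", and the function names are made up for illustration.

```python
def docs_per_group(num_docs, num_groups):
    """Average number of documents that fall into one group."""
    return num_docs / num_groups


def estimated_groups(num_docs, versions_per_doc):
    """Group count when every document has about `versions_per_doc` versions."""
    return num_docs // versions_per_doc


# Benchmark quoted in this thread: ~30M docs in ~8000 groups.
benchmark_ratio = docs_per_group(30_000_000, 8_000)

# Assumption: "a few M docs" is taken as 3M purely for illustration.
INDEX_DOCS = 3_000_000
group_estimates = {n: estimated_groups(INDEX_DOCS, n) for n in (1, 3, 5)}

print(benchmark_ratio)   # average docs per group in the benchmark
print(group_estimates)   # groups expected for N = 1, 3, 5
```

The benchmark averages thousands of documents per group, while this index would have hundreds of thousands to millions of groups with only a handful of documents each, so the 30% figure may not carry over and is worth measuring directly.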

On 08/01/2011 11:00 AM, Martijn v Groningen wrote:
> Hi Mike, how many docs and groups do you have in your index?
> I think the group.sort option fits your requirements.
>
> If I remember correctly group.ngroups=true adds something like 30% extra time
> on top of the search request with grouping,
> but that was on my local test dataset (~30M docs, ~8000 groups)  and my
> machine. You might encounter different search times when setting
> group.ngroups=true.
>
> Martijn

Re: ideas for versioning query?

Posted by Martijn v Groningen <ma...@gmail.com>.
Hi Mike, how many docs and groups do you have in your index?
I think the group.sort option fits your requirements.

If I remember correctly group.ngroups=true adds something like 30% extra time
on top of the search request with grouping,
but that was on my local test dataset (~30M docs, ~8000 groups) and my
machine. You might encounter different search times when setting
group.ngroups=true.

Martijn
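
As a sketch of where the group count shows up: with group.ngroups=true the grouped section of the response carries an ngroups value next to matches. The response below is hand-written to resemble the grouped JSON format as documented on the FieldCollapsing wiki page; the field names doc_id and version and the helper name are assumptions, not taken from a real index.

```python
def summarize_grouped(response, group_field):
    """Pull the counts and the first document of each group from a grouped response."""
    grouped = response["grouped"][group_field]
    top_docs = [
        group["doclist"]["docs"][0]
        for group in grouped["groups"]
        if group["doclist"]["docs"]
    ]
    return {
        "matches": grouped["matches"],      # documents matching the query
        "ngroups": grouped.get("ngroups"),  # present only with group.ngroups=true
        "top_docs": top_docs,
    }


# Hand-written sample, not real Solr output.
SAMPLE_RESPONSE = {
    "grouped": {
        "doc_id": {
            "matches": 3,
            "ngroups": 2,
            "groups": [
                {"groupValue": "report",
                 "doclist": {"numFound": 2, "start": 0,
                             "docs": [{"doc_id": "report", "version": 2}]}},
                {"groupValue": "memo",
                 "doclist": {"numFound": 1, "start": 0,
                             "docs": [{"doc_id": "memo", "version": 1}]}},
            ],
        }
    }
}

summary = summarize_grouped(SAMPLE_RESPONSE, "doc_id")
print(summary["ngroups"], summary["matches"])
```

Paging over distinct documents would use ngroups rather than matches, which is why its cost matters here.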

2011/8/1 Mike Sokolov <so...@ifactory.com>

> Thanks, Tomas.  Yes we are planning to keep a "current" flag in the most
> current document.  But there are cases where, for a given user, the most
> current document is not that one, because they only have access to some
> older documents.
>
> I took a look at http://wiki.apache.org/solr/FieldCollapsing and it seems
> as if it will do what we need here.  My one concern is that it might not be
> efficient at computing group.ngroups for a very large number of groups,
> which we would ideally want.  Is that something I should be worried about?
>
> -Mike


-- 
Met vriendelijke groet,

Martijn van Groningen

Re: ideas for versioning query?

Posted by Mike Sokolov <so...@ifactory.com>.
Thanks, Tomas.  Yes we are planning to keep a "current" flag in the most 
current document.  But there are cases where, for a given user, the most 
current document is not that one, because they only have access to some 
older documents.

I took a look at http://wiki.apache.org/solr/FieldCollapsing and it 
seems as if it will do what we need here.  My one concern is that it 
might not be efficient at computing group.ngroups for a very large 
number of groups, which we would ideally want.  Is that something I 
should be worried about?

-Mike
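
A small sketch of why a single index-time "current" flag cannot cover this case: the newest visible version depends on who is asking. The data model here (a readers set per version, the names DocVersion and latest_visible) is hypothetical and only meant to illustrate the requirement, not how the index is actually structured.

```python
from collections import namedtuple

# Hypothetical model: each indexed version lists the users who may read it.
DocVersion = namedtuple("DocVersion", ["doc_id", "version", "readers"])


def latest_visible(versions, user):
    """Return {doc_id: highest version number that `user` is allowed to read}."""
    best = {}
    for v in versions:
        if user not in v.readers:
            continue  # access filter comes first
        if v.doc_id not in best or v.version > best[v.doc_id]:
            best[v.doc_id] = v.version
    return best


INDEX = [
    DocVersion("report", 1, {"alice", "bob"}),
    DocVersion("report", 2, {"alice"}),  # globally "current", but bob cannot read it
    DocVersion("memo", 1, {"bob"}),
]

alice_view = latest_visible(INDEX, "alice")
bob_view = latest_visible(INDEX, "bob")
print(alice_view, bob_view)
```

For bob, filtering on a "current" flag would drop "report" entirely, whereas filtering by access first and then taking the newest version per document still returns version 1.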

On 08/01/2011 10:08 AM, Tomás Fernández Löbbe wrote:
> Hi Michael, I guess this could be solved using grouping as you said.
> Documents inside a group can be sorted on a field (in your case, the version
> field, see parameter group.sort), and you can show only the first one. It
> will be more complex to show facets (post-grouping faceting is a work in
> progress and is not yet committed to trunk).
>
> It would be easier from the Solr side if you could do something at index
> time, like indicating which document is the "current" one and which one is
> an old one (you would need to update the old document whenever a new version
> is indexed).
>
> Regards,
>
> Tomás

Re: ideas for versioning query?

Posted by Tomás Fernández Löbbe <to...@gmail.com>.
Hi Michael, I guess this could be solved using grouping as you said.
Documents inside a group can be sorted on a field (in your case, the version
field, see parameter group.sort), and you can show only the first one. It
will be more complex to show facets (post-grouping faceting is a work in
progress and is not yet committed to trunk).

It would be easier from the Solr side if you could do something at index
time, like indicating which document is the "current" one and which one is
an old one (you would need to update the old document whenever a new version
is indexed).

Regards,

Tomás
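
A minimal sketch of the kind of request described above, built with the Python standard library. The group, group.field, group.sort and group.limit parameters are the ones documented on the FieldCollapsing wiki page; the field names doc_id, version and readers, the access filter, and the URL are all assumptions made up for this example.

```python
from urllib.parse import urlencode


def build_grouped_query(base_url, user_query, user,
                        group_field="doc_id",      # hypothetical: shared by all versions
                        version_field="version",   # hypothetical: sortable version number
                        acl_field="readers"):      # hypothetical: users allowed to read
    """Build a select URL that returns one document per group, newest version first."""
    params = [
        ("q", user_query),
        ("fq", f"{acl_field}:{user}"),             # restrict to versions the user can read
        ("group", "true"),
        ("group.field", group_field),              # one group per logical document
        ("group.sort", f"{version_field} desc"),   # newest version first inside a group
        ("group.limit", "1"),                      # keep only the top document per group
        ("wt", "json"),
    ]
    return f"{base_url}?{urlencode(params)}"


url = build_grouped_query("http://localhost:8983/solr/select", "budget", "alice")
print(url)
```

Because the access filter is applied before grouping, the first document in each group should be the newest version that this particular user can read.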

On Mon, Aug 1, 2011 at 10:47 AM, Mike Sokolov <so...@ifactory.com> wrote:

> A customer has an interesting problem: some documents will have multiple
> versions. In search results, only the most recent version of a given
> document should be shown. The trick is that each user has access to a
> different set of document versions, and each user should see only the most
> recent version of a document that they have access to.
>
> Is this something that can reasonably be solved with grouping?  In 3.x? I
> haven't followed the grouping discussions closely: would someone point me in
> the right direction please?
>
> --
> Michael Sokolov
> Engineering Director
> www.ifactory.com
>
>