Posted to solr-user@lucene.apache.org by Adrian Sutton <ad...@symphonious.net> on 2007/09/13 05:35:03 UTC

Searching Versioned Resources

Hi all,
The documents we're indexing are versioned and generally we only
want search results to return the latest version of a document.
However, there are a couple of scenarios where I'd like to be able to
include previous versions in the search result.

It feels like a straightforward case of a filter, but given that
each document has independent version numbers it's hard to know what  
to filter on. The only solution I can think of at the moment is to  
index each new version twice - once with the version and once with  
version=latest. We'd then tweak the ID field in such a way that there  
is only one version of each document with version=latest. It's then  
simple to use a filter for version=latest when we search.
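
A minimal sketch of that double-indexing idea, using a plain dict as a
stand-in for an index keyed on a unique ID. The field names (id, name,
version, body) and the "name@version" ID format are assumptions made up
for illustration, not an actual schema:

```python
def build_index_docs(name, version, body):
    """Return the two index documents for one new version of a resource.

    Field names and the ID format are hypothetical, chosen for illustration.
    """
    versioned = {
        "id": f"{name}@{version}",  # unique per version, so history accumulates
        "name": name,
        "version": str(version),
        "body": body,
    }
    latest = {
        "id": f"{name}@latest",  # same ID each time, so it replaces the old "latest"
        "name": name,
        "version": "latest",
        "body": body,
    }
    return [versioned, latest]


def add_to_index(index, docs):
    """Toy stand-in for an index with a unique key: adding replaces by ID."""
    for doc in docs:
        index[doc["id"]] = doc


index = {}
add_to_index(index, build_index_docs("report", 1, "first draft"))
add_to_index(index, build_index_docs("report", 2, "second draft"))

# Filtering on version=latest leaves exactly one document per name.
latest_only = [d for d in index.values() if d["version"] == "latest"]
```

Here the index ends up with three entries (report@1, report@2 and
report@latest), which is the cost of the approach: every version is
stored once for history and the newest one is stored a second time.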

Is there a better way? Is there a way to achieve this without having  
to index the document twice?

Thanks in advance,

Adrian Sutton
http://www.symphonious.net




Re: Searching Versioned Resources

Posted by Adrian Sutton <ad...@symphonious.net>.
On 13/09/2007, at 2:36 PM, Adrian Sutton wrote:

>> I think you can use the CollapseFilter to collapse on the "version"
>> field.
>> However, I think you need to modify the CollapseFilter code to  
>> sort by
>> "version" and get the latest version returned.
>
> Ooo, that's very cool. I assume the patches haven't actually been  
> applied yet? This would let me just collapse on the name field, and
> if I could get it to sort by modification date before collapsing  
> it'd be perfect.
>
> I have a feeling I'm going to wind up extremely lost, but I'll  
> delve into the patch and see what I can find.

For the benefit of the archives (and anyone else following along), it  
looks like the current version of the patch in JIRA will actually do  
what I need. The collapse filter will return the first N documents it  
iterates over and collapse the rest, but before iterating it will  
sort the documents by the sort parameters you specify. So to get the  
latest version I simply set sort=version desc.
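
The behaviour described above can be modelled in a few lines of plain
Python. This is only a sketch of the sort-then-collapse idea, not the
code from the patch; the function name, the max_per_group argument and
the sample documents are all made up for illustration:

```python
def sort_then_collapse(docs, collapse_field, sort_field, max_per_group=1):
    """Sort descending on sort_field, then keep the first N docs per group.

    Illustrative model only; this is not the CollapseFilter implementation.
    """
    # Sort first, so the newest version of each document is seen first.
    ordered = sorted(docs, key=lambda d: d[sort_field], reverse=True)
    seen = {}  # collapse value -> number of docs seen so far
    kept = []
    for doc in ordered:
        key = doc[collapse_field]
        count = seen.get(key, 0)
        if count < max_per_group:
            kept.append(doc)  # within the first N for this group
        seen[key] = count + 1  # anything beyond N is collapsed away
    return kept


docs = [
    {"name": "report", "version": 1},
    {"name": "report", "version": 3},
    {"name": "notes", "version": 2},
    {"name": "report", "version": 2},
    {"name": "notes", "version": 1},
]

# Collapse on name after sorting by version, newest first.
latest = sort_then_collapse(docs, "name", "version")
```

One thing to watch: the versions here are integers, so they compare
numerically. If the version were held as a string, "10" would sort
before "9" and the wrong document would be picked as the latest.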

It seems to work well. The patch needs to be updated again to
work with HEAD, but it's not too hard to resolve the differences.

Regards,

Adrian Sutton.
http://www.symphonious.net/

Re: Searching Versioned Resources

Posted by Adrian Sutton <ad...@symphonious.net>.
> I think you can use the CollapseFilter to collapse on the "version" field.
> However, I think you need to modify the CollapseFilter code to sort by
> "version" and get the latest version returned.

Ooo, that's very cool. I assume the patches haven't actually been  
applied yet? This would let me just collapse on the name field, and
if I could get it to sort by modification date before collapsing it'd  
be perfect.

I have a feeling I'm going to wind up extremely lost, but I'll delve  
into the patch and see what I can find.

Thanks,

Adrian Sutton
http://www.symphonious.net

Re: Searching Versioned Resources

Posted by climbingrose <cl...@gmail.com>.
I think you can use the CollapseFilter to collapse on the "version" field.
However, I think you need to modify the CollapseFilter code to sort by
"version" and get the latest version returned.

On 9/13/07, Adrian Sutton <ad...@symphonious.net> wrote:
>
> Hi all,
> The documents we're indexing are versioned and generally we only
> want search results to return the latest version of a document.
> However, there are a couple of scenarios where I'd like to be able to
> include previous versions in the search result.
>
> It feels like a straightforward case of a filter, but given that
> each document has independent version numbers it's hard to know what
> to filter on. The only solution I can think of at the moment is to
> index each new version twice - once with the version and once with
> version=latest. We'd then tweak the ID field in such a way that there
> is only one version of each document with version=latest. It's then
> simple to use a filter for version=latest when we search.
>
> Is there a better way? Is there a way to achieve this without having
> to index the document twice?
>
> Thanks in advance,
>
> Adrian Sutton
> http://www.symphonious.net


-- 
Regards,

Cuong Hoang