Posted to solr-user@lucene.apache.org by Adrian Sutton <ad...@symphonious.net> on 2007/09/13 05:35:03 UTC
Searching Versioned Resources
Hi all,
The documents we're indexing are versioned, and generally we only
want search results to return the latest version of a document.
However, there are a couple of scenarios where I'd like to be able to
include previous versions in the search results.
It feels like a straightforward case of a filter, but given that
each document has independent version numbers it's hard to know what
to filter on. The only solution I can think of at the moment is to
index each new version twice - once with the version and once with
version=latest. We'd then tweak the ID field in such a way that there
is only one version of each document with version=latest. It's then
simple to use a filter for version=latest when we search.
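To make the index-twice idea concrete, here is a minimal sketch. It uses a plain Python dict as a stand-in for the index rather than any real Solr API, and the names `index_version`, `search`, and the `name@version` ID scheme are assumptions made up for illustration. The point it shows is that a fixed ID for the "latest" copy means each new version overwrites the previous "latest" entry.

```python
# Hypothetical in-memory stand-in for the index: unique ID -> document.
# This is not Solr; it only mimics "adding a doc with an existing ID replaces it".
index = {}


def index_version(name, version, body):
    """Index one version twice: once under its real version, once as 'latest'."""
    # Per-version copy: the ID includes the version, so older versions are kept.
    index[f"{name}@{version}"] = {
        "name": name, "version": str(version), "body": body,
    }
    # 'latest' copy: the ID is fixed per document, so a newer version overwrites it.
    index[f"{name}@latest"] = {
        "name": name, "version": "latest", "body": body,
    }


def search(term, latest_only=True):
    """Naive substring search with a filter on the version field."""
    hits = [doc for doc in index.values() if term in doc["body"]]
    if latest_only:
        # Equivalent of filtering on version=latest.
        return [doc for doc in hits if doc["version"] == "latest"]
    # Searching all versions: skip the 'latest' copies to avoid duplicates.
    return [doc for doc in hits if doc["version"] != "latest"]
```

The cost is visible in the sketch: every document body is stored twice, which is what the question is trying to avoid.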
Is there a better way? Is there a way to achieve this without having
to index the document twice?
Thanks in advance,
Adrian Sutton
http://www.symphonious.net
Re: Searching Versioned Resources
Posted by Adrian Sutton <ad...@symphonious.net>.
On 13/09/2007, at 2:36 PM, Adrian Sutton wrote:
>> I think you can use the CollapseFilter to collapse on "version"
>> field.
>> However, I think you need to modify the CollapseFilter code to
>> sort by
>> "version" and get the latest version returned.
>
> Ooo, that's very cool. I assume the patches haven't actually been
> applied yet? This would let me just collapse on the name field and
> if I could get it to sort by modification date before collapsing
> it'd be perfect.
>
> I have a feeling I'm going to wind up extremely lost, but I'll
> delve into the patch and see what I can find.
For the benefit of the archives (and anyone else following along), it
looks like the current version of the patch in JIRA will actually do
what I need. The collapse filter will return the first N documents it
iterates over and collapse the rest, but before iterating it will
sort the documents by the sort parameters you specify. So to get the
latest version I simply set sort=version desc.
It seems to work well. The patch needs to be updated again to work
with HEAD, but it's not too hard to resolve the differences.
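For anyone who wants the sort-then-collapse behaviour spelled out, here is a rough sketch in Python. It is not the code from the patch, only an illustration of the logic described above; the function name `collapse` and the field names `name` and `version` are assumptions. Documents are sorted first, then the first `keep` documents seen for each value of the collapse field are returned and the rest are dropped.

```python
from collections import Counter


def collapse(docs, group_field, sort_field, keep=1):
    """Sort docs descending by sort_field, then keep the first `keep` per group."""
    # Sort before collapsing, so the highest version of each document comes first.
    ordered = sorted(docs, key=lambda doc: doc[sort_field], reverse=True)
    seen = Counter()
    kept = []
    for doc in ordered:
        key = doc[group_field]
        if seen[key] < keep:
            kept.append(doc)
            seen[key] += 1
        # Otherwise the document is collapsed (left out of the results).
    return kept


docs = [
    {"name": "a", "version": 1},
    {"name": "a", "version": 3},
    {"name": "b", "version": 2},
    {"name": "a", "version": 2},
]
latest_only = collapse(docs, group_field="name", sort_field="version")
```

Note that this only gives the latest version if the sort field orders correctly, so the version needs to be stored as a number (or a modification date) rather than as free text.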
Regards,
Adrian Sutton.
http://www.symphonious.net/
Re: Searching Versioned Resources
Posted by Adrian Sutton <ad...@symphonious.net>.
> I think you can use the CollapseFilter to collapse on "version" field.
> However, I think you need to modify the CollapseFilter code to sort by
> "version" and get the latest version returned.
Ooo, that's very cool. I assume the patches haven't actually been
applied yet? This would let me just collapse on the name field and
if I could get it to sort by modification date before collapsing it'd
be perfect.
I have a feeling I'm going to wind up extremely lost, but I'll delve
into the patch and see what I can find.
Thanks,
Adrian Sutton
http://www.symphonious.net
Re: Searching Versioned Resources
Posted by climbingrose <cl...@gmail.com>.
I think you can use the CollapseFilter to collapse on the "version" field.
However, I think you need to modify the CollapseFilter code to sort by
"version" and get the latest version returned.
On 9/13/07, Adrian Sutton <ad...@symphonious.net> wrote:
>
> Hi all,
> The documents we're indexing are versioned, and generally we only
> want search results to return the latest version of a document.
> However, there are a couple of scenarios where I'd like to be able to
> include previous versions in the search results.
>
> It feels like a straight-forward case of a filter, but given that
> each document has independent version numbers it's hard to know what
> to filter on. The only solution I can think of at the moment is to
> index each new version twice - once with the version and once with
> version=latest. We'd then tweak the ID field in such a way that there
> is only one version of each document with version=latest. It's then
> simple to use a filter for version=latest when we search.
>
> Is there a better way? Is there a way to achieve this without having
> to index the document twice?
>
> Thanks in advance,
>
> Adrian Sutton
> http://www.symphonious.net
--
Regards,
Cuong Hoang