Posted to solr-user@lucene.apache.org by Daniel Skiles <da...@docfinity.com> on 2011/10/05 21:55:59 UTC

Field Collapsing and Record Filtering

A while back I sent a question to the list about only returning the most
recent version of a document, based on a numerical version field stored in
each record.  Someone suggested that I use field collapsing to do so, and in
most cases it seems to work well.  However, I've hit a snag and I'd
appreciate it if anyone could offer some pointers.

At the moment, my schema looks roughly like this (not using exact data
types):

contents: string
documentId: string
version: float


When I query on contents, I can use field collapsing to group by documentId,
only return one instance of documentId, and sort each group by version in
descending order.  If the newest version of the document is returned by the
query, everything works great.
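For reference, the grouped request described above can be expressed with Solr's result-grouping parameters (group, group.field, group.sort, group.limit). This minimal Python sketch only builds the query string; the host, port and core name in SOLR_SELECT are hypothetical, and the field names are the ones from the schema above.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute your own host, port and core.
SOLR_SELECT = "http://localhost:8983/solr/docs/select"


def grouped_search_url(text: str) -> str:
    """Build a select URL that collapses hits on documentId and keeps
    only the highest-versioned matching record in each group."""
    params = {
        "q": f"contents:{text}",
        "group": "true",               # enable result grouping
        "group.field": "documentId",   # one group per logical document
        "group.sort": "version desc",  # order records inside each group
        "group.limit": "1",            # return only the top record per group
        "wt": "json",
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"


print(grouped_search_url("horse"))
```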

What I've realized, though, is that field collapsing doesn't necessarily
get me the most recent version of the document (when that version matches
the query); it gets me the most recent of the versions that match the
query.

Is there a good way to get back a document that matches the query, but
only if the matching record is the one with the highest version number?



For example, with the following record set:

contents:  angry horse
documentId:  1a
version:  1.0

contents:  distraught horse
documentId:  1a
version:  1.1

contents:  peevish horse
documentId:  1a
version:  2.0

Searching for "horse" will return version 2.0 of 1a using collapsing in the
manner that I described above.

If I search for "angry", I'll get back version 1.0 of 1a.  I'd rather get
back nothing at all.  Is this possible?
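The behaviour in this example follows from the order of operations: the query selects the matching versions first, and grouping then picks the highest version among only those. The following plain-Python simulation (no Solr involved; matches() is a hypothetical, crude stand-in for the full-text match on contents) reproduces both results.

```python
from collections import defaultdict

# The three versions of document 1a from the example above.
RECORDS = [
    {"contents": "angry horse", "documentId": "1a", "version": 1.0},
    {"contents": "distraught horse", "documentId": "1a", "version": 1.1},
    {"contents": "peevish horse", "documentId": "1a", "version": 2.0},
]


def matches(record, term):
    # Crude stand-in for a full-text match on the contents field.
    return term in record["contents"].split()


def collapse(records, term):
    """Mimic grouping: group only the *matching* records by documentId
    and keep the highest version in each group."""
    groups = defaultdict(list)
    for r in records:
        if matches(r, term):
            groups[r["documentId"]].append(r)
    return {doc_id: max(rs, key=lambda r: r["version"]) for doc_id, rs in groups.items()}


print(collapse(RECORDS, "horse")["1a"]["version"])  # 2.0
print(collapse(RECORDS, "angry")["1a"]["version"])  # 1.0 (the unwanted hit)
```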

Re: Field Collapsing and Record Filtering

Posted by Michael Sokolov <so...@ifactory.com>.
On 10/13/2011 5:04 PM, lee carroll wrote:
> current: bool  // for fq, which searches only current versions
> last_current_at: date time  // for date range queries or group sorting: what was current for a given date
>
> Sorry if I've missed a requirement.
>
> lee c
>
Lee, the idea of "last_current_at" is interesting; could you expand on
what you mean by "group sorting", though? Would that provide a means to
get only the most recent version? Say I have access to versions 1, 3 and 4
of some document and the current version is 5. I'd like to get version
4 as the result. Would you use field collapsing/grouping for that, or
something else?

-Mike

Re: Field Collapsing and Record Filtering

Posted by lee carroll <le...@googlemail.com>.
Sorry, I missed the permissions part:

I think that's OK if you index the ACL as part of the document, that is to
say, each version has its own ACL. Match users against the version ACL data
as a filter query and use the last_current_at date as a sort.
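A rough simulation of this suggestion, using Mike's example of five versions where the user can see only versions 1, 3 and 4. The acl and last_current_at values are invented, and newest_visible() is a hypothetical helper that mimics an fq on the per-version ACL followed by grouping on documentId with group.sort=last_current_at desc; it is a sketch of the idea, not a Solr implementation.

```python
from datetime import date

# Mike's example: five versions of one document. The "staff" group can see
# versions 1, 3 and 4; "admin" can see all five. ACLs and dates are invented;
# the current version (5) carries the most recent last_current_at.
VERSIONS = [
    {"documentId": "d1", "version": 1, "acl": {"staff", "admin"}, "last_current_at": date(2007, 3, 1)},
    {"documentId": "d1", "version": 2, "acl": {"admin"}, "last_current_at": date(2008, 6, 1)},
    {"documentId": "d1", "version": 3, "acl": {"staff", "admin"}, "last_current_at": date(2009, 1, 15)},
    {"documentId": "d1", "version": 4, "acl": {"staff", "admin"}, "last_current_at": date(2010, 9, 30)},
    {"documentId": "d1", "version": 5, "acl": {"admin"}, "last_current_at": date(2011, 10, 13)},
]


def newest_visible(versions, user_groups, matches=lambda v: True):
    """Mimic an ACL fq plus grouping on documentId with
    group.sort=last_current_at desc and group.limit=1."""
    best = {}
    for v in versions:
        if not (v["acl"] & user_groups) or not matches(v):
            continue  # removed by the ACL filter or by the main query
        cur = best.get(v["documentId"])
        if cur is None or v["last_current_at"] > cur["last_current_at"]:
            best[v["documentId"]] = v
    return best


print(newest_visible(VERSIONS, {"staff"})["d1"]["version"])  # 4
```

As with plain collapsing, the ACL filter and the query are both applied before grouping: if the query matched only version 3, this would return version 3 rather than nothing.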




Re: Field Collapsing and Record Filtering

Posted by lee carroll <le...@googlemail.com>.
current: bool  // for fq, which searches only current versions
last_current_at: date time  // for date range queries or group sorting: what was current for a given date
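A minimal sketch of the current flag, simulated in plain Python: the flag is computed at index time, and the search then behaves roughly like q=contents:<term>&fq=current:true. The records reuse Daniel's example; mark_current() and search_current() are hypothetical helpers.

```python
# Daniel's three versions of document 1a.
RAW = [
    {"contents": "angry horse", "documentId": "1a", "version": 1.0},
    {"contents": "distraught horse", "documentId": "1a", "version": 1.1},
    {"contents": "peevish horse", "documentId": "1a", "version": 2.0},
]


def mark_current(records):
    """Index time: set current=True only on the highest version of each
    documentId, and current=False on every other version."""
    latest = {}
    for r in records:
        doc_id = r["documentId"]
        if doc_id not in latest or r["version"] > latest[doc_id]:
            latest[doc_id] = r["version"]
    return [dict(r, current=(r["version"] == latest[r["documentId"]])) for r in records]


def search_current(records, term):
    """Query time: roughly q=contents:<term>&fq=current:true."""
    return [r for r in records if r["current"] and term in r["contents"].split()]


INDEX = mark_current(RAW)
print([r["version"] for r in search_current(INDEX, "horse")])  # [2.0]
print(search_current(INDEX, "angry"))                          # []
```

The cost is on the indexing side: whenever a new version is added, the previously current version must be re-indexed with current=false.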

Sorry if I've missed a requirement.

lee c


Re: Field Collapsing and Record Filtering

Posted by Mike Sokolov <so...@ifactory.com>.
We have the identical problem in our system.

Our plan is to encode the most recent version of a document using an
explicit field/value, i.e.

version=current

(or maybe current=true)

We also need to allow users to search for the most current version, but
only within the versions they have access to (which might be just the 2010
and 2007 versions). I can't see any way to do this other than to index
each document with a "most current as of 2010" flag, or something like
that.
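One way to generalize that flag, sketched under assumptions: store a multi-valued field (called current_for here, a hypothetical name) on each version, listing the access groups for which that version is the newest one they can see. A query can then filter with fq=current_for:<group>, so a hit on an older version is simply not returned. The groups and contents are invented for illustration, and the Solr side is only simulated.

```python
# Hypothetical versions of one document with per-version ACLs.
VERSIONS = [
    {"documentId": "d1", "version": 1, "acl": {"staff", "admin"}, "contents": "angry horse"},
    {"documentId": "d1", "version": 2, "acl": {"admin"}, "contents": "calm horse"},
    {"documentId": "d1", "version": 3, "acl": {"staff", "admin"}, "contents": "sleepy horse"},
    {"documentId": "d1", "version": 4, "acl": {"staff", "admin"}, "contents": "peevish horse"},
    {"documentId": "d1", "version": 5, "acl": {"admin"}, "contents": "happy horse"},
]


def assign_current_for(versions):
    """Index time: for each (documentId, group) pair find the newest version
    that group can see, then record the group in that version's
    current_for set."""
    newest = {}
    for v in versions:
        for g in v["acl"]:
            key = (v["documentId"], g)
            newest[key] = max(newest.get(key, v["version"]), v["version"])
    return [
        dict(v, current_for={g for g in v["acl"]
                             if newest[(v["documentId"], g)] == v["version"]})
        for v in versions
    ]


def search(versions, term, group):
    """Query time: roughly q=contents:<term>&fq=current_for:<group>."""
    return [v for v in versions
            if group in v["current_for"] and term in v["contents"].split()]


INDEXED = assign_current_for(VERSIONS)
print([v["version"] for v in search(INDEXED, "horse", "staff")])  # [4]
print(search(INDEXED, "angry", "staff"))                          # []
```

The price is index-time work: adding a version or changing an ACL means recomputing current_for and re-indexing every version whose flags change.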

But if anyone has brighter ideas on how to do this with a query, I'd be
excited to hear them!

-Mike


Re: Field Collapsing and Record Filtering

Posted by Martijn v Groningen <ma...@gmail.com>.
I don't think this is possible in a single search with what Solr
currently has to offer. I guess the only way to support this is by
post-processing your results on the client side: for each group you
display, query what the latest version is. If that doesn't match, omit
the result from rendering, or execute a second grouped search to get
more groups. The downsides are that pagination will never be correct and
the overall search will take longer.
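The client-side post-processing described above might look roughly like this. It is a plain-Python simulation: grouped_search() mimics the first grouped query, latest_version() is a hypothetical stand-in for the extra per-group query, and the data adds a hypothetical second document (2b) to Daniel's example so the effect on paging is visible.

```python
# Daniel's document 1a plus a hypothetical single-version document 2b.
RECORDS = [
    {"contents": "angry horse", "documentId": "1a", "version": 1.0},
    {"contents": "distraught horse", "documentId": "1a", "version": 1.1},
    {"contents": "peevish horse", "documentId": "1a", "version": 2.0},
    {"contents": "angry dog", "documentId": "2b", "version": 1.0},
]


def grouped_search(records, term):
    """First query: collapse matching records on documentId, keeping the
    highest matching version per group."""
    best = {}
    for r in records:
        if term in r["contents"].split():
            cur = best.get(r["documentId"])
            if cur is None or r["version"] > cur["version"]:
                best[r["documentId"]] = r
    return list(best.values())


def latest_version(records, doc_id):
    """Stand-in for the follow-up query that asks for a document's
    latest version, whether or not it matches the search."""
    return max(r["version"] for r in records if r["documentId"] == doc_id)


def search_latest_only(records, term):
    """Post-processing: keep a group only if its hit is the latest version."""
    return [hit for hit in grouped_search(records, term)
            if hit["version"] == latest_version(records, hit["documentId"])]


print(len(grouped_search(RECORDS, "angry")))  # 2 groups before post-processing
print([(h["documentId"], h["version"]) for h in search_latest_only(RECORDS, "angry")])  # [('2b', 1.0)]
```

Here the grouped search returns two groups for "angry", but only one survives the check, which is the pagination problem mentioned above: a page that asked for two results shows one.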

Martijn




-- 
Kind regards,

Martijn van Groningen