Posted to solr-user@lucene.apache.org by Daniel Skiles <da...@docfinity.com> on 2011/08/17 20:20:08 UTC

Return records based on aggregate functions?

I've recently started using Solr and I'm stumped by a problem I'm currently
encountering.  Given that I can't really find anything close to what I'm
trying to do on Google or the mailing lists, I figured I'd ask if anyone
here had suggestions on how to do it.

I currently have a schema that looks more or less like this:

uniqueId (string) -- Unique identifier for a record
documentId (string) -- Id of document represented by this record
contents (string) -- contents of file represented by this record
version (float) -- Numeric representation of the version of this document


What I'd like to do is submit a query to the server that returns records
that match against contents, but only if the record has a version field that
is the largest value for all records that share the same documentId.

In other words, I'd like to be able to only search the most recent version
of a document in some scenarios.
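
A minimal in-memory sketch of the selection rule described above, assuming plain Java objects rather than a Solr index. The class and method names (LatestVersionFilter, Rec, searchLatest) are hypothetical and exist only for illustration; the substring check is a stand-in for a real contents match.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatestVersionFilter {
    /** Hypothetical record type mirroring the four schema fields listed above. */
    public static final class Rec {
        final String uniqueId;
        final String documentId;
        final String contents;
        final float version;

        public Rec(String uniqueId, String documentId, String contents, float version) {
            this.uniqueId = uniqueId;
            this.documentId = documentId;
            this.contents = contents;
            this.version = version;
        }
    }

    /** Keeps, for each documentId, only the record with the largest version. */
    public static Map<String, Rec> latestPerDocument(List<Rec> records) {
        Map<String, Rec> latest = new LinkedHashMap<>();
        for (Rec r : records) {
            latest.merge(r.documentId, r, (a, b) -> b.version > a.version ? b : a);
        }
        return latest;
    }

    /** Returns the uniqueIds of latest-version records whose contents contain the term. */
    public static List<String> searchLatest(List<Rec> records, String term) {
        List<String> hits = new ArrayList<>();
        for (Rec r : latestPerDocument(records).values()) {
            if (r.contents.contains(term)) {
                hits.add(r.uniqueId);
            }
        }
        return hits;
    }
}
```

Note the order in this sketch: the latest version is chosen first and matched second. Picking the highest version among records that have already matched is a different rule, and it could return an older version when the newest one does not match.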

Is this possible with Solr?  I'm at an early enough phase that I'm also able
to modify my Solr schema if necessary.

Thank you,
Daniel

Re: Return records based on aggregate functions?

Posted by Daniel Skiles <da...@docfinity.com>.
It's actually an analyzed String.  I figured that out after the first test
run.

On Thu, Aug 18, 2011 at 9:00 AM, Erick Erickson <er...@gmail.com> wrote:

> Side comment: Is your contents field really a "string" type in your
> schema.xml? That's an un-analyzed type, and unless you're
> always searching for *exactly* the full contents of the field,
> you'll have problems.
>
> Best
> Erick

Re: Return records based on aggregate functions?

Posted by Erick Erickson <er...@gmail.com>.
Side comment: Is your contents field really a "string" type in your
schema.xml? That's an un-analyzed type, and unless you're
always searching for *exactly* the full contents of the field,
you'll have problems.
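
To illustrate the point about un-analyzed fields, here is a rough sketch contrasting the two matching behaviours. It is not Solr's actual analysis chain: the tokenizer below (lowercase, split on non-alphanumerics) is a crude stand-in, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class StringVsAnalyzed {
    /** Un-analyzed behaviour: the entire field value acts as a single term. */
    public static boolean matchesAsString(String fieldValue, String query) {
        return fieldValue.equals(query);
    }

    /** Crude stand-in for analysis: lowercase, then split on non-alphanumerics. */
    public static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+"));
    }

    /** Analyzed behaviour (simplified): the query matches if it equals any token. */
    public static boolean matchesAsText(String fieldValue, String query) {
        return tokenize(fieldValue).contains(query.toLowerCase(Locale.ROOT));
    }
}
```

With a field value of "Quarterly report for 2011", a search for "report" only succeeds in the analyzed variant; the un-analyzed variant matches only the full, exact value.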

Best
Erick


RE: Return records based on aggregate functions?

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Yes:

solrquery.set("group.main", true);
solrquery.set("group.format", "simple");

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Daniel Skiles [mailto:daniel.skiles@docfinity.com] 
Sent: Wednesday, August 17, 2011 2:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Return records based on aggregate functions?

For response option 1, would I add the group.main=true and
group.format=simple parameters to the SolrQuery object?


Re: Return records based on aggregate functions?

Posted by Daniel Skiles <da...@docfinity.com>.
For response option 1, would I add the group.main=true and
group.format=simple parameters to the SolrQuery object?


RE: Return records based on aggregate functions?

Posted by "Dyer, James" <Ja...@ingrambook.com>.
For the request end, you can just use something like:

solrquery.set("group", true);
..etc..

For the response, you have 3 options:

1. Specify "group.main=true&group.format=simple".  (Note: when I tested this on a nightly build from back in February, I noticed a significant performance impact from using these params, although I imagine the version that was committed to 3.3 does not have this problem.)

This will return your one document per group as if it were a regular non-grouped query, and the response will come back just like any other query.
(See the wiki: http://wiki.apache.org/solr/Solrj#Reading_Data_from_Solr
 and the javadocs: http://lucene.apache.org/solr/api/overview-summary.html then scroll to the solrj section.)

2. Full SolrJ support was just added to the 3.x branch, so you'll have to use a nightly build (which ought to be stable and production-quality).  See https://issues.apache.org/jira/browse/SOLR-2637 for more information.  After building the solrj documentation, look for classes that start with "Group".

3. See this posting on how to parse the response "by hand".  This is for a slightly older version of Field Collapsing than what was committed, so it might not be 100% accurate.  http://www.lucidimagination.com/search/document/148ba23aec5ee2d8/solrquery_api_for_adding_group_filter

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311



Re: Return records based on aggregate functions?

Posted by Daniel Skiles <da...@docfinity.com>.
Whoa.  That looks like exactly what I need.  Thank you very much.  Is there
any documentation for how to do that using the SolrJ API?


RE: Return records based on aggregate functions?

Posted by "Dyer, James" <Ja...@ingrambook.com>.
Daniel,

This looks like a good use case for FieldCollapsing (see http://wiki.apache.org/solr/FieldCollapsing).  Perhaps try something like:

&group=true&group.field=documentId&group.limit=1&group.sort=version desc
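
When these parameters are sent over HTTP, the space in "version desc" has to be URL-encoded. The following is a small sketch, using only the Java standard library, of assembling the query string; the class and method names are hypothetical, and the "q" value shown in the usage note is an assumed example query, not something taken from this thread.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class GroupQueryString {
    /** The grouping parameters suggested above, plus a caller-supplied main query. */
    public static Map<String, String> latestVersionParams(String userQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);
        params.put("group", "true");
        params.put("group.field", "documentId");
        params.put("group.limit", "1");
        params.put("group.sort", "version desc");
        return params;
    }

    /** Joins the parameters into a URL-encoded query string, preserving order. */
    public static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(ex);
        }
    }
}
```

For example, toQueryString(latestVersionParams("contents:report")) yields a string in which the sort clause appears as group.sort=version+desc.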

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

