Posted to solr-user@lucene.apache.org by Sushil Vegad <vs...@serebrum.com> on 2008/12/26 17:25:10 UTC

Retrieve documents that contain max value for a field

Hi,
Can someone please help with how to write a query for the following
scenario?

We index topics that contain text. A topic can have many versions, and each
version is indexed. Our schema has topicId, versionId and timestamp fields
(id, versionId and versionDate in the snippet below), amongst others. topicId
cannot be the uniqueKey field because multiple versions of a topic share the
same topicId. Instead, the versionId and timestamp differ for each version,
as follows.

SolrInputDocument doc1 = new SolrInputDocument();
doc1.addField("id", "1");
doc1.addField("versionId", "1");
doc1.addField("versionDate", "2008-12-23T23:59:59Z");

SolrInputDocument doc1_1 = new SolrInputDocument();
doc1_1.addField("id", "1");
doc1_1.addField("versionId", "2");
doc1_1.addField("versionDate", "2008-12-24T23:59:59Z");

SolrInputDocument doc2 = new SolrInputDocument();
doc2.addField("id", "2");
doc2.addField("versionId", "1");
doc2.addField("versionDate", "2008-12-23T23:59:59Z");

SolrInputDocument doc2_1 = new SolrInputDocument();
doc2_1.addField("id", "2");
doc2_1.addField("versionId", "2");
doc2_1.addField("versionDate", "2008-12-24T23:59:59Z");

SolrInputDocument doc2_2 = new SolrInputDocument();
doc2_2.addField("id", "2");
doc2_2.addField("versionId", "3");
doc2_2.addField("versionDate", "2008-12-25T23:59:59Z");

We want a single query that returns doc1_1, doc2_2 and so on; that is, for
documents that share the same id, the query should return only the document
with the highest versionId or the latest timestamp.

Any thoughts on how this can be done?

Thanks,
Sushil
-- 
View this message in context: http://www.nabble.com/Retrieve-documents-that-contain-max-value-for-a-field-tp21175643p21175643.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Retrieve documents that contain max value for a field

Posted by Sushil Vegad <vs...@serebrum.com>.
Thanks Otis, for the quick reply.

I didn't find anything on the forums or in the FAQ. I guess I can't get Solr
to compare search result documents that have the same topicId and then pick
the maximum value of versionId in that set.

I may need to write that logic in my application.

Thanks,
Sushil
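
A minimal sketch of that application-side logic, assuming the search hits
have already been fetched and reduced to an id and a numeric versionId. The
class and method names (LatestVersions, Doc, latestPerId) are hypothetical
and not part of Solr or SolrJ; this is one possible approach, not a
recommended implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LatestVersions {

    /** Hypothetical stand-in for one search hit; only the two fields needed here. */
    static final class Doc {
        final String id;
        final int versionId;

        Doc(String id, int versionId) {
            this.id = id;
            this.versionId = versionId;
        }
    }

    /** Keeps, for each id, only the hit with the highest versionId. */
    static List<Doc> latestPerId(List<Doc> hits) {
        // LinkedHashMap preserves the order in which ids were first seen.
        Map<String, Doc> best = new LinkedHashMap<String, Doc>();
        for (Doc hit : hits) {
            Doc current = best.get(hit.id);
            if (current == null || hit.versionId > current.versionId) {
                best.put(hit.id, hit);
            }
        }
        return new ArrayList<Doc>(best.values());
    }

    public static void main(String[] args) {
        // Same ids and versionIds as doc1, doc1_1, doc2, doc2_1 and doc2_2 above.
        List<Doc> hits = Arrays.asList(
                new Doc("1", 1), new Doc("1", 2),
                new Doc("2", 1), new Doc("2", 2), new Doc("2", 3));
        for (Doc d : latestPerId(hits)) {
            System.out.println("id=" + d.id + " versionId=" + d.versionId);
        }
    }
}
```

With the five sample documents this prints id=1 versionId=2 and then
id=2 versionId=3, i.e. doc1_1 and doc2_2. The cost is that every version of
every matching topic has to be pulled back from Solr first.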


Otis Gospodnetic wrote:
> 
> Hi,
> 
> I'm not sure Solr is the right tool for the job, if that's all there is
> to your application, but you might be able to get what you want by simply
> sorting on the version field.  Be aware, though, that your version field is
> a very precise timestamp, so it will have LOTS of unique values, and
> sorting on it will eat memory and increase your searchers' warmup time.
> Please check the mailing list archives for more information, or maybe we
> already have this covered in the Solr FAQ?
> 
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 



Re: Retrieve documents that contain max value for a field

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

I'm not sure Solr is the right tool for the job, if that's all there is to your application, but you might be able to get what you want by simply sorting on the version field.  Be aware, though, that your version field is a very precise timestamp, so it will have LOTS of unique values, and sorting on it will eat memory and increase your searchers' warmup time.  Please check the mailing list archives for more information, or maybe we already have this covered in the Solr FAQ?


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
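
A rough sketch of the sorting idea, assuming versionId is indexed with a
sortable numeric field type and that Solr answers at the base URL shown. The
URL, the class name and the method name are assumptions for illustration;
only the q, sort and rows request parameters are standard Solr. Note that
this fetches the newest version of one topic per request, not of all topics
at once.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class LatestVersionQuery {

    /** URL-encodes one parameter value as UTF-8. */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Builds a select URL that asks for the single newest version of one topic:
     * match on id, sort by versionId descending, return one row.
     * The topicId is assumed to contain no query-syntax special characters.
     */
    static String latestVersionUrl(String baseUrl, String topicId) {
        return baseUrl
                + "?q=" + enc("id:" + topicId)
                + "&sort=" + enc("versionId desc")
                + "&rows=1";
    }

    public static void main(String[] args) {
        // Hypothetical local Solr instance.
        System.out.println(latestVersionUrl("http://localhost:8983/solr/select", "1"));
    }
}
```

For topic 1 this prints
http://localhost:8983/solr/select?q=id%3A1&sort=versionId+desc&rows=1.
Sorting on versionId rather than on the timestamp keeps the number of unique
sort values small, which sidesteps the memory concern above.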





Re: Retrieve documents that contain max value for a field

Posted by Sushil Vegad <vs...@serebrum.com>.
This looks useful, but I am not sure how to use the component. Could you
please elaborate?

Also, this is not available in Solr 1.3. Is there an equivalent in 1.3?

Thanks,
Sushil


ryantxu wrote:
> 
> not exactly what you are asking for, but check:
> http://wiki.apache.org/solr/StatsComponent
> 
> This will at least tell you the max/min versionId. Right now it only
> works with numeric values, so it won't help with the timestamp.
> 
> ryan
> 



Re: Retrieve documents that contain max value for a field

Posted by Ryan McKinley <ry...@gmail.com>.
>
> We want to write a single query where the query returns doc1_1, doc2_2 and
> so on...that is for documents that have the same id, we want the query to
> return the document with highest versionId or the latest timestamp.
>
> Any thoughts how this can be done?
>

not exactly what you are asking for, but check:
http://wiki.apache.org/solr/StatsComponent

This will at least tell you the max/min versionId. Right now it only
works with numeric values, so it won't help with the timestamp.

ryan
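
A hedged sketch of what a StatsComponent request might look like, using the
stats, stats.field and stats.facet parameters described on the wiki page
above. The base URL, class name and method name are assumptions, and the
component is not part of Solr 1.3, so this only applies to builds that
include it. The response reports min/max for versionId, broken down per id
through stats.facet; it does not return the matching documents themselves.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class StatsQuery {

    /** URL-encodes one parameter value as UTF-8. */
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Builds a request that matches all documents, returns no rows, and asks
     * for statistics on statsField, broken down by the values of facetField.
     */
    static String statsUrl(String baseUrl, String statsField, String facetField) {
        return baseUrl
                + "?q=" + enc("*:*")
                + "&rows=0"
                + "&stats=true"
                + "&stats.field=" + enc(statsField)
                + "&stats.facet=" + enc(facetField);
    }

    public static void main(String[] args) {
        // Hypothetical local Solr instance.
        System.out.println(statsUrl("http://localhost:8983/solr/select", "versionId", "id"));
    }
}
```

A second query is still needed to fetch the documents whose versionId equals
the reported maximum for each id.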