Posted to solr-user@lucene.apache.org by Jacob Singh <ja...@gmail.com> on 2008/12/09 14:18:12 UTC

Related Documents (attachments)

Hi folks,

I'm working on creating a schema which will accommodate the following
(likely common) scenario and was hoping for some best practices:

We have stories which are objects culled from various fields in our
database.  We currently index them with a bunch of meta-data for faceting,
sorting, etc.

We have files which are attached to stories.  So if there is a story about a
restaurant, there may be a PDF of the menu attached.

We would like to index the file in Solr using the UpdateRichDocument
handler, but also be able to associate it with the story it came from.  How
do people generally do this?  Is there some way to "join" the two on the Solr
side, so I don't store the content twice, but have a reference field to the
attachment?

So a search for "Egg Foo Young" could return the story about a Chinese
restaurant even if the attachment containing that text was indexed later
and only recorded, as part of its own data, that it belongs to
story[@id=123]?  Sort of like a copyField, but copying to another record.

I know the answer to that is probably no.  And I know Solr is not an RDBMS,
but how is this generally tackled?

Thanks,
Jacob



-- 

+1 510 277-0891 (o)
+91 9999 33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsingh@gmail.com

Re: Related Documents (attachments)

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Jacob,

One option: whenever you index additional content later, re-index the content you previously added to Solr along with it.  In other words, keep both the meta-data and the other (e.g. PDF) data in the same record/document in one index.
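
A minimal sketch of the single-document approach, in Python. It only shows how the combined record could be assembled before being sent to Solr; the field name attachment_text and the helper build_combined_doc are hypothetical, and the text extraction and the actual update request are assumed to happen elsewhere.

```python
def build_combined_doc(story, attachment_texts):
    """Return one document holding the story's meta-data plus attachment text.

    `story` is a dict of the fields already indexed for the story.
    `attachment_texts` is a list of strings extracted from its attachments.
    The field name "attachment_text" is an assumption for illustration; it
    would need a matching (multi-valued) field in the schema.
    """
    doc = dict(story)  # copy, so the caller's story dict is left untouched
    doc["attachment_text"] = list(attachment_texts)
    return doc


story = {"id": "123", "title": "A Chinese restaurant"}
combined = build_combined_doc(story, ["Egg Foo Young ..."])
```

Whenever an attachment is added or changed, the whole combined document is rebuilt and sent again under the same id, so the new version replaces the old one in the index.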

If you can't do that, then no, there is no join across documents or across indices, but you can use the same unique key in both docs/indices and manually query one index and then the other, including the unique identifier in the second query.
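
The two-step lookup could be sketched roughly as follows, in Python. The field names story_id and id, and the two search callables, are assumptions for illustration; each callable stands in for whatever client code actually queries the attachment index and the story index.

```python
def story_query_for(attachment_hits, key_field="story_id"):
    """Build a query string selecting the stories referenced by the hits.

    Returns None when no hit carries a story key. The field names
    "story_id" and "id" are hypothetical.
    """
    ids = []
    for hit in attachment_hits:
        sid = hit.get(key_field)
        if sid is not None and sid not in ids:  # keep order, drop repeats
            ids.append(sid)
    if not ids:
        return None
    # Quote each id as a phrase, escaping backslashes and double quotes.
    quoted = ['"%s"' % str(i).replace("\\", "\\\\").replace('"', '\\"')
              for i in ids]
    return "id:(%s)" % " OR ".join(quoted)


def find_stories(text, search_attachments, search_stories):
    """Query the attachments first, then the stories they point to."""
    hits = search_attachments(text)   # first query: attachment index
    query = story_query_for(hits)     # carry the unique keys across
    return search_stories(query) if query else []
```

The cost of this approach is two round trips per search, and the story meta-data (facets, sorting) only applies to the second query.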

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


