Posted to solr-user@lucene.apache.org by Travis Low <tl...@4centurion.com> on 2011/07/19 19:50:08 UTC

use case: structured DB records with a bunch of related files

Greetings.  I have a bunch of highly structured DB records, and I'm pretty
clear on how to index those.  However, each of those records may have any
number of related documents (Word, Excel, PDF, PPT, etc.).  All of this
information will change over time.

Can someone point me to a use case or some good reading to get me started on
configuring Solr to index the DB records and files in such a way as to
relate the two types of information?  By "relate", I mean that if there's a
hit in a related file, then I need to show the user a link to the DB record
as well as a link to the file.

Thanks in advance.

cheers,

Travis

-- 
Travis Low, Director of Development
<tl...@4centurion.com>
Centurion Research Solutions, LLC
14048 ParkEast Circle • Suite 100 • Chantilly, VA 20151
703-956-6276 • 703-378-4474 (fax)
http://www.centurionresearch.com


Re: use case: structured DB records with a bunch of related files

Posted by Erick Erickson <er...@gmail.com>.
I suspect you'll have to use Tika to parse the attachments and, as you
do, add the info that'll let you display the link alongside the meta-data
that Tika generates. I'm in a bit of a rush, but one approach would be to
use SolrJ to do both the database querying and the indexing. From the SolrJ
program you can ask Tika to parse each attachment, assemble the Solr
document for that attachment from the Tika output, add whatever meta-data
you want for a link back to the DB record, and send the doc to Solr to be
indexed.

Hope this helps
Erick
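
To make the suggestion above concrete, here is a minimal sketch of the
document shapes it implies: one Solr document per DB record and one per
attachment, with each attachment document carrying the ID of its record.
It is written in plain Python rather than SolrJ, and it does not call Solr
or Tika. The field names (doc_type, record_id, file_path, content), the URL
patterns, and the extract_text callable that stands in for Tika are all
assumptions made for illustration, not part of either library.

```python
from typing import Callable, Dict, List


def record_doc(record: Dict[str, str]) -> Dict[str, str]:
    """Build the Solr document for one DB record (hypothetical field names)."""
    return {
        "id": f"record-{record['id']}",
        "doc_type": "record",
        "record_id": record["id"],
        "title": record["title"],
    }


def attachment_doc(record_id: str, path: str,
                   extract_text: Callable[[str], str]) -> Dict[str, str]:
    """Build the Solr document for one attachment of a DB record."""
    return {
        "id": f"file-{record_id}-{path}",
        "doc_type": "attachment",
        "record_id": record_id,          # link back to the DB record
        "file_path": path,               # link to the file itself
        "content": extract_text(path),   # stand-in for text parsed by Tika
    }


def build_docs(record: Dict[str, str], paths: List[str],
               extract_text: Callable[[str], str]) -> List[Dict[str, str]]:
    """Return the record document followed by one document per attachment."""
    docs = [record_doc(record)]
    docs.extend(attachment_doc(record["id"], p, extract_text) for p in paths)
    return docs


def links_for_hit(hit: Dict[str, str]) -> Dict[str, str]:
    """Return the links to show for a search hit (placeholder URL patterns)."""
    links = {"record": f"/records/{hit['record_id']}"}
    if hit.get("doc_type") == "attachment":
        links["file"] = f"/files/{hit['file_path']}"
    return links
```

Because every attachment document stores record_id, a hit in a file's
content is enough to render both links without a second lookup. When a
record or its files change, the same functions can rebuild the affected
documents for re-indexing.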
