Posted to solr-user@lucene.apache.org by Lee Smith <le...@weblee.co.uk> on 2010/02/15 16:39:29 UTC

Apache Tika / Solr Cell

Hey All,

Hope someone can advise me on a way to go.

I have my Solr setup working well. I am using DIH to handle all my data input.

Now I need to add content from Word docs, PDFs, metadata, etc., and I am looking to use Solr Cell.

A few questions regarding this. Would it be best to add these to a different core?

How would I handle documents removed from the server? I would want these removed from the index as well.

Hope you can advise.

Thank you in advance

Re: Apache Tika / Solr Cell

Posted by Erick Erickson <er...@gmail.com>.
<<Add them to a different core:>>
Well, It Depends (tm). What do you want to accomplish? Do you want
searches to get results from both the database import AND the imported
documents? Or are these orthogonal data sets? If they are orthogonal,
then putting them in their own core is probably conceptually easiest
(there's no requirement to do this, since you could use mutually
exclusive fields, but many might find that design confusing).

<<How would I handle Documents removed from the server as I
would want these removed from index as well>>
This is trickier. *If* you know when documents are deleted, you
can delete by unique ID or by query. But that's the easy part.
Knowing which documents have been deleted is harder, and you'll have
to do that part yourself; there isn't anything in Solr that I know
of that helps there...
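
A minimal sketch of that bookkeeping, in Python, under some loud
assumptions: the unique ID of each indexed document is its file path,
you can obtain the list of IDs currently in the index by some means of
your own, and the function names below are hypothetical helpers, not
part of Solr. It only works out which IDs have disappeared and builds
the XML delete messages; posting them to the update handler and
committing is left to you.

```python
from xml.sax.saxutils import escape


def find_deleted_ids(indexed_ids, current_ids):
    """Return IDs present in the index but no longer on the server, sorted."""
    return sorted(set(indexed_ids) - set(current_ids))


def build_delete_by_id(ids):
    """Build an XML delete message with one <id> element per document."""
    body = "".join("<id>{}</id>".format(escape(doc_id)) for doc_id in ids)
    return "<delete>{}</delete>".format(body)


def build_delete_by_query(query):
    """Build an XML delete message that removes every document matching query."""
    return "<delete><query>{}</query></delete>".format(escape(query))


# Example: IDs are assumed to be file paths (an assumption for this sketch).
indexed = ["docs/a.pdf", "docs/b.doc", "docs/c.pdf"]  # what the index holds
current = ["docs/a.pdf", "docs/c.pdf"]                # what is still on disk
gone = find_deleted_ids(indexed, current)
print(build_delete_by_id(gone))
```

Delete by ID suits a known list of removed files; delete by query is
handier when a whole group goes at once, provided you index a field
that identifies the group.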

HTH
Erick
