
Posted to solr-user@lucene.apache.org by Wouter Samaey <wo...@gmail.com> on 2009/04/29 11:54:21 UTC

Advice on custom DIH or other solutions

Hi there,

I'm currently in the process of learning more about Solr, and how I
can integrate it into my project.

Since my database is very large and complex, I'm looking into ways
of keeping my documents current in Solr. I have read the pages about
DIH, and find it useful, but I may need more logic to filter out
documents or manipulate them. In order to use DIH, I'd need to run
huge queries and joins...

Now, I see several ways of going forward:

- customize DIH with new classes so I can read directly from my
RDBMS (will be slow)
- let the webapp build an XML file, and simply use that as a datasource
instead of the RDBMS (fewer queries, and can use memcached for the
heavy stuff)
- let the webapp instruct Solr to add, update or remove a document as
changes occur in real time instead of using the DIH delta queries. For
building a fresh index, I'll still need to find a solution like the
ones above. (webapp drives Solr directly, instead of DIH polling)
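
To make the third option concrete, here is a minimal sketch of the XML update messages a webapp could send to Solr's update handler when a record changes. The helper names (build_add_xml, build_delete_xml) and the field names are hypothetical; only the <add>/<doc>/<field> and <delete>/<id> message shapes come from Solr's XML update format. Sending the messages over HTTP and issuing a commit afterwards is left out.

```python
import xml.etree.ElementTree as ET


def build_add_xml(doc):
    """Build an <add> message for one document.

    `doc` maps field names to a value or a list of values
    (a list produces one <field> element per value).
    """
    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in doc.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(v)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


def build_delete_xml(doc_id):
    """Build a <delete> message that removes one document by its unique key."""
    delete = ET.Element("delete")
    ET.SubElement(delete, "id").text = str(doc_id)
    return ET.tostring(delete, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical document; the field names depend on your schema.
    print(build_add_xml({"id": 42, "title": "Hello & welcome"}))
    print(build_delete_xml(42))
```

An add for an existing unique key replaces the old document, so the same message covers both the "add" and the "update" case.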

Is there some general advice you can give? I understand every app is
different, but this must be an issue many have considered before.

Kind regards

Wouter Samaey

Re: Advice on custom DIH or other solutions

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.

On Wed, Apr 29, 2009 at 3:24 PM, Wouter Samaey <wo...@gmail.com> wrote:
> Hi there,
>
> I'm currently in the process of learning more about Solr, and how I
> can integrate it into my project.
>
> Since my database is very large and complex, I'm looking into ways
> of keeping my documents current in Solr. I have read the pages about
> DIH, and find it useful, but I may need more logic to filter out
> documents or manipulate them. In order to use DIH, I'd need to run
> huge queries and joins...
>
> Now, I see several ways of going forward:
>
> - customize DIH with new classes so I can read directly from my
> RDBMS (will be slow)
> - let the webapp build an XML file, and simply use that as a datasource
> instead of the RDBMS (fewer queries, and can use memcached for the
> heavy stuff)
> - let the webapp instruct Solr to add, update or remove a document as
> changes occur in real time instead of using the DIH delta queries. For
> building a fresh index, I'll still need to find a solution like the
> ones above. (webapp drives Solr directly, instead of DIH polling)
>
> Is there some general advice you can give? I understand every app is
> different, but this must be an issue many have considered before.
>
> Kind regards
>
> Wouter Samaey
>
The disadvantage of DIH pulling data out of your DB could be that
complex queries take a long time. The best strategy, as I see it, is to
maintain a simple temp DB where your app can write rows as it generates
data. Periodically, ask DIH to read from this temp DB and update the
index. This approach is good even if you wish to rebuild the index.
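
A rough sketch of the temp DB idea, purely as an illustration: the app writes already-flattened rows into one simple table, and a scheduled job then asks DIH to import from it. SQLite stands in for whatever database DIH would actually read over JDBC, and the table name, columns, helper names and base URL are all assumptions. The delta-import and full-import commands are the ones the DIH request handler accepts.

```python
import sqlite3
from urllib.parse import urlencode

# One flat row per Solr document, so DIH needs no joins.
SCHEMA = """
CREATE TABLE IF NOT EXISTS solr_staging (
    id            TEXT PRIMARY KEY,
    title         TEXT,
    body          TEXT,
    deleted       INTEGER NOT NULL DEFAULT 0,
    last_modified TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_staging(path=":memory:"):
    """Open (and if needed create) the staging database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def stage_document(conn, doc_id, title, body):
    """Insert or overwrite the row for one document and bump its timestamp."""
    conn.execute(
        "INSERT OR REPLACE INTO solr_staging "
        "(id, title, body, deleted, last_modified) "
        "VALUES (?, ?, ?, 0, CURRENT_TIMESTAMP)",
        (doc_id, title, body),
    )
    conn.commit()


def stage_delete(conn, doc_id):
    """Mark a document as deleted so a later import can remove it."""
    conn.execute(
        "UPDATE solr_staging SET deleted = 1, "
        "last_modified = CURRENT_TIMESTAMP WHERE id = ?",
        (doc_id,),
    )
    conn.commit()


def dih_url(base_url, full=False):
    """Build the URL a scheduled job would request to trigger DIH."""
    command = "full-import" if full else "delta-import"
    return base_url + "?" + urlencode({"command": command})
```

A periodic job would request dih_url(...) for incremental updates, with the DIH delta query selecting rows by last_modified, and request the full=True variant to rebuild the index from the same table.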


-- 
--Noble Paul