Posted to solr-user@lucene.apache.org by Erick Erickson <er...@gmail.com> on 2019/02/02 15:27:27 UTC
Re: Alternative for DIH

Depending on how complicated you need this to be, you can just write
your own in SolrJ, see:

https://lucidworks.com/2012/02/14/indexing-with-solrj/

You haven't said a lot about the characteristics of your situation.
Are you talking 1B rows from the DB? 1M? What is the pain point?
Because until one gets to massive amounts of data, 9 times out of 10
poor indexing performance is a result of the DB query being used
executing very slowly.
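
One way to check whether the query is the bottleneck is to run it by itself
and just drain the result set, sending nothing to Solr. Below is a minimal
Python sketch of that idea. The in-memory SQLite table, the `docs` schema and
the `time_query` helper are all stand-ins I made up for illustration; you'd
substitute your real connection and the exact SQL that DIH runs.

```python
import sqlite3
import time


def time_query(conn, sql, batch_size=1000):
    """Drain every row of `sql` without indexing; return (row_count, seconds)."""
    start = time.perf_counter()
    cursor = conn.execute(sql)
    rows = 0
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        rows += len(batch)
    return rows, time.perf_counter() - start


# Stand-in data: an in-memory SQLite table with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [(f"row {i}",) for i in range(10000)],
)

rows, seconds = time_query(conn, "SELECT id, body FROM docs")
print(f"fetched {rows} rows in {seconds:.3f}s")
```

If draining the rows alone takes most of the time the full import takes,
changing the indexing tool is unlikely to help much.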

Before jumping to a solution, it'd be good to know:
1> why you're dissatisfied with DIH, i.e. what is the problem you're seeing
2> some information about your situation, size of DB, how fast DIH
   works now etc.

The latter is important, 'cause it's a totally different question if,
say, your problem statement is
"it takes 8 hours to import 1,000,000,000 rows and the docs are 1M long"
vs.
"it takes 8 hours to import 100,000 rows that are 1K each".
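
To put rough numbers on those two statements, here is a small Python sketch.
It assumes "1M" and "1K" mean 1,000,000 and 1,000 bytes per document, which
is my reading and not something stated above; `required_throughput` is just
an illustrative helper.

```python
def required_throughput(rows, bytes_per_row, hours):
    """Return (rows per second, bytes per second) sustained over the import."""
    seconds = hours * 3600
    return rows / seconds, rows * bytes_per_row / seconds


# Assumption: "1M long" = 1,000,000 bytes, "1K each" = 1,000 bytes.
big = required_throughput(1_000_000_000, 1_000_000, 8)
small = required_throughput(100_000, 1_000, 8)

print(f"1B rows x 1M:   {big[0]:,.0f} rows/s, {big[1] / 1e9:.1f} GB/s")
print(f"100K rows x 1K: {small[0]:.1f} rows/s, {small[1] / 1e3:.1f} KB/s")
```

Under those assumptions the first case already sustains tens of thousands of
rows and tens of gigabytes per second, while the second moves only a few rows
and a few kilobytes per second.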

Until there are answers to questions like that it's not clear at all you even
_have_ a problem that's solvable by any of the suggestions so far.

Best,
Erick

On Thu, Jan 31, 2019 at 12:34 PM Alexandre Rafalovitch
<ar...@gmail.com> wrote:
>
> Apache NiFi may also be something of interest: https://nifi.apache.org/
>
> Regards,
>    Alex.
>
> On Thu, 31 Jan 2019 at 11:15, Mikhail Khludnev <mk...@apache.org> wrote:
> >
> > Hello,
> >
> > I did this deck some time ago. It might be useful for choosing one.
> > https://docs.google.com/presentation/d/e/2PACX-1vQzi3QOZAwLh_t3zs1gH9EGCB2HKUgiN3WJRGHpULyA-GleCrQ41dIOINa18h_XG64BX5D_ZG6jKmXL/pub?start=false&loop=false&delayms=3000
> > Note, as far as I understand Lucidworks' answer to this is Spark.
> >
> >
> > On Thu, Jan 31, 2019 at 2:15 PM Srinivas Kashyap <sr...@bamboorose.com>
> > wrote:
> >
> > > Hello,
> > >
> > > As we all know, DIH is single threaded and has its own issues while
> > > indexing.
> > >
> > > Got to know that we can write our own APIs to pull data from DB and push
> > > it into Solr. One such I heard was Apache Kafka being used for the purpose.
> > >
> > > Can any of you send me the links and guides to use Apache Kafka to pull
> > > data from DB and push into Solr?
> > >
> > > If there are any other alternatives please suggest.
> > >
> > > Thanks and Regards,
> > > Srinivas Kashyap
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev