
Posted to solr-user@lucene.apache.org by Jianbin Dai <jd...@huawei.com> on 2011/01/28 01:30:19 UTC

Solr for noSQL

Hi,

 

Is there a data import handler that can quickly read data from a NoSQL
database? Specifically, I am thinking of using MongoDB.

Or, more generally, how does Solr work with NoSQL databases?

Thanks.

 

Jianbin

Re: Solr for noSQL

Posted by Gora Mohanty <go...@mimirtech.com>.

On Fri, Jan 28, 2011 at 6:00 AM, Jianbin Dai <jd...@huawei.com> wrote:
[...]
> Do we have data import handler to fast read in data from noSQL database,
> specifically, MongoDB I am thinking to use?
[...]

Have you tried the links that a Google search turns up? Some of
them look like pretty good prospects.

Regards,
Gora

Re: Solr for noSQL

Posted by Alejandro Delgadillo <ad...@febg.org>.

Have you tried indexing using HTTP POST? You just fetch your records or
documents from your DB and store them in a variable, then loop the POST
once for each record you have, and the problem is solved.

With this method it doesn't matter what kind of DB you are using...
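
A minimal sketch of that loop, using only the Python standard library. The
update URL, the core name "mycore" and the function names are assumptions
made for illustration; this also assumes a Solr version whose update handler
accepts a JSON array of documents, so check the path for your release.

```python
import json
import urllib.request

# Hypothetical endpoint: adjust host, port and core name for your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"


def build_update_request(update_url, docs):
    """Build (but do not send) an HTTP POST carrying docs as a JSON array."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def _send(request):
    """Send one request and return the raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()


def post_each(update_url, records, send=_send):
    """POST every record on its own, whatever database it came from."""
    count = 0
    for record in records:
        send(build_update_request(update_url, [record]))
        count += 1
    return count
```

Here `records` can be any iterable of dicts, so the source database really
does not matter. One POST per record is the simplest form; sending several
records per request would cut down on round trips, and the documents still
need a commit before they become searchable.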



Re: Solr for noSQL

Posted by Erick Erickson <er...@gmail.com>.

I'll reply for Lance because I'm awake earlier <G>...

To make your own DIH, you have to solve all the
problems you'd have to solve to have a Java program
connect to your data source via JDBC, PLUS
fit it into the DIH framework. Why do the extra work?

The other thing is that writing your own code gives
you much greater control over, say, error handling,
exception handling, continue-or-abort decisions, etc.
DIH is a good tool, don't get me wrong, but I prefer
more control in production situations.
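
As a rough illustration of that kind of control, here is a sketch of an
indexing loop with an explicit continue-or-abort policy. The names
`index_all`, `index_one` and `max_failures` are made up for this example;
`index_one` stands for whatever call sends one record to Solr.

```python
def index_all(records, index_one, max_failures=10):
    """Index each record; skip bad ones, abort after too many failures.

    Returns (number indexed, list of (record, exception) pairs).
    """
    indexed = 0
    failed = []
    for record in records:
        try:
            index_one(record)
            indexed += 1
        except Exception as exc:
            # Continue: remember the failure and move on to the next record.
            failed.append((record, exc))
            if len(failed) > max_failures:
                # Abort: something is probably wrong beyond a few bad records.
                raise RuntimeError(
                    f"aborting after {len(failed)} failures"
                ) from exc
    return indexed, failed
```

The point is only that the policy lives in your own code, so you decide what
counts as a skippable record and when to give up.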

Plus, connecting to Solr via SolrJ AND
connecting to your database takes about 20 lines
of code, it's not very complex. You can have that
done pretty quickly...

But if you'd rather make your own DIH, it's up to you.

Best
Erick

On Fri, Jan 28, 2011 at 12:38 AM, Dennis Gearon <ge...@sbcglobal.net> wrote:

> Why not make one's own DIH handler, Lance?
>
>  Dennis Gearon
>
>
> Signature Warning
> ----------------
> It is always a good idea to learn from your own mistakes. It is usually a
> better idea to learn from others’ mistakes, so you do not have to make them
> yourself.
> from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
>
>
> EARTH has a Right To Life,
> otherwise we all die.
>
>
>
> ----- Original Message ----
> From: Lance Norskog <go...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, January 27, 2011 9:33:25 PM
> Subject: Re: Solr for noSQL
>
> There are no special connectors available to read from the key-value
> stores like memcache/cassandra/mongodb. You would have to get a Java
> client library for the DB and code your own dataimporthandler
> datasource.  I cannot recommend this; you should make your own program
> to read data and upload to Solr with one of the Solr client libraries.
>
> Lance
>
> On 1/27/11, Jianbin Dai <jd...@huawei.com> wrote:
> > Hi,
> >
> >
> >
> > Do we have data import handler to fast read in data from noSQL database,
> > specifically, MongoDB I am thinking to use?
> >
> > Or a more general question, how does Solr work with noSQL database?
> >
> > Thanks.
> >
> >
> >
> > Jianbin
> >
> >
> >
> >
>
>
> --
> Lance Norskog
> goksron@gmail.com
>
>

Re: Solr for noSQL

Posted by Dennis Gearon <ge...@sbcglobal.net>.

Personally, I just create a view that flattens out the database and renames the 
fields as I desire. Then I call the view with the DIH to import it.

Solr doesn't know anything about the database, except how to get a connection 
and fetch rows. And that's pretty darn useful: that's just that much less code 
to write.
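
A document store has no SQL view to lean on, so the same flatten-and-rename
step would have to happen in the program that feeds Solr. The sketch below
is that substitute, not the view approach itself; the function name, the
`rename` mapping and the underscore separator are all assumptions made for
illustration.

```python
def flatten(doc, rename=None, parent="", sep="_"):
    """Flatten nested dicts into one level, optionally renaming fields.

    Nested keys are joined with `sep`; `rename` maps a flattened name to
    the field name wanted in the index.
    """
    rename = rename or {}
    flat = {}
    for key, value in doc.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, rename, name, sep))
        else:
            flat[rename.get(name, name)] = value
    return flat
```

For example, a record with an `author` sub-document containing `name` ends
up with a top-level `author_name` field, which `rename` can then map to
whatever the Solr schema calls it.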

 Dennis Gearon


Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



----- Original Message ----
From: Upayavira <uv...@odoko.co.uk>
To: solr-user@lucene.apache.org
Sent: Fri, January 28, 2011 1:41:42 AM
Subject: Re: Solr for noSQL



On Thu, 27 Jan 2011 21:38 -0800, "Dennis Gearon" <ge...@sbcglobal.net>
wrote:
> Why not make one's own DIH handler, Lance?

Personally, I don't like that approach. Solr is best related to as
something of a black box that you configure, then push content to.
Having Solr know about your data sources, and pull content in seems to
me to be mixing concerns.

I relate to the DIH as a useful tool for smaller sites or for
prototyping, but would expect anything more substantial to require an
indexing application that gives you full control over the indexing
process. It could be a lightweight app that uses a MongoDB java client
and SolrJ, and simply pulls from one and pushes to the other. If you
don't want to run another JVM, it could run as a separate webapp within
your Solr JVM.

From an architectural point of view, do you configure Mysql, or MongoDB
for that matter, to pull content into itself? Likewise, Solr should be a
service that listens, waiting to be given data.

Upayavira
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source

Re: Solr for noSQL

Posted by Upayavira <uv...@odoko.co.uk>.

On Thu, 27 Jan 2011 21:38 -0800, "Dennis Gearon" <ge...@sbcglobal.net>
wrote:
> Why not make one's own DIH handler, Lance?

Personally, I don't like that approach. Solr is best treated as
something of a black box that you configure, then push content to.
Having Solr know about your data sources and pull content in seems to
me to be mixing concerns.

I relate to the DIH as a useful tool for smaller sites or for
prototyping, but would expect anything more substantial to require an
indexing application that gives you full control over the indexing
process. It could be a lightweight app that uses a MongoDB java client
and SolrJ, and simply pulls from one and pushes to the other. If you
don't want to run another JVM, it could run as a separate webapp within
your Solr JVM.
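
The shape of such an app, sketched in Python rather than with a MongoDB Java
client and SolrJ, and with the database and Solr calls left as plain
callables. `run_indexer`, `batches` and the parameter names are hypothetical;
`source` would be a cursor over your documents and `push` a function that
sends one batch to Solr.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items until the iterable runs out."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def run_indexer(source, transform, push, batch_size=500):
    """Pull from `source`, transform each item, push batches to the index."""
    total = 0
    for chunk in batches(source, batch_size):
        push([transform(item) for item in chunk])
        total += len(chunk)
    return total
```

Solr never learns where the data came from; it only sees the batches that
`push` hands over.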

From an architectural point of view, do you configure Mysql, or MongoDB
for that matter, to pull content into itself? Likewise, Solr should be a
service that listens, waiting to be given data.

Upayavira
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source

Re: Solr for noSQL

Posted by Dennis Gearon <ge...@sbcglobal.net>.

Why not make one's own DIH handler, Lance?

 Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better 
idea to learn from others’ mistakes, so you do not have to make them yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
otherwise we all die.

----- Original Message ----
From: Lance Norskog <go...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Thu, January 27, 2011 9:33:25 PM
Subject: Re: Solr for noSQL

There are no special connectors available to read from the key-value
stores like memcache/cassandra/mongodb. You would have to get a Java
client library for the DB and code your own dataimporthandler
datasource.  I cannot recommend this; you should make your own program
to read data and upload to Solr with one of the Solr client libraries.

Lance

On 1/27/11, Jianbin Dai <jd...@huawei.com> wrote:
> Hi,
>
>
>
> Do we have data import handler to fast read in data from noSQL database,
> specifically, MongoDB I am thinking to use?
>
> Or a more general question, how does Solr work with noSQL database?
>
> Thanks.
>
>
>
> Jianbin
>
>
>
>

-- 
Lance Norskog
goksron@gmail.com

Re: Solr for noSQL

Posted by Dai Jianbin 00901725 <jd...@huawei.com>.

Do we have any performance measurements? Would it be much slower compared to other DIH sources?


> There are no special connectors available to read from the key-value
> stores like memcache/cassandra/mongodb. You would have to get a Java
> client library for the DB and code your own dataimporthandler
> datasource.  I cannot recommend this; you should make your own program
> to read data and upload to Solr with one of the Solr client libraries.
> 
> Lance
> 
> On 1/27/11, Jianbin Dai <jd...@huawei.com> wrote:
> > Hi,
> >
> >
> >
> > Do we have data import handler to fast read in data from noSQL database,
> > specifically, MongoDB I am thinking to use?
> >
> > Or a more general question, how does Solr work with noSQL database?
> >
> > Thanks.
> >
> >
> >
> > Jianbin
> >
> >
> >
> >
> 
> 
> -- 
> Lance Norskog
> goksron@gmail.com
>

Re: Solr for noSQL

Posted by Lance Norskog <go...@gmail.com>.

There are no special connectors available to read from the key-value
stores like memcache/cassandra/mongodb. You would have to get a Java
client library for the DB and code your own dataimporthandler
datasource.  I cannot recommend this; you should make your own program
to read data and upload to Solr with one of the Solr client libraries.

Lance

On 1/27/11, Jianbin Dai <jd...@huawei.com> wrote:
> Hi,
>
>
>
> Do we have data import handler to fast read in data from noSQL database,
> specifically, MongoDB I am thinking to use?
>
> Or a more general question, how does Solr work with noSQL database?
>
> Thanks.
>
>
>
> Jianbin
>
>
>
>

-- 
Lance Norskog
goksron@gmail.com

Re: Solr for noSQL

Posted by openvictor Open <op...@gmail.com>.

Hi all, I don't know if this answers any of your questions, but if you are
interested, check out:

Lucandra (Cassandra + Lucene)




Re: Solr for noSQL

Posted by Steven Noels <st...@outerthought.org>.

On Tue, Feb 1, 2011 at 11:52 AM, Upayavira <uv...@odoko.co.uk> wrote:

>
> Apologies if my "nothing funky" sounded like you weren't doing cool
> stuff.

No offense whatsoever. I think my longer reply sheds a more accurate light
on what Lily means in terms of "SOLR for NoSQL", and it was your reaction
that triggered this additional explanation.

> I was merely attempting to say that I very much doubt you were
> doing anything funky like putting HBase underneath Solr as a replacement
> of FSDirectory.

There are some initiatives in the context of Cassandra IIRC, as well as a
project which stores Lucene index files in HBase tables, but frankly they
seem more like experiments, and I also think the way Lucene/SOLR works
conflicts somewhat with what HBase does on top of Hadoop FS. Too many
layers of indirection will kill performance on every layer.

> I was trying to imply that, likely your integration with
> Solr was relatively conventional (interacting with its REST interface),
>

Yep. We figured that was the wiser road to walk: it leaves a clearly defined
interface and a possible area of improvement, as opposed to a too-low level of
integration.

> and the "funky" stuff that you are doing sits outside of that space.
>
> Hope that's a clearer (and more accurate?) attempt at what I was trying
> to say.
>
> Upayavira (who finds the Lily project interesting, and would love to
> find the time to play with it)
>

Anytime, Upayavira. Anytime! ;-)

Steven.
-- 
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily

Re: Solr for noSQL

Posted by Upayavira <uv...@odoko.co.uk>.


On Tue, 01 Feb 2011 07:22 +0100, "Steven Noels"
<st...@outerthought.org> wrote:
> On Mon, Jan 31, 2011 at 9:38 PM, Upayavira <uv...@odoko.co.uk> wrote:
> 
> >
> >
> > On Mon, 31 Jan 2011 08:40 -0500, "Estrada Groups"
> > <es...@gmail.com> wrote:
> > > What are the advantages of using something like HBase over your standard
> > > Lucene index with Solr? It would seem to me like you'd be losing a lot of
> > > what Lucene has to offer!?!
> >
> > I think Steven is saying that he has an indexer app that reads from
> > HBase and writes to a standard Solr by hitting its Rest API.
> >
> > So, nothing funky, just a little app that reads from HBase and posts to
> > Solr.
> >
> 
> 
> We're doing something like offering a relational-database-like experience
> (i.e. a schema language, storing typed data instead of byte[]s, secondary
> indexing facilities), with some content management features (versioning,
> blob storage), combined with SOLR as a search index (with mapping between
> our schema and that of SOLR), the index being maintained incrementally and
> through map/reduce (for reindexing). We keep multiple versions of the index
> if you want, with state management and we do text extraction with Tika. All
> this happens fully distributed, so you can play with different boxes serving
> as HBase datanode, or index feeder, SOLR search node, etc etc.
> 
> All that sits behind a Java API that uses Avro underneath, and a REST
> interface as well (searches go directly to SOLR). For future versions, we
> will integrate a recommendation engine and some analytics tools as well.
> 
> So yes, we do more (or rather: different things) than what Lucene/SOLR does,
> as we offer a full-featured data storage environment, stuffing your data in
> HBase (which scales better than MySQL), and make it searchable through SOLR.
> 
> The 'funky app' you're referring to now sits at about 3 man-years of
> full-time development, BTW. ;-)

Apologies if my "nothing funky" sounded like you weren't doing cool
stuff. I was merely attempting to say that I very much doubt you were
doing anything funky like putting HBase underneath Solr as a replacement
of FSDirectory. I was trying to imply that, likely your integration with
Solr was relatively conventional (interacting with its REST interface),
and the "funky" stuff that you are doing sits outside of that space.

Hope that's a clearer (and more accurate?) attempt at what I was trying
to say.

Upayavira (who finds the Lily project interesting, and would love to
find the time to play with it)
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source

Re: Solr for noSQL

Posted by Steven Noels <st...@outerthought.org>.

On Mon, Jan 31, 2011 at 9:38 PM, Upayavira <uv...@odoko.co.uk> wrote:

>
>
> On Mon, 31 Jan 2011 08:40 -0500, "Estrada Groups"
> <es...@gmail.com> wrote:
> > What are the advantages of using something like HBase over your standard
> > Lucene index with Solr? It would seem to me like you'd be losing a lot of
> > what Lucene has to offer!?!
>
> I think Steven is saying that he has an indexer app that reads from
> HBase and writes to a standard Solr by hitting its Rest API.
>
> So, nothing funky, just a little app that reads from HBase and posts to
> Solr.
>

We're doing something like offering a relational-database-like experience
(i.e. a schema language, storing typed data instead of byte[]s, secondary
indexing facilities), with some content management features (versioning,
blob storage), combined with SOLR as a search index (with mapping between
our schema and that of SOLR), the index being maintained incrementally and
through map/reduce (for reindexing). We keep multiple versions of the index
if you want, with state management and we do text extraction with Tika. All
this happens fully distributed, so you can play with different boxes serving
as HBase datanode, or index feeder, SOLR search node, etc etc.

All that sits behind a Java API that uses Avro underneath, and a REST
interface as well (searches go directly to SOLR). For future versions, we
will integrate a recommendation engine and some analytics tools as well.

So yes, we do more (or rather: different things) than what Lucene/SOLR does,
as we offer a full-featured data storage environment, stuffing your data in
HBase (which scales better than MySQL), and make it searchable through SOLR.

The 'funky app' you're referring to now sits at about 3 man-years of full-time
development, BTW. ;-)

Steven.
-- 
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily

Re: Solr for noSQL

Posted by Upayavira <uv...@odoko.co.uk>.


On Mon, 31 Jan 2011 08:40 -0500, "Estrada Groups"
<es...@gmail.com> wrote:
> What are the advantages of using something like HBase over your standard
> Lucene index with Solr? It would seem to me like you'd be losing a lot of
> what Lucene has to offer!?!

I think Steven is saying that he has an indexer app that reads from
HBase and writes to a standard Solr by hitting its Rest API.

So, nothing funky, just a little app that reads from HBase and posts to
Solr.

Upayavira

> On Jan 31, 2011, at 5:34 AM, Steven Noels <st...@outerthought.org>
> wrote:
> 
> > On Fri, Jan 28, 2011 at 1:30 AM, Jianbin Dai <jd...@huawei.com> wrote:
> > 
> >> Hi,
> >> 
> >> 
> >> 
> >> Do we have data import handler to fast read in data from noSQL database,
> >> specifically, MongoDB I am thinking to use?
> >> 
> >> Or a more general question, how does Solr work with noSQL database?
> >> 
> > 
> > 
> > Can't say anything about MongoDB, but we have an integration of SOLR with
> > HBase inside Lily - www.lilyproject.org. It indeed uses the 'normal' SOLR
> > index update API rather than a DIH - as we had the need to have incremental
> > updates. The Indexer component we wrote does mapping from Lily/HBase schema
> > to SOLR, as we also felt the need that both schemas shouldn't necessarily be
> > identical.
> > 
> > Steven.
> > -- 
> > Steven Noels
> > http://outerthought.org/
> > Scalable Smart Data
> > Makers of Kauri, Daisy CMS and Lily
> 
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source

Re: Solr for noSQL

Posted by Estrada Groups <es...@gmail.com>.

What are the advantages of using something like HBase over your standard Lucene index with Solr? It would seem to me like you'd be losing a lot of what Lucene has to offer!?!

Adam

On Jan 31, 2011, at 5:34 AM, Steven Noels <st...@outerthought.org> wrote:

> On Fri, Jan 28, 2011 at 1:30 AM, Jianbin Dai <jd...@huawei.com> wrote:
> 
>> Hi,
>> 
>> 
>> 
>> Do we have data import handler to fast read in data from noSQL database,
>> specifically, MongoDB I am thinking to use?
>> 
>> Or a more general question, how does Solr work with noSQL database?
>> 
> 
> 
> Can't say anything about MongoDB, but we have an integration of SOLR with
> HBase inside Lily - www.lilyproject.org. It indeed uses the 'normal' SOLR
> index update API rather than a DIH - as we had the need to have incremental
> updates. The Indexer component we wrote does mapping from Lily/HBase schema
> to SOLR, as we also felt the need that both schemas shouldn't necessarily be
> identical.
> 
> Steven.
> -- 
> Steven Noels
> http://outerthought.org/
> Scalable Smart Data
> Makers of Kauri, Daisy CMS and Lily

Re: Solr for noSQL

Posted by Steven Noels <st...@outerthought.org>.

On Fri, Jan 28, 2011 at 1:30 AM, Jianbin Dai <jd...@huawei.com> wrote:

> Hi,
>
>
>
> Do we have data import handler to fast read in data from noSQL database,
> specifically, MongoDB I am thinking to use?
>
> Or a more general question, how does Solr work with noSQL database?
>

Can't say anything about MongoDB, but we have an integration of SOLR with
HBase inside Lily - www.lilyproject.org. It uses the 'normal' SOLR index
update API rather than a DIH, as we needed incremental updates. The Indexer
component we wrote maps from the Lily/HBase schema to the SOLR schema, as we
also felt that the two schemas shouldn't necessarily have to be identical.
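
To make the idea of keeping two schemas apart a bit more concrete, here is a
generic sketch of a field mapping between a storage record and a Solr
document. This is not Lily's Indexer or its configuration format; the field
names, `FIELD_MAP` and `map_record` are invented purely for illustration.

```python
# Hypothetical mapping: storage field name -> Solr field name.
FIELD_MAP = {
    "title": "title_t",
    "body": "text_t",
    "created": "created_dt",
}


def map_record(record, field_map, id_field="id"):
    """Translate one storage record into a Solr document.

    Only mapped fields are copied, so the storage schema can carry fields
    the index never sees, and missing or None values are left out.
    """
    doc = {"id": str(record[id_field])}
    for source_name, solr_name in field_map.items():
        if record.get(source_name) is not None:
            doc[solr_name] = record[source_name]
    return doc
```

Because each record maps to one document keyed by `id`, re-sending a changed
record replaces the earlier version in the index, assuming `id` is the
schema's unique key. That is what makes updating record by record possible.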

Steven.
-- 
Steven Noels
http://outerthought.org/
Scalable Smart Data
Makers of Kauri, Daisy CMS and Lily