Posted to solr-user@lucene.apache.org by "bharath.mvkumar" <bh...@gmail.com> on 2017/05/09 02:43:51 UTC

SOLR as nosql database store

Hi All,

We have a use case where a MySQL database stores documents, and some of
the fields in each document are also indexed in Solr.
We plan to move all of those documents to Solr, making Solr the NoSQL
datastore for them. The reason is that we have to support cross data center
replication for both MySQL and Solr, so we are in effect duplicating the
same data. We do around 10,000 writes per second. We currently have only one
shard with around 70 million records; we plan to support close to 1 billion
records and to shard the collection.

Is using Solr as the NoSQL database a good choice, or should we look at
Cassandra for our use case?

Thanks,
Bharath Kumar



--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-as-nosql-database-store-tp4334119.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: SOLR as nosql database store

Posted by Hrishikesh Gadre <ga...@gmail.com>.
Hi Bharath,

In general it's not a good idea to use Solr as the *primary data store*, for
the various reasons listed here:

https://wiki.apache.org/solr/HowToReindex

But if you design your system so that at least one copy of the raw data
is stored in some other storage system, then you can use Solr as the
operational database.

Hope this helps.

-Hrishikesh





Re: SOLR as nosql database store

Posted by Bharath Kumar <bh...@gmail.com>.
Thanks Walter and Mike. In our use case we have the same schema on both the
source and target sites. The idea is to see whether we can avoid MySQL
replication to the target site for a particular table in our MySQL schema.
Currently we index some of the fields of that table in Solr; we want to move
all the fields to Solr, indexing some of them and only storing the others.




-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"

Re: SOLR as nosql database store

Posted by Bharath Kumar <bh...@gmail.com>.
Yes Mike, we have CDCR replication as well.




-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"

Re: SOLR as nosql database store

Posted by Walter Underwood <wu...@wunderwood.org>.
CDCR doesn’t rebuild it so much as copy it.

To change the schema, you’ll need to reindex.

I’ve worked on two NoSQL databases (Objectivity and MarkLogic) and I’ve worked on Solr. They are utterly different designs, intended to do different things.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: SOLR as nosql database store

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/10/2017 2:15 AM, Mike Drob wrote:
>> The searching install will be able to rebuild itself from the data
>> storage install when that is required.
>
> Is this a use case for CDCR?

Does CDCR require an identical schema between locations?  If not, then I
think CDCR can keep a searching install up to date by copying
transaction logs, but I don't think it would be able to do the initial
population.

I'm pretty sure that index creation would have to be done from scratch
by indexing.  The source could be the storage install, but you'd have to
use DIH or a custom program to take care of it.

Thanks,
Shawn


Re: SOLR as nosql database store

Posted by Mike Drob <md...@apache.org>.
> The searching install will be able to rebuild itself from the data
> storage install when that is required.

Is this a use case for CDCR?

Mike


Re: SOLR as nosql database store

Posted by Bharath Kumar <bh...@gmail.com>.
Thanks Shawn and Rick for your suggestions. We will surely look at these
options.



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"

Re: SOLR as nosql database store

Posted by Shawn Heisey <ap...@elyograg.org>.
On 5/9/2017 12:58 AM, Bharath Kumar wrote:
> Thanks Hrishikesh and Dave. We use SOLR cloud with 2 extra replicas, will that not serve as backup when something goes wrong? Also we use latest solr 6 and from the documentation of solr, the indexing performance has been good. The reason is that we are using MySQL as the primary data store and the performance might not be optimal if we write data at a very rapid rate. Already we index almost half the fields that are in MySQL in solr.

A replica is protection against data loss in the event of hardware
failure, but there are classes of problems that it cannot protect against.

Although Solr (Lucene) does try *really* hard to never lose data that it
hasn't been asked to delete, it is not designed to be a database.  It's
a search engine.  Solr doesn't offer the same kinds of guarantees about
the data it contains that software like MySQL does.

I personally don't recommend trying to use Solr as a primary data store,
but if that's what you really want to do, then I would suggest that you
have two complete Solr installs, with multiple replicas on both.  One of
them will be used for searching and have a configuration you're already
familiar with, the other will be purely for data storage -- only certain
fields like the uniqueKey will be indexed, but every other field will be
stored only.
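
As a rough illustration of the storage-only install described above (added
here, not part of the original message), the field definitions could be
generated as a Schema API request body. This is only a sketch: the field
names and the "string" field type are hypothetical, and it assumes none of
the fields exist yet. A uniqueKey field that is already defined would need
replace-field rather than add-field.

```python
import json


def storage_only_fields(field_types, unique_key="id"):
    """Build a Schema API body: uniqueKey indexed, every other field stored only."""
    fields = []
    for name, field_type in field_types.items():
        fields.append({
            "name": name,
            "type": field_type,
            "stored": True,
            # Only the uniqueKey needs to be searchable in the storage install.
            "indexed": name == unique_key,
        })
    return {"add-field": fields}


# Hypothetical fields; the body would be POSTed to /solr/<collection>/schema.
payload = storage_only_fields({"id": "string", "title": "string", "body": "string"})
print(json.dumps(payload, indent=2))
```

The searching install would keep its existing schema; only the storage
install would use definitions like these.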

Running with two separate Solr installs will allow you to optimize one
for searching and the other for data storage.  The searching install
will be able to rebuild itself from the data storage install when that
is required.  If better performance is needed for the rebuild, you have
the option of writing a multi-threaded or multi-process program that
reads from one and writes to the other.
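
A multi-threaded copy program of the kind mentioned above might be
structured roughly as follows (an added sketch, not Shawn's code). The
read_docs and write_batch callables are hypothetical stand-ins for reading
from the storage install and posting to the searching install; here they
work on in-memory lists so the example runs on its own.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def copy_documents(read_docs, write_batch, batch_size=1000, workers=4):
    """Read documents from one install and write them to the other in parallel.

    read_docs:   hypothetical callable returning an iterable of documents
    write_batch: hypothetical callable that writes one batch and returns its size
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Submits every batch up front; a real rebuild over a very large
        # collection would bound the amount of in-flight work instead.
        futures = [pool.submit(write_batch, b)
                   for b in batches(read_docs(), batch_size)]
        # result() re-raises any exception raised inside a worker thread.
        return sum(f.result() for f in futures)


# In-memory stand-ins for the two installs.
source = [{"id": str(i), "body": f"doc {i}"} for i in range(2500)]
target = []
lock = threading.Lock()


def write_batch(batch):
    with lock:
        target.extend(batch)
    return len(batch)


copied = copy_documents(lambda: iter(source), write_batch)
print(copied, "documents copied")
```

Batches may arrive out of order, which is harmless as long as each document
carries its own uniqueKey.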

Thanks,
Shawn


Re: SOLR as nosql database store

Posted by Rick Leir <rl...@leirtech.com>.
The NoSQL DB can be Mongo, Couch, or something else. Choose a document DB by preference. You can write to these faster than to MySQL (I think; test to be sure). These DBs can be replicated easily. Choose one of them and use a simple script to index into Solr. Cheers -- Rick
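
A "simple script" of this kind could look something like the sketch below
(added for illustration, not from the original message). It only builds the
HTTP request for Solr's JSON update handler and does not send it. The Solr
URL, the collection name "documents", the sample documents, and the
commitWithin value are all assumptions; in practice the documents would be
read from whichever document DB is chosen.

```python
import json
import urllib.request


def build_update_request(solr_url, collection, docs, commit_within_ms=10000):
    """Build (but do not send) a POST to Solr's JSON update handler."""
    url = f"{solr_url}/{collection}/update?commitWithin={commit_within_ms}"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical documents standing in for rows read from the document DB.
docs = [{"id": "1", "title": "first"}, {"id": "2", "title": "second"}]
request = build_update_request("http://localhost:8983/solr", "documents", docs)
print(request.get_method(), request.full_url)
# Sending it would be: urllib.request.urlopen(request)
```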


-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: SOLR as nosql database store

Posted by Bharath Kumar <bh...@gmail.com>.
Thanks Hrishikesh and Dave. We use SolrCloud with 2 extra replicas; will
that not serve as a backup when something goes wrong? Also, we use the latest
Solr 6, and according to the Solr documentation the indexing performance is
good. The reason is that we use MySQL as the primary data store, and its
performance might not be optimal if we write data at a very rapid rate.
We already index almost half of the MySQL fields in Solr.




-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"

Re: SOLR as nosql database store

Posted by Dave <ha...@gmail.com>.
You will want to have both Solr and a SQL/NoSQL data storage option. They serve different purposes.

