Posted to solr-user@lucene.apache.org by Cool Techi <co...@outlook.com> on 2012/11/16 21:10:29 UTC

Architecture Question



Hi,

I am not sure if this is the right forum for this question, but it would be great if I could be pointed in the right direction. We have been using a combination of MySQL and Solr for all our company's full-text and query needs. But as our customer base has grown, so has the amount of data, and MySQL is not proving to be the right option for storing and querying it.

I have been looking at SolrCloud and it looks really impressive, but I am not sure whether we should give up our storage system. I have also been exploring DataStax, but a commercial option is out of the question. So we were thinking of using HBase to store the data while also indexing it into SolrCloud, but for many reasons this design doesn't seem convincing (we have also looked at the basics of Lily).

1) Would you recommend using just SolrCloud with multiple replicas, or does an HBase + Solr setup seem like the better option?
2) How much strain would it put on a machine to run both a Solr shard and an HBase node on it?
3) Is there a calculation for what kind of machine configuration, and how many shards, I would need to store 500-1000 million records? Most of these will be social data (Twitter, Facebook, blogs, etc.).

Regards,
Ayush

RE: Architecture Question

Posted by "Buttler, David" <bu...@llnl.gov>.
If you just want to store the data, you can dump it into HDFS sequence files.  While HBase is really nice if you want to process and serve data real-time, it adds overhead to use it as pure storage.
Dave

-----Original Message-----
From: Cool Techi [mailto:cooltechie@outlook.com] 
Sent: Friday, November 16, 2012 8:26 PM
To: solr-user@lucene.apache.org
Subject: RE: Architecture Question

Hi Otis,

Thanks for your reply. I just wanted to check which NoSQL store would be best suited to hold the data while using the least amount of memory. Solr would be sufficient for most of my work, and I want to keep the data only as a backup and in case we need to reindex.

Regards,
Ayush


RE: Architecture Question

Posted by Cool Techi <co...@outlook.com>.
Hi Otis,

Thanks for your reply. I just wanted to check which NoSQL store would be best suited to hold the data while using the least amount of memory. Solr would be sufficient for most of my work, and I want to keep the data only as a backup and in case we need to reindex.

Regards,
Ayush


Re: Architecture Question

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hello,



> I am not sure if this is the right forum for this question, but it would
> be great if I could be pointed in the right direction. We have been using a
> combination of MySQL and Solr for all our company's full-text and query
> needs. But as our customer base has grown, so has the amount of data, and
> MySQL is not proving to be the right option for storing and querying it.
>
> I have been looking at SolrCloud and it looks really impressive, but I am
> not sure whether we should give up our storage system. I have also been
> exploring DataStax, but a commercial option is out of the question. So we
> were thinking of using HBase to store the data while also indexing it into
> SolrCloud, but for many reasons this design doesn't seem convincing (we
> have also looked at the basics of Lily).
>
> 1) Would you recommend using just SolrCloud with multiple replicas, or
> does an HBase + Solr setup seem like the better option?
>

If you trust SolrCloud with replication and keep all your fields stored
then you could live without an external DB.  At this point I personally
would still want an external DB.  Whether HBase is the right DB for the job
I can't tell because I don't know anything about your data, volume, access
patterns, etc.  I can tell you that HBase does scale well - we have tables
with many billions of rows stored in it for instance.


> 2) How much strain would it put on a machine to run both a Solr shard and
> an HBase node on it?
>

HBase loves memory.  So does Solr.  They both dislike disk IO (who
doesn't!).  Solr can use a lot of CPU for indexing/searching, depending on
the volume.  HBase RegionServers can use a lot of CPU if you run MapReduce
on data in HBase.


> 3) Is there a calculation for what kind of machine configuration, and how
> many shards, I would need to store 500-1000 million records? Most of these
> will be social data (Twitter, Facebook, blogs, etc.).
>

No recipe here, unfortunately.  You'd have to experiment and test, do load
and performance testing, etc.  If you need help with Solr + HBase, we
happen to have a lot of experience with both and have even used them
together for some of our clients.
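
As a starting point for that kind of experiment, here is a rough sketch of the arithmetic: index a representative sample, measure the index size on disk, and extrapolate to the target record count. Everything in it is an assumption for illustration. The function name estimate_shards, the sample figures (1 million documents producing a 2 GiB index) and the 50 GiB per-shard budget are hypothetical, not Solr limits or measurements from this thread. It also assumes index size grows linearly with document count, which real data may not do, so treat the result as a first guess to validate with load testing.

```python
GIB = 1024 ** 3


def estimate_shards(sample_docs, sample_index_bytes, target_docs, max_shard_bytes):
    """Extrapolate total index size from a measured sample and derive a shard count.

    Assumes index size scales linearly with document count (an assumption,
    not a guarantee). Integer arithmetic avoids floating-point rounding.
    Returns (estimated_total_bytes, shard_count).
    """
    if sample_docs <= 0 or max_shard_bytes <= 0:
        raise ValueError("sample_docs and max_shard_bytes must be positive")
    # Ceiling division: round the extrapolated size up to a whole byte.
    total_bytes = (sample_index_bytes * target_docs + sample_docs - 1) // sample_docs
    # Ceiling division again: a partially filled shard still counts as a shard.
    shards = max(1, (total_bytes + max_shard_bytes - 1) // max_shard_bytes)
    return total_bytes, shards


if __name__ == "__main__":
    # Hypothetical sample: 1 million documents produced a 2 GiB index.
    # Hypothetical budget: keep each shard at or under 50 GiB.
    for target in (500_000_000, 1_000_000_000):
        total, shards = estimate_shards(
            sample_docs=1_000_000,
            sample_index_bytes=2 * GIB,
            target_docs=target,
            max_shard_bytes=50 * GIB,
        )
        print(f"{target:,} docs: about {total / GIB:.0f} GiB, {shards} shards")
```

Replace the sample figures with numbers measured on your own data and schema. The per-shard budget is something you would tune from query latency and memory observations, not a fixed rule.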

Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html