Posted to user@cassandra.apache.org by Surender Singh <su...@gmail.com> on 2011/01/20 13:07:13 UTC

Use Cassandra to store 2 million records of persons

Hi All

I want to use Apache Cassandra to store information (such as first name, last
name, gender, and address) about 2 million people, and then perform
analytics and reporting on that data.
Do I need to store the information about these 2 million people in MySQL first
and then transfer it into Cassandra?

Please help me, as I am new to Apache Cassandra.

If you have a similar use case, please share it.

Thanks and regards
Surender Singh

Re: Use Cassandra to store 2 million records of persons

Posted by Dave Gardner <da...@imagini.net>.
Our experience of Cassandra+Hadoop is good.

We have a 16 node Cassandra cluster storing 110m users plus a 5 node
Hadoop cluster. We can scan through all rows in about 2.5 hours.
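
For a rough sense of scale, those figures can be turned into a scan rate. The
sketch below is only back-of-the-envelope arithmetic: it assumes throughput
scales linearly with row count, which a smaller cluster or a different data
model would not guarantee, and the function names are made up for
illustration.

```python
# Back-of-the-envelope scan rate from the figures quoted above.
ROWS_SCANNED = 110_000_000  # 110m users
SCAN_HOURS = 2.5            # time to scan through all rows


def rows_per_second(rows: int, hours: float) -> float:
    """Average scan rate implied by a full scan of `rows` in `hours`."""
    return rows / (hours * 3600)


def scan_seconds(rows: int, rate: float) -> float:
    """Time to scan `rows` at `rate` rows/s, assuming linear scaling."""
    return rows / rate


rate = rows_per_second(ROWS_SCANNED, SCAN_HOURS)
estimate = scan_seconds(2_000_000, rate)  # the 2 million people in question
print(f"{rate:,.0f} rows/s; 2 million rows in about {estimate / 60:.1f} minutes")
```

That works out to roughly 12,000 rows per second, so on the same setup a full
pass over 2 million rows would take a few minutes rather than hours.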

Dave


On Thursday, 20 January 2011, David G. Boney
<db...@semanticartifacts.com> wrote:
> I don't think the statement below accurately describes data mining or using Cassandra for data mining. All the techniques I am familiar with for either data mining or machine learning, of which data mining is a subset, make one or more sequential scans through the data to abstract statistics or build models. The question is how well Cassandra performs with sequential scans through the data. The Hadoop model works very well for many machine learning problems because it is oriented toward sequential scans through the data. The speed of the Hadoop interface to Cassandra would have a lot of bearing on the application of Cassandra to data mining or machine learning problems.
>
> -------------
> Sincerely,
> David G. Boney
> dboney1@semanticartifacts.com
> http://www.semanticartifacts.com
>
>
>
>
> On Jan 20, 2011, at 6:35 AM, David Boxenhorn wrote:
> Cassandra is not a good solution for data mining type problems, since it doesn't have ad-hoc queries. Cassandra is designed to maximize throughput, which is not usually a problem for data mining.
>
> On Thu, Jan 20, 2011 at 2:07 PM, Surender Singh <su...@gmail.com> wrote:
>
> Hi All
>
> I want to use Apache Cassandra to store information (like first name, last
> name, gender, address)  about 2 million people.  Then need to perform
> analytic and reporting on that data.
> is need to store information about 2 million people in Mysql and then
> transfer that information into Cassandra.?
>
> Please help me as i m new to Apache Cassandra.
>
> if you have any use case like that, please share.
>
> Thanks and regards
>
> Surender Singh
>
>
>
>
>

-- 
*Dave Gardner*
Technical Architect

*Imagini Europe Limited*
7 Moor Street, London W1D 5NB

Phone: +44 20 7734 7033
Skype: daveg79
Email: dave.gardner@imagini.net
Web: http://www.visualdna.com

Imagini Europe Limited, Company number 5565112 (England
and Wales), Registered address: c/o Bird & Bird,
90 Fetter Lane, London, EC4A 1EQ, United Kingdom

Re: Use Cassandra to store 2 million records of persons

Posted by "David G. Boney" <db...@semanticartifacts.com>.
I don't think the statement below accurately describes data mining or using Cassandra for data mining. All the techniques I am familiar with for either data mining or machine learning, of which data mining is a subset, make one or more sequential scans through the data to abstract statistics or build models. The question is how well Cassandra performs with sequential scans through the data. The Hadoop model works very well for many machine learning problems because it is oriented toward sequential scans through the data. The speed of the Hadoop interface to Cassandra would have a lot of bearing on the application of Cassandra to data mining or machine learning problems.
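
To make the "sequential scan" point concrete, here is a minimal sketch of
gathering statistics in a single pass over person records. It is plain Python
over an in-memory list, not Cassandra or Hadoop code; the field names
(`gender`, `last_name`) and the `summarize` helper are assumptions chosen to
match the kind of data described in the original question.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def summarize(records: Iterable[Mapping[str, str]]) -> Dict[str, object]:
    """Make one sequential pass over the records and collect simple counts."""
    total = 0
    by_gender: Counter = Counter()
    by_last_name: Counter = Counter()
    for rec in records:
        total += 1
        # Missing fields are bucketed under "unknown" rather than dropped.
        by_gender[rec.get("gender", "unknown")] += 1
        by_last_name[rec.get("last_name", "unknown")] += 1
    return {
        "total": total,
        "by_gender": dict(by_gender),
        "top_last_names": by_last_name.most_common(3),
    }


# Hypothetical sample rows standing in for the stored person records.
sample = [
    {"first_name": "Asha", "last_name": "Singh", "gender": "F"},
    {"first_name": "Ravi", "last_name": "Singh", "gender": "M"},
    {"first_name": "Tom", "last_name": "Baker"},
]
result = summarize(sample)
print(result)
```

Every record is read exactly once and only the running counters are kept in
memory, which is the access pattern a Hadoop job over the same data would
follow, so scan speed is what dominates the cost.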

-------------
Sincerely,
David G. Boney
dboney1@semanticartifacts.com
http://www.semanticartifacts.com




On Jan 20, 2011, at 6:35 AM, David Boxenhorn wrote:

> Cassandra is not a good solution for data mining type problems, since it doesn't have ad-hoc queries. Cassandra is designed to maximize throughput, which is not usually a problem for data mining. 
> 
> On Thu, Jan 20, 2011 at 2:07 PM, Surender Singh <su...@gmail.com> wrote:
> Hi All
> 
> I want to use Apache Cassandra to store information (like first name, last
> name, gender, address)  about 2 million people.  Then need to perform
> analytic and reporting on that data.
> is need to store information about 2 million people in Mysql and then
> transfer that information into Cassandra.?
> 
> Please help me as i m new to Apache Cassandra.
> 
> if you have any use case like that, please share.
> 
> Thanks and regards
> Surender Singh
> 
> 


Re: Use Cassandra to store 2 million records of persons

Posted by Surender Singh <su...@gmail.com>.
David

Please suggest a solution for this.

Thanks and regards
Surender Singh

On Thu, Jan 20, 2011 at 6:05 PM, David Boxenhorn <da...@lookin2.com> wrote:

> Cassandra is not a good solution for data mining type problems, since it
> doesn't have ad-hoc queries. Cassandra is designed to maximize throughput,
> which is not usually a problem for data mining.
>
> On Thu, Jan 20, 2011 at 2:07 PM, Surender Singh <su...@gmail.com> wrote:
>
>> Hi All
>>
>> I want to use Apache Cassandra to store information (like first name, last
>> name, gender, address)  about 2 million people.  Then need to perform
>> analytic and reporting on that data.
>> is need to store information about 2 million people in Mysql and then
>> transfer that information into Cassandra.?
>>
>> Please help me as i m new to Apache Cassandra.
>>
>> if you have any use case like that, please share.
>>
>> Thanks and regards
>> Surender Singh
>>
>>
>

Re: Use Cassandra to store 2 million records of persons

Posted by David Boxenhorn <da...@lookin2.com>.
Cassandra is not a good solution for data mining type problems, since it
doesn't have ad-hoc queries. Cassandra is designed to maximize throughput,
which is not usually a problem for data mining.

On Thu, Jan 20, 2011 at 2:07 PM, Surender Singh <su...@gmail.com> wrote:

> Hi All
>
> I want to use Apache Cassandra to store information (like first name, last
> name, gender, address)  about 2 million people.  Then need to perform
> analytic and reporting on that data.
> is need to store information about 2 million people in Mysql and then
> transfer that information into Cassandra.?
>
> Please help me as i m new to Apache Cassandra.
>
> if you have any use case like that, please share.
>
> Thanks and regards
> Surender Singh
>
>