Posted to user@cassandra.apache.org by fr...@gmail.com on 2012/01/25 03:38:16 UTC

Cassandra & usage

Do you think Cassandra is appropriate for a standard project with 50,000,000 rows on 2-3 machines,
or should I use a normal DBMS?

-- 
francesco.tangari.inf@gmail.com
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


Re: Cassandra & usage

Posted by fr...@gmail.com.
OK, this makes sense, thank you.

--  
francesco.tangari.inf@gmail.com
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Thursday, 26 January 2012, at 21:36, aaron morton wrote:

> Yes it is.
>  
> But it depends on what you want to select.  
>  
> Cassandra does not have a complete query language like SQL in an RDBMS. You need to design your data model to support the queries you wish to make. Normally this means denormalising data so that queries are essentially reading from a materialized view.
> 
> So if you store users by user id, you can insert/update and select users by user id.
> 
> If you want to select all users who live in Spain then you need to have a secondary index on the country column. Secondary indexes have overheads and limitations just like indexes in an RDBMS. So if this is a common query you may want to denormalise the data so users are stored by user id and also by country.
> 
> If one day you decide to select all users who are older than 30 but younger than 40 you have to do some extra work. You could add another index on the birthdate, but secondary index queries must have an equality clause, so you cannot do birthdate > x and birthdate < y. This is where other query languages such as Hive or Pig, running on Hadoop, come into play.
>  
> Hope that helps.  


Re: Cassandra & usage

Posted by aaron morton <aa...@thelastpickle.com>.
Yes it is.

But it depends on what you want to select. 

Cassandra does not have a complete query language like SQL in an RDBMS. You need to design your data model to support the queries you wish to make. Normally this means denormalising data so that queries are essentially reading from a materialized view.

So if you store users by user id, you can insert/update and select users by user id. 
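
A minimal sketch of that key-based access pattern, using a plain Python dict as a stand-in for a column family. This is a toy in-memory model and not a Cassandra client API; the names users_by_id, upsert_user and get_user are hypothetical.

```python
# Toy model: rows keyed by user id (a dict, not a real Cassandra client).
users_by_id = {}


def upsert_user(user_id, **columns):
    """Insert or update: writing to an existing key merges into the stored row."""
    users_by_id.setdefault(user_id, {}).update(columns)


def get_user(user_id):
    """Select by key: a single lookup, no scan over the other rows."""
    return users_by_id.get(user_id)


upsert_user("u1", name="Ana", country="Spain")
upsert_user("u1", birth_year=1980)  # an update adds a column to the same row
upsert_user("u2", name="Bo", country="Italy")
```

Every operation here names the key it works on, which is the only kind of access this layout supports cheaply.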

If you want to select all users who live in Spain then you need to have a secondary index on the country column. Secondary indexes have overheads and limitations just like indexes in an RDBMS. So if this is a common query you may want to denormalise the data so users are stored by user id and also by country.
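
The denormalised layout could look roughly like the sketch below, again as a toy in-memory model in Python rather than real Cassandra calls. The structure users_by_country and the functions save_user and users_in are assumptions made for illustration. The point is that the application writes each user twice so that both queries become key lookups.

```python
from collections import defaultdict

users_by_id = {}
users_by_country = defaultdict(dict)  # country -> {user_id: copy of the row}


def save_user(user_id, name, country):
    """Write the same user to both 'tables' so each query is a key lookup."""
    old = users_by_id.get(user_id)
    if old is not None and old["country"] != country:
        # The user moved: drop the stale copy stored under the old country.
        users_by_country[old["country"]].pop(user_id, None)
    row = {"name": name, "country": country}
    users_by_id[user_id] = row
    users_by_country[country][user_id] = dict(row)


def users_in(country):
    """Answer 'all users who live in X' without scanning every user."""
    return sorted(users_by_country.get(country, {}))
```

The cost of this design is on the write path: the application has to keep both copies in step, including cleaning up the old copy when a user changes country.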

If one day you decide to select all users who are older than 30 but younger than 40 you have to do some extra work. You could add another index on the birthdate, but secondary index queries must have an equality clause, so you cannot do birthdate > x and birthdate < y. This is where other query languages such as Hive or Pig, running on Hadoop, come into play.
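
To illustrate the difference between an equality lookup and a range lookup, here is a small Python sketch. It is a simplified model and does not reflect how Cassandra implements secondary indexes; the sample users, the birth_year column and the function names are hypothetical, and the range bounds are inclusive for simplicity.

```python
from collections import defaultdict

users = {
    "u1": {"name": "Ana", "birth_year": 1975},
    "u2": {"name": "Bo", "birth_year": 1990},
    "u3": {"name": "Cy", "birth_year": 1978},
}

# Equality-style index: exact value -> set of user ids.
by_birth_year = defaultdict(set)
for user_id, row in users.items():
    by_birth_year[row["birth_year"]].add(user_id)


def born_in(year):
    """Equality lookup: served directly by the index."""
    return sorted(by_birth_year.get(year, set()))


def born_between(first_year, last_year):
    """Range lookup: falls back to reading every row, as a batch job would."""
    return sorted(
        user_id
        for user_id, row in users.items()
        if first_year <= row["birth_year"] <= last_year
    )
```

The index answers born_in with one lookup, while born_between has to visit every row, which is the kind of work a batch tool is suited to.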

Hope that helps. 
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/01/2012, at 4:13 PM, francesco.tangari.inf@gmail.com wrote:

> I don't get it. Suppose I have a data model with millions of rows, and I want to perform some selects and some inserts. Is it not feasible to use Cassandra for those reasons?


Re: Cassandra & usage

Posted by fr...@gmail.com.
I don't get it. Suppose I have a data model with millions of rows, and I want to perform some selects and some inserts. Is it not feasible to use Cassandra for those reasons?

--  
francesco.tangari.inf@gmail.com
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Wednesday, 25 January 2012, at 09:13, aaron morton wrote:

> Your data load is fine.
> 
> It sounds like you will run into issues with the data model and functionality of Cassandra. "Standard analysis" in the RDBMS sense of throwing any ad-hoc query at the data and letting the query engine work it out is not possible without using Hive/Pig or some other query language.
> 
> You will need to understand what sort of questions you want to ask of the data up front. The best way to learn this lesson is to put together a quick prototype and see how the data model works.
>  
> Hope that helps.  


Re: Cassandra & usage

Posted by aaron morton <aa...@thelastpickle.com>.
Your data load is fine.

It sounds like you will run into issues with the data model and functionality of Cassandra. "Standard analysis" in the RDBMS sense of throwing any ad-hoc query at the data and letting the query engine work it out is not possible without using Hive/Pig or some other query language.

You will need to understand what sort of questions you want to ask of the data up front. The best way to learn this lesson is to put together a quick prototype and see how the data model works.
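
As one possible shape for such a prototype, the Python sketch below keeps an aggregate up to date at write time, so that a question known in advance is answered by a single key lookup. It is a toy in-memory model under stated assumptions: the "orders" example, the order_count_by_day structure and the function names are all hypothetical and do not come from the thread.

```python
from collections import Counter

orders_by_id = {}
order_count_by_day = Counter()  # aggregate maintained at write time


def record_order(order_id, day, amount):
    """Store the row and update the aggregate the application knows it needs."""
    if order_id in orders_by_id:
        raise ValueError("order already recorded")
    orders_by_id[order_id] = {"day": day, "amount": amount}
    order_count_by_day[day] += 1


def orders_on(day):
    """A planned question: answered by one key lookup."""
    return order_count_by_day[day]
```

A question that was not planned for, such as the total amount per customer, has no structure to read from here and would need either a new structure plus a backfill or a full scan.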

Hope that helps. 

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 25/01/2012, at 8:09 PM, francesco.tangari.inf@gmail.com wrote:

> Can you give some examples of such cases, please?


Re: Cassandra & usage

Posted by fr...@gmail.com.
Can you give some examples of such cases, please?

--  
francesco.tangari.inf@gmail.com
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Wednesday, 25 January 2012, at 05:29, Gustavo Gustavo wrote:

> That's certainly not much.
> Your RDBMS can probably hold the entire dataset in memory, and you can do all the kinds of queries that you want. Cassandra is for some very specific use cases.
> If you really need a cluster, have you thought about MySQL Cluster?  


Re: Cassandra & usage

Posted by Gustavo Gustavo <do...@gmail.com>.
That's certainly not much.
Your RDBMS can probably hold the entire dataset in memory, and you can do
all the kinds of queries that you want. Cassandra is for some very specific
use cases.
If you really need a cluster, have you thought about MySQL Cluster?
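
A rough back-of-the-envelope check of the "fits in memory" point, in Python. The row count comes from the original question. The average row size of 200 bytes is purely an assumption, since the thread never states one, so substitute your own figure.

```python
ROWS = 50_000_000    # from the original question
BYTES_PER_ROW = 200  # assumption: replace with your own average row size


def dataset_gib(rows, bytes_per_row):
    """Raw data size in GiB, ignoring indexes and storage overhead."""
    return rows * bytes_per_row / 2**30


size = dataset_gib(ROWS, BYTES_PER_ROW)
print(f"{size:.1f} GiB")  # about 9.3 GiB under the 200-byte assumption
```

Under that assumption the raw data is on the order of 10 GB, which a single well-provisioned server can plausibly hold in RAM.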

2012/1/25 <fr...@gmail.com>

> Standard analysis: displaying or aggregating some rows,
> or the standard operations that I can do on a normal DBMS.

Re: Cassandra & usage

Posted by fr...@gmail.com.
Standard analysis: displaying or aggregating some rows,
or the standard operations that I can do on a normal DBMS.


--  
francesco.tangari.inf@gmail.com
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Wednesday, 25 January 2012, at 04:26, Maxim Potekhin wrote:

> You provide zero information on what you are planning to do with the data.
> Thus, your question is impossible to answer.


Re: Cassandra & usage

Posted by Maxim Potekhin <po...@bnl.gov>.
You provide zero information on what you are planning to do with the data.
Thus, your question is impossible to answer.


On 1/24/2012 9:38 PM, francesco.tangari.inf@gmail.com wrote:
> Do you think Cassandra is appropriate for a standard project with
> 50,000,000 rows on 2-3 machines,
> or should I use a normal DBMS?