Posted to user@cassandra.apache.org by Andrés Ivaldi <ia...@gmail.com> on 2016/02/28 14:39:56 UTC

Cassandra Usages

Hello, at my work we are looking for new technologies for an analysis
engine, and we are evaluating several of them; one candidate is Cassandra
as our data repository.

Currently we run analysis queries against an OLAP cube and an RDBMS, using
MSSQL as our data repository. The cube is obsolete, and the SQL Server
engine is slow as a data repository.

I don't know much about Cassandra. I have read some books and it looks like
a good fit for what we need, but some things look like a problem for us.

Our engine is designed to be scalable, flexible and dynamic: any user can
add new dimensions or measures from any source. All the data is stored in
the cube (fixed data) and in MSSQL (dynamic data), so we have decoupled
tables with the dimension values.


OK, with that context given, I'd like to clear up some doubts:

- Am I able to flatten the table with all the possible dimension values into
Cassandra, creating the PK over the dimension columns? Would this give me
the "sensation" of pivoting the data over the PK columns? If so, what if I
want to choose the order of the columns, or add or remove some?
- Is it possible to extend the values of a row dynamically? What we often do
is join a row against a mapped external data value to extend the
dimension's hierarchical value structure (e.g. State->Country->Continent).

I know we can do some of these things in the core of our engine, such as
extending the dimension values or reducing columns, but since we are
evaluating different technologies, it is good to know.
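For the hierarchy extension, a sketch of doing it in the engine rather than in Cassandra (CQL has no joins). The lookup dictionaries are hypothetical stand-ins for the mapped external data: each flattened row is extended with `country` and `continent`, then a projection reduces the columns before a simple roll-up.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

# Hypothetical mapping tables standing in for the external hierarchy data.
STATE_TO_COUNTRY = {"Cordoba": "Argentina", "Mendoza": "Argentina", "Texas": "USA"}
COUNTRY_TO_CONTINENT = {"Argentina": "South America", "USA": "North America"}


def extend_hierarchy(rows: Iterable[dict]) -> Iterator[dict]:
    """Add country and continent to each row based on its state (None if unmapped)."""
    for row in rows:
        country = STATE_TO_COUNTRY.get(row["state"])
        continent = COUNTRY_TO_CONTINENT.get(country) if country else None
        yield {**row, "country": country, "continent": continent}


def project(rows: Iterable[dict], columns: List[str]) -> Iterator[dict]:
    """Keep only the requested columns, in the requested order."""
    for row in rows:
        yield {col: row.get(col) for col in columns}


def roll_up(rows: Iterable[dict], dim: str, measure: str) -> Dict[str, float]:
    """Sum a measure grouped by one dimension."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row[dim]] += row[measure]
    return dict(totals)


facts = [
    {"state": "Cordoba", "product": "widgets", "amount": 10.0},
    {"state": "Mendoza", "product": "widgets", "amount": 5.0},
    {"state": "Texas", "product": "gears", "amount": 7.0},
]
by_continent = roll_up(
    project(extend_hierarchy(facts), ["continent", "amount"]), "continent", "amount"
)
print(by_continent)  # {'South America': 15.0, 'North America': 7.0}
```

Unmapped states end up with `None` for the added levels; whether to drop them or group them under an "unknown" member is a design choice left to the engine.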

Regards!!


-- 
Ing. Ivaldi Andres

Re: Cassandra Usages

Posted by Andrés Ivaldi <ia...@gmail.com>.

Thanks, all, for the tips.
Mainly we are replacing an OLAP cube, but our engine also works fine directly
against an RDBMS, so with Cassandra's low latency it could work nicely
(its extensibility is what worries me).
We will give Cassandra + Spark a try.

Thanks again!!

On Tue, Mar 1, 2016 at 2:59 PM, Jack Krupansky <ja...@gmail.com>
wrote:

> I would spin it as Cassandra being the right choice where your primary
> need is OLTP, with a secondary need for analytics. IOW, where you would
> otherwise need to use two separate databases for the same data.
>
>
> -- Jack Krupansky


-- 
Ing. Ivaldi Andres

Re: Cassandra Usages

Posted by Jack Krupansky <ja...@gmail.com>.

I would spin it as Cassandra being the right choice where your primary need
is OLTP, with a secondary need for analytics. IOW, where you would
otherwise need to use two separate databases for the same data.


-- Jack Krupansky

On Tue, Mar 1, 2016 at 12:40 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Spark & Cassandra work just fine together, but, as I said, Cassandra is
> *primarily* used for OLTP.  If your main use case is analytics, I would use
> something that's built for analytics.  If 90%+ of your queries are going to
> be 1-10ms & customer facing, then you're good to go.  If you're building
> something to replace OLAP cubes, I'd look at something else.

Re: Cassandra Usages

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

Spark & Cassandra work just fine together, but, as I said, Cassandra is
*primarily* used for OLTP.  If your main use case is analytics, I would use
something that's built for analytics.  If 90%+ of your queries are going to
be 1-10ms & customer facing, then you're good to go.  If you're building
something to replace OLAP cubes, I'd look at something else.

On Tue, Mar 1, 2016 at 8:52 AM Jack Krupansky <ja...@gmail.com>
wrote:

> OLAP using Cassandra and Spark:
>
> http://www.slideshare.net/EvanChan2/breakthrough-olap-performance-with-cassandra-and-spark
>
> What is the cardinality of your cube dimensions? Obviously any
> multi-dimensional data must be flattened.
>
> Cassandra tables have fixed named columns, but... the map datatype with
> string key values effectively gives you extensible columns.
>
>
>
> -- Jack Krupansky

Re: Cassandra Usages

Posted by Andrés Ivaldi <ia...@gmail.com>.

Hello Jack,
What do you mean by "the map datatype with string key values effectively
gives you extensible columns"?

Regards

On Tue, Mar 1, 2016 at 1:34 PM, Jack Krupansky <ja...@gmail.com>
wrote:

> OLAP using Cassandra and Spark:
>
> http://www.slideshare.net/EvanChan2/breakthrough-olap-performance-with-cassandra-and-spark
>
> What is the cardinality of your cube dimensions? Obviously any
> multi-dimensional data must be flattened.
>
> Cassandra tables have fixed named columns, but... the map datatype with
> string key values effectively gives you extensible columns.
>
>
>
> -- Jack Krupansky


-- 
Ing. Ivaldi Andres

Re: Cassandra Usages

Posted by Jack Krupansky <ja...@gmail.com>.

OLAP using Cassandra and Spark:
http://www.slideshare.net/EvanChan2/breakthrough-olap-performance-with-cassandra-and-spark

What is the cardinality of your cube dimensions? Obviously any
multi-dimensional data must be flattened.
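To make both points concrete, a minimal Python sketch with made-up numbers (the cardinalities and the nested data are hypothetical). The product of the dimension cardinalities is an upper bound on how many rows a fully flattened cube could produce, and a nested state-then-product structure is flattened into one row per cell. `math.prod` assumes Python 3.8 or later.

```python
from math import prod
from typing import Any, Dict, Iterator, List, Tuple

# Hypothetical dimension cardinalities.
CARDINALITIES = {"state": 24, "product": 500, "day": 365}


def max_cells(cardinalities: Dict[str, int]) -> int:
    """Upper bound on the number of cells (rows) if every combination exists."""
    return prod(cardinalities.values())


def flatten(cube: Any, dims: List[str], prefix: Tuple = ()) -> Iterator[dict]:
    """Turn a dict nested one level per dimension into flat rows."""
    if len(prefix) == len(dims):
        row = dict(zip(dims, prefix))
        row["value"] = cube
        yield row
        return
    for key, sub in cube.items():
        yield from flatten(sub, dims, prefix + (key,))


print(max_cells(CARDINALITIES))  # 4380000
nested = {"Cordoba": {"widgets": 10.0, "gears": 4.0}, "Texas": {"widgets": 7.0}}
for row in flatten(nested, ["state", "product"]):
    print(row)
```

Sparse cubes produce far fewer rows than the bound, since only combinations that actually occur are flattened.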

Cassandra tables have fixed named columns, but... the map datatype with
string key values effectively gives you extensible columns.
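A sketch of what the map suggestion might look like, assuming a hypothetical `analytics.facts` table: fixed dimensions stay as named columns, while user-added dimensions go into a `map<text, text>` column, so adding a dimension needs no schema change. The CQL is held as plain strings (the `?` bind markers are meant for a prepared statement), and the hypothetical Python helper shows how an engine could read a dimension from either place.

```python
from typing import Optional

# Hypothetical schema: fixed columns plus a map for user-defined dimensions.
CREATE_FACTS = """
CREATE TABLE IF NOT EXISTS analytics.facts (
    fact_id uuid PRIMARY KEY,
    state text,
    amount double,
    attrs map<text, text>
);
"""

# Adds or overwrites map entries without altering the table;
# bind the new entries as a map and the row key as a uuid.
ADD_ATTRS = "UPDATE analytics.facts SET attrs = attrs + ? WHERE fact_id = ?;"

FIXED_DIMENSIONS = {"state"}


def dimension_value(row: dict, name: str) -> Optional[str]:
    """Read a dimension from a fixed column if it is one, else from the attrs map."""
    if name in FIXED_DIMENSIONS:
        return row.get(name)
    # A missing or empty map is treated as having no extra dimensions.
    return (row.get("attrs") or {}).get(name)


row = {"fact_id": "hypothetical-id", "state": "Cordoba", "amount": 10.0,
       "attrs": {"sales_channel": "online"}}
print(dimension_value(row, "state"))          # Cordoba
print(dimension_value(row, "sales_channel"))  # online
print(dimension_value(row, "customer_tier"))  # None
```

Collections in Cassandra are intended for small amounts of data per row, so this fits a handful of extra dimensions better than an open-ended set.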



-- Jack Krupansky

On Tue, Mar 1, 2016 at 11:22 AM, Andrés Ivaldi <ia...@gmail.com> wrote:

> Jonathan, thanks for the link.
> I believe it may be good as the data store part, because it is fast for
> I/O and handles time series; the analytics could be done with Apache
> Ignite and/or Apache Spark.
> What worries me is that it looks very complex to create the structure for
> each fact table and then extend it.
>
> regards.
>
> --
> Ing. Ivaldi Andres
>

Re: Cassandra Usages

Posted by Andrés Ivaldi <ia...@gmail.com>.

Jonathan, thanks for the link.
I believe it may be good as the data store part, because it is fast for I/O
and handles time series; the analytics could be done with Apache Ignite
and/or Apache Spark.
What worries me is that it looks very complex to create the structure for
each fact table and then extend it.
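On extending a fact table after creation, a hedged sketch using hypothetical keyspace, table and column names: the helper emits `ALTER TABLE ... ADD` for a new regular column. CQL does not allow changing a table's primary key once the table exists, so a new dimension that has to be queried by key would instead need a new table with that column in its key.

```python
import re

# Unquoted CQL identifiers: a letter followed by letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def add_column_cql(keyspace: str, table: str, column: str, cql_type: str = "text") -> str:
    """Build CQL that adds a regular (non-primary-key) column to an existing table.

    cql_type is passed through unchecked; only the names are validated.
    """
    for name in (keyspace, table, column):
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return f"ALTER TABLE {keyspace}.{table} ADD {column} {cql_type};"


print(add_column_cql("analytics", "sales", "region"))
# ALTER TABLE analytics.sales ADD region text;
```

Existing rows simply have no value for the new column until it is written, so adding a column does not rewrite stored data.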

regards.

On Sun, Feb 28, 2016 at 12:28 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> Cassandra is primarily used as an OLTP database, not analytics. You should
> watch this 30 min video discussing Cassandra core concepts (coming from a
> relational background):
> https://academy.datastax.com/courses/ds101-introduction-cassandra


-- 
Ing. Ivaldi Andres

Re: Cassandra Usages

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

Cassandra is primarily used as an OLTP database, not analytics. You should
watch this 30 min video discussing Cassandra core concepts (coming from a
relational background):
https://academy.datastax.com/courses/ds101-introduction-cassandra

On Sun, Feb 28, 2016 at 5:40 AM Andrés Ivaldi <ia...@gmail.com> wrote:

> Hello, at my work we are looking for new technologies for an analysis
> engine, and we are evaluating several of them; one candidate is Cassandra
> as our data repository.
>
> Currently we run analysis queries against an OLAP cube and an RDBMS, using
> MSSQL as our data repository. The cube is obsolete, and the SQL Server
> engine is slow as a data repository.
>
> I don't know much about Cassandra. I have read some books and it looks
> like a good fit for what we need, but some things look like a problem for
> us.
>
> Our engine is designed to be scalable, flexible and dynamic: any user can
> add new dimensions or measures from any source. All the data is stored in
> the cube (fixed data) and in MSSQL (dynamic data), so we have decoupled
> tables with the dimension values.
>
>
> OK, with that context given, I'd like to clear up some doubts:
>
> - Am I able to flatten the table with all the possible dimension values
> into Cassandra, creating the PK over the dimension columns? Would this give
> me the "sensation" of pivoting the data over the PK columns? If so, what if
> I want to choose the order of the columns, or add or remove some?
> - Is it possible to extend the values of a row dynamically? What we often
> do is join a row against a mapped external data value to extend the
> dimension's hierarchical value structure (e.g. State->Country->Continent).
>
> I know we can do some of these things in the core of our engine, such as
> extending the dimension values or reducing columns, but since we are
> evaluating different technologies, it is good to know.
>
> Regards!!
>
>
> --
> Ing. Ivaldi Andres
>