Posted to dev@polygene.apache.org by Jiri Jetmar <ju...@gmail.com> on 2016/02/20 20:38:04 UTC

Apache Zest / Apache Cassandra

Hi guys,

what is the status of the Apache Cassandra Entity Store? I seem to
remember that Cassandra was supported, but I cannot
find it in the current development branch.

The reason I'm asking is that Cassandra works well with the analytical
Apache Spark stack.

Assume a scenario where you have, e.g., the following Domain Models:

- Products
- Orders
- Users

Each Domain has its own API, Use Cases and State, which is stored in the
DM. Now you have, e.g., a Webshop UI on top of the
above Domains.

Now you want to answer questions like: What kind of Users are buying
Product X? Or, find those Users that are most likely to buy
Product X in the next Y days.

Answering those questions is typically a "Data Analytics" challenge,
using algorithms like PCA, Random Forest, Regression, XGBoost, etc.
All of this can surely be done in Java, but my impression is that the
Python community has built an amazing set of tools and environments over
the last years.

Also, a "Data Scientist" has to try out different things until a good and
robust prediction is achieved. So the workflow is interactive, and this is
where Apache Spark offers great tools, including the use of IPython/Jupyter
Notebooks. Another benefit is that one does not need to kick off any ETL
jobs to transfer the transactional data from the Domain Models to the
analytical world - Cassandra does this already. So one can do all the
analysis on a realtime snapshot without influencing the transactional
processing.
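
To make the first question above concrete ("what kind of Users are buying
Product X"), here is a minimal sketch in plain Python. Everything in it is
an assumption for illustration: the rows, the field names (user_id,
segment, product) and the segments_buying helper are hypothetical, and in
the scenario described the rows would be read from Cassandra via Spark
rather than held in memory.

```python
from collections import Counter

# Hypothetical order rows and user profiles; in the scenario described
# these would come from the Orders and Users domain state.
orders = [
    {"user_id": "u1", "product": "X"},
    {"user_id": "u2", "product": "Y"},
    {"user_id": "u3", "product": "X"},
    {"user_id": "u1", "product": "X"},
]
users = {
    "u1": {"segment": "student"},
    "u2": {"segment": "professional"},
    "u3": {"segment": "student"},
}


def segments_buying(product, orders, users):
    """Count the distinct buyers of `product` per user segment."""
    # A set removes repeat purchases by the same user.
    buyers = {o["user_id"] for o in orders if o["product"] == product}
    return Counter(users[u]["segment"] for u in buyers)


print(segments_buying("X", orders, users))
```

This only covers the descriptive question. The predictive one ("most likely
to buy in the next Y days") would need a trained model on top of such
features.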

Thank you.

Cheers,
Jiri

Re: Apache Zest / Apache Cassandra

Posted by Jiri Jetmar <ju...@gmail.com>.
Hi Niclas,

thank you - I will take a look at the Cassandra entity store
implementation.

Cheers,
Jiri

2016-02-21 1:52 GMT+01:00 Niclas Hedhman <ni...@hedhman.org>:

> About Cassandra...
>
> I think the only reason was that, with CQL, no one took the time to refactor
> the code, perhaps because some conceptual changes were introduced.
> But it could also have been that there was no true test suite, so it failed
> the Release Criteria, and fixing "run embedded during test" with the same
> client as in production code may have been non-trivial (not sure). The old
> code is in the sandbox.
>
> Useful link:
>
> http://prettyprint.me/prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/index.html
> But I am uncertain if it is still relevant.
>
> Sandbox:
>
> https://github.com/apache/zest-sandbox/tree/master/extensions/entitystore-cassandra
>
> Niclas

Re: Apache Zest / Apache Cassandra

Posted by Niclas Hedhman <ni...@hedhman.org>.
About Cassandra...

I think the only reason was that, with CQL, no one took the time to refactor
the code, perhaps because some conceptual changes were introduced.
But it could also have been that there was no true test suite, so it failed
the Release Criteria, and fixing "run embedded during test" with the same
client as in production code may have been non-trivial (not sure). The old
code is in the sandbox.

Useful link:
http://prettyprint.me/prettyprint.me/2010/02/14/running-cassandra-as-an-embedded-service/index.html
But I am uncertain if it is still relevant.

Sandbox:
https://github.com/apache/zest-sandbox/tree/master/extensions/entitystore-cassandra

Niclas

-- 
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java

Re: Apache Zest / Apache Cassandra

Posted by Jiri Jetmar <ju...@gmail.com>.
Yes, the OLAP world, with things like "star schema", ETL jobs, etc., is far
too heavyweight. That is why I see Apache Spark heading in the right
direction, providing easy access to data analysis and tools.

2016-02-21 1:54 GMT+01:00 Niclas Hedhman <ni...@hedhman.org>:

> On analytics: I have never enjoyed the OLAP world, and I take your word for
> it.
>
> Cheers
> Niclas

Re: Apache Zest / Apache Cassandra

Posted by Niclas Hedhman <ni...@hedhman.org>.
On analytics: I have never enjoyed the OLAP world, and I take your word for
it.

Cheers
Niclas
