Posted to user@ignite.apache.org by Shrey <sh...@gmail.com> on 2018/09/26 07:26:55 UTC

Issues running Ignite with Cassandra and spark.

Hi, we are using Ignite as a cache layer over Cassandra for faster read
queries from Spark. Our cluster has 10 nodes, each running an instance of
Cassandra and Ignite. However, we came across a few issues:

1) We currently write the data from Spark to Cassandra, so to load it into
Ignite we need to call .loadCache(). I know there are ways for data written
to Ignite to be synced to Cassandra (writeBehind, writeThrough). However, we
want the opposite: write to Cassandra and have the change reflected in the
cache, which can then be queried by Spark. Is there a way to do so?

2) To load data into the cache from Cassandra, I start a new client on
another machine and call the .loadCache() method. However, it takes almost
45 minutes to load the data (around 30 million rows with 20 columns each).
Is there a way to make this faster by ensuring that the data held by a
particular node of the Cassandra cluster is loaded in parallel into the
cache instance on the same node? I have defined my partition and clustering
columns in my Spring persistence settings.

Thanks,
Shrey



--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/

Re: Issues running Ignite with Cassandra and spark.

Posted by Shrey Garg <sh...@gmail.com>.
I fixed the issue with the DataFrame API and am getting all columns now.
However, I am not able to perform grouping + UDAF operations, as it tries to
perform these on Ignite.
Setting OPTION_DISABLE_SPARK_SQL_OPTIMIZATION = true is not helping.

How do we tell Ignite to just fetch the data and perform all other
operations in Spark?

Re: Issues running Ignite with Cassandra and spark.

Posted by Shrey Garg <sh...@gmail.com>.
Hi,
Thanks for the answer.
Unfortunately, we cannot remove Cassandra as it is being used elsewhere as
well. We will have to write directly to Ignite and sync with Cassandra.

We had a few other issues while getting data from Spark:

1) cacherdd.sql("select * from table") is giving me heap memory (GC)
issues. However, getting the data using spark.read.format().... works fine.
Why is that?

2) In my persistence configuration, I have IndexedTypes with key and value
POJO classes. The key class corresponds to the key in Cassandra, with
partition and clustering keys defined. When querying with SQL (select * from
value_class), I get all the columns of the table. However, when querying
using spark.read.format(...).option(OPTION_TABLE,value_class).load(), I
only get the columns stored in the value class. How do I fetch all the
columns using the DataFrame API?

Thanks,
Shrey



On Fri, 28 Sep 2018, 08:43 Alexey Kuznetsov, <ak...@apache.org> wrote:

> Hi,  Shrey!
>
> Just an idea: Ignite now has persistence (see
> https://apacheignite.readme.io/docs/distributed-persistent-store),
> so maybe you can replace Cassandra with Ignite completely?
>
> In that case all data will always be up to date, with no need to sync with
> an external DB.
>
> --
> Alexey Kuznetsov
>

Re: Issues running Ignite with Cassandra and spark.

Posted by Alexey Kuznetsov <ak...@apache.org>.
Hi,  Shrey!

Just an idea: Ignite now has persistence (see
https://apacheignite.readme.io/docs/distributed-persistent-store),
so maybe you can replace Cassandra with Ignite completely?

In that case all data will always be up to date, with no need to sync with
an external DB.

-- 
Alexey Kuznetsov

Re: Issues running Ignite with Cassandra and spark.

Posted by "ilya.kasnacheev" <il...@gmail.com>.
Hello!

1) There is no generic way of pulling updates from a 3rd party database, and
such databases usually offer no API support for it, so it's not obvious how
we could implement that even if we wanted to.

2) By default the cache store will process data in parallel on all nodes.
However, it will not align its data distribution with that of Cassandra, and
I would say that implementing this would be infeasible. You could, however,
try to see if there are ways to speed up loadCache by tuning the Ignite
and/or cache configurations.
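
One possible tuning direction, given as a minimal sketch under stated
assumptions rather than as a verified recipe: if the Cassandra cache store
treats the arguments of loadCache as CQL select statements and runs them
concurrently, the table could be split into token ranges and loaded with one
query per range. The helper below only builds such queries. The class name
TokenRangeQueries, the table ks.tbl and the key column id are hypothetical,
and it assumes Cassandra's Murmur3Partitioner, whose tokens are signed 64-bit
values.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits the full Murmur3 token range into CQL range queries. */
final class TokenRangeQueries {
    private TokenRangeQueries() {}

    /**
     * @param table        fully qualified table name, e.g. "ks.tbl" (hypothetical)
     * @param partitionKey partition key column(s) as they appear inside token(...)
     * @param parts        number of contiguous token ranges to produce (>= 1)
     */
    static List<String> build(String table, String partitionKey, int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        // Assumption: Murmur3Partitioner, tokens lie within the signed 64-bit range.
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger max = BigInteger.valueOf(Long.MAX_VALUE);
        // BigInteger avoids overflow: the span (2^64 - 1) does not fit in a long.
        BigInteger span = max.subtract(min);

        List<String> queries = new ArrayList<>();
        BigInteger lower = min;
        for (int i = 1; i <= parts; i++) {
            // The last range always ends at the maximum token so nothing is skipped.
            BigInteger upper = (i == parts)
                ? max
                : min.add(span.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(parts)));
            // The first range includes its lower bound; later ranges exclude it
            // because the previous range already covered that token.
            String lowerOp = (i == 1) ? ">=" : ">";
            queries.add(String.format(
                "select * from %s where token(%s) %s %s and token(%s) <= %s",
                table, partitionKey, lowerOp, lower, partitionKey, upper));
            lower = upper;
        }
        return queries;
    }
}
```

Under the same assumption, the resulting strings would be passed as the
variable arguments of loadCache, for example
cache.loadCache(null, TokenRangeQueries.build("ks.tbl", "id", 64).toArray()).
This spreads the load work over more concurrent queries; it does not make an
Ignite node read only from its local Cassandra replica.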

Regards,


