Posted to user@flink.apache.org by Lasse Nedergaard <la...@gmail.com> on 2020/02/11 07:48:27 UTC

Batch reading from Cassandra. How to?

Hi.

We would like to do some batch analytics on our data set stored in
Cassandra and are looking for an efficient way to load data from a single
table, not by key, but a random 15%, 50%, or 100% of the rows.
Databricks has created an efficient way to load Cassandra data into Apache
Spark: they read from the underlying SSTables in order to load in parallel.
Do we have something similar in Flink, or what is the most efficient way
to load all of, or a large random sample of, the data from a single
Cassandra table into Flink?

Any suggestions and/or recommendations are highly appreciated.

Thanks in advance

Lasse Nedergaard

Re: Batch reading from Cassandra. How to?

Posted by Till Rohrmann <tr...@apache.org>.
Hi Lasse,

as far as I know, the best way to read from Cassandra is to use the
CassandraInputFormat [1]. Unfortunately, Flink does not currently offer an
optimized way to read large amounts of data comparable to what Spark
provides. But if you want to contribute this feature to Flink, the
community would highly appreciate it.
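
A minimal sketch of wiring the input format [1] into a batch job could
look like the following. The contact point, the query, and the
"ks.mytable" schema with its tuple layout are placeholders, not something
the connector prescribes:

import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.batch.connectors.cassandra.CassandraInputFormat;
import org.apache.flink.streaming.connectors.cassandra.ClusterBuilder;

import com.datastax.driver.core.Cluster;

public class CassandraBatchRead {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Tells the input format how to connect to the cluster.
        ClusterBuilder clusterBuilder = new ClusterBuilder() {
            @Override
            protected Cluster buildCluster(Cluster.Builder builder) {
                return builder.addContactPoint("127.0.0.1").build();
            }
        };

        // A single CQL query defines what gets read.
        CassandraInputFormat<Tuple2<String, Long>> inputFormat =
                new CassandraInputFormat<>(
                        "SELECT id, value FROM ks.mytable;", clusterBuilder);

        DataSet<Tuple2<String, Long>> rows = env.createInput(
                inputFormat,
                TypeInformation.of(new TypeHint<Tuple2<String, Long>>() {}));

        rows.first(10).print();
    }
}

Note that, as far as I can tell, the input format runs the whole query as
a single input split, so the read itself happens with parallelism 1; that
is exactly the gap compared to Spark.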

[1]
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-cassandra/src/main/java/org/apache/flink/batch/connectors/cassandra/CassandraInputFormat.java
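
To sketch what such a contribution might involve: the Spark Cassandra
connector gets its parallelism by splitting the table into token ranges
and scanning them independently, and the same trick covers the "random
15%" case by keeping only a fraction of the ranges. A rough sketch
written directly against the DataStax Java driver (keyspace, table, and
column names are again hypothetical, and this is not an existing Flink
API) could look like:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.TokenRange;

import java.util.Random;

public class TokenRangeSampleScan {
    public static void main(String[] args) {
        final double fraction = 0.15; // read roughly 15%; use 1.0 for everything
        final Random rnd = new Random();

        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            PreparedStatement ps = session.prepare(
                    "SELECT id, payload FROM ks.mytable"
                            + " WHERE token(id) > ? AND token(id) <= ?");

            // The driver exposes the ring's token ranges; with vnodes there are
            // usually hundreds, so keeping each range with probability `fraction`
            // approximates a random sample of that share of the table.
            for (TokenRange range : cluster.getMetadata().getTokenRanges()) {
                if (rnd.nextDouble() >= fraction) {
                    continue;
                }
                // unwrap() splits ranges that wrap around the end of the ring.
                for (TokenRange sub : range.unwrap()) {
                    for (Row row : session.execute(ps.bind()
                            .setToken(0, sub.getStart())
                            .setToken(1, sub.getEnd()))) {
                        // process(row)
                    }
                }
            }
        }
    }
}

Each kept range could then be assigned to its own source subtask to
parallelize the read.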

Cheers,
Till

Re: Batch reading from Cassandra. How to?

Posted by Lasse Nedergaard <la...@gmail.com>.
Any good suggestions?

Lasse
