Posted to user@spark.apache.org by Priya Ch <le...@gmail.com> on 2015/08/13 20:30:17 UTC

Write to cassandra...each individual statement

Hi All,

 I have a question about writing an RDD to Cassandra. Instead of writing the
entire RDD to Cassandra, I want to write individual statements to Cassandra,
because each message needs ETL processing (which requires
checking against the DB).
How can I insert statements individually? Using
CassandraConnector.session?

If so, what is the performance impact of this? How about using
sc.parallelize() for each message in the RDD and then inserting into Cassandra?

Thanks,
Padma Ch

Re: Write to cassandra...each individual statement

Posted by Philip Weaver <ph...@gmail.com>.
So you need some state between messages in a partition. You can use
mapPartitions or foreachPartition, which allow you to write code to process
an entire partition.
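
To illustrate the per-partition approach, here is a minimal plain-Python sketch that needs no Spark or Cassandra. It assumes the function passed to foreachPartition receives an iterator over one partition's messages. FakeSession, process_partition, and the statement strings are hypothetical stand-ins, not connector API.

```python
from typing import Dict, Iterable, List, Tuple


class FakeSession:
    """Hypothetical stand-in for a database session; it records statements instead of sending them."""

    def __init__(self) -> None:
        self.executed: List[str] = []

    def execute(self, statement: str) -> None:
        self.executed.append(statement)


def process_partition(messages: Iterable[Tuple[str, str]], session: FakeSession) -> None:
    """Handle one partition's messages in order, keeping per-key state between them."""
    seen: Dict[str, str] = {}  # state that survives from one message to the next
    for key, payload in messages:
        if key not in seen:
            # First message for this key, e.g. ticket issuance.
            session.execute(f"INSERT ticket {key}: {payload}")
        else:
            # Later message for the same key; its handling depends on the earlier one.
            session.execute(f"UPDATE ticket {key}: {seen[key]} -> {payload}")
        seen[key] = payload


# One session serves the whole partition rather than one per message.
session = FakeSession()
process_partition([("t1", "issued"), ("t1", "seat changed"), ("t2", "issued")], session)
for line in session.executed:
    print(line)
```

Because one session and one state dictionary serve the whole partition, each message can still be written individually without opening a connection per message.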

On Thu, Aug 13, 2015 at 11:48 AM, Priya Ch <le...@gmail.com>
wrote:

> Hi Philip,
>
>  I have the following requirement:
> I read streams of data from various partitions of a Kafka topic. Then
> I union the DStreams and apply a hash partitioner so that messages with the
> same key go into a single partition of an RDD, which is of course handled by
> a single thread. This is how we are trying to resolve the concurrency issue.
>
> Now one of the partitions of the RDD holds messages with the same key. Let's
> say the 1st message in the partition corresponds to ticket issuance and the
> 2nd message corresponds to an update on the ticket. The 1st message is
> handled with different logic, and the 2nd message's logic depends on the 1st
> message.
> Hence I am using rdd.foreach to handle different logic for individual
> messages. So a bulk rdd.saveToCassandra will not work.
>
> Hope you got what I am trying to say.

Re: Write to cassandra...each individual statement

Posted by Priya Ch <le...@gmail.com>.
Hi Philip,

 I have the following requirement:
I read streams of data from various partitions of a Kafka topic. Then
I union the DStreams and apply a hash partitioner so that messages with the
same key go into a single partition of an RDD, which is of course handled by a
single thread. This is how we are trying to resolve the concurrency issue.

Now one of the partitions of the RDD holds messages with the same key. Let's
say the 1st message in the partition corresponds to ticket issuance and the
2nd message corresponds to an update on the ticket. The 1st message is
handled with different logic, and the 2nd message's logic depends on the 1st
message.
Hence I am using rdd.foreach to handle different logic for individual
messages. So a bulk rdd.saveToCassandra will not work.

Hope you got what I am trying to say.
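
The routing described above can be modeled in a few lines of plain Python. This is only a sketch: zlib.crc32 stands in for whatever hash the real partitioner uses, and partition_for and hash_partition are hypothetical names. The model keeps arrival order inside each partition; whether a real Spark shuffle preserves that order is an assumption worth verifying separately.

```python
import zlib
from typing import List, Tuple

Message = Tuple[str, str]  # (key, payload)


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from a deterministic hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def hash_partition(messages: List[Message], num_partitions: int) -> List[List[Message]]:
    """Route every message to the partition chosen by its key, in arrival order."""
    partitions: List[List[Message]] = [[] for _ in range(num_partitions)]
    for key, payload in messages:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


messages = [
    ("ticket-1", "issue"),
    ("ticket-2", "issue"),
    ("ticket-1", "update"),
    ("ticket-2", "update"),
]
partitions = hash_partition(messages, 4)
for index, contents in enumerate(partitions):
    print(index, contents)
```

Every message for a given key lands in the same list, so a single worker walking that list sees the issuance before the update.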

On Fri, Aug 14, 2015 at 12:07 AM, Philip Weaver <ph...@gmail.com>
wrote:

> All you'd need to do is *transform* the rdd before writing it, e.g. using
> the .map function.

Re: Write to cassandra...each individual statement

Posted by Philip Weaver <ph...@gmail.com>.
All you'd need to do is *transform* the rdd before writing it, e.g. using
the .map function.
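
For contrast, a stateless transform followed by one bulk write might look like the plain-Python sketch below. to_row and save_all are hypothetical stand-ins (save_all plays the role of a bulk save such as rdd.saveToCassandra), and the field names are invented for illustration.

```python
from typing import Dict, List

Row = Dict[str, str]


def to_row(message: Dict[str, str]) -> Row:
    """Stateless per-message transform: each output depends only on its own input."""
    return {
        "ticket_id": message["id"].strip().upper(),
        "status": message["event"].lower(),
    }


def save_all(rows: List[Row], table: List[Row]) -> None:
    """Hypothetical bulk write: appends every row to an in-memory table."""
    table.extend(rows)


messages = [{"id": " t1 ", "event": "ISSUED"}, {"id": "t2", "event": "Updated"}]
table: List[Row] = []
save_all([to_row(m) for m in messages], table)
print(table)
```

This shape fits only when each row depends on its own message alone; it has no place to carry state from one message to the next.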


On Thu, Aug 13, 2015 at 11:30 AM, Priya Ch <le...@gmail.com>
wrote:

> Hi All,
>
>  I have a question about writing an RDD to Cassandra. Instead of writing the
> entire RDD to Cassandra, I want to write individual statements to Cassandra,
> because each message needs ETL processing (which requires
> checking against the DB).
> How can I insert statements individually? Using
> CassandraConnector.session?
>
> If so, what is the performance impact of this? How about using
> sc.parallelize() for each message in the RDD and then inserting into Cassandra?
>
> Thanks,
> Padma Ch
>