Posted to dev@samza.apache.org by nick xander <ni...@gmail.com> on 2016/08/01 05:41:20 UTC

Different Serde for Store and Changelog

Hi Team,
    We have a requirement to decrypt/encrypt any data coming into or going
out of the Samza processor (Kafka is used as the system stream). This is
achievable with a custom serde for the input/output streams; however, for a
RocksDB key-value store backed by a Kafka changelog it becomes challenging.

   With the current state of the system
<https://github.com/apache/samza/blob/master/samza-core/src/main/scala/org/apache/samza/serializers/SerdeManager.scala#L40>
the only way to achieve encryption/decryption for the changelog stream is to
make the key-value store use a serde that also does encryption/decryption.
The problem is that if we iterate over the RocksDB store in the window task,
we end up unnecessarily encrypting/decrypting data that is already present
locally, which hurts performance. So is there any other way to use one serde
for the key-value store but a different serde (which does the encryption)
only when the data is sent to the Kafka changelog?
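
To make this concrete, the kind of serde we have in mind is roughly the
sketch below. The class name, AES setup and key handling are placeholders
for illustration; only the Samza Serde interface is real, and a real
implementation would need a proper cipher mode, IVs and key management.

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import org.apache.samza.serializers.Serde;

// Hypothetical wrapper: delegates (de)serialization to an inner serde and
// applies AES to the resulting bytes. Cipher mode, IV and key management
// are omitted for brevity.
public class EncryptingSerde<T> implements Serde<T> {
  private final Serde<T> inner;
  private final SecretKeySpec key;

  public EncryptingSerde(Serde<T> inner, byte[] keyBytes) {
    this.inner = inner;
    this.key = new SecretKeySpec(keyBytes, "AES");
  }

  @Override
  public byte[] toBytes(T object) {
    try {
      Cipher cipher = Cipher.getInstance("AES");
      cipher.init(Cipher.ENCRYPT_MODE, key);
      return cipher.doFinal(inner.toBytes(object));
    } catch (Exception e) {
      throw new RuntimeException("encryption failed", e);
    }
  }

  @Override
  public T fromBytes(byte[] bytes) {
    try {
      Cipher cipher = Cipher.getInstance("AES");
      cipher.init(Cipher.DECRYPT_MODE, key);
      return inner.fromBytes(cipher.doFinal(bytes));
    } catch (Exception e) {
      throw new RuntimeException("decryption failed", e);
    }
  }
}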

I believe we wouldn't need to do the encryption at all if we could connect
securely to Kafka with the 0.10 release; however, Samza is still using the
old consumer, so we cannot leverage that feature. Is there any timeline for
the new Kafka client integration (SAMZA-855
<https://issues.apache.org/jira/browse/SAMZA-855>)?
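
For context, the feature we'd like to leverage is the SSL support in the new
Kafka consumer. A rough sketch of the client-side settings (broker address,
truststore path and password are placeholders) would be:

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class SecureConsumerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "broker1:9093");   // placeholder broker
    props.put("group.id", "samza-job");
    props.put("security.protocol", "SSL");
    props.put("ssl.truststore.location", "/etc/kafka/client.truststore.jks");
    props.put("ssl.truststore.password", "changeit");
    props.put("key.deserializer", ByteArrayDeserializer.class.getName());
    props.put("value.deserializer", ByteArrayDeserializer.class.getName());

    // The new consumer can talk to the brokers over TLS; the old consumer
    // used by Samza today cannot.
    KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
    consumer.close();
  }
}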

Also, unrelated to this topic, when can we expect the next Samza release?

Thank you for the support.

Regards,
Nick

Re: Different Serde for Store and Changelog

Posted by Yi Pan <ni...@gmail.com>.
Hi, Nick,

Thanks a lot for the input. Does it work for you if you only encrypt the
value? If so, you won't have the problem with the order of keys in the
RocksDB store. Regarding the decryption cost, if you enable the cached store,
most cache accesses return already-deserialized objects, so the cost is
mitigated when you have a larger cache.
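
For example, just a sketch building on the wrapper serde from your mail (the
factory name, key lookup and config key are placeholders), you could register
the encrypting serde and use it only as the store's msg.serde, keeping a
plain key serde so RocksDB key ordering is untouched:

import org.apache.samza.config.Config;
import org.apache.samza.serializers.ByteSerde;
import org.apache.samza.serializers.Serde;
import org.apache.samza.serializers.SerdeFactory;

// Hypothetical factory: registered under serializers.registry.<name>.class
// and referenced only by stores.<store>.msg.serde, while stores.<store>.key.serde
// stays a plain serde so key ordering in RocksDB is unaffected.
public class EncryptedValueSerdeFactory implements SerdeFactory<byte[]> {
  @Override
  public Serde<byte[]> getSerde(String name, Config config) {
    // Placeholder key lookup; real code would fetch the key from a keystore/KMS.
    byte[] keyBytes = config.get("stores.encryption.key", "0123456789abcdef").getBytes();
    return new EncryptingSerde<byte[]>(new ByteSerde(), keyBytes);
  }
}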

We will consider the request to separate the changelog serde from the store
serde, but that is a larger piece of work. Please let us know if you have
further questions.

As for SAMZA-855, I will ping Robert Crim to get an update.

P.S. 0.10.1 RC is out for the vote. Feel free to try it out and vote!

Thanks!

-Yi
