You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@samza.apache.org by José Barrueta <jo...@stormpath.com> on 2015/05/12 02:30:15 UTC

Quick question regarding deserialization

Hi all,

We're planning to have a Samza Job that reads from 2 or more Kafka topics,
the messages are serialized using Binary Json, we'd like to deserialize
each message to its specific Java Type, the job configuration only allows
to declare one system msg deserializer, is there any plan to allow per
input deserializers? If not, and you guys find it useful, I'd be happy to
contribute in it.

So the other option I have is to send the raw bytes to the StreamTask and
handle the deserialization in the `process(...)` method based on the stream
source, i.e. envelope.getSystemStreamPartition().getStream() what I don't
like about this approach, is that I'll be coupling the name of a partition
to a specific behavior in the application.

Any other suggestions?

Jose Luis

Re: Quick question regarding deserialization

Posted by Percy Wegmann <pe...@evariant.com>.
Do you have control over the binary JSON?  If so, perhaps you could
include a discriminator in the JSON itself that allows the deserializer to
target the right output type?

Percy Wegmann
Director, Application Platform Engineering
 
cell: 512.710.6991

www.evariant.com <http://www.evariant.com/>
 
 <http://www.shsmd.org/shsmd/conference/13/index.shtml>The information
contained in this email message is intended only for the personal and
confidential use of the recipient(s) named above. If the reader of this
message is not the intended recipient or an agent responsible for
delivering it to the intended recipient, you are hereby notified that you
have received this document in error and that any review, dissemination,
distribution, or copying of this message is strictly prohibited. If you
have received this communication in error, please notify us immediately by
email, and destroy the original message.






On 5/11/15, 7:30 PM, "José Barrueta" <jo...@stormpath.com> wrote:

>Hi all,
>
>We're planning to have a Samza Job that reads from 2 or more Kafka topics,
>the messages are serialized using Binary Json, we'd like to deserialize
>each message to its specific Java Type, the job configuration only allows
>to declare one system msg deserializer, is there any plan to allow per
>input deserializers? If not, and you guys find it useful, I'd be happy to
>contribute in it.
>
>So the other option I have is to send the raw bytes to the StreamTask and
>handle the deserialization in the `process(...)` method based on the
>stream
>source, i.e. envelope.getSystemStreamPartition().getStream() what I don't
>like about this approach, is that I'll be coupling the name of a partition
>to a specific behavior in the application.
>
>Any other suggestions?
>
>Jose Luis


Re: Quick question regarding deserialization

Posted by José Barrueta <jo...@stormpath.com>.
Thanks to both for the responses!

Yi, thanks for pointing that out, I didn't see this before.

systems.system-name.streams.stream-name. samza.msg.serde

That's exactly what I need!

Thanks,

José Luis


On May 11, 2015 6:03 PM, "Yi Pan" <ni...@gmail.com> wrote:

> Hi, Jose,
>
> Please refer to the configure wiki:
>
> http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html
> Samza actually allows multiple Serde classes to be defined for different
> topics, as long as you don't have multiple schemas of messages in the same
> topic.
>
> Best,
>
> -Yi
>
> On Mon, May 11, 2015 at 5:30 PM, José Barrueta <jo...@stormpath.com> wrote:
>
> > Hi all,
> >
> > We're planning to have a Samza Job that reads from 2 or more Kafka
> topics,
> > the messages are serialized using Binary Json, we'd like to deserialize
> > each message to its specific Java Type, the job configuration only allows
> > to declare one system msg deserializer, is there any plan to allow per
> > input deserializers? If not, and you guys find it useful, I'd be happy to
> > contribute in it.
> >
> > So the other option I have is to send the raw bytes to the StreamTask and
> > handle the deserialization in the `process(...)` method based on the
> stream
> > source, i.e. envelope.getSystemStreamPartition().getStream() what I don't
> > like about this approach, is that I'll be coupling the name of a
> partition
> > to a specific behavior in the application.
> >
> > Any other suggestions?
> >
> > Jose Luis
> >
>

Re: Quick question regarding deserialization

Posted by Yi Pan <ni...@gmail.com>.
Hi, Jose,

Please refer to the configure wiki:
http://samza.apache.org/learn/documentation/0.9/jobs/configuration-table.html
Samza actually allows multiple Serde classes to be defined for different
topics, as long as you don't have multiple schemas of messages in the same
topic.

Best,

-Yi

On Mon, May 11, 2015 at 5:30 PM, José Barrueta <jo...@stormpath.com> wrote:

> Hi all,
>
> We're planning to have a Samza Job that reads from 2 or more Kafka topics,
> the messages are serialized using Binary Json, we'd like to deserialize
> each message to its specific Java Type, the job configuration only allows
> to declare one system msg deserializer, is there any plan to allow per
> input deserializers? If not, and you guys find it useful, I'd be happy to
> contribute in it.
>
> So the other option I have is to send the raw bytes to the StreamTask and
> handle the deserialization in the `process(...)` method based on the stream
> source, i.e. envelope.getSystemStreamPartition().getStream() what I don't
> like about this approach, is that I'll be coupling the name of a partition
> to a specific behavior in the application.
>
> Any other suggestions?
>
> Jose Luis
>