Posted to user@flink.apache.org by ljwagerfield <la...@dmz.wagerfield.com> on 2017/01/05 05:40:35 UTC

Are heterogeneous DataStreams possible?

Our data's schema is defined by our users and is not known at compile time.

All data arrives via a single Kafka topic and is serialized using the
same serialization tech (to be defined).

We want to use King.com's RBEA technique to process this data in different
ways at runtime (depending on its schema), using a single topology/DAG.

Therefore, each message passing through the DAG will have a different
schema.

---

My question is, what's the best way to implement a system like this, where
each message may have a different schema, and none of the schemas are known
at compile time, but must use the same DAG?

I've tried using an 'array of heterogeneous tuples' which appears to work
fine when playing around in the IDE, but before I continue too far down that
route, I just wanted to verify if there were any known methods for doing
this?

Thanks!
Lawrence



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Are-heterogeneous-DataStreams-possible-tp10852.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: Are heterogeneous DataStreams possible?

Posted by Henri Heiskanen <he...@gmail.com>.
Hi,

We have been using a HashMap and it has been working fine so far.

Br,
Henkka
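
A minimal sketch of the HashMap approach mentioned above, in plain Java with no Flink dependency. The class name MapRecords, the helper methods, and the field names "amount" and "user" are hypothetical and only illustrate the idea: each record is a map, so the set of fields can differ from message to message while the stream's element type stays the same.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not from the thread): records are plain maps, so the
// field set may differ per message while the element type stays Map.
final class MapRecords {

    // Builds a record from alternating key/value arguments.
    static Map<String, Object> record(Object... keysAndValues) {
        if (keysAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        Map<String, Object> fields = new HashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            fields.put((String) keysAndValues[i], keysAndValues[i + 1]);
        }
        return fields;
    }

    // Reads a numeric field if the record has one; otherwise returns the fallback.
    static double numberOrDefault(Map<String, Object> rec, String field, double fallback) {
        Object value = rec.get(field);
        return value instanceof Number ? ((Number) value).doubleValue() : fallback;
    }

    public static void main(String[] args) {
        // Two records with different schemas travelling as the same Java type.
        Map<String, Object> purchase = record("SCHEMA_ID", 1, "amount", 9.99);
        Map<String, Object> login = record("SCHEMA_ID", 2, "user", "alice");
        System.out.println(numberOrDefault(purchase, "amount", 0.0));
        System.out.println(numberOrDefault(login, "amount", 0.0));
    }
}
```

The trade-off is that every field access is a lookup plus a runtime type check, so a mistake in a field name or type only shows up when a message of that shape arrives.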

On Mon, Jan 9, 2017 at 5:35 PM, Aljoscha Krettek <al...@apache.org>
wrote:

> You could try using JSON for all your data; this might be slow, however.
> The other route, which I would suggest, is to have your own custom
> TypeSerializers that can efficiently deal with different types and dynamic
> schemas.
>
> Cheers,
> Aljoscha
>
> On Thu, 5 Jan 2017 at 07:02 ljwagerfield <la...@dmz.wagerfield.com>
> wrote:
>
>> I should add: the operators determine how to handle each message by
>> inspecting the message's SCHEMA_ID field (every message has a SCHEMA_ID as
>> its first field).
>

Re: Are heterogeneous DataStreams possible?

Posted by Aljoscha Krettek <al...@apache.org>.
You could try using JSON for all your data; this might be slow, however.
The other route, which I would suggest, is to have your own custom
TypeSerializers that can efficiently deal with different types and dynamic
schemas.

Cheers,
Aljoscha
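
To make the custom-serializer suggestion concrete, here is a rough sketch of the core logic such a serializer might contain. It is not a Flink TypeSerializer: it is plain Java written against java.io.DataOutput and java.io.DataInput, the interfaces that Flink's DataOutputView and DataInputView extend. The SchemaCodec class, its FieldType enum, and the in-memory schema registry are assumptions made for illustration; the thread does not say how schemas are stored or which field types exist.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical schema-driven codec: the schema is looked up at runtime by id.
final class SchemaCodec {
    enum FieldType { INT, DOUBLE, STRING }

    // Assumed registry: SCHEMA_ID -> ordered field types.
    private final Map<Integer, FieldType[]> schemas = new HashMap<>();

    void register(int schemaId, FieldType... fields) {
        schemas.put(schemaId, fields);
    }

    // Writes SCHEMA_ID first, then each field in schema order.
    void write(int schemaId, Object[] values, DataOutput out) throws IOException {
        FieldType[] types = lookup(schemaId);
        if (values.length != types.length) {
            throw new IOException("field count mismatch for schema " + schemaId);
        }
        out.writeInt(schemaId);
        for (int i = 0; i < types.length; i++) {
            switch (types[i]) {
                case INT:    out.writeInt((Integer) values[i]); break;
                case DOUBLE: out.writeDouble((Double) values[i]); break;
                case STRING: out.writeUTF((String) values[i]); break;
            }
        }
    }

    // Reads SCHEMA_ID, then decodes the fields that schema declares.
    // Returns [schemaId, field1, field2, ...].
    Object[] read(DataInput in) throws IOException {
        int schemaId = in.readInt();
        FieldType[] types = lookup(schemaId);
        Object[] record = new Object[types.length + 1];
        record[0] = schemaId;
        for (int i = 0; i < types.length; i++) {
            switch (types[i]) {
                case INT:    record[i + 1] = in.readInt(); break;
                case DOUBLE: record[i + 1] = in.readDouble(); break;
                case STRING: record[i + 1] = in.readUTF(); break;
            }
        }
        return record;
    }

    private FieldType[] lookup(int schemaId) throws IOException {
        FieldType[] types = schemas.get(schemaId);
        if (types == null) {
            throw new IOException("unknown schema " + schemaId);
        }
        return types;
    }

    // Convenience for trying it out: serialize to bytes and read back.
    Object[] roundTrip(int schemaId, Object... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(schemaId, values, new DataOutputStream(bytes));
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SchemaCodec codec = new SchemaCodec();
        codec.register(7, FieldType.INT, FieldType.STRING);
        System.out.println(Arrays.toString(codec.roundTrip(7, 42, "hello"))); // prints [7, 42, hello]
    }
}
```

Because only the SCHEMA_ID and the raw field values are written, the encoding stays compact compared with self-describing JSON, at the cost of every reader needing access to the same schema registry.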

On Thu, 5 Jan 2017 at 07:02 ljwagerfield <la...@dmz.wagerfield.com>
wrote:

> I should add: the operators determine how to handle each message by
> inspecting the message's SCHEMA_ID field (every message has a SCHEMA_ID as
> its first field).

Re: Are heterogeneous DataStreams possible?

Posted by ljwagerfield <la...@dmz.wagerfield.com>.
I should add: the operators determine how to handle each message by
inspecting the message's SCHEMA_ID field (every message has a SCHEMA_ID as
its first field).
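
For illustration, the SCHEMA_ID dispatch described above could look roughly like the following plain-Java sketch. The SchemaDispatch class, the handler registry, and the example schemas are hypothetical; messages are modelled as Object[] with the SCHEMA_ID at index 0, matching the "first field" convention.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher: one operator, many schemas, chosen per message.
final class SchemaDispatch {
    // Assumed registry: SCHEMA_ID -> handler for messages of that schema.
    private final Map<Integer, Function<Object[], String>> handlers = new HashMap<>();

    void on(int schemaId, Function<Object[], String> handler) {
        handlers.put(schemaId, handler);
    }

    // Inspects the first field and routes the message to the matching handler.
    String process(Object[] message) {
        int schemaId = (Integer) message[0];
        Function<Object[], String> handler = handlers.get(schemaId);
        if (handler == null) {
            return "unhandled schema " + schemaId;
        }
        return handler.apply(message);
    }

    public static void main(String[] args) {
        SchemaDispatch dispatch = new SchemaDispatch();
        dispatch.on(1, m -> "purchase of " + m[1]);
        dispatch.on(2, m -> "login by " + m[1]);

        System.out.println(dispatch.process(new Object[] {1, 9.99}));
        System.out.println(dispatch.process(new Object[] {2, "alice"}));
        System.out.println(dispatch.process(new Object[] {3, "?"}));
    }
}
```

Since the handlers live in a map rather than in the DAG itself, new schemas could in principle be registered at runtime without changing the topology, which is the property the question asks for.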


