You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Salva Alcántara <sa...@gmail.com> on 2020/12/01 06:01:01 UTC

Best way to handle data streams for non-sealed trait hierarchies

I know that for ADTs (sealed traits) there are some ongoing efforts to
overcome the performance degradation caused by the kryo fallback (see
https://github.com/apache/flink/pull/12929). E.g.,

```
sealed trait Event {
  def id: Int
}
case class Pageview(id: Int, page: String) extends Event
case class Click(id: Int, url: String) extends Event
```

However, is there anything one can do for handling a similar situation but
for unsealed traits. That is, imagine a situation where Event is not sealed
because you don't know in advance which specific events you will be dealing
with, or maybe you are working on a generic framework that will be used to
develop specific applications afterwards, each one dealing with a set of
particular events. So, basically, the trait cannot be sealed. Instead of
simple case classes like Pageview and Click, within each specific
application one could be dealing with auto-generated case classes from
protocol buffer definitions, in a system which should be able to add more
types of events into the mix, so to speak.

How should one address this problem, if serialization performance wants to
be optimized? The general advice is not to use Flink with heterogenous types
to start with, because it will not be able to derive efficient serializers.
But the described use case sounds legit to me, so what would be the best way
to handle it, or, to put it another way, how to minimize performance
degradation due to serialization?

FYI: Posted also in SO:
https://stackoverflow.com/questions/65085335/best-way-to-handle-data-streams-for-non-sealed-trait-hierarchies.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Best way to handle data streams for non-sealed trait hierarchies

Posted by Salva Alcántara <sa...@gmail.com>.
To be clear, the main idea would be to define the "general framework" in
terms of DataStream[Event], where `Event` would be a trait capturing the
required commonalities. Within each specific application, one would work
with different types of events but would ultimately perform a union in order
to combine them in a DataStream[Event].



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/