You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by ljwagerfield <la...@dmz.wagerfield.com> on 2017/01/01 14:13:02 UTC

How do I ensure binary comparisons are being used?

I'd like to understand which operations can actually leverage "binary
comparisons" when using the DataStream API. This is regarding the
optimisation you receive when using Flink's own built-in serialization stack
as opposed to Avro/Kryo/Json/etc... whereby fields are compared without the
object needing to be deserialized.

I'm expecting that all operations which use field positions (i.e. `keyBy`
and `partitionCustom`) leverage binary comparisons, since they don't require
a reference to a deserialized object. **However I cannot see any other
methods that take field-offset parameters... they all take callbacks... so
does anything else in the DataStream API actually perform binary
comparisons?**

For example, the `join` operation requires a closure to perform the equality
check... which means the objects must be deserialized before being passed
into the closure... unless something really clever is happening
under-the-hood?

Please can you provide a list of stream operations that perform binary
comparisons / avoid deserialization?

Thanks!



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-do-I-ensure-binary-comparisons-are-being-used-tp10806.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: How do I ensure binary comparisons are being used?

Posted by ljwagerfield <la...@dmz.wagerfield.com>.
Thank you Fabian :)



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-do-I-ensure-binary-comparisons-are-being-used-tp10806p10851.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Re: How do I ensure binary comparisons are being used?

Posted by Fabian Hueske <fh...@gmail.com>.
Hi Lawrence,

comparison of binary data are mainly used by the DataSet API when sorting
large data sets or building and probing hash tables.

The DataStream API mainly benefits from Flink's custom and efficient
serialization when sending data over the wire or taking checkpoints.
There are also plans to implement a state backend based on the
serialization stack which leverages Flink's managed memory instead of
holding object on the heap (the RocksDB state backend is the current
solution to avoid this).

From what I know, the DataStream API does not perform compare on serialized
data.

Best, Fabian



2017-01-03 7:53 GMT+01:00 ljwagerfield <la...@dmz.wagerfield.com>:

> Any insights on this?
>
> Thanks,
> Lawrence
>
>
>
> --
> View this message in context: http://apache-flink-user-
> mailing-list-archive.2336050.n4.nabble.com/How-do-I-ensure-
> binary-comparisons-are-being-used-tp10806p10819.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>

Re: How do I ensure binary comparisons are being used?

Posted by ljwagerfield <la...@dmz.wagerfield.com>.
Any insights on this?

Thanks,
Lawrence



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-do-I-ensure-binary-comparisons-are-being-used-tp10806p10819.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.