Posted to user@flink.apache.org by eric hoffmann <si...@gmail.com> on 2018/11/15 07:32:02 UTC

Join Dataset in stream

Hi,
I need to compute the Euclidean distance between an input vector and a full
dataset stored in Cassandra, and keep the n lowest values. The Cassandra
dataset is evolving (mutable). I could do this in a batch job, but I would
have to trigger it each time. The inputs are more like a slow stream, but
the computation needs to be fast. Can I do this in a streaming way? Is there
a better solution?
Thanks

Re: Join Dataset in stream

Posted by Ken Krugler <kk...@transpac.com>.
Hi Eric,

This sounds like a use case for BroadcastProcessFunction <https://ci.apache.org/projects/flink/flink-docs-stable/api/java/org/apache/flink/streaming/api/functions/co/BroadcastProcessFunction.html>. You’d use the Cassandra dataset as the source for the broadcast stream, which is distributed to every parallel instance of your custom BroadcastProcessFunction. The input vectors form a partitioned stream that’s the other input to this function (via its processElement() method). The two streams get connected as a BroadcastConnectedStream <https://ci.apache.org/projects/flink/flink-docs-stable/api/java/org/apache/flink/streaming/api/datastream/BroadcastConnectedStream.html>.

Note that as of Flink 1.5 it’s also easy to maintain the broadcast state <https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html>.
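
As a rough illustration of the per-element work that processElement() would do, here is a minimal sketch in plain Java with no Flink dependency. Everything in it is an assumption made for illustration: the class NearestVectors, its Match type, and the methods euclidean() and nearest() are hypothetical names, and the Map<String, double[]> merely stands in for the contents of the broadcast state (dataset row id mapped to vector). In a real job, processBroadcastElement() would keep that state in sync with Cassandra, and processElement() would run something like nearest() over it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper, not part of Flink; all names are illustrative. */
final class NearestVectors {

    /** A dataset row id paired with its distance to the query vector. */
    static final class Match {
        final String id;
        final double distance;

        Match(String id, double distance) {
            this.id = id;
            this.distance = distance;
        }
    }

    /** Euclidean distance between two vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the n dataset entries closest to the query, nearest first.
     * The map is a stand-in for whatever the broadcast state holds.
     */
    static List<Match> nearest(double[] query, Map<String, double[]> dataset, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Max-heap on distance: the head is the farthest of the best n so far.
        PriorityQueue<Match> heap = new PriorityQueue<>(
                Comparator.comparingDouble((Match m) -> m.distance).reversed());
        for (Map.Entry<String, double[]> e : dataset.entrySet()) {
            heap.add(new Match(e.getKey(), euclidean(query, e.getValue())));
            if (heap.size() > n) {
                heap.poll(); // drop the farthest candidate
            }
        }
        List<Match> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((Match m) -> m.distance));
        return result;
    }

    public static void main(String[] args) {
        Map<String, double[]> dataset = new LinkedHashMap<>();
        dataset.put("a", new double[] {0.0, 0.0});
        dataset.put("b", new double[] {3.0, 4.0});
        dataset.put("c", new double[] {1.0, 1.0});
        for (Match m : nearest(new double[] {0.0, 0.0}, dataset, 2)) {
            System.out.println(m.id + " " + m.distance);
        }
    }
}
```

The bounded heap keeps memory at n entries per query and costs one pass over the dataset, which matters because every parallel instance holds the full broadcast dataset. Whether a full scan per input vector is fast enough depends on the dataset size, which the thread does not state.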

— Ken


--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
Custom big data solutions & training
Flink, Solr, Hadoop, Cascading & Cassandra