Posted to dev@flink.apache.org by Robert Metzger <rm...@apache.org> on 2015/02/05 18:18:44 UTC

Fwd: [jira] [Created] (MRQL-66) Add support for MRQL streaming in Flink streaming mode

Just FYI from the MRQL mailing list.
Maybe somebody among the streaming folks wants to give some advice or help.

---------- Forwarded message ----------
From: Leonidas Fegaras (JIRA) <ji...@apache.org>
Date: Thu, Feb 5, 2015 at 5:56 PM
Subject: [jira] [Created] (MRQL-66) Add support for MRQL streaming in Flink streaming mode
To: dev@mrql.incubator.apache.org


Leonidas Fegaras created MRQL-66:
------------------------------------

             Summary: Add support for MRQL streaming in Flink streaming mode
                 Key: MRQL-66
                 URL: https://issues.apache.org/jira/browse/MRQL-66
             Project: MRQL
          Issue Type: New Feature
          Components: Run-Time/Flink, Streaming
    Affects Versions: 0.9.6
            Reporter: Leonidas Fegaras
            Priority: Critical


The new extension, MRQL Streaming, works fine with Spark Streaming (see
MRQL-63), but it would be nice if we made it work with Flink Streaming too.
It was easy to make it work with Spark Streaming: the data in one sliding
window of a Spark DStream is viewed as an RDD, so a DStream can be viewed
as a continuous sequence of RDDs. A DStream has a method, foreachRDD, that
applies a function to each RDD in the stream. So to implement MRQL
Streaming, we just had to pass the MRQL Spark evaluator (a function from RDD
to RDD) as the argument to foreachRDD.

For Flink Streaming, the implementation will be more complicated. A Flink
Streaming DataStream doesn't provide a hook to a DataSet object. I am
guessing that this is because Flink Streaming is far more general than
Spark Streaming (it's not just sliding windows) and because Flink Streaming
needs to do special optimizations. So we need to copy the FlinkEvaluator
class into a new class, FlinkStreaming, and change all methods to operate
on DataStream instead of DataSet. Many DataSet methods have an equivalent
in DataStream, but some are missing. I have already provided the input
formats for streaming (method FlinkStreaming.stream_source), but we still
need to write a stream evaluator for MRQL plans.
Any volunteers?
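To make the contrast concrete, here is a toy Python model of the two styles. It is not Spark or Flink code: MicroBatchStream, foreach_batch, batch_evaluator and stream_evaluator are hypothetical names, and the query (keep even values, square them) is an arbitrary stand-in for an MRQL plan. The first half mimics the foreachRDD approach, where a whole-collection evaluator is reused unchanged on each batch; the second half shows the same query re-expressed as per-record operators, roughly what a DataStream-based evaluator would have to do when no batch object is exposed.

```python
from typing import Callable, Generic, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


class MicroBatchStream(Generic[T]):
    """Toy stand-in for a Spark DStream: a sequence of batches, where each
    batch plays the role of the RDD holding one window's data."""

    def __init__(self, batches: Iterable[List[T]]) -> None:
        self._batches = batches

    def foreach_batch(self, fn: Callable[[List[T]], None]) -> None:
        # Analogue of DStream.foreachRDD: hand each complete batch to fn.
        for batch in self._batches:
            fn(batch)


def batch_evaluator(batch: List[int]) -> List[int]:
    # Hypothetical stand-in for the MRQL Spark evaluator (RDD -> RDD):
    # a query that keeps even values and squares them, run on a whole batch.
    return [x * x for x in batch if x % 2 == 0]


def stream_evaluator(records: Iterator[int]) -> Iterator[int]:
    # Hypothetical record-at-a-time version of the same query: no batch is
    # ever materialized, so each plan operator becomes a per-element
    # stream operator (here a filter followed by a map).
    return (x * x for x in records if x % 2 == 0)


# Spark-style: reuse the batch evaluator unchanged on every batch.
results: List[List[int]] = []
MicroBatchStream([[1, 2, 3], [4, 5], [6]]).foreach_batch(
    lambda b: results.append(batch_evaluator(b))
)

# Flink-style: the query is re-expressed over an (in principle unbounded) stream.
streamed = list(stream_evaluator(iter([1, 2, 3, 4, 5, 6])))
```

Here results ends up as [[4], [16], [36]] and streamed as [4, 16, 36]. The answers agree, but only the batch model lets an existing collection-level evaluator be plugged in as-is; the record-at-a-time model needs every operator rewritten, which is the extra work the issue describes.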



