Posted to user@flink.apache.org by Eleanore Jin <el...@gmail.com> on 2020/05/18 02:39:27 UTC

run flink on edge vs hub

Hi Community,

Currently we are running Flink in 'hub' data centers where data is ingested
into the platform via Kafka; a Flink job reads from Kafka, does the
transformations, and publishes to another Kafka topic.

I would also like to see if the same logic (read input message -> do
transformation -> return output message) can be applied on 'edge' data
centers.

The requirement for running on 'edge' is to return the response
synchronously, like a synchronous HTTP-based request/response.

Can you please provide some guidance/thoughts on this?

Thanks a lot!
Eleanore

Re: run flink on edge vs hub

Posted by Eleanore Jin <el...@gmail.com>.
Hi Arvid,

Thanks for the suggestion! I will try it out to see how it works.

Best,
Eleanore

On Mon, May 18, 2020 at 8:04 AM Arvid Heise <ar...@ververica.com> wrote:

> Hi Eleanore,
>
> The general question is what you mean by edge data centers, as the term
> is pretty fuzzy. Since Flink runs on Java, it's not suitable for embedded
> clusters as of now. However, plenty of work has already been done to test
> that Flink runs on ARM clusters [1].
>
> If you just mean in general moving away from a monolithic hub cluster to
> smaller clusters, then this is easily done with Flink on the compute side.
> The question is rather how data storage should look in such an edge setting
> and how the interfaces look.
>
> From your example, it seems as if you want to use Flink as a reactive
> server, possibly easily scalable. If so, then yes, it is possible with
> Flink, even though I'd say it's not the primary use case for Flink. In any
> case, synchronous requests will be a bit difficult/unnatural. I'd probably
> go for an async job pattern: Flink listens on some port for requests
> (socketTextStream [2]) that carry a job id, processes the data, and keeps
> it in state keyed by job id. The client then uses the job id to fetch the
> job state through queryable state [3]. The responses eventually time out
> through TTL [4].
>
> Of course, you'd put a small proxy in front of that composite job
> (separate input/query port) that translates the queries from the client to
> the Flink job. The proxy would most likely also generate the job id and
> return it to the client. Ultimately, that proxy could offer a synchronous
> interface and poll for the result itself, but that makes the proxy suddenly
> quite heavy.
>
> The proxy setup can be reused for different edge clusters, making it a
> one-time investment. Note that there are other software stacks for reactive
> servers that offer the functionality out of the box.
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-ARM-support-for-Flink-td30298.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/datastream_api.html#data-sources
> [3]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html
> [4]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl
>
> On Mon, May 18, 2020 at 4:39 AM Eleanore Jin <el...@gmail.com>
> wrote:
>
>> Hi Community,
>>
>> Currently we are running Flink in 'hub' data centers where data is
>> ingested into the platform via Kafka; a Flink job reads from Kafka,
>> does the transformations, and publishes to another Kafka topic.
>>
>> I would also like to see if the same logic (read input message -> do
>> transformation -> return output message) can be applied on 'edge' data
>> centers.
>>
>> The requirement for running on 'edge' is to return the response
>> synchronously, like a synchronous HTTP-based request/response.
>>
>> Can you please provide some guidance/thoughts on this?
>>
>> Thanks a lot!
>> Eleanore
>>
>>
>
> --
>
> Arvid Heise | Senior Java Developer
>
> <https://www.ververica.com/>
>
>

Re: run flink on edge vs hub

Posted by Arvid Heise <ar...@ververica.com>.
Hi Eleanore,

The general question is what you mean by edge data centers, as the term is
pretty fuzzy. Since Flink runs on Java, it's not suitable for embedded
clusters as of now. However, plenty of work has already been done to test
that Flink runs on ARM clusters [1].

If you just mean in general moving away from a monolithic hub cluster to
smaller clusters, then this is easily done with Flink on the compute side.
The question is rather how data storage should look in such an edge setting
and how the interfaces look.

From your example, it seems as if you want to use Flink as a reactive
server, possibly easily scalable. If so, then yes, it is possible with
Flink, even though I'd say it's not the primary use case for Flink. In any
case, synchronous requests will be a bit difficult/unnatural. I'd probably
go for an async job pattern: Flink listens on some port for requests
(socketTextStream [2]) that carry a job id, processes the data, and keeps
it in state keyed by job id. The client then uses the job id to fetch the
job state through queryable state [3]. The responses eventually time out
through TTL [4].
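
As a rough illustration of the pattern described above (not part of the
original message, and deliberately not Flink code), the sketch below models
"state keyed by job id that expires via TTL" with a plain in-memory map. The
class name JobResultStore, its methods, the upper-casing "transformation",
and the injectable clock are all hypothetical stand-ins; in a real job the
map would be Flink keyed state, query() would go through queryable state,
and expiry would be handled by Flink's state TTL.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** In-memory stand-in for "results keyed by job id with a TTL". Not Flink API. */
final class JobResultStore {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;

        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> results = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested deterministically

    JobResultStore(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** "Job" side: transform the payload and keep the result under its job id. */
    void process(String jobId, String payload) {
        String result = payload.toUpperCase(Locale.ROOT); // placeholder transformation
        results.put(jobId, new Entry(result, clock.getAsLong() + ttlMillis));
    }

    /** "Query" side: return the result if it exists and has not expired. */
    Optional<String> query(String jobId) {
        Entry entry = results.get(jobId);
        if (entry == null) {
            return Optional.empty();
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            results.remove(jobId, entry); // lazily drop the expired entry
            return Optional.empty();
        }
        return Optional.of(entry.value);
    }

    public static void main(String[] args) {
        long[] now = {0L}; // fake clock, advanced by hand
        JobResultStore store = new JobResultStore(1_000L, () -> now[0]);
        store.process("job-1", "hello");
        System.out.println(store.query("job-1")); // Optional[HELLO]
        now[0] = 1_000L; // TTL has elapsed
        System.out.println(store.query("job-1")); // Optional.empty
    }
}
```

The client only ever sees a job id and an optional result, which is what
makes the request side asynchronous: a missing result can mean "not ready
yet" or "already expired", and the caller has to handle both.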

Of course, you'd put a small proxy in front of that composite job
(separate input/query port) that translates the queries from the client to
the Flink job. The proxy would most likely also generate the job id and
return it to the client. Ultimately, that proxy could offer a synchronous
interface and poll for the result itself, but that makes the proxy suddenly
quite heavy.
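
A minimal sketch of such a synchronous proxy, under stated assumptions (it
is not from the original message): the proxy generates the job id, submits
the request, then polls the query side until a result appears or a timeout
elapses. SyncProxy and AsyncJobBackend are hypothetical names; the backend
interface merely stands in for the job's input port and query port, and the
demo in main() fakes the job with a background thread.

```java
import java.time.Duration;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;

/** Turns an asynchronous submit/query backend into a blocking call. */
final class SyncProxy {
    /** Hypothetical view of the job: one input port, one query port. */
    interface AsyncJobBackend {
        void submit(String jobId, String payload);

        Optional<String> fetchResult(String jobId);
    }

    private final AsyncJobBackend backend;
    private final Duration pollInterval;
    private final Duration timeout;

    SyncProxy(AsyncJobBackend backend, Duration pollInterval, Duration timeout) {
        this.backend = backend;
        this.pollInterval = pollInterval;
        this.timeout = timeout;
    }

    /** Blocks until the backend has a result for this request or the timeout elapses. */
    String handle(String payload) throws TimeoutException, InterruptedException {
        String jobId = UUID.randomUUID().toString(); // the proxy owns id generation
        backend.submit(jobId, payload);
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Optional<String> result = backend.fetchResult(jobId);
            if (result.isPresent()) {
                return result.get();
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("no result for job " + jobId);
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> results = new ConcurrentHashMap<>();
        // Fake backend: completes each job on a background thread.
        AsyncJobBackend fake = new AsyncJobBackend() {
            @Override
            public void submit(String jobId, String payload) {
                new Thread(() -> results.put(jobId, payload.toUpperCase(Locale.ROOT))).start();
            }

            @Override
            public Optional<String> fetchResult(String jobId) {
                return Optional.ofNullable(results.get(jobId));
            }
        };
        SyncProxy proxy = new SyncProxy(fake, Duration.ofMillis(10), Duration.ofSeconds(2));
        System.out.println(proxy.handle("hello"));
    }
}
```

Each in-flight request holds a thread while it polls, which is one concrete
reason the synchronous variant makes the proxy "heavy" compared with simply
handing the job id back to the client.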

The proxy setup can be reused for different edge clusters, making it a
one-time investment. Note that there are other software stacks for reactive
servers that offer the functionality out of the box.

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-ARM-support-for-Flink-td30298.html
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/datastream_api.html#data-sources
[3]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/queryable_state.html
[4]
https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html#state-time-to-live-ttl

On Mon, May 18, 2020 at 4:39 AM Eleanore Jin <el...@gmail.com> wrote:

> Hi Community,
>
> Currently we are running Flink in 'hub' data centers where data is
> ingested into the platform via Kafka; a Flink job reads from Kafka,
> does the transformations, and publishes to another Kafka topic.
>
> I would also like to see if the same logic (read input message -> do
> transformation -> return output message) can be applied on 'edge' data
> centers.
>
> The requirement for running on 'edge' is to return the response
> synchronously, like a synchronous HTTP-based request/response.
>
> Can you please provide some guidance/thoughts on this?
>
> Thanks a lot!
> Eleanore
>
>

-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>
