Posted to user@flink.apache.org by Dipanjan Mazumder <ja...@yahoo.com> on 2021/09/10 03:08:00 UTC

Usecase for flink

Hi,
   I am working on a use case and am thinking of using Flink for it. I will be receiving many large resource graphs; for each graph I need to walk its nodes and edges and evaluate each of them against some Siddhi rules. The implementation for evaluating individual entities with Flink and Siddhi is already in place, but I am in a dilemma over whether I should do the graph processing in Flink as well. This is what I am planning to do:
From Kafka I will fetch the graph, decompose it into nodes and edges, fetch additional metadata for each node and edge from different REST APIs, and then pass the individual nodes and edges (which are resources) to the different substreams that are already in place. The rules will run on the individual substreams to process the nodes and edges and will finally emit their output into a stream. From that stream I will collate all the results by graph id using another operator and send the final result to an output stream.
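
In Flink DataStream terms, I am imagining something roughly like the sketch below (only a rough outline: the GraphElement type, the graph parsing, the Kafka properties and the REST enrichment are placeholders, and the existing Siddhi rule operators are not shown):

import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.Collector;

public class GraphRulePipeline {

    /** One node or edge of a resource graph, tagged with the id of its graph. */
    public static class GraphElement {
        public String graphId;
        public String type;     // e.g. "node" or "edge"
        public String payload;  // raw element data plus enriched metadata
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092");  // placeholder address
        props.setProperty("group.id", "graph-pipeline");

        // 1. Fetch serialized graphs from Kafka.
        DataStream<String> rawGraphs = env.addSource(
                new FlinkKafkaConsumer<>("graphs", new SimpleStringSchema(), props));

        // 2. Decompose each graph into node/edge elements carrying their graph id.
        DataStream<GraphElement> elements = rawGraphs
                .flatMap((String json, Collector<GraphElement> out) ->
                        parseGraph(json).forEach(out::collect))
                .returns(GraphElement.class);

        // 3. Enrich every element with metadata from the REST APIs without
        //    blocking the operator thread.
        DataStream<GraphElement> enriched = AsyncDataStream.unorderedWait(
                elements, new MetadataLookup(), 5, TimeUnit.SECONDS, 100);

        // 4. From here the enriched elements are routed (e.g. by type) into the
        //    existing Siddhi rule substreams; their results are then keyed by
        //    graphId, collated and written to the output stream (not shown here).
        enriched.keyBy(e -> e.graphId).print();  // stand-in for the real downstream topology

        env.execute("graph-rule-pipeline");
    }

    /** Async REST lookup; the HTTP call itself is only a placeholder. */
    public static class MetadataLookup extends RichAsyncFunction<GraphElement, GraphElement> {
        @Override
        public void asyncInvoke(GraphElement e, ResultFuture<GraphElement> result) {
            CompletableFuture
                    .supplyAsync(() -> e)  // replace with the real non-blocking REST call
                    .thenAccept(enrichedElement ->
                            result.complete(Collections.singleton(enrichedElement)));
        }
    }

    private static List<GraphElement> parseGraph(String json) {
        return Collections.emptyList();  // placeholder for the real graph parser
    }
}
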
That is my plan. I now need input from all of you on whether this is a fair use case for Flink, and whether Flink will be able to handle this level of processing at the required scale and volume.
Any input will ease my understanding and help me go ahead with this idea.
Regards,
Dipanjan

Re: Usecase for flink

Posted by Timo Walther <tw...@apache.org>.
Hi Dipanjan,

Gelly is built on top of the DataSet API, which is a batch-only API 
that is slowly being phased out.

It is not possible to connect a DataStream API program with a DataSet 
API program unless you go through a connector such as CSV in between.

Regards,
Timo



Re: Usecase for flink

Posted by Timo Walther <tw...@apache.org>.
If your graphs fit in memory (at least after an initial partitioning), 
you could use any external library for graph processing within a single 
node in a Flink ProcessFunction.

Flink is a general-purpose data processor that allows arbitrary logic 
wherever user code runs.
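
For example, something along these lines (just an untested sketch): JGraphT 
is only one possible library, and the GraphMessage/NodeResult record types 
are assumptions about your data model. The important part is that the whole 
graph of one record is built and analyzed inside a single parallel instance:

import java.util.List;

import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.jgrapht.Graph;
import org.jgrapht.graph.DefaultDirectedGraph;
import org.jgrapht.graph.DefaultEdge;

/** Assumed input record: one complete graph per message (hypothetical type). */
class GraphMessage {
    public String graphId;
    public List<String> nodes;      // node ids
    public List<String[]> edges;    // each edge as [fromNodeId, toNodeId]
}

/** Assumed output record: one result per node (hypothetical type). */
class NodeResult {
    public String graphId;
    public String nodeId;
    public int outDegree;
    NodeResult(String graphId, String nodeId, int outDegree) {
        this.graphId = graphId;
        this.nodeId = nodeId;
        this.outDegree = outDegree;
    }
}

public class PerGraphAnalysis extends ProcessFunction<GraphMessage, NodeResult> {

    @Override
    public void processElement(GraphMessage msg, Context ctx, Collector<NodeResult> out) {
        // Build the whole graph of this one record in memory.
        Graph<String, DefaultEdge> g = new DefaultDirectedGraph<>(DefaultEdge.class);
        msg.nodes.forEach(g::addVertex);
        msg.edges.forEach(e -> g.addEdge(e[0], e[1]));

        // Run whatever library algorithm is needed (traversal, connectivity,
        // scoring, ...); out-degree is just a stand-in. One result per node is
        // emitted so the existing downstream substreams/rules can consume them.
        for (String node : g.vertexSet()) {
            out.collect(new NodeResult(msg.graphId, node, g.outDegreeOf(node)));
        }
    }
}

You would apply it as graphStream.process(new PerGraphAnalysis()) and let the 
per-node output flow into the substreams you already have.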

Regards,
Timo



Re: Usecase for flink

Posted by Dipanjan Mazumder <ja...@yahoo.com>.
Good point. What is the better option for graph processing with Flink? Any suggestions?

Re: Usecase for flink

Posted by Martijn Visser <ma...@ververica.com>.
Hi,

Please keep in mind that Gelly is approaching end-of-life [1]

Regards,

Martijn

[1] https://flink.apache.org/roadmap.html


Re: Usecase for flink

Posted by Dipanjan Mazumder <ja...@yahoo.com>.
Hi Jing,
    Thanks for the input. Another question I had: can Gelly be used to process the graph that Flink receives through Kafka, i.e. using Gelly I would decompose the graph into its nodes and edges, process them individually through substreams, and then write the final output of processing the graph somewhere?
I saw that Gelly is for batch processing, but if it supports the above, it will solve my entire use case.
Regards,
Dipanjan

Re: Usecase for flink

Posted by JING ZHANG <be...@gmail.com>.
Hi Dipanjan,
Based on your description, I think Flink can handle this use case.
Don't worry about the data scale; Flink is a distributed engine. As long
as data skew is carefully avoided, the input throughput can be handled
with appropriate resources.
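
For example, keying the rule results by graph id both spreads the graphs
across parallel subtasks and gives you a natural place to collate the
per-graph output. A rough sketch only (the RuleResult/GraphReport types and
the 30-second flush timer are assumptions, not part of any existing API):

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

/** Assumed output of the rule substreams: one record per evaluated node/edge. */
class RuleResult {
    public String graphId;
    public String elementId;
    public String verdict;
}

/** Assumed collated output: everything the rules said about one graph. */
class GraphReport {
    public String graphId;
    public List<RuleResult> results;
    GraphReport(String graphId, List<RuleResult> results) {
        this.graphId = graphId;
        this.results = results;
    }
}

public class GraphResultCollator
        extends KeyedProcessFunction<String, RuleResult, GraphReport> {

    private transient ListState<RuleResult> buffered;

    @Override
    public void open(Configuration parameters) {
        buffered = getRuntimeContext().getListState(
                new ListStateDescriptor<>("buffered-results", RuleResult.class));
    }

    @Override
    public void processElement(RuleResult r, Context ctx, Collector<GraphReport> out)
            throws Exception {
        buffered.add(r);
        // Flush after a fixed delay; a real job might instead track the expected
        // number of nodes/edges per graph and emit as soon as all results arrived.
        ctx.timerService().registerProcessingTimeTimer(
                ctx.timerService().currentProcessingTime() + 30_000);
    }

    @Override
    public void onTimer(long timestamp, OnTimerContext ctx, Collector<GraphReport> out)
            throws Exception {
        List<RuleResult> results = new ArrayList<>();
        Iterable<RuleResult> it = buffered.get();
        if (it != null) {
            it.forEach(results::add);
        }
        if (!results.isEmpty()) {
            out.collect(new GraphReport(ctx.getCurrentKey(), results));
            buffered.clear();
        }
    }
}

It would be applied as ruleResults.keyBy(r -> r.graphId).process(new GraphResultCollator()).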

Best,
JING ZHANG
