Posted to dev@flink.apache.org by Martijn Visser <ma...@apache.org> on 2022/10/12 20:57:34 UTC

Re: [DISCUSS] Drop Gelly

Hi everyone,

I'm reviving this old discussion thread: I stumbled across Gelly again and realized that the discussion was never concluded.

I'll open up a vote thread for dropping the current DataSet based Gelly library. 

Best regards,

Martijn

On 2022/01/05 03:37:18 Yun Gao wrote:
> Hi,
> 
> Many thanks for initiating the discussion!
> 
> +1 as well for dropping the current DataSet-based Gelly library, so that we can finally drop the
> legacy DataSet API.
> 
> As for whether to keep graph computing capabilities: from my side, graph queries / graph computation,
> and chaining them with the preprocessing pipeline, are a real requirement.
> We already have the basis for a graph computing library on the DataStream API
> with the new iteration library [1], so it is already feasible to build a unified stream / batch
> graph computing library on top of the DataStream API. Such a library would indeed be best suited to
> a separate ecosystem project.
> 
> Best,
> Yun
> 
> [1] https://cwiki.apache.org/confluence/x/hAEBCw
> 
> 
>  ------------------Original Mail ------------------
> Sender:Martijn Visser <ma...@ververica.com>
> Send Date:Wed Jan 5 02:58:53 2022
> Recipients:Zhipeng Zhang <zh...@gmail.com>
> CC:David Anderson <da...@apache.org>, Till Rohrmann <tr...@apache.org>, dev <de...@flink.apache.org>, User <us...@flink.apache.org>
> Subject:Re: [DISCUSS] Drop Gelly
> 
> Hi Zhipeng,
> 
> Since we're seeing more code being externalised, for example the Flink Remote Shuffle service [1] and the ongoing discussion on an external connector repository [2], I think it makes sense to go for your second option. Maybe it fits under Flink Extended [3].
> 
> The main question is who can contribute to and maintain this library. Another (intermediate) solution might be to find someone who can migrate the current Gelly codebase to Flink's DataStream API in batch mode, so that it no longer uses the DataSet API. This recently happened with the State Processor API as well [4].
> 
> Best regards,
> 
> Martijn
> 
> [1] https://github.com/flink-extended/flink-remote-shuffle
> [2] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm
> [3] https://github.com/flink-extended/
> [4] https://issues.apache.org/jira/browse/FLINK-24912
> On Tue, 4 Jan 2022 at 14:01, Zhipeng Zhang <zh...@gmail.com> wrote:
> 
> Hi Martijn,
> 
> Thanks for the feedback. I am not proposing to bundle the new graph library with Alink. I am +1 for dropping the DataSet-based Gelly library, but we probably need a new graph library in Flink to give current users a migration path.
> 
> We haven't decided what to do yet and probably need more discussion. There are some possible solutions:
> 1. We include a new DataStream-based graph library in FlinkML [1], given that graph and machine learning algorithms are often used together [2][3][4]. To achieve this, we could reuse the `AlgoOperator` interface in FlinkML.
> 2. We include a new DataStream-based graph library as a separate module/repo. This is consistent with existing libraries like Spark [5].
> 
> What do you think?
> 
> 
> [1] https://github.com/apache/flink-ml
> [2] https://arxiv.org/abs/1403.6652
> [3] https://arxiv.org/abs/1503.03578
> [4] https://github.com/apache/spark
> 
> Best,
> Zhipeng
> On Tue, 4 Jan 2022 at 15:27, Martijn Visser <ma...@ververica.com> wrote:
> 
> Hi Zhipeng,
> 
> Good that you've reached out, I wasn't aware that Gelly is being used in Alink. Are you proposing to write a new graph library as a successor of Gelly and bundle that with Alink? 
> 
> Best regards,
> 
> Martijn
> On Tue, 4 Jan 2022 at 02:57, Zhipeng Zhang <zh...@gmail.com> wrote:
> 
> Hi everyone,
> 
> Thanks for starting the discussion :)
> 
> We (the Alink team [1]) are using parts of the Gelly library to support graph algorithms (connected components, single-source shortest paths, etc.) for users at Alibaba Inc.
> 
> As the DataSet API is going to be dropped, shall we also provide a new graph library based on the DataStream runtime (similar to what we did for machine learning)?
> 
> [1] https://github.com/Alibaba/alink
> On Tue, 4 Jan 2022 at 00:01, David Anderson <da...@apache.org> wrote:
> 
> I've had only a handful of inquiries about Gelly in recent memory, and most of them came from folks looking for a streaming solution.
> 
> +1 for dropping Gelly
> 
> David
> On Mon, Jan 3, 2022 at 2:41 PM Till Rohrmann <tr...@apache.org> wrote:
> 
> I haven't seen any changes or requests to/for Gelly in ages. Hence, I would assume that it is not really used and can be removed.
> 
> +1 for dropping Gelly.
> 
> Cheers,
> Till
> On Mon, Jan 3, 2022 at 2:20 PM Martijn Visser <ma...@ververica.com> wrote:
> 
> Hi everyone,
> 
> Flink is bundled with Gelly, a Graph API library [1]. This has been marked as approaching end-of-life for quite some time [2].
> 
> Gelly is built on top of Flink's DataSet API, which is deprecated and slowly being phased out [3]. It only works on batch jobs. Based on the activity in the Dev and User mailing lists, I don't see a lot of questions popping up regarding the usage of Gelly. Removing Gelly would reduce CI time and resources because we won't need to run tests for this anymore. 
> 
> I'm cross-posting this to the User mailing list to see if there are any users of Gelly at the moment. 
> 
> Let me know your thoughts.
> 
> Martijn Visser | Product Manager
> martijn@ververica.com
> 
> [1] https://nightlies.apache.org/flink/flink-docs-stable/docs/libs/gelly/overview/
> [2] https://flink.apache.org/roadmap.html
> [3] https://lists.apache.org/thread/b2y3xx3thbcbtzdphoct5wvzwogs9sqz
> 
> 