Posted to dev@flink.apache.org by Edward Lee <ed...@gmail.com> on 2015/12/27 08:51:45 UTC

Why does Flink copy code from Spark?

Lately I have been studying the source code to understand the internals.
One thing that really surprised me was that a lot of code throughout Flink
was very similar to Spark.

Open source projects learn from each other and apply similar ideas.
However, I am not talking about applying similar ideas. I am talking about
literal copying of code. Many files look as if they were created by
copy-pasting code directly from Spark and then renaming variables to avoid
looking identical.

As I study more, I find "copy-pasted" code throughout Flink, from the actors
to machine learning to the analyzer to code generation. A few files have
attribution, but most of them do not.

I thought Flink was more advanced. Why?

Re: Why does Flink copy code from Spark?

Posted by Maximilian Michels <mx...@apache.org>.
Flink and Spark are open source projects with similar problem domains.
In some parts, their methodologies are similar, e.g. because they build
on Hadoop, use the Akka library, or implement machine learning
algorithms. In other parts, they are very different, e.g. pipelined
(Flink) vs batch (Spark) data transfer, real-time (Flink) vs
mini-batched (Spark) streaming, RDD-based in-memory execution (Spark)
vs out-of-core algorithms and graceful out-of-memory handling (Flink).
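
To give one concrete illustration of the streaming difference, here is a
rough sketch (not taken from either code base) of a Flink DataStream job
that keeps a running word count; the socket host, port, and class name are
just placeholders. Each incoming record updates the result immediately
instead of being collected into a mini-batch first:

  import org.apache.flink.api.common.functions.FlatMapFunction;
  import org.apache.flink.api.java.tuple.Tuple2;
  import org.apache.flink.streaming.api.datastream.DataStream;
  import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
  import org.apache.flink.util.Collector;

  public class StreamingWordCount {
    public static void main(String[] args) throws Exception {
      StreamExecutionEnvironment env =
          StreamExecutionEnvironment.getExecutionEnvironment();

      // Placeholder source: read text lines from a local socket.
      DataStream<String> lines = env.socketTextStream("localhost", 9999);

      lines
          // Split each line into (word, 1) pairs as soon as it arrives.
          .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
            @Override
            public void flatMap(String line,
                                Collector<Tuple2<String, Integer>> out) {
              for (String word : line.split("\\s+")) {
                out.collect(new Tuple2<>(word, 1));
              }
            }
          })
          // Group by the word field and keep a running, per-record count.
          .keyBy(0)
          .sum(1)
          .print();

      env.execute("Streaming WordCount");
    }
  }

The point is only that the counts are updated record by record; a Spark
Streaming job would buffer the records into small batches before
processing them.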

Some of these differences may seem subtle, but they are backed by
different philosophies and origins. Both Flink and Spark are complex
systems with their own pros and cons. Whether people use Flink or
Spark depends on their use cases.

As a Flink committer, it hurts me to hear such claims. I know how
much dedication and proficiency we have in the Flink community. If we
have included any code in a way that infringes someone's copyright, I
would like to resolve that. However, I'm not aware of any violation.
If you make such strong accusations, please provide proper proof.
Otherwise, your message may be seen as an act of defamation or trolling.

Best regards,
Max

On Sun, Dec 27, 2015 at 8:51 AM, Edward Lee <ed...@gmail.com> wrote:
> Lately I have been studying the source code to understand the internals.
> One thing that really surprised me was that a lot of code throughout Flink
> was very similar to Spark.
>
> Open source projects learn from each other and apply similar ideas.
> However, I am not talking about applying similar ideas. I am talking about
> literal copying of code. Many files look as if they were created by
> copy-pasting code directly from Spark and then renaming variables to avoid
> looking identical.
>
> As I study more, I find "copy-pasted" code throughout Flink, from the actors
> to machine learning to the analyzer to code generation. A few files have
> attribution, but most of them do not.
>
> I thought Flink was more advanced. Why?

Re: Why does Flink copy code from Spark?

Posted by Stefano Baghino <st...@radicalbit.io>.
Hi Edward, I'm pretty new to Flink and I'm interested in looking at that
code. Can you pinpoint some source files so that I can study them?

On Sun, Dec 27, 2015 at 8:51 AM, Edward Lee <ed...@gmail.com> wrote:

> Lately I have been studying the source code to understand the internals.
> One thing that really surprised me was that a lot of code throughout Flink
> was very similar to Spark.
>
> Open source projects learn from each other and apply similar ideas.
> However, I am not talking about applying similar ideas. I am talking about
> literal copying of code. Many files look as if they were created by
> copy-pasting code directly from Spark and then renaming variables to avoid
> looking identical.
>
> As I study more, I find "copy-pasted" code throughout Flink, from the actors
> to machine learning to the analyzer to code generation. A few files have
> attribution, but most of them do not.
>
> I thought Flink was more advanced. Why?
>



-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit