Posted to commits@beam.apache.org by "Mitar (JIRA)" <ji...@apache.org> on 2017/04/20 18:20:04 UTC

[jira] [Comment Edited] (BEAM-2026) High performance direct runner

    [ https://issues.apache.org/jira/browse/BEAM-2026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15977150#comment-15977150 ] 

Mitar edited comment on BEAM-2026 at 4/20/17 6:19 PM:
------------------------------------------------------

I have not yet done any benchmarks, but I would suspect that having any extra layer in between would make it slower, no?

To me, one issue is that Spark adds the whole JVM into the mix. But I see that the current implementation of the Beam direct runner is also based on the JVM.

For me personally, it is more about how hard it is to start using any of these distributed technologies. The appeal of Beam to me is that I can for now learn the programming model and start developing in it, and then later on, if needed, I can scale it by changing the execution runner, and at that point also learn all the details of how to deploy Spark or Flink and so on. For somebody who already knows how to run and use Spark or Flink, it probably does not matter. But not everyone does.

In some way I would just prefer to start by programming in Python, but in the Beam programming model, using a Python runner, and then scale it if needed.


> High performance direct runner
> ------------------------------
>
>                 Key: BEAM-2026
>                 URL: https://issues.apache.org/jira/browse/BEAM-2026
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-direct
>            Reporter: Mitar
>            Assignee: Mitar
>
> The documentation (https://beam.apache.org/documentation/runners/direct/) says that the direct runner does not try to run efficiently and serves mostly for development and debugging.
> I would suggest that there should also be an efficient direct runner. If Beam aims to be a unified programming model, I would love to implement some smaller tasks in Beam too, just to keep the code in the same model, and it would be OK to run them as a normal small program (maybe inside one Docker container), without any distribution across multiple machines. In the future, if usage grows, I could then replace the underlying runner with something distributed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)