Posted to commits@beam.apache.org by "Amit Sela (JIRA)" <ji...@apache.org> on 2016/09/20 19:06:21 UTC
[jira] [Created] (BEAM-649) Pipeline "actions" should use foreachRDD via ParDo.
Amit Sela created BEAM-649:
------------------------------
Summary: Pipeline "actions" should use foreachRDD via ParDo.
Key: BEAM-649
URL: https://issues.apache.org/jira/browse/BEAM-649
Project: Beam
Issue Type: Bug
Components: runner-spark
Reporter: Amit Sela
Assignee: Amit Sela
Spark will execute a pipeline only if it is triggered by an action (batch) or an output operation (streaming), see http://spark.apache.org/docs/1.6.2/streaming-programming-guide.html#output-operations-on-dstreams
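
To make the laziness concrete, here is a minimal sketch. It does not use Spark: java.util.stream is swapped in as an analogy, since its intermediate operations (like Spark transformations) run only once a terminal operation (like a Spark action) is invoked. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Analogy only: java.util.stream stands in for Spark's lazy transformations.
class LazyEvaluationSketch {

    // Builds a mapped stream but never invokes a terminal operation.
    // Returns the elements the map function actually processed.
    static List<Integer> mapWithoutTerminal() {
        List<Integer> seen = new ArrayList<>();
        Stream<Integer> mapped = Stream.of(1, 2, 3).map(x -> {
            seen.add(x); // would record each element, if the map ever ran
            return x * 2;
        });
        // No terminal operation on 'mapped', so the map function is never called.
        return seen;
    }

    // Same mapping, but followed by a terminal operation that triggers it.
    static List<Integer> mapWithTerminal() {
        List<Integer> seen = new ArrayList<>();
        Stream.of(1, 2, 3)
              .map(x -> {
                  seen.add(x);
                  return x * 2;
              })
              .forEach(x -> { }); // terminal operation: the map now runs per element
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("without terminal op: " + mapWithoutTerminal());
        System.out.println("with terminal op:    " + mapWithTerminal());
    }
}
```

The first method is the situation a ParDo translated as a plain map would be in if nothing downstream triggered it.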
Currently, such actions in Beam are mostly implemented via ParDo, which the runner translates as a Map transformation (via mapPartitions).
Since a transformation on its own does not trigger execution, the runner works around this by "forcing" actions on untranslated leaves.
While this is OK, it would be better in some cases, e.g., Sinks, to apply the same ParDo translation but with foreach/foreachRDD instead of foreachPartition/mapPartitions.
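
The difference for a Sink can be sketched roughly as follows. Again this is an analogy using java.util.stream rather than the Spark or Beam APIs, and all names are hypothetical: the first method hides the write inside a lazy map and needs a no-op terminal operation to force it, while the second expresses the write directly as the terminal operation.

```java
import java.util.ArrayList;
import java.util.List;

// Analogy only: an in-memory list plays the role of the external sink.
class SinkSketch {

    // Current shape: the write is a side effect inside a lazy map,
    // and a separate no-op terminal operation is needed to "force" it.
    static List<String> writeViaMapThenForce(List<String> records) {
        List<String> sink = new ArrayList<>();
        records.stream()
               .map(r -> {
                   sink.add(r); // side effect hidden in a transformation
                   return r;
               })
               .forEach(r -> { }); // forced action that does nothing itself
        return sink;
    }

    // Proposed shape: the write itself is the terminal operation.
    static List<String> writeViaForEach(List<String> records) {
        List<String> sink = new ArrayList<>();
        records.stream().forEach(sink::add);
        return sink;
    }

    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "c");
        System.out.println(writeViaMapThenForce(records));
        System.out.println(writeViaForEach(records));
    }
}
```

Both methods write the same records; the second simply avoids producing a mapped result whose only purpose is to be forced.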
--
This message was sent by Atlassian JIRA (v6.3.4#6332)