Posted to commits@beam.apache.org by "Tobias Feldhaus (JIRA)" <ji...@apache.org> on 2017/04/18 14:54:41 UTC

[jira] [Created] (BEAM-1997) Scaling Problem of Beam (size of the serialized JSON representation of the pipeline exceeds the allowable limit)

Tobias Feldhaus created BEAM-1997:
-------------------------------------

             Summary: Scaling Problem of Beam (size of the serialized JSON representation of the pipeline exceeds the allowable limit)
                 Key: BEAM-1997
                 URL: https://issues.apache.org/jira/browse/BEAM-1997
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
    Affects Versions: 0.6.0
            Reporter: Tobias Feldhaus
            Assignee: Daniel Halperin


After switching from the Dataflow SDK 1.9 to the Apache Beam SDK 0.6, my pipeline can no longer be run with 180 output days (BigQuery partitions as sinks), but only with 60 output days. When using the larger number with Beam, the response from the Cloud Dataflow service reads as follows:

<code>
Failed to create a workflow job: The size of the serialized JSON representation of the pipeline exceeds the allowable limit. For more information, please check the FAQ link below:
</code>

This is the pipeline in Dataflow: https://gist.github.com/james-woods/f84b6784ee6d1b87b617f80f8c7dd59f
The resulting graph in Dataflow looks like this:
https://puu.sh/vhWAW/a12f3246a1.png

This is the same pipeline in Beam: https://gist.github.com/james-woods/c4565db769bffff0494e0bef5e9c334c
The constructed graph looks somewhat different:
https://puu.sh/vhWvm/78a40d422d.png

The methods used are taken from this example: https://gist.github.com/dhalperi/4bbd13021dd5f9998250cff99b155db6
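
The per-day partition pattern in the referenced example attaches a separate set of transforms (e.g. a filter plus a BigQuery write) for each output day, so the serialized JSON representation of the pipeline grows roughly linearly with the number of days. A minimal plain-Java sketch of that arithmetic, with entirely hypothetical per-day and limit figures (the actual sizes depend on the pipeline and on the Dataflow service's limit at the time), illustrates why 60 days could fit while 180 does not:

```java
// Hypothetical back-of-envelope estimate: if each output day adds a
// fixed amount of JSON to the serialized pipeline graph, total size
// scales linearly with the day count. All numbers are illustrative.
public class GraphSizeEstimate {
    // Assumed JSON bytes contributed per output day (filter + write).
    static final long BYTES_PER_DAY = 120_000L;
    // Assumed service-side limit on the serialized pipeline.
    static final long LIMIT_BYTES = 10_000_000L;

    static long estimatedSize(int days) {
        // Linear growth: one transform group per day.
        return days * BYTES_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println("60 days:  " + estimatedSize(60) + " bytes, over limit: "
                + (estimatedSize(60) > LIMIT_BYTES));
        System.out.println("180 days: " + estimatedSize(180) + " bytes, over limit: "
                + (estimatedSize(180) > LIMIT_BYTES));
    }
}
```

Under these assumed figures, 60 days stays below the limit while 180 days exceeds it, which matches the observed behavior; the deeper question in this report is why the Beam 0.6 graph serializes so much larger than the equivalent Dataflow SDK 1.9 graph.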



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)