You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Robert Burke (JIRA)" <ji...@apache.org> on 2019/01/07 17:18:00 UTC

[jira] [Assigned] (BEAM-6372) Direct Runner should marshal data in a similar way to Dataflow runner

     [ https://issues.apache.org/jira/browse/BEAM-6372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Burke reassigned BEAM-6372:
----------------------------------

    Assignee:     (was: Robert Burke)

> Direct Runner should marshal data in a similar way to Dataflow runner
> ---------------------------------------------------------------------
>
>                 Key: BEAM-6372
>                 URL: https://issues.apache.org/jira/browse/BEAM-6372
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-direct, sdk-go
>            Reporter: Andrew Brampton
>            Priority: Major
>
> I would test my pipeline using the direct runner, and it would happily run on a sample. When I ran it on the Dataflow runner, it'll run for a hour, then get to a stage that would crash like so:
>  
> {quote}java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -224: execute failed: panic: reflect: Call using main.HistogramResult as type struct \{ Key string "json:\"key\""; Files []string "json:\"files\""; Histogram palette.ColorHistogram "json:\"histogram,omitempty\""; Palette []struct { R uint8; G uint8; B uint8; A uint8 } "json:\"palette\"" } goroutine 70 [running]:{quote}
> This was because I forgot to register my HistogramResult type.
> It would be useful if the direct runner tried to marshal and unmarshal all types, to help expose issues like this earlier.
> Also, when running on Dataflow, the value of flags, and captured variables, would be the empty/default value. It would be good if direct also caused this behaviour. For example:
> {code}
> prefix := “X”
> s = s.Scope(“Prefix ” + prefix)
> c = beam.ParDo(s, func(value string) string {
> 	return prefix + value
> }, c)
> {code}
> Will work prefix "X" on the Direct runner, but will prefix "" on Dataflow. Subtle behaviour, but I suspect the direct runner could expose this if it marshalled and unmarshalled the func like the dataflow runner.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)