You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 23:43:55 UTC

[GitHub] [beam] kennknowles opened a new issue, #19318: Direct Runner should marshal data in a similar way to Dataflow runner

kennknowles opened a new issue, #19318:
URL: https://github.com/apache/beam/issues/19318

   I would test my pipeline using the direct runner, and it would happily run on a sample. When I ran it on the Dataflow runner, it'll run for a hour, then get to a stage that would crash like so:
    
   > java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness for instruction -224: execute failed: panic: reflect: Call using main.HistogramResult as type struct \{ Key string "json:\"key\""; Files []string "json:\"files\""; Histogram palette.ColorHistogram "json:\"histogram,omitempty\""; Palette []struct { R uint8; G uint8; B uint8; A uint8 } "json:\"palette\"" } goroutine 70 [running]:> 
   > This was because I forgot to register my HistogramResult type.
   > It would be useful if the direct runner tried to marshal and unmarshal all types, to help expose issues like this earlier.
   > Also, when running on Dataflow, the value of flags, and captured variables, would be the empty/default value. It would be good if direct also caused this behaviour. For example:
   > ```
   
   prefix := “X”
   s = s.Scope(“Prefix ” + prefix)
   c = beam.ParDo(s, func(value string) string {
   	return
   prefix + value
   }, c)
   
   ```
   
   
   Will work prefix "X" on the Direct runner, but will prefix "" on Dataflow. Subtle behaviour, but I suspect the direct runner could expose this if it marshalled and unmarshalled the func like the dataflow runner.
   
   Imported from Jira [BEAM-6372](https://issues.apache.org/jira/browse/BEAM-6372). Original Jira may contain additional context.
   Reported by: bramp.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org