
Posted to commits@beam.apache.org by "Luke Cwik (JIRA)" <ji...@apache.org> on 2017/11/20 17:06:00 UTC

[jira] [Comment Edited] (BEAM-3227) Consider sharing Udf/SdkFunctionSpec records via pointer

    [ https://issues.apache.org/jira/browse/BEAM-3227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16259488#comment-16259488 ] 

Luke Cwik edited comment on BEAM-3227 at 11/20/17 5:05 PM:
-----------------------------------------------------------

That makes a lot of sense. Is that going to be typical of all composite PTransforms with UDFs?

I could see composite PTransforms defining multiple UDFs with multiple environments.


was (Author: lcwik):
That makes a lot of sense.

> Consider sharing Udf/SdkFunctionSpec records via pointer
> --------------------------------------------------------
>
>                 Key: BEAM-3227
>                 URL: https://issues.apache.org/jira/browse/BEAM-3227
>             Project: Beam
>          Issue Type: Sub-task
>          Components: beam-model
>            Reporter: Kenneth Knowles
>
> Coders are stored by pointer, because they are often repeated and a common source of huge pipeline descriptions.
> We considered doing the same for all UDFs but decided not to, reasoning that UDFs are less often identical and rarely implement the equals() needed to actually share encoded versions.
> However, in the presence of generated code, it is very likely that DoFns and CombineFns are repeated, and also much more likely that they have meaningful equals(), so there could be size savings.
> None of this matters much for storage or transmission; the real concern is the small, arbitrary size limits imposed by some API frameworks and database column types.
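A minimal sketch of what "stored by pointer" buys, assuming a simplified model rather than Beam's actual protos: each distinct spec is stored once in a pipeline-level table and transforms refer to it by ID. The names FunctionSpec, Components, intern, inline_size, shared_size, the URN "example:dofn:v1" and the ID scheme "fn_N" are all hypothetical. A frozen dataclass supplies value-based equality and hashing, standing in for the meaningful equals() the issue says deduplication depends on.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class FunctionSpec:
    # Hypothetical stand-in for a UDF spec: a URN, an opaque serialized
    # payload, and the ID of the environment that can execute it.
    # frozen=True generates value-based __eq__ and __hash__, playing the
    # role of a meaningful equals().
    urn: str
    payload: bytes
    environment_id: str


@dataclass
class Components:
    # Hypothetical pipeline-level table of shared records keyed by ID,
    # analogous to how coders are stored by pointer.
    specs: Dict[str, FunctionSpec] = field(default_factory=dict)
    _id_for: Dict[FunctionSpec, str] = field(default_factory=dict)

    def intern(self, spec: FunctionSpec) -> str:
        """Return the ID for spec, reusing it if an equal spec is already stored."""
        if spec in self._id_for:
            return self._id_for[spec]
        new_id = f"fn_{len(self.specs)}"
        self.specs[new_id] = spec
        self._id_for[spec] = new_id
        return new_id


def inline_size(specs: List[FunctionSpec]) -> int:
    # Payload bytes if every transform embeds its own copy of the spec.
    return sum(len(s.payload) for s in specs)


def shared_size(components: Components) -> int:
    # Payload bytes if each distinct spec is stored once and referenced by ID.
    return sum(len(s.payload) for s in components.specs.values())


if __name__ == "__main__":
    # Simulate generated code emitting the same DoFn 100 times.
    generated = [FunctionSpec("example:dofn:v1", b"x" * 1000, "env_0") for _ in range(100)]
    components = Components()
    ids = [components.intern(s) for s in generated]
    print(len(set(ids)))             # 1
    print(inline_size(generated))    # 100000
    print(shared_size(components))   # 1000
```

The savings depend entirely on equality: if the spec type compared by identity instead of by value, every copy would get its own ID and nothing would be shared, which is the objection the issue raises for hand-written UDFs.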



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)