Posted to commits@beam.apache.org by "Robin Trietsch (JIRA)" <ji...@apache.org> on 2018/04/18 14:43:00 UTC

[jira] [Updated] (BEAM-4114) Allow null as leftNullValue/rightNullValue in Join.fullOuterJoin()

     [ https://issues.apache.org/jira/browse/BEAM-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robin Trietsch updated BEAM-4114:
---------------------------------
    Description: 
When using the [Join.fullOuterJoin()|https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/extensions/joinlibrary/Join.html#fullOuterJoin-org.apache.beam.sdk.values.PCollection-org.apache.beam.sdk.values.PCollection-V1-V2-], a checkNotNull() is done for the [leftNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L207] and [rightNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L208].

However, it would make more sense to allow null here: when a key is present on only one side of the join, you may want the missing side's value to show up as null. This should be decided by the developer, not by the join library.

Looking at the source code, this is also supported by [KV.of()|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/KV.java#L42] (it allows null values), which is used in Join.fullOuterJoin().

If required, I can create a pull request on GitHub.
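
To illustrate the requested behavior, here is a minimal plain-Java sketch of a full outer join that accepts null as the placeholder for a missing side. It is not Beam code and not the join library's implementation: the class NullableOuterJoinSketch, its Row type, and the use of in-memory maps with one value per key are assumptions made only to keep the example self-contained. The parameter names leftNullValue and rightNullValue mirror the ones in Join.fullOuterJoin().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical plain-Java stand-in for a full outer join; this is not Beam code. */
public class NullableOuterJoinSketch {

  /** One joined row: a key plus the left and right values (either may be a placeholder). */
  public static final class Row<K, L, R> {
    public final K key;
    public final L left;
    public final R right;

    Row(K key, L left, R right) {
      this.key = key;
      this.left = left;
      this.right = right;
    }
  }

  /**
   * Joins two maps on their keys. A key missing on the left gets leftNullValue,
   * a key missing on the right gets rightNullValue.
   */
  public static <K extends Comparable<K>, L, R> List<Row<K, L, R>> fullOuterJoin(
      Map<K, L> left, Map<K, R> right, L leftNullValue, R rightNullValue) {
    // Deliberately no checkNotNull() on the placeholders: the caller decides
    // whether null is an acceptable marker for "no match".
    TreeSet<K> keys = new TreeSet<>(left.keySet());
    keys.addAll(right.keySet());

    List<Row<K, L, R>> rows = new ArrayList<>();
    for (K key : keys) {
      L l = left.containsKey(key) ? left.get(key) : leftNullValue;
      R r = right.containsKey(key) ? right.get(key) : rightNullValue;
      rows.add(new Row<>(key, l, r));
    }
    return rows;
  }

  public static void main(String[] args) {
    Map<String, Integer> counts = new TreeMap<>();
    counts.put("a", 1);
    counts.put("b", 2);

    Map<String, String> names = new TreeMap<>();
    names.put("b", "Bob");
    names.put("c", "Carol");

    // Pass null for both placeholders, which is what this issue asks to allow.
    for (Row<String, Integer, String> row : fullOuterJoin(counts, names, null, null)) {
      System.out.println(row.key + " -> (" + row.left + ", " + row.right + ")");
    }
  }
}
```

In this sketch, keys "a" and "c" each exist on one side only, so their rows carry null for the missing side, while "b" carries both values. In an actual Beam pipeline the values of a PCollection also pass through coders, so whether null round-trips would additionally depend on the coder in use; that aspect is outside this sketch.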

  was:
When using the [Join.fullOuterJoin()|https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/extensions/joinlibrary/Join.html#fullOuterJoin-org.apache.beam.sdk.values.PCollection-org.apache.beam.sdk.values.PCollection-V1-V2-], a checkNotNull() is done for the [leftNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L207] and [rightNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L208].

However, it would make more sense to allow null here: when a key is present on only one side of the join, you may want the missing side's value to show up as null. This should be decided by the developer, not by the join library.

Looking at the source code, this is also supported by [KV.of()|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/KV.java#L42] (it allows null values), which is used in Join.fullOuterJoin().


> Allow null as leftNullValue/rightNullValue in Join.fullOuterJoin()
> ------------------------------------------------------------------
>
>                 Key: BEAM-4114
>                 URL: https://issues.apache.org/jira/browse/BEAM-4114
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>    Affects Versions: 2.4.0
>            Reporter: Robin Trietsch
>            Assignee: Kenneth Knowles
>            Priority: Major
>
> When using the [Join.fullOuterJoin()|https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/extensions/joinlibrary/Join.html#fullOuterJoin-org.apache.beam.sdk.values.PCollection-org.apache.beam.sdk.values.PCollection-V1-V2-], a checkNotNull() is done for the [leftNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L207] and [rightNullValue|https://github.com/apache/beam/blob/master/sdks/java/extensions/join-library/src/main/java/org/apache/beam/sdk/extensions/joinlibrary/Join.java#L208].
> However, it would make more sense to allow null here: when a key is present on only one side of the join, you may want the missing side's value to show up as null. This should be decided by the developer, not by the join library.
> Looking at the source code, this is also supported by [KV.of()|https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/values/KV.java#L42] (it allows null values), which is used in Join.fullOuterJoin().
> If required, I can create a pull request on GitHub.


