You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Aaron Dixon (Jira)" <ji...@apache.org> on 2020/01/20 16:35:00 UTC

[jira] [Comment Edited] (BEAM-9144) Beam's own Avro TimeConversion class in beam-sdk-java-core

    [ https://issues.apache.org/jira/browse/BEAM-9144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17019629#comment-17019629 ] 

Aaron Dixon edited comment on BEAM-9144 at 1/20/20 4:34 PM:
------------------------------------------------------------

Attempting to validate with 2.20.0-SNAPSHOT [1]

My pipeline however is unable to start in Dataflow, I get:

{noformat}
java.lang.ClassNotFoundException: org.apache.beam.vendor.grpc.v1p21p0.com.google.protobuf.Message$Builder
{noformat}

When I inspect my build artifacts I find that when I depend on Beam 2.16.0 (last Beam build that works for me), I find this class present (v1p21p0). When I build against Beam 2.20.0-SNAPSHOT, only v1p26p0 is resolved and packaged.

* I assume that the Dataflow runner needs to somehow know to look for v1p26p0 somehow?  I.e., do I need to send it a flag that I'm suing 2.20.0-SNAPSHOT so that it can know to try to resolve the later versions of the rpc message (?) classes?
* --or-- Is this a bug in 2.20.0-SNAPSHOT?


===
[1] Complete list of 2.20.0-SNAPSHOT dependencies resolved by my build:


{noformat}
.../beam-sdks-java-io-kafka/2.20.0-SNAPSHOT/beam-sdks-java-io-kafka-2.20.0-20200120.072626-5.pom
.../beam-sdks-java-core/2.20.0-SNAPSHOT/beam-sdks-java-core-2.20.0-20200120.071349-5.pom
.../beam-model-pipeline/2.20.0-SNAPSHOT/beam-model-pipeline-2.20.0-20200120.070348-5.pom
.../beam-model-job-management/2.20.0-SNAPSHOT/beam-model-job-management-2.20.0-20200120.070335-5.pom
.../beam-runners-google-cloud-dataflow-java/2.20.0-SNAPSHOT/beam-runners-google-cloud-dataflow-java-2.20.0-20200120.070606-5.pom
.../beam-sdks-java-extensions-google-cloud-platform-core/2.20.0-SNAPSHOT/beam-sdks-java-extensions-google-cloud-platform-core-2.20.0-20200120.072059-5.pom
.../beam-sdks-java-io-google-cloud-platform/2.20.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.20.0-20200120.072441-5.pom
.../beam-sdks-java-extensions-protobuf/2.20.0-SNAPSHOT/beam-sdks-java-extensions-protobuf-2.20.0-20200120.072149-5.pom
.../beam-runners-core-construction-java/2.20.0-SNAPSHOT/beam-runners-core-construction-java-2.20.0-20200120.070440-5.pom
.../beam-runners-direct-java/2.20.0-SNAPSHOT/beam-runners-direct-java-2.20.0-20200120.070532-5.pom
.../beam-sdks-java-core/2.20.0-SNAPSHOT/beam-sdks-java-core-2.20.0-20200120.071349-5.jar
.../beam-model-job-management/2.20.0-SNAPSHOT/beam-model-job-management-2.20.0-20200120.070335-5.jar
.../beam-model-pipeline/2.20.0-SNAPSHOT/beam-model-pipeline-2.20.0-20200120.070348-5.jar
.../beam-sdks-java-io-kafka/2.20.0-SNAPSHOT/beam-sdks-java-io-kafka-2.20.0-20200120.072626-5.jar
.../beam-runners-google-cloud-dataflow-java/2.20.0-SNAPSHOT/beam-runners-google-cloud-dataflow-java-2.20.0-20200120.070606-5.jar
.../beam-sdks-java-extensions-google-cloud-platform-core/2.20.0-SNAPSHOT/beam-sdks-java-extensions-google-cloud-platform-core-2.20.0-20200120.072059-5.jar
.../beam-sdks-java-io-google-cloud-platform/2.20.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.20.0-20200120.072441-5.jar
.../beam-sdks-java-extensions-protobuf/2.20.0-SNAPSHOT/beam-sdks-java-extensions-protobuf-2.20.0-20200120.072149-5.jar
.../beam-runners-core-construction-java/2.20.0-SNAPSHOT/beam-runners-core-construction-java-2.20.0-20200120.070440-5.jar
.../beam-runners-direct-java/2.20.0-SNAPSHOT/beam-runners-direct-java-2.20.0-20200120.070532-5.jar
{noformat}


was (Author: atdixon):
Attempting to validate with 2.20.0-SNAPSHOT [1]

My pipeline however is unable to start in Dataflow, I get:
`java.lang.ClassNotFoundException: org.apache.beam.vendor.grpc.v1p21p0.com.google.protobuf.Message$Builder`

When I inspect my build artifacts I find that when I depend on Beam 2.16.0 (last Beam build that works for me), I find this class present (v1p21p0). When I build against Beam 2.20.0-SNAPSHOT, only v1p26p0 is resolved and packaged.

* I assume that the Dataflow runner needs to somehow know to look for v1p26p0 somehow?  I.e., do I need to send it a flag that I'm suing 2.20.0-SNAPSHOT so that it can know to try to resolve the later versions of the rpc message (?) classes?
* --or-- Is this a bug in 2.20.0-SNAPSHOT?


===
[1] Complete list of 2.20.0-SNAPSHOT dependencies resolved by my build:
```
.../beam-sdks-java-io-kafka/2.20.0-SNAPSHOT/beam-sdks-java-io-kafka-2.20.0-20200120.072626-5.pom
.../beam-sdks-java-core/2.20.0-SNAPSHOT/beam-sdks-java-core-2.20.0-20200120.071349-5.pom
.../beam-model-pipeline/2.20.0-SNAPSHOT/beam-model-pipeline-2.20.0-20200120.070348-5.pom
.../beam-model-job-management/2.20.0-SNAPSHOT/beam-model-job-management-2.20.0-20200120.070335-5.pom
.../beam-runners-google-cloud-dataflow-java/2.20.0-SNAPSHOT/beam-runners-google-cloud-dataflow-java-2.20.0-20200120.070606-5.pom
.../beam-sdks-java-extensions-google-cloud-platform-core/2.20.0-SNAPSHOT/beam-sdks-java-extensions-google-cloud-platform-core-2.20.0-20200120.072059-5.pom
.../beam-sdks-java-io-google-cloud-platform/2.20.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.20.0-20200120.072441-5.pom
.../beam-sdks-java-extensions-protobuf/2.20.0-SNAPSHOT/beam-sdks-java-extensions-protobuf-2.20.0-20200120.072149-5.pom
.../beam-runners-core-construction-java/2.20.0-SNAPSHOT/beam-runners-core-construction-java-2.20.0-20200120.070440-5.pom
.../beam-runners-direct-java/2.20.0-SNAPSHOT/beam-runners-direct-java-2.20.0-20200120.070532-5.pom
.../beam-sdks-java-core/2.20.0-SNAPSHOT/beam-sdks-java-core-2.20.0-20200120.071349-5.jar
.../beam-model-job-management/2.20.0-SNAPSHOT/beam-model-job-management-2.20.0-20200120.070335-5.jar
.../beam-model-pipeline/2.20.0-SNAPSHOT/beam-model-pipeline-2.20.0-20200120.070348-5.jar
.../beam-sdks-java-io-kafka/2.20.0-SNAPSHOT/beam-sdks-java-io-kafka-2.20.0-20200120.072626-5.jar
.../beam-runners-google-cloud-dataflow-java/2.20.0-SNAPSHOT/beam-runners-google-cloud-dataflow-java-2.20.0-20200120.070606-5.jar
.../beam-sdks-java-extensions-google-cloud-platform-core/2.20.0-SNAPSHOT/beam-sdks-java-extensions-google-cloud-platform-core-2.20.0-20200120.072059-5.jar
.../beam-sdks-java-io-google-cloud-platform/2.20.0-SNAPSHOT/beam-sdks-java-io-google-cloud-platform-2.20.0-20200120.072441-5.jar
.../beam-sdks-java-extensions-protobuf/2.20.0-SNAPSHOT/beam-sdks-java-extensions-protobuf-2.20.0-20200120.072149-5.jar
.../beam-runners-core-construction-java/2.20.0-SNAPSHOT/beam-runners-core-construction-java-2.20.0-20200120.070440-5.jar
.../beam-runners-direct-java/2.20.0-SNAPSHOT/beam-runners-direct-java-2.20.0-20200120.070532-5.jar
```

> Beam's own Avro TimeConversion class in beam-sdk-java-core 
> -----------------------------------------------------------
>
>                 Key: BEAM-9144
>                 URL: https://issues.apache.org/jira/browse/BEAM-9144
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Tomo Suzuki
>            Assignee: Tomo Suzuki
>            Priority: Major
>             Fix For: 2.19.0
>
>         Attachments: avro-beam-dependency-graph.png
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> From Aaron's comment in https://issues.apache.org/jira/browse/BEAM-8388?focusedCommentId=17016476&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17016476 .
> {quote}My org must use Avro 1.9.x (due to some Avro schema resolution issues resolved in 1.9.x) so downgrading Avro is not possible for us.
>  Beam 2.16.0 is compatible with our usage of Avro 1.9.x – but upgrading to 2.17.0 we are broken as 2.17.0 links to Java classes in Avro 1.8.x that are not available in 1.9.x.
> {quote}
> The Java class is {{org.apache.avro.data.TimeConversions.TimestampConversion}} in Avro 1.8.
>  It's renamed to {{org.apache.avro.data.JodaTimeConversions}} in Avro 1.9.
> h1. Beam Java SDK cannot upgrade Avro to 1.9
> Beam has Spark runners and Spark has not yet upgraded to Avro 1.9.
> Illustration of the dependency
> !avro-beam-dependency-graph.png|width=799,height=385!
> h1. Short-term Solution
> As illustrated above, as long as Beam Java SDK uses only the intersection of Avro classes, method, and fields between Avro 1.8 and 1.9, it will provide flexibility in runtime Avro versions (as it did until Beam 2.16).
> h2. Difference of the TimeConversion Classes
> Avro 1.9's TimestampConversion overrides {{getRecommendedSchema}} method. Details below:
> Avro 1.8's TimeConversions.TimestampConversion:
> {code:java}
>   public static class TimestampConversion extends Conversion<DateTime> {
>     @Override
>     public Class<DateTime> getConvertedType() {
>       return DateTime.class;
>     }
>     @Override
>     public String getLogicalTypeName() {
>       return "timestamp-millis";
>     }
>     @Override
>     public DateTime fromLong(Long millisFromEpoch, Schema schema, LogicalType type) {
>       return new DateTime(millisFromEpoch, DateTimeZone.UTC);
>     }
>     @Override
>     public Long toLong(DateTime timestamp, Schema schema, LogicalType type) {
>       return timestamp.getMillis();
>     }
>   }
> {code}
> Avro 1.9's JodaTimeConversions.TimestampConversion:
> {code:java}
>   public static class TimestampConversion extends Conversion<DateTime> {
>     @Override
>     public Class<DateTime> getConvertedType() {
>       return DateTime.class;
>     }
>     @Override
>     public String getLogicalTypeName() {
>       return "timestamp-millis";
>     }
>     @Override
>     public DateTime fromLong(Long millisFromEpoch, Schema schema, LogicalType type) {
>       return new DateTime(millisFromEpoch, DateTimeZone.UTC);
>     }
>     @Override
>     public Long toLong(DateTime timestamp, Schema schema, LogicalType type) {
>       return timestamp.getMillis();
>     }
>     @Override
>     public Schema getRecommendedSchema() {
>       return LogicalTypes.timestampMillis().addToSchema(Schema.create(Schema.Type.LONG));
>     }
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)