Posted to dev@beam.apache.org by Balázs Németh <ba...@aliz.ai> on 2022/09/09 17:32:09 UTC

Re: Incomplete Beam Schema -> Avro Schema conversion

Isn't it still better to have an asymmetric conversion that supports more data
types than to leave these unimplemented? This contribution seems simple enough,
but that's definitely not true for the other direction (... and I'm also
biased, I only need Beam->Avro).
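
To make the Beam->Avro direction concrete, here is a minimal sketch of what the mapping could look like. It deliberately avoids the Beam and Avro libraries and only uses the JDK, so it is not the AvroUtils implementation. The class and method names (BeamToAvroSketch, toAvroSchemaJson, toAvroDate, toAvroTimeMicros) are hypothetical, and picking time-micros rather than time-millis for the time type is an assumption. The encodings follow the Avro specification: "date" annotates an int counting days since 1970-01-01, and "time-micros" annotates a long counting microseconds after midnight. The datetime type is left out on purpose.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.util.Map;

public class BeamToAvroSketch {

    // Beam logical type identifiers from this thread, mapped to Avro schema JSON.
    // Assumption: time maps to time-micros; time-millis would be equally valid Avro.
    static final Map<String, String> AVRO_SCHEMA_BY_BEAM_ID = Map.of(
            "beam:logical_type:date:v1", "{\"type\":\"int\",\"logicalType\":\"date\"}",
            "beam:logical_type:time:v1", "{\"type\":\"long\",\"logicalType\":\"time-micros\"}");

    // Looks up the Avro schema JSON for a Beam logical type identifier.
    // Unknown identifiers fail loudly, similar in spirit to the quoted exception.
    static String toAvroSchemaJson(String beamIdentifier) {
        String json = AVRO_SCHEMA_BY_BEAM_ID.get(beamIdentifier);
        if (json == null) {
            throw new IllegalArgumentException("Unhandled logical type " + beamIdentifier);
        }
        return json;
    }

    // Avro "date": number of days since the Unix epoch, stored as an int.
    static int toAvroDate(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    // Avro "time-micros": microseconds after midnight, stored as a long.
    static long toAvroTimeMicros(LocalTime time) {
        return time.toNanoOfDay() / 1_000L;
    }

    public static void main(String[] args) {
        System.out.println(toAvroSchemaJson("beam:logical_type:date:v1"));
        System.out.println(toAvroDate(LocalDate.now()));
        System.out.println(toAvroTimeMicros(LocalTime.now()));
    }
}
```

The reverse direction is where the asymmetry shows up: an Avro reader that sees "date" has to decide which Beam type to hand back, and this sketch says nothing about that.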

On Tue, Aug 23, 2022 at 1:53, Brian Hulette via dev <de...@beam.apache.org>
wrote:

> I don't think there's a reason for this; it's just that these logical
> types were defined after the Avro <-> Beam schema conversion. I think it
> would be worthwhile to add support for them, but we'd also need to look at
> the reverse (Avro to Beam) direction, which would map back to the catch-all
> DATETIME primitive type [1]. Changing that could break backwards
> compatibility.
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L771-L776
>
> On Wed, Aug 17, 2022 at 2:53 PM Balázs Németh <ba...@aliz.ai>
> wrote:
>
>> java.lang.RuntimeException: Unhandled logical type
>> beam:logical_type:date:v1
>>   at
>> org.apache.beam.sdk.schemas.utils.AvroUtils.getFieldSchema(AvroUtils.java:943)
>>   at
>> org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroField(AvroUtils.java:306)
>>   at
>> org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java:341)
>>   at
>> org.apache.beam.sdk.schemas.utils.AvroUtils.toAvroSchema(AvroUtils.java
>>
>> In
>> https://github.com/apache/beam/blob/7bb755906c350d77ba175e1bd990096fbeaf8e44/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java#L902-L944
>> it seems to me there are some missing options.
>>
>> For example
>> - FixedBytes.IDENTIFIER,
>> - EnumerationType.IDENTIFIER,
>> - OneOfType.IDENTIFIER
>> are there, but:
>> - org.apache.beam.sdk.schemas.logicaltypes.Date.IDENTIFIER
>> ("beam:logical_type:date:v1")
>> - org.apache.beam.sdk.schemas.logicaltypes.DateTime.IDENTIFIER
>> ("beam:logical_type:datetime:v1")
>> - org.apache.beam.sdk.schemas.logicaltypes.Time.IDENTIFIER
>> ("beam:logical_type:time:v1")
>> are missing.
>>
>> This is an example that fails:
>>
>> import java.time.LocalDate;
>> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryUtils;
>> import org.apache.beam.sdk.schemas.Schema;
>> import org.apache.beam.sdk.schemas.Schema.FieldType;
>> import org.apache.beam.sdk.schemas.logicaltypes.SqlTypes;
>> import org.apache.beam.sdk.schemas.utils.AvroUtils;
>> import org.apache.beam.sdk.values.Row;
>>
>> // ...
>>
>>         final Schema schema =
>>                 Schema.builder()
>>                         .addField("ymd", FieldType.logicalType(SqlTypes.DATE))
>>                         .build();
>>
>>         final Row row =
>>                 Row.withSchema(schema)
>>                         .withFieldValue("ymd", LocalDate.now())
>>                         .build();
>>
>>         System.out.println(BigQueryUtils.toTableSchema(schema)); // works
>>         System.out.println(BigQueryUtils.toTableRow(row)); // works
>>
>>         System.out.println(AvroUtils.toAvroSchema(schema)); // fails
>>         System.out.println(AvroUtils.toGenericRecord(row)); // fails
>>
>>
>> Am I missing a reason for that, or has it just not been implemented yet?
>> If the latter, am I right to assume that they should be represented in
>> the Avro format the same way as the already existing cases?
>> "beam:logical_type:date:v1" vs "DATE"
>> "beam:logical_type:time:v1" vs "TIME"