Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/06/03 08:45:15 UTC

[GitHub] [beam] RyanSkraba commented on a change in pull request #14858: [BEAM-12385] Handle VARCHAR and Date-time JDBC specific logical types in AvroUtils.

RyanSkraba commented on a change in pull request #14858:
URL: https://github.com/apache/beam/pull/14858#discussion_r644606333



##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/utils/AvroUtils.java
##########
@@ -906,6 +906,23 @@ private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundE
                         .map(x -> getFieldSchema(x.getType(), x.getName(), namespace))
                         .collect(Collectors.toList()));
             break;
+
+          case "NVARCHAR":
+          case "VARCHAR":
+          case "LONGNVARCHAR":
+          case "LONGVARCHAR":
+            baseType = org.apache.avro.Schema.create(Type.STRING);

Review comment:
       Super interesting conversation from 2017!  This could go either way -- the `"char"` and `"varchar"` logical types that Hive adds are not part of the Avro specification, and implementations that don't understand them should ignore them and fall back to the base type.  A developer could register custom logical type implementations that recognize them and perform any necessary truncation, but AFAIK nobody does.
   
   I have a small preference for aligning with the de facto behaviour of Spark and Hive, since this information might be useful as the data is sent downstream, and it's easy to ignore.  But I'd be OK either way; it could be added in a later PR as a new feature.
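   To make the trade-off concrete, here is a small sketch of what the Hive/Spark-style schema JSON looks like. It assumes the commonly seen shape (a plain `"string"` base type decorated with a non-spec `"logicalType": "varchar"` and a `"maxLength"` property); the helper name is hypothetical and is not part of AvroUtils. A spec-compliant reader that has no `varchar` logical type registered simply sees the base `string` type, which is why emitting the extra properties is safe either way.

   ```java
   public class VarcharSchemaSketch {

     // Hypothetical helper: builds the Hive-style Avro schema JSON for a
     // VARCHAR(n) column. "varchar" is not an Avro-spec logical type, so
     // readers that don't recognize it fall back to the base "string" type.
     static String varcharSchemaJson(int maxLength) {
       return "{\"type\":\"string\",\"logicalType\":\"varchar\",\"maxLength\":"
           + maxLength + "}";
     }

     public static void main(String[] args) {
       // What a downstream Hive/Spark reader would consume for VARCHAR(255):
       System.out.println(varcharSchemaJson(255));
     }
   }
   ```

   The design choice discussed above amounts to whether Beam emits the two extra JSON properties or only the bare `"string"` type; consumers that ignore unknown logical types behave identically in both cases.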




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org