Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2022/03/20 17:26:00 UTC

[jira] [Updated] (BEAM-10934) handling Date type in HCatToRow

     [ https://issues.apache.org/jira/browse/BEAM-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Beam JIRA Bot updated BEAM-10934:
---------------------------------
    Labels: Clarified stale-P2 starter  (was: Clarified starter)

> handling Date type in HCatToRow
> -------------------------------
>
>                 Key: BEAM-10934
>                 URL: https://issues.apache.org/jira/browse/BEAM-10934
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-hcatalog, sdk-java-core
>            Reporter: chie hayashida
>            Priority: P2
>              Labels: Clarified, stale-P2, starter
>
> When I convert an HCatRecord that contains a Date field to a Row, the pipeline fails with the following error.
> * the code
> ```
>     PCollection<Row> p =
>         pipeline
>             /*
>              * Step #1: Read hive table rows from Hive.
>              */
>             .apply(
>                 "Read from Hive source",
>                 HCatToRow.fromSpec(
>                     HCatalogIO.read()
>                         .withConfigProperties(configProperties)
>                         .withDatabase(options.getHiveDatabaseName())
>                         .withTable(options.getHiveTableName())
>                         .withFilter(options.getFilterString())));
> ```
> * error log
> ```
> org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.IllegalArgumentException: For field name submissiondate and DATETIME type got unexpected class class java.sql.Date
>         at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:348)
>         at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:318)
>         at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:213)
>         at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:317)
>         at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
>         at com.google.cloud.teleport.v2.templates.HiveToBigQuery.run(HiveToBigQuery.java:234)
>         at com.google.cloud.teleport.v2.templates.HiveToBigQuery.main(HiveToBigQuery.java:176)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalArgumentException: For field name submissiondate and DATETIME type got unexpected class class java.sql.Date
>         at org.apache.beam.sdk.values.Row$Builder.verifyDateTime(Row.java:828)
>         at org.apache.beam.sdk.values.Row$Builder.verifyPrimitiveType(Row.java:755)
>         at org.apache.beam.sdk.values.Row$Builder.verify(Row.java:654)
>         at org.apache.beam.sdk.values.Row$Builder.verify(Row.java:635)
>         at org.apache.beam.sdk.values.Row$Builder.build(Row.java:840)
>         at org.apache.beam.sdk.io.hcatalog.HCatToRow$HCatToRowFn.processElement(HCatToRow.java:84)
> ```
> This happens because HCatalogIO returns Date columns as java.sql.Date values in the HCatRecord, but the Row class does not support that type and HCatToRow does not convert it.
> I see two possible solutions:
> 1. Make Row support a Date type (java.util.Date or java.sql.Date).
>    I am not familiar enough with the other IO classes, but some of them may have the same problem, and this solution could fix those as well.
> 2. Add logic to HCatToRow that converts Date values to the DATETIME type.
>    The impact of this change would be smaller than that of option 1 because it does not change the Row class.
> Which would be better?
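
For illustration only, option 2 could be sketched roughly as below. This is not the actual HCatToRow code: the class name HCatDateValues and its methods are hypothetical, and mapping a date to midnight UTC is an assumption about the desired semantics. The sketch uses java.time.Instant so that it compiles with the JDK alone; inside Beam, the resulting epoch milliseconds would presumably be wrapped in a Joda-Time instant, since that is what the DATETIME check in the stack trace appears to expect.

```java
import java.time.Instant;
import java.time.ZoneOffset;

/** Hypothetical helper for illustration; not part of Beam or HCatalog. */
final class HCatDateValues {
  private HCatDateValues() {}

  /** Maps a java.sql.Date to the instant at midnight UTC of the same calendar date. */
  static Instant toUtcMidnight(java.sql.Date date) {
    // toLocalDate() keeps the calendar date (year, month, day) as the caller sees it,
    // so the result does not shift with the JVM's default time zone.
    return date.toLocalDate().atStartOfDay(ZoneOffset.UTC).toInstant();
  }

  /**
   * Returns the value unchanged unless it is a java.sql.Date, in which case it
   * is replaced by the corresponding instant. Null values pass through as null.
   */
  static Object normalize(Object value) {
    if (value instanceof java.sql.Date) {
      return toUtcMidnight((java.sql.Date) value);
    }
    return value;
  }
}
```

A conversion step like this would run over each value of the HCatRecord before the Row is built, for example `HCatDateValues.normalize(java.sql.Date.valueOf("2020-09-21"))`, leaving all non-date values untouched.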



--
This message was sent by Atlassian Jira
(v8.20.1#820001)