Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2022/07/06 17:40:29 UTC

[GitHub] [hive] soumyakanti3578 opened a new pull request, #3418: HIVE-26373: ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

soumyakanti3578 opened a new pull request, #3418:
URL: https://github.com/apache/hive/pull/3418

   
   ### What changes were proposed in this pull request?
   `isPrimitive` now returns `true` for `Timestamp`.
   
   
   ### Why are the changes needed?
   `isPrimitive` was returning `false` for `Timestamp`, so the value was not being converted to `LazyTimestamp`.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   `mvn test -Dtest=TestHBaseCliDriver -Dtest.output.overwrite=true -Dqfile=hbase_avro_nested_timestamp.q`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] zabetak commented on a diff in pull request #3418: HIVE-26373: ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

Posted by GitBox <gi...@apache.org>.
zabetak commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915124542


##########
hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out:
##########
@@ -0,0 +1,45 @@
+PREHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+#### A masked pattern was here ####
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+PREHOOK: type: CREATETABLE
+PREHOOK: Output: database:default
+PREHOOK: Output: default@tbl
+POSTHOOK: query: CREATE EXTERNAL TABLE tbl(
+`key` string COMMENT '',
+`data_frv4` struct<`id`:string, `dischargedate`:struct<`value`:timestamp>>)
+ROW FORMAT SERDE
+  'org.apache.hadoop.hive.hbase.HBaseSerDe'
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+'serialization.format'='1',
+'hbase.columns.mapping' = ':key,data:frV4',
+'data.frV4.serialization.type'='avro',
+#### A masked pattern was here ####
+)
+TBLPROPERTIES (
+'hbase.table.name' = 'HiveAvroTable',
+'hbase.struct.autogenerate'='true')
+POSTHOOK: type: CREATETABLE
+POSTHOOK: Output: database:default
+POSTHOOK: Output: default@tbl
+PREHOOK: query: select data_frV4.dischargedate.value from tbl
+PREHOOK: type: QUERY
+PREHOOK: Input: default@tbl
+#### A masked pattern was here ####
+POSTHOOK: query: select data_frV4.dischargedate.value from tbl
+POSTHOOK: type: QUERY
+POSTHOOK: Input: default@tbl
+#### A masked pattern was here ####
+1970-01-19 20:16:19.2

Review Comment:
   The result does not appear to be correct. I was expecting to see something like `2022-07-05 00:00:00`. 
   
   Are we doing something wrong when inserting the data? Are we doing something wrong while fetching the data?



##########
itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java:
##########
@@ -158,6 +170,41 @@ private void createHBaseTable() throws IOException {
     }
   }
 
+  private byte[] createAvroRecordWithNestedTimestamp() throws IOException {
+    String dataDir = System.getProperty("test.data.dir");
+    Schema schema = new Schema.Parser().parse(new File(dataDir+"/nested_ts.avsc"));

Review Comment:
   This path concatenation may not be portable across operating systems.
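
One way to address the portability concern is to let the JDK join the directory and the file name instead of hard-coding `/`. This is a minimal sketch, assuming only the `test.data.dir` property and the `nested_ts.avsc` file name from the hunk above; the class name `PortablePathSketch`, the `resolve` helper, and the `.` fallback are hypothetical and not part of the PR:

```java
import java.io.File;
import java.nio.file.Paths;

final class PortablePathSketch {
  // Joins a directory and a file name using the platform's own separator.
  static File resolve(String dir, String fileName) {
    return Paths.get(dir, fileName).toFile();
  }

  public static void main(String[] args) {
    // "test.data.dir" is the property read in the PR; the "." fallback is an assumption.
    String dataDir = System.getProperty("test.data.dir", ".");
    System.out.println(resolve(dataDir, "nested_ts.avsc"));
  }
}
```

`new File(dataDir, "nested_ts.avsc")` would serve equally well; either form avoids embedding a separator in a string literal.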



##########
itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java:
##########
@@ -158,6 +170,41 @@ private void createHBaseTable() throws IOException {
     }
   }
 
+  private byte[] createAvroRecordWithNestedTimestamp() throws IOException {
+    String dataDir = System.getProperty("test.data.dir");
+    Schema schema = new Schema.Parser().parse(new File(dataDir+"/nested_ts.avsc"));

Review Comment:
   Make method static.



##########
itests/util/src/main/java/org/apache/hadoop/hive/hbase/HBaseTestSetup.java:
##########
@@ -158,6 +170,41 @@ private void createHBaseTable() throws IOException {
     }
   }
 
+  private byte[] createAvroRecordWithNestedTimestamp() throws IOException {
+    String dataDir = System.getProperty("test.data.dir");
+    Schema schema = new Schema.Parser().parse(new File(dataDir+"/nested_ts.avsc"));
+    GenericData.Record rootRecord = new GenericData.Record(schema);
+    rootRecord.put("id", "X338092");
+    GenericData.Record dateRecord = new GenericData.Record(schema.getField("dischargedate").schema());
+    dateRecord.put("value", LocalDate.of(2022,7,5).atStartOfDay().toEpochSecond(ZoneOffset.UTC));
+    rootRecord.put("dischargedate", dateRecord);
+
+    try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
+      try (DataFileWriter<GenericRecord> dataFileWriter
+             = new DataFileWriter<GenericRecord>(new GenericDatumWriter<>(schema))) {
+        dataFileWriter.create(schema, out);
+        dataFileWriter.append(rootRecord);
+      }
+      return out.toByteArray();
+    }
+  }
+
+  private void createAvroTable() throws IOException {
+    final String HBASE_TABLE_NAME = "HiveAvroTable";
+    HTableDescriptor htableDesc = new HTableDescriptor(TableName.valueOf(HBASE_TABLE_NAME));
+    htableDesc.addFamily(new HColumnDescriptor("data".getBytes()));
+
+    try (Admin hbaseAdmin = hbaseConn.getAdmin()) {
+      hbaseAdmin.createTable(htableDesc);
+      try (Table table = hbaseConn.getTable(TableName.valueOf(HBASE_TABLE_NAME))) {

Review Comment:
   Extract `TableName.valueOf("HiveAvroTable")` to local variable and replace occurrences.





[GitHub] [hive] zabetak commented on pull request #3418: HIVE-26373: ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

Posted by GitBox <gi...@apache.org>.
zabetak commented on PR #3418:
URL: https://github.com/apache/hive/pull/3418#issuecomment-1177583388

   Hive has always converted data from the local time zone to UTC when writing and from UTC to the local time zone when reading. I updated the way the timestamp is stored in HBase (https://github.com/apache/hive/pull/3418/commits/fc9bc94be427a02485b089c2aeb6b494644beb05) to make it consistent with the way it is read by the query. 
   
   Some properties and Avro file metadata can control whether the conversion is performed (e.g., `hive.avro.timestamp.skip.conversion`), but they currently have no effect for HBase (or for anything else that relies on `AvroLazyObjectInspector`). This is a bug that should be fixed, but it is outside the scope of this PR.




[GitHub] [hive] zabetak closed pull request #3418: HIVE-26373: ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

Posted by GitBox <gi...@apache.org>.
zabetak closed pull request #3418: HIVE-26373: ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp
URL: https://github.com/apache/hive/pull/3418




[GitHub] [hive] soumyakanti3578 commented on a diff in pull request #3418: HIVE-26373: ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

Posted by GitBox <gi...@apache.org>.
soumyakanti3578 commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915150677


##########
hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out:
##########
@@ -0,0 +1,45 @@
+1970-01-19 20:16:19.2

Review Comment:
   `2022-07-05 00:00:00` = 1657004400
   `1970-01-19 20:16:19.2` = 1657004
   So somewhere we are clipping the last three digits. I will check.
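
One reading that fits these numbers is a unit mismatch rather than a truncation: a value written as epoch seconds but consumed as epoch milliseconds looks as if its last three digits were dropped. The sketch below is an illustration only, assuming the field is read as milliseconds; the class and method names are hypothetical. It reuses the `toEpochSecond(ZoneOffset.UTC)` expression from `createAvroRecordWithNestedTimestamp()`. In UTC the exact value is 1656979200, and reading it as milliseconds gives `1970-01-20T04:16:19.200Z`, which corresponds to `1970-01-19 20:16:19.2` in a zone eight hours behind UTC (the zone the test runs in is not stated in the thread).

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

final class EpochUnitSketch {
  // Same expression as in the test setup: this yields epoch SECONDS.
  static long epochSeconds() {
    return LocalDate.of(2022, 7, 5).atStartOfDay().toEpochSecond(ZoneOffset.UTC);
  }

  // Interprets the number as epoch MILLISECONDS, as a millisecond-based reader would.
  static Instant readAsMillis(long value) {
    return Instant.ofEpochMilli(value);
  }

  public static void main(String[] args) {
    long seconds = epochSeconds();
    System.out.println(seconds);               // 1656979200
    System.out.println(readAsMillis(seconds)); // 1970-01-20T04:16:19.200Z
  }
}
```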





[GitHub] [hive] zabetak commented on a diff in pull request #3418: HIVE-26373: ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

Posted by GitBox <gi...@apache.org>.
zabetak commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915873512


##########
hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out:
##########
@@ -0,0 +1,45 @@
+1970-01-19 20:16:19.2

Review Comment:
   Copying here the offline follow-up by Soumyakanti:
   
   It's because `"logicalType": "timestamp-millis"` is defined in the avsc.
   
   I had to make this change 
   ```java
   dateRecord.put("value", LocalDate.of(2022,7,5).atStartOfDay().atZone(ZoneOffset.UTC).toInstant().toEpochMilli());
   ```
   
   However, right now the result I am getting for this is: 2022-07-04 17:00:00
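
The seven-hour shift is what one would expect if the stored UTC instant is rendered in a zone seven hours behind UTC. The sketch below illustrates that arithmetic under stated assumptions: the `-07:00` offset is inferred from the reported result and is not named in the thread, and the class and method names are hypothetical. It does not reproduce Hive's own conversion code.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

final class ZoneShiftSketch {
  // Same expression as the fix quoted above: midnight UTC as epoch milliseconds.
  static long utcMidnightMillis() {
    return LocalDate.of(2022, 7, 5).atStartOfDay().atZone(ZoneOffset.UTC).toInstant().toEpochMilli();
  }

  // Renders an epoch-millisecond instant as wall-clock time in the given zone.
  static LocalDateTime renderIn(long epochMillis, ZoneId zone) {
    return LocalDateTime.ofInstant(Instant.ofEpochMilli(epochMillis), zone);
  }

  public static void main(String[] args) {
    long millis = utcMidnightMillis();
    System.out.println(millis);                                    // 1656979200000
    System.out.println(renderIn(millis, ZoneOffset.UTC));          // 2022-07-05T00:00
    // Assumed reader zone: seven hours behind UTC.
    System.out.println(renderIn(millis, ZoneOffset.ofHours(-7)));  // 2022-07-04T17:00
  }
}
```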





[GitHub] [hive] zabetak commented on a diff in pull request #3418: HIVE-26373: ClassCastException while inserting Avro data into Hbase table for nested struct with Timestamp

Posted by GitBox <gi...@apache.org>.
zabetak commented on code in PR #3418:
URL: https://github.com/apache/hive/pull/3418#discussion_r915874809


##########
hbase-handler/src/test/results/positive/hbase_avro_nested_timestamp.q.out:
##########
@@ -0,0 +1,45 @@
+1970-01-19 20:16:19.2

Review Comment:
   The resolution of this is discussed here: https://github.com/apache/hive/pull/3418#issuecomment-1177583388


