Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2022/01/20 00:47:15 UTC

[GitHub] [drill] cdmikechen opened a new pull request #2431: DRILL-8109: Hive storage plugin support reading parquet timestamp type with int64 logical type

cdmikechen opened a new pull request #2431:
URL: https://github.com/apache/drill/pull/2431


   # [DRILL-8109](https://issues.apache.org/jira/browse/DRILL-8109): Drill cannot read the Parquet timestamp type with a logical type in the Hive storage plugin
   
   ## Description
   
   The Parquet INT96 timestamp type has been deprecated:
   https://github.com/apache/parquet-format/pull/86/files
   
   Many computing engines that support Parquet (such as Hive 4, Hudi, and Spark) can use INT64 as the timestamp storage type. Drill should likewise support reading timestamps stored as INT64 (long) values.
   
   The Hive storage plugin should therefore support INT64 columns annotated with the TIMESTAMP logical type.
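   The Parquet format specification allows the TIMESTAMP logical type on an INT64 column with a unit of MILLIS, MICROS, or NANOS, so a reader has to know the unit before it can produce the epoch-millisecond value that the original `HiveTimestampWriter` code passes to `writeTimeStamp`. A minimal sketch of that normalization, assuming epoch-based values; `ParquetTimeUnit` and `Int64Timestamps` are hypothetical names, not Drill or Parquet API:

```java
// Hypothetical enum for the three units the Parquet format spec allows
// on the TIMESTAMP logical type of an INT64 column.
enum ParquetTimeUnit { MILLIS, MICROS, NANOS }

// Hypothetical helper: normalizes a raw INT64 timestamp to epoch milliseconds.
final class Int64Timestamps {
  private Int64Timestamps() {}

  static long toEpochMillis(long raw, ParquetTimeUnit unit) {
    switch (unit) {
      case MILLIS:
        return raw;
      case MICROS:
        // floorDiv rounds toward negative infinity, so pre-1970 values
        // land in the millisecond that actually contains them.
        return Math.floorDiv(raw, 1_000L);
      case NANOS:
        return Math.floorDiv(raw, 1_000_000L);
      default:
        throw new IllegalArgumentException("Unsupported unit: " + unit);
    }
  }
}
```

   For example, `Instant.ofEpochMilli(Int64Timestamps.toEpochMillis(1_642_639_635_000_000L, ParquetTimeUnit.MICROS))` prints as `2022-01-20T00:47:15Z`.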
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the Drill documentation.)
   
   ## Testing
   (Please describe how this PR has been tested.)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@drill.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [drill] cdmikechen commented on pull request #2431: DRILL-8109: Hive storage plugin support reading parquet timestamp type with int64 logical type

Posted by GitBox <gi...@apache.org>.
cdmikechen commented on pull request #2431:
URL: https://github.com/apache/drill/pull/2431#issuecomment-1024731093


   @cgivre
   Hi~ I have added a test case and updated the code to the latest version. Could you check whether the current test case is OK?



[GitHub] [drill] luocooong merged pull request #2431: DRILL-8109: Hive storage plugin support reading parquet timestamp type with int64 logical type

Posted by GitBox <gi...@apache.org>.
luocooong merged pull request #2431:
URL: https://github.com/apache/drill/pull/2431


   



[GitHub] [drill] cdmikechen commented on pull request #2431: DRILL-8109: Hive storage plugin support reading parquet timestamp type with int64 logical type

Posted by GitBox <gi...@apache.org>.
cdmikechen commented on pull request #2431:
URL: https://github.com/apache/drill/pull/2431#issuecomment-1017483944


   @vvysotskyi  
   Hi~ `org.apache.hadoop.io.LongWritable` is in the `hadoop-common` package and has been used since Hive 1, so the code builds with both Hive 2 and Hive 3.
   You are probably thinking of `org.apache.hadoop.hive.serde2.io.TimestampWritableV2`, which is only available from Hive 3.



[GitHub] [drill] cdmikechen commented on a change in pull request #2431: DRILL-8109: Hive storage plugin support reading parquet timestamp type with int64 logical type

Posted by GitBox <gi...@apache.org>.
cdmikechen commented on a change in pull request #2431:
URL: https://github.com/apache/drill/pull/2431#discussion_r794191769



##########
File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/writers/primitive/HiveTimestampWriter.java
##########
@@ -33,10 +34,14 @@ public HiveTimestampWriter(PrimitiveObjectInspector inspector, TimeStampWriter w
 
   @Override
   public void write(Object value) {
-    String timestampString = PrimitiveObjectInspectorUtils.getString(value, inspector);
-    long timestampMillis = new DateTime(Timestamp.valueOf(timestampString).getTime())
-        .withZoneRetainFields(DateTimeZone.UTC).getMillis();
-    writer.writeTimeStamp(timestampMillis);
+    if (value instanceof LongWritable) {
+      writer.writeTimeStamp(((LongWritable) value).get() / 1000);

Review comment:
       @vdiravka 
   Thank you for your suggestion ~
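   The new branch in `HiveTimestampWriter.write` can be exercised without Hive on the classpath. The following is a sketch only: a `Long` stands in for `LongWritable`, `toString()` stands in for `PrimitiveObjectInspectorUtils.getString(value, inspector)`, and, as the `/ 1000` in the diff implies, the long is assumed to hold microseconds. `TimestampDispatch` is a hypothetical name, and the sketch uses `Math.floorDiv` instead of `/`, which differs only for pre-1970 values.

```java
import java.sql.Timestamp;
import java.time.ZoneOffset;

// Hypothetical, Hive-free sketch of the dispatch in HiveTimestampWriter.write.
final class TimestampDispatch {
  private TimestampDispatch() {}

  // Assumption: the long value is a count of microseconds since the epoch.
  private static final long MICROS_PER_MILLI = 1_000L;

  static long toUtcMillis(Object value) {
    if (value instanceof Long) {
      // New path: INT64 timestamp, converted from micros to millis.
      long micros = (Long) value;
      return Math.floorDiv(micros, MICROS_PER_MILLI);
    }
    // Legacy path: parse "yyyy-mm-dd hh:mm:ss[.f...]" and keep the wall-clock
    // fields as UTC, the java.time equivalent of Joda's withZoneRetainFields(UTC).
    Timestamp ts = Timestamp.valueOf(value.toString());
    return ts.toLocalDateTime().toInstant(ZoneOffset.UTC).toEpochMilli();
  }
}
```

   Both paths yield the same millisecond value for the same wall-clock instant, e.g. `1_642_639_635_000_000L` and `"2022-01-20 00:47:15"` both map to `1642639635000`.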





[GitHub] [drill] cdmikechen commented on pull request #2431: DRILL-8109: Hive storage plugin support reading parquet timestamp type with int64 logical type

Posted by GitBox <gi...@apache.org>.
cdmikechen commented on pull request #2431:
URL: https://github.com/apache/drill/pull/2431#issuecomment-1017487425


   @cgivre
   Ok~ I will add a test case later



[GitHub] [drill] vvysotskyi commented on pull request #2431: DRILL-8109: Hive storage plugin support reading parquet timestamp type with int64 logical type

Posted by GitBox <gi...@apache.org>.
vvysotskyi commented on pull request #2431:
URL: https://github.com/apache/drill/pull/2431#issuecomment-1017238018


   @cdmikechen, please make sure that this change doesn't break the support for Hive 2. If I recall correctly, `LongWritable` was added in Hive 3. Please make sure that after your changes Drill is able to build with the following extra properties:
   `-Dhive.version=2.3.2 -Dfreemarker.conf.file=src/main/codegen/config.fmpp`



[GitHub] [drill] vdiravka commented on a change in pull request #2431: DRILL-8109: Hive storage plugin support reading parquet timestamp type with int64 logical type

Posted by GitBox <gi...@apache.org>.
vdiravka commented on a change in pull request #2431:
URL: https://github.com/apache/drill/pull/2431#discussion_r788918641



##########
File path: contrib/storage-hive/core/src/main/java/org/apache/drill/exec/store/hive/writers/primitive/HiveTimestampWriter.java
##########
@@ -33,10 +34,14 @@ public HiveTimestampWriter(PrimitiveObjectInspector inspector, TimeStampWriter w
 
   @Override
   public void write(Object value) {
-    String timestampString = PrimitiveObjectInspectorUtils.getString(value, inspector);
-    long timestampMillis = new DateTime(Timestamp.valueOf(timestampString).getTime())
-        .withZoneRetainFields(DateTimeZone.UTC).getMillis();
-    writer.writeTimeStamp(timestampMillis);
+    if (value instanceof LongWritable) {
+      writer.writeTimeStamp(((LongWritable) value).get() / 1000);

Review comment:
       ```suggestion
         writer.writeTimeStamp(((LongWritable) value).get() / DateTimeConstants.MILLIS_PER_SECOND);
   ```
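   Both the original `/ 1000` and the suggested `DateTimeConstants.MILLIS_PER_SECOND` divide by 1000, so the suggestion changes readability, not behavior. Since the dividend is assumed to be in microseconds, a name such as `MICROS_PER_MILLI` (hypothetical, not a Drill or Joda constant) may describe the conversion more precisely. Rounding is a related detail: Java's `/` and `TimeUnit.MICROSECONDS.toMillis` both truncate toward zero, whereas `Math.floorDiv` rounds down, which only matters for timestamps before 1970. A small comparison sketch (`MicrosToMillis` is a hypothetical name):

```java
import java.util.concurrent.TimeUnit;

// Hypothetical comparison of micro-to-milli conversions.
final class MicrosToMillis {
  private MicrosToMillis() {}

  // Same numeric value (1000) as Joda's DateTimeConstants.MILLIS_PER_SECOND,
  // but named for the conversion actually performed.
  static final long MICROS_PER_MILLI = 1_000L;

  // Behaves like "/ 1000" in the diff: truncates toward zero.
  static long truncating(long micros) {
    return micros / MICROS_PER_MILLI;
  }

  // JDK helper; finer-to-coarser conversions also truncate toward zero.
  static long viaTimeUnit(long micros) {
    return TimeUnit.MICROSECONDS.toMillis(micros);
  }

  // Rounds toward negative infinity, so a pre-1970 instant maps to the
  // millisecond that contains it.
  static long flooring(long micros) {
    return Math.floorDiv(micros, MICROS_PER_MILLI);
  }
}
```

   For `-1_500` microseconds (1.5 ms before the epoch), `truncating` and `viaTimeUnit` return `-1`, while `flooring` returns `-2`; for positive values all three agree.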





[GitHub] [drill] cdmikechen edited a comment on pull request #2431: DRILL-8109: Hive storage plugin support reading parquet timestamp type with int64 logical type

Posted by GitBox <gi...@apache.org>.
cdmikechen edited a comment on pull request #2431:
URL: https://github.com/apache/drill/pull/2431#issuecomment-1017483944


   @vvysotskyi  
   Hi~ `org.apache.hadoop.io.LongWritable` is in the `hadoop-common` package and has been used since Hive 1, so the code builds with both Hive 2 and Hive 3.
   You are probably thinking of `org.apache.hadoop.hive.serde2.io.TimestampWritableV2`, which is only available from Hive 3.

