You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/02/27 23:46:32 UTC

[GitHub] [iceberg] bkahloon opened a new pull request #2281: [SPARK] Add in support to read timestamp without timezone from parquet files

bkahloon opened a new pull request #2281:
URL: https://github.com/apache/iceberg/pull/2281


   Porting over code from Shardul's PR
   
   https://github.com/linkedin/iceberg/pull/48/files


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bkahloon commented on a change in pull request #2281: [SPARK] Add in support to read timestamp without timezone from parquet files

Posted by GitBox <gi...@apache.org>.
bkahloon commented on a change in pull request #2281:
URL: https://github.com/apache/iceberg/pull/2281#discussion_r584217617



##########
File path: spark3/src/main/java/org/apache/iceberg/spark/source/SparkBatchScan.java
##########
@@ -150,24 +169,39 @@ public PartitionReaderFactory createReaderFactory() {
                 .allMatch(fileScanTask -> fileScanTask.file().format().equals(
                     FileFormat.ORC)));
 
+    boolean hasNoRowFilters =
+        tasks().stream()
+            .allMatch(combinedScanTask -> !combinedScanTask.isDataTask() && combinedScanTask.files()
+                .stream()
+                .allMatch(fileScanTask -> OrcRowFilterUtils.rowFilterFromTask(fileScanTask) == null));

Review comment:
       @shardulm94 sorry yea, you're correct. I think I messed up while merging. I'll create new PR




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on a change in pull request #2281: [SPARK] Add in support to read timestamp without timezone from parquet files

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #2281:
URL: https://github.com/apache/iceberg/pull/2281#discussion_r584210600



##########
File path: spark3/src/main/java/org/apache/iceberg/spark/source/SparkBatchScan.java
##########
@@ -150,24 +169,39 @@ public PartitionReaderFactory createReaderFactory() {
                 .allMatch(fileScanTask -> fileScanTask.file().format().equals(
                     FileFormat.ORC)));
 
+    boolean hasNoRowFilters =
+        tasks().stream()
+            .allMatch(combinedScanTask -> !combinedScanTask.isDataTask() && combinedScanTask.files()
+                .stream()
+                .allMatch(fileScanTask -> OrcRowFilterUtils.rowFilterFromTask(fileScanTask) == null));

Review comment:
       This code was also not touched by https://github.com/linkedin/iceberg/pull/48 and does not exist in apache/iceberg. Can you remove this?
   
   These are several other changes in the files which are linkedin specific and were not touched by https://github.com/linkedin/iceberg/pull/48

##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -136,7 +143,7 @@
     if (io.getValue() instanceof HadoopFileIO) {
       String fsscheme = "no_exist";
       try {
-        Configuration conf = SparkSession.active().sessionState().newHadoopConf();
+        Configuration conf = new Configuration(activeSparkSession().sessionState().newHadoopConf());

Review comment:
       Many changes in this file seems to be copied over from LinkedIn's fork which are not relevant to apache/iceberg. Can you remove these?
   
   The PR over linkedin/iceberg does not have these changes either. So not sure how those were copied over.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on a change in pull request #2281: [SPARK] Add in support to read timestamp without timezone from parquet files

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #2281:
URL: https://github.com/apache/iceberg/pull/2281#discussion_r584210308



##########
File path: spark2/src/main/java/org/apache/iceberg/spark/source/Reader.java
##########
@@ -136,7 +143,7 @@
     if (io.getValue() instanceof HadoopFileIO) {
       String fsscheme = "no_exist";
       try {
-        Configuration conf = SparkSession.active().sessionState().newHadoopConf();
+        Configuration conf = new Configuration(activeSparkSession().sessionState().newHadoopConf());

Review comment:
       Many changes in this file seems to be copied over from LinkedIn's fork which are not relevant to apache/iceberg. Can you remove these?
   
   https://github.com/linkedin/iceberg/pull/48 does not have these changes either. So not sure how those were copied over.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bkahloon closed pull request #2281: [SPARK] Add in support to read timestamp without timezone from parquet files

Posted by GitBox <gi...@apache.org>.
bkahloon closed pull request #2281:
URL: https://github.com/apache/iceberg/pull/2281


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org