Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/01/13 16:31:52 UTC

[GitHub] [hive] pvargacl opened a new pull request #1866: HIVE-14165: Remove unnecessary file listing from FetchOperator

pvargacl opened a new pull request #1866:
URL: https://github.com/apache/hive/pull/1866


   
   ### What changes were proposed in this pull request?
   Remove the unnecessary file listing from FetchOperator and handle FileNotFoundException instead, to make it more performant on S3.
   Rebased the original patch from Sahil Takiar.
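
The idea can be illustrated with a minimal sketch. It is not the code from the patch: it uses `java.nio.file` on a local filesystem instead of Hadoop's `FileSystem` API, the class and method names (`ListingSketch`, `listWithExistenceCheck`, `listOrEmpty`) are hypothetical, and `NoSuchFileException` stands in for the `FileNotFoundException` that the patch handles. The first method probes before listing, which costs an extra call per directory and can still fail if the directory disappears between the two calls. The second simply attempts the listing and treats a missing directory as empty.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; not taken from FetchOperator.
final class ListingSketch {
    private ListingSketch() {}

    // "Check first" style: one call to probe, a second call to list.
    // The directory can still vanish between the two calls.
    static List<Path> listWithExistenceCheck(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return Collections.emptyList();
        }
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.sorted().collect(Collectors.toList());
        }
    }

    // "Handle the exception" style: a single call, with a missing
    // directory treated as having no entries.
    static List<Path> listOrEmpty(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.sorted().collect(Collectors.toList());
        } catch (NoSuchFileException e) {
            // Local-filesystem analogue of FileNotFoundException.
            return Collections.emptyList();
        }
    }
}
```

On an object store such as S3 each avoided probe is a saved remote request, which is where the performance gain is expected to come from.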
   
   ### Why are the changes needed?
   Performance improvement
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Current unit tests. Manual test: deleted some directories during execution to cause a FileNotFoundException.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] pvargacl commented on a change in pull request #1866: HIVE-14165: Remove unnecessary file listing from FetchOperator

Posted by GitBox <gi...@apache.org>.
pvargacl commented on a change in pull request #1866:
URL: https://github.com/apache/hive/pull/1866#discussion_r559448989



##########
File path: ql/src/test/queries/clientpositive/exim_04_evolved_parts.q
##########
@@ -15,7 +15,7 @@ alter table exim_employee_n12 clustered by (emp_sex, emp_dept) sorted by (emp_id
 alter table exim_employee_n12 add partition (emp_country='in', emp_state='tn');
 
 alter table exim_employee_n12 set fileformat 
-	inputformat  "org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat" 
+	inputformat  "org.apache.hadoop.hive.ql.io.RCFileInputFormat"

Review comment:
       BucketizedHiveInputFormat cannot be used this way: it is an internal input format used by mapjoins, and it requires someone to set up either the has.map.work or the has.reduce.work property.
   This test was passing only because the partitions in this table are empty, so InputFormat.getSplits was never called for them in the FetchOperator. It is impossible to read from a table that has data in it and was set up like this.






[GitHub] [hive] pvary commented on a change in pull request #1866: HIVE-14165: Remove unnecessary file listing from FetchOperator

Posted by GitBox <gi...@apache.org>.
pvary commented on a change in pull request #1866:
URL: https://github.com/apache/hive/pull/1866#discussion_r559442547



##########
File path: ql/src/test/queries/clientpositive/exim_04_evolved_parts.q
##########
@@ -15,7 +15,7 @@ alter table exim_employee_n12 clustered by (emp_sex, emp_dept) sorted by (emp_id
 alter table exim_employee_n12 add partition (emp_country='in', emp_state='tn');
 
 alter table exim_employee_n12 set fileformat 
-	inputformat  "org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat" 
+	inputformat  "org.apache.hadoop.hive.ql.io.RCFileInputFormat"

Review comment:
       Why do we need this change?






[GitHub] [hive] pvary merged pull request #1866: HIVE-14165: Remove unnecessary file listing from FetchOperator

Posted by GitBox <gi...@apache.org>.
pvary merged pull request #1866:
URL: https://github.com/apache/hive/pull/1866


   

