Posted to gitbox@hive.apache.org by GitBox <gi...@apache.org> on 2021/03/10 09:45:29 UTC

[GitHub] [hive] prasanthj opened a new pull request #2057: HIVE-24866: FileNotFoundException during alter table concat

prasanthj opened a new pull request #2057:
URL: https://github.com/apache/hive/pull/2057


   ### What changes were proposed in this pull request?
   There has been a bug lurking in alter table concatenate for ORC that is typically observed when ORC files are large and span multiple nodes and racks. CombineFileInputFormat groups files together based on node/rack locality, with a default max split size of 256MB, so if an ORC file is larger than 256MB and spans multiple nodes/racks, CombineIF splits the file and places the pieces in different splits. When these splits are processed by the mappers of the merge task, the first mapper initiates the concatenation and, as part of its task commit, moves the file to the scratch dir. When a different split of the same file is processed afterwards, the file no longer exists because the earlier mapper has already moved it. This can make the alter table concat task fail, and the resulting partial concatenation can also cause stripes to be lost.
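To illustrate why a single large file ends up in several splits, here is a simplified, hypothetical sketch (not Hive or Hadoop code): it chops one file into byte ranges of at most `maxSplitSize`. The class `SplitSketch` and its `FileSplit` type are invented for illustration, and the real CombineFileInputFormat additionally groups blocks by node and rack, which this model ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model: chops one file into byte ranges of at most
// maxSplitSize. The real CombineFileInputFormat also groups blocks by node/rack.
public class SplitSketch {

    // One split = the byte range [start, start + length) of a single file.
    static final class FileSplit {
        final String path;
        final long start;
        final long length;

        FileSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + "@" + start + "+" + length;
        }
    }

    static List<FileSplit> chunk(String path, long fileLen, long maxSplitSize) {
        List<FileSplit> splits = new ArrayList<>();
        for (long off = 0; off < fileLen; off += maxSplitSize) {
            splits.add(new FileSplit(path, off, Math.min(maxSplitSize, fileLen - off)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // A 600MB ORC file with a 256MB max split size lands in three splits,
        // so three different mappers may see the same file.
        List<FileSplit> splits = chunk("part-00000.orc", 600 * mb, 256 * mb);
        System.out.println(splits.size()); // prints 3
        for (FileSplit s : splits) {
            System.out.println(s);
        }
    }
}
```

If each of those three mappers tries to concatenate and then move `part-00000.orc`, only the first one finds the file; the other two hit the FileNotFoundException described above.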
   This PR addresses the issue by making the mapper that receives the start of the ORC file own the entire file for concatenation: it processes all the stripes, concatenates them into the destination file, and moves the source file. Mappers that do not receive the start of the file simply skip it, since the file has already been handled, or will be handled, by a different mapper.
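The ownership rule can be sketched as follows. This is a hypothetical illustration, not the code in this PR: it assumes "receives the start of the file" means the split's start offset is 0, and the names `ConcatOwnershipSketch`, `ownsFile`, and `process` are invented for the example. The stripe reading and file moving are only described in comments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the ownership rule: only the mapper whose split starts
// at byte offset 0 of a file concatenates that file; every other mapper skips it.
public class ConcatOwnershipSketch {

    static final class FileSplit {
        final String path;
        final long start;
        final long length;

        FileSplit(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    // Assumed rule: the split holding the start of the file owns the whole file.
    static boolean ownsFile(FileSplit split) {
        return split.start == 0;
    }

    // Simulates the merge mappers and returns the files that get concatenated,
    // in the order their owning splits are visited.
    static List<String> process(List<FileSplit> splits) {
        List<String> concatenated = new ArrayList<>();
        for (FileSplit split : splits) {
            if (ownsFile(split)) {
                // Owner: read all stripes of split.path (not just this byte range),
                // append them to the destination file, then move the source file.
                concatenated.add(split.path);
            }
            // Non-owner: do nothing, so this mapper never touches a file that
            // the owner may already have moved to the scratch dir.
        }
        return concatenated;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        List<FileSplit> splits = Arrays.asList(
                new FileSplit("a.orc", 256 * mb, 256 * mb),
                new FileSplit("a.orc", 0, 256 * mb),
                new FileSplit("a.orc", 512 * mb, 88 * mb),
                new FileSplit("b.orc", 0, 100 * mb));
        // Each file is concatenated exactly once, however many splits it spans.
        System.out.println(process(splits)); // prints [a.orc, b.orc]
    }
}
```

The rule relies on every file having exactly one split that begins at offset 0, so each file gets exactly one owner regardless of split order or which node runs which mapper.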
   
   ### Why are the changes needed?
   To avoid concatenation failures and stripe loss issues. 
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Tested on an internal repro cluster with large ORC files that span multiple nodes and racks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org
For additional commands, e-mail: gitbox-help@hive.apache.org


[GitHub] [hive] github-actions[bot] commented on pull request #2057: HIVE-24866: FileNotFoundException during alter table concat

github-actions[bot] commented on pull request #2057:
URL: https://github.com/apache/hive/pull/2057#issuecomment-855489135


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.




[GitHub] [hive] ShubhamChaurasia commented on pull request #2057: HIVE-24866: FileNotFoundException during alter table concat

ShubhamChaurasia commented on pull request #2057:
URL: https://github.com/apache/hive/pull/2057#issuecomment-795419941


   @prasanthj  this test failure is related - http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2057/1/tests ?




[GitHub] [hive] prasanthj commented on pull request #2057: HIVE-24866: FileNotFoundException during alter table concat

prasanthj commented on pull request #2057:
URL: https://github.com/apache/hive/pull/2057#issuecomment-795678425


   > @prasanthj this test failure is related - http://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/detail/PR-2057/1/tests ?
   
   Thanks! Yeah. Fixed it. 




[GitHub] [hive] github-actions[bot] closed pull request #2057: HIVE-24866: FileNotFoundException during alter table concat

github-actions[bot] closed pull request #2057:
URL: https://github.com/apache/hive/pull/2057


   

