Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/04 12:22:24 UTC

[GitHub] [iceberg] tprelle edited a comment on issue #2541: Hive: insert into from hive tez it's not working for Map Only insert query

tprelle edited a comment on issue #2541:
URL: https://github.com/apache/iceberg/issues/2541#issuecomment-831888173


   @marton-bod sure:
   For Tez, starting from the Apache 0.10.0 tag, I added:
   - https://issues.apache.org/jira/projects/TEZ/issues/TEZ-4238
   - https://issues.apache.org/jira/projects/TEZ/issues/TEZ-4264
   
   For Hive it was a bit more complex. Starting from HDP version 3.1.5-2-4, I added:
    - https://issues.apache.org/jira/browse/HIVE-23190 to be able to move to Tez 0.10
    - https://issues.apache.org/jira/browse/HIVE-24629 for the output committer class
    - https://issues.apache.org/jira/browse/HIVE-24207 because I need the Hive TezProcessor to fill the JobConf https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezProcessor.java#L202 with TEZ_VERTEX_ID_HIVE, which is required to build the TaskAttemptWrapper https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/hive/TezUtil.java#L95
    
   With this version I still had an issue with this line: https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java#L382
   because conf.getNumReduceTasks() and conf.getNumMapTasks() were never set by Hive.
   I found a way to fix it (but I do not know whether it is the correct one, or whether it is only needed because of the HDP fork).
   
   - For the ReduceWork plan, I added the following at this line:
   
    https://github.com/apache/hive/blob/branch-3.1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L800
   `    conf.setNumReduceTasks(reduceWork.isAutoReduceParallelism() ?
               reduceWork.getMaxReduceTasks() :
               reduceWork.getNumReduceTasks());`
   
   - For MergeJoinWork, I added the following at this line https://github.com/apache/hive/blob/branch-3.1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L596 `conf.setNumMapTasks(mapWorkList.size() + 1);`
   
   - For MapWork, I could only make it work under one condition: with hive.compute.splits.in.am=false, by adding at https://github.com/apache/hive/blob/branch-3.1/ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java#L716 `conf.setNumMapTasks(numTasks);`
   But with hive.compute.splits.in.am=false, vectorization no longer works because row IDs are no longer projected.
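
   The three changes above come down to simple task-count choices. Here is a minimal, self-contained sketch in plain Java, with no Hive or Hadoop classes. The class and method names are hypothetical. The `expectedTaskCount` rule is only an assumption about how the committer line linked above consumes the two values (reducer count if there are reducers, otherwise mapper count); it is not copied from Iceberg.

   ```java
   public class TaskCountSketch {

     // ReduceWork case: mirrors the ternary quoted above. With auto reduce
     // parallelism the reducer count may change at runtime, so the maximum is used.
     static int reduceTasks(boolean autoReduceParallelism, int maxReduceTasks, int numReduceTasks) {
       return autoReduceParallelism ? maxReduceTasks : numReduceTasks;
     }

     // MergeJoinWork case: mirrors the quoted mapWorkList.size() + 1.
     static int mergeJoinMapTasks(int mapWorkListSize) {
       return mapWorkListSize + 1;
     }

     // ASSUMPTION (not taken from the Iceberg source): the committer uses the
     // reducer count when there are reducers, else the mapper count (map-only query).
     static int expectedTaskCount(int numReduceTasks, int numMapTasks) {
       return numReduceTasks > 0 ? numReduceTasks : numMapTasks;
     }

     public static void main(String[] args) {
       System.out.println(reduceTasks(true, 8, 3));    // auto parallelism: max is used
       System.out.println(mergeJoinMapTasks(2));       // two map works plus one
       System.out.println(expectedTaskCount(0, 5));    // map-only: mapper count is used
     }
   }
   ```

   If the rule really works like this, a map-only insert depends entirely on the mapper count being set, which would explain why only the MapWork case is still a problem.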
   
   I need to set up a Hive build from the latest 3.1 version in order to test this.
   I used the Apache code as an example because Cloudera decided to remove the Hortonworks GitHub repositories from the internet, but it seems to be almost the same code as Apache branch-3.1.

