Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/19 15:10:31 UTC

[GitHub] [iceberg] liubo1022126 opened a new pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

liubo1022126 opened a new pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614


   Issue: https://github.com/apache/iceberg/issues/2567
   
   Run a Hive SQL query in hive-shell in which Table A is left-joined with Table B.
   
   > select * from 
   (select * from ta)p1
   left join 
   (select id,name,age from tb) p2 
   on p1.id=p2.id limit 10; 
   
   Regardless of whether Table A and Table B are in Iceberg format or not, when the table on the right holds a relatively large amount of data, some map operators fail to initialize.
   
   I found that the code `String[] selectedColumns = ColumnProjectionUtils.getReadColumnNames(configuration)` in class HiveIcebergSerDe reads the selectedColumns value from hconf via hive.io.file.readcolumn.names, but that value sometimes does not correspond to the current map.
   
   Maybe someone noticed this problem before, because there are already some comments and code for it:
   >  // the input split mapper handles does not belong to this table
       // it is necessary to ensure projectedSchema equals to tableSchema,
       // or we cannot find selectOperator's column from inspector
       if (projectedSchema.columns().size() != distinctSelectedColumns.length) {
         projectedSchema = tableSchema;
       }
   
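   But this is not enough in some cases, e.g. when Table ta also has the columns [name] and [age], which are the selected columns of Table tb. The selected names then also resolve against ta's schema, so the size check passes and projectedSchema is not reset.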
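
A minimal sketch of why the size check alone can pass for the wrong table. It uses plain Java lists in place of the Iceberg `Schema`; the class and method names are hypothetical and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the SerDe logic: plain lists replace the Iceberg Schema.
public class SizeCheckSketch {

  // Keep only the table columns whose names appear in selectedColumns,
  // roughly what projecting the table schema by name does.
  public static List<String> project(List<String> tableColumns, List<String> selectedColumns) {
    List<String> projected = new ArrayList<>();
    for (String column : tableColumns) {
      if (selectedColumns.contains(column)) {
        projected.add(column);
      }
    }
    return projected;
  }

  // Mirrors the existing guard: fall back to the full schema only when the sizes differ.
  public static List<String> resolve(List<String> tableColumns, List<String> selectedColumns) {
    List<String> projected = project(tableColumns, selectedColumns);
    return projected.size() != selectedColumns.size() ? tableColumns : projected;
  }

  public static void main(String[] args) {
    // Selected columns that actually belong to tb.
    List<String> selectedForTb = Arrays.asList("id", "name", "age");

    // ta has neither name nor age: the sizes differ, so the guard falls back to the full schema.
    List<String> taDisjoint = Arrays.asList("id", "city", "score");
    System.out.println(resolve(taDisjoint, selectedForTb));

    // ta also has name and age: the sizes match, so the wrong projection is kept.
    List<String> taOverlap = Arrays.asList("id", "name", "age", "city");
    System.out.println(resolve(taOverlap, selectedForTb));
  }
}
```

In the second case the map that reads ta keeps only three columns, although the query selects all columns of ta.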
   
   I debugged and noticed that when the above situation occurs, `serDeProperties.getProperty("columns")` corresponds to the schema columns of the current map, while `configuration.get("schema.evolution.columns")` corresponds to the schema columns of the other table. So I compare the two to verify, and with that check the query runs OK.
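
The comparison can be sketched roughly as follows. This is only an illustration: a `Map` stands in for the Hadoop `Configuration`, the class and method names are hypothetical, and what the SerDe does on a mismatch (presumably falling back to the full table schema) is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: a Map stands in for the Hadoop Configuration.
public class ColumnOwnershipSketch {

  // Split a comma-separated column list; a missing or empty value gives an empty list.
  static List<String> splitColumns(String value) {
    if (value == null || value.isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.asList(value.split(","));
  }

  // True when the column list in the job configuration describes the same table
  // as the SerDe properties of the current map.
  public static boolean confMatchesCurrentTable(
      Properties serDeProperties, Map<String, String> configuration) {
    List<String> tableColumns = splitColumns(serDeProperties.getProperty("columns"));
    List<String> confColumns = splitColumns(configuration.get("schema.evolution.columns"));
    return tableColumns.equals(confColumns);
  }

  public static void main(String[] args) {
    Properties serDeProperties = new Properties();
    serDeProperties.setProperty("columns", "id,name,age,city");

    Map<String, String> configuration = new HashMap<>();
    // The configuration still carries the columns of the other table.
    configuration.put("schema.evolution.columns", "id,name,age");
    System.out.println(confMatchesCurrentTable(serDeProperties, configuration));

    // The configuration describes the current table.
    configuration.put("schema.evolution.columns", "id,name,age,city");
    System.out.println(confMatchesCurrentTable(serDeProperties, configuration));
  }
}
```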
   
   **But I'm not sure whether this is enough. Can someone please help check?**
   
   -------------------------
   
   I also found another way to fix this problem, and I think it is the best one, but it also requires a code change in Hive.
   
   In Hive, in class org.apache.hadoop.hive.ql.exec.MapOperator, we can get the needed columns from `((TableScanOperator) conf.getAliasToWork().get(alias)).getConf().getNeededColumns()` and set them on hconf in `public void setChildren(Configuration hconf)` under a property px.
   
   Then in class org.apache.iceberg.mr.hive.HiveIcebergSerDe in Iceberg, we can read the needed columns correctly from property px.
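
A rough sketch of this second approach, under stated assumptions: plain maps stand in for the Hadoop `Configuration` and for the alias-to-`TableScanOperator` lookup, each alias gets its own copy of the configuration, and the property name `px` is only the placeholder used above. None of these names are existing Hive or Iceberg APIs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Maps stand in for Hadoop Configuration objects.
public class NeededColumnsSketch {

  // Placeholder property name; the description above only calls it "px".
  static final String NEEDED_COLUMNS_PROPERTY = "px";

  // Hive side: give each alias its own copy of the job conf carrying its needed columns.
  public static Map<String, Map<String, String>> confPerAlias(
      Map<String, String> jobConf, Map<String, List<String>> neededColumnsByAlias) {
    Map<String, Map<String, String>> result = new LinkedHashMap<>();
    for (Map.Entry<String, List<String>> entry : neededColumnsByAlias.entrySet()) {
      Map<String, String> copy = new HashMap<>(jobConf);
      copy.put(NEEDED_COLUMNS_PROPERTY, String.join(",", entry.getValue()));
      result.put(entry.getKey(), copy);
    }
    return result;
  }

  // Iceberg side: read the needed columns back from the conf handed to the SerDe.
  public static List<String> neededColumns(Map<String, String> conf) {
    String value = conf.get(NEEDED_COLUMNS_PROPERTY);
    if (value == null || value.isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.asList(value.split(","));
  }

  public static void main(String[] args) {
    Map<String, List<String>> needed = new LinkedHashMap<>();
    needed.put("p1", Arrays.asList("id", "name", "age", "city"));
    needed.put("p2", Arrays.asList("id", "name", "age"));

    Map<String, Map<String, String>> confs = confPerAlias(new HashMap<>(), needed);
    // Each alias now sees only its own column list.
    System.out.println(neededColumns(confs.get("p1")));
    System.out.println(neededColumns(confs.get("p2")));
  }
}
```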
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] liubo1022126 commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-845010938


   @pvary : Yes, it only resets `hive.io.file.readNestedColumn.paths` from the class variable [conf](https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java#L184)
   
   So, as I mentioned in Solution 2 in this PR, we can get the needed columns from [conf](https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java#L184) and then read them as a property in Iceberg, but we also need to patch Hive.
   
   > And I found that there is another way to fix these problem, and I think this way is the best. But we also need to code hive.
   > 
   > With hive, in class org.apache.hadoop.hive.ql.exec.MapOperator, we can get need columns from ((TableScanOperator) conf.getAliasToWork().get(alias)).getConf().getNeededColumns(), and set it in hconf public void setChildren(Configuration hconf) use a property px.
   > 
   > Then in class org.apache.iceberg.mr.hive.HiveIcebergSerDe in iceberg, we can get need columns from property px correctly.




[GitHub] [iceberg] liubo1022126 removed a comment on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 removed a comment on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-844918302


   




[GitHub] [iceberg] pvary commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-844271708


   @liubo1022126: Thanks for looking into this!
   
   - Could you please create the PR against the master branch, and then we can port it to the 0.11 branch if needed (depending on the releases)
   - Could you please add a test case which is failing before the fix and working after the fix? Maybe a new method into the `TestHiveIcebergStorageHandlerWithEngine`?
   
   I am not entirely sure that I understand the root cause of the problem.
   I feel that somehow the list of the projected columns is not correct. If my understanding is correct then this might be similar to #2171, but that is only for the Tez execution engine.
   Having a test case would greatly simplify the understanding of the issue.
   
   Thanks,
   Peter
   




[GitHub] [iceberg] liubo1022126 closed pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 closed pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614


   




[GitHub] [iceberg] liubo1022126 commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-845002242


   @pvary




[GitHub] [iceberg] liubo1022126 removed a comment on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 removed a comment on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-845008728






[GitHub] [iceberg] lcspinter commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
lcspinter commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-853059080


   @liubo1022126 Thanks for the patch! I tried to reproduce the issue with several versions of hive and with both execution engines, but I couldn't. In my repro environment, all the queries passed. 
   If it's not a big ask, could you please create a unit test? Thank you




[GitHub] [iceberg] liubo1022126 removed a comment on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 removed a comment on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-844832360


   
   




[GitHub] [iceberg] liubo1022126 commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-845040516


   @pvary We need a large number of rows in both tables, so a unit test class is not easy to write.
   
   I think I can write the schemas of the 2 tables and upload a data text file somewhere, so that anyone can test with them. What do you think?




[GitHub] [iceberg] pvary commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-845213399


   > I think I can write 2 tables schema and upload data textfile somewhere, anyone can test by them, what do you think about.
   
   So if you can upload the data files somewhere and provide a unit test which reproduces the case, that would be a good start. I am still not sure why we need a large number of rows, but if that is the only way to repro the case, then we should try it.




[GitHub] [iceberg] pvary commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-845030747


   So basically the test case is that we have 2 tables which both have a column with the same name (maybe at the end of the column list), and we select only a few columns from each of the tables, including the column with the same name. Would this catch the issue?
   
   Am I right? Could you please create a test case, then I can check the Hive code and see what we could do with it.
   
   Thanks,
   Peter




[GitHub] [iceberg] liubo1022126 commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-845008728


   @pvary : Yes, it only resets `hive.io.file.readNestedColumn.paths` from the class variable [conf](https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java#L184)
   
   So, as I mentioned in Solution 2 in this PR, we can get the needed columns from [conf](https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java#L184) and then read them as a property in Iceberg
   
   > And I found that there is another way to fix these problem, and I think this way is the best. But we also need to code hive.
   >
   >With hive, in class org.apache.hadoop.hive.ql.exec.MapOperator, we can get need columns from ((TableScanOperator) conf.getAliasToWork().get(alias)).getConf().getNeededColumns(), and set it in hconf public void setChildren(Configuration hconf) use a property px.
   >
   >Then in class org.apache.iceberg.mr.hive.HiveIcebergSerDe in iceberg, we can get need columns from property px correctly.




[GitHub] [iceberg] pvary commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-844985765


   > But what I can sure is [hconf in method param](https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java#L419) is transfer to a Map<TableName, Conf>, two table's conf (contains hive.io.file.readcolumn.names) is same here. that's why get [selectedColumns](https://github.com/apache/iceberg/blob/0.11.x/mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java#L92) wrong.
   
   Reading the code there, `cloneConfsForNestedColPruning(hconf);` creates a map with a conf for every table, and every conf should contain the pruning info for that specific table. See:
   ```
     /**
      * For each source table, combine the nested column pruning information from all its 
      * table scan descriptors and set it in a configuration copy. This is necessary since
      * the configuration property "READ_NESTED_COLUMN_PATH_CONF_STR" is set on a per-table
      * basis, so we can't just use a single configuration for all the tables.
      */
     private Map<String, Configuration> cloneConfsForNestedColPruning(Configuration hconf) {
   ```
   
   So you have found that the configs are not correctly set by this method and they contain the same pruning information for both tables?
   
   Thanks,
   Peter




[GitHub] [iceberg] liubo1022126 commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-846538833


   @pvary ok, have a great holiday.




[GitHub] [iceberg] liubo1022126 commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-844950381


   Thanks @pvary :
   
   My problem should overlap with the problem in #2171, and I will take the time to write test code for it.
   
   While fixing the problem earlier, I tried to replace [projectedSchema](https://github.com/apache/iceberg/blob/0.11.x/mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java#L106) with [this.inspector = IcebergObjectInspector.create(tableSchema)] **[1]**, and then I also got an `ArrayIndexOutOfBoundsException` at [getStructFieldData](https://github.com/apache/iceberg/blob/0.11.x/mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java#L73-L76), because [Object o] holds only part of the selected columns while [StructField structField] covers all columns after init, which is a consequence of modification **[1]**.
   
   I think #2171 can also solve my problem. Although I did not use Tez, our fundamental problem is the same.
   
   Because I have no experience with Hive, I have not found out why the projectedSchema in hconf is incorrect.
   
   But what I am sure of is that [hconf in method param](https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java#L419) is transformed into a Map<TableName, Conf>, and the two tables' confs (which contain hive.io.file.readcolumn.names) are the same here. That is why [selectedColumns](https://github.com/apache/iceberg/blob/0.11.x/mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java#L92) comes out wrong.
   
   But the class variable [conf](https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java#L184) in the superclass is OK and contains the info for all tables; see [here](https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java#L350) and [here](https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/MapOperator.java#L354)




[GitHub] [iceberg] liubo1022126 commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-844832360


   > @liubo1022126: Thanks for looking into this!
   > 
   > * Could you please create the PR against the master branch, and then we can port it to the 0.11 branch if needed (depending on the releases)
   > * Could you please add a test case which is failing before the fix and working after the fix? Maybe a new method into the `TestHiveIcebergStorageHandlerWithEngine`?
   > 
   > I am not entirely sure that I understand the root cause of the problem.
   > I feel that somehow the list of the projected columns are not correct. If my understanding is correct then this might be similar to #2171 but that is only for Tez execution engine.
   > Having a test case would greatly simplify the understanding of the issue.
   > 
   > Thanks,
   > Peter
   
   




[GitHub] [iceberg] liubo1022126 commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-845924472


   @pvary Sorry for the late reply, I was a bit busy today. The right table needs a large number of rows; I guess this is related to CBO, which affects the execution plan.
   
    [Here](https://github.com/liubo1022126/data_lib/blob/main/iceberg/pr-2614/README.md) are the steps to reproduce the problem, and [orc_000000_0](https://github.com/liubo1022126/data_lib/tree/main/iceberg/pr-2614) is the data file.




[GitHub] [iceberg] pvary commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-845949077


   @liubo1022126: Thanks for collecting these. Because of some national holidays next week, I will need some time to take a look into it.




[GitHub] [iceberg] liubo1022126 removed a comment on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 removed a comment on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-845002242


   @pvary




[GitHub] [iceberg] liubo1022126 commented on pull request #2614: MR: Fix selectedColumns not belong to current Map (#2567)

Posted by GitBox <gi...@apache.org>.
liubo1022126 commented on pull request #2614:
URL: https://github.com/apache/iceberg/pull/2614#issuecomment-844918302


   Thanks @pvary : 
   
   My problem should overlap with the problem in https://github.com/apache/iceberg/pull/2171. 
   
   While fixing the problem earlier, I tried to replace [projectedSchema](https://github.com/apache/iceberg/blob/0.11.x/mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergSerDe.java#L106) with `this.inspector = IcebergObjectInspector.create(tableSchema)` **[1]**, and then I also got an `ArrayIndexOutOfBoundsException` at [IcebergRecordObjectInspector](https://github.com/apache/iceberg/blob/0.11.x/mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java#L73-L76), because `Object o` holds only part of the selected columns while `StructField structField` covers all columns after init, which is a consequence of modification **[1]**.
   
   I think https://github.com/apache/iceberg/pull/2171 can also solve my problem. Although I did not use Tez, our fundamental problem is the same.
   
   
   But because I have no experience with Hive, I have not found out why the projectedSchema in hconf is incorrect.
   

