Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/09/11 20:34:08 UTC

[GitHub] [iceberg] HotSushi opened a new issue #1445: Uppercased schemas are not readable in Iceberg-mr/ hive

HotSushi opened a new issue #1445:
URL: https://github.com/apache/iceberg/issues/1445


   I wrote a simple test that reads a schema with uppercase field names in iceberg-mr, but it fails.
   
   The schema is as follows:
   ```
   Schema(
       required(1, "Data", Types.StructType.of(
           required(2, "Case1", Types.BooleanType.get())
       ))
   )
   ```
   If you run a simple `SELECT * FROM table` query with HiveRunner, it fails with the following error:
   ```
   java.lang.RuntimeException: cannot find field data from [org.apache.iceberg.mr.hive.serde.objectinspector.IcebergRecordObjectInspector$IcebergRecordStructField@f45265b5]
   	at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:523)
   	at org.apache.iceberg.mr.hive.serde.objectinspector.IcebergRecordObjectInspector.getStructFieldRef(IcebergRecordObjectInspector.java:68)
   	at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:56)
   	at org.apache.hadoop.hive.ql.exec.Operator.initEvaluators(Operator.java:1033)
   	at org.apache.hadoop.hive.ql.exec.Operator.initEvaluatorsAndReturnStruct(Operator.java:1059)
   	at org.apache.hadoop.hive.ql.exec.SelectOperator.initializeOp(SelectOperator.java:75)
   	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:366)
   	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:556)
   	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:508)
   	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
   	at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:88)
   ```
   
   The cause is that `ObjectInspectorUtils.getStandardStructFieldRef` lowercases the requested field name (i.e. `data`) before looking it up, whereas `IcebergRecordObjectInspector` reports the field name in its original case (i.e. `Data`), so the lookup finds no match.
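   
   To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use Hive's classes: `findField` is a hypothetical stand-in that assumes the lookup lowercases the requested name and then compares it to each stored name with a case-sensitive `equals`, which is the behaviour described above.
   
   ```java
   import java.util.List;
   import java.util.Locale;

   public class FieldLookupSketch {
     // Hypothetical stand-in for the lookup described above (not Hive's code):
     // the requested name is lowercased, then compared case-sensitively.
     static int findField(String requested, List<String> storedNames) {
       String lowered = requested.toLowerCase(Locale.ROOT);
       for (int i = 0; i < storedNames.size(); i++) {
         if (storedNames.get(i).equals(lowered)) {
           return i;
         }
       }
       return -1; // not found; the real lookup throws "cannot find field ..."
     }

     public static void main(String[] args) {
       // Stored name keeps its original case: the lookup misses.
       System.out.println(findField("Data", List.of("Data"))); // prints -1
       // Stored name is lowercase: the lookup succeeds.
       System.out.println(findField("Data", List.of("data"))); // prints 0
     }
   }
   ```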
   
   The following workaround works, but I'm not sure it is worth pursuing, as all field names in structs would be lowercased.
   ```
   --- a/mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java
   +++ b/mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java
   @@ -125,7 +125,7 @@ public final class IcebergRecordObjectInspector extends StructObjectInspector {
    
        @Override
        public String getFieldName() {
   -      return field.name();
   +      return field.name().toLowerCase();
        }
   ``` 
   
   Here's the complete test: [link](https://gist.github.com/HotSushi/be2439941675aa3a01c514174e8fbf74)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] omalley commented on issue #1445: Uppercased schemas are not readable in Iceberg-mr/ hive

Posted by GitBox <gi...@apache.org>.
omalley commented on issue #1445:
URL: https://github.com/apache/iceberg/issues/1445#issuecomment-691335791


   I thought Hive only lowercases the top level column names. Does it also lowercase the fields in structs?




[GitHub] [iceberg] HotSushi commented on issue #1445: Uppercased schemas are not readable in Iceberg-mr/ hive

Posted by GitBox <gi...@apache.org>.
HotSushi commented on issue #1445:
URL: https://github.com/apache/iceberg/issues/1445#issuecomment-692244861


   @pvary This error occurs even before InputSplits are formed, so the TableScan is not yet configured. It seems to occur at query compile time, when Hive tries to construct the Select operator for Column[Data].
   
   @omalley I think it's the other way round. If I run "select Data.Case1 from table", Hive tries to create a select operator for "Column[Data].case1". So top-level columns keep their original case, but fields in structs are lowercased.
   
   




[GitHub] [iceberg] pvary commented on issue #1445: Uppercased schemas are not readable in Iceberg-mr/ hive

Posted by GitBox <gi...@apache.org>.
pvary commented on issue #1445:
URL: https://github.com/apache/iceberg/issues/1445#issuecomment-691416873


   > I thought Hive only lowercases the top level column names. Does it also lowercase the fields in structs?
   
   Maybe this is because we use the lowercase config when we start the TableScan, and this config might lowercase the struct fields too?





[GitHub] [iceberg] HotSushi commented on issue #1445: Uppercased schemas are not readable in Iceberg-mr/ hive

Posted by GitBox <gi...@apache.org>.
HotSushi commented on issue #1445:
URL: https://github.com/apache/iceberg/issues/1445#issuecomment-692993115


   @edgarRd @guilload @cmathiesen  Any thoughts? 
   
   It seems that the [StructField](https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/StructField.java#L33) implementation requires getFieldName() to return a lowercase name (unlike what we are doing [here](https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/hive/serde/objectinspector/IcebergRecordObjectInspector.java#L128)).
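   
   One way to satisfy that requirement without losing the declared case is sketched below. This is only an illustration: `Field` and `originalName()` are hypothetical names, not part of Hive's or Iceberg's API. The wrapper reports a lowercase name through `getFieldName()` and keeps the original name available separately.
   
   ```java
   import java.util.Locale;

   public class LowercasedFieldSketch {
     // Hypothetical field wrapper, not Iceberg's or Hive's actual class.
     static final class Field {
       private final String originalName;

       Field(String originalName) {
         this.originalName = originalName;
       }

       // Name reported to the query engine: always lowercase.
       String getFieldName() {
         return originalName.toLowerCase(Locale.ROOT);
       }

       // Name as declared in the table schema, case preserved.
       String originalName() {
         return originalName;
       }
     }

     public static void main(String[] args) {
       Field field = new Field("Case1");
       System.out.println(field.getFieldName()); // prints case1
       System.out.println(field.originalName()); // prints Case1
     }
   }
   ```
   
   `Locale.ROOT` is used so that the lowercased name does not depend on the JVM's default locale.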



