You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/21 17:34:34 UTC

[GitHub] [iceberg] boroknagyz opened a new pull request #3947: Hive: HiveSchemaConverter should use one-based indexing like Iceberg …

boroknagyz opened a new pull request #3947:
URL: https://github.com/apache/iceberg/pull/3947


   …does


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] boroknagyz edited a comment on pull request #3947: Hive: HiveSchemaConverter should use one-based indexing like Iceberg …

Posted by GitBox <gi...@apache.org>.

boroknagyz edited a comment on pull request #3947:
URL: https://github.com/apache/iceberg/pull/3947#issuecomment-1019872087


   Another thing is that you can create a partitioned Iceberg table in Hive by setting the table property 'iceberg.mr.table.partition.spec':
   https://github.com/apache/hive/blob/392c4f259ad57d746280790a21e560319ace4c8d/iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q#L11
   
   To define this property, the user has to be aware of details of the Iceberg specification, i.e. field ids, source ids, and so on. Without this change the user would need to use different ids than the ids assigned by Iceberg later, which I think can be very confusing.
   
   We don't have the corresponding tests here I think, but in the Hive repo some tests would fail if Iceberg starts to use different field id assignment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on pull request #3947: Hive: HiveSchemaConverter should use one-based indexing like Iceberg …

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #3947:
URL: https://github.com/apache/iceberg/pull/3947#issuecomment-1019001036


   I think that this change is fine, but you should be able to use any IDs that you choose so it isn't really a functional change. IDs are reassigned when we create an Iceberg table anyway, so they're for internal consistency. The IDs produced here shouldn't really matter.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] pvary commented on pull request #3947: Hive: HiveSchemaConverter should use one-based indexing like Iceberg …

Posted by GitBox <gi...@apache.org>.

pvary commented on pull request #3947:
URL: https://github.com/apache/iceberg/pull/3947#issuecomment-1019071181


   @rdblue: We have a corresponding change in the Hive code where during the migration of an external Hive table to Iceberg table we would like to set the "schema.name-mapping.default", so Impala could read the migrated table.
   
   We can do in a different way:
   - When Iceberg rewrites the schema, it should rewrite the `schema.name-mapping.default` value too - This might be a more general solution than depending on the Hive schema generation creating the same ids as the Iceberg schema generation. On the other hand I am not sure this worth the added complexity.
   
   What are your thoughts Ryan?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] boroknagyz commented on pull request #3947: Hive: HiveSchemaConverter should use one-based indexing like Iceberg …

Posted by GitBox <gi...@apache.org>.

boroknagyz commented on pull request #3947:
URL: https://github.com/apache/iceberg/pull/3947#issuecomment-1019872087


   Another thing is that you can create a partitioned Iceberg table in Hive by setting the table property 'iceberg.mr.table.partition.spec':
   https://github.com/apache/hive/blob/392c4f259ad57d746280790a21e560319ace4c8d/iceberg/iceberg-handler/src/test/queries/positive/describe_iceberg_table.q#L11
   
   To define this property, the user has to be aware of details of the Iceberg specification, i.e. field ids, source ids, and so on. Without this change the user would need to use different ids than the ids assigned by Iceberg later, which I think can be very confusing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on pull request #3947: Hive: HiveSchemaConverter should use one-based indexing like Iceberg …

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #3947:
URL: https://github.com/apache/iceberg/pull/3947#issuecomment-1019535657


   @pvary, it sounds like what you're saying is that you're generating a schema, then a name mapping, and then creating a table and want the table schema to match the name mapping. I think it makes the most sense to create the table, then base the name mapping on the schema, and then add it to the table. That way you don't rely on the IDs that you pass into create table -- the IDs passed in are reassigned and we make no guarantees about them.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] boroknagyz commented on pull request #3947: Hive: HiveSchemaConverter should use one-based indexing like Iceberg …

Posted by GitBox <gi...@apache.org>.

boroknagyz commented on pull request #3947:
URL: https://github.com/apache/iceberg/pull/3947#issuecomment-1031287126


   The HiveSchemaConverter in the Hive repo already have this change:
   https://github.com/apache/hive/blob/4b7a948e45fd88372fef573be321cda40d189cc7/iceberg/iceberg-catalog/src/main/java/org/apache/iceberg/hive/HiveSchemaConverter.java#L50
   
   Can we move forward with this PR? Or should we do things differently?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] pvary commented on pull request #3947: Hive: HiveSchemaConverter should use one-based indexing like Iceberg …

Posted by GitBox <gi...@apache.org>.

pvary commented on pull request #3947:
URL: https://github.com/apache/iceberg/pull/3947#issuecomment-1019864347


   > @pvary, it sounds like what you're saying is that you're generating a schema, then a name mapping, and then creating a table and want the table schema to match the name mapping. I think it makes the most sense to create the table, then base the name mapping on the schema, and then add it to the table. That way you don't rely on the IDs that you pass into create table -- the IDs passed in are reassigned and we make no guarantees about them.
   
   Wouldn't this create a transiently invalid table? Also this would create an extra write to the S3 as we need to create a second metadata file.
   
   Maybe some other way to provide the mapping would be better.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org