Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/07/01 19:13:57 UTC

[GitHub] [iceberg] guilload commented on issue #1155: Support HiveCatalog for Iceberg StorageHandler

guilload commented on issue #1155:
URL: https://github.com/apache/iceberg/issues/1155#issuecomment-652598657


   I have some thoughts regarding this. Now that the SerDe has been merged, there are two pieces of code that do essentially the same thing:
   - [IcebergInputFormat.findTable(...)](https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapreduce/IcebergInputFormat.java#L357)
   - [TableResolver.resolveTableFromConfiguration(...)](https://github.com/apache/iceberg/blob/master/mr/src/main/java/org/apache/iceberg/mr/mapred/TableResolver.java#L45)
   
   1. Those two classes should probably be consolidated into one.
   2. The ability to define a custom catalog loader is neat. We actually use our own catalog at Airbnb and that feature comes in handy.
   3. At the same time, most users are going to use either the default Hadoop or Hive catalogs, so let's make it easy for them.
   4. I'm personally not a fan of using a single "table path" configuration property to pass either a file path or a table identifier, and then having to look for the `/` character to know which use is intended.
   
   Roughly, I have something like this in mind:
   
   ```python
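   def find_table(conf):
     # A table is addressed either by path or by identifier, never both.
     assert conf.get('table.identifier') is None or conf.get('table.path') is None
     # Likewise, a built-in catalog and a custom loader are mutually exclusive.
     assert conf.get('catalog') is None or conf.get('catalog.loader.class') is None
   
     # Path-based tables bypass the catalog entirely.
     if conf.get('table.path'):
       return HadoopTables.load(conf.get('table.path'))
   
     # A custom catalog loader takes over resolution when configured.
     if conf.get('catalog.loader.class'):
       loader = load_class(conf.get('catalog.loader.class'))
       return loader.load(conf)
   
     identifier = conf.get('table.identifier')
   
     # Otherwise fall back to one of the built-in catalogs.
     if conf.get('catalog') == 'hadoop':
       return HadoopCatalog.load(identifier)
     elif conf.get('catalog') == 'hive':
       return HiveCatalog.load(identifier)
     else:
       raise ...  # unknown or missing catalog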
   ```
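
The dispatch order above can be exercised on its own with a minimal, self-contained sketch. Everything here is an assumption made for illustration: the three `load_*` functions are hypothetical stand-ins that return a descriptive string rather than calling the real Java `HadoopTables`, `HadoopCatalog`, or `HiveCatalog` classes, the configuration is a plain dict, and custom loaders are looked up in a dict instead of being loaded reflectively by class name.

```python
from typing import Callable, Dict, Mapping, Optional

# Hypothetical stand-ins for the real table/catalog classes: each one just
# reports which resolution route was taken.
def load_hadoop_table(path: str) -> str:
    return f"hadoop-tables:{path}"

def load_from_hadoop_catalog(identifier: str) -> str:
    return f"hadoop-catalog:{identifier}"

def load_from_hive_catalog(identifier: str) -> str:
    return f"hive-catalog:{identifier}"

BUILTIN_CATALOGS: Dict[str, Callable[[str], str]] = {
    "hadoop": load_from_hadoop_catalog,
    "hive": load_from_hive_catalog,
}

def find_table(
    conf: Mapping[str, str],
    custom_loaders: Optional[Mapping[str, Callable[[Mapping[str, str]], str]]] = None,
) -> str:
    path = conf.get("table.path")
    identifier = conf.get("table.identifier")
    catalog = conf.get("catalog")
    loader_name = conf.get("catalog.loader.class")

    # Reject ambiguous configurations up front instead of guessing intent.
    if path is not None and identifier is not None:
        raise ValueError("set only one of 'table.path' and 'table.identifier'")
    if catalog is not None and loader_name is not None:
        raise ValueError("set only one of 'catalog' and 'catalog.loader.class'")

    # 1. Path-based tables do not need a catalog.
    if path is not None:
        return load_hadoop_table(path)

    # 2. A custom loader receives the whole configuration.
    if loader_name is not None:
        loader = (custom_loaders or {}).get(loader_name)
        if loader is None:
            raise ValueError(f"unknown catalog loader: {loader_name!r}")
        return loader(conf)

    # 3. Built-in catalogs resolve a table identifier.
    if identifier is None:
        raise ValueError("either 'table.path' or 'table.identifier' is required")
    if catalog not in BUILTIN_CATALOGS:
        raise ValueError(f"unknown catalog: {catalog!r}")
    return BUILTIN_CATALOGS[catalog](identifier)
```

Because the path and the identifier live under separate keys, nothing has to inspect the value for a `/` to decide which route applies, and a configuration that sets both fails immediately.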

