Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/16 02:34:02 UTC

[GitHub] [iceberg] kbendick commented on pull request #5516: API: Add Generate Symlink Manifest API

kbendick commented on PR #5516:
URL: https://github.com/apache/iceberg/pull/5516#issuecomment-1216075644

   > While there are definitely caveats like renaming column name case and presence of V2 delete files that we should warn about, I also agree with @jackye1995 , generate SymlinkManifest seems quite useful, we have seen some asks for interop of data in iceberg table to non-Iceberg systems (in house file-based tools for instance).
   > 
   > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/SymlinkTextInputFormat.java seems a pretty standard way to provide file listing, and both generating/reading seems supported in a variety of data warehouse systems.
   
   I strongly share @rdblue’s concerns about properly reading deletes and can see potential concerns about schema evolution, but I have seen that the symlink text input format is pretty useful for interop, and _usually, in my experience_, it’s (re)loaded every time it’s needed. That might not be the common case, though. Does it have to be rebuilt entirely every time?
   
   I think it has a lot of value for interop with specific BI tools that are used for presentations etc., where the end user really doesn’t have a choice of tool.
   
   In my experience, I’ve seen this needed especially for specific one-off things like financial report demonstrations where the BI tool is for some reason set in stone.
   
   I’m not familiar enough with the format and the concerns around schema evolution (again, my usage of it has usually been one-off, or rebuild / reload on every “refresh”), but I do think this would be good for Iceberg overall.
   
   But ensuring that the schema is correct and that deletes are properly applied every time the manifest is created needs to be done with a lot of care.
   
   I see the value at least for BI tools that wouldn’t support a custom format at all (this is much less efficient than reading the Iceberg table directly, after all). I share the concern about giving other tools a reason to skip supporting Iceberg natively. But maybe this will provide those tools more of an incentive in the long run.
   
   My 2 cents on the matter. 100% happy to learn more about the format’s specifics to help support it if need be. 🙂 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
