Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/23 08:28:06 UTC

[GitHub] [iceberg] johndb2016 opened a new issue, #4839: Can Iceberg support different FileIO for meta and data?

johndb2016 opened a new issue, #4839:
URL: https://github.com/apache/iceberg/issues/4839

   I'd like to write metadata to a local SSD but data to S3 with Iceberg, and I am wondering whether Iceberg supports that.
   Should I provide two kinds of FileIO?
   Any help will be appreciated ~


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on issue #4839: Can Iceberg support different FileIO for meta and data?

Posted by GitBox <gi...@apache.org>.
nastra commented on issue #4839:
URL: https://github.com/apache/iceberg/issues/4839#issuecomment-1134438730

   I don't think this is currently supported. In fact we have a potential use case where having this would be helpful and I was planning to work on a prototype for this exact feature this week.




[GitHub] [iceberg] johndb2016 closed issue #4839: Can Iceberg support different FileIO for meta and data?

Posted by GitBox <gi...@apache.org>.
johndb2016 closed issue #4839: Can Iceberg support different FileIO for meta and data? 
URL: https://github.com/apache/iceberg/issues/4839




[GitHub] [iceberg] singhpk234 commented on issue #4839: Can Iceberg support different FileIO for meta and data?

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on issue #4839:
URL: https://github.com/apache/iceberg/issues/4839#issuecomment-1134554058

   > There's no differentiation between which FileIO implementation should be used for data and metadata files.
   
   Agreed. As I understand it, wouldn't this differentiation happen automatically based on the `location` produced for the metadata and data paths? For example:
   metadata path: `file:///mnt/meta-data`
   data path: `s3://<bucket>/prefix`
   (this can be controlled via configs such as `write.metadata.path`).
   
   A user who configures `ResolvingFileIO` would then implicitly get `HadoopFileIO` for metadata (i.e. paths like `file:///mnt/meta-data`) and `S3FileIO` for data (i.e. paths like `s3://<bucket>/prefix`).
   
   > I think the idea behind this ticket is that users would have the capability to explicitly specify which FileIO implementation should be used for data files and which one for metadata files.
   
   This sounds like an interesting use case: why would we want two different FileIOs when, say, data and metadata share the same scheme? One case that comes to mind is using separately tuned SDK clients for `S3FileIO`: one for reading heavy data files and another for lightweight metadata files.
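
   When data and metadata share a scheme, scheme-based dispatch cannot tell them apart, so something other than the scheme has to drive the choice. A minimal sketch of one possible approach, routing by location prefix, is shown below. `PrefixRoutingIO`, the bucket and table paths, and the client labels are all hypothetical names for illustration; this is not an Iceberg API.

```python
class PrefixRoutingIO:
    """Hypothetical router (not an Iceberg class): picks a client by location prefix."""

    def __init__(self, routes, default):
        # Sort longest prefix first so the most specific route wins.
        self._routes = sorted(routes.items(), key=lambda kv: len(kv[0]), reverse=True)
        self._default = default

    def io_for(self, location):
        # Return the client registered for the first (longest) matching prefix.
        for prefix, io in self._routes:
            if location.startswith(prefix):
                return io
        return self._default


# Both routes use the s3 scheme; only the prefix distinguishes them.
# The strings stand in for separately tuned client objects.
router = PrefixRoutingIO(
    routes={
        "s3://example-bucket/warehouse/tbl/metadata/": "metadata-tuned-client",
        "s3://example-bucket/warehouse/tbl/data/": "data-tuned-client",
    },
    default="default-client",
)

print(router.io_for("s3://example-bucket/warehouse/tbl/data/part-0.parquet"))
print(router.io_for("s3://example-bucket/warehouse/tbl/metadata/v1.metadata.json"))
```

   Prefix routing only works if data and metadata are written under distinct locations; an explicit per-file-type setting, as discussed in this issue, would not have that restriction.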




[GitHub] [iceberg] nastra commented on issue #4839: Can Iceberg support different FileIO for meta and data?

Posted by GitBox <gi...@apache.org>.
nastra commented on issue #4839:
URL: https://github.com/apache/iceberg/issues/4839#issuecomment-1134566590

   The use case I have in mind is, for example: what if users would like to use S3+GCS, or simply have different S3 locations for data and metadata files?




[GitHub] [iceberg] singhpk234 commented on issue #4839: Can Iceberg support different FileIO for meta and data?

Posted by GitBox <gi...@apache.org>.
singhpk234 commented on issue #4839:
URL: https://github.com/apache/iceberg/issues/4839#issuecomment-1134456788

   [doubt] Wouldn't [ResolvingFileIO](https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/io/ResolvingFileIO.java) work here?
   
   This FileIO decides which FileIO implementation to use based on the location, and stores the instantiated FileIO in a local cache to avoid re-instantiation.
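
   The dispatch-and-cache behaviour described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual `ResolvingFileIO` code: `SchemeResolvingIO`, `StubFileIO`, the example bucket, and the scheme table are assumptions made for the example, with the mapping taken from what is stated in this thread.

```python
from urllib.parse import urlparse

# Assumed mapping, based on the schemes mentioned in this thread.
SCHEME_TO_IMPL = {"s3": "S3FileIO", "s3a": "S3FileIO", "s3n": "S3FileIO"}
FALLBACK_IMPL = "HadoopFileIO"  # used for any scheme not listed above


class StubFileIO:
    """Placeholder for a real FileIO instance; only records its implementation name."""

    def __init__(self, impl):
        self.impl = impl


class SchemeResolvingIO:
    """Hypothetical sketch: choose an implementation by URI scheme and cache instances."""

    def __init__(self):
        self._cache = {}  # implementation name -> instantiated StubFileIO

    def impl_for(self, location):
        # Look up the implementation name from the location's scheme.
        scheme = urlparse(location).scheme
        return SCHEME_TO_IMPL.get(scheme, FALLBACK_IMPL)

    def io_for(self, location):
        # Instantiate each implementation at most once, then reuse it.
        impl = self.impl_for(location)
        if impl not in self._cache:
            self._cache[impl] = StubFileIO(impl)
        return self._cache[impl]


resolver = SchemeResolvingIO()
print(resolver.io_for("s3://example-bucket/prefix/data.parquet").impl)
print(resolver.io_for("file:///mnt/meta-data/v1.metadata.json").impl)
```

   Note that the choice depends only on the scheme of the location, not on whether the file is a data file or a metadata file.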




[GitHub] [iceberg] nastra commented on issue #4839: Can Iceberg support different FileIO for meta and data?

Posted by GitBox <gi...@apache.org>.
nastra commented on issue #4839:
URL: https://github.com/apache/iceberg/issues/4839#issuecomment-1134512854

   IIUC `ResolvingFileIO` has the capability to use different schemes, where `s3` / `s3n` / `s3a` all resolve to `S3FileIO` and `fs` resolves to `HadoopFileIO`. There's no differentiation between which FileIO implementation should be used for data and metadata files.
   
   I think the idea behind this ticket is that users would have the capability to explicitly specify which FileIO implementation should be used for data files and which one for metadata files.




[GitHub] [iceberg] johndb2016 commented on issue #4839: Can Iceberg support different FileIO for meta and data?

Posted by GitBox <gi...@apache.org>.
johndb2016 commented on issue #4839:
URL: https://github.com/apache/iceberg/issues/4839#issuecomment-1134574043

   Yep, ResolvingFileIO works for my case, but it would be much better if Iceberg provided the capability to set a custom FileIO for data and metadata individually.

