Posted to dev@iceberg.apache.org by John Clara <jo...@gmail.com> on 2020/11/11 21:07:28 UTC

Suggested S3 FileIO/Getting Started

Hello all,

Thank you all for creating/continuing this great project! I am just 
starting to get comfortable with the fundamentals and I'm thinking that 
my team has been using Iceberg the wrong way at the FileIO level.

I was wondering if people would be willing to share how they set up 
their FileIO/FileSystem with S3 and any customizations they had to add.

     (Preferably from smaller teams. My team is small and cannot 
realistically customize everything. If there's an up-to-date thread 
discussing this that I missed, please link me to that instead.)

***** My team's specific problems/setup which you can ignore ***

My team has been using Hadoop FileIO with the S3AFileSystem. Jars are 
provided by AWS EMR 5.23, which is on Hadoop 2.8.5. We use DynamoDB for 
atomic renames by implementing Iceberg's provided interfaces. We 
read/write from either Spark in EMR or on-prem JVMs in Docker 
containers (managed by k8s). Both use s3a, but the EMR clusters have 
HDFS (backed by core nodes) for the s3a buffered writes, while the 
on-prem containers use the Docker container's default file system, 
which uses an overlay2 storage driver (that I know nothing about).
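
Roughly, the wiring looks like this (a minimal sketch, not our exact 
code; the property values and the direct HadoopFileIO construction are 
illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.hadoop.HadoopFileIO;
    import org.apache.iceberg.io.FileIO;

    public class CurrentSetupSketch {
      static FileIO currentFileIO() {
        Configuration conf = new Configuration();
        // s3a:// paths resolve to S3AFileSystem (Hadoop 2.8.5 on EMR 5.23)
        conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        // s3a buffers uploads locally before the PUT: HDFS-backed dirs on
        // the EMR core nodes, the container filesystem (overlay2) on-prem
        conf.set("fs.s3a.buffer.dir", "/mnt/s3a-buffer");
        // HadoopFileIO routes Iceberg reads/writes through the FileSystem
        return new HadoopFileIO(conf);
      }
    }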

Hadoop 2.8.5's S3AFileSystem does a bunch of unnecessary GET and LIST 
requests, which is well known in the community (but unfortunately was 
not known to my team). There are also GET-PUT-GET inconsistency issues 
with S3 that have been talked about, but I don't yet understand how 
they arise in the 2.8.5 S3AFileSystem 
(https://github.com/apache/iceberg/issues/1398).

*** End of specific ***


The options I'm seeing are:

1. Using Iceberg's new S3 FileIO. Is anyone using this in prod?

     This still seems very new, unless it is actually Netflix's prod 
implementation being released to the community. (I'm wondering if it's 
safe to start moving onto it in prod in the near term. If Netflix is 
using it, or rolling it out, that would be more than enough for my 
team.) A rough sketch of this option follows the list below.

2. Using a newer Hadoop version with the S3AFileSystem. Any 
recommendations on a version, and are you also using S3Guard?

     From a quick look, most gains compared to older versions seem to 
come from S3Guard. Are there substantial gains without it? (My team 
doesn't have experience with S3Guard, and Iceberg seems not to need it 
outside of atomic renames?)

3. Using an alternative Hadoop file system. Any recommendations?

     In the recent Iceberg S3 FileIO, the LICENSE states it was based 
on the Presto FileSystem. Has anyone used that file system as-is with 
Iceberg? (https://github.com/apache/iceberg/blob/master/LICENSE#L251)

4. Rolling our own Hadoop file system. Anyone have stories/blogs about 
pitfalls or difficulties?

     rdblue hints that Netflix has already done this: 
https://github.com/apache/iceberg/issues/1398#issuecomment-682837392 . 
(My team probably doesn't have the capacity for this.)
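
For option 1, here is roughly what I think using the new S3 FileIO 
would look like (a sketch only; the constructor taking an S3 client 
supplier, the default AWS SDK v2 client, and the bucket/path are my 
assumptions, not something verified against the actual API):

    import org.apache.iceberg.aws.s3.S3FileIO;
    import org.apache.iceberg.io.FileIO;
    import org.apache.iceberg.io.InputFile;
    import software.amazon.awssdk.services.s3.S3Client;

    public class S3FileIOSketch {
      static void readMetadataFile() {
        // S3FileIO goes straight to S3 through the AWS SDK v2 client,
        // with no Hadoop FileSystem contract checks in between
        FileIO io = new S3FileIO(() -> S3Client.create());
        InputFile metadataFile = io.newInputFile(
            "s3://my-bucket/warehouse/db/table/metadata/v1.metadata.json");
        long length = metadataFile.getLength();  // length comes from object metadata
      }
    }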


Places where I tried looking for this info:

  * https://github.com/apache/iceberg/issues/761 (issue for getting
    started guide)
  * https://iceberg.apache.org/spec/#file-system-operations

Thanks everyone,

John Clara


Re: Suggested S3 FileIO/Getting Started

Posted by Saisai Shao <sa...@gmail.com>.
Thanks a lot Ryan for your explanation, greatly helpful.

Best regards,
Saisai


Re: Suggested S3 FileIO/Getting Started

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
Saisai,

Iceberg's FileIO interface doesn't require guarantees as strict as a
Hadoop-compatible FileSystem. For S3, that allows us to avoid negative
caching that can cause problems when reading a table that has just been
updated. (Specifically, S3A performs a HEAD request to check whether a path
is a directory even for overwrites.)

It's up to users whether they want to use S3FileIO or S3A with
HadoopFileIO, or even HadoopFileIO with a custom FileSystem. We are working
to make it easy to switch between them, but there are notable problems like
ORC not using the FileIO interface yet. I would recommend using S3FileIO to
avoid negative caching, but S3A with S3Guard may be another way to avoid
the problem. I think the default should be to use S3FileIO if it is in the
classpath, and fall back to a Hadoop FileSystem if it isn't.

For cloud providers, I think the question is whether there is a need for
the relaxed guarantees. If there is no need, then using HadoopFileIO and a
FileSystem works great and is probably the easiest way to maintain an
implementation for the object store.
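
A minimal sketch of what switching looks like from the Spark side with 
the dynamically loaded FileIO (the catalog name and the io-impl 
property spelling here are assumptions for illustration, not a 
documented recipe):

    import org.apache.spark.SparkConf;

    public class CatalogFileIOSwitch {
      static SparkConf icebergCatalogConf() {
        return new SparkConf()
            .set("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
            .set("spark.sql.catalog.demo.type", "hive")
            // point the catalog's FileIO at S3FileIO; dropping this property
            // falls back to HadoopFileIO and whatever Hadoop FileSystem
            // handles the table's paths (e.g. S3A)
            .set("spark.sql.catalog.demo.io-impl", "org.apache.iceberg.aws.s3.S3FileIO");
      }
    }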


-- 
Ryan Blue
Software Engineer
Netflix

Re: Suggested S3 FileIO/Getting Started

Posted by Saisai Shao <sa...@gmail.com>.
Hi all,

Sorry to chime in; I also have the same concern about using Iceberg with
object storage.

> One of my concerns with S3FileIO is getting tied too much to a single
> cloud provider. I'm wondering if an ObjectStoreFileIO would be helpful
> so that S3FileIO and (a future) GCSFileIO could share logic? I haven't
> looked deep enough into the S3FileIO to know how much logic is not s3
> specific. Maybe the FileIO interface is enough.

Now that we have an S3-specific FileIO implementation, is it recommended to
use it instead of an HCFS implementation like s3a? Also, each public cloud
provider has its own HCFS implementation for its own object storage. Are we
going to suggest creating a specific FileIO implementation or using the
existing HCFS implementations?

Thanks
Saisai



Re: Suggested S3 FileIO/Getting Started

Posted by Daniel Weeks <dw...@netflix.com.INVALID>.
Hey John, about the concerns around cloud provider dependency, I feel like
the FileIO interface is actually the right level of abstraction already.

That interface basically requires "open for read" and "open for write",
where the implementation will diverge across different platforms.
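
For reference, a trimmed-down sketch of the shape of that interface 
(paraphrased here, not the exact source):

    import org.apache.iceberg.io.InputFile;
    import org.apache.iceberg.io.OutputFile;

    // roughly what org.apache.iceberg.io.FileIO asks of a storage
    // implementation; lifecycle/close methods are omitted in this sketch
    interface FileIOShape {
      InputFile newInputFile(String path);    // "open for read"
      OutputFile newOutputFile(String path);  // "open for write"
      void deleteFile(String path);           // delete a file by full path
    }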

I guess you could think of it as S3FileIO is to FileIO what S3AFileSystem
is to FileSystem (in Hadoop).  You can have many different implementations
that coexist.

In fact, recent changes to the Catalog allow for very flexible management
of FileIO, and you could even have files within a table split across
multiple cloud vendors.

As to the consistency questions, the list operation can be inconsistent
(e.g. if a new file is created and the implementation relies on list then
read, it may not see newly created objects; Iceberg does not list, so that
should not be an issue).

The stated read-after-write consistency is limited and does not include:
 - Read after overwrite
 - Read after delete
 - Read after negative cache (e.g. a GET or HEAD that occurred before the
object was created).

Some of those inconsistencies have caused problems in certain cases when it
comes to committing data (the negative cache being the main culprit).

-Dan



Re: Suggested S3 FileIO/Getting Started

Posted by John Clara <jo...@gmail.com>.
Update: I think I'm wrong about the listing part. I think it will only 
do the HEAD request. Also it seems like the consistency issue is 
probably not something my team would encounter with our current jobs.

On 2020/11/12 02:17:10, John Clara <j....@gmail.com> wrote:
> (Not sure if this is actually replying or just starting a new thread)
>
> Hi Daniel,
>
> Thanks for the response! It's very helpful and answers a lot of my
> questions.
>
> A couple follow ups:
>
> One of my concerns with S3FileIO is getting tied too much to a single
> cloud provider. I'm wondering if an ObjectStoreFileIO would be helpful
> so that S3FileIO and (a future) GCSFileIO could share logic? I haven't
> looked deep enough into the S3FileIO to know how much logic is not s3
> specific. Maybe the FileIO interface is enough.
>
> About consistency (no need to respond here):
> I'm seeing that during "getFileStatus" my version of s3a does some list
> requests (but I'm not sure if that could fail from consistency issues).
> I'm also confused about the read-after-(initial) write part:
> "Amazon S3 provides read-after-write consistency for PUTS of new objects
> in your S3 bucket in all Regions with one caveat. The caveat is that if
> you make a HEAD or GET request to a key name before the object is
> created, then create the object shortly after that, a subsequent GET
> might not return the object due to eventual consistency." -
> https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html
>
> When my version of s3a does a create, it first does a getMetadataRequest
> (HEAD) to check if the object exists before creating the object. I think
> this is talked about in this issue:
> https://github.com/apache/iceberg/issues/1398 and talked about in the
> S3FileIO PR: https://github.com/apache/iceberg/pull/1573. I'll follow up
> in that issue for more info.
>
> John
>
> On 2020/11/12 00:36:10, Daniel Weeks <d....@netflix.com.INVALID> wrote:
> > Hey John, I might be able to help answer some of your questions and
> > provide some context around how you might want to go forward.
> >
> > So, one fundamental aspect of Iceberg is that it only relies on a few
> > operations (as defined by the FileIO interface). This makes much of the
> > functionality and complexity of full file system implementations
> > unnecessary. You should not need features like S3Guard or additional S3
> > operations these implementations rely on in order to achieve file system
> > contract behavior. Consistency issues should also not be a problem since
> > Iceberg does not overwrite or list, and read-after-(initial)write is a
> > guarantee provided by S3.
> >
> > At Netflix, we use a custom FileSystem implementation (somewhat like
> > S3A), but with much of the contract behavior that drives additional
> > operations against S3 disabled. However, we are transitioning to a more
> > native implementation of S3FileIO, which you'll see as part of the
> > ongoing work in Iceberg.
> >
> > Per your specific questions:
> >
> > 1) The S3FileIO implementation is very new, though internally we have
> > something very similar. There are features missing that we are working
> > to add (e.g. progressive multipart upload for large files is likely the
> > most important).
> > 2) You can use S3AFileSystem with the HadoopFileIO implementation,
> > though you may still see similar behavior with additional calls being
> > made (I don't know if these can be disabled).
> > 3) The PrestoS3FileSystem is tailored to Presto's use and is likely not
> > as complete as S3A, but seeing as it is using the Hadoop FileSystem api,
> > it would likely work for what HadoopFileIO exercises (as would the
> > EMRFileSystem).
> > 4) I would probably discourage you from writing your own file system as
> > the S3FileIO will likely be a more optimized implementation for what
> > Iceberg needs.
> >
> > If you want to contribute or have time to help contribute to S3FileIO,
> > that is the path I would recommend. As for configuration, I would say a
> > lot of it comes down to how to configure the AWS S3 Client that you
> > provide to the S3FileIO implementation, but a lot of the defaults are
> > reasonable (you might want to tweak a few like max connections and maybe
> > the retry policy).
> >
> > The recently committed work to dynamically load your FileIO should make
> > it relatively easy to test out and we'd love to have extra eyes and
> > feedback on it.
> >
> > Let me know if that helps,
> > -Dan
 > >>
 > >>
 > >>
 > > On Wed, Nov 11, 2020 at 1:45 PM John Clara <jo...@gmail.com>>>
 > > wrote:>>
 > >>
 > > > Hello all,>>
 > > >>>
 > > > Thank you all for creating/continuing this great project! I am 
just>>
 > > > starting to get comfortable with the fundamentals and I'm thinking >
 > that my>>
 > > > team has been using Iceberg the wrong way at the FileIO level.>>
 > > >>>
 > > > I was wondering if people would be willing to share how they set 
up >
 > their>>
 > > > FileIO/FileSystem with S3 and any customizations they had to add.>>
 > > >>>
 > > > (Preferably from smaller teams. My team is small and cannot>>
 > > > realistically customize everything. If there's an up to date 
thread>>
 > > > discussing this that I missed, please link me that instead.)>>
 > > >>>
 > > > ***** My team's specific problems/setup which you can ignore ***>>
 > > >>>
 > > > My team has been using Hadoop FileIO with the S3AFileSystem. Jars 
are>>
 > > > provided by AWS EMR 5.23 which is on Hadoop 2.8.5. We use DynamoDB >
 > for>>
 > > > atomic renames by implementing Iceberg's provided interfaces. We >
 > read/write>>
 > > > from either Spark in EMR or on-prem JVM's in docker containers >
 > (managed by>>
 > > > k8s). Both use s3a, but the EMR clusters have HDFS (backed by core >
 > nodes)>>
 > > > for the s3a buffered writes while the on-prem containers use the >
 > docker>>
 > > > container's default file system which uses an overlay2 storage >
 > driver (that>>
 > > > I know nothing about).>>
 > > >>>
 > > > Hadoop 2.8.5's S3AFileSystem does a bunch of unnecessary get and 
list>>
 > > > requests which is well known in the community (but not to my team>>
 > > > unfortunately). There's also GET PUT GET inconsistency issues with >
 > S3 that>>
 > > > have been talked about, but I don't yet understand how they arise >
 > in the>>
 > > > 2.8.5 S3AFilesystem 
(https://github.com/apache/iceberg/issues/1398).>>
 > > >>>
 > > > *** End of specific ***>>
 > > >>>
 > > >>>
 > > > The options I'm seeing are:>>
 > > >>>
 > > > 1. Using Iceberg's new S3 FileIO. Is anyone using this in prod?>>
 > > >>>
 > > > This still seems very new unless it is actually based on Netflix's>>
 > > > prod implementation that they're releasing to the community? (I'm >
 > wondering>>
 > > > if it's safe to start moving onto it in prod in the near term. If >
 > Netflix>>
 > > > is using it (or rolling it out) that would be more than enough for >
 > my team.)>>
 > > >>>
 > > > 2. Using a newer hadoop version and use the S3AFileSystem. Any>>
 > > > recommendations on a version and are you also using S3Guard?>>
 > > >>>
 > > > From a quick look, most gains compared to older versions seem to 
be>>
 > > > from S3Guard. Are there substantial gains without it? (My team >
 > doesn't have>>
 > > > experience with S3Guard and Iceberg seems to not need it outside 
of >
 > atomic>>
 > > > renames?)>>
 > > >>>
 > > > 3. Using an alternative hadoop file system. Any recommendations?>>
 > > >>>
 > > > In the recent Iceberg S3 FileIO, the License states it was based 
off>>
 > > > the Presto FileSystem. Has anyone used this file system as is with >
 > Iceberg?>>
 > > > (https://github.com/apache/iceberg/blob/master/LICENSE#L251)>>
 > > >>>
 > > > 4. Roll our own hadoop file system. Anyone have stories/blogs 
about>>
 > > > pitfalls or difficulties?>>
 > > >>>
 > > > rdblue hints that Netflix already done this:>>
 > > > >
 > https://github.com/apache/iceberg/issues/1398#issuecomment-682837392 .>>
 > > > (My team probably doesn't have the capacity for this)>>
 > > >>>
 > > >>>
 > > > Places where I tried looking for this info:>>
 > > >>>
 > > > - https://github.com/apache/iceberg/issues/761 (issue for getting>>
 > > > started guide)>>
 > > > - https://iceberg.apache.org/spec/#file-system-operations>>
 > > >>>
 > > > Thanks everyone,>>
 > > >>>
 > > > John Clara>>
 > > >>>
 > >>
 >

Re: Suggested S3 FileIO/Getting Started

Posted by John Clara <jo...@gmail.com>.
(Not sure if this is actually replying or just starting a new thread)

Hi Daniel,

Thanks for the response! It's very helpful and answers a lot of my questions.

A couple of follow-ups:

One of my concerns with S3FileIO is getting tied too closely to a single 
cloud provider. I'm wondering if an ObjectStoreFileIO would be helpful 
so that S3FileIO and (a future) GCSFileIO could share logic? I haven't 
looked deeply enough into S3FileIO to know how much of its logic is not 
S3-specific. Maybe the FileIO interface is enough.
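
(To make the idea concrete, here is a purely hypothetical sketch; the 
ObjectStoreFileIO name and the way it splits responsibilities are just my 
guess, not anything that exists in Iceberg today:)

    import org.apache.iceberg.io.FileIO;
    import org.apache.iceberg.io.InputFile;
    import org.apache.iceberg.io.OutputFile;

    // Hypothetical base class: the FileIO entry points live here, and only
    // the raw object-store calls are left to S3/GCS subclasses.
    // Not part of Iceberg.
    public abstract class ObjectStoreFileIO implements FileIO {

      @Override
      public InputFile newInputFile(String path) {
        return newObjectInput(path);    // provider-specific read
      }

      @Override
      public OutputFile newOutputFile(String path) {
        return newObjectOutput(path);   // provider-specific write
      }

      @Override
      public void deleteFile(String path) {
        deleteObject(path);             // provider-specific delete
      }

      // The only methods an S3FileIO/GCSFileIO subclass would implement.
      protected abstract InputFile newObjectInput(String path);
      protected abstract OutputFile newObjectOutput(String path);
      protected abstract void deleteObject(String path);
    }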

About consistency (no need to respond here):
I'm seeing that during "getFileStatus" my version of s3a does some list 
requests (but I'm not sure if that could fail from consistency issues).
I'm also confused about the read-after-(initial) write part:
"Amazon S3 provides read-after-write consistency for PUTS of new objects 
in your S3 bucket in all Regions with one caveat. The caveat is that if 
you make a HEAD or GET request to a key name before the object is 
created, then create the object shortly after that, a subsequent GET 
might not return the object due to eventual consistency. - 
https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html"

When my version of s3a does a create, it first does a getMetadataRequest 
(HEAD) to check if the object exists before creating the object. I think 
this is talked about in this issue: 
https://github.com/apache/iceberg/issues/1398 and talked about in the 
S3FileIO PR: https://github.com/apache/iceberg/pull/1573. I'll follow up 
in that issue for more info.
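
For anyone else trying to follow the sequence, it is roughly this (AWS SDK 
v1 style, purely an illustration of the caveat quoted above; this is not 
code from s3a or Iceberg, and the bucket/key are placeholders):

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import com.amazonaws.services.s3.model.AmazonS3Exception;

    public class HeadBeforePutExample {
      public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String bucket = "my-bucket";                        // placeholder
        String key = "db/table/metadata/v2.metadata.json";  // placeholder

        try {
          // HEAD on a key that does not exist yet (the existence check
          // before the file is created)
          s3.getObjectMetadata(bucket, key);
        } catch (AmazonS3Exception e) {
          // 404: this negative lookup is what weakens read-after-write
          // for the key
        }

        // Create the object
        s3.putObject(bucket, key, "...");

        // Under the S3 consistency model quoted above, a GET shortly after
        // the PUT may still fail with a 404 (NoSuchKey) for a short window.
        s3.getObject(bucket, key).close();
      }
    }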

John



Re: Suggested S3 FileIO/Getting Started

Posted by Daniel Weeks <dw...@netflix.com.INVALID>.
Hey John, I might be able to help answer some of your questions and provide
some context around how you might want to go forward.

So, one fundamental aspect of Iceberg is that it relies on only a few
operations (as defined by the FileIO interface).  This makes much of the
functionality and complexity of full file system implementations
unnecessary.  You should not need features like S3Guard or the additional
S3 operations these implementations rely on in order to achieve file system
contract behavior.  Consistency issues should also not be a problem, since
Iceberg does not overwrite or list, and read-after-(initial)write is a
guarantee provided by S3.
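
(For reference, the FileIO surface is tiny; this is paraphrased from
org.apache.iceberg.io.FileIO, so check master for the exact signatures:)

    import java.io.Closeable;
    import java.io.Serializable;
    import org.apache.iceberg.io.InputFile;
    import org.apache.iceberg.io.OutputFile;

    // Roughly everything Iceberg asks of storage: read a file, write a new
    // file, delete a file. No rename, no listing, no directory semantics,
    // which is why full FileSystem contract behavior is unnecessary.
    public interface FileIO extends Serializable, Closeable {
      InputFile newInputFile(String path);
      OutputFile newOutputFile(String path);
      void deleteFile(String path);
    }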

At Netflix, we use a custom FileSystem implementation (somewhat like S3A),
but with much of the contract behavior that drives additional operations
against S3 disabled.  However, we are transitioning to a more native
implementation of S3FileIO, which you'll see as part of the ongoing work in
Iceberg.

Per your specific questions:

1) The S3FileIO implementation is very new, though internally we have
something very similar.  There are features missing that we are working to
add (e.g. progressive multipart upload for large files is likely the most
important).
2) You can use S3AFileSystem with the HadoopFileIO implementation, though
you may still see similar behavior with additional calls being made (I
don't know if these can be disabled); there is a minimal s3a configuration
sketch after this list.
3) The PrestoS3FileSystem is tailored to Presto's use and is likely not as
complete as S3A, but seeing as it is using the Hadoop FileSystem api, it
would likely work for what HadoopFileIO exercises (as would the
EMRFileSystem).
4) I would probably discourage you from writing your own file system as the
S3FileIO will likely be a more optimized implementation for what Iceberg
needs.
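
For (2), a minimal sketch of what that looks like (the property names are
standard s3a keys, but the values here are placeholders you would tune for
your own workload):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.hadoop.HadoopFileIO;

    public class S3AHadoopFileIOExample {
      // HadoopFileIO delegates to whatever Hadoop FileSystem the path's
      // scheme resolves to, so the s3a tuning lives in the Hadoop
      // Configuration rather than in Iceberg.
      public static HadoopFileIO newS3aFileIO() {
        Configuration conf = new Configuration();
        conf.set("fs.s3a.connection.maximum", "100");        // more parallel S3 connections
        conf.set("fs.s3a.fast.upload", "true");              // buffer uploads instead of staging whole files
        conf.set("fs.s3a.fast.upload.buffer", "bytebuffer"); // keep upload buffers off the local disk
        return new HadoopFileIO(conf);
      }
    }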

If you want to contribute, or have time to help contribute to S3FileIO, that
is the path I would recommend.  As for configuration, a lot of it comes down
to how you configure the AWS S3 client that you provide to the S3FileIO
implementation, but most of the defaults are reasonable (you might want to
tweak a few, like max connections and maybe the retry policy).
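
As a rough illustration of those two knobs (AWS SDK v2 builder calls;
exactly how you hand the client to S3FileIO may differ by release, so treat
this as a sketch with placeholder values):

    import software.amazon.awssdk.core.client.config.ClientOverrideConfiguration;
    import software.amazon.awssdk.core.retry.RetryPolicy;
    import software.amazon.awssdk.http.apache.ApacheHttpClient;
    import software.amazon.awssdk.services.s3.S3Client;

    public class TunedS3ClientExample {
      // Builds an S3 client with a larger connection pool and a custom retry
      // policy; this is the client you would supply to S3FileIO.
      public static S3Client newTunedClient() {
        return S3Client.builder()
            .httpClientBuilder(ApacheHttpClient.builder()
                .maxConnections(100))                 // raise the connection pool size
            .overrideConfiguration(ClientOverrideConfiguration.builder()
                .retryPolicy(RetryPolicy.builder()
                    .numRetries(5)                    // retry transient S3 errors
                    .build())
                .build())
            .build();
      }
    }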

The recently committed work to dynamically load your FileIO should make it
relatively easy to test out and we'd love to have extra eyes and feedback
on it.
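
If it helps while testing, this is roughly how I would expect it to be
wired up from Spark; the io-impl property name and the S3FileIO class
location below come from the recent work, so double-check them against
master, and the catalog name is a placeholder:

    import org.apache.spark.sql.SparkSession;

    public class DynamicFileIOExample {
      // Sketch: a Spark 3 catalog configured to load S3FileIO dynamically
      // via the io-impl catalog property; the usual catalog settings
      // (warehouse, uri, etc.) are omitted here.
      public static SparkSession newSession() {
        return SparkSession.builder()
            .appName("iceberg-s3fileio-test")
            .config("spark.sql.catalog.my_catalog",
                "org.apache.iceberg.spark.SparkCatalog")
            .config("spark.sql.catalog.my_catalog.type", "hive")
            .config("spark.sql.catalog.my_catalog.io-impl",
                "org.apache.iceberg.aws.s3.S3FileIO")
            .getOrCreate();
      }
    }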

Let me know if that helps,
-Dan


