Posted to dev@ozone.apache.org by Neil Joshi <ne...@gmail.com> on 2023/04/06 05:14:36 UTC

Re: FSO vs OBS: Driving the wedge deeper

Thanks for your work on this doc!

Thanks for the proposal for a solution to access buckets with different
layouts through s3g and ofs.  A few points:

The document states, as a convention, that FSO buckets should be exposed
to s3g through symbolic links into s3v.  This is fine.  Buckets created
through s3g should, by convention, use the OBS layout.  However, the
document also proposes that OBS buckets created through s3g can be accessed
through OFS via symbolic links between s3v and an OFS-accessed volume.
This currently can’t be done.  Are we proposing that a linked bucket whose
source has the OBS layout should support file system semantics when the
link is accessed through OFS?



For HCFS applications such as MapReduce, Spark, and Trino that access Ozone
through both s3g and OFS, it appears that FSO, with its file system
semantics, is needed.  If the applications create buckets through s3g with
the OBS layout, those buckets currently cannot be accessed through OFS.
Even if symlinked OBS buckets could be accessed through OFS, the buckets
backing directories and tables that apps create automatically through s3g
would still have to be linked manually to a volume before they could be
accessed through OFS.  Thoughts?



Driving a wedge between OBS and FSO based on the access type and on
porting/migration from an HCFS datastore is a great proposal.  Adopting a
default bucket layout convention, OBS for buckets created through s3g and
FSO for buckets created through OFS, supports this.  To make ports from
Hadoop file systems to Ozone easier and to avoid naming issues caused by S3
naming conventions, porting through OFS with the FSO layout makes sound
sense.
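
As a rough illustration of the default-layout convention discussed above,
the mapping could be sketched as below.  This is only a toy model: the
names BucketLayout, DEFAULT_LAYOUT_BY_PROTOCOL, and default_layout are
hypothetical and are not part of Ozone's API or configuration.

```python
from enum import Enum


class BucketLayout(Enum):
    """The two bucket layouts named in the proposal."""
    OBJECT_STORE = "OBS"
    FILE_SYSTEM_OPTIMIZED = "FSO"


# Hypothetical mapping for illustration only; not Ozone's actual API or config.
# Convention under discussion: s3g-created buckets default to OBS,
# OFS-created buckets default to FSO.
DEFAULT_LAYOUT_BY_PROTOCOL = {
    "s3g": BucketLayout.OBJECT_STORE,
    "ofs": BucketLayout.FILE_SYSTEM_OPTIMIZED,
}


def default_layout(protocol: str) -> BucketLayout:
    """Return the default layout for a bucket created through `protocol`."""
    try:
        return DEFAULT_LAYOUT_BY_PROTOCOL[protocol.lower()]
    except KeyError:
        raise ValueError(f"unknown access protocol: {protocol!r}")
```

Under this sketch, default_layout("s3g") yields the OBS layout and
default_layout("ofs") yields the FSO layout; sharing a bucket across
protocols would remain an explicit, opt-in step on top of these defaults.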


Regards,

Neil

On Wed, Mar 29, 2023 at 10:04 PM Ritesh Shukla <ri...@apache.org> wrote:

> Hello,
>
> This topic has been an active discussion internally at Cloudera and has
> been a source of confusion while onboarding new customers. Please take a
> look at the attached document.
>
> This document discusses the differences between two bucket layouts in
> Ozone, OBS (OBJECT_STORE) and FSO (FILE_SYSTEM_OPTIMIZED), and proposes a
> solution to address the complexity of accessing volumes through S3 Gateway
> and Hadoop Filesystem. The proposal suggests using symbolic linking to
> expose FSO buckets via S3 Gateway or vice versa and dividing the
> functionality of OBS and FSO based on their compatibility with S3 APIs and
> Hadoop Filesystem. OBS should always be compatible with S3 APIs and have S3
> bucket names, while FSO should always be compatible with Hadoop File System
> interface. The document also explains how to access FSO via S3 APIs and OBS
> via OFS addressing and the benefits of this approach, including transparent
> data sharing and a clear separation between applications using Ozone
> primarily as an S3 store and Hadoop FS-based apps.
>
>
> https://docs.google.com/document/d/1wVlbJX22yw84WowH6I4ni_pUvaxKDr9JLHHsEVOTYSA/edit#
>
> This document can be broken down into tasks that must be done across the
> stack once reviewed.
>
> Regards,
> Ritesh
>


-- 
NJ

Re: FSO vs OBS: Driving the wedge deeper

Posted by Ritesh Shukla <ri...@cloudera.com.INVALID>.
Yes, the proposal is that we empower users to share buckets if they desire.
The s3v volume is already reachable through an ofs:// path today. Buckets
of both types can be made accessible both ways. The idea is that the
default behavior is compatible with the protocol through which a bucket is
used, and that an informed user can share across protocols, understanding
the caveats of the path-handling and behavioral differences, which we as a
community should document clearly for them.
