Posted to dev@iceberg.apache.org by Chen Song <ch...@gmail.com> on 2020/06/24 13:12:01 UTC

S3 example in Java

Hi

Are there any Java examples to create/write/read tables backed by S3? I
searched the documentation and GitHub but did not find anything.

Thanks
Chen

Re: S3 example in Java

Posted by Ryan Blue <rb...@netflix.com.INVALID>.

Correct. You should just need to set the warehouse path so that the new
table is created in S3. You can also create a table with a specific path.

On Wed, Jun 24, 2020 at 11:39 AM Chen Song <ch...@gmail.com> wrote:

> Thanks Ryan/Edgar.
>
> I understand that I need to create a table via HiveCatalog. I have already
> set up a Hive metastore for this and successfully created an Iceberg table
> with it. What I meant was creating an Iceberg table backed by data files
> on S3. Sorry if my question was unclear. It seems that this is already
> supported out of the box?
> Is it just a matter of changing the Hive warehouse to a path like `s3://...`?
>
> Chen
>
> --
> Chen Song
>
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: S3 example in Java

Posted by Chen Song <ch...@gmail.com>.

Thanks Ryan/Edgar.

I understand that I need to create a table via HiveCatalog. I have already
set up a Hive metastore for this and successfully created an Iceberg table
with it. What I meant was creating an Iceberg table backed by data files
on S3. Sorry if my question was unclear. It seems that this is already
supported out of the box?
Is it just a matter of changing the Hive warehouse to a path like `s3://...`?

Chen

On Wed, Jun 24, 2020 at 2:00 PM Ryan Blue <rb...@netflix.com.invalid> wrote:

> Chen,
>
> We have a getting started page that covers creating a table using the
> HiveCatalog: https://iceberg.apache.org/getting-started/#creating-a-table
>
> I agree with Edgar that you probably want to start with a HiveCatalog
> table. By default, tables are created under your warehouse path. If that's
> an S3 location, then it will use the S3 file system that is available. You
> can set the normal Hive warehouse config property to change the path. You
> can also specify the path directly by setting "location".
>
> One more thing to note about using the Hive catalog vs a table location:
> you _can_ use a table location without a metastore with S3. The trade-off
> is that concurrent writes will not be atomic. If you want to just try it
> out and don't need concurrent writes, then just using an S3 path is a
> simple way to get started.
>
> rb
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


-- 
Chen Song

Re: S3 example in Java

Posted by Ryan Blue <rb...@netflix.com.INVALID>.

Chen,

We have a getting started page that covers creating a table using the
HiveCatalog: https://iceberg.apache.org/getting-started/#creating-a-table

I agree with Edgar that you probably want to start with a HiveCatalog
table. By default, tables are created under your warehouse path. If that's
an S3 location, then it will use the S3 file system that is available. You
can set the normal Hive warehouse config property to change the path. You
can also specify the path directly by setting "location".
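
To make the two options above concrete, here is a minimal sketch in plain
Java of how a table's root path could be chosen. It is not Iceberg code:
the class `TableLocationSketch`, the bucket name, and the
`<warehouse>/<database>.db/<table>` default layout are assumptions for
illustration (the layout mirrors the usual Hive convention; check what your
catalog version actually does).

```java
/** Hypothetical helper, not part of Iceberg: models how a table's root path is chosen. */
final class TableLocationSketch {
    private TableLocationSketch() {}

    /**
     * Returns the explicit location when one is given; otherwise derives a
     * default under the warehouse path, assuming a <warehouse>/<db>.db/<table> layout.
     */
    static String resolve(String warehouse, String database, String table, String explicitLocation) {
        if (explicitLocation != null && !explicitLocation.isEmpty()) {
            return stripTrailingSlash(explicitLocation);
        }
        return stripTrailingSlash(warehouse) + "/" + database + ".db/" + table;
    }

    private static String stripTrailingSlash(String path) {
        return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }

    public static void main(String[] args) {
        // Hypothetical bucket; in practice this would be your Hive warehouse setting.
        String warehouse = "s3://example-bucket/warehouse/";

        // No explicit location: the table lands under the warehouse path.
        System.out.println(resolve(warehouse, "logging", "logs", null));

        // Explicit "location": the warehouse path is ignored for this table.
        System.out.println(resolve(warehouse, "logging", "logs", "s3://example-bucket/custom/logs"));
    }
}
```

Either way the catalog entry still lives in the metastore; only the place
where metadata and data files are written changes.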

One more thing to note about using the Hive catalog vs a table location:
you _can_ use a table location without a metastore with S3. The trade-off
is that concurrent writes will not be atomic. If you want to just try it
out and don't need concurrent writes, then just using an S3 path is a
simple way to get started.
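
One way the trade-off can show up is a lost update. The sketch below is a
toy model in plain Java, not Iceberg code: `LostUpdateSketch` and the file
names are made up, and the table state is reduced to a single pointer that
each writer overwrites without any lock or version check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model (not Iceberg code): table state is one pointer to an immutable list of data files. */
final class LostUpdateSketch {
    // Stands in for the table's current metadata; nothing guards concurrent updates.
    private List<String> current = Collections.emptyList();

    List<String> read() {
        return current;
    }

    /** Non-atomic commit: blindly overwrites the pointer, whatever it holds now. */
    void commit(List<String> updated) {
        current = Collections.unmodifiableList(new ArrayList<>(updated));
    }

    /** Returns a copy of base with one more file appended. */
    static List<String> append(List<String> base, String file) {
        List<String> copy = new ArrayList<>(base);
        copy.add(file);
        return copy;
    }

    /** Replays one fixed interleaving of two writers and returns the final file list. */
    static List<String> runInterleaving() {
        LostUpdateSketch table = new LostUpdateSketch();
        List<String> seenByA = table.read();          // writer A reads the empty table
        List<String> seenByB = table.read();          // writer B reads the same state
        table.commit(append(seenByA, "a.parquet"));   // A commits its file
        table.commit(append(seenByB, "b.parquet"));   // B overwrites A's commit
        return table.read();
    }

    public static void main(String[] args) {
        System.out.println(runInterleaving()); // prints [b.parquet]
    }
}
```

With a single writer this interleaving cannot happen, which is why a plain
S3 path is fine for trying things out.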

rb

On Wed, Jun 24, 2020 at 10:36 AM Edgar Rodriguez
<ed...@airbnb.com.invalid> wrote:

> Hi Chen,
>
> Since S3 does not have an atomic rename operation, currently the only way
> to create/write/read tables in S3 is via the HiveCatalog implementation,
> which requires a Hive metastore with lock support to provide the atomic
> commit required by Iceberg. Alternatively, you can write your own Catalog
> implementation with a custom atomic commit mechanism, as shown in
> http://iceberg.apache.org/custom-catalog/.
>
> Cheers,
>
> --
> Edgar R
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: S3 example in Java

Posted by Edgar Rodriguez <ed...@airbnb.com.INVALID>.

Hi Chen,

Since S3 does not have an atomic rename operation, currently the only way
to create/write/read tables in S3 is via the HiveCatalog implementation,
which requires a Hive metastore with lock support to provide the atomic
commit required by Iceberg. Alternatively, you can write your own Catalog
implementation with a custom atomic commit mechanism, as shown in
http://iceberg.apache.org/custom-catalog/.
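
To illustrate what an atomic commit has to guarantee, here is a minimal
sketch in plain Java. It is not the Iceberg Catalog API: `CasCommitSketch`
is a hypothetical class, and an in-memory `AtomicReference` stands in for
whatever atomic primitive a real catalog would use (for example a metastore
lock). A commit succeeds only if the table still points at the metadata the
writer started from; otherwise the writer re-reads and retries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Toy catalog (hypothetical, not the Iceberg API): commits swap a metadata pointer atomically. */
final class CasCommitSketch {
    private final AtomicReference<List<String>> current =
        new AtomicReference<>(Collections.<String>emptyList());

    /** Returns the metadata the table currently points at. */
    List<String> refresh() {
        return current.get();
    }

    /** Succeeds only if the table still points at the exact metadata the writer read. */
    boolean commit(List<String> base, List<String> updated) {
        return current.compareAndSet(base, Collections.unmodifiableList(updated));
    }

    /** Appends a file, re-reading and retrying whenever another writer committed first. */
    int appendWithRetry(String file) {
        int attempts = 0;
        while (true) {
            attempts++;
            List<String> base = refresh();
            List<String> updated = new ArrayList<>(base);
            updated.add(file);
            if (commit(base, updated)) {
                return attempts;
            }
        }
    }

    /** Writer A starts from stale metadata, is rejected, then retries on fresh metadata. */
    static List<String> demo() {
        CasCommitSketch table = new CasCommitSketch();
        List<String> staleBase = table.refresh();     // writer A reads the empty table
        table.appendWithRetry("b.parquet");           // writer B commits first

        List<String> fromStale = new ArrayList<>(staleBase);
        fromStale.add("a.parquet");
        boolean accepted = table.commit(staleBase, fromStale); // rejected: base is outdated
        if (!accepted) {
            table.appendWithRetry("a.parquet");       // retry on top of B's commit
        }
        return table.refresh();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [b.parquet, a.parquet]
    }
}
```

The point of the sketch is only the compare-and-swap shape of the commit;
how the swap is made atomic is exactly the part a custom catalog has to
supply.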

Cheers,

On Wed, Jun 24, 2020 at 6:12 AM Chen Song <ch...@gmail.com> wrote:

> Hi
>
> Are there any Java examples to create/write/read tables backed by S3? I
> searched the documentation and GitHub but did not find anything.
>
> Thanks
> Chen
>
>

-- 
Edgar R