Posted to dev@iceberg.apache.org by Steven Wu <st...@gmail.com> on 2022/04/01 02:41:44 UTC

Re: Flink GenericRecord Iceberg Usage

Funny, I was just working on this recently. You can plug this mapper
into the FlinkSink builder:
https://gist.github.com/stevenzwu/4b824556973b47178824852083ab7a50

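// Flink row type derived from the Iceberg table schema
RowType rowType = FlinkSchemaUtil.convert(icebergSchema);
FlinkSink.builderFor(
        dataStream, // the DataStream<GenericRecord> to write
        // converts each Avro GenericRecord to Flink RowData
        AvroGenericRecordToRowDataMapper.forAvroSchema(avroSchema),
        // type information for the resulting RowData stream
        FlinkCompatibilityUtil.toTypeInfo(rowType))
    .table(table)
    .tableLoader(tableLoader)
    .writeParallelism(parallelism)
    .append();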
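
Conceptually, the mapper argument is a function applied to every incoming record: it reads the record's named fields and emits a row whose values are positional, in schema order. The following is a minimal, library-free sketch of that idea using plain Java stand-ins (a Map for the record, an Object[] for the row). The class and method names are hypothetical, and this is not the code from the gist or from Iceberg.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in: turns name-keyed records into positional rows. */
final class RecordToRowSketch {

    /** Returns a mapper that reads fields in the order given by fieldNames. */
    static Function<Map<String, Object>, Object[]> forFieldNames(List<String> fieldNames) {
        return record -> {
            Object[] row = new Object[fieldNames.size()];
            for (int i = 0; i < row.length; i++) {
                // A field that is absent from the record becomes null in the row
                row[i] = record.get(fieldNames.get(i));
            }
            return row;
        };
    }

    public static void main(String[] args) {
        List<String> schema = List.of("id", "name");

        // The record's own field order does not matter; the schema decides positions
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("name", "iceberg");
        record.put("id", 42L);

        Object[] row = forFieldNames(schema).apply(record);
        System.out.println(Arrays.toString(row)); // prints [42, iceberg]
    }
}
```

The real mapper additionally has to convert Avro value types into Flink's internal representations, which this sketch leaves out.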


On Thu, Mar 31, 2022 at 11:32 AM Hailu, Andreas <An...@gs.com>
wrote:

> Hello,
>
> I’m looking to write a proof of concept incorporating Iceberg into our ETL
> application. We use Avro as our schema management solution, and the Iceberg
> Flink documentation examples [1] only show writing DataStream<Row> and
> DataStream<RowData> types.
>
> How can this be applied to DataStream<GenericRecord>? Is there a
> translation step/utility? I found a discussion on an issue from 2020 [2]
> regarding the same thing, but it’s unclear if a solution was reached.
>
> [1] https://iceberg.apache.org/docs/latest/flink/
>
> [2] https://github.com/apache/iceberg/issues/1885
>
> best,
>
> ah
>
> ------------------------------
>
> Your Personal Data: We may collect and process information about you that
> may be subject to data protection laws. For more information about how we
> use and disclose your personal data, how we protect your information, our
> legal basis to use your information, your rights and who you can contact,
> please refer to: www.gs.com/privacy-notices
>

RE: Flink GenericRecord Iceberg Usage

Posted by "Hailu, Andreas" <An...@gs.com>.
This is great work. Thanks for contributing and sharing, Steven!

ah

From: Steven Wu <st...@gmail.com>
Sent: Tuesday, January 10, 2023 6:50 PM
To: dev@iceberg.apache.org
Subject: Re: Flink GenericRecord Iceberg Usage

Hi Andreas, here is the PR for adding the support to Iceberg
https://github.com/apache/iceberg/pull/6557



Re: Flink GenericRecord Iceberg Usage

Posted by Steven Wu <st...@gmail.com>.
Hi Andreas, here is the PR for adding the support to Iceberg
https://github.com/apache/iceberg/pull/6557


On Fri, Apr 1, 2022 at 10:23 AM Hailu, Andreas <An...@gs.com> wrote:

> I found it – it was part of the versioned iceberg-flink modules,
> iceberg-flink-1.13.
>
> ah
>

RE: Flink GenericRecord Iceberg Usage

Posted by "Hailu, Andreas" <An...@gs.com>.
I found it – it was part of the versioned iceberg-flink modules, iceberg-flink-1.13.
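
In case it helps anyone else who hits this: the artifact ID carries the Flink version as a suffix. Below is a small sketch of that naming pattern, assuming the usual org.apache.iceberg group ID. The helper class is hypothetical and only shows how the coordinate string is put together.

```java
/** Hypothetical helper: builds Maven coordinates for a versioned iceberg-flink module. */
final class IcebergFlinkCoordinates {

    /** Example arguments: flinkVersion "1.13", icebergVersion "0.13.1". */
    static String coordinates(String flinkVersion, String icebergVersion) {
        // The Flink major.minor version is part of the artifact ID itself
        return "org.apache.iceberg:iceberg-flink-" + flinkVersion + ":" + icebergVersion;
    }

    public static void main(String[] args) {
        // prints org.apache.iceberg:iceberg-flink-1.13:0.13.1
        System.out.println(coordinates("1.13", "0.13.1"));
    }
}
```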

ah

From: Hailu, Andreas [Engineering]
Sent: Friday, April 1, 2022 10:37 AM
To: dev@iceberg.apache.org
Subject: RE: Flink GenericRecord Iceberg Usage

Thanks, Steven, I’ll try this out!

I’m having a spot of trouble locating FlinkSchemaUtil – which Iceberg module is that from? I thought it belonged to iceberg-flink, but it seems 0.13.1 doesn’t contain it.

ah


RE: Flink GenericRecord Iceberg Usage

Posted by "Hailu, Andreas" <An...@gs.com>.
Thanks, Steven, I’ll try this out!

I’m having a spot of trouble locating FlinkSchemaUtil – which Iceberg module is that from? I thought it belonged to iceberg-flink, but it seems 0.13.1 doesn’t contain it.

ah
