Posted to dev@iceberg.apache.org by Ryan Blue <bl...@apache.org> on 2020/08/12 23:39:52 UTC

[DISCUSS] August board report

Hi everyone,

Here's a draft of the board report for this month. Please reply with
anything that you'd like to see added or that I've missed. Thanks!

rb

## Description:
Apache Iceberg is a table format for huge analytic datasets that is designed
for high performance and ease of use.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Iceberg was founded 2020-05-19 (2 months ago)
There are currently 10 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 1:1.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- Shardul Mahadik was added as a committer on 2020-07-25.

## Project Activity:
0.9.0 was released, including support for Spark 3 and SQL DDL commands, support
for JDK 11, vectorized Parquet reads, and an action to compact data files.

Since the 0.9.0 release, the community has made progress in several areas:
- The Hive StorageHandler now provides access to query Iceberg tables
  (work is ongoing to implement projection and predicate pushdown).
- Flink integration has made substantial progress toward using native RowData,
  and the first stage of the Flink sink (data file writers) has been committed.
- An action to expire snapshots using Spark was added and is an improvement on
  the incremental approach because it compares the reachable file sets.
- The implementation of row-level deletes is nearing completion. Scan planning
  now supports delete files, merge-based and set-based row filters have been
  committed, and delete file writers are under review. The delete file writers
  allow storing deleted row data in support of Flink CDC use cases.

Releases:
- 0.9.0 was released on 2020-07-13
- 0.9.1 has an ongoing vote

## Community Health:
The month since the last report has been one of the busiest since the project
started. 80 pull requests were merged in the last 4 weeks and, more importantly,
they came from 21 different contributors. Both of these are new high-water marks.

Community members gave 2 Iceberg talks at Subsurface Conf, on enabling Hive
queries against Iceberg tables and working with petabyte-scale Iceberg tables.
Iceberg was also mentioned in the keynotes.

-- 
Ryan Blue

Re: [DISCUSS] August board report

Posted by Jacques Nadeau <ja...@dremio.com>.
The talks are now posted as a YouTube playlist:

https://www.youtube.com/watch?v=L8WQZeeV6Yw&list=PL-gIUf9e9CCtewYqIGUKvz0fVcoyOYU1H

Iceberg-specific videos:
Adrian/Christine
https://www.youtube.com/watch?v=9azStU4aDFE&list=PL-gIUf9e9CCtewYqIGUKvz0fVcoyOYU1H&index=5

Anton
https://www.youtube.com/watch?v=5RJrqS8_u68&list=PL-gIUf9e9CCtewYqIGUKvz0fVcoyOYU1H&index=10

Dan
https://www.youtube.com/watch?v=9uiaCN3tJyI&list=PL-gIUf9e9CCtewYqIGUKvz0fVcoyOYU1H&index=3
--
Jacques Nadeau
CTO and Co-Founder, Dremio



Re: [DISCUSS] August board report

Posted by OpenInx <op...@gmail.com>.
Thanks for the links, Jacques. I will try to create a pull request to
attach those shared links.


Re: [DISCUSS] August board report

Posted by Jacques Nadeau <ja...@dremio.com>.
The conference was free so all the recordings are available on-demand for
free:
https://subsurfaceconf.com/summer2020/recordings
--
Jacques Nadeau
CTO and Co-Founder, Dremio



Re: [DISCUSS] August board report

Posted by OpenInx <op...@gmail.com>.
> Community members gave 2 Iceberg talks at Subsurface Conf, on enabling Hive
> queries against Iceberg tables and working with petabyte-scale Iceberg tables.
> Iceberg was also mentioned in the keynotes.

Are there slides or videos of the two Iceberg talks? I'd like to read or
watch them, but I could not find the resources after a few Google searches.
How about creating a page to collect all of these shared materials (and also
a 'powered by' page)?




Re: [DISCUSS] August board report

Posted by Owen O'Malley <ow...@gmail.com>.
+1 looks good.
