Posted to dev@iceberg.apache.org by 周康 <zh...@gmail.com> on 2022/06/06 03:00:43 UTC

【Feature】Request support for c++ sdk

Hi team,
I am a developer from the StarRocks community, and we already support the
Iceberg v1 format.
We are also planning to support the v2 format. If there were a C++ package,
it would be very convenient for our implementation.
It would also help other C++ compute engines add v2 support faster.

Do we have plans to provide a C++ SDK?
-- 
caneGuy

Re: 【Feature】Request support for c++ sdk

Posted by Joshua Howard <jo...@gmail.com>.

+1 for Rust.

On Wed, Jun 22, 2022 at 4:21 AM LuNing Wang <wa...@gmail.com> wrote:

> +1 for Rust
>
> Best Regards,
> LuNing Wang
>
> On Wed, Jun 22, 2022 at 14:15, Nan Zhu <zh...@gmail.com> wrote:
>
>> +1 for using rust as the backbone for new language bindings
>>
>> On Sun, Jun 12, 2022 at 23:52 OpenInx <op...@gmail.com> wrote:
>>
>>> Thanks Kyle for sharing your context.
>>>
>>> Recently, I also spent some time practicing my Rust skills.  Generally,
>>> I'm +1 for adding Rust SDK support for native language.
>>>
>>>
>>> On Mon, Jun 13, 2022 at 12:51 PM Kyle Bendickson <ky...@tabular.io>
>>> wrote:
>>>
>>>> Thanks for starting this discussion.
>>>>
>>>> I know I was the first to mention some of my concerns (which I still
>>>> have and would apply to any new major change), but I also think that this
>>>> is an avenue that should be explored.
>>>>
>>>> Specifically, a native integration would have many benefits for
>>>> read paths (in addition to others). I know that the Rust Avro reader is
>>>> significantly faster, as are native readers for columnar formats.
>>>>
>>>> So while I do have some concerns about making sure we have enough
>>>> people to support this endeavor, I do want to say I think it's a really
>>>> good idea. My apologies if I gave the impression otherwise.
>>>>
>>>> I would personally be interested in contributing to and reviewing for a
>>>> native Rust library (or CPP, but I think Rust is a much more elegant
>>>> language and I'd personally prefer to work in that as it's easier to work
>>>> with across systems than C++ imo though I would defer to others on that).
>>>>
>>>> I would also be happy to offer my help and perspective in moving this
>>>> forward if need be. But I did want to express my practical concerns so that
>>>> we don't have an area of the codebase where there aren't enough people to
>>>> help maintain it etc.
>>>>
>>>> But in general I think this is an exciting opportunity, and results
>>>> have shown time and time again that native readers / writers are much more
>>>> performant.
>>>>
>>>> +1 to using Rust as well (which is a language I know more of than C++
>>>> these days, though I'd have to brush up on both).
>>>>
>>>> Best, Kyle
>>>>
>>>> On Sun, Jun 12, 2022 at 8:20 PM OpenInx <op...@gmail.com> wrote:
>>>>
>>>>> Hi Tao Wu.
>>>>>
>>>>> I think the Apache Iceberg community broadly agrees on providing
>>>>> Iceberg SDKs for native languages.  I am very happy to offer my
>>>>> perspective and help if needed when you try to move this forward.
>>>>>
>>>>> On Mon, Jun 13, 2022 at 11:04 AM Wu Tao <wu...@apache.org> wrote:
>>>>>
>>>>>> Hi, everyone, I'm Tao. I'm currently working on a commercial
>>>>>> streaming system that is written in Rust.
>>>>>>
>>>>>> Actually, I'm planning to implement an Iceberg Rust SDK so that we
>>>>>> can have better integration with the existing Iceberg ecosystem. Initially
>>>>>> I found https://github.com/oliverdaff/iceberg-rs, but it appears the
>>>>>> author hasn't been active lately. So I'm looking to see if the Iceberg
>>>>>> community has any consensus on a Rust/C++ SDK (Rust is preferable), and if
>>>>>> there is, we'd love to contribute. I believe as Iceberg increases its
>>>>>> popularity, there will eventually be more systems that want such libraries.
>>>>>> There may even be some work already under way that has not been
>>>>>> discussed with the community.
>>>>>>
>>>>>> Additionally, I think the initial Rust/C++ SDK only needs to support
>>>>>> the reader and writer sides of Iceberg, because plenty of JVM-based
>>>>>> query engines already take charge of data maintenance. We don't have to
>>>>>> rewrite every corner of Iceberg in Rust. That means less engineering work.
>>>>>>
>>>>>> On 2022/06/08 10:16:05 OpenInx wrote:
>>>>>> > As a cloud-native table format standard for the big-data ecosystem, I
>>>>>> > believe supporting multiple languages is the correct direction, so that
>>>>>> > different languages can connect to the Apache Iceberg table format.
>>>>>> >
>>>>>> > But I also get Kyle's point about lacking enough resources (developers
>>>>>> > and reviewers) to accomplish this goal.  In my mind, Python, Golang, C++,
>>>>>> > and Rust can all be regarded as native language support.  We may just
>>>>>> > need to support the Rust SDK, and then all of the other languages can
>>>>>> > wrap the Rust SDK to access the table format.
>>>>>> >
>>>>>> > Anyway, we will need to wait for the REST catalog to be finished before
>>>>>> > we introduce support for another language, because those languages cannot
>>>>>> > access an Iceberg table by invoking the JVM catalog interfaces.
>>>>>> >
>>>>>> > On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <emkornfield@gmail.com>
>>>>>> > wrote:
>>>>>> >
>>>>>> > >> There’s also the question of how useful this would be in practice
>>>>>> > >> given the complexity of using C++ (or Rust etc) within some of the
>>>>>> > >> major frameworks.
>>>>>> > >>
>>>>>> > >
>>>>>> > > One place this would be useful is Arrow's Dataset API [1].  An
>>>>>> > > option the Arrow community might be open to is hosting parts of the
>>>>>> > > code there (this is what is done for Apache Parquet C++).  This helps
>>>>>> > > shape some of the answers to other questions posed (ORC and Parquet
>>>>>> > > are already in the repo, it provides a Filesystem interface, etc).
>>>>>> > > The project doesn't currently consume Avro, and I think the preferred
>>>>>> > > approach is to make a clean-room Avro parser.  But I agree this is a
>>>>>> > > non-trivial effort to get underway.
>>>>>> > >
>>>>>> > > Another area to consider is compatibility testing.  I think before a
>>>>>> > > third officially supported community library is introduced it would be
>>>>>> > > good to have a compatibility framework in place to make sure
>>>>>> > > implementations are all interpreting the specification correctly.  If
>>>>>> > > there isn't already an effort here, I'd like to start contributing
>>>>>> > > something (I will probably have bandwidth sometime in Q3).
>>>>>> > >
>>>>>> > > Thanks,
>>>>>> > > -Micah
>>>>>> > >
>>>>>> > >
>>>>>> > > [1] https://arrow.apache.org/docs/cpp/dataset.html
>>>>>> > >
>>>>>> > > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <ky...@tabular.io>
>>>>>> > > wrote:
>>>>>> > >
>>>>>> > >> Hi caneGuy,
>>>>>> > >>
>>>>>> > >> I personally don’t dislike this idea. I understand the performance
>>>>>> > >> benefits.
>>>>>> > >>
>>>>>> > >> But this would be a huge undertaking for the community. We’d need to
>>>>>> > >> ensure we had sufficient developer support for reviews (likely one of
>>>>>> > >> the biggest issues), as well as a number of other things, particularly
>>>>>> > >> dependencies, package management, etc. We’d also need to scope support
>>>>>> > >> down to specific OS / compilers etc.
>>>>>> > >>
>>>>>> > >> We’d also need to be sure we had adequate developer support from a
>>>>>> > >> wide enough range of the community to support the project long term.
>>>>>> > >> One issue in open source is that developers will work on something
>>>>>> > >> tangential to their project in another repository, but nobody is
>>>>>> > >> available to maintain it.
>>>>>> > >>
>>>>>> > >> There’s also the question of how useful this would be in practice
>>>>>> > >> given the complexity of using C++ (or Rust etc) within some of the
>>>>>> > >> major frameworks.
>>>>>> > >>
>>>>>> > >> Again, I’m not opposed to the idea but just trying to be realistic
>>>>>> > >> about the realities of such an undertaking. It would need full
>>>>>> > >> community support (or at least support from enough community members
>>>>>> > >> to be sustainable).
>>>>>> > >>
>>>>>> > >> If you wanted to make a design doc, the milestones tab in the Iceberg
>>>>>> > >> project has some that you might use as reference.
>>>>>> > >>
>>>>>> > >> *I highly suggest you come to the next community sync and bring this
>>>>>> > >> up to the community then.*
>>>>>> > >>
>>>>>> > >> If you’re not already on the invite list for the monthly community
>>>>>> > >> sync, you can get on it by joining the Google group. You’ll receive
>>>>>> > >> invites when they go out:
>>>>>> > >> https://groups.google.com/g/iceberg-sync
>>>>>> > >>
>>>>>> > >> Looking forward to seeing you at the next community sync.
>>>>>> > >>
>>>>>> > >> A design document and/or any prior art would be very helpful as the
>>>>>> > >> community sync does discuss many topics (possibly there is existing
>>>>>> > >> C++ support in StarRocks for Iceberg V1?).
>>>>>> > >>
>>>>>> > >> Thank you,
>>>>>> > >> Kyle Bendickson
>>>>>> > >> GitHub: kbendick
>>>>>> > >>
>>>>>> > >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <sa...@tabular.io>
>>>>>> > >> wrote:
>>>>>> > >>
>>>>>> > >>> Currently there is no existing effort to develop a C++ package. That
>>>>>> > >>> being said, I think it would be awesome to have one! If anyone is
>>>>>> > >>> willing to start that development effort, I can help with some of the
>>>>>> > >>> groundwork to kickstart it.
>>>>>> > >>>
>>>>>> > >>> I would say the first step would be for someone to prepare a
>>>>>> > >>> high-level proposal.
>>>>>> > >>>
>>>>>> > >>> -Sam
>>>>>> > >>>
>>>>>> > >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zh...@gmail.com>
>>>>>> > >>> wrote:
>>>>>> > >>>
>>>>>> > >>>> Hi team,
>>>>>> > >>>> I am a developer from the StarRocks community, and we already
>>>>>> > >>>> support the Iceberg v1 format.
>>>>>> > >>>> We are also planning to support the v2 format. If there were a C++
>>>>>> > >>>> package, it would be very convenient for our implementation.
>>>>>> > >>>> It would also help other C++ compute engines add v2 support faster.
>>>>>> > >>>>
>>>>>> > >>>> Do we have plans to provide a C++ SDK?
>>>>>> > >>>> --
>>>>>> > >>>> caneGuy
>>>>>> > >>>>
>>>>>> > >>> --
>>>>>> > >>>
>>>>>> > >>> Sam Redai <sa...@tabular.io>
>>>>>> > >>>
>>>>>> > >>> Developer Advocate  |  Tabular <https://tabular.io/>
>>>>>> > >>>
>>>>>> > >>> c (267) 226-8606
>>>>>> > >>>
>>>>>> > >>
>>>>>> >
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Kyle Bendickson
>>>>
>>>> OSS Developer  |  Tabular <https://tabular.io/>
>>>>
>>>> kyle@tabular.io
>>>>
>>>

-- 
Josh Howard

Re: 【Feature】Request support for c++ sdk

Posted by LuNing Wang <wa...@gmail.com>.

+1 for Rust

Best Regards,
LuNing Wang


Re: 【Feature】Request support for c++ sdk

Posted by Nan Zhu <zh...@gmail.com>.

+1 for using rust as the backbone for new language bindings


Re: 【Feature】Request support for c++ sdk

Posted by OpenInx <op...@gmail.com>.

Thanks Kyle for sharing your context.

Recently, I also spent some time practicing my Rust skills.  Generally,
I'm +1 for adding Rust SDK support for native language.


On Mon, Jun 13, 2022 at 12:51 PM Kyle Bendickson <ky...@tabular.io> wrote:

> Thanks for starting this discussion.
>
> I know I was the first to mention some of my concerns (which I still have
> and would apply to any new major change), but I also think that this is an
> avenue that should be explored.
>
> Specifically, a native integration would have many benefits for read paths
> (in addition to others). I know that the Rust Avro reader is
> significantly faster, as are native readers for columnar formats.
>
> So while I do have some concerns about making sure we have enough people
> to support this endeavor, I do want to say I think it's a really good idea.
> My apologies if I gave the impression otherwise.
>
> I would personally be interested in contributing to and reviewing for a
> native Rust library (or CPP, but I think Rust is a much more elegant
> language and I'd personally prefer to work in that as it's easier to work
> with across systems than C++ imo though I would defer to others on that).
>
> I would also be happy to offer my help and perspective in moving this
> forward if need be. But I did want to express my practical concerns so that
> we don't have an area of the codebase where there aren't enough people to
> help maintain it etc.
>
> But in general I think this is an exciting opportunity, and results have
> shown time and time again that native readers / writers are much more
> performant.
>
> +1 to using Rust as well (which is a language I know more of than C++
> these days - though both I'd have to brush off my skillset).
>
> Best, Kyle
>
> On Sun, Jun 12, 2022 at 8:20 PM OpenInx <op...@gmail.com> wrote:
>
>> Hi Tao Wu.
>>
>> I think the apache iceberg community is very consistent in providing the
>> Iceberg SDK for native languages.  I am very happy to offer my perspective
>> and help if needed when you try to move this thing forward.
>>
>> On Mon, Jun 13, 2022 at 11:04 AM Wu Tao <wu...@apache.org> wrote:
>>
>>> Hi, everyone, I'm Tao. I'm currently working on a commercial streaming
>>> system that is written in Rust.
>>>
>>> Actually, I'm planning to implement an Iceberg Rust SDK so that we can
>>> have better integration with the existing Iceberg ecosystem. Initially I
>>> found https://github.com/oliverdaff/iceberg-rs, but it appears the
>>> author hasn't been active lately. So I'm looking to see if the Iceberg
>>> community has any consensus on a Rust/C++ SDK (Rust is preferable), and if
>>> there is, we'd love to contribute. I believe as Iceberg increases its
>>> popularity, there will eventually be more systems that want such libraries.
>>> There could have even been some ongoing works without consulting with the
>>> community.
>>>
>>> Additionally, I think the initial Rust/C++ SDK can only support the
>>> reader&writer sides of Iceberg. Because there have been plenty of JVM-based
>>> query engines out there taking charge of data maintenance. We don't have to
>>> rewrite every corner of Iceberg in Rust. That means less engineering work.
>>>
>>> On 2022/06/08 10:16:05 OpenInx wrote:
>>> > As a cloud-native table format standard for the big-data ecosystem,  I
>>> > believe supporting multiple languages is the correct direction so that
>>> > different languages can connect to the apache iceberg table format.
>>> >
>>> > But I can also get Kyle's point about lacking enough
>>> resources(developers
>>> > and reviewers ) to accomplish this goal.  In my mind,  Python, Golang,
>>> C++,
>>> > Rust , all of them can be regarded as the native language support.  we
>>> may
>>> > just need to support the Rust SDK and then all of the other languages
>>> can
>>> > just wrap the Rust SDK to access the table format.
>>> >
>>> > Anyway,  we will need to wait for the REST catalog finished before we
>>> > introduce another languages support , because we can not access the
>>> iceberg
>>> > table by invoking the JVM catalog interfaces.
>>> >
>>> > On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <em...@gmail.com>
>>> > wrote:
>>> >
>>> > > There’s also the question of how useful this would be in practice
>>> given
>>> > >> the complexity of using C++ (or Rust etc) within some of the major
>>> > >> frameworks.
>>> > >>
>>> > >
>>> > > One place this would be useful is for the Arrow's DataSet API [1].
>>> An
>>> > > option the Arrow community might be open to is hosting parts of the
>>> code
>>> > > there (this is what is done for Apache Parquet C++).  This helps
>>> shape some
>>> > > of the answers to other questions posed (ORC and Parquet are already
>>> in the
>>> > > Repo, it provides a Filesystem interface, etc).  The project doesn't
>>> > > currently consume Avro, and I think the preferred approach is to
>>> make a
>>> > > clean room Avro parser.  But I agree this is a non-trivial effort to
>>> get
>>> > > underway.
>>> > >
>>> > > Another area to consider is compatibility testing.  I think before a
>>> third
>>> > > officially supported community library is introduced it would be
>>> good to
>>> > > have a compatibility framework in place to make sure implementations
>>> are
>>> > > all interpreting the specification correctly.  If there isn't
>>> already an
>>> > > effort here, I'd like to start contributing something (probably will
>>> have
>>> > > bandwidth sometime in Q3).
>>> > >
>>> > > Thanks,
>>> > > -Micah
>>> > >
>>> > >
>>> > > [1] https://arrow.apache.org/docs/cpp/dataset.html
>>> > >
>>> > > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <ky...@tabular.io>
>>> wrote:
>>> > >
>>> > >> Hi caneGuy,
>>> > >>
>>> > >> I personally don’t dislike this idea. I understand the performance
>>> > >> benefits.
>>> > >>
>>> > >> But this would be a huge undertaking for the community. We’d need to
>>> > >> ensure we had sufficient developer support for reviews (likely one
>>> of the
>>> > >> biggest issues), as well as a number of other things. Particularly
>>> > >> dependencies, package management, etc. We’d also need to scope
>>> support down
>>> > >> to specific OS / compilers etc.
>>> > >>
>>> > >> We’d also need to be sure we had adequate developer support from a
>>> wide
>>> > >> enough range of the community to support the project long term. One
>>> issue
>>> > >> in open source is that developers will work on something tangential
>>> to
>>> > >> their project in another repository, but nobody is available to
>>> maintain it.
>>> > >>
>>> > >> There’s also the question of how useful this would be in practice
>>> given
>>> > >> the complexity of using C++ (or Rust etc) within some of the major
>>> > >> frameworks.
>>> > >>
>>> > >> Again, I’m not opposed to the idea but just trying to be realistic
>>> about
>>> > >> the realities of such an undertaking. It would need full community
>>> support
>>> > >> (or at least support from enough community members to be
>>> sustainable).
>>> > >>
>>> > >> If you wanted to make a design doc, the milestones tab in the
>>> Iceberg
>>> > >> project has some that you might use as reference.
>>> > >>
>>> > >> *I highly suggest you come to the next community sync and bring
>>> this up
>>> > >> to the community then.*
>>> > >>
>>> > >> If you’re not already on the invite list for the monthly community
>>> sync,
>>> > >> you can get on it by joining the Google group. You’ll receive
>>> invites when
>>> > >> they go out:
>>> > >> https://groups.google.com/g/iceberg-sync
>>> > >>
>>> > >> Looking forward to seeing you at the next community sync.
>>> > >>
>>> > >> A design document and/or any prior art would be very helpful as the
>>> > >> community sync does discuss many topics (possibly there is existing
>>> C++
>>> > >> support in StarRocks for Iceberg V1?).
>>> > >>
>>> > >> Thank you,
>>> > >> Kyle Bendickson
>>> > >> GitHub: kbendick
>>> > >>
>>> > >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <sa...@tabular.io> wrote:
>>> > >>
>>> > >>> Currently there is no existing effort to develop a C++ package.
>>> That
>>> > >>> being said I think it would be awesome to have one! If anyone is
>>> willing to
>>> > >>> start that development effort, I can help with some of the ground
>>> work to
>>> > >>> kickstart it.
>>> > >>>
>>> > >>> I would say the first step would be for someone to prepare a
>>> high-level
>>> > >>> proposal.
>>> > >>>
>>> > >>> -Sam
>>> > >>>
>>> > >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zh...@gmail.com>
>>> wrote:
>>> > >>>
>>> > >>>> Hi team
>>> > >>>> I am a dev from StarRocks community, and we have supported
>>> iceberg v1
>>> > >>>> format.
>>> > >>>> We are also planning to support v2 format. If there is a C++
>>> package,
>>> > >>>> it will be very convenient for our implementation.
>>> > >>>> At the same time, other c++ computing engines support v2 format
>>> will
>>> > >>>> also be faster.
>>> > >>>>
>>> > >>>> Do we have plans to support c++ version sdk?
>>> > >>>> --
>>> > >>>> caneGuy
>>> > >>>>
>>> > >>> --
>>> > >>>
>>> > >>> Sam Redai <sa...@tabular.io>
>>> > >>>
>>> > >>> Developer Advocate  |  Tabular <https://tabular.io/>
>>> > >>>
>>> > >>> c (267) 226-8606
>>> > >>>
>>> > >>
>>> >
>>>
>>
>
> --
>
> Kyle Bendickson
>
> OSS Developer  |  Tabular <https://tabular.io/>
>
> kyle@tabular.io
>

Re: 【Feature】Request support for c++ sdk

Posted by Kyle Bendickson <ky...@tabular.io>.

Thanks for starting this discussion.

I know I was the first to mention some of my concerns (which I still have
and would apply to any new major change), but I also think that this is an
avenue that should be explored.

Specifically, a native integration would have many benefits for read paths
(among others). I know that the Rust Avro reader is significantly faster,
as are native readers for columnar formats.

So while I do have some concerns about making sure we have enough people to
support this endeavor, I do want to say I think it's a really good idea. My
apologies if I gave the impression otherwise.

I would personally be interested in contributing to and reviewing for a
native Rust library (or C++, but I think Rust is a much more elegant
language, and I'd personally prefer to work in it since, IMO, it's easier to
work with across systems than C++, though I would defer to others on that).

I would also be happy to offer my help and perspective in moving this
forward if need be. But I did want to express my practical concerns so that
we don't end up with an area of the codebase that too few people are
available to help maintain.

But in general I think this is an exciting opportunity, and results have
shown time and time again that native readers / writers are much more
performant.

+1 to using Rust as well (a language I know better than C++ these days,
though I'd have to brush up on both).

Best, Kyle

On Sun, Jun 12, 2022 at 8:20 PM OpenInx <op...@gmail.com> wrote:

> Hi Tao Wu.
>
> I think the apache iceberg community is very consistent in providing the
> Iceberg SDK for native languages.  I am very happy to offer my perspective
> and help if needed when you try to move this thing forward.
>
> On Mon, Jun 13, 2022 at 11:04 AM Wu Tao <wu...@apache.org> wrote:
>
>> Hi, everyone, I'm Tao. I'm currently working on a commercial streaming
>> system that is written in Rust.
>>
>> Actually, I'm planning to implement an Iceberg Rust SDK so that we can
>> have better integration with the existing Iceberg ecosystem. Initially I
>> found https://github.com/oliverdaff/iceberg-rs, but it appears the
>> author hasn't been active lately. So I'm looking to see if the Iceberg
>> community has any consensus on a Rust/C++ SDK (Rust is preferable), and if
>> there is, we'd love to contribute. I believe as Iceberg increases its
>> popularity, there will eventually be more systems that want such libraries.
>> There could have even been some ongoing works without consulting with the
>> community.
>>
>> Additionally, I think the initial Rust/C++ SDK can only support the
>> reader&writer sides of Iceberg. Because there have been plenty of JVM-based
>> query engines out there taking charge of data maintenance. We don't have to
>> rewrite every corner of Iceberg in Rust. That means less engineering work.
>>
>> On 2022/06/08 10:16:05 OpenInx wrote:
>> > As a cloud-native table format standard for the big-data ecosystem,  I
>> > believe supporting multiple languages is the correct direction so that
>> > different languages can connect to the apache iceberg table format.
>> >
>> > But I can also get Kyle's point about lacking enough
>> resources(developers
>> > and reviewers ) to accomplish this goal.  In my mind,  Python, Golang,
>> C++,
>> > Rust , all of them can be regarded as the native language support.  we
>> may
>> > just need to support the Rust SDK and then all of the other languages
>> can
>> > just wrap the Rust SDK to access the table format.
>> >
>> > Anyway,  we will need to wait for the REST catalog finished before we
>> > introduce another languages support , because we can not access the
>> iceberg
>> > table by invoking the JVM catalog interfaces.
>> >
>> > On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <em...@gmail.com>
>> > wrote:
>> >
>> > > There’s also the question of how useful this would be in practice
>> given
>> > >> the complexity of using C++ (or Rust etc) within some of the major
>> > >> frameworks.
>> > >>
>> > >
>> > > One place this would be useful is for the Arrow's DataSet API [1].  An
>> > > option the Arrow community might be open to is hosting parts of the
>> code
>> > > there (this is what is done for Apache Parquet C++).  This helps
>> shape some
>> > > of the answers to other questions posed (ORC and Parquet are already
>> in the
>> > > Repo, it provides a Filesystem interface, etc).  The project doesn't
>> > > currently consume Avro, and I think the preferred approach is to make
>> a
>> > > clean room Avro parser.  But I agree this is a non-trivial effort to
>> get
>> > > underway.
>> > >
>> > > Another area to consider is compatibility testing.  I think before a
>> third
>> > > officially supported community library is introduced it would be good
>> to
>> > > have a compatibility framework in place to make sure implementations
>> are
>> > > all interpreting the specification correctly.  If there isn't already
>> an
>> > > effort here, I'd like to start contributing something (probably will
>> have
>> > > bandwidth sometime in Q3).
>> > >
>> > > Thanks,
>> > > -Micah
>> > >
>> > >
>> > > [1] https://arrow.apache.org/docs/cpp/dataset.html
>> > >
>> > > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <ky...@tabular.io>
>> wrote:
>> > >
>> > >> Hi caneGuy,
>> > >>
>> > >> I personally don’t dislike this idea. I understand the performance
>> > >> benefits.
>> > >>
>> > >> But this would be a huge undertaking for the community. We’d need to
>> > >> ensure we had sufficient developer support for reviews (likely one
>> of the
>> > >> biggest issues), as well as a number of other things. Particularly
>> > >> dependencies, package management, etc. We’d also need to scope
>> support down
>> > >> to specific OS / compilers etc.
>> > >>
>> > >> We’d also need to be sure we had adequate developer support from a
>> wide
>> > >> enough range of the community to support the project long term. One
>> issue
>> > >> in open source is that developers will work on something tangential
>> to
>> > >> their project in another repository, but nobody is available to
>> maintain it.
>> > >>
>> > >> There’s also the question of how useful this would be in practice
>> given
>> > >> the complexity of using C++ (or Rust etc) within some of the major
>> > >> frameworks.
>> > >>
>> > >> Again, I’m not opposed to the idea but just trying to be realistic
>> about
>> > >> the realities of such an undertaking. It would need full community
>> support
>> > >> (or at least support from enough community members to be
>> sustainable).
>> > >>
>> > >> If you wanted to make a design doc, the milestones tab in the Iceberg
>> > >> project has some that you might use as reference.
>> > >>
>> > >> *I highly suggest you come to the next community sync and bring this
>> up
>> > >> to the community then.*
>> > >>
>> > >> If you’re not already on the invite list for the monthly community
>> sync,
>> > >> you can get on it by joining the Google group. You’ll receive
>> invites when
>> > >> they go out:
>> > >> https://groups.google.com/g/iceberg-sync
>> > >>
>> > >> Looking forward to seeing you at the next community sync.
>> > >>
>> > >> A design document and/or any prior art would be very helpful as the
>> > >> community sync does discuss many topics (possibly there is existing
>> C++
>> > >> support in StarRocks for Iceberg V1?).
>> > >>
>> > >> Thank you,
>> > >> Kyle Bendickson
>> > >> GitHub: kbendick
>> > >>
>> > >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <sa...@tabular.io> wrote:
>> > >>
>> > >>> Currently there is no existing effort to develop a C++ package. That
>> > >>> being said I think it would be awesome to have one! If anyone is
>> willing to
>> > >>> start that development effort, I can help with some of the ground
>> work to
>> > >>> kickstart it.
>> > >>>
>> > >>> I would say the first step would be for someone to prepare a
>> high-level
>> > >>> proposal.
>> > >>>
>> > >>> -Sam
>> > >>>
>> > >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zh...@gmail.com>
>> wrote:
>> > >>>
>> > >>>> Hi team
>> > >>>> I am a dev from StarRocks community, and we have supported iceberg
>> v1
>> > >>>> format.
>> > >>>> We are also planning to support v2 format. If there is a C++
>> package,
>> > >>>> it will be very convenient for our implementation.
>> > >>>> At the same time, other c++ computing engines support v2 format
>> will
>> > >>>> also be faster.
>> > >>>>
>> > >>>> Do we have plans to support c++ version sdk?
>> > >>>> --
>> > >>>> caneGuy
>> > >>>>
>> > >>> --
>> > >>>
>> > >>> Sam Redai <sa...@tabular.io>
>> > >>>
>> > >>> Developer Advocate  |  Tabular <https://tabular.io/>
>> > >>>
>> > >>> c (267) 226-8606
>> > >>>
>> > >>
>> >
>>
>

-- 

Kyle Bendickson

OSS Developer  |  Tabular <https://tabular.io/>

kyle@tabular.io

Re: 【Feature】Request support for c++ sdk

Posted by OpenInx <op...@gmail.com>.

Hi Tao Wu.

I think the Apache Iceberg community is largely in agreement on providing
Iceberg SDKs for native languages.  I am very happy to offer my perspective
and help if needed when you try to move this forward.

On Mon, Jun 13, 2022 at 11:04 AM Wu Tao <wu...@apache.org> wrote:

> Hi, everyone, I'm Tao. I'm currently working on a commercial streaming
> system that is written in Rust.
>
> Actually, I'm planning to implement an Iceberg Rust SDK so that we can
> have better integration with the existing Iceberg ecosystem. Initially I
> found https://github.com/oliverdaff/iceberg-rs, but it appears the author
> hasn't been active lately. So I'm looking to see if the Iceberg community
> has any consensus on a Rust/C++ SDK (Rust is preferable), and if there is,
> we'd love to contribute. I believe as Iceberg increases its popularity,
> there will eventually be more systems that want such libraries. There could
> have even been some ongoing works without consulting with the community.
>
> Additionally, I think the initial Rust/C++ SDK can only support the
> reader&writer sides of Iceberg. Because there have been plenty of JVM-based
> query engines out there taking charge of data maintenance. We don't have to
> rewrite every corner of Iceberg in Rust. That means less engineering work.
>
> On 2022/06/08 10:16:05 OpenInx wrote:
> > As a cloud-native table format standard for the big-data ecosystem,  I
> > believe supporting multiple languages is the correct direction so that
> > different languages can connect to the apache iceberg table format.
> >
> > But I can also get Kyle's point about lacking enough resources(developers
> > and reviewers ) to accomplish this goal.  In my mind,  Python, Golang,
> C++,
> > Rust , all of them can be regarded as the native language support.  we
> may
> > just need to support the Rust SDK and then all of the other languages can
> > just wrap the Rust SDK to access the table format.
> >
> > Anyway,  we will need to wait for the REST catalog finished before we
> > introduce another languages support , because we can not access the
> iceberg
> > table by invoking the JVM catalog interfaces.
> >
> > On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <em...@gmail.com>
> > wrote:
> >
> > > There’s also the question of how useful this would be in practice given
> > >> the complexity of using C++ (or Rust etc) within some of the major
> > >> frameworks.
> > >>
> > >
> > > One place this would be useful is for the Arrow's DataSet API [1].  An
> > > option the Arrow community might be open to is hosting parts of the
> code
> > > there (this is what is done for Apache Parquet C++).  This helps shape
> some
> > > of the answers to other questions posed (ORC and Parquet are already
> in the
> > > Repo, it provides a Filesystem interface, etc).  The project doesn't
> > > currently consume Avro, and I think the preferred approach is to make a
> > > clean room Avro parser.  But I agree this is a non-trivial effort to
> get
> > > underway.
> > >
> > > Another area to consider is compatibility testing.  I think before a
> third
> > > officially supported community library is introduced it would be good
> to
> > > have a compatibility framework in place to make sure implementations
> are
> > > all interpreting the specification correctly.  If there isn't already
> an
> > > effort here, I'd like to start contributing something (probably will
> have
> > > bandwidth sometime in Q3).
> > >
> > > Thanks,
> > > -Micah
> > >
> > >
> > > [1] https://arrow.apache.org/docs/cpp/dataset.html
> > >
> > > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <ky...@tabular.io>
> wrote:
> > >
> > >> Hi caneGuy,
> > >>
> > >> I personally don’t dislike this idea. I understand the performance
> > >> benefits.
> > >>
> > >> But this would be a huge undertaking for the community. We’d need to
> > >> ensure we had sufficient developer support for reviews (likely one of
> the
> > >> biggest issues), as well as a number of other things. Particularly
> > >> dependencies, package management, etc. We’d also need to scope
> support down
> > >> to specific OS / compilers etc.
> > >>
> > >> We’d also need to be sure we had adequate developer support from a
> wide
> > >> enough range of the community to support the project long term. One
> issue
> > >> in open source is that developers will work on something tangential to
> > >> their project in another repository, but nobody is available to
> maintain it.
> > >>
> > >> There’s also the question of how useful this would be in practice
> given
> > >> the complexity of using C++ (or Rust etc) within some of the major
> > >> frameworks.
> > >>
> > >> Again, I’m not opposed to the idea but just trying to be realistic
> about
> > >> the realities of such an undertaking. It would need full community
> support
> > >> (or at least support from enough community members to be sustainable).
> > >>
> > >> If you wanted to make a design doc, the milestones tab in the Iceberg
> > >> project has some that you might use as reference.
> > >>
> > >> *I highly suggest you come to the next community sync and bring this
> up
> > >> to the community then.*
> > >>
> > >> If you’re not already on the invite list for the monthly community
> sync,
> > >> you can get on it by joining the Google group. You’ll receive invites
> when
> > >> they go out:
> > >> https://groups.google.com/g/iceberg-sync
> > >>
> > >> Looking forward to seeing you at the next community sync.
> > >>
> > >> A design document and/or any prior art would be very helpful as the
> > >> community sync does discuss many topics (possibly there is existing
> C++
> > >> support in StarRocks for Iceberg V1?).
> > >>
> > >> Thank you,
> > >> Kyle Bendickson
> > >> GitHub: kbendick
> > >>
> > >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <sa...@tabular.io> wrote:
> > >>
> > >>> Currently there is no existing effort to develop a C++ package. That
> > >>> being said I think it would be awesome to have one! If anyone is
> willing to
> > >>> start that development effort, I can help with some of the ground
> work to
> > >>> kickstart it.
> > >>>
> > >>> I would say the first step would be for someone to prepare a
> high-level
> > >>> proposal.
> > >>>
> > >>> -Sam
> > >>>
> > >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zh...@gmail.com> wrote:
> > >>>
> > >>>> Hi team
> > >>>> I am a dev from StarRocks community, and we have supported iceberg
> v1
> > >>>> format.
> > >>>> We are also planning to support v2 format. If there is a C++
> package,
> > >>>> it will be very convenient for our implementation.
> > >>>> At the same time, other c++ computing engines support v2 format will
> > >>>> also be faster.
> > >>>>
> > >>>> Do we have plans to support c++ version sdk?
> > >>>> --
> > >>>> caneGuy
> > >>>>
> > >>> --
> > >>>
> > >>> Sam Redai <sa...@tabular.io>
> > >>>
> > >>> Developer Advocate  |  Tabular <https://tabular.io/>
> > >>>
> > >>> c (267) 226-8606
> > >>>
> > >>
> >
>

Re: 【Feature】Request support for c++ sdk

Posted by Wu Tao <wu...@apache.org>.

Hi, everyone, I'm Tao. I'm currently working on a commercial streaming system that is written in Rust.

Actually, I'm planning to implement an Iceberg Rust SDK so that we can have better integration with the existing Iceberg ecosystem. Initially I found https://github.com/oliverdaff/iceberg-rs, but it appears the author hasn't been active lately. So I'm looking to see if the Iceberg community has any consensus on a Rust/C++ SDK (Rust is preferable), and if there is, we'd love to contribute. I believe that as Iceberg grows in popularity, more systems will eventually want such libraries. There may even be some ongoing work that hasn't been discussed with the community.

Additionally, I think the initial Rust/C++ SDK could support only the reader and writer sides of Iceberg, because there are already plenty of JVM-based query engines out there taking charge of data maintenance. We don't have to rewrite every corner of Iceberg in Rust, which means less engineering work.

On 2022/06/08 10:16:05 OpenInx wrote:
> As a cloud-native table format standard for the big-data ecosystem,  I
> believe supporting multiple languages is the correct direction so that
> different languages can connect to the apache iceberg table format.
> 
> But I can also get Kyle's point about lacking enough resources(developers
> and reviewers ) to accomplish this goal.  In my mind,  Python, Golang, C++,
> Rust , all of them can be regarded as the native language support.  we may
> just need to support the Rust SDK and then all of the other languages can
> just wrap the Rust SDK to access the table format.
> 
> Anyway,  we will need to wait for the REST catalog finished before we
> introduce another languages support , because we can not access the iceberg
> table by invoking the JVM catalog interfaces.
> 
> On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <em...@gmail.com>
> wrote:
> 
> > There’s also the question of how useful this would be in practice given
> >> the complexity of using C++ (or Rust etc) within some of the major
> >> frameworks.
> >>
> >
> > One place this would be useful is for the Arrow's DataSet API [1].  An
> > option the Arrow community might be open to is hosting parts of the code
> > there (this is what is done for Apache Parquet C++).  This helps shape some
> > of the answers to other questions posed (ORC and Parquet are already in the
> > Repo, it provides a Filesystem interface, etc).  The project doesn't
> > currently consume Avro, and I think the preferred approach is to make a
> > clean room Avro parser.  But I agree this is a non-trivial effort to get
> > underway.
> >
> > Another area to consider is compatibility testing.  I think before a third
> > officially supported community library is introduced it would be good to
> > have a compatibility framework in place to make sure implementations are
> > all interpreting the specification correctly.  If there isn't already an
> > effort here, I'd like to start contributing something (probably will have
> > bandwidth sometime in Q3).
> >
> > Thanks,
> > -Micah
> >
> >
> > [1] https://arrow.apache.org/docs/cpp/dataset.html
> >
> > On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <ky...@tabular.io> wrote:
> >
> >> Hi caneGuy,
> >>
> >> I personally don’t dislike this idea. I understand the performance
> >> benefits.
> >>
> >> But this would be a huge undertaking for the community. We’d need to
> >> ensure we had sufficient developer support for reviews (likely one of the
> >> biggest issues), as well as a number of other things. Particularly
> >> dependencies, package management, etc. We’d also need to scope support down
> >> to specific OS / compilers etc.
> >>
> >> We’d also need to be sure we had adequate developer support from a wide
> >> enough range of the community to support the project long term. One issue
> >> in open source is that developers will work on something tangential to
> >> their project in another repository, but nobody is available to maintain it.
> >>
> >> There’s also the question of how useful this would be in practice given
> >> the complexity of using C++ (or Rust etc) within some of the major
> >> frameworks.
> >>
> >> Again, I’m not opposed to the idea but just trying to be realistic about
> >> the realities of such an undertaking. It would need full community support
> >> (or at least support from enough community members to be sustainable).
> >>
> >> If you wanted to make a design doc, the milestones tab in the Iceberg
> >> project has some that you might use as reference.
> >>
> >> *I highly suggest you come to the next community sync and bring this up
> >> to the community then.*
> >>
> >> If you’re not already on the invite list for the monthly community sync,
> >> you can get on it by joining the Google group. You’ll receive invites when
> >> they go out:
> >> https://groups.google.com/g/iceberg-sync
> >>
> >> Looking forward to seeing you at the next community sync.
> >>
> >> A design document and/or any prior art would be very helpful as the
> >> community sync does discuss many topics (possibly there is existing C++
> >> support in StarRocks for Iceberg V1?).
> >>
> >> Thank you,
> >> Kyle Bendickson
> >> GitHub: kbendick
> >>
> >> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <sa...@tabular.io> wrote:
> >>
> >>> Currently there is no existing effort to develop a C++ package. That
> >>> being said I think it would be awesome to have one! If anyone is willing to
> >>> start that development effort, I can help with some of the ground work to
> >>> kickstart it.
> >>>
> >>> I would say the first step would be for someone to prepare a high-level
> >>> proposal.
> >>>
> >>> -Sam
> >>>
> >>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zh...@gmail.com> wrote:
> >>>
> >>>> Hi team
> >>>> I am a dev from StarRocks community, and we have supported iceberg v1
> >>>> format.
> >>>> We are also planning to support v2 format. If there is a C++ package,
> >>>> it will be very convenient for our implementation.
> >>>> At the same time, other c++ computing engines support v2 format will
> >>>> also be faster.
> >>>>
> >>>> Do we have plans to support c++ version sdk?
> >>>> --
> >>>> caneGuy
> >>>>
> >>> --
> >>>
> >>> Sam Redai <sa...@tabular.io>
> >>>
> >>> Developer Advocate  |  Tabular <https://tabular.io/>
> >>>
> >>> c (267) 226-8606
> >>>
> >>
>

Re: 【Feature】Request support for c++ sdk

Posted by Weston Pace <we...@gmail.com>.

As Micah said, this would be pretty cool to use in Arrow datasets.  I can't
make any promises about helping develop it, but if it were developed I could
help integrate it into Arrow datasets / Acero and provide a proof of
concept.

On Wed, Jun 8, 2022, 6:35 AM Ryan Blue <bl...@tabular.io> wrote:

> While I understand Kyle's concerns, I'm all for a C++ or Rust
> implementation.
>
> We know that this is going to help a lot of people that want to integrate
> Iceberg in engines that are outside the JVM ecosystem. I think it would be
> great to work with anyone that is interested and build up the community in
> this area!
>
> Ryan
>
> On Wed, Jun 8, 2022 at 3:16 AM OpenInx <op...@gmail.com> wrote:
>
>> Since Iceberg is a cloud-native table format standard for the big-data
>> ecosystem, I believe supporting multiple languages is the right direction,
>> so that different languages can connect to the Apache Iceberg table format.
>>
>> But I also get Kyle's point about lacking enough resources (developers and
>> reviewers) to accomplish this goal.  In my mind, Python, Golang, C++, and
>> Rust can all be regarded as native language support.  We may just need to
>> support a Rust SDK, and then all of the other languages can wrap the Rust
>> SDK to access the table format.
>>
>> Anyway, we will need to wait for the REST catalog to be finished before we
>> introduce support for another language, because we cannot access an Iceberg
>> table from those languages by invoking the JVM catalog interfaces.
>>
>> On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <em...@gmail.com>
>> wrote:
>>
>>> There’s also the question of how useful this would be in practice given
>>>> the complexity of using C++ (or Rust etc) within some of the major
>>>> frameworks.
>>>>
>>>
>>> One place this would be useful is Arrow's Dataset API [1].  An
>>> option the Arrow community might be open to is hosting parts of the code
>>> there (this is what is done for Apache Parquet C++).  This helps shape some
>>> of the answers to other questions posed (ORC and Parquet are already in the
>>> Repo, it provides a Filesystem interface, etc).  The project doesn't
>>> currently consume Avro, and I think the preferred approach is to make a
>>> clean room Avro parser.  But I agree this is a non-trivial effort to get
>>> underway.
>>>
>>> Another area to consider is compatibility testing.  I think before a
>>> third officially supported community library is introduced it would be good
>>> to have a compatibility framework in place to make sure implementations are
>>> all interpreting the specification correctly.  If there isn't already an
>>> effort here, I'd like to start contributing something (probably will have
>>> bandwidth sometime in Q3).
>>>
>>> Thanks,
>>> -Micah
>>>
>>>
>>> [1] https://arrow.apache.org/docs/cpp/dataset.html
>>>
>>> On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <ky...@tabular.io> wrote:
>>>
>>>> Hi caneGuy,
>>>>
>>>> I personally don’t dislike this idea. I understand the performance
>>>> benefits.
>>>>
>>>> But this would be a huge undertaking for the community. We’d need to
>>>> ensure we had sufficient developer support for reviews (likely one of the
>>>> biggest issues), as well as a number of other things. Particularly
>>>> dependencies, package management, etc. We’d also need to scope support down
>>>> to specific OS / compilers etc.
>>>>
>>>> We’d also need to be sure we had adequate developer support from a wide
>>>> enough range of the community to support the project long term. One issue
>>>> in open source is that developers will work on something tangential to
>>>> their project in another repository, but nobody is available to maintain it.
>>>>
>>>> There’s also the question of how useful this would be in practice given
>>>> the complexity of using C++ (or Rust etc) within some of the major
>>>> frameworks.
>>>>
>>>> Again, I’m not opposed to the idea but just trying to be realistic
>>>> about the realities of such an undertaking. It would need full community
>>>> support (or at least support from enough community members to be
>>>> sustainable).
>>>>
>>>> If you wanted to make a design doc, the milestones tab in the Iceberg
>>>> project has some that you might use as reference.
>>>>
>>>> *I highly suggest you come to the next community sync and bring this up
>>>> to the community then.*
>>>>
>>>> If you’re not already on the invite list for the monthly community
>>>> sync, you can get on it by joining the Google group. You’ll receive invites
>>>> when they go out:
>>>> https://groups.google.com/g/iceberg-sync
>>>>
>>>> Looking forward to seeing you at the next community sync.
>>>>
>>>> A design document and/or any prior art would be very helpful as the
>>>> community sync does discuss many topics (possibly there is existing C++
>>>> support in StarRocks for Iceberg V1?).
>>>>
>>>> Thank you,
>>>> Kyle Bendickson
>>>> GitHub: kbendick
>>>>
>>>> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <sa...@tabular.io> wrote:
>>>>
>>>>> Currently there is no existing effort to develop a C++ package. That
>>>>> being said I think it would be awesome to have one! If anyone is willing to
>>>>> start that development effort, I can help with some of the ground work to
>>>>> kickstart it.
>>>>>
>>>>> I would say the first step would be for someone to prepare a
>>>>> high-level proposal.
>>>>>
>>>>> -Sam
>>>>>
>>>>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zh...@gmail.com> wrote:
>>>>>
>>>>>> Hi team
>>>>>> I am a dev from StarRocks community, and we have supported iceberg v1
>>>>>> format.
>>>>>> We are also planning to support v2 format. If there is a C++ package,
>>>>>> it will be very convenient for our implementation.
>>>>>> At the same time, other c++ computing engines support v2 format will
>>>>>> also be faster.
>>>>>>
>>>>>> Do we have plans to support c++ version sdk?
>>>>>> --
>>>>>> caneGuy
>>>>>>
>>>>> --
>>>>>
>>>>> Sam Redai <sa...@tabular.io>
>>>>>
>>>>> Developer Advocate  |  Tabular <https://tabular.io/>
>>>>>
>>>>> c (267) 226-8606
>>>>>
>>>>
>
> --
> Ryan Blue
> Tabular
>

Re: 【Feature】Request support for c++ sdk

Posted by Ryan Blue <bl...@tabular.io>.

While I understand Kyle's concerns, I'm all for a C++ or Rust
implementation.

We know that this is going to help a lot of people that want to integrate
Iceberg in engines that are outside the JVM ecosystem. I think it would be
great to work with anyone that is interested and build up the community in
this area!

Ryan

On Wed, Jun 8, 2022 at 3:16 AM OpenInx <op...@gmail.com> wrote:

> Since Iceberg is a cloud-native table format standard for the big-data
> ecosystem, I believe supporting multiple languages is the right direction,
> so that different languages can connect to the Apache Iceberg table format.
>
> But I also get Kyle's point about lacking enough resources (developers and
> reviewers) to accomplish this goal.  In my mind, Python, Golang, C++, and
> Rust can all be regarded as native language support.  We may just need to
> support a Rust SDK, and then all of the other languages can wrap the Rust
> SDK to access the table format.
>
> Anyway, we will need to wait for the REST catalog to be finished before we
> introduce support for another language, because we cannot access an Iceberg
> table from those languages by invoking the JVM catalog interfaces.
>
> On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <em...@gmail.com>
> wrote:
>
>> There’s also the question of how useful this would be in practice given
>>> the complexity of using C++ (or Rust etc) within some of the major
>>> frameworks.
>>>
>>
>> One place this would be useful is Arrow's Dataset API [1].  An
>> option the Arrow community might be open to is hosting parts of the code
>> there (this is what is done for Apache Parquet C++).  This helps shape some
>> of the answers to other questions posed (ORC and Parquet are already in the
>> Repo, it provides a Filesystem interface, etc).  The project doesn't
>> currently consume Avro, and I think the preferred approach is to make a
>> clean room Avro parser.  But I agree this is a non-trivial effort to get
>> underway.
>>
>> Another area to consider is compatibility testing.  I think before a
>> third officially supported community library is introduced it would be good
>> to have a compatibility framework in place to make sure implementations are
>> all interpreting the specification correctly.  If there isn't already an
>> effort here, I'd like to start contributing something (probably will have
>> bandwidth sometime in Q3).
>>
>> Thanks,
>> -Micah
>>
>>
>> [1] https://arrow.apache.org/docs/cpp/dataset.html
>>
>> On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <ky...@tabular.io> wrote:
>>
>>> Hi caneGuy,
>>>
>>> I personally don’t dislike this idea. I understand the performance
>>> benefits.
>>>
>>> But this would be a huge undertaking for the community. We’d need to
>>> ensure we had sufficient developer support for reviews (likely one of the
>>> biggest issues), as well as a number of other things. Particularly
>>> dependencies, package management, etc. We’d also need to scope support down
>>> to specific OS / compilers etc.
>>>
>>> We’d also need to be sure we had adequate developer support from a wide
>>> enough range of the community to support the project long term. One issue
>>> in open source is that developers will work on something tangential to
>>> their project in another repository, but nobody is available to maintain it.
>>>
>>> There’s also the question of how useful this would be in practice given
>>> the complexity of using C++ (or Rust etc) within some of the major
>>> frameworks.
>>>
>>> Again, I’m not opposed to the idea but just trying to be realistic about
>>> the realities of such an undertaking. It would need full community support
>>> (or at least support from enough community members to be sustainable).
>>>
>>> If you wanted to make a design doc, the milestones tab in the Iceberg
>>> project has some that you might use as reference.
>>>
>>> *I highly suggest you come to the next community sync and bring this up
>>> to the community then.*
>>>
>>> If you’re not already on the invite list for the monthly community sync,
>>> you can get on it by joining the Google group. You’ll receive invites when
>>> they go out:
>>> https://groups.google.com/g/iceberg-sync
>>>
>>> Looking forward to seeing you at the next community sync.
>>>
>>> A design document and/or any prior art would be very helpful as the
>>> community sync does discuss many topics (possibly there is existing C++
>>> support in StarRocks for Iceberg V1?).
>>>
>>> Thank you,
>>> Kyle Bendickson
>>> GitHub: kbendick
>>>
>>> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <sa...@tabular.io> wrote:
>>>
>>>> Currently there is no existing effort to develop a C++ package. That
>>>> being said I think it would be awesome to have one! If anyone is willing to
>>>> start that development effort, I can help with some of the ground work to
>>>> kickstart it.
>>>>
>>>> I would say the first step would be for someone to prepare a high-level
>>>> proposal.
>>>>
>>>> -Sam
>>>>
>>>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zh...@gmail.com> wrote:
>>>>
>>>>> Hi team
>>>>> I am a dev from StarRocks community, and we have supported iceberg v1
>>>>> format.
>>>>> We are also planning to support v2 format. If there is a C++ package,
>>>>> it will be very convenient for our implementation.
>>>>> At the same time, other c++ computing engines support v2 format will
>>>>> also be faster.
>>>>>
>>>>> Do we have plans to support c++ version sdk?
>>>>> --
>>>>> caneGuy
>>>>>
>>>> --
>>>>
>>>> Sam Redai <sa...@tabular.io>
>>>>
>>>> Developer Advocate  |  Tabular <https://tabular.io/>
>>>>
>>>> c (267) 226-8606
>>>>
>>>

-- 
Ryan Blue
Tabular

Re: 【Feature】Request support for c++ sdk

Posted by OpenInx <op...@gmail.com>.

Since Iceberg is a cloud-native table format standard for the big-data
ecosystem, I believe supporting multiple languages is the right direction,
so that different languages can connect to the Apache Iceberg table format.

But I also get Kyle's point about lacking enough resources (developers and
reviewers) to accomplish this goal.  In my mind, Python, Golang, C++, and
Rust can all be regarded as native language support.  We may just need to
support a Rust SDK, and then all of the other languages can wrap the Rust
SDK to access the table format.

Anyway, we will need to wait for the REST catalog to be finished before we
introduce support for another language, because we cannot access an Iceberg
table from those languages by invoking the JVM catalog interfaces.

On Tue, Jun 7, 2022 at 4:41 AM Micah Kornfield <em...@gmail.com>
wrote:

> There’s also the question of how useful this would be in practice given
>> the complexity of using C++ (or Rust etc) within some of the major
>> frameworks.
>>
>
> One place this would be useful is Arrow's Dataset API [1].  An
> option the Arrow community might be open to is hosting parts of the code
> there (this is what is done for Apache Parquet C++).  This helps shape some
> of the answers to other questions posed (ORC and Parquet are already in the
> Repo, it provides a Filesystem interface, etc).  The project doesn't
> currently consume Avro, and I think the preferred approach is to make a
> clean room Avro parser.  But I agree this is a non-trivial effort to get
> underway.
>
> Another area to consider is compatibility testing.  I think before a third
> officially supported community library is introduced it would be good to
> have a compatibility framework in place to make sure implementations are
> all interpreting the specification correctly.  If there isn't already an
> effort here, I'd like to start contributing something (probably will have
> bandwidth sometime in Q3).
>
> Thanks,
> -Micah
>
>
> [1] https://arrow.apache.org/docs/cpp/dataset.html
>
> On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <ky...@tabular.io> wrote:
>
>> Hi caneGuy,
>>
>> I personally don’t dislike this idea. I understand the performance
>> benefits.
>>
>> But this would be a huge undertaking for the community. We’d need to
>> ensure we had sufficient developer support for reviews (likely one of the
>> biggest issues), as well as a number of other things. Particularly
>> dependencies, package management, etc. We’d also need to scope support down
>> to specific OS / compilers etc.
>>
>> We’d also need to be sure we had adequate developer support from a wide
>> enough range of the community to support the project long term. One issue
>> in open source is that developers will work on something tangential to
>> their project in another repository, but nobody is available to maintain it.
>>
>> There’s also the question of how useful this would be in practice given
>> the complexity of using C++ (or Rust etc) within some of the major
>> frameworks.
>>
>> Again, I’m not opposed to the idea but just trying to be realistic about
>> the realities of such an undertaking. It would need full community support
>> (or at least support from enough community members to be sustainable).
>>
>> If you wanted to make a design doc, the milestones tab in the Iceberg
>> project has some that you might use as reference.
>>
>> *I highly suggest you come to the next community sync and bring this up
>> to the community then.*
>>
>> If you’re not already on the invite list for the monthly community sync,
>> you can get on it by joining the Google group. You’ll receive invites when
>> they go out:
>> https://groups.google.com/g/iceberg-sync
>>
>> Looking forward to seeing you at the next community sync.
>>
>> A design document and/or any prior art would be very helpful as the
>> community sync does discuss many topics (possibly there is existing C++
>> support in StarRocks for Iceberg V1?).
>>
>> Thank you,
>> Kyle Bendickson
>> GitHub: kbendick
>>
>> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <sa...@tabular.io> wrote:
>>
>>> Currently there is no existing effort to develop a C++ package. That
>>> being said I think it would be awesome to have one! If anyone is willing to
>>> start that development effort, I can help with some of the ground work to
>>> kickstart it.
>>>
>>> I would say the first step would be for someone to prepare a high-level
>>> proposal.
>>>
>>> -Sam
>>>
>>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zh...@gmail.com> wrote:
>>>
>>>> Hi team
>>>> I am a dev from StarRocks community, and we have supported iceberg v1
>>>> format.
>>>> We are also planning to support v2 format. If there is a C++ package,
>>>> it will be very convenient for our implementation.
>>>> At the same time, other c++ computing engines support v2 format will
>>>> also be faster.
>>>>
>>>> Do we have plans to support c++ version sdk?
>>>> --
>>>> caneGuy
>>>>
>>> --
>>>
>>> Sam Redai <sa...@tabular.io>
>>>
>>> Developer Advocate  |  Tabular <https://tabular.io/>
>>>
>>> c (267) 226-8606
>>>
>>

Re: 【Feature】Request support for c++ sdk

Posted by Micah Kornfield <em...@gmail.com>.

>
> There’s also the question of how useful this would be in practice given
> the complexity of using C++ (or Rust etc) within some of the major
> frameworks.
>

One place this would be useful is Arrow's Dataset API [1].  An
option the Arrow community might be open to is hosting parts of the code
there (this is what is done for Apache Parquet C++).  This helps shape some
of the answers to other questions posed (ORC and Parquet are already in the
Repo, it provides a Filesystem interface, etc).  The project doesn't
currently consume Avro, and I think the preferred approach is to make a
clean room Avro parser.  But I agree this is a non-trivial effort to get
underway.

Another area to consider is compatibility testing.  I think before a third
officially supported community library is introduced it would be good to
have a compatibility framework in place to make sure implementations are
all interpreting the specification correctly.  If there isn't already an
effort here, I'd like to start contributing something (probably will have
bandwidth sometime in Q3).

Thanks,
-Micah


[1] https://arrow.apache.org/docs/cpp/dataset.html

On Sun, Jun 5, 2022 at 11:07 PM Kyle Bendickson <ky...@tabular.io> wrote:

> Hi caneGuy,
>
> I personally don’t dislike this idea. I understand the performance
> benefits.
>
> But this would be a huge undertaking for the community. We’d need to
> ensure we had sufficient developer support for reviews (likely one of the
> biggest issues), as well as a number of other things. Particularly
> dependencies, package management, etc. We’d also need to scope support down
> to specific OS / compilers etc.
>
> We’d also need to be sure we had adequate developer support from a wide
> enough range of the community to support the project long term. One issue
> in open source is that developers will work on something tangential to
> their project in another repository, but nobody is available to maintain it.
>
> There’s also the question of how useful this would be in practice given
> the complexity of using C++ (or Rust etc) within some of the major
> frameworks.
>
> Again, I’m not opposed to the idea but just trying to be realistic about
> the realities of such an undertaking. It would need full community support
> (or at least support from enough community members to be sustainable).
>
> If you wanted to make a design doc, the milestones tab in the Iceberg
> project has some that you might use as reference.
>
> *I highly suggest you come to the next community sync and bring this up to
> the community then.*
>
> If you’re not already on the invite list for the monthly community sync,
> you can get on it by joining the Google group. You’ll receive invites when
> they go out:
> https://groups.google.com/g/iceberg-sync
>
> Looking forward to seeing you at the next community sync.
>
> A design document and/or any prior art would be very helpful as the
> community sync does discuss many topics (possibly there is existing C++
> support in StarRocks for Iceberg V1?).
>
> Thank you,
> Kyle Bendickson
> GitHub: kbendick
>
> On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <sa...@tabular.io> wrote:
>
>> Currently there is no existing effort to develop a C++ package. That
>> being said I think it would be awesome to have one! If anyone is willing to
>> start that development effort, I can help with some of the ground work to
>> kickstart it.
>>
>> I would say the first step would be for someone to prepare a high-level
>> proposal.
>>
>> -Sam
>>
>> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zh...@gmail.com> wrote:
>>
>>> Hi team
>>> I am a dev from StarRocks community, and we have supported iceberg v1
>>> format.
>>> We are also planning to support v2 format. If there is a C++ package, it
>>> will be very convenient for our implementation.
>>> At the same time, other c++ computing engines support v2 format will
>>> also be faster.
>>>
>>> Do we have plans to support c++ version sdk?
>>> --
>>> caneGuy
>>>
>> --
>>
>> Sam Redai <sa...@tabular.io>
>>
>> Developer Advocate  |  Tabular <https://tabular.io/>
>>
>> c (267) 226-8606
>>
>

Re: 【Feature】Request support for c++ sdk

Posted by Kyle Bendickson <ky...@tabular.io>.

Hi caneGuy,

I personally don’t dislike this idea. I understand the performance benefits.

But this would be a huge undertaking for the community. We’d need to ensure
we had sufficient developer support for reviews (likely one of the biggest
issues), as well as a number of other things, particularly dependencies,
package management, etc. We’d also need to scope support down to specific
OSes / compilers, etc.

We’d also need to be sure we had adequate developer support from a wide
enough range of the community to support the project long term. One issue
in open source is that developers will work on something tangential to
their project in another repository, but nobody is available to maintain it.

There’s also the question of how useful this would be in practice given the
complexity of using C++ (or Rust etc) within some of the major frameworks.

Again, I’m not opposed to the idea but just trying to be realistic about
the realities of such an undertaking. It would need full community support
(or at least support from enough community members to be sustainable).

If you wanted to make a design doc, the milestones tab in the Iceberg
project has some that you might use as reference.

*I highly suggest you come to the next community sync and bring this up to
the community then.*

If you’re not already on the invite list for the monthly community sync,
you can get on it by joining the Google group. You’ll receive invites when
they go out:
https://groups.google.com/g/iceberg-sync

Looking forward to seeing you at the next community sync.

A design document and/or any prior art would be very helpful as the
community sync does discuss many topics (possibly there is existing C++
support in StarRocks for Iceberg V1?).

Thank you,
Kyle Bendickson
GitHub: kbendick

On Sun, Jun 5, 2022 at 10:44 PM Sam Redai <sa...@tabular.io> wrote:

> Currently there is no existing effort to develop a C++ package. That being
> said I think it would be awesome to have one! If anyone is willing to start
> that development effort, I can help with some of the ground work to
> kickstart it.
>
> I would say the first step would be for someone to prepare a high-level
> proposal.
>
> -Sam
>
> On Sun, Jun 5, 2022 at 11:02 PM 周康 <zh...@gmail.com> wrote:
>
>> Hi team
>> I am a dev from StarRocks community, and we have supported iceberg v1
>> format.
>> We are also planning to support v2 format. If there is a C++ package, it
>> will be very convenient for our implementation.
>> At the same time, other c++ computing engines support v2 format will also
>> be faster.
>>
>> Do we have plans to support c++ version sdk?
>> --
>> caneGuy
>>
> --
>
> Sam Redai <sa...@tabular.io>
>
> Developer Advocate  |  Tabular <https://tabular.io/>
>
> c (267) 226-8606
>

Re: 【Feature】Request support for c++ sdk

Posted by Sam Redai <sa...@tabular.io>.

Currently there is no existing effort to develop a C++ package. That being
said, I think it would be awesome to have one! If anyone is willing to start
that development effort, I can help with some of the groundwork to
kickstart it.

I would say the first step would be for someone to prepare a high-level
proposal.

-Sam

On Sun, Jun 5, 2022 at 11:02 PM 周康 <zh...@gmail.com> wrote:

> Hi team
> I am a dev from StarRocks community, and we have supported iceberg v1
> format.
> We are also planning to support v2 format. If there is a C++ package, it
> will be very convenient for our implementation.
> At the same time, other c++ computing engines support v2 format will also
> be faster.
>
> Do we have plans to support c++ version sdk?
> --
> caneGuy
>
-- 

Sam Redai <sa...@tabular.io>

Developer Advocate  |  Tabular <https://tabular.io/>

c (267) 226-8606