Posted to dev@spark.apache.org by Chao Sun <su...@apache.org> on 2024/02/13 20:41:48 UTC

Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Hi all,

We are very happy to announce that Project Comet, a plugin to
accelerate Spark query execution by leveraging DataFusion and Arrow,
has now been open sourced under the Apache Arrow umbrella. Please
check the project repo,
https://github.com/apache/arrow-datafusion-comet, for more details if
you are interested. We'd love to collaborate with people from the open
source community who share similar goals.

Thanks,
Chao

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Posted by Mich Talebzadeh <mi...@gmail.com>.

Hi, I gather from the replies that the plugin is not yet available in
the form expected, although I am aware of the shell script.

Also, do you have any benchmark results from your tests that you can
share?

Thanks,

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge, sourced from both personal expertise and other resources but of
course cannot be guaranteed. It is essential to note that, as with any
advice, one verified and tested result holds more weight than a thousand
expert opinions.


On Thu, 15 Feb 2024 at 01:18, Chao Sun <su...@apache.org> wrote:

> Hi Praveen,
>
> We will add a "Getting Started" section in the README soon, but basically
> comet-spark-shell
> <https://github.com/apache/arrow-datafusion-comet/blob/main/bin/comet-spark-shell> in
> the repo should provide a basic tool to build Comet and launch a Spark
> shell with it.
>
> Note that we haven't open sourced several features yet including shuffle
> support, which the aggregate operation depends on. Please stay tuned!
>
> Chao
>
>
> On Wed, Feb 14, 2024 at 2:44 PM praveen sinha <pr...@gmail.com>
> wrote:
>
>> Hi Chao,
>>
>> Is there any example app/gist/repo which can help me use this plugin. I
>> wanted to try out some realtime aggregate performance on top of parquet and
>> spark dataframes.
>>
>> Thanks and Regards
>> Praveen
>>
>>
>> On Wed, Feb 14, 2024 at 9:20 AM Chao Sun <su...@apache.org> wrote:
>>
>>> > Out of interest what are the differences in the approach between this
>>> and Glutten?
>>>
>>> Overall they are similar, although Gluten supports multiple backends
>>> including Velox and Clickhouse. One major difference is (obviously)
>>> Comet is based on DataFusion and Arrow, and written in Rust, while
>>> Gluten is mostly C++.
>>> I haven't looked very deep into Gluten yet, but there could be other
>>> differences such as how strictly the engine follows Spark's semantics,
>>> table format support (Iceberg, Delta, etc), fallback mechanism
>>> (coarse-grained fallback on stage level or more fine-grained fallback
>>> within stages), UDF support (Comet hasn't started on this yet),
>>> shuffle support, memory management, etc.
>>>
>>> Both engines are backed by very strong and vibrant open source
>>> communities (Velox, Clickhouse, Arrow & DataFusion) so it's very
>>> exciting to see how the projects will grow in future.
>>>
>>> Best,
>>> Chao
>>>
>>> On Tue, Feb 13, 2024 at 10:06 PM John Zhuge <jz...@apache.org> wrote:
>>> >
>>> > Congratulations! Excellent work!
>>> >
>>> > On Tue, Feb 13, 2024 at 8:04 PM Yufei Gu <fl...@gmail.com> wrote:
>>> >>
>>> >> Absolutely thrilled to see the project going open-source! Huge
>>> congrats to Chao and the entire team on this milestone!
>>> >>
>>> >> Yufei
>>> >>
>>> >>
>>> >> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun <su...@apache.org> wrote:
>>> >>>
>>> >>> Hi all,
>>> >>>
>>> >>> We are very happy to announce that Project Comet, a plugin to
>>> >>> accelerate Spark query execution via leveraging DataFusion and Arrow,
>>> >>> has now been open sourced under the Apache Arrow umbrella. Please
>>> >>> check the project repo
>>> >>> https://github.com/apache/arrow-datafusion-comet for more details if
>>> >>> you are interested. We'd love to collaborate with people from the
>>> open
>>> >>> source community who share similar goals.
>>> >>>
>>> >>> Thanks,
>>> >>> Chao
>>> >>>
>>> >>> ---------------------------------------------------------------------
>>> >>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>> >>>
>>> >
>>> >
>>> > --
>>> > John Zhuge
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>
>>>

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Posted by Chao Sun <su...@apache.org>.

Hi Praveen,

We will add a "Getting Started" section to the README soon, but basically
comet-spark-shell
<https://github.com/apache/arrow-datafusion-comet/blob/main/bin/comet-spark-shell>
in the repo should provide a basic tool to build Comet and launch a Spark
shell with it.

Note that we haven't open sourced several features yet, including shuffle
support, which the aggregate operation depends on. Please stay tuned!

Chao


On Wed, Feb 14, 2024 at 2:44 PM praveen sinha <pr...@gmail.com>
wrote:

> Hi Chao,
>
> Is there any example app/gist/repo which can help me use this plugin. I
> wanted to try out some realtime aggregate performance on top of parquet and
> spark dataframes.
>
> Thanks and Regards
> Praveen
>
>
> On Wed, Feb 14, 2024 at 9:20 AM Chao Sun <su...@apache.org> wrote:
>
>> > Out of interest what are the differences in the approach between this
>> and Glutten?
>>
>> Overall they are similar, although Gluten supports multiple backends
>> including Velox and Clickhouse. One major difference is (obviously)
>> Comet is based on DataFusion and Arrow, and written in Rust, while
>> Gluten is mostly C++.
>> I haven't looked very deep into Gluten yet, but there could be other
>> differences such as how strictly the engine follows Spark's semantics,
>> table format support (Iceberg, Delta, etc), fallback mechanism
>> (coarse-grained fallback on stage level or more fine-grained fallback
>> within stages), UDF support (Comet hasn't started on this yet),
>> shuffle support, memory management, etc.
>>
>> Both engines are backed by very strong and vibrant open source
>> communities (Velox, Clickhouse, Arrow & DataFusion) so it's very
>> exciting to see how the projects will grow in future.
>>
>> Best,
>> Chao
>>
>> On Tue, Feb 13, 2024 at 10:06 PM John Zhuge <jz...@apache.org> wrote:
>> >
>> > Congratulations! Excellent work!
>> >
>> > On Tue, Feb 13, 2024 at 8:04 PM Yufei Gu <fl...@gmail.com> wrote:
>> >>
>> >> Absolutely thrilled to see the project going open-source! Huge
>> congrats to Chao and the entire team on this milestone!
>> >>
>> >> Yufei
>> >>
>> >>
>> >> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun <su...@apache.org> wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> We are very happy to announce that Project Comet, a plugin to
>> >>> accelerate Spark query execution via leveraging DataFusion and Arrow,
>> >>> has now been open sourced under the Apache Arrow umbrella. Please
>> >>> check the project repo
>> >>> https://github.com/apache/arrow-datafusion-comet for more details if
>> >>> you are interested. We'd love to collaborate with people from the open
>> >>> source community who share similar goals.
>> >>>
>> >>> Thanks,
>> >>> Chao
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>> >>>
>> >
>> >
>> > --
>> > John Zhuge
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
>>

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Posted by praveen sinha <pr...@gmail.com>.

Hi Chao,

Is there any example app/gist/repo that can help me use this plugin? I
wanted to try out some real-time aggregation performance on top of Parquet
and Spark DataFrames.

Thanks and Regards
Praveen


On Wed, Feb 14, 2024 at 9:20 AM Chao Sun <su...@apache.org> wrote:

> > Out of interest what are the differences in the approach between this
> and Glutten?
>
> Overall they are similar, although Gluten supports multiple backends
> including Velox and Clickhouse. One major difference is (obviously)
> Comet is based on DataFusion and Arrow, and written in Rust, while
> Gluten is mostly C++.
> I haven't looked very deep into Gluten yet, but there could be other
> differences such as how strictly the engine follows Spark's semantics,
> table format support (Iceberg, Delta, etc), fallback mechanism
> (coarse-grained fallback on stage level or more fine-grained fallback
> within stages), UDF support (Comet hasn't started on this yet),
> shuffle support, memory management, etc.
>
> Both engines are backed by very strong and vibrant open source
> communities (Velox, Clickhouse, Arrow & DataFusion) so it's very
> exciting to see how the projects will grow in future.
>
> Best,
> Chao
>
> On Tue, Feb 13, 2024 at 10:06 PM John Zhuge <jz...@apache.org> wrote:
> >
> > Congratulations! Excellent work!
> >
> > On Tue, Feb 13, 2024 at 8:04 PM Yufei Gu <fl...@gmail.com> wrote:
> >>
> >> Absolutely thrilled to see the project going open-source! Huge congrats
> to Chao and the entire team on this milestone!
> >>
> >> Yufei
> >>
> >>
> >> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun <su...@apache.org> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> We are very happy to announce that Project Comet, a plugin to
> >>> accelerate Spark query execution via leveraging DataFusion and Arrow,
> >>> has now been open sourced under the Apache Arrow umbrella. Please
> >>> check the project repo
> >>> https://github.com/apache/arrow-datafusion-comet for more details if
> >>> you are interested. We'd love to collaborate with people from the open
> >>> source community who share similar goals.
> >>>
> >>> Thanks,
> >>> Chao
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >>>
> >
> >
> > --
> > John Zhuge
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Posted by "Liu(Laswift) Cao" <tw...@gmail.com>.

This is very cool! Congrats on the amazing work, Chao and the team!
It's exciting to see this native engine trend within the community. Other
than Gluten, I came across https://github.com/blaze-init/blaze as well (but
haven't evaluated it in detail).

On Wed, Feb 14, 2024 at 09:20 Chao Sun <su...@apache.org> wrote:

> > Out of interest what are the differences in the approach between this
> and Glutten?
>
> Overall they are similar, although Gluten supports multiple backends
> including Velox and Clickhouse. One major difference is (obviously)
> Comet is based on DataFusion and Arrow, and written in Rust, while
> Gluten is mostly C++.
> I haven't looked very deep into Gluten yet, but there could be other
> differences such as how strictly the engine follows Spark's semantics,
> table format support (Iceberg, Delta, etc), fallback mechanism
> (coarse-grained fallback on stage level or more fine-grained fallback
> within stages), UDF support (Comet hasn't started on this yet),
> shuffle support, memory management, etc.
>
> Both engines are backed by very strong and vibrant open source
> communities (Velox, Clickhouse, Arrow & DataFusion) so it's very
> exciting to see how the projects will grow in future.
>
> Best,
> Chao
>
> On Tue, Feb 13, 2024 at 10:06 PM John Zhuge <jz...@apache.org> wrote:
> >
> > Congratulations! Excellent work!
> >
> > On Tue, Feb 13, 2024 at 8:04 PM Yufei Gu <fl...@gmail.com> wrote:
> >>
> >> Absolutely thrilled to see the project going open-source! Huge congrats
> to Chao and the entire team on this milestone!
> >>
> >> Yufei
> >>
> >>
> >> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun <su...@apache.org> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> We are very happy to announce that Project Comet, a plugin to
> >>> accelerate Spark query execution via leveraging DataFusion and Arrow,
> >>> has now been open sourced under the Apache Arrow umbrella. Please
> >>> check the project repo
> >>> https://github.com/apache/arrow-datafusion-comet for more details if
> >>> you are interested. We'd love to collaborate with people from the open
> >>> source community who share similar goals.
> >>>
> >>> Thanks,
> >>> Chao
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >>>
> >
> >
> > --
> > John Zhuge
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>

--
Liu Cao

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Posted by Chao Sun <su...@apache.org>.

> Out of interest, what are the differences in approach between this and Gluten?

Overall they are similar, although Gluten supports multiple backends,
including Velox and ClickHouse. One major difference is (obviously) that
Comet is based on DataFusion and Arrow and is written in Rust, while
Gluten is mostly C++.
I haven't looked very deeply into Gluten yet, but there could be other
differences, such as how strictly the engine follows Spark's semantics,
table format support (Iceberg, Delta, etc.), the fallback mechanism
(coarse-grained fallback at the stage level or more fine-grained fallback
within stages), UDF support (Comet hasn't started on this yet),
shuffle support, memory management, etc.
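
To make the fallback distinction above concrete, here is a minimal
sketch in Python. It is purely illustrative and is not how Comet or
Gluten actually implement fallback: the operator names, the
NATIVE_SUPPORTED set, and the functions coarse_grained, fine_grained,
and transitions are all hypothetical names made up for this example.

```python
from dataclasses import dataclass

# Hypothetical set of operators a native engine can execute.
# Not taken from Comet or Gluten; chosen only for illustration.
NATIVE_SUPPORTED = {"scan", "filter", "project"}


@dataclass(frozen=True)
class Operator:
    name: str


def coarse_grained(stage):
    """Stage-level fallback: one unsupported operator sends the whole stage to the JVM."""
    all_native = all(op.name in NATIVE_SUPPORTED for op in stage)
    engine = "native" if all_native else "jvm"
    return [(op.name, engine) for op in stage]


def fine_grained(stage):
    """Operator-level fallback: only the unsupported operators run on the JVM."""
    return [
        (op.name, "native" if op.name in NATIVE_SUPPORTED else "jvm")
        for op in stage
    ]


def transitions(assignment):
    """Count engine switches between adjacent operators in the assignment."""
    engines = [engine for _, engine in assignment]
    return sum(1 for a, b in zip(engines, engines[1:]) if a != b)


# A single stage containing one operator the native engine does not support.
stage = [Operator("scan"), Operator("filter"), Operator("udf"), Operator("project")]
coarse = coarse_grained(stage)
fine = fine_grained(stage)
```

In this toy stage, the coarse-grained strategy runs every operator on
the JVM, while the fine-grained strategy keeps three operators native at
the cost of two engine switches around the unsupported one. In a real
engine each switch would typically involve handing data from one engine
to the other, so the trade-off is an assumption worth measuring rather
than a given.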

Both engines are backed by very strong and vibrant open source
communities (Velox, ClickHouse, Arrow & DataFusion), so it's very
exciting to see how the projects will grow in the future.

Best,
Chao

On Tue, Feb 13, 2024 at 10:06 PM John Zhuge <jz...@apache.org> wrote:
>
> Congratulations! Excellent work!
>
> On Tue, Feb 13, 2024 at 8:04 PM Yufei Gu <fl...@gmail.com> wrote:
>>
>> Absolutely thrilled to see the project going open-source! Huge congrats to Chao and the entire team on this milestone!
>>
>> Yufei
>>
>>
>> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun <su...@apache.org> wrote:
>>>
>>> Hi all,
>>>
>>> We are very happy to announce that Project Comet, a plugin to
>>> accelerate Spark query execution via leveraging DataFusion and Arrow,
>>> has now been open sourced under the Apache Arrow umbrella. Please
>>> check the project repo
>>> https://github.com/apache/arrow-datafusion-comet for more details if
>>> you are interested. We'd love to collaborate with people from the open
>>> source community who share similar goals.
>>>
>>> Thanks,
>>> Chao
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>
>
>
> --
> John Zhuge

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Posted by John Zhuge <jz...@apache.org>.

Congratulations! Excellent work!

On Tue, Feb 13, 2024 at 8:04 PM Yufei Gu <fl...@gmail.com> wrote:

> Absolutely thrilled to see the project going open-source! Huge congrats to
> Chao and the entire team on this milestone!
>
> Yufei
>
>
> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun <su...@apache.org> wrote:
>
>> Hi all,
>>
>> We are very happy to announce that Project Comet, a plugin to
>> accelerate Spark query execution via leveraging DataFusion and Arrow,
>> has now been open sourced under the Apache Arrow umbrella. Please
>> check the project repo
>> https://github.com/apache/arrow-datafusion-comet for more details if
>> you are interested. We'd love to collaborate with people from the open
>> source community who share similar goals.
>>
>> Thanks,
>> Chao
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>
>>

-- 
John Zhuge

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Posted by Yufei Gu <fl...@gmail.com>.

Absolutely thrilled to see the project going open-source! Huge congrats to
Chao and the entire team on this milestone!

Yufei


On Tue, Feb 13, 2024 at 12:43 PM Chao Sun <su...@apache.org> wrote:

> Hi all,
>
> We are very happy to announce that Project Comet, a plugin to
> accelerate Spark query execution via leveraging DataFusion and Arrow,
> has now been open sourced under the Apache Arrow umbrella. Please
> check the project repo
> https://github.com/apache/arrow-datafusion-comet for more details if
> you are interested. We'd love to collaborate with people from the open
> source community who share similar goals.
>
> Thanks,
> Chao
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Posted by Manoj Kumar <ma...@zettabolt.com>.

Dear @Chao Sun,

I trust you're doing well. Having worked extensively with Spark NVIDIA
RAPIDS, Velox, and Gluten, I'm now contemplating Comet's potential
advantages over Velox in terms of performance and unique features.

While RAPIDS leverages GPUs effectively, Gazelle, which relied on Intel
AVX-512 intrinsics, is now EOL. Now all eyes are on Velox as a universal
C++ acceleration engine used across many systems (Presto, Spark, PyTorch,
XStream (stream processing), F3 (feature engineering), FBETL (data
ingestion), XSQL (distributed transaction processing), Scribe (message bus
infrastructure), Saber (high-QPS external serving), and others).

In this context, I'm keen to understand Comet's distinctive features and
how its performance compares to Velox. What makes Comet stand out, and how
does its efficiency stack up against Velox across different tasks and
frameworks?

Your insights into Comet's capabilities would be invaluable and would help
me evaluate whether to invest my time in this plugin.

Thank you for your time and expertise.

Warm regards,
Manoj Kumar


On Tue, 20 Feb 2024 at 01:51, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Ok thanks for your clarifications
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed. It is essential to note
> that, as with any advice, "one test result is worth one-thousand
> expert opinions" (Wernher von Braun
> <https://en.wikipedia.org/wiki/Wernher_von_Braun>).
>
>
> On Mon, 19 Feb 2024 at 17:24, Chao Sun <su...@apache.org> wrote:
>
>> Hi Mich,
>>
>> > Also have you got some benchmark results from your tests that you can
>> possibly share?
>>
>> We only have some partial benchmark results internally so far. Once
>> shuffle and better memory management have been introduced, we plan to
>> publish the benchmark results (at least TPC-H) in the repo.
>>
>> > Compared to standard Spark, what kind of performance gains can be
>> expected with Comet?
>>
>> Currently, users could benefit from Comet in a few areas:
>> - Parquet read: a few improvements have been made against reading from S3
>> in particular, so users can expect better scan performance in this scenario
>> - Hash aggregation
>> - Columnar shuffle
>> - Decimals (Java's BigDecimal is pretty slow)
>>
>> > Can one use Comet on k8s in conjunction with something like a Volcano
>> addon?
>>
>> I think so. Comet is mostly orthogonal to the Spark scheduler framework.
>>
>> Chao
>>
>>
>>
>>
>>
>>
>> On Fri, Feb 16, 2024 at 5:39 AM Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>> Hi Chao,
>>>
>>> As a cool feature
>>>
>>>
>>>    - Compared to standard Spark, what kind of performance gains can be
>>>    expected with Comet?
>>>    -  Can one use Comet on k8s in conjunction with something like a
>>>    Volcano addon?
>>>
>>>
>>> HTH
>>>
>>> Mich Talebzadeh,
>>> Dad | Technologist | Solutions Architect | Engineer
>>> London
>>> United Kingdom
>>>
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>>
>>>
>>>
>>> *Disclaimer:* The information provided is correct to the best of my
>>> knowledge, sourced from both personal expertise and other resources but of
>>> course cannot be guaranteed. It is essential to note that, as with any
>>> advice, one verified and tested result holds more weight than a thousand
>>> expert opinions.
>>>
>>>
>>> On Tue, 13 Feb 2024 at 20:42, Chao Sun <su...@apache.org> wrote:
>>>
>>>> Hi all,
>>>>
>>>> We are very happy to announce that Project Comet, a plugin to
>>>> accelerate Spark query execution via leveraging DataFusion and Arrow,
>>>> has now been open sourced under the Apache Arrow umbrella. Please
>>>> check the project repo
>>>> https://github.com/apache/arrow-datafusion-comet for more details if
>>>> you are interested. We'd love to collaborate with people from the open
>>>> source community who share similar goals.
>>>>
>>>> Thanks,
>>>> Chao
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>>
>>>>

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Posted by Mich Talebzadeh <mi...@gmail.com>.

Ok thanks for your clarifications

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge but of course cannot be guaranteed. It is essential to note
that, as with any advice, "one test result is worth one thousand
expert opinions" (Wernher von Braun
<https://en.wikipedia.org/wiki/Wernher_von_Braun>).


On Mon, 19 Feb 2024 at 17:24, Chao Sun <su...@apache.org> wrote:

> Hi Mich,
>
> > Also have you got some benchmark results from your tests that you can
> > possibly share?
>
> We only have some partial benchmark results internally so far. Once
> shuffle and better memory management have been introduced, we plan to
> publish the benchmark results (at least TPC-H) in the repo.
>
> > Compared to standard Spark, what kind of performance gains can be
> > expected with Comet?
>
> Currently, users could benefit from Comet in a few areas:
> - Parquet read: a few improvements have been made against reading from S3
> in particular, so users can expect better scan performance in this scenario
> - Hash aggregation
> - Columnar shuffle
> - Decimals (Java's BigDecimal is pretty slow)
>
> > Can one use Comet on k8s in conjunction with something like a Volcano
> > addon?
>
> I think so. Comet is mostly orthogonal to the Spark scheduler framework.
>
> Chao
>
>
>
>
>
>
> On Fri, Feb 16, 2024 at 5:39 AM Mich Talebzadeh <mi...@gmail.com>
> wrote:
>
>> Hi Chao,
>>
>> As a cool feature
>>
>>
>>    - Compared to standard Spark, what kind of performance gains can be
>>    expected with Comet?
>>    -  Can one use Comet on k8s in conjunction with something like a
>>    Volcano addon?
>>
>>
>> HTH
>>
>> Mich Talebzadeh,
>> Dad | Technologist | Solutions Architect | Engineer
>> London
>> United Kingdom
>>
>>
>>    view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge, sourced from both personal expertise and other resources but of
>> course cannot be guaranteed. It is essential to note that, as with any
>> advice, one verified and tested result holds more weight than a thousand
>> expert opinions.
>>
>>
>> On Tue, 13 Feb 2024 at 20:42, Chao Sun <su...@apache.org> wrote:
>>
>>> Hi all,
>>>
>>> We are very happy to announce that Project Comet, a plugin to
>>> accelerate Spark query execution via leveraging DataFusion and Arrow,
>>> has now been open sourced under the Apache Arrow umbrella. Please
>>> check the project repo
>>> https://github.com/apache/arrow-datafusion-comet for more details if
>>> you are interested. We'd love to collaborate with people from the open
>>> source community who share similar goals.
>>>
>>> Thanks,
>>> Chao
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>>
>>>

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Posted by Chao Sun <su...@apache.org>.

Hi Mich,

> Also have you got some benchmark results from your tests that you can
> possibly share?

We only have some partial benchmark results internally so far. Once shuffle
and better memory management have been introduced, we plan to publish the
benchmark results (at least TPC-H) in the repo.

> Compared to standard Spark, what kind of performance gains can be
> expected with Comet?

Currently, users could benefit from Comet in a few areas:
- Parquet read: a few improvements have been made for reading from S3 in
  particular, so users can expect better scan performance in this scenario
- Hash aggregation
- Columnar shuffle
- Decimals (Java's BigDecimal is pretty slow)
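
To illustrate the decimal point above, here is a minimal sketch (not Comet's
or Spark's actual code; the class and method names are hypothetical). It
assumes the cost being referred to is that every BigDecimal operation
allocates a new object, whereas decimals that share one scale can be summed
as plain unscaled integers:

```java
import java.math.BigDecimal;

// Illustrative only: hypothetical names, not a Comet or Spark API.
final class DecimalSumSketch {

    // Sum decimals as BigDecimal objects; each add() returns a new object.
    static BigDecimal sumBigDecimal(BigDecimal[] values) {
        BigDecimal total = BigDecimal.ZERO;
        for (BigDecimal v : values) {
            total = total.add(v);
        }
        return total;
    }

    // Sum the same values stored as unscaled longs that share one scale
    // (for example, 19.99 at scale 2 is stored as 1999).
    // Math.addExact throws ArithmeticException on overflow.
    static long sumUnscaled(long[] unscaledValues) {
        long total = 0L;
        for (long v : unscaledValues) {
            total = Math.addExact(total, v);
        }
        return total;
    }

    public static void main(String[] args) {
        int scale = 2;
        long[] unscaled = {1999L, 501L, 250L};

        // Build the equivalent BigDecimal inputs from the unscaled values.
        BigDecimal[] boxed = new BigDecimal[unscaled.length];
        for (int i = 0; i < unscaled.length; i++) {
            boxed[i] = BigDecimal.valueOf(unscaled[i], scale);
        }

        BigDecimal viaObjects = sumBigDecimal(boxed);
        BigDecimal viaLongs = BigDecimal.valueOf(sumUnscaled(unscaled), scale);
        System.out.println(viaObjects + " vs " + viaLongs);
    }
}
```

Both paths produce the same decimal value; the second does so without
allocating an object per addition, which is the kind of saving a columnar
engine can exploit across a whole batch.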

> Can one use Comet on k8s in conjunction with something like a Volcano
> addon?

I think so. Comet is mostly orthogonal to the Spark scheduler framework.

Chao






On Fri, Feb 16, 2024 at 5:39 AM Mich Talebzadeh <mi...@gmail.com>
wrote:

> Hi Chao,
>
> As a cool feature
>
>
>    - Compared to standard Spark, what kind of performance gains can be
>    expected with Comet?
>    -  Can one use Comet on k8s in conjunction with something like a
>    Volcano addon?
>
>
> HTH
>
> Mich Talebzadeh,
> Dad | Technologist | Solutions Architect | Engineer
> London
> United Kingdom
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge, sourced from both personal expertise and other resources but of
> course cannot be guaranteed. It is essential to note that, as with any
> advice, one verified and tested result holds more weight than a thousand
> expert opinions.
>
>
> On Tue, 13 Feb 2024 at 20:42, Chao Sun <su...@apache.org> wrote:
>
>> Hi all,
>>
>> We are very happy to announce that Project Comet, a plugin to
>> accelerate Spark query execution via leveraging DataFusion and Arrow,
>> has now been open sourced under the Apache Arrow umbrella. Please
>> check the project repo
>> https://github.com/apache/arrow-datafusion-comet for more details if
>> you are interested. We'd love to collaborate with people from the open
>> source community who share similar goals.
>>
>> Thanks,
>> Chao
>>
>> ---------------------------------------------------------------------
>> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>>
>>

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Posted by Mich Talebzadeh <mi...@gmail.com>.

Hi Chao,

As a cool feature


   - Compared to standard Spark, what kind of performance gains can be
   expected with Comet?
   -  Can one use Comet on k8s in conjunction with something like a Volcano
   addon?


HTH

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge, sourced from both personal expertise and other resources but of
course cannot be guaranteed. It is essential to note that, as with any
advice, one verified and tested result holds more weight than a thousand
expert opinions.


On Tue, 13 Feb 2024 at 20:42, Chao Sun <su...@apache.org> wrote:

> Hi all,
>
> We are very happy to announce that Project Comet, a plugin to
> accelerate Spark query execution via leveraging DataFusion and Arrow,
> has now been open sourced under the Apache Arrow umbrella. Please
> check the project repo
> https://github.com/apache/arrow-datafusion-comet for more details if
> you are interested. We'd love to collaborate with people from the open
> source community who share similar goals.
>
> Thanks,
> Chao
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

Posted by Holden Karau <ho...@gmail.com>.

This looks really cool :) Out of interest, what are the differences in
approach between this and Gluten?

On Tue, Feb 13, 2024 at 12:42 PM Chao Sun <su...@apache.org> wrote:

> Hi all,
>
> We are very happy to announce that Project Comet, a plugin to
> accelerate Spark query execution via leveraging DataFusion and Arrow,
> has now been open sourced under the Apache Arrow umbrella. Please
> check the project repo
> https://github.com/apache/arrow-datafusion-comet for more details if
> you are interested. We'd love to collaborate with people from the open
> source community who share similar goals.
>
> Thanks,
> Chao
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>
