Posted to user@hawq.apache.org by Michael André Pearce <mi...@me.com> on 2017/05/01 21:59:25 UTC

Impala vs Greenplum

Many of you have no doubt already seen it, but Cloudera has published the following blog post:

http://blog.cloudera.com/blog/2017/04/apache-impala-leads-traditional-analytic-database/

A clear shot across the bows of HAWQ.

Also, how does HAWQ really compare? There are some old, dated HAWQ performance blog posts. Should they be updated?

For the HAWQ community, it would be good to know how long it will be until HAWQ gets upstream Greenplum improvements such as codegen.

Likewise, what features or changes has Impala implemented that let it leapfrog Greenplum/HAWQ by so much? Are any of those changes portable to HAWQ?




Re: Impala vs Greenplum

Posted by Frans van D <do...@gmail.com>.
Instead of focusing on benchmarks, I would focus on the problems regular users actually face, such as poor COUNT(DISTINCT) performance, where the load ends up on a single node.
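
To make the COUNT(DISTINCT) point concrete, here is a minimal Python sketch of why the load can end up on one node, and of one common way around it. It is only an illustration under stated assumptions: it does not describe HAWQ's actual planner, and the function names (`partition_by_value`, `count_distinct_single_node`, `count_distinct_redistributed`) are hypothetical. Each inner list stands for the rows held by one segment.

```python
def partition_by_value(rows, num_nodes):
    """Hash-redistribute rows so that equal values always land on the same node."""
    nodes = [[] for _ in range(num_nodes)]
    for value in rows:
        nodes[hash(value) % num_nodes].append(value)
    return nodes


def count_distinct_single_node(segments):
    """Naive plan: every segment ships all of its rows to one node.

    Returns (distinct count, number of rows the single node has to process).
    """
    gathered = [value for segment in segments for value in segment]
    return len(set(gathered)), len(gathered)


def count_distinct_redistributed(segments, num_nodes):
    """Two-phase plan: redistribute by value, count locally, then add up.

    Because equal values share a node, the per-node distinct counts never
    overlap and can simply be summed.
    Returns (distinct count, rows processed by the busiest node).
    """
    all_rows = (value for segment in segments for value in segment)
    nodes = partition_by_value(all_rows, num_nodes)
    partial_counts = [len(set(node)) for node in nodes]
    busiest = max(len(node) for node in nodes)
    return sum(partial_counts), busiest


if __name__ == "__main__":
    # Three segments with overlapping values (made-up data for illustration).
    segments = [[1, 2, 3, 4], [3, 4, 5, 6], [1, 6, 7, 8]]
    print(count_distinct_single_node(segments))
    print(count_distinct_redistributed(segments, num_nodes=4))
```

Both plans return the same distinct count; the difference is how many rows the busiest node has to handle. The redistributed plan still suffers if the data is heavily skewed towards a few values.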


> On 3 May 2017, at 03:17, Lei Chang <ch...@gmail.com> wrote:
> 
> 
> I remember there was a (non-public) benchmark several months ago comparing HAWQ with other SQL-on-Hadoop engines on TPC-DS, and HAWQ was much faster. Different vendors may get different benchmark results, since each engine is tuned differently. There was also a lot of discussion, before HAWQ was open sourced, about how to improve the HAWQ executor, including vectorization, codegen, new hardware, and so on.
> 
> @Michael, I also think it is a good time to discuss how to build a new HAWQ executor with various new optimizations. This could improve query performance considerably.
> 
> I have started a JIRA on this topic (https://issues.apache.org/jira/browse/HAWQ-1450). I hope we can agree on a design and start working on it soon.
> 
> Thanks
> Lei
> 
> 

Re: Impala vs Greenplum

Posted by Lei Chang <ch...@gmail.com>.
I remember there was a (non-public) benchmark several months ago comparing
HAWQ with other SQL-on-Hadoop engines on TPC-DS, and HAWQ was much faster.
Different vendors may get different benchmark results, since each engine is
tuned differently. There was also a lot of discussion, before HAWQ was open
sourced, about how to improve the HAWQ executor, including vectorization,
codegen, new hardware, and so on.

@Michael, I also think it is a good time to discuss how to build a new HAWQ
executor with various new optimizations. This could improve query
performance considerably.

I have started a JIRA on this topic
(https://issues.apache.org/jira/browse/HAWQ-1450). I hope we can agree on a
design and start working on it soon.

Thanks
Lei


On Wed, May 3, 2017 at 6:03 AM, Michael André Pearce <
michael.andre.pearce@me.com> wrote:

> Indeed, the intent was much less about "mine is bigger than yours".
>
> It was more to ask whether the result is accurate and, if so, whether
> there are ideas or improvements in the approaches Impala has taken that
> could be used in HAWQ.
>
> https://www.slideshare.net/mobile/cloudera/impala-performance-update
> http://www.sciencedirect.com/science/article/pii/S0164121216302400
>
> Likewise, are we benefiting at all from the upstream Greenplum sister
> project, for example from its codegen work?
>
> Yes, we know it was Greenplum in the results, but HAWQ is its sister
> project, so the results are indicative.
>
> Cheers
> Mike
>
>
>

Re: Impala vs Greenplum

Posted by Michael André Pearce <mi...@me.com>.
Indeed, the intent was much less about "mine is bigger than yours".

It was more to ask whether the result is accurate and, if so, whether there are ideas or improvements in the approaches Impala has taken that could be used in HAWQ.

https://www.slideshare.net/mobile/cloudera/impala-performance-update
http://www.sciencedirect.com/science/article/pii/S0164121216302400

Likewise, are we benefiting at all from the upstream Greenplum sister project, for example from its codegen work?

Yes, we know it was Greenplum in the results, but HAWQ is its sister project, so the results are indicative.

Cheers
Mike




> On 1 May 2017, at 23:27, Konstantin Boudnik <co...@apache.org> wrote:
> 
> With my Apache hat on, I'd like to say that it is of little, if any at
> all, relevance to the Apache projects what companies like Cloudera say
> about their internal benchmarks.
> 
> Apache projects do not compete with each other or with any
> commercial products. While it is completely ok to say "official
> release of Apache Foo" was x percent faster than "official release of
> Apache Bar" somewhere in Apache Foo's blog or something, it is
> unacceptable for Apache Foo to get into a pissing contest with something
> forked from Apache Bar and sold by a commercial entity as a part of
> their offering (sometimes it is even impossible to say what exactly
> the entity in question is selling).
> 
> In other words - let's not get into one of these "My Hadoop is bigger
> than yours" [1] moments again.
> 
> But by all means - let's discuss the technicalities of bringing more
> efficient code generation into the project, etc.
> 
> [1] https://gigaom.com/2011/12/19/my-hadoop-is-bigger-than-yours/
> 
> --
> With regards,
>  Cos
> 
> 2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622
> 
> Disclaimer: Opinions expressed in this email are those of the author,
> and do not necessarily represent the views of any company the author
> might be affiliated with at the moment of writing.
> 
> 

Re: Impala vs Greenplum

Posted by Konstantin Boudnik <co...@apache.org>.
With my Apache hat on, I'd like to say that it is of little, if any at
all, relevance to the Apache projects what companies like Cloudera say
about their internal benchmarks.

Apache projects do not compete with each other or with any
commercial products. While it is completely ok to say "official
release of Apache Foo" was x percent faster than "official release of
Apache Bar" somewhere in Apache Foo's blog or something, it is
unacceptable for Apache Foo to get into a pissing contest with something
forked from Apache Bar and sold by a commercial entity as a part of
their offering (sometimes it is even impossible to say what exactly
the entity in question is selling).

In other words - let's not get into one of these "My Hadoop is bigger
than yours" [1] moments again.

But by all means - let's discuss the technicalities of bringing more
efficient code generation into the project, etc.

[1] https://gigaom.com/2011/12/19/my-hadoop-is-bigger-than-yours/

--
With regards,
  Cos

2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author,
and do not necessarily represent the views of any company the author
might be affiliated with at the moment of writing.


On Mon, May 1, 2017 at 2:59 PM, Michael André Pearce
<mi...@me.com> wrote:
> Many of you have no doubt already seen it, but Cloudera has published the
> following blog post:
>
> http://blog.cloudera.com/blog/2017/04/apache-impala-leads-traditional-analytic-database/
>
> A clear shot across the bows of HAWQ.
>
> Also, how does HAWQ really compare? There are some old, dated HAWQ
> performance blog posts. Should they be updated?
>
> For the HAWQ community, it would be good to know how long it will be until
> HAWQ gets upstream Greenplum improvements such as codegen.
>
> Likewise, what features or changes has Impala implemented that let it
> leapfrog Greenplum/HAWQ by so much? Are any of those changes portable to
> HAWQ?
>
>
>