Posted to dev@hawq.apache.org by Michael Pearce <Mi...@ig.com> on 2016/10/17 21:11:46 UTC

HAWQ Performance.

Hi All,


HAWQ is now being caught up by some competitors in terms of real-world performance and is in some cases outperformed, most notably by Spark 2.0, where we can run some queries faster since Project Tungsten.


Obviously HAWQ still has the SQL completeness advantage, but this too is a slowly changing space, where Spark and others are improving.


Are there any plans to start looking at improving the execution performance of HAWQ further with Parquet vectorisation and whole-stage codegen?


http://www.slideshare.net/databricks/spark-performance-whats-next


http://blog.2ndquadrant.com/postgresql-10-roadmap/


On the note of the Postgres 10 roadmap: are there any plans to update compatibility / the fork of Postgres to later versions (back merging)? AFAIK HAWQ is a fork of 8.x, which is quite dated.


I'm sure all of these questions have already been answered/discussed, but it would be great to get some visibility into the roadmap for these areas for HAWQ.


Cheers

Mike




The information contained in this email is strictly confidential and for the use of the addressee only, unless otherwise indicated. If you are not the intended recipient, please do not read, copy, use or disclose to others this message or any attachment. Please also notify the sender by replying to this email or by telephone (+44(020 7896 0011) and then delete the email and any copies of it. Opinions, conclusion (etc) that do not relate to the official business of this company shall be understood as neither given nor endorsed by it. IG is a trading name of IG Markets Limited (a company registered in England and Wales, company number 04008957) and IG Index Limited (a company registered in England and Wales, company number 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill, London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG Index Limited (register number 114059) are authorised and regulated by the Financial Conduct Authority.

Re: HAWQ Performance.

Posted by Kyle Dunn <kd...@pivotal.io>.
I'm also in strong agreement here.

Codegen is the logical next step in my mind. There are multiple inherent
benefits, ranging from vectorised processing to runtime GPU offload
support. I think data locality and PXF performance are important, although
in pure cloud deployments compute is, above all, what we can influence
most. Not to mention, the Greenplum team is showing good potential with
codegen; we should incorporate that work into HAWQ in any way possible.

-Kyle

On Mon, Oct 17, 2016, 20:26 Hong Wu <xu...@gmail.com> wrote:

> Strong +1 on this.
>
> Performance is one of the reasons why our customers choose HAWQ. I think
> the existing performance lead might come from the C implementation and the
> Postgres implementation. HAWQ will definitely focus on performance
> improvements but, frankly speaking, the plan/roadmap should be shaped and
> discussed in detail, as in this thread. Below are some of our preliminary
> considerations and to-do items:
>
>   - Codegen tech to optimize executor efficiency.
>   - Data-skipping tech to optimize I/O performance.
>   - Optimize external table access, especially PXF.
>   - Some vectorized refactoring.
>   - Optimize data locality.
>   - Optimize distributed resource organization and management.
>   - Optimize the communication module of the interconnect.
>   - GPUs, SSDs
>   - ...
>
> We run performance tests for HAWQ in several cluster environments every
> week and keep paying attention to the latest performance updates from our
> competitors and to research papers. But we need more people to join us and
> focus on performance features. Developers from the HAWQ open-source
> community are very welcome to join us on the performance work.
>
> Best
> xunzhang
-- 
*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 | Email: kdunn@pivotal.io

Re: HAWQ Performance.

Posted by Hong Wu <xu...@gmail.com>.
Strong +1 on this.

Performance is one of the reasons why our customers choose HAWQ. I think
the existing performance lead might come from the C implementation and the
Postgres implementation. HAWQ will definitely focus on performance
improvements but, frankly speaking, the plan/roadmap should be shaped and
discussed in detail, as in this thread. Below are some of our preliminary
considerations and to-do items:

  - Codegen tech to optimize executor efficiency.
  - Data-skipping tech to optimize I/O performance.
  - Optimize external table access, especially PXF.
  - Some vectorized refactoring.
  - Optimize data locality.
  - Optimize distributed resource organization and management.
  - Optimize the communication module of the interconnect.
  - GPUs, SSDs
  - ...

We run performance tests for HAWQ in several cluster environments every
week and keep paying attention to the latest performance updates from our
competitors and to research papers. But we need more people to join us and
focus on performance features. Developers from the HAWQ open-source
community are very welcome to join us on the performance work.

Best
xunzhang




Re: HAWQ Performance.

Posted by Hong Wu <xu...@gmail.com>.
Hi Mike,

Regarding the other question you mentioned, we are trying to sync with
upstream Postgres. Since it is too difficult to do in one big step, we need
to do it module by module, feature by feature. For example, in HAWQ-786 we
are trying to import FDW from Postgres 9.6x, and this is still in progress.

The philosophy of forking from Postgres brings HAWQ lots of great
advantages, but at the same time it makes it painful to pick up updates
from the Postgres community. We are working on exactly this.


Thanks,
xunzhang
