Posted to dev@hawq.apache.org by Ed Espino <es...@apache.org> on 2017/12/04 19:02:30 UTC

Questions for HAWQ dev community: Pluggable storage formats and files systems vs. PXF

To the HAWQ dev community,

I wanted to raise an issue for discussion regarding JIRA HAWQ-786
<https://issues.apache.org/jira/browse/HAWQ-786>. This is a proposal for a
new component/functionality (Framework to support pluggable formats and
file systems) that appears to replace what is currently provided by the PXF
component.

PXF was recently reused in another open source project, Greenplum-DB (
https://github.com/greenplum-db/gpdb/tree/5X_STABLE/gpAux/extensions/pxf ),
which depends on the server-side components that exist today in HAWQ’s source
tree.

The question I have for the community is: with the possibility of PXF being
replaced by a new component in a future release of HAWQ, what should become
of the PXF code? Older releases of HAWQ (2.3.0 and earlier) will continue to
use it, but there is now an outside project depending on it.

Does the HAWQ community want to maintain the PXF code in the HAWQ project
or, if not here, where? If the GPDB community forked PXF from HAWQ, would
that be OK?

Regards,

Ed Espino

Re: Questions for HAWQ dev community: Pluggable storage formats and files systems vs. PXF

Posted by ch...@gmail.com.
Agree, we should keep the move smooth.

Thanks
Lei

Cheers
Lei

On Tue, Dec 5, 2017 at 10:43 PM +0800, "Michael Pearce" <Mi...@ig.com> wrote:

I think that needs to be the case before we discuss spinning PXF off, and there should also be an adoption period for the move, as some companies may have custom PXF plugins working with HAWQ.

Re: Questions for HAWQ dev community: Pluggable storage formats and files systems vs. PXF

Posted by Michael Pearce <Mi...@ig.com>.
I think that needs to be the case before we discuss spinning PXF off, and there should also be an adoption period for the move, as some companies may have custom PXF plugins working with HAWQ.

On 05/12/2017, 14:10, "chang.lei.cn@gmail.com" <ch...@gmail.com> wrote:

    Mike, the new framework will include all the plugins for external data.

    Thanks
    Lei

    Cheers
    Lei
    
The information contained in this email is strictly confidential and for the use of the addressee only, unless otherwise indicated. If you are not the intended recipient, please do not read, copy, use or disclose to others this message or any attachment. Please also notify the sender by replying to this email or by telephone (+44(020 7896 0011) and then delete the email and any copies of it. Opinions, conclusion (etc) that do not relate to the official business of this company shall be understood as neither given nor endorsed by it. IG is a trading name of IG Markets Limited (a company registered in England and Wales, company number 04008957) and IG Index Limited (a company registered in England and Wales, company number 01190902). Registered address at Cannon Bridge House, 25 Dowgate Hill, London EC4R 2YA. Both IG Markets Limited (register number 195355) and IG Index Limited (register number 114059) are authorised and regulated by the Financial Conduct Authority.


Re: Questions for HAWQ dev community: Pluggable storage formats and files systems vs. PXF

Posted by ch...@gmail.com.
Mike, the new framework will include all the plugins for external data.

Thanks
Lei

Cheers
Lei

On Tue, Dec 5, 2017 at 5:30 PM +0800, "Michael Pearce" <Mi...@ig.com> wrote:

Lei,

I would disagree. The JIRA / feature doesn’t cater for integration with other Hadoop components or other sources such as HBase; it simply seems to cover storage of internal data.

As such, I wouldn’t like to see a component that is maintained at Apache for Apache HAWQ moved out of Apache ownership and governance.

It should remain in this ASF project whilst it is still core to HAWQ.

Regards
Mike

Re: Questions for HAWQ dev community: Pluggable storage formats and files systems vs. PXF

Posted by Michael Pearce <Mi...@ig.com>.
Lei,

I would disagree. The JIRA / feature doesn’t cater for integration with other Hadoop components or other sources such as HBase; it simply seems to cover storage of internal data.

As such, I wouldn’t like to see a component that is maintained at Apache for Apache HAWQ moved out of Apache ownership and governance.

It should remain in this ASF project whilst it is still core to HAWQ.

Regards
Mike

On 05/12/2017, 01:19, "Lei Chang" <ch...@gmail.com> wrote:

    Great to see Greenplum is using PXF. I think PXF is a very good fit for
    Greenplum's current architecture.

    To avoid duplicated maintenance costs, my suggestion is to maintain the PXF
    code in only one place: Greenplum.

    On the HAWQ side, it can be deprecated in a future release after the new
    framework is ready.

    Thanks
    Lei


Re: Questions for HAWQ dev community: Pluggable storage formats and files systems vs. PXF

Posted by Lei Chang <ch...@gmail.com>.
Great to see Greenplum is using PXF. I think PXF is a very good fit for
Greenplum's current architecture.

To avoid duplicated maintenance costs, my suggestion is to maintain the PXF
code in only one place: Greenplum.

On the HAWQ side, it can be deprecated in a future release after the new
framework is ready.

Thanks
Lei


On Tue, Dec 5, 2017 at 3:02 AM, Ed Espino <es...@apache.org> wrote:

> To the HAWQ dev community,
>
> I wanted to raise an issue for discussion regarding JIRA HAWQ-786
> <https://issues.apache.org/jira/browse/HAWQ-786>. This is a proposal for a
> new component/functionality (Framework to support pluggable formats and
> file systems) that appears to replace what is currently provided by the PXF
> component.
>
> PXF was recently reused in another open source project, Greenplum-DB (
> https://github.com/greenplum-db/gpdb/tree/5X_STABLE/gpAux/extensions/pxf ),
> which depends on the server-side components that exist today in HAWQ’s source
> tree.
>
> The question I have for the community is: with the possibility of PXF being
> replaced by a new component in a future release of HAWQ, what should become
> of the PXF code? Older releases of HAWQ (2.3.0 and earlier) will continue to
> use it, but there is now an outside project depending on it.
>
> Does the HAWQ community want to maintain the PXF code in the HAWQ project
> or, if not here, where? If the GPDB community forked PXF from HAWQ, would
> that be OK?
>
> Regards,
>
> Ed Espino
>
