Posted to dev@hawq.apache.org by Konstantin Boudnik <co...@apache.org> on 2016/04/25 21:56:50 UTC

HAWQ integration to Apache bigdata stack: remaining steps

guys,

I wanted to put together a list of the remaining steps needed before we can
declare HAWQ a good citizen of Apache Bigtop (aka the Apache bigdata stack).

I have put together a JIRA [1] to track these points; here's the gist of it
for the reader's convenience. Please ping me if you have any questions or
follow-ups.

Regards,
  Cos

An overview of the remaining steps and the overall status of the integration work:

*External dependencies*
- the biggest issue was and remains the use of libthrift, which isn't packaged,
provided, or supported by anyone. Right now, the Bigtop-HAWQ integration branch
[uses|https://git-wip-us.apache.org/repos/asf?p=bigtop.git;a=blob_plain;f=bigtop_toolchain/manifests/libhdfs.pp;hb=refs/heads/BIGTOP-2320]
my own pre-built version of the library, hosted
[here|https://bintray.com/artifact/download/wangzw/deb/dists/trusty/contrib/binary-amd64].
This is clearly insecure, however, and has to be solved either by HAWQ adding
this dependency as source, or by convincing the Bigtop community that hosting
the libthrift library is beneficial for the community at large (a sketch of the
current workaround follows)
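
For illustration, this is roughly what the workaround amounts to on the
toolchain side; a minimal shell sketch only, where the package file name and
version are placeholders rather than the actual artifact:

    #!/bin/bash
    # Fetch and install the pre-built libthrift package from the personal mirror.
    # PKG is a placeholder name/version, not the real artifact.
    set -e
    MIRROR="https://bintray.com/artifact/download/wangzw/deb/dists/trusty/contrib/binary-amd64"
    PKG="libthrift_0.9.x-1_amd64.deb"
    wget -O "/tmp/${PKG}" "${MIRROR}/${PKG}"
    dpkg -i "/tmp/${PKG}" || apt-get install -f -y   # pull in any missing dependencies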

*Packaging*
- overall, the packaging code is complete and has been pushed to the Bigtop branch
(see link below). Considering that the work was completed about 5 weeks ago and
targeted the state of trunk back in March, there might be some minor changes
that would require additional tweaks
- the libhdfs library code (if already included in the HAWQ project) might require
additional changes to the packaging code, so the library can be produced and
properly installed during the installation phase (see the sketch after this list)
- Bigtop CI has jobs to create CentOS and Ubuntu packages (linked from
BIGTOP-2320 below)
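
As a rough sketch of the second point above, the build could produce the
in-tree library the same way libyarn is handled and then build HAWQ against it.
This is only an illustration in a Bigtop-style do-component-build script; the
bootstrap invocation and the --with-libhdfs3 configure flag are assumptions,
not the actual HAWQ build interface:

    # Hypothetical do-component-build fragment; paths and flags are assumptions.
    set -ex
    PREFIX=${PREFIX:-/usr/lib/hawq}

    # build the in-tree HDFS client library first, analogous to libyarn
    pushd depends/libhdfs3
      ./bootstrap --prefix="${PREFIX}"
      make -j"$(nproc)"
      make install DESTDIR="${DESTDIR}"
    popd

    # then build HAWQ itself against the freshly staged library
    ./configure --prefix="${PREFIX}" --with-libhdfs3="${PREFIX}"   # flag name assumed
    make -j"$(nproc)"
    make install DESTDIR="${DESTDIR}"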

*Tests*
- smoke tests need to be created (as per BIGTOP-2322), but that seems to be a
minor undertaking once the rest of the work is finished (a minimal sketch
follows this list)
- packaging tests need to be integrated into the Bigtop stack (BIGTOP-2324)
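
For the smoke tests, something along these lines would be enough as a first
cut; shown here in shell form only, while the real tests would be wired into
the bigtop-tests framework. The master host/port and the 'postgres' database
are assumptions about the deployed cluster:

    #!/bin/bash
    # Minimal end-to-end check: the cluster reports itself up and can run a query.
    set -e
    MASTER_HOST=${HAWQ_MASTER:-localhost}
    MASTER_PORT=${HAWQ_MASTER_PORT:-5432}

    hawq state    # basic liveness check, run as the hawq admin user

    psql -h "${MASTER_HOST}" -p "${MASTER_PORT}" -d postgres -c \
      "CREATE TABLE bigtop_smoke (i int);
       INSERT INTO bigtop_smoke VALUES (1);
       SELECT count(*) FROM bigtop_smoke;
       DROP TABLE bigtop_smoke;"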

*Deployment*
- the deployment code is complete. However, it needs to be extended to properly
support cluster roles and to be linked into the main {{site.pp}} recipe
- because real-life deployments cannot rely on in-house Python wrappers that use
passwordless SSH, lifecycle management and the initial bootstrap are done
directly by calling into the HAWQ scripts that provide this functionality. It is
possible that some of these interfaces were updated in the last 6 weeks, so
additional testing will be needed
- it should be HAWQ's responsibility to provide a concise way of initializing a
master, a segment, and so on without the need for password-less SSH, which is
suboptimal and won't be accepted by the Bigtop community, as it breaks the
deployment model (a sketch of node-local calls follows this list)
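
To make the last point concrete, this is roughly what the puppet recipes end up
issuing on each node, with no passwordless SSH and no cluster-wide wrappers
involved. The run-as user and the exact hawq sub-commands are assumptions based
on the interfaces discussed in this thread:

    #!/bin/bash
    # Role-scoped, node-local lifecycle calls driven by the puppet role assignment.
    set -e
    ROLE=$1   # master | segment

    case "${ROLE}" in
      master)
        su -s /bin/bash gpadmin -c "hawq init master"
        su -s /bin/bash gpadmin -c "hawq start master"
        ;;
      segment)
        su -s /bin/bash gpadmin -c "hawq init segment"
        su -s /bin/bash gpadmin -c "hawq start segment"
        ;;
      *)
        echo "usage: $0 master|segment" >&2
        exit 1
        ;;
    esac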

*Toolchain*
- the toolchain code is complete in the Bigtop branch. This will allow HAWQ to be
built in the standard Bigtop container available to the CI and to 3rd-party users
(see the sketch after this list)
- the toolchain code needs to be rebased on top of the current Bigtop master, and
possible conflicts will have to be resolved
- once the integration is finished, the Bigtop slave images will have to be updated
to enable automatic CI runs
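
For a 3rd-party user, building the packages inside the standard container would
look roughly like this; the image tag and the gradle task name are assumptions,
since the HAWQ bits currently live only on the BIGTOP-2320 branch:

    # run the packaging build inside the standard Bigtop build container
    docker run --rm -v "$(pwd)":/ws -w /ws bigtop/slaves:trusty \
      bash -c './gradlew hawq-pkg'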


[1] https://issues.apache.org/jira/browse/HAWQ-706

Re: HAWQ integration to Apache bigdata stack: remaining steps

Posted by Lei Chang <le...@apache.org>.
+1 for the proposal. Formalizing the requirements from Bigtop and Ambari
sounds like a good starting point.

And it looks like making Bigtop work in the short term without exchanging SSH
keys only needs a few small changes in the standby script, provided Bigtop
takes responsibility for copying the files between master and standby as a
precondition.
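
Just to illustrate that precondition (a sketch only; the host, paths and run-as
user are placeholders, and the assumption is that Bigtop does the copy over its
own root/puppet-managed transport rather than gpadmin keys):

    #!/bin/bash
    # Bigtop-managed copy of the master's data directory, replacing the key
    # exchange the current standby script performs itself.
    set -e
    MASTER=hawq-master.example.com
    MASTER_DATA_DIR=/data/hawq/master

    rsync -a "${MASTER}:${MASTER_DATA_DIR}/" "${MASTER_DATA_DIR}/"

    # with the files in place, the standby init can run purely locally
    su -s /bin/bash gpadmin -c "hawq init standby"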

Cheers
Lei

Re: HAWQ integration to Apache bigdata stack: remaining steps

Posted by Alexander Denissov <ad...@pivotal.io>.
For the deployment and management use cases, I think we need to revisit the
design and implementation of the current tools, taking into account the
requirements of Ambari and Bigtop management practices.

I would propose a two-pronged approach:
- atomic management scripts for operations on individual hosts
- orchestration layer for cluster-level operations

Both Ambari and Bigtop (I'd think) will prefer to deal with scripts on
individual nodes, which will not require SSH keys to be distributed for the
gpadmin user, as the tooling will rely on Ambari or Bigtop key management.
These tools will then orchestrate cross-component operations (like init
standby) via their own wizards or scripts.

For people not using either Ambari or Bigtop, we can provide a script-based
orchestration layer similar to the cluster-level operations we have currently.
This layer will require SSH keys to be exchanged to function properly (a sketch
of both layers follows).
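
Sketched out, the split could look like this; the script names, the host-list
location and the hawq sub-commands are illustrative only:

    #!/bin/bash
    # Atomic layer: runs on exactly one host, no ssh involved.
    hawq_local() {                 # e.g. hawq_local init master
      su -s /bin/bash gpadmin -c "hawq $1 $2"
    }

    # Orchestration layer: only for standalone users without Ambari/Bigtop.
    hawq_cluster() {               # e.g. hawq_cluster init
      local action=$1
      hawq_local "${action}" master
      while read -r seg; do
        ssh -n "${seg}" "su -s /bin/bash gpadmin -c 'hawq ${action} segment'"
      done < /usr/local/hawq/etc/slaves   # placeholder host list
    }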

If we agree in principle, I can take a stab at formalizing the Ambari
requirements for the management scripts.

--
Thanks
Alex


Re: HAWQ integration to Apache bigdata stack: remaining steps

Posted by Lei Chang <le...@apache.org>.
Hi Konstantin,

Thrift is from Apache, and the installation step is quite similar to that of
other libraries. Can you guys use "yum" to install it on the Bigtop side?
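
A quick way to see what the distros actually ship (whether the packaged Thrift
is new enough for HAWQ is the open question this doesn't answer):

    yum list available 'thrift*' 'libthrift*'        # CentOS / RHEL (EPEL)
    apt-cache search --names-only '^(lib)?thrift'    # Debian / Ubuntu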

Cheers
Lei

Re: HAWQ integration to Apache bigdata stack: remaining steps

Posted by Konstantin Boudnik <co...@apache.org>.
On Tue, Apr 26, 2016 at 03:51PM, Radar Da lei wrote:
> Hi Konstantin,
> 
> Thanks for listing these items out.
> 
> For the 'External dependencies' part, do you mean 'libthrift' or 'libhdfs'? I
> see all the links above point to libhdfs.
> 
>     1. If you mean 'libhdfs', it's already in HAWQ's source code; it is
> located in 'depends/libhdfs3', and we should build it the same way libyarn
> does.
> 
>     2. If you mean thrift, I didn't get what makes it different from other
> dependencies. Would you please specify the details that need to be done?

Sorry, I was talking about libthrift. The points about libhdfs are indeed no
longer valid, as it has been moved into the project codebase.

> For "Deployment" part:
> 
>     1. Sure we can try to make 'master' and 'segment' to do init/start/stop
> without pasword-less. But initialize standby node will require to synchronize
> files with master. Any advice how should we handle standby?
>         Now HAWQ-469 <https://issues.apache.org/jira/browse/HAWQ-469> is
> tracking this, would you share the status, maybe we can assist on this to
> speed it up.

This is fine. We do something similar when standing up HDFS HA, so there's
no technical blocker here.

>     2. Another question is whether "remove password-less" is only required
> during HAWQ installation/initialization (deployment). Is it required for our
> other management tools, e.g. 'hawq config/check/scp/ssh/...'? These tools
> will not function without password-less SSH.

I am not asking to remove it per se, but rather to have a basic set of scripts
that would work at the node level only, and then wrap them in ssh-dependent
logic where you see fit.

Cos


Re: HAWQ integration to Apache bigdata stack: remaining steps

Posted by Radar Da lei <rl...@pivotal.io>.
Hi Konstantin,

Thanks for listing these items out.

For the 'External dependencies' part, do you mean 'libthrift' or 'libhdfs'? I
see all the links above point to libhdfs.

    1. If you mean 'libhdfs', it's already in HAWQ's source code; it is
located in 'depends/libhdfs3', and we should build it the same way libyarn
does.

    2. If you mean thrift, I didn't get what makes it different from other
dependencies. Would you please specify the details that need to be done?

For "Deployment" part:

    1. Sure we can try to make 'master' and 'segment' to do init/start/stop
without pasword-less. But initialize standby node will require to synchronize
files with master. Any advice how should we handle standby?
        Now HAWQ-469 <https://issues.apache.org/jira/browse/HAWQ-469> is
tracking this, would you share the status, maybe we can assist on this to
speed it up.

    2. Another question is  if "remove password-less" is only required
during hawq installation/initialization(deployment)? Is it required to our
other management tools, e.g. 'hawq config/check/scp/ssh/...', these tools
will not function without password-less.

Thanks.



Regards,
Radar
