Posted to dev@impala.apache.org by Michael Ho <kw...@cloudera.com> on 2016/05/26 03:42:09 UTC

RFC: Remove thirdparty

Hi,

Following up on the discussion about IMPALA-3223, I'd like to send out
an email about the removal of thirdparty. In particular, the following
changes will happen in stages. Please voice your comments before I
commit to any action.

1. Require $IMPALA_TOOLCHAIN to be set in order to build Impala. In
other words, all the logic in the build script that builds thirdparty
components when $IMPALA_TOOLCHAIN is not set will be removed.

2. Remove build_thirdparty.sh.

3. Move postgresql-jdbc and maybe llama-minikdc (?) to the toolchain and
update the scripts accordingly.

4. Remove everything in the thirdparty directory except for the
following components: hadoop, hbase, hive, llama and sentry.

5. Update the integration Jenkins job to copy the snapshots of the
components above to an internal Jenkins repo in addition to checking
them in to GitHub. Update bootstrap_toolchain to point to the internal
repos.

6. Remove the thirdparty directory and update the integration job to not
check in to the git repo.

After step (3) is done, we can already push the build script changes to
the ASF tree, check in snapshots of hadoop, hbase, llama and sentry to
S3, and hopefully get the build to work.
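
To make step 1 concrete, here is a minimal sketch of the kind of check
the build script could run once the fallback logic is gone. The function
name require_toolchain and the error messages are hypothetical and only
for illustration; the actual build script may structure this differently.

```shell
# Hypothetical helper (not the actual Impala build script): fail fast
# when IMPALA_TOOLCHAIN is unset or does not point at a directory.
require_toolchain() {
  if [ -z "${IMPALA_TOOLCHAIN:-}" ]; then
    echo "IMPALA_TOOLCHAIN must be set to build Impala" >&2
    return 1
  fi
  if [ ! -d "${IMPALA_TOOLCHAIN}" ]; then
    echo "IMPALA_TOOLCHAIN=${IMPALA_TOOLCHAIN} is not a directory" >&2
    return 1
  fi
  return 0
}
```

A build script would call require_toolchain near the top and exit on a
non-zero return, instead of falling back to building thirdparty.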


-- 
Thanks,
Michael

Re: RFC: Remove thirdparty

Posted by Michael Ho <kw...@cloudera.com>.
As mentioned in the original email, after step 3, we can start pushing
changes to the build scripts to the ASF repos and push some released
versions of CDH components to S3. By then, the ASF repos should be
buildable (probably with some flag such as IMPALA_ASF_BUILD=1).
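
As a rough sketch only, a flag like that could switch where the CDH
component snapshots are fetched from. The function name
cdh_components_url, the variable PUBLIC_COMPONENTS_URL and the
example.invalid address are placeholders I made up; the location of the
public S3 snapshots has not been decided.

```shell
# Hypothetical sketch: PUBLIC_COMPONENTS_URL and its default value are
# placeholders, since the public S3 location is not yet known.
cdh_components_url() {
  if [ "${IMPALA_ASF_BUILD:-0}" = "1" ]; then
    # ASF build: use a publicly reachable location.
    echo "${PUBLIC_COMPONENTS_URL:-https://example.invalid/impala-components}"
  else
    # Internal build: use the internal Jenkins repo proposed in this thread.
    echo "http://repos.jenkins.cloudera.com/impala-repos/"
  fi
}
```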

On Thu, May 26, 2016 at 11:08 AM, Jim Apple <jb...@cloudera.com> wrote:

> And, when that is done, Clouderans would be able to build the ASF
> repo, but non-Clouderans would not?
>
> On Thu, May 26, 2016 at 11:05 AM, Michael Ho <kw...@cloudera.com> wrote:
> > The jenkins job is this one:
> >
> http://sandbox.jenkins.cloudera.com/job/impala-cdh5-trunk-core-integration
> > Harrison probably knows if there are other related jobs too.
> >
> > I am thinking of using the following location to host the golden
> snapshot of
> > CDH components.
> > http://repos.jenkins.cloudera.com/impala-repos/
> >
> > Michael
> >
> > On Thu, May 26, 2016 at 8:22 AM, Jim Apple <jb...@cloudera.com> wrote:
> >>
> >> > 5. Update integration jenkins job to copy the snapshots of the
> >> > components above to
> >> > internal jenkins repo in addition to checking them in to github.
> Update
> >> > bootstrap_toolchain
> >> > to point to internal repos.
> >>
> >> Which Jenkins job(s), exactly? Which internal Jenkins repo?
> >
> >
> >
> >
> > --
> > Thanks,
> > Michael
>



-- 
Thanks,
Michael

Re: RFC: Remove thirdparty

Posted by Jim Apple <jb...@cloudera.com>.
And, when that is done, Clouderans would be able to build the ASF
repo, but non-Clouderans would not?

On Thu, May 26, 2016 at 11:05 AM, Michael Ho <kw...@cloudera.com> wrote:
> The jenkins job is this one:
> http://sandbox.jenkins.cloudera.com/job/impala-cdh5-trunk-core-integration
> Harrison probably knows if there are other related jobs too.
>
> I am thinking of using the following location to host the golden snapshot of
> CDH components.
> http://repos.jenkins.cloudera.com/impala-repos/
>
> Michael
>
> On Thu, May 26, 2016 at 8:22 AM, Jim Apple <jb...@cloudera.com> wrote:
>>
>> > 5. Update integration jenkins job to copy the snapshots of the
>> > components above to
>> > internal jenkins repo in addition to checking them in to github. Update
>> > bootstrap_toolchain
>> > to point to internal repos.
>>
>> Which Jenkins job(s), exactly? Which internal Jenkins repo?
>
>
>
>
> --
> Thanks,
> Michael

Re: RFC: Remove thirdparty

Posted by Michael Ho <kw...@cloudera.com>.
The jenkins job is this one:
http://sandbox.jenkins.cloudera.com/job/impala-cdh5-trunk-core-integration
Harrison probably knows if there are other related jobs too.

I am thinking of using the following location to host the golden snapshot
of CDH components.
http://repos.jenkins.cloudera.com/impala-repos/

Michael

On Thu, May 26, 2016 at 8:22 AM, Jim Apple <jb...@cloudera.com> wrote:

> > 5. Update integration jenkins job to copy the snapshots of the
> components above to
> > internal jenkins repo in addition to checking them in to github. Update
> bootstrap_toolchain
> > to point to internal repos.
>
> Which Jenkins job(s), exactly? Which internal Jenkins repo?
>



-- 
Thanks,
Michael

Re: RFC: Remove thirdparty

Posted by Jim Apple <jb...@cloudera.com>.
> 5. Update integration jenkins job to copy the snapshots of the components above to
> internal jenkins repo in addition to checking them in to github. Update bootstrap_toolchain
> to point to internal repos.

Which Jenkins job(s), exactly? Which internal Jenkins repo?

Re: RFC: Remove thirdparty

Posted by Todd Lipcon <to...@cloudera.com>.
On Thu, May 26, 2016 at 10:13 AM, Henry Robinson <he...@cloudera.com> wrote:

>
>
> On 26 May 2016 at 10:06, Todd Lipcon <to...@cloudera.com> wrote:
>
>> In terms of Apache policies, it's OK to require some "Impala" toolchain,
>> so long as the ability to regenerate that toolchain is public.
>>
>> For example, in Kudu, we use thirdparty tarballs hosted on S3. The actual
>> bucket is owned by Cloudera (someone has to pay for it), but the tarballs
>> are exactly the upstream source releases of the dependencies, so if someone
>> wanted to use their own copies, it could be done with a bit of work.
>>
>
> What about LLVM / GCC? Are those hosted in S3 as well for Kudu?
>

Yes, though we don't currently rebuild GCC. We do rebuild libstdcxx for the
purposes of TSAN builds.

It does make our initial build pretty long, so caching built artifacts for
different platforms would be nice, but we don't do that today.

-Todd

>
>
>>
>> I think Impala depending upon pre-built thirdparty deps is also fine, so
>> long as they can be re-built from source using publicly available scripts.
>> Making it trivial to do so isn't a strict requirement IMO -- so long as if
>> someone asked for help to do that work, they got the appropriate assistance.
>>
>> In terms of depending upon vendor packages (CDH) vs upstream releases,
>> again I think it's reasonable to continue to use the current dependencies
>> for the time being until some contributor steps forward and volunteers to
>> make some change. Projects like Apache Ambari already do this (they deploy
>> HDP) so there's precedent.
>>
>> -Todd
>>
>> On Thu, May 26, 2016 at 9:40 AM, Michael Ho <kw...@cloudera.com> wrote:
>>
>>> Also adding mentors.
>>>
>>> On Thu, May 26, 2016 at 9:37 AM, Michael Ho <kw...@cloudera.com> wrote:
>>>
>>>> I guess point number 1 is more about requiring all the thirdparty
>>>> binary for getting Impala to build
>>>> and work to be located at a location specified by the environment
>>>> variable $IMPALA_TOOLCHAIN.
>>>>
>>>> It's not strictly necessary for users to use exactly the version of
>>>> toolchain we provide. For instance,
>>>> a user can check out a copy of our native-toolchain (which is public)
>>>> and tinker with it or they can
>>>> create their own version of IMPALA_TOOLCHAIN as long as they have all
>>>> the necessary binaries
>>>> we expect.
>>>>
>>>> The user can also feel free to create a symlink to the system library
>>>> of their choice in the
>>>> $IMPALA_TOOLCHAIN directory if they choose to do so.
>>>>
>>>> My question is more about whether we should clean up our build script
>>>> so that we expect to find
>>>> everything we need to build in $IMPALA_TOOLCHAIN ?
>>>>
>>>> Michael
>>>>
>>>> On Thu, May 26, 2016 at 8:53 AM, Tim Armstrong <ta...@cloudera.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Wed, May 25, 2016 at 8:42 PM, Michael Ho <kw...@cloudera.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Following up on the discussion about IMPALA-3223, I'd like to send out
>>>>>> an email about the removal of thirdparty. In particular, the
>>>>>> following changes
>>>>>> will happen in stages. Please voice your comment before I commit to
>>>>>> any action.
>>>>>>
>>>>>> 1. Requires $IMPALA_TOOLCHAIN to be set in order to build Impala.
>>>>>> In other words, all the logic in the build script to build thirdparty
>>>>>> component
>>>>>> if $IMPALA_TOOLCHAIN is not set will be removed.
>>>>>>
>>>>>
>>>>> I think we probably need to make a firm decision about whether we're
>>>>> going to try to support non-toolchain builds. In the past we've said that
>>>>> it would be nice to allow building Impala with system libraries (even if we
>>>>> don't put special effort into supporting it), but I don't think we've
>>>>> committed to the idea, or committed to toolchain builds only.
>>>>>
>>>>> If we're going to support non-toolchain builds we would need some kind
>>>>> of testing to prevent it breaking all the time.
>>>>>
>>>>> It would be nice to have, but I'm not sure anyone has the
>>>>> time/motivation to do it. What do people think?
>>>>>
>>>>>
>>>>>>
>>>>>> 2. Remove build_thirdparty.sh
>>>>>>
>>>>>> 3. Move postgresql-jdbc and maybe llama-minikdc (?) to toolchain
>>>>>> and update
>>>>>> scripts about it.
>>>>>>
>>>>>
>>>>>> 4. Remove everything in thirdparty directory except for the following
>>>>>> components:
>>>>>> hadoop, hbase, hive, llama and sentry.
>>>>>>
>>>>>> 5. Update integration jenkins job to copy the snapshots of the
>>>>>> components above to
>>>>>> internal jenkins repo in addition to checking them in to github.
>>>>>> Update bootstrap_toolchain
>>>>>> to point to internal repos.
>>>>>>
>>>>>> 6. Remove thirdparty directory and update integration job to not
>>>>>> check in to git repo.
>>>>>>
>>>>>> After step (3) is done, we can already push the changes of the build
>>>>>> script to ASF tree
>>>>>> and check in snapshots of hadoop, hbase, llama and sentry to S3 and
>>>>>> hopefully
>>>>>> get the build to work.
>>>>>>
>>>>>
>>>>> We can probably test this out as we go by manually copying the
>>>>> artifacts to the impala-incubator repo. I did a test of this yesterday
>>>>> (running download_requirements and copying thirdparty) and it built ok.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Thanks,
>>>>>> Michael
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> Michael
>>>>
>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Michael
>>>
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: RFC: Remove thirdparty

Posted by Henry Robinson <he...@cloudera.com>.
On 26 May 2016 at 10:06, Todd Lipcon <to...@cloudera.com> wrote:

> In terms of Apache policies, it's OK to require some "Impala" toolchain,
> so long as the ability to regenerate that toolchain is public.
>
> For example, in Kudu, we use thirdparty tarballs hosted on S3. The actual
> bucket is owned by Cloudera (someone has to pay for it), but the tarballs
> are exactly the upstream source releases of the dependencies, so if someone
> wanted to use their own copies, it could be done with a bit of work.
>

What about LLVM / GCC? Are those hosted in S3 as well for Kudu?


>
> I think Impala depending upon pre-built thirdparty deps is also fine, so
> long as they can be re-built from source using publicly available scripts.
> Making it trivial to do so isn't a strict requirement IMO -- so long as if
> someone asked for help to do that work, they got the appropriate assistance.
>
> In terms of depending upon vendor packages (CDH) vs upstream releases,
> again I think it's reasonable to continue to use the current dependencies
> for the time being until some contributor steps forward and volunteers to
> make some change. Projects like Apache Ambari already do this (they deploy
> HDP) so there's precedent.
>
> -Todd
>
> On Thu, May 26, 2016 at 9:40 AM, Michael Ho <kw...@cloudera.com> wrote:
>
>> Also adding mentors.
>>
>> On Thu, May 26, 2016 at 9:37 AM, Michael Ho <kw...@cloudera.com> wrote:
>>
>>> I guess point number 1 is more about requiring all the thirdparty binary
>>> for getting Impala to build
>>> and work to be located at a location specified by the environment
>>> variable $IMPALA_TOOLCHAIN.
>>>
>>> It's not strictly necessary for users to use exactly the version of
>>> toolchain we provide. For instance,
>>> a user can check out a copy of our native-toolchain (which is public)
>>> and tinker with it or they can
>>> create their own version of IMPALA_TOOLCHAIN as long as they have all
>>> the necessary binaries
>>> we expect.
>>>
>>> The user can also feel free to create a symlink to the system library of
>>> their choice in the
>>> $IMPALA_TOOLCHAIN directory if they choose to do so.
>>>
>>> My question is more about whether we should clean up our build script so
>>> that we expect to find
>>> everything we need to build in $IMPALA_TOOLCHAIN ?
>>>
>>> Michael
>>>
>>> On Thu, May 26, 2016 at 8:53 AM, Tim Armstrong <ta...@cloudera.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, May 25, 2016 at 8:42 PM, Michael Ho <kw...@cloudera.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Following up on the discussion about IMPALA-3223, I'd like to send out
>>>>> an email about the removal of thirdparty. In particular, the following
>>>>> changes
>>>>> will happen in stages. Please voice your comment before I commit to
>>>>> any action.
>>>>>
>>>>> 1. Requires $IMPALA_TOOLCHAIN to be set in order to build Impala.
>>>>> In other words, all the logic in the build script to build thirdparty
>>>>> component
>>>>> if $IMPALA_TOOLCHAIN is not set will be removed.
>>>>>
>>>>
>>>> I think we probably need to make a firm decision about whether we're
>>>> going to try to support non-toolchain builds. In the past we've said that
>>>> it would be nice to allow building Impala with system libraries (even if we
>>>> don't put special effort into supporting it), but I don't think we've
>>>> committed to the idea, or committed to toolchain builds only.
>>>>
>>>> If we're going to support non-toolchain builds we would need some kind
>>>> of testing to prevent it breaking all the time.
>>>>
>>>> It would be nice to have, but I'm not sure anyone has the
>>>> time/motivation to do it. What do people think?
>>>>
>>>>
>>>>>
>>>>> 2. Remove build_thirdparty.sh
>>>>>
>>>>> 3. Move postgresql-jdbc and maybe llama-minikdc (?) to toolchain and
>>>>> update
>>>>> scripts about it.
>>>>>
>>>>
>>>>> 4. Remove everything in thirdparty directory except for the following
>>>>> components:
>>>>> hadoop, hbase, hive, llama and sentry.
>>>>>
>>>>> 5. Update integration jenkins job to copy the snapshots of the
>>>>> components above to
>>>>> internal jenkins repo in addition to checking them in to github.
>>>>> Update bootstrap_toolchain
>>>>> to point to internal repos.
>>>>>
>>>>> 6. Remove thirdparty directory and update integration job to not check
>>>>> in to git repo.
>>>>>
>>>>> After step (3) is done, we can already push the changes of the build
>>>>> script to ASF tree
>>>>> and check in snapshots of hadoop, hbase, llama and sentry to S3 and
>>>>> hopefully
>>>>> get the build to work.
>>>>>
>>>>
>>>> We can probably test this out as we go by manually copying the
>>>> artifacts to the impala-incubator repo. I did a test of this yesterday
>>>> (running download_requirements and copying thirdparty) and it built ok.
>>>>
>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Michael
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks,
>>> Michael
>>>
>>
>>
>>
>> --
>> Thanks,
>> Michael
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: RFC: Remove thirdparty

Posted by Todd Lipcon <to...@cloudera.com>.
In terms of Apache policies, it's OK to require some "Impala" toolchain, so
long as the ability to regenerate that toolchain is public.

For example, in Kudu, we use thirdparty tarballs hosted on S3. The actual
bucket is owned by Cloudera (someone has to pay for it), but the tarballs
are exactly the upstream source releases of the dependencies, so if someone
wanted to use their own copies, it could be done with a bit of work.

I think Impala depending upon pre-built thirdparty deps is also fine, so
long as they can be re-built from source using publicly available scripts.
Making it trivial to do so isn't a strict requirement IMO -- so long as if
someone asked for help to do that work, they got the appropriate assistance.

In terms of depending upon vendor packages (CDH) vs upstream releases,
again I think it's reasonable to continue to use the current dependencies
for the time being until some contributor steps forward and volunteers to
make some change. Projects like Apache Ambari already do this (they deploy
HDP) so there's precedent.

-Todd

On Thu, May 26, 2016 at 9:40 AM, Michael Ho <kw...@cloudera.com> wrote:

> Also adding mentors.
>
> On Thu, May 26, 2016 at 9:37 AM, Michael Ho <kw...@cloudera.com> wrote:
>
>> I guess point number 1 is more about requiring all the thirdparty binary
>> for getting Impala to build
>> and work to be located at a location specified by the environment
>> variable $IMPALA_TOOLCHAIN.
>>
>> It's not strictly necessary for users to use exactly the version of
>> toolchain we provide. For instance,
>> a user can check out a copy of our native-toolchain (which is public) and
>> tinker with it or they can
>> create their own version of IMPALA_TOOLCHAIN as long as they have all the
>> necessary binaries
>> we expect.
>>
>> The user can also feel free to create a symlink to the system library of
>> their choice in the
>> $IMPALA_TOOLCHAIN directory if they choose to do so.
>>
>> My question is more about whether we should clean up our build script so
>> that we expect to find
>> everything we need to build in $IMPALA_TOOLCHAIN ?
>>
>> Michael
>>
>> On Thu, May 26, 2016 at 8:53 AM, Tim Armstrong <ta...@cloudera.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, May 25, 2016 at 8:42 PM, Michael Ho <kw...@cloudera.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Following up on the discussion about IMPALA-3223, I'd like to send out
>>>> an email about the removal of thirdparty. In particular, the following
>>>> changes
>>>> will happen in stages. Please voice your comment before I commit to
>>>> any action.
>>>>
>>>> 1. Requires $IMPALA_TOOLCHAIN to be set in order to build Impala.
>>>> In other words, all the logic in the build script to build thirdparty
>>>> component
>>>> if $IMPALA_TOOLCHAIN is not set will be removed.
>>>>
>>>
>>> I think we probably need to make a firm decision about whether we're
>>> going to try to support non-toolchain builds. In the past we've said that
>>> it would be nice to allow building Impala with system libraries (even if we
>>> don't put special effort into supporting it), but I don't think we've
>>> committed to the idea, or committed to toolchain builds only.
>>>
>>> If we're going to support non-toolchain builds we would need some kind
>>> of testing to prevent it breaking all the time.
>>>
>>> It would be nice to have, but I'm not sure anyone has the
>>> time/motivation to do it. What do people think?
>>>
>>>
>>>>
>>>> 2. Remove build_thirdparty.sh
>>>>
>>>> 3. Move postgresql-jdbc and maybe llama-minikdc (?) to toolchain and
>>>> update
>>>> scripts about it.
>>>>
>>>
>>>> 4. Remove everything in thirdparty directory except for the following
>>>> components:
>>>> hadoop, hbase, hive, llama and sentry.
>>>>
>>>> 5. Update integration jenkins job to copy the snapshots of the
>>>> components above to
>>>> internal jenkins repo in addition to checking them in to github. Update
>>>> bootstrap_toolchain
>>>> to point to internal repos.
>>>>
>>>> 6. Remove thirdparty directory and update integration job to not check
>>>> in to git repo.
>>>>
>>>> After step (3) is done, we can already push the changes of the build
>>>> script to ASF tree
>>>> and check in snapshots of hadoop, hbase, llama and sentry to S3 and
>>>> hopefully
>>>> get the build to work.
>>>>
>>>
>>> We can probably test this out as we go by manually copying the artifacts
>>> to the impala-incubator repo. I did a test of this yesterday (running
>>> download_requirements and copying thirdparty) and it built ok.
>>>
>>>
>>>>
>>>>
>>>> --
>>>> Thanks,
>>>> Michael
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks,
>> Michael
>>
>
>
>
> --
> Thanks,
> Michael
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: RFC: Remove thirdparty

Posted by Michael Ho <kw...@cloudera.com>.
Also adding mentors.

On Thu, May 26, 2016 at 9:37 AM, Michael Ho <kw...@cloudera.com> wrote:

> I guess point number 1 is more about requiring all the thirdparty binary
> for getting Impala to build
> and work to be located at a location specified by the environment variable
> $IMPALA_TOOLCHAIN.
>
> It's not strictly necessary for users to use exactly the version of
> toolchain we provide. For instance,
> a user can check out a copy of our native-toolchain (which is public) and
> tinker with it or they can
> create their own version of IMPALA_TOOLCHAIN as long as they have all the
> necessary binaries
> we expect.
>
> The user can also feel free to create a symlink to the system library of
> their choice in the
> $IMPALA_TOOLCHAIN directory if they choose to do so.
>
> My question is more about whether we should clean up our build script so
> that we expect to find
> everything we need to build in $IMPALA_TOOLCHAIN ?
>
> Michael
>
> On Thu, May 26, 2016 at 8:53 AM, Tim Armstrong <ta...@cloudera.com>
> wrote:
>
>>
>>
>> On Wed, May 25, 2016 at 8:42 PM, Michael Ho <kw...@cloudera.com> wrote:
>>
>>> Hi,
>>>
>>> Following up on the discussion about IMPALA-3223, I'd like to send out
>>> an email about the removal of thirdparty. In particular, the following
>>> changes
>>> will happen in stages. Please voice your comment before I commit to
>>> any action.
>>>
>>> 1. Requires $IMPALA_TOOLCHAIN to be set in order to build Impala.
>>> In other words, all the logic in the build script to build thirdparty
>>> component
>>> if $IMPALA_TOOLCHAIN is not set will be removed.
>>>
>>
>> I think we probably need to make a firm decision about whether we're
>> going to try to support non-toolchain builds. In the past we've said that
>> it would be nice to allow building Impala with system libraries (even if we
>> don't put special effort into supporting it), but I don't think we've
>> committed to the idea, or committed to toolchain builds only.
>>
>> If we're going to support non-toolchain builds we would need some kind of
>> testing to prevent it breaking all the time.
>>
>> It would be nice to have, but I'm not sure anyone has the time/motivation
>> to do it. What do people think?
>>
>>
>>>
>>> 2. Remove build_thirdparty.sh
>>>
>>> 3. Move postgresql-jdbc and maybe llama-minikdc (?) to toolchain and
>>> update
>>> scripts about it.
>>>
>>
>>> 4. Remove everything in thirdparty directory except for the following
>>> components:
>>> hadoop, hbase, hive, llama and sentry.
>>>
>>> 5. Update integration jenkins job to copy the snapshots of the
>>> components above to
>>> internal jenkins repo in addition to checking them in to github. Update
>>> bootstrap_toolchain
>>> to point to internal repos.
>>>
>>> 6. Remove thirdparty directory and update integration job to not check
>>> in to git repo.
>>>
>>> After step (3) is done, we can already push the changes of the build
>>> script to ASF tree
>>> and check in snapshots of hadoop, hbase, llama and sentry to S3 and
>>> hopefully
>>> get the build to work.
>>>
>>
>> We can probably test this out as we go by manually copying the artifacts
>> to the impala-incubator repo. I did a test of this yesterday (running
>> download_requirements and copying thirdparty) and it built ok.
>>
>>
>>>
>>>
>>> --
>>> Thanks,
>>> Michael
>>>
>>
>>
>
>
> --
> Thanks,
> Michael
>



-- 
Thanks,
Michael

Re: RFC: Remove thirdparty

Posted by Michael Ho <kw...@cloudera.com>.
I guess point number 1 is more about requiring all the thirdparty
binaries needed to build and run Impala to be located at the location
specified by the environment variable $IMPALA_TOOLCHAIN.

It's not strictly necessary for users to use exactly the version of the
toolchain we provide. For instance, a user can check out a copy of our
native-toolchain (which is public) and tinker with it, or they can
create their own version of IMPALA_TOOLCHAIN as long as it has all the
necessary binaries we expect.

The user can also feel free to create a symlink to the system library of
their choice in the $IMPALA_TOOLCHAIN directory if they choose to do so.
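
A minimal sketch of that symlink approach follows. The helper name
link_system_lib and the idea of one entry per component directly under
$IMPALA_TOOLCHAIN are assumptions for illustration; the real directory
layout the build expects may differ.

```shell
# Hypothetical helper: link_system_lib <system_path> <name_in_toolchain>
# Creates (or replaces) a symlink under $IMPALA_TOOLCHAIN that points at
# a library directory already installed on the system.
link_system_lib() {
  local src="$1"
  local name="$2"
  if [ -z "${IMPALA_TOOLCHAIN:-}" ]; then
    echo "IMPALA_TOOLCHAIN must be set" >&2
    return 1
  fi
  mkdir -p "${IMPALA_TOOLCHAIN}"
  ln -sfn "${src}" "${IMPALA_TOOLCHAIN}/${name}"
}
```

For example, link_system_lib /usr/lib/some-library some-library would
make $IMPALA_TOOLCHAIN/some-library resolve to the system copy (both
names here are placeholders).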

My question is more about whether we should clean up our build script so
that we expect to find everything we need to build in $IMPALA_TOOLCHAIN?

Michael

On Thu, May 26, 2016 at 8:53 AM, Tim Armstrong <ta...@cloudera.com>
wrote:

>
>
> On Wed, May 25, 2016 at 8:42 PM, Michael Ho <kw...@cloudera.com> wrote:
>
>> Hi,
>>
>> Following up on the discussion about IMPALA-3223, I'd like to send out
>> an email about the removal of thirdparty. In particular, the following
>> changes
>> will happen in stages. Please voice your comment before I commit to
>> any action.
>>
>> 1. Requires $IMPALA_TOOLCHAIN to be set in order to build Impala.
>> In other words, all the logic in the build script to build thirdparty
>> component
>> if $IMPALA_TOOLCHAIN is not set will be removed.
>>
>
> I think we probably need to make a firm decision about whether we're going
> to try to support non-toolchain builds. In the past we've said that it
> would be nice to allow building Impala with system libraries (even if we
> don't put special effort into supporting it), but I don't think we've
> committed to the idea, or committed to toolchain builds only.
>
> If we're going to support non-toolchain builds we would need some kind of
> testing to prevent it breaking all the time.
>
> It would be nice to have, but I'm not sure anyone has the time/motivation
> to do it. What do people think?
>
>
>>
>> 2. Remove build_thirdparty.sh
>>
>> 3. Move postgresql-jdbc and maybe llama-minikdc (?) to toolchain and
>> update
>> scripts about it.
>>
>
>> 4. Remove everything in thirdparty directory except for the following
>> components:
>> hadoop, hbase, hive, llama and sentry.
>>
>> 5. Update integration jenkins job to copy the snapshots of the components
>> above to
>> internal jenkins repo in addition to checking them in to github. Update
>> bootstrap_toolchain
>> to point to internal repos.
>>
>> 6. Remove thirdparty directory and update integration job to not check in
>> to git repo.
>>
>> After step (3) is done, we can already push the changes of the build
>> script to ASF tree
>> and check in snapshots of hadoop, hbase, llama and sentry to S3 and
>> hopefully
>> get the build to work.
>>
>
> We can probably test this out as we go by manually copying the artifacts
> to the impala-incubator repo. I did a test of this yesterday (running
> download_requirements and copying thirdparty) and it built ok.
>
>
>>
>>
>> --
>> Thanks,
>> Michael
>>
>
>


-- 
Thanks,
Michael

Re: RFC: Remove thirdparty

Posted by Henry Robinson <he...@cloudera.com>.
(Actually adding mentors this time)

On 26 May 2016 at 09:19, Henry Robinson <he...@apache.org> wrote:

> (+Impala's podling mentors for advice)
>
>
> On 26 May 2016 at 08:57, Jim Apple <jb...@cloudera.com> wrote:
>
>> > I think we probably need to make a firm decision about whether we're
>> going
>> > to try to support non-toolchain builds. In the past we've said that it
>> would
>> > be nice to allow building Impala with system libraries (even if we
>> don't put
>> > special effort into supporting it), but I don't think we've committed
>> to the
>> > idea, or committed to toolchain builds only.
>> >
>> > If we're going to support non-toolchain builds we would need some kind
>> of
>> > testing to prevent it breaking all the time.
>> >
>> > It would be nice to have, but I'm not sure anyone has the
>> time/motivation to
>> > do it. What do people think?
>>
>> I agree that it would be nice to support non-toolchain builds, and I
>> agree that we don't have the time for this right now.
>>
>> I would call this a lower priority than most of the other ASF infra
>> transition work.
>>
>
> Is it (or will it be) possible to build Impala without downloading source
> or binary packages from Cloudera's managed S3 bucket? Is the situation
> different at all for link-time dependencies compared to system tools like
> gcc? Both of these are managed through the toolchain.
>
> My concern is that people might balk at being forced to use compiler
> binaries from a non-ASF source, and that if they want to at least verify
> for themselves that the compiler binaries are built from a clean source
> tarball they have to rebuild the toolchain themselves, which takes hours.
> Looking at this from the perspective of a fresh user it's not very
> user-friendly to say you can't use the system compiler that you already
> have installed from a trusted source. However, if it's easy to override the
> compiler location in the toolchain, that point is moot.
>
> We should ask the podling mentors for guidance once the technical details
> are clear.
>
>
>



-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: RFC: Remove thirdparty

Posted by Henry Robinson <he...@apache.org>.
(+Impala's podling mentors for advice)

On 26 May 2016 at 08:57, Jim Apple <jb...@cloudera.com> wrote:

> > I think we probably need to make a firm decision about whether we're
> going
> > to try to support non-toolchain builds. In the past we've said that it
> would
> > be nice to allow building Impala with system libraries (even if we don't
> put
> > special effort into supporting it), but I don't think we've committed to
> the
> > idea, or committed to toolchain builds only.
> >
> > If we're going to support non-toolchain builds we would need some kind of
> > testing to prevent it breaking all the time.
> >
> > It would be nice to have, but I'm not sure anyone has the
> time/motivation to
> > do it. What do people think?
>
> I agree that it would be nice to support non-toolchain builds, and I
> agree that we don't have the time for this right now.
>
> I would call this a lower priority than most of the other ASF infra
> transition work.
>

Is it (or will it be) possible to build Impala without downloading source
or binary packages from Cloudera's managed S3 bucket? Is the situation
different at all for link-time dependencies compared to system tools like
gcc? Both of these are managed through the toolchain.

My concern is that people might balk at being forced to use compiler
binaries from a non-ASF source, and that if they want to at least verify
for themselves that the compiler binaries are built from a clean source
tarball they have to rebuild the toolchain themselves, which takes hours.
Looking at this from the perspective of a fresh user it's not very
user-friendly to say you can't use the system compiler that you already
have installed from a trusted source. However, if it's easy to override the
compiler location in the toolchain, that point is moot.
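
To illustrate what "easy to override" could mean, here is a sketch under
stated assumptions. IMPALA_SYSTEM_CXX, select_cxx and the gcc/bin/g++
path inside the toolchain are names I am making up for the example; I
don't know whether the build scripts expose anything like this today.

```shell
# Hypothetical sketch: IMPALA_SYSTEM_CXX and the gcc/bin/g++ layout are
# illustrative assumptions, not Impala's actual variables or paths.
select_cxx() {
  if [ -n "${IMPALA_SYSTEM_CXX:-}" ]; then
    # The user explicitly chose a compiler they already trust.
    echo "${IMPALA_SYSTEM_CXX}"
  else
    # Default to the compiler shipped in the toolchain directory.
    echo "${IMPALA_TOOLCHAIN}/gcc/bin/g++"
  fi
}
```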

We should ask the podling mentors for guidance once the technical details
are clear.

Re: RFC: Remove thirdparty

Posted by Jim Apple <jb...@cloudera.com>.
> I think we probably need to make a firm decision about whether we're going
> to try to support non-toolchain builds. In the past we've said that it would
> be nice to allow building Impala with system libraries (even if we don't put
> special effort into supporting it), but I don't think we've committed to the
> idea, or committed to toolchain builds only.
>
> If we're going to support non-toolchain builds we would need some kind of
> testing to prevent it breaking all the time.
>
> It would be nice to have, but I'm not sure anyone has the time/motivation to
> do it. What do people think?

I agree that it would be nice to support non-toolchain builds, and I
agree that we don't have the time for this right now.

I would call this a lower priority than most of the other ASF infra
transition work.

Re: RFC: Remove thirdparty

Posted by Tim Armstrong <ta...@cloudera.com>.
On Wed, May 25, 2016 at 8:42 PM, Michael Ho <kw...@cloudera.com> wrote:

> Hi,
>
> Following up on the discussion about IMPALA-3223, I'd like to send out
> an email about the removal of thirdparty. In particular, the following
> changes
> will happen in stages. Please voice your comment before I commit to
> any action.
>
> 1. Requires $IMPALA_TOOLCHAIN to be set in order to build Impala.
> In other words, all the logic in the build script to build thirdparty
> component
> if $IMPALA_TOOLCHAIN is not set will be removed.
>

I think we probably need to make a firm decision about whether we're going
to try to support non-toolchain builds. In the past we've said that it
would be nice to allow building Impala with system libraries (even if we
don't put special effort into supporting it), but I don't think we've
committed to the idea, or committed to toolchain builds only.

If we're going to support non-toolchain builds we would need some kind of
testing to prevent it breaking all the time.

It would be nice to have, but I'm not sure anyone has the time/motivation
to do it. What do people think?


>
> 2. Remove build_thirdparty.sh
>
> 3. Move postgresql-jdbc and maybe llama-minikdc (?) to toolchain and
> update
> scripts about it.
>

> 4. Remove everything in thirdparty directory except for the following
> components:
> hadoop, hbase, hive, llama and sentry.
>
> 5. Update integration jenkins job to copy the snapshots of the components
> above to
> internal jenkins repo in addition to checking them in to github. Update
> bootstrap_toolchain
> to point to internal repos.
>
> 6. Remove thirdparty directory and update integration job to not check in
> to git repo.
>
> After step (3) is done, we can already push the changes of the build
> script to ASF tree
> and check in snapshots of hadoop, hbase, llama and sentry to S3 and
> hopefully
> get the build to work.
>

We can probably test this out as we go by manually copying the artifacts to
the impala-incubator repo. I did a test of this yesterday (running
download_requirements and copying thirdparty) and it built ok.


>
>
> --
> Thanks,
> Michael
>