Posted to dev@impala.apache.org by Henry Robinson <he...@apache.org> on 2017/02/26 06:22:43 UTC

Toolchain - versioning dependencies with the same version number

As written, the toolchain apparently can't deal with build flags changing
while a dependency's version remains the same.

LZ4 has never (afaict) been built with optimization enabled. I have a
commit that enables -O3, but it still produces artifacts for lz4-1.7.5,
with no version change. This is a problem because bootstrapping the
toolchain will fail to pick up the new binaries: the previously
downloaded version is still in the local cache, and won't be overwritten
because the version hasn't changed.

I think the simplest way to fix this is to write the toolchain build ID to
the dependency version file (that's in the local cache only) when it's
downloaded. If that ID changes, the dependency will be re-downloaded.
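
A minimal sketch of that check, assuming the version file simply holds the
dependency version and the toolchain build ID on separate lines. The
function names and the file layout are hypothetical, not necessarily what
bin/bootstrap_toolchain.py does:

```python
def needs_download(version_file, version, build_id):
    """Return True if the cached dependency should be fetched again."""
    expected = "{0}\n{1}\n".format(version, build_id)
    try:
        with open(version_file) as f:
            # Any mismatch in version or build ID invalidates the cache entry.
            return f.read() != expected
    except IOError:
        # No version file yet, so the dependency was never downloaded.
        return True


def record_download(version_file, version, build_id):
    """Write the version and build ID after a successful download."""
    with open(version_file, "w") as f:
        f.write("{0}\n{1}\n".format(version, build_id))
```

With this layout, an unchanged lz4-1.7.5 would still be re-fetched whenever
the build ID recorded next to it differs from the current one.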

This has the disadvantage that any bump in IMPALA_TOOLCHAIN_BUILD_ID will
invalidate all dependencies, and bin/bootstrap_toolchain.py will
re-download all of them. My feeling is that that cost is better than trying
to individually determine whether a dependency has changed between
toolchain builds.

Any thoughts on whether this is the right way to go?

Henry

Re: Toolchain - versioning dependencies with the same version number

Posted by Henry Robinson <he...@cloudera.com>.
On 28 February 2017 at 12:57, Marcel Kornacker <ma...@cloudera.com> wrote:

> Yes, I too am particularly concerned about maintaining the ability to
> build offline, and downloading the same things over and over again.
>
> I don't quite understand the case against versioning - if gc'ing
> obsolete versions in order to reduce storage space is a concern, then
> it's probably fine to a) blow away and re-download everything, or b)
> throw away old versions manually, if you happen to be in a situation
> where a) isn't possible.
>

The issue I have with versioning is that there's no way to understand the
link between the version number and what actually changed. It's a kludge
to work around the fact that the toolchain can't handle this kind of
situation.

That said, my immediate goal is to make sure that everyone picks up the new
LZ4 build. So I'll add a new version for now, and we can revisit this some
other time.





-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: Toolchain - versioning dependencies with the same version number

Posted by Marcel Kornacker <ma...@cloudera.com>.
Yes, I too am particularly concerned about maintaining the ability to
build offline, and about downloading the same things over and over again.

I don't quite understand the case against versioning - if gc'ing
obsolete versions in order to reduce storage space is a concern, then
it's probably fine to a) blow away and re-download everything, or b)
throw away old versions manually, if you happen to be in a situation
where a) isn't possible.


Re: Toolchain - versioning dependencies with the same version number

Posted by Tim Armstrong <ta...@cloudera.com>.
I agree it's not too bad if you have a fat pipe to S3, but it's a pretty
bad regression in usability to make it the default, and particularly to
provide no way to opt out.

The toolchain is almost 1GB though, which is pretty problematic to download
if a developer is on coffee-shop wifi, cellular wireless, airplane wifi,
etc. It'd also be pretty easy for a developer working offline to switch
branches, run buildall.sh, have gcc, etc, automatically deleted and then be
stuck unable to build anything.



Re: Toolchain - versioning dependencies with the same version number

Posted by Henry Robinson <he...@apache.org>.
I'd prefer not to do that because it's something of a hack and generates
too many artifacts if we make incremental build changes, not to mention the
extra complexity required to make such a change because new tarballs might
need to be uploaded.




On Tue, Feb 28, 2017 at 8:55 AM Lars Volker <lv...@cloudera.com> wrote:

> Can we add another version string component like -1 or -impala1, or add a
> dummy patch to the affected packages to allow for new versions with the
> same upstream version? I think this is what Linux distributions commonly do
> to have several versions of the same upstream version.
>
> On Feb 27, 2017 21:15, "Henry Robinson" <he...@cloudera.com> wrote:
>
> Yes, it would force re-downloading. At my office, downloading a toolchain
> takes a matter of a few seconds, so I'm not sure the cost is that great.
> And if it turned out to be problematic, one could always change the
> toolchain directory for different branches. Having something locally that
> set IMPALA_TOOLCHAIN_DIR=${IMPALA_HOME}/${IMPALA_TOOLCHAIN_BUILD_ID}/ would
> work.
>
> However I wouldn't want to force behaviour that into the toolchain scripts
> because of the need for garbage collection it would raise - it wouldn't be
> clear when to delete old toolchains programatically.
>
> On 27 February 2017 at 20:51, Tim Armstrong <ta...@cloudera.com>
> wrote:
>
> > Maybe I'm misunderstanding, but wouldn't that force re-downloading of the
> > entire toolchain every time a developer switches between branches with
> > different build IDs?
> >
> > I know some developers do that frequently, e.g. to try and reproduce bugs
> > on older versions or backport patches.
> >
> > I agree it would be good to fix this, since I've run into this problem
> > before, I'm just not quite sure what the best solution is. In the other
> > case where I had this issue with LLVM I changed the version number (by
> > appending noasserts-) to it, but that's really just a hack.
> >
> > -Tim
> >
> > On Mon, Feb 27, 2017 at 4:35 PM, Henry Robinson <he...@cloudera.com>
> > wrote:
> >
> > > As Matt said, I have a patch that implements build ID-based versioning
> at
> > > https://gerrit.cloudera.org/#/c/6166/2.
> > >
> > > Does anyone want to take a look? If we could get this in soon it would
> > help
> > > smooth over the LZ4 change which is going in shortly.
> > >
> > > On 27 February 2017 at 14:21, Henry Robinson <he...@cloudera.com>
> wrote:
> > >
> > > > I agree that that might be useful, and that it's a separately
> > addressable
> > > > problem.
> > > >
> > > > On 27 February 2017 at 14:18, Matthew Jacobs <mj...@cloudera.com>
> wrote:
> > > >
> > > >> Just catching up to this e-mail, though I had seen your code reviews
> > > >> and I think this approach makes sense. An additional concern would
> be
> > > >> how to identify how a toolchain package was built, and AFAIK this is
> > > >> tricky now if only the 'toolchain ID' is known. Before I saw this
> > > >> e-mail I was thinking about this problem (which I think we can
> address
> > > >> separately), and that we might want to write the native-toolchain
> git
> > > >> hash with every toolchain build so that the exact build scripts are
> > > >> associated with those build artifacts. I filed
> > > >> https://issues.cloudera.org/browse/IMPALA-5002 for this related
> > > >> problem.
> > > >>
> > > >> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson <he...@apache.org>
> > > >> wrote:
> > > >> > As written, the toolchain can't apparently deal with the
> possibility
> > > of
> > > >> > build flags changing, but a dependency version remaining the same.
> > > >> >
> > > >> > LZ4 has never (afaict) been built with optimization enabled. I
> have
> > a
> > > >> > commit that enables -O3, but that continues to produce artifacts
> for
> > > >> > lz4-1.7.5 with no version change. This is a problem because
> > > >> bootstrapping
> > > >> > the toolchain will fail to pick up the new binaries - because the
> > > >> > previously downloaded version is still in the local cache, and
> won't
> > > be
> > > >> > overwritten because of the version change.
> > > >> >
> > > >> > I think the simplest way to fix this is to write the toolchain
> build
> > > ID
> > > >> to
> > > >> > the dependency version file (that's in the local cache only) when
> > it's
> > > >> > downloaded. If that ID changes, the dependency will be
> > re-downloaded.
> > > >> >
> > > >> > This has the disadvantage that any bump in
> IMPALA_TOOLCHAIN_BUILD_ID
> > > >> will
> > > >> > invalidate all dependencies, and bin/bootstrap_toolchain.py will
> > > >> > re-download all of them. My feeling is that that cost is better
> than
> > > >> trying
> > > >> > to individually determine whether a dependency has changed between
> > > >> > toolchain builds.
> > > >> >
> > > >> > Any thoughts on whether this is the right way to go?
> > > >> >
> > > >> > Henry
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Henry Robinson
> > > > Software Engineer
> > > > Cloudera
> > > > 415-994-6679 <(415)%20994-6679>
> > > >
> > >
> > >
> > >
> > > --
> > > Henry Robinson
> > > Software Engineer
> > > Cloudera
> > > 415-994-6679
> > >
> >
>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
>

Re: Toolchain - versioning dependencies with the same version number

Posted by Lars Volker <lv...@cloudera.com>.
Can we add another version string component like -1 or -impala1, or add a
dummy patch to the affected packages to allow for new versions with the
same upstream version? I think this is what Linux distributions commonly do
to ship several revisions of the same upstream version.
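
As a rough illustration of the suffix idea (the helper and its parameter
names are made up for this sketch; only the -impala1 style of suffix comes
from the suggestion above):

```python
def package_version(upstream, revision=0, tag="impala"):
    """Append a local revision suffix to an upstream version string.

    revision 0 means the package is built exactly as before, so the
    upstream version is used unchanged.
    """
    if revision == 0:
        return upstream
    return "{0}-{1}{2}".format(upstream, tag, revision)
```

Bumping the revision when only the build flags change would give the cache
a new name to look for, e.g. package_version("1.7.5", 1) for the -O3 build.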

On Feb 27, 2017 21:15, "Henry Robinson" <he...@cloudera.com> wrote:

Yes, it would force re-downloading. At my office, downloading a toolchain
takes a matter of a few seconds, so I'm not sure the cost is that great.
And if it turned out to be problematic, one could always change the
toolchain directory for different branches. Having something locally that
set IMPALA_TOOLCHAIN_DIR=${IMPALA_HOME}/${IMPALA_TOOLCHAIN_BUILD_ID}/ would
work.

However I wouldn't want to force behaviour that into the toolchain scripts
because of the need for garbage collection it would raise - it wouldn't be
clear when to delete old toolchains programatically.

On 27 February 2017 at 20:51, Tim Armstrong <ta...@cloudera.com> wrote:

> Maybe I'm misunderstanding, but wouldn't that force re-downloading of the
> entire toolchain every time a developer switches between branches with
> different build IDs?
>
> I know some developers do that frequently, e.g. to try and reproduce bugs
> on older versions or backport patches.
>
> I agree it would be good to fix this, since I've run into this problem
> before, I'm just not quite sure what the best solution is. In the other
> case where I had this issue with LLVM I changed the version number (by
> appending noasserts-) to it, but that's really just a hack.
>
> -Tim
>
> On Mon, Feb 27, 2017 at 4:35 PM, Henry Robinson <he...@cloudera.com>
> wrote:
>
> > As Matt said, I have a patch that implements build ID-based versioning
at
> > https://gerrit.cloudera.org/#/c/6166/2.
> >
> > Does anyone want to take a look? If we could get this in soon it would
> help
> > smooth over the LZ4 change which is going in shortly.
> >
> > On 27 February 2017 at 14:21, Henry Robinson <he...@cloudera.com> wrote:
> >
> > > I agree that that might be useful, and that it's a separately
> addressable
> > > problem.
> > >
> > > On 27 February 2017 at 14:18, Matthew Jacobs <mj...@cloudera.com> wrote:
> > >
> > >> Just catching up to this e-mail, though I had seen your code reviews
> > >> and I think this approach makes sense. An additional concern would be
> > >> how to identify how a toolchain package was built, and AFAIK this is
> > >> tricky now if only the 'toolchain ID' is known. Before I saw this
> > >> e-mail I was thinking about this problem (which I think we can
address
> > >> separately), and that we might want to write the native-toolchain git
> > >> hash with every toolchain build so that the exact build scripts are
> > >> associated with those build artifacts. I filed
> > >> https://issues.cloudera.org/browse/IMPALA-5002 for this related
> > >> problem.
> > >>
> > >> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson <he...@apache.org>
> > >> wrote:
> > >> > As written, the toolchain can't apparently deal with the
> > >> > possibility of build flags changing, but a dependency version
> > >> > remaining the same.
> > >> >
> > >> > LZ4 has never (afaict) been built with optimization enabled. I have
> > >> > a commit that enables -O3, but that continues to produce artifacts
> > >> > for lz4-1.7.5 with no version change. This is a problem because
> > >> > bootstrapping the toolchain will fail to pick up the new binaries -
> > >> > because the previously downloaded version is still in the local
> > >> > cache, and won't be overwritten because of the version change.
> > >> >
> > >> > I think the simplest way to fix this is to write the toolchain
> > >> > build ID to the dependency version file (that's in the local cache
> > >> > only) when it's downloaded. If that ID changes, the dependency will
> > >> > be re-downloaded.
> > >> >
> > >> > This has the disadvantage that any bump in
> > >> > IMPALA_TOOLCHAIN_BUILD_ID will invalidate all dependencies, and
> > >> > bin/bootstrap_toolchain.py will re-download all of them. My feeling
> > >> > is that that cost is better than trying to individually determine
> > >> > whether a dependency has changed between toolchain builds.
> > >> >
> > >> > Any thoughts on whether this is the right way to go?
> > >> >
> > >> > Henry
> > >>
> > >
> > >
> > >
> > > --
> > > Henry Robinson
> > > Software Engineer
> > > Cloudera
> > > 415-994-6679 <(415)%20994-6679>
> > >
> >
> >
> >
> > --
> > Henry Robinson
> > Software Engineer
> > Cloudera
> > 415-994-6679
> >
>



--
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: Toolchain - versioning dependencies with the same version number

Posted by Henry Robinson <he...@cloudera.com>.
Yes, it would force re-downloading. At my office, downloading a toolchain
takes only a few seconds, so I'm not sure the cost is that great.
And if it turned out to be problematic, one could always change the
toolchain directory for different branches. Having something locally that
sets IMPALA_TOOLCHAIN_DIR=${IMPALA_HOME}/${IMPALA_TOOLCHAIN_BUILD_ID}/ would
work.

However, I wouldn't want to force that behaviour into the toolchain scripts,
because it would raise the need for garbage collection - it wouldn't be
clear when to delete old toolchains programmatically.
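
As a rough illustration of the per-build-ID directory idea above, the
sketch below derives a toolchain directory from IMPALA_HOME and
IMPALA_TOOLCHAIN_BUILD_ID. The function name is hypothetical and this is
not part of the toolchain scripts; it only mirrors the path layout
mentioned above.

```python
import os


def toolchain_dir_for_build(env=None):
    """Return a toolchain directory that is unique to the current build ID.

    Hypothetical helper: mirrors
    IMPALA_TOOLCHAIN_DIR=${IMPALA_HOME}/${IMPALA_TOOLCHAIN_BUILD_ID}/
    """
    env = os.environ if env is None else env
    impala_home = env["IMPALA_HOME"]
    build_id = env["IMPALA_TOOLCHAIN_BUILD_ID"]
    # Each build ID gets its own directory, so switching branches never
    # overwrites (or re-downloads) a toolchain that is already on disk.
    return os.path.join(impala_home, build_id)
```

With this layout, old toolchains pile up until someone deletes them by
hand, which is the garbage-collection concern raised above.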

On 27 February 2017 at 20:51, Tim Armstrong <ta...@cloudera.com> wrote:

> Maybe I'm misunderstanding, but wouldn't that force re-downloading of the
> entire toolchain every time a developer switches between branches with
> different build IDs?
>
> I know some developers do that frequently, e.g. to try and reproduce bugs
> on older versions or backport patches.
>
> I agree it would be good to fix this, since I've run into this problem
> before, I'm just not quite sure what the best solution is. In the other
> case where I had this issue with LLVM I changed the version number (by
> appending noasserts-) to it, but that's really just a hack.
>
> -Tim
>
> On Mon, Feb 27, 2017 at 4:35 PM, Henry Robinson <he...@cloudera.com>
> wrote:
>
> > As Matt said, I have a patch that implements build ID-based versioning at
> > https://gerrit.cloudera.org/#/c/6166/2.
> >
> > Does anyone want to take a look? If we could get this in soon it would
> > help smooth over the LZ4 change which is going in shortly.
> >
> > On 27 February 2017 at 14:21, Henry Robinson <he...@cloudera.com> wrote:
> >
> > > I agree that that might be useful, and that it's a separately
> > > addressable problem.
> > >
> > > On 27 February 2017 at 14:18, Matthew Jacobs <mj...@cloudera.com> wrote:
> > >
> > >> Just catching up to this e-mail, though I had seen your code reviews
> > >> and I think this approach makes sense. An additional concern would be
> > >> how to identify how a toolchain package was built, and AFAIK this is
> > >> tricky now if only the 'toolchain ID' is known. Before I saw this
> > >> e-mail I was thinking about this problem (which I think we can address
> > >> separately), and that we might want to write the native-toolchain git
> > >> hash with every toolchain build so that the exact build scripts are
> > >> associated with those build artifacts. I filed
> > >> https://issues.cloudera.org/browse/IMPALA-5002 for this related
> > >> problem.
> > >>
> > >> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson <he...@apache.org>
> > >> wrote:
> > >> > As written, the toolchain can't apparently deal with the
> > >> > possibility of build flags changing, but a dependency version
> > >> > remaining the same.
> > >> >
> > >> > LZ4 has never (afaict) been built with optimization enabled. I have
> > >> > a commit that enables -O3, but that continues to produce artifacts
> > >> > for lz4-1.7.5 with no version change. This is a problem because
> > >> > bootstrapping the toolchain will fail to pick up the new binaries -
> > >> > because the previously downloaded version is still in the local
> > >> > cache, and won't be overwritten because of the version change.
> > >> >
> > >> > I think the simplest way to fix this is to write the toolchain
> > >> > build ID to the dependency version file (that's in the local cache
> > >> > only) when it's downloaded. If that ID changes, the dependency will
> > >> > be re-downloaded.
> > >> >
> > >> > This has the disadvantage that any bump in
> > >> > IMPALA_TOOLCHAIN_BUILD_ID will invalidate all dependencies, and
> > >> > bin/bootstrap_toolchain.py will re-download all of them. My feeling
> > >> > is that that cost is better than trying to individually determine
> > >> > whether a dependency has changed between toolchain builds.
> > >> >
> > >> > Any thoughts on whether this is the right way to go?
> > >> >
> > >> > Henry
> > >>
> > >
> > >
> > >
> > > --
> > > Henry Robinson
> > > Software Engineer
> > > Cloudera
> > > 415-994-6679 <(415)%20994-6679>
> > >
> >
> >
> >
> > --
> > Henry Robinson
> > Software Engineer
> > Cloudera
> > 415-994-6679
> >
>



-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: Toolchain - versioning dependencies with the same version number

Posted by Tim Armstrong <ta...@cloudera.com>.
Maybe I'm misunderstanding, but wouldn't that force re-downloading of the
entire toolchain every time a developer switches between branches with
different build IDs?

I know some developers do that frequently, e.g. to try and reproduce bugs
on older versions or backport patches.

I agree it would be good to fix this, since I've run into this problem
before; I'm just not quite sure what the best solution is. In the other
case, where I had this issue with LLVM, I changed the version number (by
appending noasserts- to it), but that's really just a hack.

-Tim

On Mon, Feb 27, 2017 at 4:35 PM, Henry Robinson <he...@cloudera.com> wrote:

> As Matt said, I have a patch that implements build ID-based versioning at
> https://gerrit.cloudera.org/#/c/6166/2.
>
> Does anyone want to take a look? If we could get this in soon it would help
> smooth over the LZ4 change which is going in shortly.
>
> On 27 February 2017 at 14:21, Henry Robinson <he...@cloudera.com> wrote:
>
> > I agree that that might be useful, and that it's a separately addressable
> > problem.
> >
> > On 27 February 2017 at 14:18, Matthew Jacobs <mj...@cloudera.com> wrote:
> >
> >> Just catching up to this e-mail, though I had seen your code reviews
> >> and I think this approach makes sense. An additional concern would be
> >> how to identify how a toolchain package was built, and AFAIK this is
> >> tricky now if only the 'toolchain ID' is known. Before I saw this
> >> e-mail I was thinking about this problem (which I think we can address
> >> separately), and that we might want to write the native-toolchain git
> >> hash with every toolchain build so that the exact build scripts are
> >> associated with those build artifacts. I filed
> >> https://issues.cloudera.org/browse/IMPALA-5002 for this related
> >> problem.
> >>
> >> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson <he...@apache.org>
> >> wrote:
> >> > As written, the toolchain can't apparently deal with the possibility
> >> > of build flags changing, but a dependency version remaining the same.
> >> >
> >> > LZ4 has never (afaict) been built with optimization enabled. I have a
> >> > commit that enables -O3, but that continues to produce artifacts for
> >> > lz4-1.7.5 with no version change. This is a problem because
> >> > bootstrapping the toolchain will fail to pick up the new binaries -
> >> > because the previously downloaded version is still in the local
> >> > cache, and won't be overwritten because of the version change.
> >> >
> >> > I think the simplest way to fix this is to write the toolchain build
> >> > ID to the dependency version file (that's in the local cache only)
> >> > when it's downloaded. If that ID changes, the dependency will be
> >> > re-downloaded.
> >> >
> >> > This has the disadvantage that any bump in IMPALA_TOOLCHAIN_BUILD_ID
> >> > will invalidate all dependencies, and bin/bootstrap_toolchain.py will
> >> > re-download all of them. My feeling is that that cost is better than
> >> > trying to individually determine whether a dependency has changed
> >> > between toolchain builds.
> >> >
> >> > Any thoughts on whether this is the right way to go?
> >> >
> >> > Henry
> >>
> >
> >
> >
> > --
> > Henry Robinson
> > Software Engineer
> > Cloudera
> > 415-994-6679
> >
>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
>

Re: Toolchain - versioning dependencies with the same version number

Posted by Henry Robinson <he...@cloudera.com>.
As Matt said, I have a patch that implements build ID-based versioning at
https://gerrit.cloudera.org/#/c/6166/2.

Does anyone want to take a look? If we could get this in soon it would help
smooth over the LZ4 change which is going in shortly.
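
For readers who haven't opened the review, here is a minimal sketch of
the general idea of build ID-based versioning. It is not the code from
the Gerrit change: the marker file name, its format and the function
names are all assumptions made for illustration.

```python
import os

# Hypothetical marker file kept next to each unpacked dependency in the
# local cache; the real file name used by the toolchain may differ.
MARKER_NAME = "toolchain_package_version.txt"


def _stamp(version, build_id):
    # The stamp combines the dependency version with the toolchain build ID,
    # so a rebuild with new flags but the same version still changes it.
    return "{0}-{1}".format(version, build_id)


def needs_download(pkg_dir, version, build_id):
    """Return True if the cached package is missing or from another build."""
    marker = os.path.join(pkg_dir, MARKER_NAME)
    if not os.path.isfile(marker):
        return True
    with open(marker) as f:
        return f.read().strip() != _stamp(version, build_id)


def record_download(pkg_dir, version, build_id):
    """Write the stamp after a package has been downloaded and unpacked."""
    os.makedirs(pkg_dir, exist_ok=True)
    with open(os.path.join(pkg_dir, MARKER_NAME), "w") as f:
        f.write(_stamp(version, build_id) + "\n")
```

The trade-off is the one already discussed in this thread: any change
to the build ID makes needs_download() return True for every
dependency, whether or not its binaries actually changed.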

On 27 February 2017 at 14:21, Henry Robinson <he...@cloudera.com> wrote:

> I agree that that might be useful, and that it's a separately addressable
> problem.
>
> On 27 February 2017 at 14:18, Matthew Jacobs <mj...@cloudera.com> wrote:
>
>> Just catching up to this e-mail, though I had seen your code reviews
>> and I think this approach makes sense. An additional concern would be
>> how to identify how a toolchain package was built, and AFAIK this is
>> tricky now if only the 'toolchain ID' is known. Before I saw this
>> e-mail I was thinking about this problem (which I think we can address
>> separately), and that we might want to write the native-toolchain git
>> hash with every toolchain build so that the exact build scripts are
>> associated with those build artifacts. I filed
>> https://issues.cloudera.org/browse/IMPALA-5002 for this related
>> problem.
>>
>> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson <he...@apache.org>
>> wrote:
>> > As written, the toolchain can't apparently deal with the possibility of
>> > build flags changing, but a dependency version remaining the same.
>> >
>> > LZ4 has never (afaict) been built with optimization enabled. I have a
>> > commit that enables -O3, but that continues to produce artifacts for
>> > lz4-1.7.5 with no version change. This is a problem because
>> > bootstrapping the toolchain will fail to pick up the new binaries -
>> > because the previously downloaded version is still in the local cache,
>> > and won't be overwritten because of the version change.
>> >
>> > I think the simplest way to fix this is to write the toolchain build ID
>> > to the dependency version file (that's in the local cache only) when
>> > it's downloaded. If that ID changes, the dependency will be
>> > re-downloaded.
>> >
>> > This has the disadvantage that any bump in IMPALA_TOOLCHAIN_BUILD_ID
>> > will invalidate all dependencies, and bin/bootstrap_toolchain.py will
>> > re-download all of them. My feeling is that that cost is better than
>> > trying to individually determine whether a dependency has changed
>> > between toolchain builds.
>> >
>> > Any thoughts on whether this is the right way to go?
>> >
>> > Henry
>>
>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
>



-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: Toolchain - versioning dependencies with the same version number

Posted by Henry Robinson <he...@cloudera.com>.
I agree that that might be useful, and that it's a separately addressable
problem.

On 27 February 2017 at 14:18, Matthew Jacobs <mj...@cloudera.com> wrote:

> Just catching up to this e-mail, though I had seen your code reviews
> and I think this approach makes sense. An additional concern would be
> how to identify how a toolchain package was built, and AFAIK this is
> tricky now if only the 'toolchain ID' is known. Before I saw this
> e-mail I was thinking about this problem (which I think we can address
> separately), and that we might want to write the native-toolchain git
> hash with every toolchain build so that the exact build scripts are
> associated with those build artifacts. I filed
> https://issues.cloudera.org/browse/IMPALA-5002 for this related
> problem.
>
> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson <he...@apache.org> wrote:
> > As written, the toolchain can't apparently deal with the possibility of
> > build flags changing, but a dependency version remaining the same.
> >
> > LZ4 has never (afaict) been built with optimization enabled. I have a
> > commit that enables -O3, but that continues to produce artifacts for
> > lz4-1.7.5 with no version change. This is a problem because bootstrapping
> > the toolchain will fail to pick up the new binaries - because the
> > previously downloaded version is still in the local cache, and won't be
> > overwritten because of the version change.
> >
> > I think the simplest way to fix this is to write the toolchain build ID
> > to the dependency version file (that's in the local cache only) when
> > it's downloaded. If that ID changes, the dependency will be
> > re-downloaded.
> >
> > This has the disadvantage that any bump in IMPALA_TOOLCHAIN_BUILD_ID will
> > invalidate all dependencies, and bin/bootstrap_toolchain.py will
> > re-download all of them. My feeling is that that cost is better than
> > trying to individually determine whether a dependency has changed
> > between toolchain builds.
> >
> > Any thoughts on whether this is the right way to go?
> >
> > Henry
>



-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679

Re: Toolchain - versioning dependencies with the same version number

Posted by Matthew Jacobs <mj...@cloudera.com>.
Just catching up to this e-mail, though I had seen your code reviews
and I think this approach makes sense. An additional concern would be
how to identify how a toolchain package was built, and AFAIK this is
tricky now if only the 'toolchain ID' is known. Before I saw this
e-mail I was thinking about this problem (which I think we can address
separately), and that we might want to write the native-toolchain git
hash with every toolchain build so that the exact build scripts are
associated with those build artifacts. I filed
https://issues.cloudera.org/browse/IMPALA-5002 for this related
problem.
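
To make the git-hash suggestion concrete, here is a small sketch of how
a build step might record which native-toolchain commit produced a set
of artifacts. The file name BUILD_INFO.json, the JSON keys and both
function names are hypothetical; nothing like this is claimed to exist
in native-toolchain today.

```python
import json
import os
import subprocess


def current_git_hash(repo_dir):
    """Return the commit hash checked out in repo_dir (needs git on PATH)."""
    out = subprocess.check_output(["git", "rev-parse", "HEAD"], cwd=repo_dir)
    return out.decode("ascii").strip()


def write_build_info(artifact_dir, build_id, git_hash):
    """Store the build ID and git hash alongside the build artifacts."""
    os.makedirs(artifact_dir, exist_ok=True)
    path = os.path.join(artifact_dir, "BUILD_INFO.json")  # hypothetical name
    info = {
        "toolchain_build_id": build_id,
        "native_toolchain_git_hash": git_hash,
    }
    with open(path, "w") as f:
        json.dump(info, f, indent=2, sort_keys=True)
    return path
```

A build script could call write_build_info(out_dir, build_id,
current_git_hash(repo_dir)) once per toolchain build, so that a package
can later be traced back to the exact build scripts that produced it.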

On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson <he...@apache.org> wrote:
> As written, the toolchain can't apparently deal with the possibility of
> build flags changing, but a dependency version remaining the same.
>
> LZ4 has never (afaict) been built with optimization enabled. I have a
> commit that enables -O3, but that continues to produce artifacts for
> lz4-1.7.5 with no version change. This is a problem because bootstrapping
> the toolchain will fail to pick up the new binaries - because the
> previously downloaded version is still in the local cache, and won't be
> overwritten because of the version change.
>
> I think the simplest way to fix this is to write the toolchain build ID to
> the dependency version file (that's in the local cache only) when it's
> downloaded. If that ID changes, the dependency will be re-downloaded.
>
> This has the disadvantage that any bump in IMPALA_TOOLCHAIN_BUILD_ID will
> invalidate all dependencies, and bin/bootstrap_toolchain.py will
> re-download all of them. My feeling is that that cost is better than trying
> to individually determine whether a dependency has changed between
> toolchain builds.
>
> Any thoughts on whether this is the right way to go?
>
> Henry