Posted to dev@bigtop.apache.org by Konstantin Boudnik <co...@apache.org> on 2015/09/10 03:18:49 UTC

Issues with Kite build

I have been running the full build to validate the DSL patch and have noticed
that Kite downloads an enormous amount of Apache stuff from Cloudera's repo.
While this is bad by itself, as we have no clue what's in there, I don't understand
why we have to bring things like httpcomponents, ant, etc. from a 3rd-party
repo server. That seems quite bad to me.

Also, looking at it I see that the build creates things like
    [INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0

which creates a bad impression that Apache Bigtop is providing a commercial
vendor's binaries. Can anyone who has knowledge of this component
address these issues somehow?

Thank you very much!
  Cos
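
One way to quantify the concern above is to tally which hosts a build actually
downloads from. The following is a minimal sketch, assuming the Maven log
contains lines of the form "Downloading: <url>" or "Downloading from <id>: <url>";
the function name, the sample log, and the hosts in it are made up for
illustration and are not taken from the actual Kite build.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed log formats: "Downloading: <url>" and "Downloading from <id>: <url>".
DOWNLOAD_LINE = re.compile(r"Downloading(?: from [^:\s]+)?:\s+(\S+)")


def count_download_hosts(log_text):
    """Return a Counter mapping repository host -> number of downloads."""
    hosts = Counter()
    for line in log_text.splitlines():
        match = DOWNLOAD_LINE.search(line)
        if match:
            host = urlparse(match.group(1)).netloc
            if host:
                hosts[host] += 1
    return hosts


# Made-up log excerpt; hosts, paths and versions are placeholders.
sample_log = """\
[INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0
Downloading: https://vendor-repo.example.com/repo/org/apache/ant/ant/1.0/ant-1.0.pom
Downloading: https://vendor-repo.example.com/repo/org/apache/httpcomponents/httpcore/1.0/httpcore-1.0.pom
Downloading from central: https://repo.maven.apache.org/maven2/junit/junit/1.0/junit-1.0.pom
"""

# Print hosts from most to least used.
for host, count in count_download_hosts(sample_log).most_common():
    print(f"{count:4d}  {host}")
```

Running this over a full build log (saved with something like "mvn ... | tee
build.log") would show at a glance how much comes from each repository.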


Re: Issues with Kite build

Posted by Konstantin Boudnik <co...@apache.org>.
On Fri, Sep 11, 2015 at 10:37AM, Ryan Blue wrote:
> Cos,
> 
> I wasn't aware that all of the dependencies are being pulled from
> the Cloudera repos and you're right that it seems strange. What's
> causing that to happen? If it's something on the Kite side, let us
> know how we can fix it.

I haven't looked into the build code, but I presume that repo.cloudera.org (if I
am not mistaken about the hostname) might be hardcoded within the pom.xml or
something of the sort. Honestly, this is the first time I've seen Kite being
built, so I am no expert ;)

Thanks for your help!
  Cos
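
To check the guess above, one could list the repositories a pom.xml declares.
This is only a sketch: the helper name and the sample POM are hypothetical, and
it inspects a single file, so repositories inherited from a parent POM or
configured in settings.xml would not show up.

```python
import xml.etree.ElementTree as ET

# Namespace used by Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def declared_repositories(pom_xml):
    """Return (id, url) pairs for repositories declared in one POM document."""
    root = ET.fromstring(pom_xml)
    found = []
    # ".//" also picks up repositories declared inside <profiles>.
    for path in (".//m:repositories/m:repository",
                 ".//m:pluginRepositories/m:pluginRepository"):
        for repo in root.findall(path, POM_NS):
            repo_id = repo.findtext("m:id", default="", namespaces=POM_NS)
            url = repo.findtext("m:url", default="", namespaces=POM_NS)
            found.append((repo_id.strip(), url.strip()))
    return found


# Hypothetical POM fragment; the repository id and URL are placeholders.
sample_pom = """\
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <repositories>
    <repository>
      <id>vendor-repo</id>
      <url>https://vendor-repo.example.com/repo/</url>
    </repository>
  </repositories>
</project>
"""

for repo_id, url in declared_repositories(sample_pom):
    print(f"{repo_id}: {url}")
```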

> Yes, we do publish to maven central. It sounds like the problem is
> that somehow the Cloudera repo is being used instead of maven
> central.
> 
> rb
> 
> On 09/11/2015 09:31 AM, Mark Grover wrote:
> >Adding Ryan back
> >
> >Thanks Ryan!
> >
> >And, thanks Cos. Yeah, that makes total sense. I think that's a
> >reasonable goal, it's likely not something we can achieve overnight but
> >we can march towards that steadily. In particular, that involves two things:
> >1) Selectively building stuff that bigtop cares about.
> >2) Encouraging projects like Kite to publish their jars in maven
> >central, etc. (Ryan do you have any plans to do that? Or, is that the
> >case already?)
> >
> >I'll create a JIRA and add some details there, feel free to add if I
> >missed something.
> >
> >Mark
> >
> >On Thu, Sep 10, 2015 at 4:02 PM, Konstantin Boudnik <cos@apache.org
> ><ma...@apache.org>> wrote:
> >
> >    Thanks for the explanation Ryan! Certainly excluding the non-Apache
> >    specific
> >    modules make sense and needs to be done.
> >
> >    The other issue here, is that _all_ dependencies, including those
> >    that Hadoop
> >    and other components depends on, are pulled out of Cloudera repo.
> >    That's the biggest one in my opinion. While I am not suspecting Cloudera
> >    will be putting anything malicious into httpcomponents I, as a RM
> >    and a PMC
> >    member of this project, don't feel right gpg-signing packages
> >    without knowing
> >    what some of the jars contain. So my main concern is that if we
> >    supply binary
> >    packages to our users we should be sure that we are using either
> >      - official public repos like mavencentral, that contains the jars
> >    deployed by
> >        the official development teams of those components; or
> >      - ASF Infra repos where all the artifacts are controlled and a
> >    responsibility
> >        of a particular project's PMC
> >
> >    Does it make sense?
> >       Cos
> 
> 
> -- 
> Ryan Blue

Re: Issues with Kite build

Posted by Konstantin Boudnik <co...@apache.org>.
Gentlemen, 

is there any progress with this issue? We are approaching release 1.1 and I
would be very hesitant to keep a component with 3rd party binaries as a part
of it. Will it get fixed on the Kite side, or will we have to patch the source
code as a workaround?

Thanks for your help!
  Cos

On Fri, Sep 11, 2015 at 10:37AM, Ryan Blue wrote:
> Cos,
> 
> I wasn't aware that all of the dependencies are being pulled from
> the Cloudera repos and you're right that it seems strange. What's
> causing that to happen? If it's something on the Kite side, let us
> know how we can fix it.
> 
> Mark,
> 
> Yes, we do publish to maven central. It sounds like the problem is
> that somehow the Cloudera repo is being used instead of maven
> central.
> 
> rb
> 
> On 09/11/2015 09:31 AM, Mark Grover wrote:
> >Adding Ryan back
> >
> >Thanks Ryan!
> >
> >And, thanks Cos. Yeah, that makes total sense. I think that's a
> >reasonable goal, it's likely not something we can achieve overnight but
> >we can march towards that steadily. In particular, that involves two things:
> >1) Selectively building stuff that bigtop cares about.
> >2) Encouraging projects like Kite to publish their jars in maven
> >central, etc. (Ryan do you have any plans to do that? Or, is that the
> >case already?)
> >
> >I'll create a JIRA and add some details there, feel free to add if I
> >missed something.
> >
> >Mark
> >
> >On Thu, Sep 10, 2015 at 4:02 PM, Konstantin Boudnik <cos@apache.org
> ><ma...@apache.org>> wrote:
> >
> >    Thanks for the explanation Ryan! Certainly excluding the non-Apache
> >    specific
> >    modules make sense and needs to be done.
> >
> >    The other issue here, is that _all_ dependencies, including those
> >    that Hadoop
> >    and other components depends on, are pulled out of Cloudera repo.
> >    That's the biggest one in my opinion. While I am not suspecting Cloudera
> >    will be putting anything malicious into httpcomponents I, as a RM
> >    and a PMC
> >    member of this project, don't feel right gpg-signing packages
> >    without knowing
> >    what some of the jars contain. So my main concern is that if we
> >    supply binary
> >    packages to our users we should be sure that we are using either
> >      - official public repos like mavencentral, that contains the jars
> >    deployed by
> >        the official development teams of those components; or
> >      - ASF Infra repos where all the artifacts are controlled and a
> >    responsibility
> >        of a particular project's PMC
> >
> >    Does it make sense?
> >       Cos
> 
> 
> -- 
> Ryan Blue

Re: Issues with Kite build

Posted by Ryan Blue <bl...@apache.org>.
Cos,

I wasn't aware that all of the dependencies are being pulled from the 
Cloudera repos and you're right that it seems strange. What's causing 
that to happen? If it's something on the Kite side, let us know how we 
can fix it.

Mark,

Yes, we do publish to maven central. It sounds like the problem is that 
somehow the Cloudera repo is being used instead of maven central.

rb

On 09/11/2015 09:31 AM, Mark Grover wrote:
> Adding Ryan back
>
> Thanks Ryan!
>
> And, thanks Cos. Yeah, that makes total sense. I think that's a
> reasonable goal, it's likely not something we can achieve overnight but
> we can march towards that steadily. In particular, that involves two things:
> 1) Selectively building stuff that bigtop cares about.
> 2) Encouraging projects like Kite to publish their jars in maven
> central, etc. (Ryan do you have any plans to do that? Or, is that the
> case already?)
>
> I'll create a JIRA and add some details there, feel free to add if I
> missed something.
>
> Mark
>
> On Thu, Sep 10, 2015 at 4:02 PM, Konstantin Boudnik <cos@apache.org
> <ma...@apache.org>> wrote:
>
>     Thanks for the explanation Ryan! Certainly excluding the non-Apache
>     specific
>     modules make sense and needs to be done.
>
>     The other issue here, is that _all_ dependencies, including those
>     that Hadoop
>     and other components depends on, are pulled out of Cloudera repo.
>     That's the biggest one in my opinion. While I am not suspecting Cloudera
>     will be putting anything malicious into httpcomponents I, as a RM
>     and a PMC
>     member of this project, don't feel right gpg-signing packages
>     without knowing
>     what some of the jars contain. So my main concern is that if we
>     supply binary
>     packages to our users we should be sure that we are using either
>       - official public repos like mavencentral, that contains the jars
>     deployed by
>         the official development teams of those components; or
>       - ASF Infra repos where all the artifacts are controlled and a
>     responsibility
>         of a particular project's PMC
>
>     Does it make sense?
>        Cos


-- 
Ryan Blue

Re: Issues with Kite build

Posted by Mark Grover <ma...@apache.org>.
Adding Ryan back

Thanks Ryan!

And thanks, Cos. Yeah, that makes total sense. I think that's a reasonable
goal; it's likely not something we can achieve overnight, but we can march
towards it steadily. In particular, that involves two things:
1) Selectively building the stuff that Bigtop cares about.
2) Encouraging projects like Kite to publish their jars in Maven Central,
etc. (Ryan, do you have any plans to do that? Or is that the case already?)

I'll create a JIRA and add some details there; feel free to add to it if I
missed something.

Mark

On Thu, Sep 10, 2015 at 4:02 PM, Konstantin Boudnik <co...@apache.org> wrote:

> Thanks for the explanation Ryan! Certainly excluding the non-Apache
> specific
> modules make sense and needs to be done.
>
> The other issue here, is that _all_ dependencies, including those that
> Hadoop
> and other components depends on, are pulled out of Cloudera repo.
> That's the biggest one in my opinion. While I am not suspecting Cloudera
> will be putting anything malicious into httpcomponents I, as a RM and a PMC
> member of this project, don't feel right gpg-signing packages without
> knowing
> what some of the jars contain. So my main concern is that if we supply
> binary
> packages to our users we should be sure that we are using either
>  - official public repos like mavencentral, that contains the jars
> deployed by
>    the official development teams of those components; or
>  - ASF Infra repos where all the artifacts are controlled and a
> responsibility
>    of a particular project's PMC
>
> Does it make sense?
>   Cos
>
> On Thu, Sep 10, 2015 at 02:15PM, Ryan Blue wrote:
> > Hi everyone,
> >
> > Sorry about the confusion here, hopefully a little more info about
> > the Kite project will help. Kite is intended to work across any
> > Hadoop distribution and we've structured the libraries to depend on
> > the upstream Apache versions by default. But, we also want CI to
> > tell us if anything breaks downstream, so we allow vendor-specific
> > parts. That's why we have dependency aggregators named "default"
> > (upstream Apache), "cdh5", etc.
> >
> > You can also see this at work in the kite-tools module, where we
> > have a generic runtime that tries to construct the right classpath
> > to run on any distribution (we've seen people running it on MapR and
> > HDP). The likely culprit in this situation, though, is our
> > kite-tools-cdh5 module that bundles CDH5 dependencies so people can
> > use it their local machine that doesn't have Hadoop installed.
> >
> > I sympathize with the view that we shouldn't depend on proprietary
> > infra, but I think we have good reasons for catching bugs early
> > (testing against vendors) and making the CLI run on non-Hadoop
> > machines.
> >
> > To avoid this issue, I suggest excluding the vendor-specific modules
> > from the build. That should be easy to do by using the -pl and -am
> > maven command options. The -pl option allows you to supply a list of
> > the modules to build and -am ensures the dependency modules are
> > present. By running with "-pl kite-tools-runtime -am" you should be
> > able to avoid hitting vendor repos.
> >
> > If we need upstream changes, we can make that happen too. I hope that
> helps!
> >
> > rb
> >
> > (By the way, I'm not subscribed to dev@bigtop so cc me to keep me in
> > the discussion.)
> >
> > On 09/10/2015 01:44 PM, Mark Grover wrote:
> > >Hi all,
> > >I am not quite sure I completely understand the issue being discussed
> > >here. Is the issue that there are some CDH5 dependencies being bundled
> > >in Kite build? If so, I am adding Ryan Blue one of the contributors to
> > >Kite to share some thoughts on it.
> > >
> > >If not, please let me know how I can help.
> > >
> > >Here are my thoughts on a few other things being discussed:
> > >>Are you saying these dependencies are
> > >'compile-time' only?
> > >Actually, they are not. Many projects, Apache Flume, for example, use
> > >Kite and those jars likely end up in the packages.
> > >
> > > >even hue is downloading cloudera snapshots (sic!) from maven to
> > >compile against. IIRC these binaries are not bundled with "our"
> packaging.
> > >I think you are correct here. However, we also ship LinkedIn's DataFu,
> > >(Yahoo's) YCSB, Amplab's Tachyon. I don't think it's productive use of
> > >anyone's time to go searching for where all these dependencies come
> > >from. I personally like to think from a Bigtop community perspective.
> > >What tools does an average Bigtop user want to use from the Hadoop
> > >ecosystem? And, if someone in the Bigtop community is willing to
> > >contribute that tool to the project, that's great! (and the license
> > >being compatible/ ASL v2).
> > >
> > >Mark
> > >
> > >
> > >On Thu, Sep 10, 2015 at 1:02 PM, Konstantin Boudnik <cos@apache.org
> > ><ma...@apache.org>> wrote:
> > >
> > >    On Thu, Sep 10, 2015 at 01:20PM, RJ Nowling wrote:
> > >    > I think that was the second part of my statement.  :)  I don't
> see that
> > >    > happening since neither Hue nor Kite are Apache projects and have
> no
> > >    > incentive to distance themselves from particular vendors.
> > >
> > >    It isn't so much of distancing away from anything. It's more like a
> > >    common
> > >    sense of not using a proprietary infra (it's an implementation
> > >    detail, as you
> > >    should be able to plug this in via settings.xml file in your private
> > >    environment) for commonly available artifacts. If I can put it
> > >    bluntly - learn
> > >    your tools before opening your stuff to others ;)
> > >
> > >    Cos
> > >
> > >     > On Thu, Sep 10, 2015 at 1:18 PM, Konstantin Boudnik
> > >    <cos@apache.org <ma...@apache.org>> wrote:
> > >     >
> > >     > > Or trying to convince the upstream projects to stop using their
> > >    mirrors for
> > >     > > what they call "open source" projects?
> > >     > >
> > >     > > Cos
> > >     > >
> > >     > > On Thu, Sep 10, 2015 at 12:05PM, RJ Nowling wrote:
> > >     > > > I don't know how we'd get around it without patching the
> > >    upstreams'
> > >     > > > dependencies or convincing the upstream projects to use the
> > >    Apache repos
> > >     > > >
> > >     > > > On Thu, Sep 10, 2015 at 11:46 AM, Konstantin Boudnik
> > >    <cos@apache.org <ma...@apache.org>>
> > >     > > wrote:
> > >     > > >
> > >     > > > > On Thu, Sep 10, 2015 at 11:06AM, Olaf Flebbe wrote:
> > >     > > > >
> > >     > > > > > even hue is downloading cloudera snapshots (sic!) from
> > >    maven to
> > >     > > compile
> > >     > > > > > against. IIRC these binaries are not bundled with "our"
> > >    packaging.
> > >     > > > >
> > >     > > > > Yeah, Hue is another of those. Are you saying these
> > >    dependencies are
> > >     > > > > 'compile-time' only?  If we aren't bundling anything of
> > >    these 3rd party
> > >     > > > > binaries to our convenience packages - I am ok. Still feels
> > >    icky though
> > >     > > > >
> > >     > > > > Thanks.
> > >     > > > >   Cos
> > >     > > > >
> > >     > > > > > > Am 10.09.2015 um 03:18 schrieb Konstantin Boudnik
> > >    <cos@apache.org <ma...@apache.org>
> > >     > > >:
> > >     > > > > > >
> > >     > > > > > > I have been running the full build to validate the DSL
> > >    patch and
> > >     > > have
> > >     > > > > noticed
> > >     > > > > > > that kite downloads an enormous amount of Apache stuff
> from
> > >     > > Cloudera's
> > >     > > > > repo.
> > >     > > > > > > While is bad by itself, as we have no clue what's in
> > >    there, I don't
> > >     > > > > understand
> > >     > > > > > > why we have to bring things like httpcomponents, ant,
> > >    etc from a
> > >     > > 3rd
> > >     > > > > party
> > >     > > > > > > repo-server. That's seems quite bad to me.
> > >     > > > > > >
> > >     > > > > > > Also, looking at it I see that the build creates things
> > >    like
> > >     > > > > > >    [INFO] Building Kite Hadoop CDH5 Dependencies Module
> > >    1.1.0
> > >     > > > > > >
> > >     > > > > > > which creates a bad impression that Apache Bigtop is
> > >    providing a
> > >     > > > > commercial's
> > >     > > > > > > vendors binaries. Can anyone who has the knowledge
> > >    about this
> > >     > > component
> > >     > > > > > > address these issues somehow?
> > >     > > > > > >
> > >     > > > > > > Thank you very much!
> > >     > > > > > >  Cos
> > >     > > > > > >
> > >     > > > > >
> > >     > > > >
> > >     > > > >
> > >     > > > >
> > >     > >
> > >
> > >
> >
> >
> > --
> > Ryan Blue
>

Re: Issues with Kite build

Posted by Konstantin Boudnik <co...@apache.org>.
Thanks for the explanation, Ryan! Certainly, excluding the non-Apache specific
modules makes sense and needs to be done.

The other issue here is that _all_ dependencies, including those that Hadoop
and other components depend on, are pulled from the Cloudera repo.
That's the biggest one in my opinion. While I don't suspect that Cloudera
would put anything malicious into httpcomponents, I, as an RM and a PMC
member of this project, don't feel right gpg-signing packages without knowing
what some of the jars contain. So my main concern is that if we supply binary
packages to our users, we should be sure that we are using either
 - official public repos like Maven Central, which contain the jars deployed by
   the official development teams of those components; or
 - ASF Infra repos, where all the artifacts are controlled by, and are the
   responsibility of, a particular project's PMC

Does it make sense?
  Cos

On Thu, Sep 10, 2015 at 02:15PM, Ryan Blue wrote:
> Hi everyone,
> 
> Sorry about the confusion here, hopefully a little more info about
> the Kite project will help. Kite is intended to work across any
> Hadoop distribution and we've structured the libraries to depend on
> the upstream Apache versions by default. But, we also want CI to
> tell us if anything breaks downstream, so we allow vendor-specific
> parts. That's why we have dependency aggregators named "default"
> (upstream Apache), "cdh5", etc.
> 
> You can also see this at work in the kite-tools module, where we
> have a generic runtime that tries to construct the right classpath
> to run on any distribution (we've seen people running it on MapR and
> HDP). The likely culprit in this situation, though, is our
> kite-tools-cdh5 module that bundles CDH5 dependencies so people can
> use it their local machine that doesn't have Hadoop installed.
> 
> I sympathize with the view that we shouldn't depend on proprietary
> infra, but I think we have good reasons for catching bugs early
> (testing against vendors) and making the CLI run on non-Hadoop
> machines.
> 
> To avoid this issue, I suggest excluding the vendor-specific modules
> from the build. That should be easy to do by using the -pl and -am
> maven command options. The -pl option allows you to supply a list of
> the modules to build and -am ensures the dependency modules are
> present. By running with "-pl kite-tools-runtime -am" you should be
> able to avoid hitting vendor repos.
> 
> If we need upstream changes, we can make that happen too. I hope that helps!
> 
> rb
> 
> (By the way, I'm not subscribed to dev@bigtop so cc me to keep me in
> the discussion.)
> 
> On 09/10/2015 01:44 PM, Mark Grover wrote:
> >Hi all,
> >I am not quite sure I completely understand the issue being discussed
> >here. Is the issue that there are some CDH5 dependencies being bundled
> >in Kite build? If so, I am adding Ryan Blue one of the contributors to
> >Kite to share some thoughts on it.
> >
> >If not, please let me know how I can help.
> >
> >Here are my thoughts on a few other things being discussed:
> >>Are you saying these dependencies are
> >'compile-time' only?
> >Actually, they are not. Many projects, Apache Flume, for example, use
> >Kite and those jars likely end up in the packages.
> >
> > >even hue is downloading cloudera snapshots (sic!) from maven to
> >compile against. IIRC these binaries are not bundled with "our" packaging.
> >I think you are correct here. However, we also ship LinkedIn's DataFu,
> >(Yahoo's) YCSB, Amplab's Tachyon. I don't think it's productive use of
> >anyone's time to go searching for where all these dependencies come
> >from. I personally like to think from a Bigtop community perspective.
> >What tools does an average Bigtop user want to use from the Hadoop
> >ecosystem? And, if someone in the Bigtop community is willing to
> >contribute that tool to the project, that's great! (and the license
> >being compatible/ ASL v2).
> >
> >Mark
> >
> >
> >On Thu, Sep 10, 2015 at 1:02 PM, Konstantin Boudnik <cos@apache.org
> ><ma...@apache.org>> wrote:
> >
> >    On Thu, Sep 10, 2015 at 01:20PM, RJ Nowling wrote:
> >    > I think that was the second part of my statement.  :)  I don't see that
> >    > happening since neither Hue nor Kite are Apache projects and have no
> >    > incentive to distance themselves from particular vendors.
> >
> >    It isn't so much of distancing away from anything. It's more like a
> >    common
> >    sense of not using a proprietary infra (it's an implementation
> >    detail, as you
> >    should be able to plug this in via settings.xml file in your private
> >    environment) for commonly available artifacts. If I can put it
> >    bluntly - learn
> >    your tools before opening your stuff to others ;)
> >
> >    Cos
> >
> >     > On Thu, Sep 10, 2015 at 1:18 PM, Konstantin Boudnik
> >    <cos@apache.org <ma...@apache.org>> wrote:
> >     >
> >     > > Or trying to convince the upstream projects to stop using their
> >    mirrors for
> >     > > what they call "open source" projects?
> >     > >
> >     > > Cos
> >     > >
> >     > > On Thu, Sep 10, 2015 at 12:05PM, RJ Nowling wrote:
> >     > > > I don't know how we'd get around it without patching the
> >    upstreams'
> >     > > > dependencies or convincing the upstream projects to use the
> >    Apache repos
> >     > > >
> >     > > > On Thu, Sep 10, 2015 at 11:46 AM, Konstantin Boudnik
> >    <cos@apache.org <ma...@apache.org>>
> >     > > wrote:
> >     > > >
> >     > > > > On Thu, Sep 10, 2015 at 11:06AM, Olaf Flebbe wrote:
> >     > > > >
> >     > > > > > even hue is downloading cloudera snapshots (sic!) from
> >    maven to
> >     > > compile
> >     > > > > > against. IIRC these binaries are not bundled with "our"
> >    packaging.
> >     > > > >
> >     > > > > Yeah, Hue is another of those. Are you saying these
> >    dependencies are
> >     > > > > 'compile-time' only?  If we aren't bundling anything of
> >    these 3rd party
> >     > > > > binaries to our convenience packages - I am ok. Still feels
> >    icky though
> >     > > > >
> >     > > > > Thanks.
> >     > > > >   Cos
> >     > > > >
> >     > > > > > > Am 10.09.2015 um 03:18 schrieb Konstantin Boudnik
> >    <cos@apache.org <ma...@apache.org>
> >     > > >:
> >     > > > > > >
> >     > > > > > > I have been running the full build to validate the DSL
> >    patch and
> >     > > have
> >     > > > > noticed
> >     > > > > > > that kite downloads an enormous amount of Apache stuff from
> >     > > Cloudera's
> >     > > > > repo.
> >     > > > > > > While is bad by itself, as we have no clue what's in
> >    there, I don't
> >     > > > > understand
> >     > > > > > > why we have to bring things like httpcomponents, ant,
> >    etc from a
> >     > > 3rd
> >     > > > > party
> >     > > > > > > repo-server. That's seems quite bad to me.
> >     > > > > > >
> >     > > > > > > Also, looking at it I see that the build creates things
> >    like
> >     > > > > > >    [INFO] Building Kite Hadoop CDH5 Dependencies Module
> >    1.1.0
> >     > > > > > >
> >     > > > > > > which creates a bad impression that Apache Bigtop is
> >    providing a
> >     > > > > commercial's
> >     > > > > > > vendors binaries. Can anyone who has the knowledge
> >    about this
> >     > > component
> >     > > > > > > address these issues somehow?
> >     > > > > > >
> >     > > > > > > Thank you very much!
> >     > > > > > >  Cos
> >     > > > > > >
> >     > > > > >
> >     > > > >
> >     > > > >
> >     > > > >
> >     > >
> >
> >
> 
> 
> -- 
> Ryan Blue

Re: Issues with Kite build

Posted by Ryan Blue <bl...@apache.org>.
Hi everyone,

Sorry about the confusion here; hopefully a little more info about the
Kite project will help. Kite is intended to work across any Hadoop
distribution, and we've structured the libraries to depend on the
upstream Apache versions by default. But we also want CI to tell us if
anything breaks downstream, so we allow vendor-specific parts. That's
why we have dependency aggregators named "default" (upstream Apache),
"cdh5", etc.

You can also see this at work in the kite-tools module, where we have a
generic runtime that tries to construct the right classpath to run on
any distribution (we've seen people running it on MapR and HDP). The
likely culprit in this situation, though, is our kite-tools-cdh5 module,
which bundles CDH5 dependencies so people can use it on a local machine
that doesn't have Hadoop installed.

I sympathize with the view that we shouldn't depend on proprietary 
infra, but I think we have good reasons for catching bugs early (testing 
against vendors) and making the CLI run on non-Hadoop machines.

To avoid this issue, I suggest excluding the vendor-specific modules
from the build. That should be easy to do by using the -pl and -am
maven command options. The -pl option allows you to supply a list of the
modules to build, and -am (--also-make) ensures the modules they depend on
are built as well. By running with "-pl kite-tools-runtime -am" you should
be able to avoid hitting vendor repos.
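
The module-selection approach above can be sketched as a small helper that
assembles the command line. It uses Maven's documented -pl (--projects) and
-am (--also-make) options; the helper name and the "package" goal are
assumptions for illustration, and the module name is the one mentioned above.

```python
import shlex


def maven_module_build(modules, goals=("package",)):
    """Build an argv list for a Maven run limited to the given modules.

    -pl selects the listed modules; -am also builds the modules they need.
    The default goal "package" is an assumption, not something Kite mandates.
    """
    if not modules:
        raise ValueError("at least one module is required")
    return ["mvn", "-pl", ",".join(modules), "-am", *goals]


argv = maven_module_build(["kite-tools-runtime"])
print(shlex.join(argv))
# To actually run it from the source root: subprocess.run(argv, check=True)
```

Whether this avoids vendor repositories in practice depends on which modules
the selected ones pull in, so the resulting build log is still worth checking.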

If we need upstream changes, we can make that happen too. I hope that helps!

rb

(By the way, I'm not subscribed to dev@bigtop so cc me to keep me in the 
discussion.)

On 09/10/2015 01:44 PM, Mark Grover wrote:
> Hi all,
> I am not quite sure I completely understand the issue being discussed
> here. Is the issue that there are some CDH5 dependencies being bundled
> in Kite build? If so, I am adding Ryan Blue one of the contributors to
> Kite to share some thoughts on it.
>
> If not, please let me know how I can help.
>
> Here are my thoughts on a few other things being discussed:
>> Are you saying these dependencies are
> 'compile-time' only?
> Actually, they are not. Many projects, Apache Flume, for example, use
> Kite and those jars likely end up in the packages.
>
>  >even hue is downloading cloudera snapshots (sic!) from maven to
> compile against. IIRC these binaries are not bundled with "our" packaging.
> I think you are correct here. However, we also ship LinkedIn's DataFu,
> (Yahoo's) YCSB, Amplab's Tachyon. I don't think it's productive use of
> anyone's time to go searching for where all these dependencies come
> from. I personally like to think from a Bigtop community perspective.
> What tools does an average Bigtop user want to use from the Hadoop
> ecosystem? And, if someone in the Bigtop community is willing to
> contribute that tool to the project, that's great! (and the license
> being compatible/ ASL v2).
>
> Mark
>
>
> On Thu, Sep 10, 2015 at 1:02 PM, Konstantin Boudnik <cos@apache.org
> <ma...@apache.org>> wrote:
>
>     On Thu, Sep 10, 2015 at 01:20PM, RJ Nowling wrote:
>     > I think that was the second part of my statement.  :)  I don't see that
>     > happening since neither Hue nor Kite are Apache projects and have no
>     > incentive to distance themselves from particular vendors.
>
>     It isn't so much of distancing away from anything. It's more like a
>     common
>     sense of not using a proprietary infra (it's an implementation
>     detail, as you
>     should be able to plug this in via settings.xml file in your private
>     environment) for commonly available artifacts. If I can put it
>     bluntly - learn
>     your tools before opening your stuff to others ;)
>
>     Cos
>
>      > On Thu, Sep 10, 2015 at 1:18 PM, Konstantin Boudnik
>     <cos@apache.org <ma...@apache.org>> wrote:
>      >
>      > > Or trying to convince the upstream projects to stop using their
>     mirrors for
>      > > what they call "open source" projects?
>      > >
>      > > Cos
>      > >
>      > > On Thu, Sep 10, 2015 at 12:05PM, RJ Nowling wrote:
>      > > > I don't know how we'd get around it without patching the
>     upstreams'
>      > > > dependencies or convincing the upstream projects to use the
>     Apache repos
>      > > >
>      > > > On Thu, Sep 10, 2015 at 11:46 AM, Konstantin Boudnik
>     <cos@apache.org <ma...@apache.org>>
>      > > wrote:
>      > > >
>      > > > > On Thu, Sep 10, 2015 at 11:06AM, Olaf Flebbe wrote:
>      > > > >
>      > > > > > even hue is downloading cloudera snapshots (sic!) from
>     maven to
>      > > compile
>      > > > > > against. IIRC these binaries are not bundled with "our"
>     packaging.
>      > > > >
>      > > > > Yeah, Hue is another of those. Are you saying these
>     dependencies are
>      > > > > 'compile-time' only?  If we aren't bundling anything of
>     these 3rd party
>      > > > > binaries to our convenience packages - I am ok. Still feels
>     icky though
>      > > > >
>      > > > > Thanks.
>      > > > >   Cos
>      > > > >
>      > > > > > > Am 10.09.2015 um 03:18 schrieb Konstantin Boudnik
>     <cos@apache.org <ma...@apache.org>
>      > > >:
>      > > > > > >
>      > > > > > > I have been running the full build to validate the DSL
>     patch and
>      > > have
>      > > > > noticed
>      > > > > > > that kite downloads an enormous amount of Apache stuff from
>      > > Cloudera's
>      > > > > repo.
>      > > > > > > While is bad by itself, as we have no clue what's in
>     there, I don't
>      > > > > understand
>      > > > > > > why we have to bring things like httpcomponents, ant,
>     etc from a
>      > > 3rd
>      > > > > party
>      > > > > > > repo-server. That's seems quite bad to me.
>      > > > > > >
>      > > > > > > Also, looking at it I see that the build creates things
>     like
>      > > > > > >    [INFO] Building Kite Hadoop CDH5 Dependencies Module
>     1.1.0
>      > > > > > >
>      > > > > > > which creates a bad impression that Apache Bigtop is
>     providing a
>      > > > > commercial's
>      > > > > > > vendors binaries. Can anyone who has the knowledge
>     about this
>      > > component
>      > > > > > > address these issues somehow?
>      > > > > > >
>      > > > > > > Thank you very much!
>      > > > > > >  Cos
>      > > > > > >
>      > > > > >
>      > > > >
>      > > > >
>      > > > >
>      > >
>
>


-- 
Ryan Blue

Re: Issues with Kite build

Posted by Mark Grover <ma...@apache.org>.
Hi all,
I am not quite sure I completely understand the issue being discussed here.
Is the issue that there are some CDH5 dependencies being bundled in the Kite
build? If so, I am adding Ryan Blue, one of the contributors to Kite, to
share some thoughts on it.

If not, please let me know how I can help.

Here are my thoughts on a few other things being discussed:
> Are you saying these dependencies are 'compile-time' only?
Actually, they are not. Many projects, Apache Flume, for example, use Kite
and those jars likely end up in the packages.

>even hue is downloading cloudera snapshots (sic!) from maven to compile
against. IIRC these binaries are not bundled with "our" packaging.
I think you are correct here. However, we also ship LinkedIn's DataFu,
(Yahoo's) YCSB, Amplab's Tachyon. I don't think it's a productive use of
anyone's time to go searching for where all these dependencies come from. I
personally like to think from a Bigtop community perspective. What tools
does an average Bigtop user want to use from the Hadoop ecosystem? And, if
someone in the Bigtop community is willing to contribute that tool to the
project, that's great (provided the license is compatible with ASL v2)!

Mark


On Thu, Sep 10, 2015 at 1:02 PM, Konstantin Boudnik <co...@apache.org> wrote:

> On Thu, Sep 10, 2015 at 01:20PM, RJ Nowling wrote:
> > I think that was the second part of my statement.  :)  I don't see that
> > happening since neither Hue nor Kite are Apache projects and have no
> > incentive to distance themselves from particular vendors.
>
> It isn't so much of distancing away from anything. It's more like a common
> sense of not using a proprietary infra (it's an implementation detail, as you
> should be able to plug this in via settings.xml file in your private
> environment) for commonly available artifacts. If I can put it bluntly - learn
> your tools before opening your stuff to others ;)
>
> Cos
>
> > On Thu, Sep 10, 2015 at 1:18 PM, Konstantin Boudnik <co...@apache.org> wrote:
> >
> > > Or trying to convince the upstream projects to stop using their mirrors for
> > > what they call "open source" projects?
> > >
> > > Cos
> > >
> > > On Thu, Sep 10, 2015 at 12:05PM, RJ Nowling wrote:
> > > > I don't know how we'd get around it without patching the upstreams'
> > > > dependencies or convincing the upstream projects to use the Apache repos
> > > >
> > > > On Thu, Sep 10, 2015 at 11:46 AM, Konstantin Boudnik <cos@apache.org> wrote:
> > > >
> > > > > On Thu, Sep 10, 2015 at 11:06AM, Olaf Flebbe wrote:
> > > > >
> > > > > > even hue is downloading cloudera snapshots (sic!) from maven to compile
> > > > > > against. IIRC these binaries are not bundled with "our" packaging.
> > > > >
> > > > > Yeah, Hue is another of those. Are you saying these dependencies are
> > > > > 'compile-time' only?  If we aren't bundling anything of these 3rd party
> > > > > binaries to our convenience packages - I am ok. Still feels icky though
> > > > >
> > > > > Thanks.
> > > > >   Cos
> > > > >
> > > > > > > On 10.09.2015 at 03:18, Konstantin Boudnik <cos@apache.org> wrote:
> > > > > > > >
> > > > > > > > I have been running the full build to validate the DSL patch and have noticed
> > > > > > > > that kite downloads an enormous amount of Apache stuff from Cloudera's repo.
> > > > > > > > While is bad by itself, as we have no clue what's in there, I don't understand
> > > > > > > > why we have to bring things like httpcomponents, ant, etc from a 3rd party
> > > > > > > > repo-server. That's seems quite bad to me.
> > > > > > > >
> > > > > > > > Also, looking at it I see that the build creates things like
> > > > > > > >    [INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0
> > > > > > > >
> > > > > > > > which creates a bad impression that Apache Bigtop is providing a commercial's
> > > > > > > > vendors binaries. Can anyone who has the knowledge about this component
> > > > > > > > address these issues somehow?
> > > > > > > >
> > > > > > > > Thank you very much!
> > > > > > > >  Cos
> > > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > >
>

Re: Issues with Kite build

Posted by Konstantin Boudnik <co...@apache.org>.
On Thu, Sep 10, 2015 at 01:20PM, RJ Nowling wrote:
> I think that was the second part of my statement.  :)  I don't see that
> happening since neither Hue nor Kite are Apache projects and have no
> incentive to distance themselves from particular vendors.

It isn't so much about distancing from anything. It's more a matter of common
sense: don't use proprietary infra for commonly available artifacts (that infra
is an implementation detail, which you should be able to plug in via the
settings.xml file in your private environment). If I can put it bluntly - learn
your tools before opening your stuff to others ;)

Cos
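
A minimal sketch of the settings.xml approach mentioned above, assuming a
standard Maven setup: the private repository is declared in a user-level
profile instead of in the project's pom.xml, so the published build only
resolves against default repositories. The profile id, repository id, and URL
below are hypothetical placeholders, not anything taken from the Kite build.

```xml
<!-- ~/.m2/settings.xml (illustrative only; ids and URL are assumptions) -->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <profiles>
    <profile>
      <!-- Hypothetical profile holding environment-specific repositories -->
      <id>private-infra</id>
      <repositories>
        <repository>
          <id>internal-repo</id>
          <url>https://repo.example.com/maven2</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <!-- Activate the profile for every build run by this user -->
  <activeProfiles>
    <activeProfile>private-infra</activeProfile>
  </activeProfiles>
</settings>
```

With something like this in place, a pom.xml would not need to name the
private server at all, and anyone without the profile would fall back to
Maven Central.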

> On Thu, Sep 10, 2015 at 1:18 PM, Konstantin Boudnik <co...@apache.org> wrote:
> 
> > Or trying to convince the upstream projects to stop using their mirrors for
> > what they call "open source" projects?
> >
> > Cos
> >
> > On Thu, Sep 10, 2015 at 12:05PM, RJ Nowling wrote:
> > > I don't know how we'd get around it without patching the upstreams'
> > > dependencies or convincing the upstream projects to use the Apache repos
> > >
> > > On Thu, Sep 10, 2015 at 11:46 AM, Konstantin Boudnik <co...@apache.org> wrote:
> > >
> > > > On Thu, Sep 10, 2015 at 11:06AM, Olaf Flebbe wrote:
> > > >
> > > > > even hue is downloading cloudera snapshots (sic!) from maven to compile
> > > > > against. IIRC these binaries are not bundled with "our" packaging.
> > > >
> > > > Yeah, Hue is another of those. Are you saying these dependencies are
> > > > 'compile-time' only?  If we aren't bundling anything of these 3rd party
> > > > binaries to our convenience packages - I am ok. Still feels icky though
> > > >
> > > > Thanks.
> > > >   Cos
> > > >
> > > > > > On 10.09.2015 at 03:18, Konstantin Boudnik <cos@apache.org> wrote:
> > > > > > >
> > > > > > > I have been running the full build to validate the DSL patch and have noticed
> > > > > > > that kite downloads an enormous amount of Apache stuff from Cloudera's repo.
> > > > > > > While is bad by itself, as we have no clue what's in there, I don't understand
> > > > > > > why we have to bring things like httpcomponents, ant, etc from a 3rd party
> > > > > > > repo-server. That's seems quite bad to me.
> > > > > > >
> > > > > > > Also, looking at it I see that the build creates things like
> > > > > > >    [INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0
> > > > > > >
> > > > > > > which creates a bad impression that Apache Bigtop is providing a commercial's
> > > > > > > vendors binaries. Can anyone who has the knowledge about this component
> > > > > > > address these issues somehow?
> > > > > >
> > > > > > Thank you very much!
> > > > > >  Cos
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> >

Re: Issues with Kite build

Posted by RJ Nowling <rn...@gmail.com>.
I think that was the second part of my statement.  :)  I don't see that
happening since neither Hue nor Kite is an Apache project, and neither has an
incentive to distance itself from particular vendors.

On Thu, Sep 10, 2015 at 1:18 PM, Konstantin Boudnik <co...@apache.org> wrote:

> Or trying to convince the upstream projects to stop using their mirrors for
> what they call "open source" projects?
>
> Cos
>
> On Thu, Sep 10, 2015 at 12:05PM, RJ Nowling wrote:
> > I don't know how we'd get around it without patching the upstreams'
> > dependencies or convincing the upstream projects to use the Apache repos
> >
> > On Thu, Sep 10, 2015 at 11:46 AM, Konstantin Boudnik <co...@apache.org> wrote:
> >
> > > On Thu, Sep 10, 2015 at 11:06AM, Olaf Flebbe wrote:
> > >
> > > > even hue is downloading cloudera snapshots (sic!) from maven to compile
> > > > against. IIRC these binaries are not bundled with "our" packaging.
> > >
> > > Yeah, Hue is another of those. Are you saying these dependencies are
> > > 'compile-time' only?  If we aren't bundling anything of these 3rd party
> > > binaries to our convenience packages - I am ok. Still feels icky though
> > >
> > > Thanks.
> > >   Cos
> > >
> > > > > On 10.09.2015 at 03:18, Konstantin Boudnik <cos@apache.org> wrote:
> > > > > >
> > > > > > I have been running the full build to validate the DSL patch and have noticed
> > > > > > that kite downloads an enormous amount of Apache stuff from Cloudera's repo.
> > > > > > While is bad by itself, as we have no clue what's in there, I don't understand
> > > > > > why we have to bring things like httpcomponents, ant, etc from a 3rd party
> > > > > > repo-server. That's seems quite bad to me.
> > > > > >
> > > > > > Also, looking at it I see that the build creates things like
> > > > > >    [INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0
> > > > > >
> > > > > > which creates a bad impression that Apache Bigtop is providing a commercial's
> > > > > > vendors binaries. Can anyone who has the knowledge about this component
> > > > > > address these issues somehow?
> > > > >
> > > > > Thank you very much!
> > > > >  Cos
> > > > >
> > > >
> > >
> > >
> > >
>

Re: Issues with Kite build

Posted by Konstantin Boudnik <co...@apache.org>.
Or trying to convince the upstream projects to stop using their mirrors for
what they call "open source" projects?

Cos

On Thu, Sep 10, 2015 at 12:05PM, RJ Nowling wrote:
> I don't know how we'd get around it without patching the upstreams'
> dependencies or convincing the upstream projects to use the Apache repos
> 
> On Thu, Sep 10, 2015 at 11:46 AM, Konstantin Boudnik <co...@apache.org> wrote:
> 
> > On Thu, Sep 10, 2015 at 11:06AM, Olaf Flebbe wrote:
> >
> > > even hue is downloading cloudera snapshots (sic!) from maven to compile
> > > against. IIRC these binaries are not bundled with "our" packaging.
> >
> > Yeah, Hue is another of those. Are you saying these dependencies are
> > 'compile-time' only?  If we aren't bundling anything of these 3rd party
> > binaries to our convenience packages - I am ok. Still feels icky though
> >
> > Thanks.
> >   Cos
> >
> > > > On 10.09.2015 at 03:18, Konstantin Boudnik <co...@apache.org> wrote:
> > > >
> > > > I have been running the full build to validate the DSL patch and have noticed
> > > > that kite downloads an enormous amount of Apache stuff from Cloudera's repo.
> > > > While is bad by itself, as we have no clue what's in there, I don't understand
> > > > why we have to bring things like httpcomponents, ant, etc from a 3rd party
> > > > repo-server. That's seems quite bad to me.
> > > >
> > > > Also, looking at it I see that the build creates things like
> > > >    [INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0
> > > >
> > > > which creates a bad impression that Apache Bigtop is providing a commercial's
> > > > vendors binaries. Can anyone who has the knowledge about this component
> > > > address these issues somehow?
> > > >
> > > > Thank you very much!
> > > >  Cos
> > > >
> > >
> >
> >
> >

Re: Issues with Kite build

Posted by RJ Nowling <rn...@gmail.com>.
I don't know how we'd get around it without patching the upstreams'
dependencies or convincing the upstream projects to use the Apache repos.

On Thu, Sep 10, 2015 at 11:46 AM, Konstantin Boudnik <co...@apache.org> wrote:

> On Thu, Sep 10, 2015 at 11:06AM, Olaf Flebbe wrote:
>
> > even hue is downloading cloudera snapshots (sic!) from maven to compile
> > against. IIRC these binaries are not bundled with "our" packaging.
>
> Yeah, Hue is another of those. Are you saying these dependencies are
> 'compile-time' only?  If we aren't bundling anything of these 3rd party
> binaries to our convenience packages - I am ok. Still feels icky though
>
> Thanks.
>   Cos
>
> > > On 10.09.2015 at 03:18, Konstantin Boudnik <co...@apache.org> wrote:
> > >
> > > I have been running the full build to validate the DSL patch and have noticed
> > > that kite downloads an enormous amount of Apache stuff from Cloudera's repo.
> > > While is bad by itself, as we have no clue what's in there, I don't understand
> > > why we have to bring things like httpcomponents, ant, etc from a 3rd party
> > > repo-server. That's seems quite bad to me.
> > >
> > > Also, looking at it I see that the build creates things like
> > >    [INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0
> > >
> > > which creates a bad impression that Apache Bigtop is providing a commercial's
> > > vendors binaries. Can anyone who has the knowledge about this component
> > > address these issues somehow?
> > >
> > > Thank you very much!
> > >  Cos
> > >
> >
>
>
>

Re: Issues with Kite build

Posted by Konstantin Boudnik <co...@apache.org>.
On Thu, Sep 10, 2015 at 11:06AM, Olaf Flebbe wrote:

> even hue is downloading cloudera snapshots (sic!) from maven to compile
> against. IIRC these binaries are not bundled with "our" packaging.

Yeah, Hue is another of those. Are you saying these dependencies are
'compile-time' only?  If we aren't bundling any of these 3rd-party
binaries in our convenience packages, I am OK. Still feels icky, though.

Thanks.
  Cos

> > On 10.09.2015 at 03:18, Konstantin Boudnik <co...@apache.org> wrote:
> > 
> > I have been running the full build to validate the DSL patch and have noticed
> > that kite downloads an enormous amount of Apache stuff from Cloudera's repo.
> > While is bad by itself, as we have no clue what's in there, I don't understand
> > why we have to bring things like httpcomponents, ant, etc from a 3rd party
> > repo-server. That's seems quite bad to me.
> > 
> > Also, looking at it I see that the build creates things like
> >    [INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0
> > 
> > which creates a bad impression that Apache Bigtop is providing a commercial's
> > vendors binaries. Can anyone who has the knowledge about this component
> > address these issues somehow?
> > 
> > Thank you very much!
> >  Cos
> > 
> 



Re: Issues with Kite build

Posted by Olaf Flebbe <of...@oflebbe.de>.
Hi,

Even Hue is downloading Cloudera snapshots (sic!) from Maven to compile against. IIRC these binaries are not bundled with "our" packaging.

Olaf

> On 10.09.2015 at 03:18, Konstantin Boudnik <co...@apache.org> wrote:
> 
> I have been running the full build to validate the DSL patch and have noticed
> that kite downloads an enormous amount of Apache stuff from Cloudera's repo.
> While is bad by itself, as we have no clue what's in there, I don't understand
> why we have to bring things like httpcomponents, ant, etc from a 3rd party
> repo-server. That's seems quite bad to me.
> 
> Also, looking at it I see that the build creates things like
>    [INFO] Building Kite Hadoop CDH5 Dependencies Module 1.1.0
> 
> which creates a bad impression that Apache Bigtop is providing a commercial's
> vendors binaries. Can anyone who has the knowledge about this component
> address these issues somehow?
> 
> Thank you very much!
>  Cos
>