Posted to dev@kudu.apache.org by Mike Percy <mp...@apache.org> on 2018/07/02 19:24:18 UTC

Re: Binaries for embedded testing

I explored the "download binaries from Maven" approach for a while on
Friday. Here is what I found:

1) There is a Maven plugin that should be able to help us find matching
system binaries @ https://github.com/trustin/os-maven-plugin

The protobuf-maven-plugin uses this approach to download and run the
appropriate protoc binary for your architecture according to
https://www.xolstice.org/protobuf-maven-plugin/examples/protoc-artifact.html

2) Stripped binaries from release builds look small enough to be viable to
download to run integration tests via Maven in precommit builds, at least
in non-bandwidth-constrained environments:

$ strip kudu-master
$ strip kudu-tserver
$ ls -alh
total 85M
drwxrwxr-x 2 mpercy mpercy  45 Jul  2 12:05 .
drwxrwxr-x 3 mpercy mpercy  98 Jun 29 14:56 ..
-rwxrwxr-x 1 mpercy mpercy 45M Jul  2 12:05 kudu-master
-rwxrwxr-x 1 mpercy mpercy 41M Jul  2 12:05 kudu-tserver

3) Kudu binaries dynamically link against many system libraries, including
security-related ones as well as the C++ stdlib:

$ ldd kudu-tserver
        linux-vdso.so.1 =>  (0x00007ffe0c290000)
        libz.so.1 => /lib64/libz.so.1 (0x00007fde730d5000)
        libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007fde72eab000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fde72c8e000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007fde729a7000)
        libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007fde725bd000)
        libssl.so.10 => /lib64/libssl.so.10 (0x00007fde7234e000)
        libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007fde72131000)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007fde71ee3000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fde71cde000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fde71ad6000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fde717cd000)
        libm.so.6 => /lib64/libm.so.6 (0x00007fde714ca000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fde712b4000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fde70ef3000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fde732fe000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007fde70cc0000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007fde70abc000)
        libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007fde708ad000)
        libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007fde706a8000)
        libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fde7048e000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fde70257000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fde7002f000)
        libfreebl3.so => /lib64/libfreebl3.so (0x00007fde6fe2c000)
        libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fde6fbca000)

So it's not viable to simply have a linux-x86_64 binary and a darwin-x86_64
binary like protoc does, or even just ubuntu & redhat. We'll likely need a
separate binary for every major OS version, i.e. RHEL 6, RHEL 7, trusty,
xenial, bionic. I think people running non-LTS builds of Ubuntu, or SUSE or
something, would be out of luck.

One potential option would be to offer a completely static build that is
for testing only and with no intent to ever fix security vulnerabilities. I
would have two concerns about that, though: 1) someone could take those
binaries and run them for non-testing purposes, and 2) I'm not sure how
easy it would be to generate a fully static build, since I don't think the
distributions provide static libs for security components in order to
discourage people from doing this.

Mike


On Sat, Jun 30, 2018 at 4:31 AM Tim Robertson <ti...@gmail.com>
wrote:

> > What do you mean by that?
> Sorry, poor phrasing - currently the Beam project has the build path with
> unit tests (no Docker there) and the project IT environment which can use
> Docker.
> A binary only approach could potentially be managed without adding a
> dependency on Docker - but has other issues summarised below.
>
> > For Kudu-internal testing I think we could stick to running "kudu
> minicluster
> Yes.
>
> > ... external use cases, we could switch that to "docker run
> kudu:minicluster:1.7.0"
> I think this makes good sense.
>
>
> In summary:
>
> 1) Fake a Kudu master in Java - difficult unless simplified, not
> representative if simplified, code maintenance issue
> 2) Mocking the Kudu client - verbose unless only covering simple scenarios
> 3) Use mini cluster with binaries - portability challenge of binaries, need
> to script caching the binaries / use of some repository, unfamiliar build
> tasks with binary handling (unless built to work with something like
> maven), could possibly see linking problems
> 4) Docker - predictable, adds a dependency, existing Kudu images not
> "managed" at the moment
>
> For Beam I think I will put most effort into IT which can use Docker or an
> existing cluster and then mock a Java KuduClient for some basic sanity
> tests for the build path.
>
> On Docker:
> - to get current versions [e.g. 1] working I found I had to edit
> /etc/hosts. I think the mini cluster version with the FakeDNS might avoid
> that?
> - Kudu docs currently encourage the Cloudera Quickstart VM over Docker
> [2,3]
>
> Do you think the Kudu project could provide an image allowing "docker run
> kudu:minicluster:1.x.x" as part of the release cycle?
>
> Thanks again,
> Tim
>
> [1] https://github.com/MartinWeindel/kudu-docker
> [2] https://kudu.apache.org/docs/quickstart.html#quickstart_vm
> [3] https://github.com/cloudera/kudu-examples/wiki/Docker-based-tutorial
>
> On Sat, Jun 30, 2018 at 2:22 AM, Todd Lipcon <to...@cloudera.com.invalid>
> wrote:
>
> > On Fri, Jun 29, 2018 at 1:23 PM, Tim Robertson <
> timrobertson100@gmail.com>
> > wrote:
> >
> > > Thanks Mike, Todd - I greatly appreciate the inputs.
> > >
> > > > How many platforms would need to be supported for it to be viable for
> > > Beam?
> > > The minimal for it to be considered would probably(!) be ubuntu,
> centos,
> > > osx. Incidentally it was actually the protobuf approach that made me
> > > consider this.
> > >
> > > > What about depending on a docker container that runs the kudu
> > > minicluster in
> > > "host" networking mode?
> > > I've also pondered this a little but like Attila raises it puts a lot
> of
> > > burden for other project developers. Mmmm...
> > >
> >
> > What do you mean by that? For Kudu-internal testing I think we could
> stick
> > to running "kudu minicluster" as is. For external use cases, we could
> > switch that to "docker run kudu:minicluster:1.7.0" or whatever, and it
> > would auto-download from dockerhub as necessary, right?
> >
> >
> > >
> > > Ismaël (Beam PMC) has suggested I stick to mocking given the complexity
> > of
> > > the things I'm exploring.
> > >
> > > As another idea:
> > > I briefly pondered writing a "FakeKudu Java server" - data held in
> > memory,
> > > no partitioning, protobuf messaging, handling table metadata, checking
> > > schemas on write, predicate and projected columns for scan, faking
> > kerberos
> > > (if possible). It didn't seem particularly difficult to do but I fear a
> > > maintenance burden for a small audience.
> > >
> > >
> > Yea, I think that would be quite a maintenance burden, especially as new
> > features are added over time. I suppose in many cases you could omit
> things
> > or stub things out, but then the behavior will begin to differ and it
> won't
> > really be that clear that your tests actually are representative.
> >
> >
> > > Could utilities in Kudu that help folk test Java clients be of interest
> > to
> > > others? - e.g. preconfigured mock objects for various scenarios. If so,
> > I'd
> > > be happy to discuss options and offer PRs in Kudu.
> > >
> > > Thanks,
> > > Tim
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Fri, Jun 29, 2018 at 9:34 PM, Todd Lipcon <todd@cloudera.com.invalid
> >
> > > wrote:
> > >
> > > > On Fri, Jun 29, 2018 at 12:31 PM, Mike Percy <mp...@apache.org>
> > wrote:
> > > >
> > > > > This is something I've been thinking about and toying with and I'd
> > like
> > > > to
> > > > > see if we can't get binaries available via Maven for at least one
> > > > platform
> > > > > (say, RHEL 7). Similar to how protobuf does it.
> > > > >
> > > >
> > > > What about depending on a docker container that runs the kudu
> > minicluster
> > > > in "host" networking mode? eg https://github.com/
> > > MartinWeindel/kudu-docker
> > > > is one possibility
> > > >
> > > >
> > > > > How many platforms would need to be supported for it to be viable
> for
> > > > Beam?
> > > > >
> > > > > Thanks,
> > > > > Mike
> > > > >
> > > > > On Fri, Jun 29, 2018 at 10:01 AM Tim <ti...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Thanks Attila
> > > > > >
> > > > > > That’s great feedback and helpful for me to reference as
> guidance.
> > > > > >
> > > > > > By “Kudu installation” I was referring to the possibility that an
> > > > install
> > > > > > might set config etc, beyond just having the binary. I got it
> > running
> > > > on
> > > > > > CentOS similar to how you outline now.
> > > > > >
> > > > > > I too believe mocking makes most sense, especially as we have the
> > IT
> > > > > > running as well, but was asked to explore this further. It’s
> useful
> > > to
> > > > > know
> > > > > > you’d agree.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Tim
> > > > > >
> > > > > > > On 29 Jun 2018, at 17:33, Attila Bukor <ab...@cloudera.com>
> > > wrote:
> > > > > > >
> > > > > > > Hi Tim,
> > > > > > >
> > > > > > > I’m not sure what you mean by relying on actual installations.
> If
> > > you
> > > > > > have the kudu, kudu-master and kudu-tserver binaries at the same
> > > > location
> > > > > > and they can be executed, MiniKuduCluster can be used (“binDir”
> > > > property
> > > > > > should be set to the directory containing the Kudu binaries). You
> > > > should
> > > > > > also look into BaseKuduTest as that will set up the
> MiniKuduCluster
> > > for
> > > > > you
> > > > > > and you don’t have to do it manually.
> > > > > > >
> > > > > > > Extracting the Kudu binaries from an rpm should probably work,
> > but
> > > > that
> > > > > > binds you to CDH as currently Cloudera is the only one that ships
> > > Kudu
> > > > > > binaries and MacOS builds are not available anywhere afaik. Also,
> > > 1.4.0
> > > > > is
> > > > > > around a year old, you might want to use this repository instead
> > > (from
> > > > > CDH
> > > > > > 5.13 Kudu is part of the CDH):
> > > > > > http://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/5/
> > > > > RPMS/x86_64/kudu-1.7.0+cdh5.15.0+0-1.cdh5.15.0.p0.52.el7.x86_64.rpm
> > > > > > >
> > > > > > > As a general suggestion, I would recommend mocking Kudu for
> unit
> > > > tests
> > > > > > (that’s what a unit test is for after all) and create separate
> > > > > integration
> > > > > > tests that actually use Kudu that can be skipped where Kudu is
> not
> > > > > > available. Of course the CI should be set up to be able to
> provide
> > > all
> > > > > > necessary integrations for the tests, but a developer wouldn’t
> have
> > > to
> > > > > set
> > > > > > up Kudu, or use Docker to run the tests if their change doesn’t
> > > affect
> > > > > the
> > > > > > Kudu integration.
> > > > > > >
> > > > > > > Attila
> > > > > > >
> > > > > > >> On 2018. Jun 29., at 16:42, Tim Robertson <
> > > > timrobertson100@gmail.com>
> > > > > > wrote:
> > > > > > >>
> > > > > > >> Hi folks,
> > > > > > >>
> > > > > > >> I've written Java KuduIO for Apache Beam with integration
> tests
> > > > making
> > > > > > use
> > > > > > >> of Kudu in Docker.  It is yet to be committed on Apache Beam.
> > > > > > >>
> > > > > > >> Rather than mocking Kudu client for unit tests I'd like to
> > explore
> > > > use
> > > > > > of
> > > > > > >> the MiniKuduCluster which "Depends on precompiled kudu,
> > > kudu-master,
> > > > > and
> > > > > > >> kudu-tserver binaries".
> > > > > > >>
> > > > > > >> I'd need unit tests to run on the main linux distros and OS X.
> > > > > > >>
> > > > > > >> For the linux distros, would an approach where I extract the
> > > > binaries
> > > > > > from
> > > > > > >> the packages [1] work please? Or does the MiniKuduCluster rely
> > on
> > > > > actual
> > > > > > >> installations? I am pretty weak on C builds and linked
> libraries
> > > etc
> > > > > > (Java
> > > > > > >> guy, sorry).
> > > > > > >>
> > > > > > >> For CentOS I'm exploring this for example:
> > > > > > >>  rpm2cpio ./kudu-1.4.0+cdh5.12.2+0-1.
> > > cdh5.12.2.p0.8.el7.x86_64.rpm
> > > > |
> > > > > > cpio
> > > > > > >> -idmv
> > > > > > >>
> > > > > > >> I haven't explored OS X options yet.
> > > > > > >>
> > > > > > >> Any advice here would greatly be appreciated to save me going
> > > down a
> > > > > > dead
> > > > > > >> end.
> > > > > > >>
> > > > > > >> Many thanks,
> > > > > > >> Tim
> > > > > > >>
> > > > > > >>
> > > > > > >> [1] http://kudu.apache.org/docs/installation.html#install_
> > > packages
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Todd Lipcon
> > > > Software Engineer, Cloudera
> > > >
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>

Re: Binaries for embedded testing

Posted by Tim <ti...@gmail.com>.
Thank you very much Mike for taking the time to explore that.

Tim


Re: Binaries for embedded testing

Posted by Mike Percy <mp...@apache.org>.
I wanted to post a follow-up here with an update from me on current
progress and propose some next steps for publishing / using relocatable
artifacts in the mini cluster so we can collaborate on completing this.

1. The patch to build the artifacts is in review and I need to finish
incorporating review feedback and publish a new revision at
https://gerrit.cloudera.org/c/11377/
2. I wrote a proof-of-concept to find something like the above artifact in
the Java classpath and unpack it to the current directory, which I posted
to GitHub at https://github.com/mpercy/resource-jar-tools

Some of the issues I noticed when implementing the POC in #2 are:

1. How to identify the specific artifact we are looking for without the
potential for some accidental conflict with another JAR
2. How to choose between them if we find multiple artifacts on the
classpath or we end up with multiple operating system artifacts / the wrong
os artifact on the classpath
3. How to make this extensible in case we want to be able to handle
multiple versions on the classpath in the future, i.e. for upgrade / compat
testing purposes

Based on the above experiment in #2 I think we should make a couple changes
to the format produced by #1 in order to make the interface between #2 and
#1 more extensible and explicit.

So I was thinking that the format of the archive can be very similar to a
typical binary release tarball layout, plus a top-level META-INF directory
containing a properties file with a specific, well-known file name. The
properties file will be what we look for when searching the classpath.
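
A minimal sketch of that classpath search, assuming the descriptor name
proposed in this thread. The class name TestBinaryLocator is hypothetical and
is not part of Kudu or resource-jar-tools; ClassLoader.getResources is the
only real API relied on. Getting more than one URL back is exactly the
"multiple artifacts on the classpath" case from the issue list above.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration; not part of Kudu or resource-jar-tools. */
final class TestBinaryLocator {
  /** Descriptor file name as proposed in this thread. */
  static final String DESCRIPTOR = "META-INF/apache-kudu-test-binary.properties";

  private TestBinaryLocator() {}

  /** Returns the URL of every copy of the named resource visible to the loader. */
  static List<URL> findAll(ClassLoader loader, String resourceName) throws IOException {
    return Collections.list(loader.getResources(resourceName));
  }

  public static void main(String[] args) throws IOException {
    List<URL> hits = findAll(ClassLoader.getSystemClassLoader(), DESCRIPTOR);
    if (hits.isEmpty()) {
      System.out.println("No test binary JAR on the classpath");
    }
    for (URL url : hits) {
      // For a resource packaged in a JAR this is typically a jar:file:...!/... URL.
      System.out.println(url);
    }
  }
}
```

Searching for a uniquely named descriptor, rather than for the binaries
themselves, keeps the lookup from accidentally matching an unrelated JAR.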

JAR file layout:

META-INF/apache-kudu-test-binary.properties
apache-kudu-test-binary-1.8.0/bin/kudu-tserver
apache-kudu-test-binary-1.8.0/lib/libkrpc.so
apache-kudu-test-binary-1.8.0/lib/libz.so.1
apache-kudu-test-binary-1.8.0/lib/libcrypto.so.1.1
...

The META-INF/apache-kudu-test-binary.properties file would contain
everything the unpacking code needs to know about where to find the files
and would be extensible in case we want to support fancier features in the
future.

Example properties file contents:

$ cat META-INF/apache-kudu-test-binary.properties
format.version=1
artifact.version=1.8.0
artifact.prefix=apache-kudu-test-binary-1.8.0
artifact.os=linux
artifact.arch=x86_64

So the semantics of the above would be:
 - format.version is the format of this artifact, in case we need to extend
it later and be able to tell the difference between them
 - artifact.version is the version of the release (or snapshot, e.g.
1.9.0-SNAPSHOT) in case we want to support multiple versions later or want
the client to be able to request a specific version
 - artifact.prefix is the directory name at the root of the JAR under which
the relevant files can be found (think ./configure --prefix equivalent when
building autoconf software) -- we unpack from here
 - artifact.os is the operating system and we will use the same convention
as https://github.com/trustin/os-maven-plugin for ${os.detected.name} which,
practically speaking for Kudu, just means { linux | osx }.
 - artifact.arch is the target system architecture and again we will use
the same convention as the above os-maven-plugin for ${os.detected.arch}
which, practically speaking for Kudu, means this will always be set
to x86_64.
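
To make the semantics above concrete, here is a rough sketch of reading the
descriptor and checking it against the host platform. The class
TestBinaryDescriptor and its methods are hypothetical. The normalization is a
simplified approximation that only covers the linux / osx / x86_64 values
discussed here; it is not the full rule set of os-maven-plugin.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical value class for the proposed descriptor; illustrative only. */
final class TestBinaryDescriptor {
  final String version;
  final String prefix;
  final String os;
  final String arch;

  private TestBinaryDescriptor(String version, String prefix, String os, String arch) {
    this.version = version;
    this.prefix = prefix;
    this.os = os;
    this.arch = arch;
  }

  /** Parses a format.version=1 descriptor; rejects unknown formats and missing keys. */
  static TestBinaryDescriptor parse(Properties props) {
    String format = required(props, "format.version");
    if (!format.equals("1")) {
      throw new IllegalArgumentException("Unsupported format.version: " + format);
    }
    return new TestBinaryDescriptor(
        required(props, "artifact.version"),
        required(props, "artifact.prefix"),
        required(props, "artifact.os"),
        required(props, "artifact.arch"));
  }

  private static String required(Properties props, String key) {
    String value = props.getProperty(key);
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException("Missing property: " + key);
    }
    return value.trim();
  }

  /** Simplified mapping of the os.name system property to { linux | osx }. */
  static String normalizeOs(String osName) {
    String v = osName.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    if (v.startsWith("linux")) return "linux";
    if (v.startsWith("macosx") || v.startsWith("osx")) return "osx";
    return "unknown";
  }

  /** Simplified mapping of the os.arch system property; only x86_64 aliases are handled. */
  static String normalizeArch(String osArch) {
    String v = osArch.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    if (v.equals("x8664") || v.equals("amd64") || v.equals("x64")) return "x86_64";
    return v;
  }

  /** True if this artifact targets the given raw os.name / os.arch values. */
  boolean matches(String osName, String osArch) {
    return os.equals(normalizeOs(osName)) && arch.equals(normalizeArch(osArch));
  }

  public static void main(String[] args) throws IOException {
    Properties props = new Properties();
    props.load(new StringReader(
        "format.version=1\nartifact.version=1.8.0\n"
            + "artifact.prefix=apache-kudu-test-binary-1.8.0\n"
            + "artifact.os=linux\nartifact.arch=x86_64\n"));
    TestBinaryDescriptor d = parse(props);
    boolean ok = d.matches(System.getProperty("os.name"), System.getProperty("os.arch"));
    System.out.println(d.prefix + " matches this host: " + ok);
  }
}
```

Failing fast on an unknown format.version is one way to keep the format
extensible: older unpacking code refuses newer artifacts instead of
misreading them.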

Regarding the software to find the JAR on the classpath and unpack it,
based on the above format we could use the prototype code at
https://github.com/mpercy/resource-jar-tools to unpack it by running
something like this:

$ java -cp \
target/ResourceJarTools-1.0-SNAPSHOT.jar:$HOME/apache-kudu-test-binary-1.7.0.jar \
org.apache.kudu.ListJarFromResource \
META-INF/apache-kudu-test-binary.properties

$ java -cp \
target/ResourceJarTools-1.0-SNAPSHOT.jar:$HOME/apache-kudu-test-binary-1.7.0.jar \
org.apache.kudu.UnpackJarFromResource \
META-INF/apache-kudu-test-binary.properties .

... assuming we were in the root dir of that project and had a test binary
jar in $HOME. But that code will simply unpack the whole jar into the
current directory and I think what we want is to unpack everything under
${artifact.prefix} to whatever temp dir we create for starting the
minicluster.
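
As a rough illustration of unpacking only what lives under
${artifact.prefix}, something like the following could work. PrefixUnpacker is
a hypothetical name and this is not the code in resource-jar-tools. Two
assumptions are baked in: entries that would resolve outside the destination
are rejected, and files under bin/ are marked executable, because
java.util.jar does not restore Unix permission bits.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Hypothetical helper for illustration; not the resource-jar-tools implementation. */
final class PrefixUnpacker {
  private PrefixUnpacker() {}

  /**
   * Copies every file entry under prefix/ into destDir, dropping the prefix
   * from the path. Returns the number of files written.
   */
  static int unpack(Path jarPath, String prefix, Path destDir) throws IOException {
    String root = prefix.endsWith("/") ? prefix : prefix + "/";
    Path dest = destDir.toAbsolutePath().normalize();
    int files = 0;
    try (JarFile jar = new JarFile(jarPath.toFile())) {
      Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        JarEntry entry = entries.nextElement();
        String name = entry.getName();
        if (!name.startsWith(root) || name.equals(root)) {
          continue; // Skips META-INF/ and anything else outside the prefix.
        }
        Path target = dest.resolve(name.substring(root.length())).normalize();
        if (!target.startsWith(dest)) {
          throw new IOException("Entry escapes destination directory: " + name);
        }
        if (entry.isDirectory()) {
          Files.createDirectories(target);
          continue;
        }
        Files.createDirectories(target.getParent());
        try (InputStream in = jar.getInputStream(entry)) {
          Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        if (name.startsWith(root + "bin/")) {
          // Assumption: everything under bin/ is meant to be executed.
          target.toFile().setExecutable(true);
        }
        files++;
      }
    }
    return files;
  }

  public static void main(String[] args) throws IOException {
    // Usage: java PrefixUnpacker <jar> <artifact.prefix> <destDir>
    int n = unpack(Paths.get(args[0]), args[1], Paths.get(args[2]));
    System.out.println("Unpacked " + n + " files");
  }
}
```

With the layout proposed above, unpacking with the prefix
apache-kudu-test-binary-1.8.0 would yield bin/ and lib/ directly under the
destination, which is the shape a binDir-style setting expects.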

In terms of the API, Grant and I were musing on this in #community-discuss
a week or two ago and we sort of landed on the following simple API (I took
a little editorial license with the method names here):

/** Returns true if there is a test binary jar on the classpath */
public static boolean isKuduTestBinaryJarOnClasspath();

/** Unpack the first located Kudu test binary JAR found on the classpath
into destDir. Throws if none found or unpacking fails. */
public static void unpackFirstTestKuduBinary(String destDir);

In the future, this could be extended to list available versions, unpack
specific versions, etc, but that seems unnecessary for an initial
implementation.

Finally, in terms of how this works at a high level, so far we thought that
this would unpack the files to a temporary directory and automatically
delete all of the unpacked files and directories on JVM exit. We would
also have to put in a bit of plumbing to decide when to use this
classpath-searching approach (good for projects to integrate with) vs.
using a location determined in advance (like the current Kudu unit tests
do, and should continue to do).
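
A small sketch of the "temporary directory, deleted on JVM exit" part, using
only JDK calls. The class and method names are hypothetical. A shutdown hook
is used here rather than File.deleteOnExit because, as far as I know,
deleteOnExit will not remove a directory that still has files in it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration; not part of Kudu. */
final class TempDirs {
  private TempDirs() {}

  /** Deletes root and everything below it; does nothing if root is already gone. */
  static void deleteRecursively(Path root) throws IOException {
    if (!Files.exists(root)) {
      return;
    }
    List<Path> paths;
    try (Stream<Path> walk = Files.walk(root)) {
      // Reverse order puts children before their parent directories.
      paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }
    for (Path p : paths) {
      Files.delete(p);
    }
  }

  /** Creates a temp dir and registers a shutdown hook that removes it on JVM exit. */
  static Path createSelfDeletingTempDir(String prefix) throws IOException {
    Path dir = Files.createTempDirectory(prefix);
    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
      try {
        deleteRecursively(dir);
      } catch (IOException e) {
        System.err.println("Could not clean up " + dir + ": " + e);
      }
    }));
    return dir;
  }

  public static void main(String[] args) throws IOException {
    Path dir = createSelfDeletingTempDir("kudu-test-binary-");
    Files.createDirectories(dir.resolve("bin"));
    System.out.println("Unpack target: " + dir);
  }
}
```

Shutdown hooks do not run if the JVM is killed forcibly, so a test run that
is aborted that way could still leave files behind in the temp directory.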

So how does the test binary JAR artifact end up on the classpath in the
first place? Any project that needs that would just add it as a test
dependency to their build system and Maven / Gradle should automatically
take care of that part, so nothing special there aside from some additional
work at release time.

Please let me know if I left anything out. Let's discuss further here in
this thread or on Slack.

Mike

On Wed, Oct 24, 2018 at 2:15 PM Grant Henke <gh...@cloudera.com.invalid>
wrote:

> I wanted to provide a detailed update on the public testing utilities work
> that we have been doing since this email thread started to hopefully
> continue the discussion and motivate wrapping things up.
>
> Recently I restructured the binary location logic to make it a bit more
> friendly to external projects. We also adjusted our testing base class to
> be a rule so it could be composed into other tests. Then shortly after
> branching Kudu 1.8 we broke out the testing utilities into their own jar.
> The idea is that Kudu 1.9 would contain some public testing artifacts that
> other projects could use.
>
> Some of those relevant commits are:
>
>    - [test] Clean up MiniKuduCluster and BaseKuduTest
>    <https://github.com/apache/kudu/commit/fd1ffd0fb65e138f1f015a55aa96ae870c1d51cd>
>    - [test] Adjust Kudu binary locator logic.
>    <https://github.com/apache/kudu/commit/34e88d3dafc421ccaabae5767aad2fd9fa015d39>
>    - [test] Move BaseKuduTest to a Junit Rule
>    <https://github.com/apache/kudu/commit/dc8ae79961f71b8bdc344781fc89d38d94152fc4>
>    - KUDU-2411: (Part 1) Break out existing test utilities into a seperate
>    module
>    <https://github.com/apache/kudu/commit/15f1416f67dcb714842d02647a1f2e06e675660d>
>    - [java] Allow command line override of kuduBinDir
>    <https://github.com/apache/kudu/commit/785490ce509e68029f8062882bb7021895e3b446>
>
> I tested Mike's patch (here <https://gerrit.cloudera.org/#/c/11377/>) and
> the generated artifact using -DkuduBinDir and things look like they will
> work. It seems like we need to find a way to package this stuff up, deploy
> it, and then write a utility class to locate it on the classpath and call
> it.
>
> Recently we have been looking at the way protoc is built and published
> <https://github.com/protocolbuffers/protobuf/tree/master/protoc-artifact>
> for inspiration and to see how they handle OSX. Tim's project
> <https://github.com/timrobertson100/kudu-test-server> is a good example of
> finding it on the classpath and executing it.
>
> Combining those two things, we should have something usable.
>
> Any feedback, tips, suggestions, or collaboration is more than welcome.
>
> Thanks,
> Grant
>
> On Sat, Aug 18, 2018 at 5:40 AM Tim Robertson <ti...@gmail.com>
> wrote:
>
> > I've updated the GH to reflect the idea outlined above Mike
> >   https://github.com/timrobertson100/kudu-test-server
> >
> > - code is fairly hacky
> > - deletes extracted binaries (FYI: deleteOnExit() will not delete dirs
> > unless empty)
> > - uses maven classifier (only linux exists)
> >
> > Can you PTAL Mike when you get a chance? - no rush at all
> >
> >
> >
> > On Thu, Aug 9, 2018 at 6:07 PM Tim <ti...@gmail.com> wrote:
> >
> > > Thanks Mike - no apologies needed at all
> > >
> > > I’ll aim to reshape the GH repo I did to illustrate what we outlined,
> > with
> > > an example of use.
> > >
> > > Outstanding is some OS X binaries if anyone has some time?
> > >
> > > Have a good trip.
> > >
> > > Tim,
> > > Sent from my iPhone
> > >
> > > > On 9 Aug 2018, at 14:05, Mike Percy <mp...@apache.org> wrote:
> > > >
> > > > Hi Tim,
> > > > Sorry for the delay in responding, I've been trying to get back to
> > this.
> > > > You make some great points. If we can forego Maven/Gradle plugins, we
> > can
> > > > do this with a lot less work. Initially, I was concerned about
> > unpacking
> > > > the bits potentially more than strictly necessary, but it seems
> likely
> > > that
> > > > the CPU / IO required to do the unpacking would probably not be
> > > noticeable
> > > > in the grand scheme of running tests against a Kudu MiniCluster,
> > > especially
> > > > if it's only done once per test file (using @BeforeClass or similar).
> > > Also,
> > > > as long as there is some way to clean up the files that were unpacked
> > and
> > > > avoid potentially filling up a temp dir then nobody should have a
> > problem
> > > > with this approach... perhaps we can register a JVM shutdown hook and
> > > also
> > > > provide some kind of teardown() method so people can ensure the files
> > get
> > > > cleaned up as appropriate (I saw the TODO in your test code for this;
> > > > protoc uses File.deleteOnExit()).
> > > >
> > > > Regarding creating the binary artifacts, I don't think it's too big
> of
> > a
> > > > burden to ask the release manager to upload either a RHEL 6 or macOS
> > > binary
> > > > test artifact based on the output of a script, and ask the PMC for
> help
> > > to
> > > > get a binary for the other platform.
> > > >
> > > > I'm out of town for the next couple of weeks but once I get back I'll
> > see
> > > > if I can push this forward a bit more.
> > > >
> > > > Thanks!
> > > > Mike
> > > >
> > > >
> > > > On Thu, Aug 2, 2018 at 10:57 PM Tim Robertson <
> > timrobertson100@gmail.com
> > > >
> > > > wrote:
> > > >
> > > >> Thanks to you for making this test possible Mike.
> > > >>
> > > >> I was approaching this emulating protoc where they put the binaries
> as
> > > >> artifacts for Maven [1].
> > > >> A slight difference is that protoc is a single file while your
> tarball
> > > has
> > > >> several. I notice that the protoc build plugin for maven also copies
> > out
> > > >> from the jar to a local filesystem [2] so I just copied that
> approach.
> > > >>
> > > >> What I could imagine is we have a jar with the binaries and a single
> > > class
> > > >> (say EmbeddedKudu) that copies the binaries into a temporary
> directory
> > > (as
> > > >> my hack in GH).
> > > >>
> > > >> A user would then do the following:
> > > >>
> > > >> <!-- finds the environment -->
> > > >> <build>
> > > >>    <extensions>
> > > >>        <extension>
> > > >>            <groupId>kr.motd.maven</groupId>
> > > >>            <artifactId>os-maven-plugin</artifactId>
> > > >>            <version>1.6.0</version>
> > > >>        </extension>
> > > >>    </extensions>
> > > >> </build>
> > > >> ....
> > > >> <!-- Adds mini cluster -->
> > > >> <dependency>
> > > >>    <groupId>org.apache.kudu</groupId>
> > > >>    <artifactId>kudu-client</artifactId>
> > > >>    <version>1.7.0</version>
> > > >>    <classifier>tests</classifier>
> > > >> </dependency>
> > > >> <!-- Adds an embedded Kudu -->
> > > >> <dependency>
> > > >>    <groupId>org.apache.kudu</groupId>
> > > >>    <artifactId>kudu-test-server</artifactId>
> > > >>    <version>1.7.0</version>
> > > >>    <classifier>${os.detected.classifier}</classifier>
> > > >> </dependency>
> > > >>
> > > >>
> > > >> The detect classifier stuff is copied from protoc, but perhaps we'd
> > just
> > > >> state that available options are (?) linux / OSX and ignore
> > > autodetection.
> > > >>
> > > >> And in code users would do something like:
> > > >>
> > > >> // Copies Kudu from the jar and sets the system variable (as per my
> > > >> // example).
> > > >> EmbeddedKudu.prepare();
> > > >> MiniKuduCluster miniCluster =
> > > >>    new MiniKuduCluster.MiniKuduClusterBuilder().numMasters(1).numTservers(1).build();
> > > >>
> > > >>
> > > >> What this would mean:
> > > >>
> > > >> - no change to the existing Kudu code packaging
> > > >> - leverage caching of the binaries as any Java artifact
> > > >> - download would be at build time not test runtime - only copying
> out
> > > from
> > > >> the jar each time
> > > >> - simple to include in most build environments (not tied to maven
> as a
> > > >> plugin)
> > > >>
> > > >> I can't comment if it makes sense to do this WRT the binaries though
> > > and it
> > > >> would mean building and releasing binaries into a jar on each Kudu
> > > release
> > > >> (as protoc does).
> > > >>
> > > >> WDYT?
> > > >>
> > > >> Thanks,
> > > >> Tim
> > > >>
> > > >> [1] http://repo1.maven.org/maven2/com/google/protobuf/protoc/3.6.1/
> > > >> [2] https://github.com/os72/protoc-jar-maven-plugin/blob/master/src/main/java/com/github/os72/protocjar/maven/ProtocJarMojo.java#L664
> > > >>
> > > >>> On Thu, Aug 2, 2018 at 10:28 PM, Mike Percy <mp...@apache.org>
> > wrote:
> > > >>>
> > > >>> Ha, neat, thanks for posting this, Tim. It's a nice proof of
> concept.
> > > >>>
> > > >>> I was imagining that we would try to implement the downloading part
> > as
> > > a
> > > >>> Maven plugin, but maybe it could work to try to download the
> > artifacts
> > > at
> > > >>> runtime with a JUnit test. Do you think we could cache the
> artifacts
> > > >>> somewhere, maybe in the Maven repo somehow, so we don't have to
> > > download
> > > >>> the artifact every time we want to use it? I was hoping to simply
> be
> > > able
> > > >>> to ship a tarball or a jar of the binaries separately from the test
> > > >>> framework code.
> > > >>>
> > > >>> Mike
> > > >>>
> > > >>> On Thu, Aug 2, 2018 at 8:42 AM Tim Robertson <
> > > timrobertson100@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> Hi folks
> > > >>>>
> > > >>>> I've not had too much time, but I threw this together using Mike's
> > > >>>> binaries:
> > > >>>>  https://github.com/timrobertson100/kudu-test-server
> > > >>>>
> > > >>>> I think this shows that running a mini cluster is possible using
> the
> > > >>>> binaries Mike prepared when they are included in a jar (on CentOS
> > 7.4
> > > >> at
> > > >>>> least).
> > > >>>>
> > > >>>> Please don't flame me for the code - it was a rush job - but
> perhaps
> > > >> you
> > > >>>> could verify it works for you Mike?
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> On Thu, Jul 19, 2018 at 12:35 AM, Mike Percy <mp...@apache.org>
> > > >> wrote:
> > > >>>>
> > > >>>>> On Wed, Jul 18, 2018 at 1:24 PM Tim Robertson <
> > > >>> timrobertson100@gmail.com
> > > >>>>>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>>> I'm pretty busy this week and would be happy to get some help
> on
> > > >>> this
> > > >>>>> if
> > > >>>>>> anybody has cycles to spare.
> > > >>>>>>
> > > >>>>>> Thanks Mike - I'll look into 3) and if we get it working I'm
> happy
> > > >> to
> > > >>>>> offer
> > > >>>>>> a PR for 4).
> > > >>>>>> This is an evening project for me so I might be a little slow
> too
> > -
> > > >>> if
> > > >>>>>> someone else is keen and has time please feel free to jump in.
> > > >>>>>>
> > > >>>>>
> > > >>>>> Sounds great! I think Grant started working on #4 but I don't
> know
> > > >> how
> > > >>>> far
> > > >>>>> he got.
> > > >>>>>
> > > >>>>> Mike
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
>
>
> --
> Grant Henke
> Software Engineer | Cloudera
> grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
>

Re: Binaries for embedded testing

Posted by Grant Henke <gh...@cloudera.com.INVALID>.
I wanted to provide a detailed update on the public testing utilities work
that we have been doing since this email thread started to hopefully
continue the discussion and motivate wrapping things up.

Recently I restructured the binary location logic to make it a bit more
friendly to external projects. We also adjusted our testing base class to
be a rule so it could be composed into other tests. Then shortly after
branching Kudu 1.8 we broke out the testing utilities into their own jar.
The idea is that Kudu 1.9 would contain some public testing artifacts that
other projects could use.

Some of those relevant commits are:

   - [test] Clean up MiniKuduCluster and BaseKuduTest
   <https://github.com/apache/kudu/commit/fd1ffd0fb65e138f1f015a55aa96ae870c1d51cd>
   - [test] Adjust Kudu binary locator logic.
   <https://github.com/apache/kudu/commit/34e88d3dafc421ccaabae5767aad2fd9fa015d39>
   - [test] Move BaseKuduTest to a Junit Rule
   <https://github.com/apache/kudu/commit/dc8ae79961f71b8bdc344781fc89d38d94152fc4>
   - KUDU-2411: (Part 1) Break out existing test utilities into a seperate
   module
   <https://github.com/apache/kudu/commit/15f1416f67dcb714842d02647a1f2e06e675660d>
   - [java] Allow command line override of kuduBinDir
   <https://github.com/apache/kudu/commit/785490ce509e68029f8062882bb7021895e3b446>

I tested Mike's patch (here <https://gerrit.cloudera.org/#/c/11377/>) and
the generated artifact using -DkuduBinDir and things look like they will
work. It seems like we need to find a way to package this stuff up, deploy
it, and then write a utility class to locate it on the classpath and call
it.
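
To make the "locate it on the classpath" idea concrete, here is a minimal
sketch of how a locator might choose between an explicit -DkuduBinDir
override and a test binary jar found on the classpath. The class name
KuduBinDirChooser, the Source enum, and the precedence order are
assumptions for illustration only; the marker resource name is taken from
the command lines earlier in this thread.

```java
/** Hypothetical locator sketch; not the actual Kudu binary locator. */
final class KuduBinDirChooser {
  // System property set on the command line via -DkuduBinDir=...
  static final String BIN_DIR_PROP = "kuduBinDir";
  // Marker resource assumed to exist inside a test binary jar.
  static final String MARKER = "META-INF/apache-kudu-test-binary.properties";

  /** Where the Kudu binaries should come from. */
  enum Source { SYSTEM_PROPERTY, CLASSPATH_JAR, NONE }

  private KuduBinDirChooser() {}

  /**
   * Pure decision function: an explicit directory wins, then a jar on the
   * classpath, otherwise nothing is available.
   */
  static Source choose(String binDirProp, boolean jarOnClasspath) {
    if (binDirProp != null && !binDirProp.isEmpty()) {
      return Source.SYSTEM_PROPERTY;
    }
    return jarOnClasspath ? Source.CLASSPATH_JAR : Source.NONE;
  }

  /** Applies choose() to the current JVM's system properties and classpath. */
  static Source chooseFromEnvironment() {
    boolean jarOnClasspath =
        KuduBinDirChooser.class.getClassLoader().getResource(MARKER) != null;
    return choose(System.getProperty(BIN_DIR_PROP), jarOnClasspath);
  }
}
```

Keeping the decision in a pure function makes the precedence easy to unit
test without touching system properties or the classpath.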

Recently we have been looking at the way protoc is built and published
<https://github.com/protocolbuffers/protobuf/tree/master/protoc-artifact>
for inspiration and to see how they handle OSX. Tim's project
<https://github.com/timrobertson100/kudu-test-server> is a good example of
finding it on the classpath and executing it.

Combining those two things, we should have something usable.

Any feedback, tips, suggestions, or collaboration is more than welcome.

Thanks,
Grant

On Sat, Aug 18, 2018 at 5:40 AM Tim Robertson <ti...@gmail.com>
wrote:

> I've updated the GH to reflect the idea outlined above Mike
>   https://github.com/timrobertson100/kudu-test-server
>
> - code is fairly hacky
> - deletes extracted binaries (FYI: deleteOnExit() will not delete dirs
> unless empty)
> - uses maven classifier (only linux exists)
>
> Can you PTAL Mike when you get a chance? - no rush at all
>
>
>
> On Thu, Aug 9, 2018 at 6:07 PM Tim <ti...@gmail.com> wrote:
>
> > Thanks Mike - no apologies needed at all
> >
> > I’ll aim to reshape the GH repo I did to illustrate what we outlined,
> with
> > an example of use.
> >
> > Outstanding is some OS X binaries if anyone has some time?
> >
> > Have a good trip.
> >
> > Tim,
> > Sent from my iPhone
> >
> > > On 9 Aug 2018, at 14:05, Mike Percy <mp...@apache.org> wrote:
> > >
> > > Hi Tim,
> > > Sorry for the delay in responding, I've been trying to get back to
> this.
> > > You make some great points. If we can forego Maven/Gradle plugins, we
> can
> > > do this with a lot less work. Initially, I was concerned about
> unpacking
> > > the bits potentially more than strictly necessary, but it seems likely
> > that
> > > the CPU / IO required to do the unpacking would probably not be
> > noticeable
> > > in the grand scheme of running tests against a Kudu MiniCluster,
> > especially
> > > if it's only done once per test file (using @BeforeClass or similar).
> > Also,
> > > as long as there is some way to clean up the files that were unpacked
> and
> > > avoid potentially filling up a temp dir then nobody should have a
> problem
> > > with this approach... perhaps we can register a JVM shutdown hook and
> > also
> > > provide some kind of teardown() method so people can ensure the files
> get
> > > cleaned up as appropriate (I saw the TODO in your test code for this;
> > > protoc uses File.deleteOnExit()).
> > >
> > > Regarding creating the binary artifacts, I don't think it's too big of
> a
> > > burden to ask the release manager to upload either a RHEL 6 or macOS
> > binary
> > > test artifact based on the output of a script, and ask the PMC for help
> > to
> > > get a binary for the other platform.
> > >
> > > I'm out of town for the next couple of weeks but once I get back I'll
> see
> > > if I can push this forward a bit more.
> > >
> > > Thanks!
> > > Mike
> > >
> > >
> > > On Thu, Aug 2, 2018 at 10:57 PM Tim Robertson <
> timrobertson100@gmail.com
> > >
> > > wrote:
> > >
> > >> Thanks to you for making this test possible Mike.
> > >>
> > >> I was approaching this emulating protoc where they put the binaries as
> > >> artifacts for Maven [1].
> > >> A slight difference is that protoc is a single file while your tarball
> > has
> > >> several. I notice that the protoc build plugin for maven also copies
> out
> > >> from the jar to a local filesystem [2] so I just copied that approach.
> > >>
> > >> What I could imagine is we have a jar with the binaries and a single
> > class
> > >> (say EmbeddedKudu) that copies the binaries into a temporary directory
> > (as
> > >> my hack in GH).
> > >>
> > >> A user would then do the following:
> > >>
> > >> <!-- finds the environment -->
> > >> <build>
> > >>    <extensions>
> > >>        <extension>
> > >>            <groupId>kr.motd.maven</groupId>
> > >>            <artifactId>os-maven-plugin</artifactId>
> > >>            <version>1.6.0</version>
> > >>        </extension>
> > >>    </extensions>
> > >> </build>
> > >> ....
> > >> <!-- Adds mini cluster -->
> > >> <dependency>
> > >>    <groupId>org.apache.kudu</groupId>
> > >>    <artifactId>kudu-client</artifactId>
> > >>    <version>1.7.0</version>
> > >>    <classifier>tests</classifier>
> > >> </dependency>
> > >> <!-- Adds an embedded Kudu -->
> > >> <dependency>
> > >>    <groupId>org.apache.kudu</groupId>
> > >>    <artifactId>kudu-test-server</artifactId>
> > >>    <version>1.7.0</version>
> > >>    <classifier>${os.detected.classifier}</classifier>
> > >> </dependency>
> > >>
> > >>
> > >> The detect classifier stuff is copied from protoc, but perhaps we'd
> just
> > >> state that available options are (?) linux / OSX and ignore
> > autodetection.
> > >>
> > >> And in code users would do something like:
> > >>
> > >> // Copies Kudu from the jar and sets the system variable (as per my
> > >> // example).
> > >> EmbeddedKudu.prepare();
> > >> MiniKuduCluster miniCluster =
> > >>    new MiniKuduCluster.MiniKuduClusterBuilder().numMasters(1).numTservers(1).build();
> > >>
> > >>
> > >> What this would mean:
> > >>
> > >> - no change to the existing Kudu code packaging
> > >> - leverage caching of the binaries as any Java artifact
> > >> - download would be at build time not test runtime - only copying out
> > from
> > >> the jar each time
> > >> - simple to include in most build environments (not tied to maven as a
> > >> plugin)
> > >>
> > >> I can't comment if it makes sense to do this WRT the binaries though
> > and it
> > >> would mean building and releasing binaries into a jar on each Kudu
> > release
> > >> (as protoc does).
> > >>
> > >> WDYT?
> > >>
> > >> Thanks,
> > >> Tim
> > >>
> > >> [1] http://repo1.maven.org/maven2/com/google/protobuf/protoc/3.6.1/
> > >> [2] https://github.com/os72/protoc-jar-maven-plugin/blob/master/src/main/java/com/github/os72/protocjar/maven/ProtocJarMojo.java#L664
> > >>
> > >>> On Thu, Aug 2, 2018 at 10:28 PM, Mike Percy <mp...@apache.org>
> wrote:
> > >>>
> > >>> Ha, neat, thanks for posting this, Tim. It's a nice proof of concept.
> > >>>
> > >>> I was imagining that we would try to implement the downloading part
> as
> > a
> > >>> Maven plugin, but maybe it could work to try to download the
> artifacts
> > at
> > >>> runtime with a JUnit test. Do you think we could cache the artifacts
> > >>> somewhere, maybe in the Maven repo somehow, so we don't have to
> > download
> > >>> the artifact every time we want to use it? I was hoping to simply be
> > able
> > >>> to ship a tarball or a jar of the binaries separately from the test
> > >>> framework code.
> > >>>
> > >>> Mike
> > >>>
> > >>> On Thu, Aug 2, 2018 at 8:42 AM Tim Robertson <
> > timrobertson100@gmail.com>
> > >>> wrote:
> > >>>
> > >>>> Hi folks
> > >>>>
> > >>>> I've not had too much time, but I threw this together using Mike's
> > >>>> binaries:
> > >>>>  https://github.com/timrobertson100/kudu-test-server
> > >>>>
> > >>>> I think this shows that running a mini cluster is possible using the
> > >>>> binaries Mike prepared when they are included in a jar (on CentOS
> 7.4
> > >> at
> > >>>> least).
> > >>>>
> > >>>> Please don't flame me for the code - it was a rush job - but perhaps
> > >> you
> > >>>> could verify it works for you Mike?
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Thu, Jul 19, 2018 at 12:35 AM, Mike Percy <mp...@apache.org>
> > >> wrote:
> > >>>>
> > >>>>> On Wed, Jul 18, 2018 at 1:24 PM Tim Robertson <
> > >>> timrobertson100@gmail.com
> > >>>>>
> > >>>>> wrote:
> > >>>>>
> > >>>>>>> I'm pretty busy this week and would be happy to get some help on
> > >>> this
> > >>>>> if
> > >>>>>> anybody has cycles to spare.
> > >>>>>>
> > >>>>>> Thanks Mike - I'll look into 3) and if we get it working I'm happy
> > >> to
> > >>>>> offer
> > >>>>>> a PR for 4).
> > >>>>>> This is an evening project for me so I might be a little slow too
> -
> > >>> if
> > >>>>>> someone else is keen and has time please feel free to jump in.
> > >>>>>>
> > >>>>>
> > >>>>> Sounds great! I think Grant started working on #4 but I don't know
> > >> how
> > >>>> far
> > >>>>> he got.
> > >>>>>
> > >>>>> Mike
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
>


-- 
Grant Henke
Software Engineer | Cloudera
grant@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke

Re: Binaries for embedded testing

Posted by Tim Robertson <ti...@gmail.com>.
I've updated the GH repo to reflect the idea outlined above, Mike:
  https://github.com/timrobertson100/kudu-test-server

- code is fairly hacky
- deletes extracted binaries (FYI: deleteOnExit() will not delete dirs
unless empty)
- uses maven classifier (only linux exists)

Can you PTAL Mike when you get a chance? - no rush at all



On Thu, Aug 9, 2018 at 6:07 PM Tim <ti...@gmail.com> wrote:

> Thanks Mike - no apologies needed at all
>
> I’ll aim to reshape the GH repo I did to illustrate what we outlined, with
> an example of use.
>
> Outstanding is some OS X binaries if anyone has some time?
>
> Have a good trip.
>
> Tim,
> Sent from my iPhone
>
> > On 9 Aug 2018, at 14:05, Mike Percy <mp...@apache.org> wrote:
> >
> > Hi Tim,
> > Sorry for the delay in responding, I've been trying to get back to this.
> > You make some great points. If we can forego Maven/Gradle plugins, we can
> > do this with a lot less work. Initially, I was concerned about unpacking
> > the bits potentially more than strictly necessary, but it seems likely
> that
> > the CPU / IO required to do the unpacking would probably not be
> noticeable
> > in the grand scheme of running tests against a Kudu MiniCluster,
> especially
> > if it's only done once per test file (using @BeforeClass or similar).
> Also,
> > as long as there is some way to clean up the files that were unpacked and
> > avoid potentially filling up a temp dir then nobody should have a problem
> > with this approach... perhaps we can register a JVM shutdown hook and
> also
> > provide some kind of teardown() method so people can ensure the files get
> > cleaned up as appropriate (I saw the TODO in your test code for this;
> > protoc uses File.deleteOnExit()).
> >
> > Regarding creating the binary artifacts, I don't think it's too big of a
> > burden to ask the release manager to upload either a RHEL 6 or macOS
> binary
> > test artifact based on the output of a script, and ask the PMC for help
> to
> > get a binary for the other platform.
> >
> > I'm out of town for the next couple of weeks but once I get back I'll see
> > if I can push this forward a bit more.
> >
> > Thanks!
> > Mike
> >
> >
> > On Thu, Aug 2, 2018 at 10:57 PM Tim Robertson <timrobertson100@gmail.com
> >
> > wrote:
> >
> >> Thanks to you for making this test possible Mike.
> >>
> >> I was approaching this emulating protoc where they put the binaries as
> >> artifacts for Maven [1].
> >> A slight difference is that protoc is a single file while your tarball
> has
> >> several. I notice that the protoc build plugin for maven also copies out
> >> from the jar to a local filesystem [2] so I just copied that approach.
> >>
> >> What I could imagine is we have a jar with the binaries and a single
> class
> >> (say EmbeddedKudu) that copies the binaries into a temporary directory
> (as
> >> my hack in GH).
> >>
> >> A user would then do the following:
> >>
> >> <!-- finds the environment -->
> >> <build>
> >>    <extensions>
> >>        <extension>
> >>            <groupId>kr.motd.maven</groupId>
> >>            <artifactId>os-maven-plugin</artifactId>
> >>            <version>1.6.0</version>
> >>        </extension>
> >>    </extensions>
> >> </build>
> >> ....
> >> <!-- Adds mini cluster -->
> >> <dependency>
> >>    <groupId>org.apache.kudu</groupId>
> >>    <artifactId>kudu-client</artifactId>
> >>    <version>1.7.0</version>
> >>    <classifier>tests</classifier>
> >> </dependency>
> >> <!-- Adds an embedded Kudu -->
> >> <dependency>
> >>    <groupId>org.apache.kudu</groupId>
> >>    <artifactId>kudu-test-server</artifactId>
> >>    <version>1.7.0</version>
> >>    <classifier>${os.detected.classifier}</classifier>
> >> </dependency>
> >>
> >>
> >> The detect classifier stuff is copied from protoc, but perhaps we'd just
> >> state that available options are (?) linux / OSX and ignore
> autodetection.
> >>
> >> And in code users would do something like:
> >>
> >> // Copies Kudu from the jar and sets the system variable (as per my
> >> // example).
> >> EmbeddedKudu.prepare();
> >> MiniKuduCluster miniCluster =
> >>    new MiniKuduCluster.MiniKuduClusterBuilder().numMasters(1).numTservers(1).build();
> >>
> >>
> >> What this would mean:
> >>
> >> - no change to the existing Kudu code packaging
> >> - leverage caching of the binaries as any Java artifact
> >> - download would be at build time not test runtime - only copying out
> from
> >> the jar each time
> >> - simple to include in most build environments (not tied to maven as a
> >> plugin)
> >>
> >> I can't comment if it makes sense to do this WRT the binaries though
> and it
> >> would mean building and releasing binaries into a jar on each Kudu
> release
> >> (as protoc does).
> >>
> >> WDYT?
> >>
> >> Thanks,
> >> Tim
> >>
> >> [1] http://repo1.maven.org/maven2/com/google/protobuf/protoc/3.6.1/
> >> [2] https://github.com/os72/protoc-jar-maven-plugin/blob/master/src/main/java/com/github/os72/protocjar/maven/ProtocJarMojo.java#L664
> >>
> >>> On Thu, Aug 2, 2018 at 10:28 PM, Mike Percy <mp...@apache.org> wrote:
> >>>
> >>> Ha, neat, thanks for posting this, Tim. It's a nice proof of concept.
> >>>
> >>> I was imagining that we would try to implement the downloading part as
> a
> >>> Maven plugin, but maybe it could work to try to download the artifacts
> at
> >>> runtime with a JUnit test. Do you think we could cache the artifacts
> >>> somewhere, maybe in the Maven repo somehow, so we don't have to
> download
> >>> the artifact every time we want to use it? I was hoping to simply be
> able
> >>> to ship a tarball or a jar of the binaries separately from the test
> >>> framework code.
> >>>
> >>> Mike
> >>>
> >>> On Thu, Aug 2, 2018 at 8:42 AM Tim Robertson <
> timrobertson100@gmail.com>
> >>> wrote:
> >>>
> >>>> Hi folks
> >>>>
> >>>> I've not had too much time, but I threw this together using Mike's
> >>>> binaries:
> >>>>  https://github.com/timrobertson100/kudu-test-server
> >>>>
> >>>> I think this shows that running a mini cluster is possible using the
> >>>> binaries Mike prepared when they are included in a jar (on CentOS 7.4
> >> at
> >>>> least).
> >>>>
> >>>> Please don't flame me for the code - it was a rush job - but perhaps
> >> you
> >>>> could verify it works for you Mike?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Jul 19, 2018 at 12:35 AM, Mike Percy <mp...@apache.org>
> >> wrote:
> >>>>
> >>>>> On Wed, Jul 18, 2018 at 1:24 PM Tim Robertson <
> >>> timrobertson100@gmail.com
> >>>>>
> >>>>> wrote:
> >>>>>
> >>>>>>> I'm pretty busy this week and would be happy to get some help on
> >>> this
> >>>>> if
> >>>>>> anybody has cycles to spare.
> >>>>>>
> >>>>>> Thanks Mike - I'll look into 3) and if we get it working I'm happy
> >> to
> >>>>> offer
> >>>>>> a PR for 4).
> >>>>>> This is an evening project for me so I might be a little slow too -
> >>> if
> >>>>>> someone else is keen and has time please feel free to jump in.
> >>>>>>
> >>>>>
> >>>>> Sounds great! I think Grant started working on #4 but I don't know
> >> how
> >>>> far
> >>>>> he got.
> >>>>>
> >>>>> Mike
> >>>>>
> >>>>
> >>>
> >>
>

Re: Binaries for embedded testing

Posted by Tim <ti...@gmail.com>.
Thanks Mike - no apologies needed at all

I’ll aim to reshape the GH repo I did to illustrate what we outlined, with an example of use.

Still outstanding are some OS X binaries, if anyone has some time?

Have a good trip.

Tim,
Sent from my iPhone 

> On 9 Aug 2018, at 14:05, Mike Percy <mp...@apache.org> wrote:
> 
> Hi Tim,
> Sorry for the delay in responding, I've been trying to get back to this.
> You make some great points. If we can forego Maven/Gradle plugins, we can
> do this with a lot less work. Initially, I was concerned about unpacking
> the bits potentially more than strictly necessary, but it seems likely that
> the CPU / IO required to do the unpacking would probably not be noticeable
> in the grand scheme of running tests against a Kudu MiniCluster, especially
> if it's only done once per test file (using @BeforeClass or similar). Also,
> as long as there is some way to clean up the files that were unpacked and
> avoid potentially filling up a temp dir then nobody should have a problem
> with this approach... perhaps we can register a JVM shutdown hook and also
> provide some kind of teardown() method so people can ensure the files get
> cleaned up as appropriate (I saw the TODO in your test code for this;
> protoc uses File.deleteOnExit()).
> 
> Regarding creating the binary artifacts, I don't think it's too big of a
> burden to ask the release manager to upload either a RHEL 6 or macOS binary
> test artifact based on the output of a script, and ask the PMC for help to
> get a binary for the other platform.
> 
> I'm out of town for the next couple of weeks but once I get back I'll see
> if I can push this forward a bit more.
> 
> Thanks!
> Mike
> 
> 
> On Thu, Aug 2, 2018 at 10:57 PM Tim Robertson <ti...@gmail.com>
> wrote:
> 
>> Thanks to you for making this test possible Mike.
>> 
>> I was approaching this emulating protoc where they put the binaries as
>> artifacts for Maven [1].
>> A slight difference is that protoc is a single file while your tarball has
>> several. I notice that the protoc build plugin for maven also copies out
>> from the jar to a local filesystem [2] so I just copied that approach.
>> 
>> What I could imagine is we have a jar with the binaries and a single class
>> (say EmbeddedKudu) that copies the binaries into a temporary directory (as
>> my hack in GH).
>> 
>> A user would then do the following:
>> 
>> <!-- finds the environment -->
>> <build>
>>    <extensions>
>>        <extension>
>>            <groupId>kr.motd.maven</groupId>
>>            <artifactId>os-maven-plugin</artifactId>
>>            <version>1.6.0</version>
>>        </extension>
>>    </extensions>
>> </build>
>> ....
>> <!-- Adds mini cluster -->
>> <dependency>
>>    <groupId>org.apache.kudu</groupId>
>>    <artifactId>kudu-client</artifactId>
>>    <version>1.7.0</version>
>>    <classifier>tests</classifier>
>> </dependency>
>> <!-- Adds an embedded Kudu -->
>> <dependency>
>>    <groupId>org.apache.kudu</groupId>
>>    <artifactId>kudu-test-server</artifactId>
>>    <version>1.7.0</version>
>>    <classifier>${os.detected.classifier}</classifier>
>> </dependency>
>> 
>> 
>> The detect classifier stuff is copied from protoc, but perhaps we'd just
>> state that available options are (?) linux / OSX and ignore autodetection.
>> 
>> And in code users would do something like:
>> 
>> // Copies Kudu from the jar and sets the system variable (as per my
>> // example).
>> EmbeddedKudu.prepare();
>> MiniKuduCluster miniCluster =
>>    new MiniKuduCluster.MiniKuduClusterBuilder().numMasters(1).numTservers(1).build();
>> 
>> 
>> What this would mean:
>> 
>> - no change to the existing Kudu code packaging
>> - leverage caching of the binaries as any Java artifact
>> - download would be at build time not test runtime - only copying out from
>> the jar each time
>> - simple to include in most build environments (not tied to maven as a
>> plugin)
>> 
>> I can't comment if it makes sense to do this WRT the binaries though and it
>> would mean building and releasing binaries into a jar on each Kudu release
>> (as protoc does).
>> 
>> WDYT?
>> 
>> Thanks,
>> Tim
>> 
>> [1] http://repo1.maven.org/maven2/com/google/protobuf/protoc/3.6.1/
>> [2]
>> 
>> https://github.com/os72/protoc-jar-maven-plugin/blob/master/src/main/java/com/github/os72/protocjar/maven/ProtocJarMojo.java#L664
>> 
>>> On Thu, Aug 2, 2018 at 10:28 PM, Mike Percy <mp...@apache.org> wrote:
>>> 
>>> Ha, neat, thanks for posting this, Tim. It's a nice proof of concept.
>>> 
>>> I was imagining that we would try to implement the downloading part as a
>>> Maven plugin, but maybe it could work to try to download the artifacts at
>>> runtime with a JUnit test. Do you think we could cache the artifacts
>>> somewhere, maybe in the Maven repo somehow, so we don't have to download
>>> the artifact every time we want to use it? I was hoping to simply be able
>>> to ship a tarball or a jar of the binaries separately from the test
>>> framework code.
>>> 
>>> Mike
>>> 
>>> On Thu, Aug 2, 2018 at 8:42 AM Tim Robertson <ti...@gmail.com>
>>> wrote:
>>> 
>>>> Hi folks
>>>> 
>>>> I've not had too much time, but I threw this together using Mike's
>>>> binaries:
>>>>  https://github.com/timrobertson100/kudu-test-server
>>>> 
>>>> I think this shows that running a mini cluster is possible using the
>>>> binaries Mike prepared when they are included in a jar (on CentOS 7.4 at
>>>> least).
>>>>
>>>> Please don't flame me for the code - it was a rush job - but perhaps you
>>>> could verify it works for you Mike?
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Thu, Jul 19, 2018 at 12:35 AM, Mike Percy <mp...@apache.org> wrote:
>>>>
>>>>> On Wed, Jul 18, 2018 at 1:24 PM Tim Robertson <ti...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>>> I'm pretty busy this week and would be happy to get some help on this if
>>>>>> anybody has cycles to spare.
>>>>>>
>>>>>> Thanks Mike - I'll look into 3) and if we get it working I'm happy to offer
>>>>>> a PR for 4).
>>>>>> This is an evening project for me so I might be a little slow too - if
>>>>>> someone else is keen and has time please feel free to jump in.
>>>>>>
>>>>>
>>>>> Sounds great! I think Grant started working on #4 but I don't know how far
>>>>> he got.
>>>>> 
>>>>> Mike
>>>>> 
>>>> 
>>> 
>> 

Re: Binaries for embedded testing

Posted by Mike Percy <mp...@apache.org>.
Hi Tim,
Sorry for the delay in responding, I've been trying to get back to this.
You make some great points. If we can forgo Maven/Gradle plugins, we can
do this with a lot less work. Initially, I was concerned about unpacking
the bits more often than strictly necessary, but the CPU / IO cost of
unpacking is unlikely to be noticeable in the grand scheme of running tests
against a Kudu MiniCluster, especially if it's only done once per test file
(using @BeforeClass or similar). Also, as long as there is some way to clean
up the unpacked files and avoid filling up a temp dir, nobody should have a
problem with this approach. Perhaps we can register a JVM shutdown hook and
also provide some kind of teardown() method so people can ensure the files
get cleaned up as appropriate (I saw the TODO in your test code for this;
protoc uses File.deleteOnExit()).
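
To make the cleanup idea concrete, here is a minimal Java sketch. The class
name UnpackedBinaries and the temp-dir prefix are hypothetical (they are not
part of Kudu or of Tim's prototype); the sketch only shows an explicit
teardown() backed by a JVM shutdown hook as a safety net.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

/** Hypothetical helper that owns a temp directory holding unpacked test binaries. */
final class UnpackedBinaries {
    private final Path dir;

    private UnpackedBinaries(Path dir) {
        this.dir = dir;
    }

    /** Creates the temp directory and registers a shutdown hook as a fallback cleanup. */
    static UnpackedBinaries create() {
        try {
            UnpackedBinaries unpacked =
                new UnpackedBinaries(Files.createTempDirectory("kudu-test-bin"));
            Runtime.getRuntime().addShutdownHook(new Thread(unpacked::teardown));
            return unpacked;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Path dir() {
        return dir;
    }

    /** Deletes the directory tree. Safe to call more than once. */
    synchronized void teardown() {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order so that children are deleted before their parents.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.deleteIfExists(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test class could call create() once from @BeforeClass and teardown() from
@AfterClass; the hook only matters if the JVM exits before teardown() runs.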

Regarding creating the binary artifacts, I don't think it's too big of a
burden to ask the release manager to upload either a RHEL 6 or macOS binary
test artifact based on the output of a script, and ask the PMC for help to
get a binary for the other platform.

I'm out of town for the next couple of weeks but once I get back I'll see
if I can push this forward a bit more.

Thanks!
Mike


On Thu, Aug 2, 2018 at 10:57 PM Tim Robertson <ti...@gmail.com>
wrote:

> Thanks to you for making this test possible Mike.
>
> I was approaching this emulating protoc where they put the binaries as
> artifacts for Maven [1].
> A slight difference is that protoc is a single file while your tarball has
> several. I notice that the protoc build plugin for maven also copies out
> from the jar to a local filesystem [2] so I just copied that approach.
>
> What I could imagine is we have a jar with the binaries and a single class
> (say EmbeddedKudu) that copies the binaries into a temporary directory (as
> my hack in GH).
>
> A user would then do the following:
>
> <!-- finds the environment -->
> <build>
>     <extensions>
>         <extension>
>             <groupId>kr.motd.maven</groupId>
>             <artifactId>os-maven-plugin</artifactId>
>             <version>1.6.0</version>
>         </extension>
>     </extensions>
> </build>
> ....
> <!-- Adds mini cluster -->
> <dependency>
>     <groupId>org.apache.kudu</groupId>
>     <artifactId>kudu-client</artifactId>
>     <version>1.7.0</version>
>     <classifier>tests</classifier>
> </dependency>
> <!-- Adds an embedded Kudu -->
> <dependency>
>     <groupId>org.apache.kudu</groupId>
>     <artifactId>kudu-test-server</artifactId>
>     <version>1.7.0</version>
>     <classifier>${os.detected.classifier}</classifier>
> </dependency>
>
>
> The detect classifier stuff is copied from protoc, but perhaps we'd just
> state that available options are (?) linux / OSX and ignore autodetection.
>
> And in code users would do something like:
>
> // Copies kudu from the jar and sets the system variable (as per my example).
> EmbeddedKudu.prepare();
> MiniKuduCluster miniCluster =
>     new MiniKuduCluster.MiniKuduClusterBuilder().numMasters(1).numTservers(1).build();
>
>
> What this would mean:
>
> - no change to the existing Kudu code packaging
> - leverage caching of the binaries as any Java artifact
> - download would be at build time not test runtime - only copying out from
> the jar each time
> - simple to include in most build environments (not tied to maven as a
> plugin)
>
> I can't comment if it makes sense to do this WRT the binaries though and it
> would mean building and releasing binaries into a jar on each Kudu release
> (as protoc does).
>
> WDYT?
>
> Thanks,
> Tim
>
> [1] http://repo1.maven.org/maven2/com/google/protobuf/protoc/3.6.1/
> [2]
>
> https://github.com/os72/protoc-jar-maven-plugin/blob/master/src/main/java/com/github/os72/protocjar/maven/ProtocJarMojo.java#L664
>
> On Thu, Aug 2, 2018 at 10:28 PM, Mike Percy <mp...@apache.org> wrote:
>
> > Ha, neat, thanks for posting this, Tim. It's a nice proof of concept.
> >
> > I was imagining that we would try to implement the downloading part as a
> > Maven plugin, but maybe it could work to try to download the artifacts at
> > runtime with a JUnit test. Do you think we could cache the artifacts
> > somewhere, maybe in the Maven repo somehow, so we don't have to download
> > the artifact every time we want to use it? I was hoping to simply be able
> > to ship a tarball or a jar of the binaries separately from the test
> > framework code.
> >
> > Mike
> >
> > On Thu, Aug 2, 2018 at 8:42 AM Tim Robertson <ti...@gmail.com>
> > wrote:
> >
> > > Hi folks
> > >
> > > I've not had too much time, but I threw this together using Mike's
> > > binaries:
> > >   https://github.com/timrobertson100/kudu-test-server
> > >
> > > I think this shows that running a mini cluster is possible using the
> > > binaries Mike prepared when they are included in a jar (on CentOS 7.4 at
> > > least).
> > >
> > > Please don't flame me for the code - it was a rush job - but perhaps you
> > > could verify it works for you Mike?
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Jul 19, 2018 at 12:35 AM, Mike Percy <mp...@apache.org> wrote:
> > >
> > > > On Wed, Jul 18, 2018 at 1:24 PM Tim Robertson <ti...@gmail.com>
> > > > wrote:
> > > >
> > > > > > I'm pretty busy this week and would be happy to get some help on this if
> > > > > anybody has cycles to spare.
> > > > >
> > > > > Thanks Mike - I'll look into 3) and if we get it working I'm happy to offer
> > > > > a PR for 4).
> > > > > This is an evening project for me so I might be a little slow too - if
> > > > > someone else is keen and has time please feel free to jump in.
> > > > >
> > > >
> > > > Sounds great! I think Grant started working on #4 but I don't know how far
> > > > he got.
> > > >
> > > > Mike
> > > >
> > >
> >
>

Re: Binaries for embedded testing

Posted by Tim Robertson <ti...@gmail.com>.
Thanks to you for making this test possible Mike.

I was approaching this emulating protoc where they put the binaries as
artifacts for Maven [1].
A slight difference is that protoc is a single file while your tarball has
several. I notice that the protoc build plugin for maven also copies out
from the jar to a local filesystem [2] so I just copied that approach.

What I could imagine is we have a jar with the binaries and a single class
(say EmbeddedKudu) that copies the binaries into a temporary directory (as
my hack in GH).

A user would then do the following:

<!-- finds the environment -->
<build>
    <extensions>
        <extension>
            <groupId>kr.motd.maven</groupId>
            <artifactId>os-maven-plugin</artifactId>
            <version>1.6.0</version>
        </extension>
    </extensions>
</build>
....
<!-- Adds mini cluster -->
<dependency>
    <groupId>org.apache.kudu</groupId>
    <artifactId>kudu-client</artifactId>
    <version>1.7.0</version>
    <classifier>tests</classifier>
</dependency>
<!-- Adds an embedded Kudu -->
<dependency>
    <groupId>org.apache.kudu</groupId>
    <artifactId>kudu-test-server</artifactId>
    <version>1.7.0</version>
    <classifier>${os.detected.classifier}</classifier>
</dependency>


The detect classifier stuff is copied from protoc, but perhaps we'd just
state that available options are (?) linux / OSX and ignore autodetection.

And in code users would do something like:

// Copies kudu from the jar and sets the system variable (as per my example).
EmbeddedKudu.prepare();
MiniKuduCluster miniCluster =
    new MiniKuduCluster.MiniKuduClusterBuilder().numMasters(1).numTservers(1).build();
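
For illustration, a prepare() step along these lines might look roughly like
the Java sketch below. Everything in it is an assumption rather than the code
in the GitHub prototype: the class name EmbeddedKuduSketch, the resource
prefix, the list of binary names, and the system property name are all
supplied by the caller, because the thread does not pin them down.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical sketch of copying bundled binaries out of a jar for test use. */
final class EmbeddedKuduSketch {

    /**
     * Copies each named classpath resource into a fresh temp directory and
     * records that directory in the given system property (name is an assumption).
     */
    static Path prepare(ClassLoader loader, String resourcePrefix,
                        List<String> binaries, String property) {
        try {
            Path dir = Files.createTempDirectory("kudu-embedded");
            for (String name : binaries) {
                try (InputStream in = loader.getResourceAsStream(resourcePrefix + name)) {
                    if (in == null) {
                        throw new IllegalStateException(
                            "Missing resource: " + resourcePrefix + name);
                    }
                    copyExecutable(in, dir.resolve(name));
                }
            }
            System.setProperty(property, dir.toString());
            return dir;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the stream to the target path and marks the file executable. */
    static void copyExecutable(InputStream in, Path target) {
        try {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            if (!target.toFile().setExecutable(true)) {
                throw new IOException("Cannot mark executable: " + target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Taking the property name as a parameter keeps the sketch independent of
whatever variable the mini cluster actually reads to locate its binaries.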


What this would mean:

- no change to the existing Kudu code packaging
- leverage caching of the binaries as any Java artifact
- download would be at build time not test runtime - only copying out from
the jar each time
- simple to include in most build environments (not tied to maven as a
plugin)

I can't comment if it makes sense to do this WRT the binaries though and it
would mean building and releasing binaries into a jar on each Kudu release
(as protoc does).

WDYT?

Thanks,
Tim

[1] http://repo1.maven.org/maven2/com/google/protobuf/protoc/3.6.1/
[2]
https://github.com/os72/protoc-jar-maven-plugin/blob/master/src/main/java/com/github/os72/protocjar/maven/ProtocJarMojo.java#L664

On Thu, Aug 2, 2018 at 10:28 PM, Mike Percy <mp...@apache.org> wrote:

> Ha, neat, thanks for posting this, Tim. It's a nice proof of concept.
>
> I was imagining that we would try to implement the downloading part as a
> Maven plugin, but maybe it could work to try to download the artifacts at
> runtime with a JUnit test. Do you think we could cache the artifacts
> somewhere, maybe in the Maven repo somehow, so we don't have to download
> the artifact every time we want to use it? I was hoping to simply be able
> to ship a tarball or a jar of the binaries separately from the test
> framework code.
>
> Mike
>
> On Thu, Aug 2, 2018 at 8:42 AM Tim Robertson <ti...@gmail.com>
> wrote:
>
> > Hi folks
> >
> > I've not had too much time, but I threw this together using Mike's
> > binaries:
> >   https://github.com/timrobertson100/kudu-test-server
> >
> > I think this shows that running a mini cluster is possible using the
> > binaries Mike prepared when they are included in a jar (on CentOS 7.4 at
> > least).
> >
> > Please don't flame me for the code - it was a rush job - but perhaps you
> > could verify it works for you Mike?
> >
> >
> >
> >
> >
> >
> > On Thu, Jul 19, 2018 at 12:35 AM, Mike Percy <mp...@apache.org> wrote:
> >
> > > On Wed, Jul 18, 2018 at 1:24 PM Tim Robertson <ti...@gmail.com>
> > > wrote:
> > >
> > > > > I'm pretty busy this week and would be happy to get some help on this if
> > > > anybody has cycles to spare.
> > > >
> > > > Thanks Mike - I'll look into 3) and if we get it working I'm happy to offer
> > > > a PR for 4).
> > > > This is an evening project for me so I might be a little slow too - if
> > > > someone else is keen and has time please feel free to jump in.
> > > >
> > >
> > > Sounds great! I think Grant started working on #4 but I don't know how far
> > > he got.
> > >
> > > Mike
> > >
> >
>

Re: Binaries for embedded testing

Posted by Mike Percy <mp...@apache.org>.
Ha, neat, thanks for posting this, Tim. It's a nice proof of concept.

I was imagining that we would try to implement the downloading part as a
Maven plugin, but maybe it could work to try to download the artifacts at
runtime with a JUnit test. Do you think we could cache the artifacts
somewhere, maybe in the Maven repo somehow, so we don't have to download
the artifact every time we want to use it? I was hoping to simply be able
to ship a tarball or a jar of the binaries separately from the test
framework code.

Mike

On Thu, Aug 2, 2018 at 8:42 AM Tim Robertson <ti...@gmail.com>
wrote:

> Hi folks
>
> I've not had too much time, but I threw this together using Mike's
> binaries:
>   https://github.com/timrobertson100/kudu-test-server
>
> I think this shows that running a mini cluster is possible using the
> binaries Mike prepared when they are included in a jar (on CentOS 7.4 at
> least).
>
> Please don't flame me for the code - it was a rush job - but perhaps you
> could verify it works for you Mike?
>
>
>
>
>
>
> On Thu, Jul 19, 2018 at 12:35 AM, Mike Percy <mp...@apache.org> wrote:
>
> > On Wed, Jul 18, 2018 at 1:24 PM Tim Robertson <ti...@gmail.com>
> > wrote:
> >
> > > > I'm pretty busy this week and would be happy to get some help on this if
> > > anybody has cycles to spare.
> > >
> > > Thanks Mike - I'll look into 3) and if we get it working I'm happy to offer
> > > a PR for 4).
> > > This is an evening project for me so I might be a little slow too - if
> > > someone else is keen and has time please feel free to jump in.
> > >
> >
> > Sounds great! I think Grant started working on #4 but I don't know how far
> > he got.
> >
> > Mike
> >
>

Re: Binaries for embedded testing

Posted by Tim Robertson <ti...@gmail.com>.
Hi folks

I've not had too much time, but I threw this together using Mike's
binaries:
  https://github.com/timrobertson100/kudu-test-server

I think this shows that running a mini cluster is possible using the
binaries Mike prepared when they are included in a jar (on CentOS 7.4 at
least).

Please don't flame me for the code - it was a rush job - but perhaps you
could verify it works for you Mike?






On Thu, Jul 19, 2018 at 12:35 AM, Mike Percy <mp...@apache.org> wrote:

> On Wed, Jul 18, 2018 at 1:24 PM Tim Robertson <ti...@gmail.com>
> wrote:
>
> > > I'm pretty busy this week and would be happy to get some help on this if
> > anybody has cycles to spare.
> >
> > Thanks Mike - I'll look into 3) and if we get it working I'm happy to offer
> > a PR for 4).
> > This is an evening project for me so I might be a little slow too - if
> > someone else is keen and has time please feel free to jump in.
> >
>
> Sounds great! I think Grant started working on #4 but I don't know how far
> he got.
>
> Mike
>

Re: Binaries for embedded testing

Posted by Mike Percy <mp...@apache.org>.
On Wed, Jul 18, 2018 at 1:24 PM Tim Robertson <ti...@gmail.com>
wrote:

> > I'm pretty busy this week and would be happy to get some help on this if
> anybody has cycles to spare.
>
> Thanks Mike - I'll look into 3) and if we get it working I'm happy to offer
> a PR for 4).
> This is an evening project for me so I might be a little slow too - if
> someone else is keen and has time please feel free to jump in.
>

Sounds great! I think Grant started working on #4 but I don't know how far
he got.

Mike

Re: Binaries for embedded testing

Posted by Tim Robertson <ti...@gmail.com>.
> I'm pretty busy this week and would be happy to get some help on this if
anybody has cycles to spare.

Thanks Mike - I'll look into 3) and if we get it working I'm happy to offer
a PR for 4).
This is an evening project for me so I might be a little slow too - if
someone else is keen and has time please feel free to jump in.


On Wed, Jul 18, 2018 at 10:16 PM, Mike Percy <mp...@apache.org> wrote:

> On Wed, Jul 18, 2018 at 12:42 PM Todd Lipcon <to...@cloudera.com.invalid>
> wrote:
>
> > On Wed, Jul 18, 2018 at 12:38 PM, Mike Percy <mp...@apache.org> wrote:
> >
> > > So I was able to get a proof-of-concept working [1], although the script is
> > > a bit hacky. The hacky part is that I arbitrarily chose a few system
> > > modules not to package until the thing was willing to run. I was able to
> > > build Kudu on EL6 and run it on Ubuntu 16.04. I have not made it work on
> > > macOS yet... the commands for the rpath modifications to make it
> > > relocatable would be different but the effect should be similar: library
> > > paths relative to the binary.
> > >
> > > To get this proof-of-concept to a usable state, we still would need the
> > > following pieces:
> > >
> > > 1) Make the above script also work on macOS using install_name_tool
> > > and @loader_path per [2]
> > >
> >
> > I wonder whether for OSX it would be easier to just assume Docker? We
> > already know we have various functional limitations on OSX, so maybe it
> > would simplify our release process if we didn't have to worry about
> > creating this special artifact on OSX? I would guess that most OSX
> > developers either already have or would be willing to install Docker.
> >
>
> If we require Docker on Mac to run, then it probably doesn't make sense to
> enable Kudu tests by default on Mac on downstream projects such as Beam,
> Flink, Flume, Spark, etc. I suspect the Kudu tests would usually be
> skipped.
>
> Mike
>

Re: Binaries for embedded testing

Posted by Mike Percy <mp...@apache.org>.
On Wed, Jul 18, 2018 at 12:42 PM Todd Lipcon <to...@cloudera.com.invalid>
wrote:

> On Wed, Jul 18, 2018 at 12:38 PM, Mike Percy <mp...@apache.org> wrote:
>
> > So I was able to get a proof-of-concept working [1], although the script is
> > a bit hacky. The hacky part is that I arbitrarily chose a few system
> > modules not to package until the thing was willing to run. I was able to
> > build Kudu on EL6 and run it on Ubuntu 16.04. I have not made it work on
> > macOS yet... the commands for the rpath modifications to make it
> > relocatable would be different but the effect should be similar: library
> > paths relative to the binary.
> >
> > To get this proof-of-concept to a usable state, we still would need the
> > following pieces:
> >
> > 1) Make the above script also work on macOS using install_name_tool
> > and @loader_path per [2]
> >
>
> I wonder whether for OSX it would be easier to just assume Docker? We
> already know we have various functional limitations on OSX, so maybe it
> would simplify our release process if we didn't have to worry about
> creating this special artifact on OSX? I would guess that most OSX
> developers either already have or would be willing to install Docker.
>

If we require Docker on Mac to run, then it probably doesn't make sense to
enable Kudu tests by default on Mac on downstream projects such as Beam,
Flink, Flume, Spark, etc. I suspect the Kudu tests would usually be skipped.

Mike

Re: Binaries for embedded testing

Posted by Todd Lipcon <to...@cloudera.com.INVALID>.
On Wed, Jul 18, 2018 at 12:38 PM, Mike Percy <mp...@apache.org> wrote:

> So I was able to get a proof-of-concept working [1], although the script is
> a bit hacky. The hacky part is that I arbitrarily chose a few system
> modules not to package until the thing was willing to run. I was able to
> build Kudu on EL6 and run it on Ubuntu 16.04. I have not made it work on
> macOS yet... the commands for the rpath modifications to make it
> relocatable would be different but the effect should be similar: library
> paths relative to the binary.
>
> To get this proof-of-concept to a usable state, we still would need the
> following pieces:
>
> 1) Make the above script also work on macOS using install_name_tool
> and @loader_path per [2]
>

I wonder whether for OSX it would be easier to just assume Docker? We
already know we have various functional limitations on OSX, so maybe it
would simplify our release process if we didn't have to worry about
creating this special artifact on OSX? I would guess that most OSX
developers either already have or would be willing to install Docker.


> 2) Write a script to upload the resulting tarball to the right place on
> artifactory
> 3) Come up with a way to download the correct binary tarball and unpack it
> locally as part of the Maven test phase in dependent projects -- I think we
> can use ${os.detected.classifier} from
> https://github.com/trustin/os-maven-plugin for the system matching, I
> wonder if we need a custom Maven plugin to do the unpacking? Or maybe
> somehow we could use the ant-run plugin?
> 4) Move the MiniKuduCluster out of the kudu-client test dir and into its
> own Maven module so it's consumable by other projects
>
> I'm pretty busy this week and would be happy to get some help on this if
> anybody has cycles to spare.
>
> Mike
>
> [1] https://gist.github.com/mpercy/aee8d95c55f713615b90df5faad4bb99
> [2] https://blogs.oracle.com/dipol/dynamic-libraries,-rpath,-and-mac-os
>
>
> On Mon, Jul 9, 2018 at 11:36 AM Tim Robertson <ti...@gmail.com>
> wrote:
>
> > Slightly related...
> >
> > Such a build could in theory be easily turned into a docker image too using
> > the relatively new Jib tool from Google:
> >
> >
> > https://cloudplatform.googleblog.com/2018/07/introducing-jib-build-java-docker-images-better.html
> >
> >
> >
> >
> > On Fri, Jul 6, 2018 at 12:57 AM, Mike Percy <mp...@apache.org> wrote:
> >
> > > On Thu, Jul 5, 2018 at 1:34 PM Todd Lipcon <to...@cloudera.com.invalid>
> > > wrote:
> > >
> > > > In many cases, builds done on earlier systems are runnable on newer
> > > > systems. For example, el6 builds tend to run fine on el7 and ubuntu 14 in
> > > > my experience. If you also bundle the libssl and libcrypto, it seems that
> > > > an el6 build can also run on Ubuntu 16. So, assuming we could package more
> > > > than one binary in the artifact, packaging the shared libraries and setting
> > > > RPATH or LD_LIBRARY_PATH accordingly is probably a reasonable option to run
> > > > across most common Linux variants.
> > >
> > >
> > > Cool idea. We could do a fully dynamic build, ship all the dynamic deps
> > > with hacked rpaths, and likely save some space in the tarball along the
> > > way. It's worth trying out.
> > >
> > > Mike
> > >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Binaries for embedded testing

Posted by Mike Percy <mp...@apache.org>.
So I was able to get a proof-of-concept working [1], although the script is
a bit hacky. The hacky part is that I arbitrarily chose a few system
modules not to package until the thing was willing to run. I was able to
build Kudu on EL6 and run it on Ubuntu 16.04. I have not made it work on
macOS yet... the commands for the rpath modifications to make it
relocatable would be different but the effect should be similar: library
paths relative to the binary.

To get this proof-of-concept to a usable state, we still would need the
following pieces:

1) Make the above script also work on macOS using install_name_tool
and @loader_path per [2]
2) Write a script to upload the resulting tarball to the right place on
artifactory
3) Come up with a way to download the correct binary tarball and unpack it
locally as part of the Maven test phase in dependent projects -- I think we
can use ${os.detected.classifier} from
https://github.com/trustin/os-maven-plugin for the system matching, I
wonder if we need a custom Maven plugin to do the unpacking? Or maybe
somehow we could use the ant-run plugin?
4) Move the MiniKuduCluster out of the kudu-client test dir and into its
own Maven module so it's consumable by other projects
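
For the system matching in step 3, the Java sketch below shows roughly how a
value like ${os.detected.classifier} can be derived from the JVM's os.name and
os.arch properties. It is a simplified approximation for illustration only:
the class name OsClassifierSketch is hypothetical, and the real
os-maven-plugin recognizes many more platforms and architectures than the
few handled here.

```java
import java.util.Locale;

/** Hypothetical, simplified approximation of an "<os>-<arch>" classifier. */
final class OsClassifierSketch {

    /** Builds a classifier such as "linux-x86_64" from raw JVM property values. */
    static String classifier(String osName, String osArch) {
        return normalizeOs(osName) + "-" + normalizeArch(osArch);
    }

    /** Classifier for the JVM this code is running on. */
    static String current() {
        return classifier(System.getProperty("os.name"), System.getProperty("os.arch"));
    }

    static String normalizeOs(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("linux")) {
            return "linux";
        }
        if (os.startsWith("mac") || os.startsWith("osx")) {
            return "osx";
        }
        if (os.startsWith("windows")) {
            return "windows";
        }
        return "unknown";
    }

    static String normalizeArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("x86_64") || arch.equals("amd64")) {
            return "x86_64";
        }
        if (arch.equals("aarch64")) {
            return "aarch_64";
        }
        return "unknown";
    }
}
```

A test helper could use such a value to pick the matching binary artifact, or
to skip the test when no artifact exists for the platform.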

I'm pretty busy this week and would be happy to get some help on this if
anybody has cycles to spare.

Mike

[1] https://gist.github.com/mpercy/aee8d95c55f713615b90df5faad4bb99
[2] https://blogs.oracle.com/dipol/dynamic-libraries,-rpath,-and-mac-os


On Mon, Jul 9, 2018 at 11:36 AM Tim Robertson <ti...@gmail.com>
wrote:

> Slightly related...
>
> Such a build could in theory be easily turned into a docker image too using
> the relatively new Jib tool from Google:
>
>
> https://cloudplatform.googleblog.com/2018/07/introducing-jib-build-java-docker-images-better.html
>
>
>
>
> On Fri, Jul 6, 2018 at 12:57 AM, Mike Percy <mp...@apache.org> wrote:
>
> > On Thu, Jul 5, 2018 at 1:34 PM Todd Lipcon <to...@cloudera.com.invalid>
> > wrote:
> >
> > > In many cases, builds done on earlier systems are runnable on newer
> > > systems. For example, el6 builds tend to run fine on el7 and ubuntu 14 in
> > > my experience. If you also bundle the libssl and libcrypto, it seems that
> > > an el6 build can also run on Ubuntu 16. So, assuming we could package more
> > > than one binary in the artifact, packaging the shared libraries and setting
> > > RPATH or LD_LIBRARY_PATH accordingly is probably a reasonable option to run
> > > across most common Linux variants.
> >
> >
> > Cool idea. We could do a fully dynamic build, ship all the dynamic deps
> > with hacked rpaths, and likely save some space in the tarball along the
> > way. It's worth trying out.
> >
> > Mike
> >
>

Re: Binaries for embedded testing

Posted by Tim Robertson <ti...@gmail.com>.
Slightly related...

Such a build could in theory be easily turned into a docker image too using
the relatively new Jib tool from Google:

https://cloudplatform.googleblog.com/2018/07/introducing-jib-build-java-docker-images-better.html




On Fri, Jul 6, 2018 at 12:57 AM, Mike Percy <mp...@apache.org> wrote:

> On Thu, Jul 5, 2018 at 1:34 PM Todd Lipcon <to...@cloudera.com.invalid>
> wrote:
>
> > In many cases, builds done on earlier systems are runnable on newer
> > systems. For example, el6 builds tend to run fine on el7 and ubuntu 14 in
> > my experience. If you also bundle the libssl and libcrypto, it seems that
> > an el6 build can also run on Ubuntu 16. So, assuming we could package more
> > than one binary in the artifact, packaging the shared libraries and setting
> > RPATH or LD_LIBRARY_PATH accordingly is probably a reasonable option to run
> > across most common Linux variants.
>
>
> Cool idea. We could do a fully dynamic build, ship all the dynamic deps
> with hacked rpaths, and likely save some space in the tarball along the
> way. It's worth trying out.
>
> Mike
>

Re: Binaries for embedded testing

Posted by Mike Percy <mp...@apache.org>.
On Thu, Jul 5, 2018 at 1:34 PM Todd Lipcon <to...@cloudera.com.invalid>
wrote:

> In many cases, builds done on earlier systems are runnable on newer
> systems. For example, el6 builds tend to run fine on el7 and ubuntu 14 in
> my experience. If you also bundle the libssl and libcrypto, it seems that
> an el6 build can also run on Ubuntu 16. So, assuming we could package more
> than one binary in the artifact, packaging the shared libraries and setting
> RPATH or LD_LIBRARY_PATH accordingly is probably a reasonable option to run
> across most common Linux variants.


Cool idea. We could do a fully dynamic build, ship all the dynamic deps
with hacked rpaths, and likely save some space in the tarball along the
way. It's worth trying out.

Mike

Re: Binaries for embedded testing

Posted by Todd Lipcon <to...@cloudera.com.INVALID>.
On Mon, Jul 2, 2018 at 12:24 PM, Mike Percy <mp...@apache.org> wrote:

>
> So it's not viable to simply have a linux-x86_64 binary and a darwin-x86_64
> binary like protoc does, or even just ubuntu & redhat. We'll likely need a
> separate binary for every major OS version, i.e. RHEL 6, RHEL 7, trusty,
> xenial, bionic. I think people running non-LTS builds of Ubuntu, or SUSE or
> something, would be out of luck.
>

In many cases, builds done on earlier systems are runnable on newer
systems. For example, el6 builds tend to run fine on el7 and ubuntu 14 in
my experience. If you also bundle the libssl and libcrypto, it seems that
an el6 build can also run on Ubuntu 16. So, assuming we could package more
than one binary in the artifact, packaging the shared libraries and setting
RPATH or LD_LIBRARY_PATH accordingly is probably a reasonable option to run
across most common Linux variants.
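
As a rough illustration of the LD_LIBRARY_PATH variant, a Java test harness
could launch a bundled binary along the lines of the sketch below. The class
name BundledLibLauncher and the bin/lib layout are assumptions made for the
example, not anything defined by Kudu.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

/** Hypothetical launcher that points the dynamic linker at bundled shared libraries. */
final class BundledLibLauncher {

    /**
     * Builds (but does not start) a process for the given binary with libDir
     * prepended to LD_LIBRARY_PATH, preserving any value already set.
     */
    static ProcessBuilder command(Path binary, Path libDir, String... args) {
        String[] cmd = new String[args.length + 1];
        cmd[0] = binary.toString();
        System.arraycopy(args, 0, cmd, 1, args.length);

        ProcessBuilder pb = new ProcessBuilder(cmd);
        Map<String, String> env = pb.environment();
        String existing = env.get("LD_LIBRARY_PATH");
        String value = (existing == null || existing.isEmpty())
            ? libDir.toString()
            : libDir + File.pathSeparator + existing;
        env.put("LD_LIBRARY_PATH", value);
        return pb;
    }

    /** Example layout (an assumption): <root>/bin/kudu-tserver and <root>/lib. */
    static ProcessBuilder tserver(String root, String... args) {
        return command(Paths.get(root, "bin", "kudu-tserver"), Paths.get(root, "lib"), args);
    }
}
```

Setting an RPATH relative to the binary at build time avoids the environment
variable entirely, so this approach is mainly useful when the binaries cannot
be modified.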

-Todd


>
> On Sat, Jun 30, 2018 at 4:31 AM Tim Robertson <ti...@gmail.com>
> wrote:
>
> > > What do you mean by that?
> > Sorry, poor phrasing - currently the Beam project has the build path with
> > unit tests (no Docker there) and the project IT environment which can use
> > Docker.
> > A binary only approach could potentially be managed without adding a
> > dependency on Docker - but has other issues summarised below.
> >
> > > For Kudu-internal testing I think we could stick to running "kudu
> > minicluster
> > Yes.
> >
> > > ... external use cases, we could switch that to "docker run
> > kudu:minicluster:1.7.0"
> > I think this makes good sense.
> >
> >
> > In summary:
> >
> > 1) Fake a Kudu master in Java - difficult unless simplified, not
> > representative if simplified, code maintenance issue
> > 2) Mocking the Kudu client - verbose unless only covering simple scenarios
> > 3) Use mini cluster with binaries - portability challenge of binaries, need
> > to script caching the binaries / use of some repository, unfamiliar build
> > tasks with binary handling (unless built to work with something like
> > maven), could possibly see linking problems
> > 4) Docker - predictable, adds a dependency, existing Kudu images not
> > "managed" at the moment
> >
> > For Beam I think I will put most effort into IT which can use Docker or an
> > existing cluster and then mock a Java KuduClient for some basic sanity
> > tests for the build path.
> >
> > On Docker:
> > - to get current versions [e.g. 1] working I found I had to edit
> > /etc/hosts. I think the mini cluster version with the FakeDNS might avoid
> > that?
> > - Kudu docs currently encourage the Cloudera Quickstart VM over Docker
> > [2,3]
> >
> > Do you think the Kudu project could provide an image allowing "docker run
> > kudu:minicluster:1.x.x" as part of the release cycle?
> >
> > Thanks again,
> > Tim
> >
> > [1] https://github.com/MartinWeindel/kudu-docker
> > [2] https://kudu.apache.org/docs/quickstart.html#quickstart_vm
> > [3] https://github.com/cloudera/kudu-examples/wiki/Docker-based-tutorial
> >
> > On Sat, Jun 30, 2018 at 2:22 AM, Todd Lipcon <to...@cloudera.com.invalid>
> > wrote:
> >
> > > On Fri, Jun 29, 2018 at 1:23 PM, Tim Robertson <ti...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Mike, Todd - I greatly appreciate the inputs.
> > > >
> > > > > How many platforms would need to be supported for it to be viable for
> > > > Beam?
> > > > The minimal for it to be considered would probably(!) be ubuntu, centos,
> > > > osx. Incidentally it was actually the protobuf approach that made me
> > > > consider this.
> > > >
> > > > > What about depending on a docker container that runs the kudu
> > > > minicluster in
> > > > "host" networking mode?
> > > > I've also pondered this a little but like Attila raises it puts a lot of
> > > > burden for other project developers. Mmmm...
> > > >
> > >
> > > What do you mean by that? For Kudu-internal testing I think we could stick
> > > to running "kudu minicluster" as is. For external use cases, we could
> > > switch that to "docker run kudu:minicluster:1.7.0" or whatever, and it
> > > would auto-download from dockerhub as necessary, right?
> > >
> > >
> > > >
> > > > Ismaël (Beam PMC) has suggested I stick to mocking given the complexity of
> > > > the things I'm exploring.
> > > >
> > > > As another idea:
> > > > I briefly pondered writing a "FakeKudu Java server" - data held in memory,
> > > > no partitioning, protobuf messaging, handling table metadata, checking
> > > > schemas on write, predicate and projected columns for scan, faking kerberos
> > > > (if possible). It didn't seem particularly difficult to do but I fear a
> > > > maintenance burden for a small audience.
> > > >
> > > >
> > > Yea, I think that would be quite a maintenance burden, especially as new
> > > features are added over time. I suppose in many cases you could omit things
> > > or stub things out, but then the behavior will begin to differ and it won't
> > > really be that clear that your tests actually are representative.
> > >
> > >
> > > > Could utilities in Kudu that help folk test Java clients be of interest to
> > > > others? - e.g. preconfigured mock objects for various scenarios. If so, I'd
> > > > be happy to discuss options and offer PRs in Kudu.
> > > >
> > > > Thanks,
> > > > Tim
> > > >
> > > >
> > > > On Fri, Jun 29, 2018 at 9:34 PM, Todd Lipcon <todd@cloudera.com.invalid>
> > > > wrote:
> > > >
> > > > > On Fri, Jun 29, 2018 at 12:31 PM, Mike Percy <mp...@apache.org> wrote:
> > > > >
> > > > > > This is something I've been thinking about and toying with and I'd like to
> > > > > > see if we can't get binaries available via Maven for at least one platform
> > > > > > (say, RHEL 7). Similar to how protobuf does it.
> > > > > >
> > > > >
> > > > > What about depending on a docker container that runs the kudu minicluster
> > > > > in "host" networking mode? eg https://github.com/MartinWeindel/kudu-docker
> > > > > is one possibility
> > > > >
> > > > >
> > > > > > How many platforms would need to be supported for it to be viable for
> > > > > > Beam?
> > > > > >
> > > > > > Thanks,
> > > > > > Mike
> > > > > >
> > > > > > On Fri, Jun 29, 2018 at 10:01 AM Tim <ti...@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks Attila
> > > > > > >
> > > > > > > That’s great feedback and helpful for me to reference as guidance.
> > > > > > >
> > > > > > > By “Kudu installation” I was referring to the possibility that an install
> > > > > > > might set config etc, beyond just having the binary. I got it running on
> > > > > > > CentOS similar to how you outline now.
> > > > > > >
> > > > > > > I too believe mocking makes most sense, especially as we have the IT
> > > > > > > running as well, but was asked to explore this further. It’s useful to know
> > > > > > > you’d agree.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > Tim
> > > > > > >
> > > > > > > > On 29 Jun 2018, at 17:33, Attila Bukor <ab...@cloudera.com> wrote:
> > > > > > > >
> > > > > > > > Hi Tim,
> > > > > > > >
> > > > > > > > I’m not sure what you mean by relying on actual installations. If you
> > > > > > > > have the kudu, kudu-master and kudu-tserver binaries at the same location
> > > > > > > > and they can be executed, MiniKuduCluster can be used (“binDir” property
> > > > > > > > should be set to the directory containing the Kudu binaries). You should
> > > > > > > > also look into BaseKuduTest as that will set up the MiniKuduCluster for you
> > > > > > > > and you don’t have to do it manually.
> > > > > > > >
> > > > > > > > Extracting the Kudu binaries from an rpm should probably work, but that
> > > > > > > > binds you to CDH as currently Cloudera is the only one that ships Kudu
> > > > > > > > binaries and MacOS builds are not available anywhere afaik. Also, 1.4.0 is
> > > > > > > > around a year old, you might want to use this repository instead (from CDH
> > > > > > > > 5.13 Kudu is part of the CDH):
> > > > > > > > http://archive.cloudera.com/cdh5/redhat/7/x86_64/cdh/5/RPMS/x86_64/kudu-1.7.0+cdh5.15.0+0-1.cdh5.15.0.p0.52.el7.x86_64.rpm
> > > > > > > >
> > > > > > > > As a general suggestion, I would recommend mocking Kudu for unit tests
> > > > > > > > (that’s what a unit test is for after all) and create separate integration
> > > > > > > > tests that actually use Kudu that can be skipped where Kudu is not
> > > > > > > > available. Of course the CI should be set up to be able to provide all
> > > > > > > > necessary integrations for the tests, but a developer wouldn’t have to set
> > > > > > > > up Kudu, or use Docker to run the tests if their change doesn’t affect the
> > > > > > > > Kudu integration.
> > > > > > > >
> > > > > > > > Attila
> > > > > > > >
> > > > > > > >> On 2018. Jun 29., at 16:42, Tim Robertson <timrobertson100@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> Hi folks,
> > > > > > > >>
> > > > > > > >> I've written Java KuduIO for Apache Beam with integration tests making use
> > > > > > > >> of Kudu in Docker.  It is yet to be committed on Apache Beam.
> > > > > > > >>
> > > > > > > >> Rather than mocking Kudu client for unit tests I'd like to explore use of
> > > > > > > >> the MiniKuduCluster which "Depends on precompiled kudu, kudu-master, and
> > > > > > > >> kudu-tserver binaries".
> > > > > > > >>
> > > > > > > >> I'd need unit tests to run on the main linux distros and OS X.
> > > > > > > >>
> > > > > > > >> For the linux distros, would an approach where I extract the binaries from
> > > > > > > >> the packages [1] work please? Or does the MiniKuduCluster rely on actual
> > > > > > > >> installations? I am pretty weak on C builds and linked libraries etc (Java
> > > > > > > >> guy, sorry).
> > > > > > > >>
> > > > > > > >> For CentOS I'm exploring this for example:
> > > > > > > >>  rpm2cpio ./kudu-1.4.0+cdh5.12.2+0-1.cdh5.12.2.p0.8.el7.x86_64.rpm | cpio -idmv
> > > > > > > >>
> > > > > > > >> I haven't explored OS X options yet.
> > > > > > > >>
> > > > > > > >> Any advice here would greatly be appreciated to save me going down a dead
> > > > > > > >> end.
> > > > > > > >>
> > > > > > > >> Many thanks,
> > > > > > > >> Tim
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> [1] http://kudu.apache.org/docs/installation.html#install_packages
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Todd Lipcon
> > > > > Software Engineer, Cloudera
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
> > >
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
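
Editorial sketch, not part of any message above: Attila's suggestion in the
quoted thread (keep Kudu-backed integration tests separate, and skip them
where the kudu, kudu-master and kudu-tserver binaries are not available in one
directory) could be approximated with a small pre-flight check like the one
below. The class name KuduBinariesCheck and the KUDU_BIN_DIR environment
variable are hypothetical names chosen for illustration; they are not part of
the Kudu client, and how the directory is actually handed to MiniKuduCluster
is left out here.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Kudu): reports which of the binaries
// named in the thread are missing from a candidate directory.
final class KuduBinariesCheck {
    // Binary names as listed in the thread.
    static final String[] REQUIRED = {"kudu", "kudu-master", "kudu-tserver"};

    // Returns the names of required binaries that are absent or not executable.
    static List<String> missingBinaries(Path binDir) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            Path candidate = binDir.resolve(name);
            if (!Files.isRegularFile(candidate) || !Files.isExecutable(candidate)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // KUDU_BIN_DIR is an assumed, illustrative environment variable.
        String dir = System.getenv("KUDU_BIN_DIR");
        if (dir == null) {
            System.out.println("KUDU_BIN_DIR not set; skipping Kudu integration tests");
            return;
        }
        List<String> missing = missingBinaries(Paths.get(dir));
        if (missing.isEmpty()) {
            System.out.println("All Kudu binaries found in " + dir);
        } else {
            System.out.println("Skipping Kudu integration tests; missing: " + missing);
        }
    }
}
```

A test framework's skip mechanism could call missingBinaries() in its setup
step, so that developers without Kudu binaries still get a passing build while
CI, which provides them, runs the full integration suite.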