Posted to dev@pirk.apache.org by Ellison Anne Williams <ea...@gmail.com> on 2016/08/18 16:12:14 UTC

Distributed Tests Before Accepting PR

Hi Guys,

As a friendly public service announcement - please make sure that you run
the distributed test suite before you accept a PR (or at least, before
accepting a PR that touches anything affecting the tests).

Thanks!

Ellison Anne

Re: Distributed Tests Before Accepting PR

Posted by Tim Ellison <t....@gmail.com>.
On 19/08/16 14:20, Ellison Anne Williams wrote:
> Also, AWS and GCP are options - they won't integrate into travis, but they
> are a (relatively) easy way to run through the distributed test suite.
> 
> I had thought about posting some instructions (at some point) for 'How to
> Run the Pirk Distributed Tests on AWS/GCP/etc' to help new folks get up to
> speed quickly. Of course, AWS and GCP both have detailed instructions, but
> they take time to wade through. Would that be helpful?

Yes please!  I'm currently wading.  While I have used plenty of Spark
clusters on-prem and in IBM Bluemix, I'm not familiar with driving AWS/GCP.

May I suggest you jot some notes into the wiki [1], and I'll follow
behind and fill in details etc as I go?

[1] https://cwiki.apache.org/confluence/display/PIRK/Pirk+Home

Thanks,
Tim

> On Fri, Aug 19, 2016 at 9:09 AM, Darin Johnson <db...@gmail.com>
> wrote:
> 
>> I've built full integration tests with hadoop-minicluster before.  They're
>> a pain to setup but aren't bad to maintain once done and could be
>> integrated into travis-ci.
>>
>> On Fri, Aug 19, 2016 at 9:02 AM, Tim Ellison <t....@gmail.com>
>> wrote:
>>
>>> On 18/08/16 17:12, Ellison Anne Williams wrote:
>>>> As a friendly public service announcement - please make sure that you
>> run
>>>> the distributed test suite before you accept a PR (or at least, before
>>>> accepting a PR that touches anything affecting the tests).
>>>
>>> Mea culpa.
>>>
>>> My usual working practice is:
>>>  - hack, hack, hack
>>>  - run mvn clean test locally
>>>  - commit to new local branch
>>>  - push to my github fork
>>>  - wait until Travis declares it tested ok
>>>  - open the PR, expect the PR to pass the Travis checks
>>>
>>> Now I agree that I should also be doing the distributed tests; and even
>>> more so as I work my way up the Pirk stack into the distributed code.
>>>
>>> What I really want is the equivalent of a Travis check for the stuff I'm
>>> doing, and the PRs I'm reviewing.  Any thoughts about how we can achieve
>>> that as I try to figure out how I can run the distributed tests?
>>>
>>> Regards,
>>> Tim
>>>
>>>
>>>
>>
> 

Re: Distributed Tests Before Accepting PR

Posted by Suneel Marthi <su...@gmail.com>.
Other projects like Spark, Flink, and Oryx that are first-class Hadoop
citizens are set up to run integration tests that involve Kafka, spinning up
a mini Spark cluster, etc., and run them on Travis.

We can target that for the next release.

On Fri, Aug 19, 2016 at 9:20 AM, Ellison Anne Williams <
eawilliamspirk@gmail.com> wrote:

> Also, AWS and GCP are options - they won't integrate into travis, but they
> are a (relatively) easy way to run through the distributed test suite.
>
> I had thought about posting some instructions (at some point) for 'How to
> Run the Pirk Distributed Tests on AWS/GCP/etc' to help new folks get up to
> speed quickly. Of course, AWS and GCP both have detailed instructions, but
> they take time to wade through. Would that be helpful?
>
> On Fri, Aug 19, 2016 at 9:09 AM, Darin Johnson <db...@gmail.com>
> wrote:
>
> > I've built full integration tests with hadoop-minicluster before.
> They're
> > a pain to setup but aren't bad to maintain once done and could be
> > integrated into travis-ci.
> >
> > On Fri, Aug 19, 2016 at 9:02 AM, Tim Ellison <t....@gmail.com>
> > wrote:
> >
> > > On 18/08/16 17:12, Ellison Anne Williams wrote:
> > > > As a friendly public service announcement - please make sure that you
> > run
> > > > the distributed test suite before you accept a PR (or at least,
> before
> > > > accepting a PR that touches anything affecting the tests).
> > >
> > > Mea culpa.
> > >
> > > My usual working practice is:
> > >  - hack, hack, hack
> > >  - run mvn clean test locally
> > >  - commit to new local branch
> > >  - push to my github fork
> > >  - wait until Travis declares it tested ok
> > >  - open the PR, expect the PR to pass the Travis checks
> > >
> > > Now I agree that I should also be doing the distributed tests; and even
> > > more so as I work my way up the Pirk stack into the distributed code.
> > >
> > > What I really want is the equivalent of a Travis check for the stuff
> I'm
> > > doing, and the PRs I'm reviewing.  Any thoughts about how we can
> achieve
> > > that as I try to figure out how I can run the distributed tests?
> > >
> > > Regards,
> > > Tim
> > >
> > >
> > >
> >
>

Re: Distributed Tests Before Accepting PR

Posted by Tim Ellison <t....@gmail.com>.
On 23/08/16 14:51, Ellison Anne Williams wrote:
> The distributed tests are self contained except for letting the application
> know (1) where to find Elasticsearch and (2) setting spark.home for the
> SparkLauncher. For MapReduce distributed testing that reads from hdfs, it
> should work out-of-the-box on a correctly configured Hadoop cluster running
> YARN.

Do you recommend using the AWS EMR clusters, or setting up Hadoop/YARN
myself on regular EC2 instances?

I created some EMR clusters, and after bringing them up/down a few times
learned something about the Amazon pricing model ;-)  I also have a
Bluemix Hadoop cluster.

It would seem that using EMR I'll need to upload the Pirk exe-jar to run
it as a step.  What I was hoping to do was run Hadoop locally as the
master, and just use the cluster for the slave nodes - does that make sense?
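For reference, submitting the exe-jar as an EMR step from the CLI looks
roughly like the sketch below. The bucket, cluster id, and jar name are
placeholders, not values from this thread:

```
# Upload the Pirk exe-jar and run it as an EMR step (illustrative names).
aws s3 cp target/pirk-exe.jar s3://my-bucket/pirk-exe.jar
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --steps Type=CUSTOM_JAR,Name=PirkDistributedTests,\
ActionOnFailure=CONTINUE,Jar=s3://my-bucket/pirk-exe.jar
```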

> For Elasticsearch, you need to set 'es.nodes' and 'es.port' in the
> pirk.properties file packaged in the jar. This is actually a bug in that it
> should be able to be set from a local properties file -- I have entered a
> JIRA and will fix it shortly.
> 
> For Spark, the DistributedTestSuite uses SparkLauncher, which needs to
> know where to find the bin directory containing the spark-submit script.
> The 'spark.home' property in the pirk.properties file defines this
> location. Typically, it is '/usr', but (FYI) I have needed to put the full
> path to the 'original' install location in the Cloudera distribution that I
> have been using lately.

Again, this sounds like you are running the driver locally, and the
executors remotely, correct?

> I will update the testing webpage to reflect the properties that need to be
> set for the various distributed tests. Thanks for pointing this out!
> 
> As time allows, I can add primers for AWS and GCP (specifically).

Thanks. It is important that we do distributed testing, so any pointers
would be appreciated.

Regards,
Tim


> On Tue, Aug 23, 2016 at 9:19 AM, Tim Ellison <t....@gmail.com> wrote:
> 
>> On 19/08/16 14:20, Ellison Anne Williams wrote:
>>> Also, AWS and GCP are options - they won't integrate into travis, but
>> they
>>> are a (relatively) easy way to run through the distributed test suite.
>>>
>>> I had thought about posting some instructions (at some point) for 'How to
>>> Run the Pirk Distributed Tests on AWS/GCP/etc' to help new folks get up
>> to
>>> speed quickly. Of course, AWS and GCP both have detailed instructions,
>> but
>>> they take time to wade through. Would that be helpful?
>>
>> I've had a play with AWS and got a Spark cluster defined and started --
>> but our instructions for running Pirk distributed tests [1] don't really
>> give enough information on how to send it work to do.
>>
>> It'll take me a while to figure it out from the code, so if you can
>> share any properties etc that would be helpful.
>>
>> [1] http://pirk.incubator.apache.org/for_developers#testing
>>
>> Thanks,
>> Tim
>>
>>
>>> On Fri, Aug 19, 2016 at 9:09 AM, Darin Johnson <db...@gmail.com>
>>> wrote:
>>>
>>>> I've built full integration tests with hadoop-minicluster before.
>> They're
>>>> a pain to setup but aren't bad to maintain once done and could be
>>>> integrated into travis-ci.
>>>>
>>>> On Fri, Aug 19, 2016 at 9:02 AM, Tim Ellison <t....@gmail.com>
>>>> wrote:
>>>>
>>>>> On 18/08/16 17:12, Ellison Anne Williams wrote:
>>>>>> As a friendly public service announcement - please make sure that you
>>>> run
>>>>>> the distributed test suite before you accept a PR (or at least, before
>>>>>> accepting a PR that touches anything affecting the tests).
>>>>>
>>>>> Mea culpa.
>>>>>
>>>>> My usual working practice is:
>>>>>  - hack, hack, hack
>>>>>  - run mvn clean test locally
>>>>>  - commit to new local branch
>>>>>  - push to my github fork
>>>>>  - wait until Travis declares it tested ok
>>>>>  - open the PR, expect the PR to pass the Travis checks
>>>>>
>>>>> Now I agree that I should also be doing the distributed tests; and even
>>>>> more so as I work my way up the Pirk stack into the distributed code.
>>>>>
>>>>> What I really want is the equivalent of a Travis check for the stuff
>> I'm
>>>>> doing, and the PRs I'm reviewing.  Any thoughts about how we can
>> achieve
>>>>> that as I try to figure out how I can run the distributed tests?
>>>>>
>>>>> Regards,
>>>>> Tim
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
> 

Re: Distributed Tests Before Accepting PR

Posted by Ellison Anne Williams <ea...@gmail.com>.
The distributed tests are self-contained except for letting the application
know (1) where to find Elasticsearch and (2) setting spark.home for the
SparkLauncher. For MapReduce distributed testing that reads from HDFS, it
should work out of the box on a correctly configured Hadoop cluster running
YARN.

For Elasticsearch, you need to set 'es.nodes' and 'es.port' in the
pirk.properties file packaged in the jar. This is actually a bug, in that
they should be able to be set from a local properties file -- I have entered
a JIRA and will fix it shortly.
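As a hypothetical sketch of the local-override behaviour described above
(not Pirk's actual implementation): defaults come from the packaged
properties, and a local file, if present, overrides them. All names and
values here are illustrative.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class LocalPropsOverride {
    // Merge packaged defaults with an optional local properties file;
    // entries in the local file win over the packaged values.
    static Properties load(Properties packaged, Path localFile) throws Exception {
        Properties merged = new Properties();
        merged.putAll(packaged);                      // jar defaults first
        if (Files.exists(localFile)) {
            try (FileReader in = new FileReader(localFile.toFile())) {
                merged.load(in);                      // local values win
            }
        }
        return merged;
    }

    public static void main(String[] args) throws Exception {
        Properties packaged = new Properties();
        packaged.setProperty("es.nodes", "localhost");
        packaged.setProperty("es.port", "9200");

        Path local = Files.createTempFile("pirk-local", ".properties");
        try (FileWriter out = new FileWriter(local.toFile())) {
            out.write("es.nodes=es-master.example.com\n");
        }

        Properties p = load(packaged, local);
        // Local file overrides es.nodes; es.port keeps the packaged default.
        System.out.println(p.getProperty("es.nodes") + ":" + p.getProperty("es.port"));
        // prints es-master.example.com:9200
    }
}
```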

For Spark, the DistributedTestSuite uses SparkLauncher, which needs to
know where to find the bin directory containing the spark-submit script.
The 'spark.home' property in the pirk.properties file defines this
location. Typically, it is '/usr', but (FYI) I have needed to put the full
path to the 'original' install location in the Cloudera distribution that I
have been using lately.
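Concretely, the three properties mentioned above would look something like
this in pirk.properties (the hostnames are illustrative; only the property
names come from this thread):

```
# Elasticsearch endpoints for the ES distributed tests
es.nodes=es-node-1.example.com,es-node-2.example.com
es.port=9200

# Directory containing bin/spark-submit, used by SparkLauncher
spark.home=/usr
```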

I will update the testing webpage to reflect the properties that need to be
set for the various distributed tests. Thanks for pointing this out!

As time allows, I can add primers for AWS and GCP (specifically).



On Tue, Aug 23, 2016 at 9:19 AM, Tim Ellison <t....@gmail.com> wrote:

> On 19/08/16 14:20, Ellison Anne Williams wrote:
> > Also, AWS and GCP are options - they won't integrate into travis, but
> they
> > are a (relatively) easy way to run through the distributed test suite.
> >
> > I had thought about posting some instructions (at some point) for 'How to
> > Run the Pirk Distributed Tests on AWS/GCP/etc' to help new folks get up
> to
> > speed quickly. Of course, AWS and GCP both have detailed instructions,
> but
> > they take time to wade through. Would that be helpful?
>
> I've had a play with AWS and got a Spark cluster defined and started --
> but our instructions for running Pirk distributed tests [1] don't really
> give enough information on how to send it work to do.
>
> It'll take me a while to figure it out from the code, so if you can
> share any properties etc that would be helpful.
>
> [1] http://pirk.incubator.apache.org/for_developers#testing
>
> Thanks,
> Tim
>
>
> > On Fri, Aug 19, 2016 at 9:09 AM, Darin Johnson <db...@gmail.com>
> > wrote:
> >
> >> I've built full integration tests with hadoop-minicluster before.
> They're
> >> a pain to setup but aren't bad to maintain once done and could be
> >> integrated into travis-ci.
> >>
> >> On Fri, Aug 19, 2016 at 9:02 AM, Tim Ellison <t....@gmail.com>
> >> wrote:
> >>
> >>> On 18/08/16 17:12, Ellison Anne Williams wrote:
> >>>> As a friendly public service announcement - please make sure that you
> >> run
> >>>> the distributed test suite before you accept a PR (or at least, before
> >>>> accepting a PR that touches anything affecting the tests).
> >>>
> >>> Mea culpa.
> >>>
> >>> My usual working practice is:
> >>>  - hack, hack, hack
> >>>  - run mvn clean test locally
> >>>  - commit to new local branch
> >>>  - push to my github fork
> >>>  - wait until Travis declares it tested ok
> >>>  - open the PR, expect the PR to pass the Travis checks
> >>>
> >>> Now I agree that I should also be doing the distributed tests; and even
> >>> more so as I work my way up the Pirk stack into the distributed code.
> >>>
> >>> What I really want is the equivalent of a Travis check for the stuff
> I'm
> >>> doing, and the PRs I'm reviewing.  Any thoughts about how we can
> achieve
> >>> that as I try to figure out how I can run the distributed tests?
> >>>
> >>> Regards,
> >>> Tim
> >>>
> >>>
> >>>
> >>
> >
>

Re: Distributed Tests Before Accepting PR

Posted by Tim Ellison <t....@gmail.com>.
On 19/08/16 14:20, Ellison Anne Williams wrote:
> Also, AWS and GCP are options - they won't integrate into travis, but they
> are a (relatively) easy way to run through the distributed test suite.
> 
> I had thought about posting some instructions (at some point) for 'How to
> Run the Pirk Distributed Tests on AWS/GCP/etc' to help new folks get up to
> speed quickly. Of course, AWS and GCP both have detailed instructions, but
> they take time to wade through. Would that be helpful?

I've had a play with AWS and got a Spark cluster defined and started --
but our instructions for running Pirk distributed tests [1] don't really
give enough information on how to send it work to do.

It'll take me a while to figure it out from the code, so if you can
share any properties etc that would be helpful.

[1] http://pirk.incubator.apache.org/for_developers#testing

Thanks,
Tim


> On Fri, Aug 19, 2016 at 9:09 AM, Darin Johnson <db...@gmail.com>
> wrote:
> 
>> I've built full integration tests with hadoop-minicluster before.  They're
>> a pain to setup but aren't bad to maintain once done and could be
>> integrated into travis-ci.
>>
>> On Fri, Aug 19, 2016 at 9:02 AM, Tim Ellison <t....@gmail.com>
>> wrote:
>>
>>> On 18/08/16 17:12, Ellison Anne Williams wrote:
>>>> As a friendly public service announcement - please make sure that you
>> run
>>>> the distributed test suite before you accept a PR (or at least, before
>>>> accepting a PR that touches anything affecting the tests).
>>>
>>> Mea culpa.
>>>
>>> My usual working practice is:
>>>  - hack, hack, hack
>>>  - run mvn clean test locally
>>>  - commit to new local branch
>>>  - push to my github fork
>>>  - wait until Travis declares it tested ok
>>>  - open the PR, expect the PR to pass the Travis checks
>>>
>>> Now I agree that I should also be doing the distributed tests; and even
>>> more so as I work my way up the Pirk stack into the distributed code.
>>>
>>> What I really want is the equivalent of a Travis check for the stuff I'm
>>> doing, and the PRs I'm reviewing.  Any thoughts about how we can achieve
>>> that as I try to figure out how I can run the distributed tests?
>>>
>>> Regards,
>>> Tim
>>>
>>>
>>>
>>
> 

Re: Distributed Tests Before Accepting PR

Posted by Ellison Anne Williams <ea...@gmail.com>.
Also, AWS and GCP are options - they won't integrate into Travis, but they
are a (relatively) easy way to run through the distributed test suite.

I had thought about posting some instructions (at some point) for 'How to
Run the Pirk Distributed Tests on AWS/GCP/etc' to help new folks get up to
speed quickly. Of course, AWS and GCP both have detailed instructions, but
they take time to wade through. Would that be helpful?

On Fri, Aug 19, 2016 at 9:09 AM, Darin Johnson <db...@gmail.com>
wrote:

> I've built full integration tests with hadoop-minicluster before.  They're
> a pain to setup but aren't bad to maintain once done and could be
> integrated into travis-ci.
>
> On Fri, Aug 19, 2016 at 9:02 AM, Tim Ellison <t....@gmail.com>
> wrote:
>
> > On 18/08/16 17:12, Ellison Anne Williams wrote:
> > > As a friendly public service announcement - please make sure that you
> run
> > > the distributed test suite before you accept a PR (or at least, before
> > > accepting a PR that touches anything affecting the tests).
> >
> > Mea culpa.
> >
> > My usual working practice is:
> >  - hack, hack, hack
> >  - run mvn clean test locally
> >  - commit to new local branch
> >  - push to my github fork
> >  - wait until Travis declares it tested ok
> >  - open the PR, expect the PR to pass the Travis checks
> >
> > Now I agree that I should also be doing the distributed tests; and even
> > more so as I work my way up the Pirk stack into the distributed code.
> >
> > What I really want is the equivalent of a Travis check for the stuff I'm
> > doing, and the PRs I'm reviewing.  Any thoughts about how we can achieve
> > that as I try to figure out how I can run the distributed tests?
> >
> > Regards,
> > Tim
> >
> >
> >
>

Re: Distributed Tests Before Accepting PR

Posted by Darin Johnson <db...@gmail.com>.
I've built full integration tests with hadoop-minicluster before.  They're
a pain to set up but aren't bad to maintain once done, and could be
integrated into travis-ci.
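For anyone wanting to try this, such a test typically just needs the
hadoop-minicluster artifact on the test classpath (the version shown is
illustrative), with a JUnit test spinning up an
org.apache.hadoop.hdfs.MiniDFSCluster via its Builder:

```
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-minicluster</artifactId>
  <version>2.7.2</version>
  <scope>test</scope>
</dependency>
```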

On Fri, Aug 19, 2016 at 9:02 AM, Tim Ellison <t....@gmail.com> wrote:

> On 18/08/16 17:12, Ellison Anne Williams wrote:
> > As a friendly public service announcement - please make sure that you run
> > the distributed test suite before you accept a PR (or at least, before
> > accepting a PR that touches anything affecting the tests).
>
> Mea culpa.
>
> My usual working practice is:
>  - hack, hack, hack
>  - run mvn clean test locally
>  - commit to new local branch
>  - push to my github fork
>  - wait until Travis declares it tested ok
>  - open the PR, expect the PR to pass the Travis checks
>
> Now I agree that I should also be doing the distributed tests; and even
> more so as I work my way up the Pirk stack into the distributed code.
>
> What I really want is the equivalent of a Travis check for the stuff I'm
> doing, and the PRs I'm reviewing.  Any thoughts about how we can achieve
> that as I try to figure out how I can run the distributed tests?
>
> Regards,
> Tim
>
>
>

Re: Distributed Tests Before Accepting PR

Posted by Tim Ellison <t....@gmail.com>.
On 18/08/16 17:12, Ellison Anne Williams wrote:
> As a friendly public service announcement - please make sure that you run
> the distributed test suite before you accept a PR (or at least, before
> accepting a PR that touches anything affecting the tests).

Mea culpa.

My usual working practice is:
 - hack, hack, hack
 - run mvn clean test locally
 - commit to new local branch
 - push to my github fork
 - wait until Travis declares it tested ok
 - open the PR, expect the PR to pass the Travis checks
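The steps above, as commands (the fork remote and branch names are
illustrative):

```
git checkout -b my-feature      # new local branch
# ...hack, hack, hack...
mvn clean test                  # run the local test suite
git commit -am "My change"
git push myfork my-feature      # push to the GitHub fork; Travis runs there
# ...then open the PR once Travis reports green
```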

Now I agree that I should also be doing the distributed tests; and even
more so as I work my way up the Pirk stack into the distributed code.

What I really want is the equivalent of a Travis check for the stuff I'm
doing, and the PRs I'm reviewing.  Any thoughts about how we can achieve
that as I try to figure out how I can run the distributed tests?

Regards,
Tim



Re: Distributed Tests Before Accepting PR

Posted by Ellison Anne Williams <ea...@gmail.com>.
No worries! :)

On Thu, Aug 18, 2016 at 12:16 PM, Suneel Marthi <su...@gmail.com>
wrote:

> Apologies for this.
> Reverting now.
>
> On Thu, Aug 18, 2016 at 12:12 PM, Ellison Anne Williams <
> eawilliamspirk@gmail.com> wrote:
>
> > Hi Guys,
> >
> > As a friendly public service announcement - please make sure that you run
> > the distributed test suite before you accept a PR (or at least, before
> > accepting a PR that touches anything affecting the tests).
> >
> > Thanks!
> >
> > Ellison Anne
> >
>

Re: Distributed Tests Before Accepting PR

Posted by Suneel Marthi <su...@gmail.com>.
Apologies for this.
Reverting now.

On Thu, Aug 18, 2016 at 12:12 PM, Ellison Anne Williams <
eawilliamspirk@gmail.com> wrote:

> Hi Guys,
>
> As a friendly public service announcement - please make sure that you run
> the distributed test suite before you accept a PR (or at least, before
> accepting a PR that touches anything affecting the tests).
>
> Thanks!
>
> Ellison Anne
>