Posted to dev@pirk.apache.org by Tim Ellison <t....@gmail.com> on 2016/09/14 12:06:18 UTC

Next short term goal?

I'm reluctant to dive into more Pirk dev at the moment because

 (i) there was plenty of discussion around the release about
restructuring the code base into a number of modules.  I figure that
will be a disruptive change best undertaken with most folks "out of the
pool".

 (ii) I have failed to set up a reliable distributed testing system for
myself, and wholeheartedly agree that distrib testing is important for
all devs.  Hopefully I can fix that as improved instructions come along,
though perhaps we should be approaching infra to get a shared Pirk
cluster available for testing?

 (iii) other stuff gets in the way :-)

I realise that these are within my powers to address, and I'm still keen
to participate in moving Pirk forward; so what is the next hill to
conquer for the project?

Regards,
Tim

Re: Next short term goal?

Posted by Andy LoPresto <al...@apache.org>.
I would be very excited to participate (as an audience member) in Pirk 101.

Also, a lesson learned from NiFi — tagging Jira tickets with “beginner” (when appropriate — not “beginner if you have 8 years of FBP experience”) helps new community members find small, insulated pieces they can contribute to without causing any harm and gets them invested and feeling accomplished with a low barrier to entry.

Andy LoPresto
alopresto@apache.org
alopresto.apache@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Sep 15, 2016, at 1:49 PM, Ellison Anne Williams <ea...@apache.org> wrote:
> 
> Yes, I put in for ApacheCon - haven't heard anything back yet...
> 
> Will give some thought to doing a Pirk 101 video or something similar :)


Re: Next short term goal?

Posted by Ellison Anne Williams <ea...@apache.org>.
Yes, I put in for ApacheCon - haven't heard anything back yet...

Will give some thought to doing a Pirk 101 video or something similar :)

On Thu, Sep 15, 2016 at 9:05 AM, Tim Ellison <t....@gmail.com> wrote:

> On 14/09/16 14:08, Darin Johnson wrote:
> > +1 to pirk 101.
>
> Ellison Anne, did you put in a conference proposal?  A good approach to
> "pirk 101" would be recording an introductory presentation video and
> linking to that from the project website.
>
> I've read the code enough to see what Pirk is doing, but need some
> additional theory background before I'd feel confident pitching it myself.
>
> Regards,
> Tim

Re: Next short term goal?

Posted by Tim Ellison <t....@gmail.com>.
On 14/09/16 14:08, Darin Johnson wrote:
> +1 to pirk 101.

Ellison Anne, did you put in a conference proposal?  A good approach to
"pirk 101" would be recording an introductory presentation video and
linking to that from the project website.

I've read the code enough to see what Pirk is doing, but need some
additional theory background before I'd feel confident pitching it myself.

Regards,
Tim


Re: Next short term goal?

Posted by Darin Johnson <db...@gmail.com>.
+1 to pirk 101.

On Sep 14, 2016 8:59 AM, "Suneel Marthi" <sm...@apache.org> wrote:

> Once I am back home from Berlin next week, I am looking at working on
> Responder-Flink. Flink's awesome streaming capabilities (it's not mini-batch
> like the other hyped ones) make Flink the perfect distributed engine to
> implement the Wideskies algorithm.
>
> It would help for Pirk-noobs like me and others if there was a Pirk 101
> (via Google hangout maybe).

Re: Next short term goal?

Posted by Suneel Marthi <sm...@apache.org>.
Once I am back home from Berlin next week, I am looking at working on
Responder-Flink. Flink's awesome streaming capabilities (it's not mini-batch
like the other hyped ones) make Flink the perfect distributed engine to
implement the Wideskies algorithm.

It would help for Pirk-noobs like me and others if there was a Pirk 101
(via Google hangout maybe).

On Wed, Sep 14, 2016 at 2:55 PM, Ellison Anne Williams <
eawilliams@apache.org> wrote:

> Darin - That would be great!
>
> Sounds like it will take a bit of time to get it together.
>
> In the meantime/very near term, we could provide step-by-step
> AWS/GCP/Azure instructions for bringing up a small cluster, running the
> distributed tests, and debugging. Admittedly, most of this is handled in
> the AWS/GCP/Azure documentation, but, in my experience, the documentation
> is confusing and very time consuming to get through the first time.
>
> The goal is for smart people who want to contribute to Pirk (who don't work
> on these systems often) to be able to get up and running quickly to test
> their improvements and additions to the codebase.

Re: Distributed testing on AWS (was: Re: Next short term goal?)

Posted by Jacob Wilder <ja...@gmail.com>.
I put up my instructions for GCP and AWS on this page: https://pirk.incubator.apache.org/cloud_instructions
I also have prototype instructions for Azure but their HDInsight platform doesn’t yet support Java 8. 

Not everything works completely right but it is a start. 

On 9/15/16, 09:01, "Tim Ellison" <t....@gmail.com> wrote:

    On 14/09/16 13:55, Ellison Anne Williams wrote:
    > In the meantime/very near term, we could provide step-by-step
    > AWS/GCP/Azure instructions for bringing up a small cluster, running the
    > distributed tests, and debugging. Admittedly, most of this is handled in
    > the AWS/GCP/Azure documentation, but, in my experience, the documentation
    > is confusing and very time consuming to get through the first time.
    
    So do you advise running bare VMs and installing Hadoop, or running the
    AWS Elastic Map Reduce service?
    



Distributed testing on AWS (was: Re: Next short term goal?)

Posted by Tim Ellison <t....@gmail.com>.
On 14/09/16 13:55, Ellison Anne Williams wrote:
> In the meantime/very near term, we could provide step-by-step
> AWS/GCP/Azure instructions for bringing up a small cluster, running the
> distributed tests, and debugging. Admittedly, most of this is handled in
> the AWS/GCP/Azure documentation, but, in my experience, the documentation
> is confusing and very time consuming to get through the first time.

So do you advise running bare VMs and installing Hadoop, or running the
AWS Elastic Map Reduce service?

Here's where I've been going so far, but don't want to start a wiki
entry with instructions if this is the wrong approach altogether...

  - Sign up for an AWS account.
 	https://aws.amazon.com

  - Obtain access keys
 	https://console.aws.amazon.com/iam

  - Install aws command-line tool
 	https://aws.amazon.com/cli

  - Configure aws tool
 Choose a default region in the EMR group
http://docs.aws.amazon.com/general/latest/gr/rande.html#emr_region

 $ aws configure
 AWS Access Key ID [None]: AKIAI44QH8DHBEXAMPLE
 AWS Secret Access Key [None]: je7MtGbClwBF/2Zp9Utk/h3yCo8nvbEXAMPLEKEY
 Default region name [None]: eu-west-1
 Default output format [None]: text

  - Create an EC2 key pair, and download e.g. "SparkClusterKeys.pem".

  - Create a Spark cluster

 $ aws emr create-cluster \
   --name "Spark Cluster" \
   --release-label emr-5.0.0 \
   --applications Name=Spark \
   --ec2-attributes KeyName=SparkClusterKeys \
   --instance-type m3.xlarge \
   --instance-count 3 \
   --use-default-roles

 This returns a cluster ID, e.g. j-3KVTXXXXXX7UG

  - Upload the JAR file and run the distributed tests

 $ aws emr put --cluster-id j-3KVTXXXXXX7UG \
   --key-pair-file SparkClusterKeys.pem \
   --src apache-pirk-0.0.1-SNAPSHOT-exe.jar
 $ aws emr ssh --cluster-id j-3KVTXXXXXX7UG \
   --key-pair-file SparkClusterKeys.pem \
   --command "hadoop jar <pirkJar> org.apache.pirk.test.distributed.DistributedTestDriver -j <full path to pirkJar>"

  - Terminate cluster

 $ aws emr terminate-clusters --cluster-ids j-3KVTXXXXXX7UG
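
The manual steps above could be strung together roughly as follows. This is only a sketch under assumptions, not a tested recipe: the variable names, the DRY_RUN switch, the /home/hadoop destination path, and the use of "aws emr wait cluster-running" are additions that do not appear in the steps above. By default it only prints the commands, so nothing is created (or billed) unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: create an EMR cluster, run the Pirk distributed tests, terminate.
set -eu

DRY_RUN="${DRY_RUN:-1}"                    # 1 = print commands only (default)
KEY_NAME="${KEY_NAME:-SparkClusterKeys}"   # EC2 key pair name from the steps above
KEY_FILE="${KEY_FILE:-SparkClusterKeys.pem}"
PIRK_JAR="${PIRK_JAR:-apache-pirk-0.0.1-SNAPSHOT-exe.jar}"
REMOTE_JAR="/home/hadoop/${PIRK_JAR}"      # assumption: upload destination on the master node

# Log the command to stderr; execute it only when DRY_RUN=0.
run() {
  echo "+ $*" >&2
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Create the cluster and capture its ID (empty in dry-run mode).
CLUSTER_ID="$(run aws emr create-cluster \
  --name "Spark Cluster" \
  --release-label emr-5.0.0 \
  --applications Name=Spark \
  --ec2-attributes KeyName="$KEY_NAME" \
  --instance-type m3.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --query ClusterId --output text)"
CLUSTER_ID="${CLUSTER_ID:-j-DRYRUN}"       # placeholder ID for dry runs

# Always terminate the cluster on exit, so a failed test run does not keep billing.
trap 'run aws emr terminate-clusters --cluster-ids "$CLUSTER_ID"' EXIT

# Block until the cluster is ready to accept work.
run aws emr wait cluster-running --cluster-id "$CLUSTER_ID"

# Upload the JAR, then launch the distributed test driver on the master node.
run aws emr put --cluster-id "$CLUSTER_ID" --key-pair-file "$KEY_FILE" \
  --src "$PIRK_JAR" --dest "$REMOTE_JAR"
run aws emr ssh --cluster-id "$CLUSTER_ID" --key-pair-file "$KEY_FILE" \
  --command "hadoop jar ${REMOTE_JAR} org.apache.pirk.test.distributed.DistributedTestDriver -j ${REMOTE_JAR}"
```

Terminating from an exit trap is a deliberate choice given the per-hour charges: the cluster goes away even if the upload or the test run fails part-way through.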


Looking at the charges per hour, I think there may be a better way...

Regards,
Tim

Re: Next short term goal?

Posted by Darin Johnson <db...@gmail.com>.
That'd be great, it'll help me with both setups.

On Sep 14, 2016 8:55 AM, "Ellison Anne Williams" <ea...@apache.org>
wrote:

> Darin - That would be great!
>
> Sounds like it will take a bit of time to get it together.
>
> In the meantime/very near term, we could provide step-by-step
> AWS/GCP/Azure instructions for bringing up a small cluster, running the
> distributed tests, and debugging. Admittedly, most of this is handled in
> the AWS/GCP/Azure documentation, but, in my experience, the documentation
> is confusing and very time consuming to get through the first time.
>
> The goal is for smart people who want to contribute to Pirk (who don't work
> on these systems often) to be able to get up and running quickly to test
> their improvements and additions to the codebase.

Re: Next short term goal?

Posted by Ellison Anne Williams <ea...@apache.org>.
Darin - That would be great!

Sounds like it will take a bit of time to get it together.

In the meantime/very near term, we could provide step-by-step
AWS/GCP/Azure instructions for bringing up a small cluster, running the
distributed tests, and debugging. Admittedly, most of this is handled in
the AWS/GCP/Azure documentation, but, in my experience, the documentation
is confusing and very time consuming to get through the first time.

The goal is for smart people who want to contribute to Pirk (who don't work
on these systems often) to be able to get up and running quickly to test
their improvements and additions to the codebase.

On Wed, Sep 14, 2016 at 8:46 AM, Darin Johnson <db...@gmail.com>
wrote:

> Regarding (ii), I think docker-compose might help with distributed testing
> for development, and Terraform scripts with benchmark work in AWS/GCE.  I
> might be doing this soon - if interested, I can share both.

Re: Next short term goal?

Posted by Darin Johnson <db...@gmail.com>.
Regarding (ii), I think docker-compose might help with distributed testing
for development, and Terraform scripts with benchmark work in AWS/GCE.  I
might be doing this soon - if interested, I can share both.

On Sep 14, 2016 8:06 AM, "Tim Ellison" <t....@gmail.com> wrote:

> I'm reluctant to dive into more Pirk dev at the moment because
>
>  (i) there was plenty of discussion around the release about
> restructuring the code base into a number of modules.  I figure that
> will be a disruptive change best undertaken with most folks "out of the
> pool".
>
>  (ii) I have failed to set up a reliable distributed testing system for
> myself, and wholeheartedly agree that distrib testing is important for
> all devs.  Hopefully I can fix that as improved instructions come along,
> though perhaps we should be approaching infra to get a shared Pirk
> cluster available for testing?
>
>  (iii) other stuff gets in the way :-)
>
> I realise that these are within my powers to address, and I'm still keen
> to participate in moving Pirk forward; so what is the next hill to
> conquer for the project?
>
> Regards,
> Tim
>

Re: Next short term goal?

Posted by Ellison Anne Williams <ea...@apache.org>.
Hi Guys,

A few comments, presented topically:

Submodule Re-Factor:

My understanding is that Darin is working on the initial submodule
re-factor and design; as a first step, he proposed abstracting the
Responder to a ResponderLauncher interface (which I think makes lots of
sense - see the discussion thread over this past weekend).

*** Darin, can you describe the approach that you have in mind for the
submodule refactor so that we can discuss?


Distributed Testing:

Tim -- You are such a valuable committer to the project, I certainly don't
want you to be deterred by a lack of immediate distributed testing know-how!

Let's prioritize getting the instructions together for AWS/GCP/Azure.
Perhaps we can shoot to have those completed and posted on the website by
the end of the weekend. I just added a JIRA issue, PIRK-64, to this effect.
Would that help?

Do AWS/GCP/Azure give any credits to Apache projects for development and
testing? Has anyone walked down this road before?

Releasing:

I would like for us to release once a month. Our last release was on 8/29, so
let's shoot for another minor release at the end of September. Following
semantic versioning, would folks prefer that our September release be 0.1.1
or 0.2.0? If we can complete the submodule refactor in time, it should
definitely (IMO) be 0.2.0 - otherwise, I think that we could argue for
0.1.1. Thoughts?
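
To make the two options concrete, here is a small sketch of the semantic-versioning arithmetic. It assumes the previous release was 0.1.0, which is implied by the choices above but not stated in this thread; the bump helper is hypothetical and not part of the Pirk build.

```shell
#!/bin/sh
# Sketch: compute the next version from a MAJOR.MINOR.PATCH string.
bump() {
  part="$1"; version="$2"
  major="${version%%.*}"      # text before the first dot
  rest="${version#*.}"        # text after the first dot
  minor="${rest%%.*}"
  patch="${rest#*.}"
  case "$part" in
    minor) echo "${major}.$((minor + 1)).0" ;;          # new functionality, patch resets to 0
    patch) echo "${major}.${minor}.$((patch + 1))" ;;   # bug fixes only
    *) echo "usage: bump minor|patch X.Y.Z" >&2; return 1 ;;
  esac
}

bump patch 0.1.0   # prints 0.1.1 (fixes only)
bump minor 0.1.0   # prints 0.2.0 (e.g. if the submodule refactor lands)
```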

Next Steps:

I am currently working on scale testing the Spark streaming and keep
running into (and fixing) auxiliary bugs along the way (which means that it
is taking longer than anticipated).

There are several outstanding JIRA issues for non-distributed improvements
to the codebase (e.g., PIRK-45, PIRK-14). I would propose that we be
more aggressive about documenting outstanding Pirk issues in JIRA so that
it is easy for folks to find and choose ways to contribute.


Thanks!

Ellison Anne










Re: Next short term goal?

Posted by Darin Johnson <db...@gmail.com>.
I'm actually looking at (i) now.  I'm starting small with PIRK-63.  A WIP
PR should be in late tonight.  This will keep the driver from needing all
framework dependencies, but we'll need to discuss passing
framework-specific options as well.

I could definitely see the use case for separate responder jars,
especially given that some potential users may need to incorporate custom
in-house (or commercial) frameworks that can't be part of Pirk due to
license issues.

Is there an umbrella Jira for this?

On Sep 14, 2016 8:16 AM, "Suneel Marthi" <sm...@apache.org> wrote:

> On Wed, Sep 14, 2016 at 2:06 PM, Tim Ellison <t....@gmail.com>
> wrote:
>
> > I'm reluctant to dive into more Pirk dev at the moment because
> >
> >  (i) there was plenty of discussion around the release about
> > restructuring the code base into a number of modules.  I figure that
> > will be a disruptive change best undertaken with most folks "out of the
> > pool".
> >
>
> This is a top priority IMO.  We can start the discussion around how we
> want to modularize Pirk.
>
> Would there be use cases wherein some application needs just the Responder
> as a jar?
>
> OR
>
> Responder on Storm jar, Responder on Flink jar ....
>
> Same thing with Query.
>
> The rest of the classes could probably be part of a "pirk-core" module (and
> its own jar).
>
>
> >
> >  (ii) I have failed to set up a reliable distributed testing system for
> > myself, and wholeheartedly agree that distrib testing is important for
> > all devs.  Hopefully I can fix that as improved instructions come along,
> > though perhaps we should be approaching infra to get a shared Pirk
> > cluster available for testing?
> >
>
> Does Apache Infra provide a hadoop/spark cluster? I didn't think they did
> but I could be wrong.
>
>
> >
> >  (iii) other stuff gets in the way :-)
> >
> > I realise that these are within my powers to address, and I'm still keen
> > to participate in moving Pirk forward; so what is the next hill to
> > conquer for the project?
> >
> > Regards,
> > Tim
> >
>
> I am presently in Berlin attending #FlinkForward and will be back home next
> week. We can start discussions around the next steps and plan for the next
> release.
>

Re: Next short term goal?

Posted by Suneel Marthi <sm...@apache.org>.
On Wed, Sep 14, 2016 at 2:06 PM, Tim Ellison <t....@gmail.com> wrote:

> I'm reluctant to dive into more Pirk dev at the moment because
>
>  (i) there was plenty of discussion around the release about
> restructuring the code base into a number of modules.  I figure that
> will be a disruptive change best undertaken with most folks "out of the
> pool".
>

This is a top priority IMO.  We can start the discussion around how we
want to modularize Pirk.

Would there be use cases wherein some application needs just the Responder
as a jar?

OR

Responder on Storm jar, Responder on Flink jar ....

Same thing with Query.

The rest of the classes could probably be part of a "pirk-core" module (and
its own jar).


>
>  (ii) I have failed to set up a reliable distributed testing system for
> myself, and wholeheartedly agree that distrib testing is important for
> all devs.  Hopefully I can fix that as improved instructions come along,
> though perhaps we should be approaching infra to get a shared Pirk
> cluster available for testing?
>

Does Apache Infra provide a hadoop/spark cluster? I didn't think they did
but I could be wrong.


>
>  (iii) other stuff gets in the way :-)
>
> I realise that these are within my powers to address, and I'm still keen
> to participate in moving Pirk forward; so what is the next hill to
> conquer for the project?
>
> Regards,
> Tim
>

I am presently in Berlin attending #FlinkForward and will be back home next
week. We can start discussions around the next steps and plan for the next
release.