Posted to dev@pirk.apache.org by Ellison Anne Williams <ea...@apache.org> on 2016/09/14 12:48:29 UTC

Pirk Submodule Refactor

Starting a new thread to discuss the Pirk submodule refactor (so that we
don't get too mixed up with the 'Next short term goal?' thread)...

Darin - Thanks for jumping in on the last email (I think that we hit send
at exactly the same time :)). Can you describe what you have in mind for
the submodule refactor so that we can discuss?

(No, there is not an umbrella JIRA for producing separate Responder jars -
please feel free to go ahead and add one)

Re: Pirk Submodule Refactor

Posted by Ellison Anne Williams <ea...@apache.org>.
Sounds great Darin.

On Fri, Sep 23, 2016 at 9:16 PM, Darin Johnson <db...@gmail.com>
wrote:

> Reposting the Google doc to this thread for cohesion.
> https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_
> mMrRQyynQ-Q6MFbI/edit?usp=sharing
>
> If there are no issues I'd like to start this. Since it involves a lot of
> file moves (which are a pain to revise), my plan is to break it into a few
> modules at a time.  That should make the reviews and testing easier as
> well.

Re: Pirk Submodule Refactor

Posted by Darin Johnson <db...@gmail.com>.
Reposting the Google doc to this thread for cohesion.
https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_mMrRQyynQ-Q6MFbI/edit?usp=sharing

If there are no issues I'd like to start this. Since it involves a lot of
file moves (which are a pain to revise), my plan is to break it into a few
modules at a time.  That should make the reviews and testing easier as well.


Re: Pirk Submodule Refactor

Posted by Darin Johnson <db...@gmail.com>.
Great

Will have PIRK-63 done sometime this weekend, which will help.  Then go ahead
with these suggestions as a base; I may come back with some thoughts about
the CLI.  I'd like new responders not to require modifying pirk-core.  There
are a few ways I've done this before, but I need to decide which will be
least intrusive and easiest to maintain.

Darin


Re: Pirk Submodule Refactor

Posted by Ellison Anne Williams <ea...@apache.org>.
On Thu, Sep 15, 2016 at 9:25 AM, Tim Ellison <t....@gmail.com> wrote:

> I guess one option is to structure the monolithic CLI around plug-ins, so
> rather than today's
>   ResponderDriver <options for everything> ...
>
> it would become
>   ResponderDriver --pir embedSelector=true --storm option=value ...
>
> and so on; or more likely
>   ResponderDriver --pir optionsFile=pir.properties --storm
> optionsFile=storm.properties ...
>
> and then the driver can delegate each command line option group to the
> correct handler.
>

Agree with this approach - as the CLI already supports reading all of the
properties from properties files (both local and in HDFS), it should be
relatively straightforward to delegate the handling.
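A minimal sketch of the local-file case (the class and method names here are
hypothetical; the HDFS case would read through Hadoop's FileSystem API
instead):

  import java.io.FileInputStream;
  import java.io.IOException;
  import java.util.Properties;

  public class GroupPropertiesLoader
  {
    // Load one option group's optionsFile (e.g. storm.properties) so the
    // driver can hand the parsed properties to that framework's handler.
    public static Properties load(String path) throws IOException
    {
      Properties props = new Properties();
      try (FileInputStream in = new FileInputStream(path))
      {
        props.load(in);
      }
      return props;
    }
  }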


> Much of the Hadoop dependency I see is 'services' for storing and
> retrieving, these could be abstracted out to a provider model.
>

Agreed.

Re: Pirk Submodule Refactor

Posted by Tim Ellison <t....@gmail.com>.
On 15/09/16 09:21, Darin Johnson wrote:
> So my goal for the submodule refactor is pretty straightforward: I
> basically want to separate the project into pirk-core, pirk-hadoop,
> pirk-spark, and pirk-storm.  I think separating pirk-core and pirk-hadoop
> is very ambitious at this point, as there are a lot of dependencies we'd
> need to resolve.

I think it is quite do-able, but agree that it is more work than the others.

> pirk-storm and pirk-spark would be much more reasonable
> starts.  I'd also recommend we do something about the Elasticsearch
> dependency; it seems more of an InputFormat option than part of pirk-core.
> 
> There are a few blockers to this:
> 
> The first is PIRK-63: here the ResponderDriver was calling the Responder
> class of each specific framework.  That fix is straightforward - pass the
> class as an argument - and I've started that here:
> https://github.com/DarinJ/incubator-pirk/tree/Pirk-63 (PR was expected
> earlier - but had a rebase issue - so didn't get around to completing a few
> bits).  It also allows, at least at a rudimentary level, adding new
> responders by putting jars on the classpath instead of recompiling Pirk.
> I'm open to suggestions here - I think it's very likely ResponderLauncher
> isn't needed and instead run could be a static member of another class;
> however, based on what was in ResponderDriver, this seems to be the
> approach with the fewest issues - especially for Storm.

Give a shout when you want somebody to take a look.

> Another is how we're passing the command line options in ResponderCLI: here
> we're defining framework-specific elements in the Driver, which are then
> passed to the underlying framework Driver/Topology/ToolRunner.  This
> becomes more difficult to address cleanly, so it seems like a good place to
> start a discussion.  I think this mechanism should be addressed, though, as
> putting in options for every framework/InputFormat everyone could want is
> untenable.

I guess one option is to structure the monolithic CLI around plug-ins, so
rather than today's
  ResponderDriver <options for everything> ...

it would become
  ResponderDriver --pir embedSelector=true --storm option=value ...

and so on; or more likely
  ResponderDriver --pir optionsFile=pir.properties --storm
optionsFile=storm.properties ...

and then the driver can delegate each command line option group to the
correct handler.
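A rough sketch of that grouping step (all names here are hypothetical, not
existing Pirk code):

  import java.util.ArrayList;
  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;

  public class OptionGroups
  {
    // Split argv into groups keyed by their --prefix ("pir", "storm", ...);
    // the driver then hands each group to the plug-in registered for that
    // prefix.  Tokens before the first --prefix are ignored in this sketch.
    public static Map<String, List<String>> group(String[] args)
    {
      Map<String, List<String>> groups = new LinkedHashMap<>();
      List<String> current = new ArrayList<>();
      for (String arg : args)
      {
        if (arg.startsWith("--"))
        {
          current = new ArrayList<>();
          groups.put(arg.substring(2), current);
        }
        else
        {
          current.add(arg); // e.g. "optionsFile=pir.properties"
        }
      }
      return groups;
    }
  }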

> After addressing these two, based on some experiments, it looks like
> breaking out Storm is pretty straightforward and Spark should be about the
> same.  I'm still looking at Elasticsearch.  Hadoop would require more work,
> and I think it is less important for now.

Much of the Hadoop dependency I see is 'services' for storing and
retrieving, these could be abstracted out to a provider model.
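A minimal sketch of such a provider model, using the JDK's ServiceLoader
(the interface and class names here are hypothetical, not Pirk's current
API):

  // StorageProvider.java - each backend module ships an implementation.
  public interface StorageProvider
  {
    boolean handles(String uri);   // e.g. "hdfs://..." or "file://..."
    byte[] read(String uri) throws Exception;
    void write(String uri, byte[] data) throws Exception;
  }

  // Storage.java - pirk-core discovers implementations via
  // META-INF/services, so an HDFS provider can live in pirk-hadoop
  // without pirk-core depending on Hadoop.
  import java.util.ServiceLoader;

  public final class Storage
  {
    public static StorageProvider forUri(String uri)
    {
      for (StorageProvider p : ServiceLoader.load(StorageProvider.class))
      {
        if (p.handles(uri))
          return p;
      }
      throw new IllegalArgumentException("No storage provider for " + uri);
    }
  }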

> I also realize there are other ways to break the modules apart, and I'm
> mostly discussing modularizing the responder package; however, that's where
> most of the dependencies lie, so I think that's where we'll get the most
> impact.

+1, the responder and CLI.

Regards,
Tim



Re: Pirk Submodule Refactor

Posted by Darin Johnson <db...@gmail.com>.
So my goal for the submodule refactor is pretty straightforward: I
basically want to separate the project into pirk-core, pirk-hadoop,
pirk-spark, and pirk-storm.  I think separating pirk-core and pirk-hadoop
is very ambitious at this point, as there are a lot of dependencies we'd need
to resolve.  pirk-storm and pirk-spark would be much more reasonable
starts.  I'd also recommend we do something about the Elasticsearch
dependency; it seems more of an InputFormat option than part of pirk-core.
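
For illustration, the split might look like the following parent POM
fragment (a sketch only - the groupId and the module comments are
illustrative, not an agreed layout):

  <!-- pom.xml of a hypothetical pirk-parent aggregator -->
  <groupId>org.apache.pirk</groupId>
  <artifactId>pirk-parent</artifactId>
  <packaging>pom</packaging>
  <modules>
    <module>pirk-core</module>   <!-- query/response, crypto, common utils -->
    <module>pirk-hadoop</module> <!-- MapReduce responder -->
    <module>pirk-spark</module>  <!-- Spark responder -->
    <module>pirk-storm</module>  <!-- Storm responder -->
  </modules>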

There are a few blockers to this:

The first is PIRK-63: here the ResponderDriver was calling the Responder
class of each specific framework.  That fix is straightforward - pass the
class as an argument - and I've started that here:
https://github.com/DarinJ/incubator-pirk/tree/Pirk-63 (PR was expected
earlier - but had a rebase issue - so didn't get around to completing a few
bits).  It also allows, at least at a rudimentary level, adding new
responders by putting jars on the classpath instead of recompiling Pirk.
I'm open to suggestions here - I think it's very likely ResponderLauncher
isn't needed and instead run could be a static member of another class;
however, based on what was in ResponderDriver, this seems to be the approach
with the fewest issues - especially for Storm.
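
To make that concrete, a minimal sketch of the idea (ResponderLauncher is
the interface from the branch above; everything else here - the driver
shape, the argument handling, the example class name - is illustrative):

  // ResponderLauncher.java - each framework module ships an implementation.
  public interface ResponderLauncher
  {
    void run() throws Exception;
  }

  // ResponderDriver.java
  public class ResponderDriver
  {
    public static void main(String[] args) throws Exception
    {
      // e.g. args[0] = "org.apache.pirk.responder.spark.SparkResponderLauncher"
      // (hypothetical name).  Adding a responder then means adding a jar
      // to the classpath, not recompiling pirk-core.
      ResponderLauncher launcher = (ResponderLauncher) Class.forName(args[0])
          .getDeclaredConstructor().newInstance();
      launcher.run();
    }
  }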

Another is how we're passing the command line options in ResponderCLI: here
we're defining framework-specific elements in the Driver, which are then
passed to the underlying framework Driver/Topology/ToolRunner.  This
becomes more difficult to address cleanly, so it seems like a good place to
start a discussion.  I think this mechanism should be addressed, though, as
putting in options for every framework/InputFormat everyone could want is
untenable.

After addressing these two, based on some experiments, it looks like
breaking out Storm is pretty straightforward and Spark should be about the
same.  I'm still looking at Elasticsearch.  Hadoop would require more work,
and I think it is less important for now.

I also realize there are other ways to break the modules apart, and I'm
mostly discussing modularizing the responder package; however, that's where
most of the dependencies lie, so I think that's where we'll get the most
impact.

Darin


Re: Pirk Submodule Refactor

Posted by Suneel Marthi <su...@gmail.com>.
+1 to start a sub-thread. I would suggest starting a shared Google Doc for
dumping ideas and evolving a structure.
