Posted to dev@pirk.apache.org by Darin Johnson <db...@gmail.com> on 2016/09/11 01:34:01 UTC

Making ResponderDriver more general

Hey guys,

I was looking into creating my own responder as a general exercise but as
the jar was getting pretty big I thought it might be useful to first create
a modular build, as someone using hadoop would not want to push around storm
dependencies and vice versa.  As I was scoping this I noticed in
ResponderDriver there is the following block:

switch (platform)
{
  case MAPREDUCE: ...
  case SPARK: ...
  case SPARKSTREAMING: ...
  case STORM: ...
  case STANDALONE: ...
}

This essentially means that pirk must know about all platforms in order to
run.  I think a better approach might be to create an interface
"ResponderLauncher" which the developer of a platform would implement; the
implementing class name would be passed on the command line or via
configuration and loaded at runtime via reflection (this is how hadoop
supports pluggable schedulers).
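
For concreteness, here's a rough sketch of the runtime loading I have in
mind - the "responder.launcher" property name and the LauncherLoader class
are made up for illustration, and error handling is elided:

public class LauncherLoader
{
  public static void main(String[] args) throws Exception
  {
    // Implementation class name; read from a system property here for brevity,
    // but it would really come from the driver CLI or the pirk properties file.
    String launcherName = System.getProperty("responder.launcher");

    // Load the named class from the classpath and check it against the interface.
    Class<? extends ResponderLauncher> clazz =
        Class.forName(launcherName).asSubclass(ResponderLauncher.class);

    // Instantiate via the no-arg constructor and hand control to the platform.
    ResponderLauncher launcher = clazz.getDeclaredConstructor().newInstance();
    launcher.run();
  }
}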

This would allow better extensibility to other platforms, especially for
users of proprietary or non-Apache-license-compatible tools, and would start
the process of a multi-module build.  One could then just put additional
jars on the classpath and run, rather than modifying the pirk code to get
their platform included.

I believe something like:

public interface ResponderLauncher {
  void run(ConfigOpts opts);
  void run();
}

would likely do. Here, ConfigOpts is fictional and may not be necessary,
but I thought I should offer some way of passing command-line or other
options - suggestions welcome.
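
As a rough illustration, an implementation for the standalone case might
look like this - StandaloneResponderLauncher and StandaloneResponder are
just stand-ins for whatever the standalone code path does today:

public class StandaloneResponderLauncher implements ResponderLauncher
{
  @Override
  public void run(ConfigOpts opts)
  {
    // Apply any passed-in options here, then run as usual.
    run();
  }

  @Override
  public void run()
  {
    // Delegate to the existing standalone entry point (a stand-in name).
    new StandaloneResponder().run();
  }
}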

I think I could get this done, along with the [Hadoop, Spark,
SparkStreaming, Storm]ResponderLauncher classes, rather quickly, but as this
would be my first work on this project I thought it'd be good to solicit
opinions first - especially as it's API-breaking and you appear to be
following semantic versioning.

If this works for everyone, I'd be willing to submit this as a PR, plus a
second one modularizing the build (which will likely be preceded by another
email discussion). I envision it creating pirk-core, pirk-hadoop,
pirk-spark, and pirk-storm artifacts which could then be deployed to
central.

Cheers,
Darin

Re: Making ResponderDriver more general

Posted by Darin Johnson <db...@gmail.com>.
Ellison Anne,

Good to hear you're in favor.

Yes, the name of the class would be parsed from the command line or taken
from the properties file.  I plan to use the same mechanism currently used
to get the platform variable.

I'm pro using spark-submit; in general, moving to hadoop jar cuts out spark
standalone/mesos users.

I'll try to knock out the properties in the next day or so.

Cheers,
Darin


Re: Making ResponderDriver more general

Posted by Ellison Anne Williams <ea...@apache.org>.
Hi Darin,

I think that generalizing the Responder launching in the ResponderDriver
(and elsewhere) with a ResponderLauncher interface makes a lot of sense and
is 'in the spirit' of some of the other generalities within the codebase
(in the schemas, partitioners, base input format, etc).

I am assuming that the name of the specific ResponderLauncher implementation
class would be passed as an argument (or parsed from the properties file)
to the ResponderDriver via the same ResponderDriverCLI mechanisms. The
ResponderDriver would then instantiate that class, launching the desired
Responder. Is this what you had in mind?

If so, the only decision point for us is whether or not Spark-based
Responders should be run with spark-submit (i.e., calling the
ResponderDriver with spark-submit - the way it's currently done) or if the
implementations of ResponderLauncher should in turn call SparkLauncher
(meaning that the ResponderDriver could be called with hadoop jar). The
only considerations in forcing Spark-based Responders to use the
SparkLauncher are (1) that it becomes a bit more tricky to launch with
SparkLauncher as the 'spark-home' (the dir containing the spark-submit
script) can be difficult to pick up correctly within some systems (we've
specifically had trouble with AWS and GCP) and (2) all Spark related
configs must be passed as args to the SparkLauncher.
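
For reference, the SparkLauncher route would look roughly like the
following - the paths, master, and main class are placeholders, not working
values - which shows both considerations:

import org.apache.spark.launcher.SparkLauncher;

public class SparkLauncherSketch
{
  public static void main(String[] args) throws Exception
  {
    Process spark = new SparkLauncher()
        .setSparkHome("/path/to/spark")        // (1) spark-home must be picked up correctly
        .setAppResource("/path/to/pirk.jar")   // placeholder jar path
        .setMainClass("org.apache.pirk.responder.wideskies.spark.ComputeResponse") // illustrative
        .setMaster("yarn")
        .setConf(SparkLauncher.EXECUTOR_MEMORY, "4g") // (2) every Spark config passed explicitly
        .launch();
    spark.waitFor();
  }
}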

I'm not concerned about altering the API at this point as we are only on
release 0.1.0 -- we need to stabilize the API before a 1.0.0 release, but
we can change it in ways that make sense now to move closer to a stable API.

I am in agreement to proceed with the PR.

Thoughts?

Thanks!

Ellison Anne


