Posted to dev@pirk.apache.org by DarinJ <gi...@git.apache.org> on 2016/09/19 06:05:26 UTC

[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

GitHub user DarinJ opened a pull request:

    https://github.com/apache/incubator-pirk/pull/93

    WIP-Pirk 63-DO NOT MERGE

    This is a WIP for [PIRK-63](https://issues.apache.org/jira/browse/PIRK-63) to open the door to other responders without having to modify the actual code of Pirk.  It's submitted for feedback only; please DO NOT MERGE.  I've only tested standalone mode.
        
    It deprecates the "platform" CLI option in favor of a "launcher" option naming a class that implements the `ResponderLauncher` interface; the driver instantiates that class and invokes its run method via reflection.  This allows the developer of a different responder to merely place a jar on the classpath and specify the appropriate `ResponderLauncher` class.
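A rough sketch of that reflective dispatch (the interface shape, the class names, and the stand-in launcher below are illustrative assumptions, not Pirk's actual code):

```java
// Hedged sketch of reflective launcher loading; the interface shape and
// all names here are illustrative assumptions, not Pirk's actual code.
public class LauncherSketch {

  /** Assumed shape of the ResponderLauncher interface. */
  public interface ResponderLauncher {
    void run() throws Exception;
  }

  /** A stand-in launcher so the sketch is self-contained. */
  public static class StandaloneLauncher implements ResponderLauncher {
    @Override
    public void run() {
      System.out.println("standalone responder running");
    }
  }

  /** Instantiate the class named by the "launcher" option via reflection. */
  public static ResponderLauncher load(String className) throws Exception {
    Class<?> clazz = Class.forName(className);
    if (!ResponderLauncher.class.isAssignableFrom(clazz)) {
      throw new IllegalArgumentException(className + " does not implement ResponderLauncher");
    }
    return (ResponderLauncher) clazz.getDeclaredConstructor().newInstance();
  }

  public static void main(String[] args) throws Exception {
    // Nested classes use '$' in the binary name Class.forName expects.
    ResponderLauncher launcher = load("LauncherSketch$StandaloneLauncher");
    launcher.run();
  }
}
```

The point of the pattern is that `load` only ever sees a class name string, so a third-party responder jar on the classpath needs no compile-time dependency from the core.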
        
    The "platform" CLI option is still available.  However, I removed the explicit dependencies in favor of using reflection.  This was done in anticipation of refactoring the build into submodules, though it admittedly makes the code more fragile.
    
    ResponderDriver had no unit tests, and unfortunately I saw no good way to create good ones for this particular change, especially as it requires multiple frameworks to run.
    
    I should say that another possible route here is to have each framework responder implement its own ResponderDriver.  We could provide some utilities to check that the minimum required Pirk options are set, but leave the rest to the implementation of the responder.  It would clean up ResponderCLI and ResponderProps, which are rather bloated and might continue to grow if left unchecked.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DarinJ/incubator-pirk Pirk-63

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-pirk/pull/93.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #93
    
----
commit dda458bb2ae77fd9e3dc686d17dd8b49095b3395
Author: Darin Johnson <da...@apache.org>
Date:   2016-09-13T03:19:12Z

    This is a WIP for [PIRK-63](https://issues.apache.org/jira/browse/PIRK-63) to open the door to other responders without having to modify the actual code of Pirk.  It's submitted for feedback only; please DO NOT MERGE.
    
    It deprecates the "platform" CLI option in favor of a "launcher" option naming a class that implements the `ResponderLauncher` interface; the driver instantiates that class and invokes its run method via reflection.  This allows the developer of a different responder to merely place a jar on the classpath and specify the appropriate `ResponderLauncher` class.
    
    The "platform" CLI option is still available.  However, I removed the explicit dependencies in favor of using reflection.  This was done in anticipation of refactoring the build into submodules, though it admittedly makes the code more fragile.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r79987737
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/ResponderService.java ---
    @@ -0,0 +1,73 @@
    +package org.apache.pirk.responder.wideskies;
    --- End diff --
    
    Just move the license comment to the top of the file, to fit with the style adopted throughout.



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r80012564
  
    --- Diff: src/main/resources/META-INF/services/org.apache.pirk.responder.wideskies.spi.ResponderPlugin ---
    @@ -0,0 +1,5 @@
    +org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponder
    --- End diff --
    
    If not plug-ins, then can you explain how you see it working?
    
    I had a guess on [this thread](https://lists.apache.org/thread.html/2f2c72c264a43b404416c45ad978aba554e7569a4787b62f45c2802d@%3Cdev.pirk.apache.org%3E), but didn't see a reply.



Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Tim Ellison <t....@gmail.com>.
On 22/09/16 10:40, Ellison Anne Williams wrote:
> Why are we using the service-plugin model when we are dropping the platform
> designations?

I'm not sure what "dropping the platform designations" means.  Clearly
we need different responders for different backends, and I'm guessing we
need to tell the responder which backend we want to use.

> Thought that we were going to drop all central platform
> awareness from Pirk in favor of the ResponderLauncher. The service-plugin
> model retains it in a roundabout way as we have to statically maintain the
> list of known responders in
> org.apache.pirk.responder.wideskies.spi.ResponderPlugin.

No, the ResponderPlugin is an interface that identifies which backend it
supports (getPlatformName) and its invocation (run).  There is no
list of known responders.

If I choose to implement a new backend responder then I implement this
interface, create a JAR with a META-INF/services showing I provide it,
and add it to the classpath.  There is no need to inform the core; it
will find the service using the ResponderService.

In this proposed PR, the full list is in our single JAR, but of course
as we move to submodules the entries will migrate into the appropriate
submodule JARs.

Regards,
Tim

> On Thu, Sep 22, 2016 at 5:36 AM, ellisonanne <gi...@git.apache.org> wrote:
> 
>> Github user ellisonanne commented on a diff in the pull request:
>>
>>     https://github.com/apache/incubator-pirk/pull/93#discussion_r80005357
>>
>>     --- Diff: src/main/resources/META-INF/services/org.apache.pirk.
>> responder.wideskies.spi.ResponderPlugin ---
>>     @@ -0,0 +1,5 @@
>>     +org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponder
>>     --- End diff --
>>
>>     Why are we using the service-plugin model when we are dropping the
>> platform designations?
>>
>>
>>
> 

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Ellison Anne Williams <ea...@apache.org>.
Why are we using the service-plugin model when we are dropping the platform
designations? Thought that we were going to drop all central platform
awareness from Pirk in favor of the ResponderLauncher. The service-plugin
model retains it in a roundabout way as we have to statically maintain the
list of known responders in
org.apache.pirk.responder.wideskies.spi.ResponderPlugin.

On Thu, Sep 22, 2016 at 5:36 AM, ellisonanne <gi...@git.apache.org> wrote:

> Github user ellisonanne commented on a diff in the pull request:
>
>     https://github.com/apache/incubator-pirk/pull/93#discussion_r80005357
>
>     --- Diff: src/main/resources/META-INF/services/org.apache.pirk.
> responder.wideskies.spi.ResponderPlugin ---
>     @@ -0,0 +1,5 @@
>     +org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponder
>     --- End diff --
>
>     Why are we using the service-plugin model when we are dropping the
> platform designations?
>
>
>

[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by ellisonanne <gi...@git.apache.org>.
Github user ellisonanne commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r80005357
  
    --- Diff: src/main/resources/META-INF/services/org.apache.pirk.responder.wideskies.spi.ResponderPlugin ---
    @@ -0,0 +1,5 @@
    +org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponder
    --- End diff --
    
    Why are we using the service-plugin model when we are dropping the platform designations?



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r79988975
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/spi/ResponderPlugin.java ---
    @@ -0,0 +1,40 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +package org.apache.pirk.responder.wideskies.spi;
    +
    +/**
    + * Interface which launches a responder
    + * <p>
    + * Implement this interface to start the execution of a framework responder, the run method will be called via reflection by the ResponderDriver.
    + * </p>
    + */
    +public interface ResponderPlugin
    +{
    +  /**
    +   * Returns the plugin name for your framework
    +   * This will be the platform argument
    +   * @return
    +   */
    +  public String getPlatformName();
    +  /**
    +   * This method launches your framework responder.
    +   */
    +  public void run() throws Exception;
    --- End diff --
    
    I'm going to sneak in after this commit and change this to ```throws PIRException``` ;-)



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by DarinJ <gi...@git.apache.org>.
Github user DarinJ commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r79377024
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
    @@ -49,83 +41,111 @@
     public class ResponderDriver
     {
       private static final Logger logger = LoggerFactory.getLogger(ResponderDriver.class);
    +  // ClassNames to instantiate Platforms using the platform CLI
    +  private final static String MAPREDUCE_LAUNCHER = "org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponderLauncher";
    +  private final static String SPARK_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher";
    +  private final static String SPARKSTREAMING_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.streaming.SparkStreamingResponderLauncher";
    +  private final static String STANDALONE_LAUNCHER = "org.apache.pirk.responder.wideskies.standalone.StandaloneResponderLauncher";
    +  private final static String STORM_LAUNCHER = "org.apache.pirk.responder.wideskies.storm.StormResponderLauncher";
     
    --- End diff --
    
    Yes, I added this for backwards compatibility.  Maybe overkill this early in the game, but I didn't want to break anyone's scripts/bash history too quickly.



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by DarinJ <gi...@git.apache.org>.
Github user DarinJ commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r80026380
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
    @@ -50,103 +40,31 @@
     {
       private static final Logger logger = LoggerFactory.getLogger(ResponderDriver.class);
     
    -  private enum Platform
    -  {
    -    MAPREDUCE, SPARK, SPARKSTREAMING, STORM, STANDALONE, NONE
    -  }
    -
    -  public static void main(String[] args) throws Exception
    +  public static void main(String[] args)
       {
         ResponderCLI responderCLI = new ResponderCLI(args);
     
    -    // For handling System.exit calls from Spark Streaming
    -    System.setSecurityManager(new SystemExitManager());
    -
    -    Platform platform = Platform.NONE;
    -    String platformString = SystemConfiguration.getProperty(ResponderProps.PLATFORM);
    +    String platformName = SystemConfiguration.getProperty(ResponderProps.PLATFORM, "None");
    +    logger.info("Attempting to use platform {} ...", platformName);
         try
         {
    -      platform = Platform.valueOf(platformString.toUpperCase());
    -    } catch (IllegalArgumentException e)
    -    {
    -      logger.error("platform " + platformString + " not found.");
    -    }
    -
    -    logger.info("platform = " + platform);
    -    switch (platform)
    -    {
    -      case MAPREDUCE:
    -        logger.info("Launching MapReduce ResponderTool:");
    -
    -        ComputeResponseTool pirWLTool = new ComputeResponseTool();
    -        ToolRunner.run(pirWLTool, new String[] {});
    -        break;
    -
    -      case SPARK:
    -        logger.info("Launching Spark ComputeResponse:");
    -
    -        ComputeResponse computeResponse = new ComputeResponse(FileSystem.get(new Configuration()));
    -        computeResponse.performQuery();
    -        break;
    -
    -      case SPARKSTREAMING:
    -        logger.info("Launching Spark ComputeStreamingResponse:");
    -
    -        ComputeStreamingResponse computeSR = new ComputeStreamingResponse(FileSystem.get(new Configuration()));
    -        try
    -        {
    -          computeSR.performQuery();
    -        } catch (SystemExitException e)
    -        {
    -          // If System.exit(0) is not caught from Spark Streaming,
    -          // the application will complete with a 'failed' status
    -          logger.info("Exited with System.exit(0) from Spark Streaming");
    -        }
    -
    -        // Teardown the context
    -        computeSR.teardown();
    -        break;
    -
    -      case STORM:
    -        logger.info("Launching Storm PirkTopology:");
    -        PirkTopology.runPirkTopology();
    -        break;
    -
    -      case STANDALONE:
    -        logger.info("Launching Standalone Responder:");
    -
    -        String queryInput = SystemConfiguration.getProperty("pir.queryInput");
    -        Query query = new LocalFileSystemStore().recall(queryInput, Query.class);
    -
    -        Responder pirResponder = new Responder(query);
    -        pirResponder.computeStandaloneResponse();
    -        break;
    -    }
    -  }
    -
    -  // Exception and Security Manager classes used to catch System.exit from Spark Streaming
    -  private static class SystemExitException extends SecurityException
    -  {}
    -
    -  private static class SystemExitManager extends SecurityManager
    -  {
    -    @Override
    -    public void checkPermission(Permission perm)
    -    {}
    -
    -    @Override
    -    public void checkExit(int status)
    -    {
    -      super.checkExit(status);
    -      if (status == 0) // If we exited cleanly, throw SystemExitException
    +      ResponderPlugin responder = ResponderService.getInstance().getResponder(platformName);
    +      if (responder == null)
           {
    -        throw new SystemExitException();
    +        logger.error("No such platform plugin found: {}!", platformName);
           }
           else
           {
    -        throw new SecurityException();
    +        responder.run();
           }
    -
    +    }
    +    catch (PIRException pirEx)
    --- End diff --
    
    Hmm, yeah, I didn't like that Exception was being caught and turned into PIRException and I lost the stacktrace.  But yeah, it's really unnecessary.
    
    I'm frustrated with the Exception issue as well, and it seems like there's a lot of discussion about exceptions right now which I was trying to code around.  I don't want to force opinions on this in a modest PR; if you've got a plan, point me to it and I'll follow that pattern and trust it'll get cleaned up.
    
    I think in the current state PIRException by itself can hide a lot of exceptions if we just catch Exception and throw PIRException.  I've looked over a lot of the exceptions we're currently throwing; many are just illegal argument or illegal state, and a lot could be done with Guava's Preconditions or a class similar to it.
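
The precondition style mentioned here might look like the following standalone sketch; Guava's `Preconditions` class provides equivalent `checkArgument`/`checkState`/`checkNotNull` methods, and this minimal stand-in only exists to keep the example self-contained:

```java
// Minimal stand-in for a Guava-Preconditions-style helper; illustrative
// only, not part of Pirk's codebase.
public final class Check {
  private Check() {}

  /** Throw IllegalArgumentException when a caller-supplied value is bad. */
  public static void argument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  /** Throw IllegalStateException when the object is in the wrong state. */
  public static void state(boolean condition, String message) {
    if (!condition) {
      throw new IllegalStateException(message);
    }
  }

  /** Throw NullPointerException on null, otherwise return the value. */
  public static <T> T notNull(T reference, String message) {
    if (reference == null) {
      throw new NullPointerException(message);
    }
    return reference;
  }
}
```

A driver could then fail fast with, say, `Check.argument(platformName != null, "platform or launcher must be set")` instead of funnelling everything through one broad catch block.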



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by DarinJ <gi...@git.apache.org>.
Github user DarinJ commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r80026462
  
    --- Diff: src/main/resources/responder.properties ---
    @@ -27,9 +27,15 @@ pir.dataInputFormat=
     #outputFile -- required -- Fully qualified name of output file in hdfs
     pir.outputFile=
     
    -#platform -- required -- 'mapreduce', 'spark', 'sparkstreaming', or 'standalone'
     +#One of the following two options is required - launcher preferred
    +
    +#launcher -- required -- full class name of a class implementing ResponderPlugin
     +#i.e. org.apache.pirk.responder.wideskies.standalone.StandaloneResponderPlugin
    +#launcher=
    +
    +#platform -- required -- 'mapreduce', 'spark', 'sparkstreaming', 'standalone', or 'storm'
    --- End diff --
    
    yep



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r80030927
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
    @@ -50,103 +40,31 @@
     {
       private static final Logger logger = LoggerFactory.getLogger(ResponderDriver.class);
     
    -  private enum Platform
    -  {
    -    MAPREDUCE, SPARK, SPARKSTREAMING, STORM, STANDALONE, NONE
    -  }
    -
    -  public static void main(String[] args) throws Exception
    +  public static void main(String[] args)
       {
         ResponderCLI responderCLI = new ResponderCLI(args);
     
    -    // For handling System.exit calls from Spark Streaming
    -    System.setSecurityManager(new SystemExitManager());
    -
    -    Platform platform = Platform.NONE;
    -    String platformString = SystemConfiguration.getProperty(ResponderProps.PLATFORM);
    +    String platformName = SystemConfiguration.getProperty(ResponderProps.PLATFORM, "None");
    +    logger.info("Attempting to use platform {} ...", platformName);
         try
         {
    -      platform = Platform.valueOf(platformString.toUpperCase());
    -    } catch (IllegalArgumentException e)
    -    {
    -      logger.error("platform " + platformString + " not found.");
    -    }
    -
    -    logger.info("platform = " + platform);
    -    switch (platform)
    -    {
    -      case MAPREDUCE:
    -        logger.info("Launching MapReduce ResponderTool:");
    -
    -        ComputeResponseTool pirWLTool = new ComputeResponseTool();
    -        ToolRunner.run(pirWLTool, new String[] {});
    -        break;
    -
    -      case SPARK:
    -        logger.info("Launching Spark ComputeResponse:");
    -
    -        ComputeResponse computeResponse = new ComputeResponse(FileSystem.get(new Configuration()));
    -        computeResponse.performQuery();
    -        break;
    -
    -      case SPARKSTREAMING:
    -        logger.info("Launching Spark ComputeStreamingResponse:");
    -
    -        ComputeStreamingResponse computeSR = new ComputeStreamingResponse(FileSystem.get(new Configuration()));
    -        try
    -        {
    -          computeSR.performQuery();
    -        } catch (SystemExitException e)
    -        {
    -          // If System.exit(0) is not caught from Spark Streaming,
    -          // the application will complete with a 'failed' status
    -          logger.info("Exited with System.exit(0) from Spark Streaming");
    -        }
    -
    -        // Teardown the context
    -        computeSR.teardown();
    -        break;
    -
    -      case STORM:
    -        logger.info("Launching Storm PirkTopology:");
    -        PirkTopology.runPirkTopology();
    -        break;
    -
    -      case STANDALONE:
    -        logger.info("Launching Standalone Responder:");
    -
    -        String queryInput = SystemConfiguration.getProperty("pir.queryInput");
    -        Query query = new LocalFileSystemStore().recall(queryInput, Query.class);
    -
    -        Responder pirResponder = new Responder(query);
    -        pirResponder.computeStandaloneResponse();
    -        break;
    -    }
    -  }
    -
    -  // Exception and Security Manager classes used to catch System.exit from Spark Streaming
    -  private static class SystemExitException extends SecurityException
    -  {}
    -
    -  private static class SystemExitManager extends SecurityManager
    -  {
    -    @Override
    -    public void checkPermission(Permission perm)
    -    {}
    -
    -    @Override
    -    public void checkExit(int status)
    -    {
    -      super.checkExit(status);
    -      if (status == 0) // If we exited cleanly, throw SystemExitException
    +      ResponderPlugin responder = ResponderService.getInstance().getResponder(platformName);
    +      if (responder == null)
           {
    -        throw new SystemExitException();
    +        logger.error("No such platform plugin found: {}!", platformName);
           }
           else
           {
    -        throw new SecurityException();
    +        responder.run();
           }
    -
    +    }
    +    catch (PIRException pirEx)
    --- End diff --
    
    IIRC the PIRException was being thrown with the original as a cause, so you won't lose the stacktrace.  I agree that tidying up exceptions is beyond this PR, and it will require some 'waves' of wrapping Exception until they are banished.
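
Wrapping with the original exception as the cause does preserve the full trace, as a small sketch shows; the `PIRException` constructor signature here is an assumed shape, not necessarily Pirk's actual class:

```java
import java.io.IOException;

// Sketch of cause-preserving exception wrapping; the PIRException shown
// is an assumed shape for illustration, not necessarily Pirk's class.
public class WrapSketch {

  public static class PIRException extends Exception {
    public PIRException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  static void lowLevel() throws IOException {
    throw new IOException("disk error");
  }

  /** Translate a system exception without discarding its stack trace. */
  public static void run() throws PIRException {
    try {
      lowLevel();
    } catch (IOException e) {
      // Passing e as the cause keeps the original trace reachable via
      // getCause() and printStackTrace()'s "Caused by:" section.
      throw new PIRException("responder failed", e);
    }
  }

  public static void main(String[] args) {
    try {
      run();
    } catch (PIRException e) {
      System.out.println("cause: " + e.getCause().getClass().getSimpleName());
    }
  }
}
```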



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by DarinJ <gi...@git.apache.org>.
Github user DarinJ commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r80029052
  
    --- Diff: src/main/resources/META-INF/services/org.apache.pirk.responder.wideskies.spi.ResponderPlugin ---
    @@ -0,0 +1,5 @@
    +org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponder
    --- End diff --
    
    @ellisonanne the service-plugin model allows better extensibility than pure reflection and is a well-known pattern, so it should be easier for others to develop new responders.
    
    This means that we keep platform designators, and the developer of a new responder adds his/her own designator via `getPlatformName`, which will be available when the jar is on the classpath.  (I could actually put those into the command line but I won't, as the config is in flux - todo I guess.)
    
    It also allows a shorter command line 
    ```
    -p spark
    ```
    vs 
    ```
    -l org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher
    ```
    with the same benefits.



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by ellisonanne <gi...@git.apache.org>.
Github user ellisonanne commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r80058520
  
    --- Diff: src/main/resources/META-INF/services/org.apache.pirk.responder.wideskies.spi.ResponderPlugin ---
    @@ -0,0 +1,5 @@
    +org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponder
    --- End diff --
    
    No - every option requires user specification.  I had originally thought that we would have the user specify the interface implementation (via class name) on the command line/prop file and then instantiate it through reflection.
    
    I like the service-plugin model and am in favor of proceeding with it.  It just wasn't what I thought we were doing when we started this PR (recall that we started with a ResponderLauncher interface + user specification of the implementation), and I wanted to make sure that we were clear on the tradespace.



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r79988418
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
    @@ -50,103 +40,31 @@
     {
       private static final Logger logger = LoggerFactory.getLogger(ResponderDriver.class);
     
    -  private enum Platform
    -  {
    -    MAPREDUCE, SPARK, SPARKSTREAMING, STORM, STANDALONE, NONE
    -  }
    -
    -  public static void main(String[] args) throws Exception
    +  public static void main(String[] args)
       {
         ResponderCLI responderCLI = new ResponderCLI(args);
     
    -    // For handling System.exit calls from Spark Streaming
    -    System.setSecurityManager(new SystemExitManager());
    -
    -    Platform platform = Platform.NONE;
    -    String platformString = SystemConfiguration.getProperty(ResponderProps.PLATFORM);
    +    String platformName = SystemConfiguration.getProperty(ResponderProps.PLATFORM, "None");
    +    logger.info("Attempting to use platform {} ...", platformName);
         try
         {
    -      platform = Platform.valueOf(platformString.toUpperCase());
    -    } catch (IllegalArgumentException e)
    -    {
    -      logger.error("platform " + platformString + " not found.");
    -    }
    -
    -    logger.info("platform = " + platform);
    -    switch (platform)
    -    {
    -      case MAPREDUCE:
    -        logger.info("Launching MapReduce ResponderTool:");
    -
    -        ComputeResponseTool pirWLTool = new ComputeResponseTool();
    -        ToolRunner.run(pirWLTool, new String[] {});
    -        break;
    -
    -      case SPARK:
    -        logger.info("Launching Spark ComputeResponse:");
    -
    -        ComputeResponse computeResponse = new ComputeResponse(FileSystem.get(new Configuration()));
    -        computeResponse.performQuery();
    -        break;
    -
    -      case SPARKSTREAMING:
    -        logger.info("Launching Spark ComputeStreamingResponse:");
    -
    -        ComputeStreamingResponse computeSR = new ComputeStreamingResponse(FileSystem.get(new Configuration()));
    -        try
    -        {
    -          computeSR.performQuery();
    -        } catch (SystemExitException e)
    -        {
    -          // If System.exit(0) is not caught from Spark Streaming,
    -          // the application will complete with a 'failed' status
    -          logger.info("Exited with System.exit(0) from Spark Streaming");
    -        }
    -
    -        // Teardown the context
    -        computeSR.teardown();
    -        break;
    -
    -      case STORM:
    -        logger.info("Launching Storm PirkTopology:");
    -        PirkTopology.runPirkTopology();
    -        break;
    -
    -      case STANDALONE:
    -        logger.info("Launching Standalone Responder:");
    -
    -        String queryInput = SystemConfiguration.getProperty("pir.queryInput");
    -        Query query = new LocalFileSystemStore().recall(queryInput, Query.class);
    -
    -        Responder pirResponder = new Responder(query);
    -        pirResponder.computeStandaloneResponse();
    -        break;
    -    }
    -  }
    -
    -  // Exception and Security Manager classes used to catch System.exit from Spark Streaming
    -  private static class SystemExitException extends SecurityException
    -  {}
    -
    -  private static class SystemExitManager extends SecurityManager
    -  {
    -    @Override
    -    public void checkPermission(Permission perm)
    -    {}
    -
    -    @Override
    -    public void checkExit(int status)
    -    {
    -      super.checkExit(status);
    -      if (status == 0) // If we exited cleanly, throw SystemExitException
    +      ResponderPlugin responder = ResponderService.getInstance().getResponder(platformName);
    +      if (responder == null)
           {
    -        throw new SystemExitException();
    +        logger.error("No such platform plugin found: {}!", platformName);
           }
           else
           {
    -        throw new SecurityException();
    +        responder.run();
           }
    -
    +    }
    +    catch (PIRException pirEx)
    --- End diff --
    
    Why specialize the ```PIRException``` from ```Exception```, then do essentially the same thing?
    
I'm on a mission to tidy up the project code to throw ```PIRException```s and deal with/translate any system-generated exceptions, so my preference is to keep it as ```PIRException``` only.  Allowing ```Exceptions``` will hide problems.
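For readers following along, the service lookup in the diff above (`ResponderService.getInstance().getResponder(platformName)`) can be sketched with the standard `java.util.ServiceLoader` mechanism. This is a minimal, hypothetical sketch: the `ResponderPlugin` name follows the pirk-63 branch, but the method bodies and the `getPlatformName` accessor are assumptions, not the actual Pirk code.

```java
import java.util.ServiceLoader;

// Hypothetical sketch of the SPI approach under discussion. Each backend
// (storm, spark, mapreduce, ...) would implement ResponderPlugin and list
// its implementation class in META-INF/services/<interface FQN> in its JAR.
public class ResponderServiceSketch
{
  public interface ResponderPlugin
  {
    // Name matched against the platform requested on the command line.
    String getPlatformName();

    // Launch the responder for this platform.
    void run() throws Exception;
  }

  // Find the plugin claiming the requested platform, or null if no
  // provider JAR on the classpath registers one.
  public static ResponderPlugin getResponder(String platformName)
  {
    for (ResponderPlugin plugin : ServiceLoader.load(ResponderPlugin.class))
    {
      if (plugin.getPlatformName().equalsIgnoreCase(platformName))
      {
        return plugin;
      }
    }
    return null;
  }

  public static void main(String[] args)
  {
    // With no provider JARs on the classpath, lookup finds nothing.
    System.out.println(getResponder("storm"));
  }
}
```

The point of the design is that adding a backend requires only dropping a JAR with a services entry on the classpath; the driver itself never names any platform class.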


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Ellison Anne Williams <ea...@apache.org>.
Ok - so, I'm officially shamed ;)

I'm not a big fan of java CLI either as it tends to be heavyweight and
inflexible - it was just super fast to put together as a first pass. Happy
to consider other more flexible options... :)

On Mon, Sep 19, 2016 at 8:40 AM, Ellison Anne Williams <
eawilliams@apache.org> wrote:

> It seems that it's the same idea as the ResponderLauncher with the service
> component added to maintain something akin to the 'platform'. I would
> prefer that we just did away with the platform notion altogether and make
> the ResponderDriver 'dumb'. We get around needing a platform-aware service
> by requiring the ResponderLauncher implementation to be passed as a CLI to
> the ResponderDriver.
>
> Am I missing something? Is there a good reason to provide a service by
> which platforms are registered? I'm open...
>
> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <t....@gmail.com>
> wrote:
>
>> How about an approach like this?
>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
>>
>> The "on-ramp" is the driver [1], which calls upon the service to find a
>> plug-in [2] that claims to implement the required platform responder,
>> e.g. [3].
>>
>> The list of plug-ins is given in the provider's JAR file, so the ones we
>> provide in Pirk are listed together [4], but if you split these into
>> modules, or somebody brings their own JAR alongside, these would be
>> listed in each JAR's services/ directory.
>>
>> [1]
>> https://github.com/tellison/incubator-pirk/blob/pirk-63/src/
>> main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
>> [2]
>> https://github.com/tellison/incubator-pirk/blob/pirk-63/src/
>> main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
>> [3]
>> https://github.com/tellison/incubator-pirk/blob/pirk-63/src/
>> main/java/org/apache/pirk/responder/wideskies/storm/StormResponder.java
>> [4]
>> https://github.com/tellison/incubator-pirk/blob/pirk-63/src/
>> main/services/org.apache.responder.spi.Responder
>>
>> I'm not even going to dignify this with a WIP PR, it is far from ready,
>> so proceed with caution.  There is hopefully enough there to show the
>> approach, and if it is worth continuing I'm happy to do so.
>>
>> Regards,
>> Tim
>>
>>
>
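The alternative Ellison Anne describes above — a "dumb" ResponderDriver that takes the `ResponderLauncher` implementation class name as a CLI option and instantiates it reflectively — can be sketched as follows. This is an assumed illustration: the `ResponderLauncher` interface name comes from the PR, but the driver logic, the example launcher, and the `load` helper are hypothetical.

```java
// Hypothetical sketch: a platform-unaware driver that reflectively loads
// whatever ResponderLauncher implementation the user names on the CLI.
public class LauncherDriverSketch
{
  public interface ResponderLauncher
  {
    // Launch this backend's responder.
    void run() throws Exception;
  }

  // Example launcher a backend JAR might provide.
  public static class StandaloneLauncher implements ResponderLauncher
  {
    @Override
    public void run()
    {
      System.out.println("standalone responder running");
    }
  }

  // Instantiate the named class and verify it implements the interface,
  // so a mistyped launcher class name fails with a clear error.
  public static ResponderLauncher load(String className) throws Exception
  {
    Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
    if (!(instance instanceof ResponderLauncher))
    {
      throw new IllegalArgumentException(className + " does not implement ResponderLauncher");
    }
    return (ResponderLauncher) instance;
  }

  public static void main(String[] args) throws Exception
  {
    // In the real driver the class name would come from a CLI option;
    // here it is hard-coded for the sake of the sketch.
    load(LauncherDriverSketch.StandaloneLauncher.class.getName()).run();
  }
}
```

Compared with the ServiceLoader approach, this avoids any registry: the user names the implementation directly, and the driver needs no knowledge of which platforms exist.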

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Ellison Anne Williams <ea...@apache.org>.
Sounds good.

On Mon, Sep 19, 2016 at 4:22 PM, Darin Johnson <db...@gmail.com>
wrote:

> Alright, that was in the spirit of what I was thinking when I did this.
>
> Why don't we take Tim's suggested improvements to my PR (I'll do the
> necessary cleanup) and at the same time just remove the platform argument
> altogether since backwards compatibility isn't upsetting anyone.
>
> We'll still need a command line option for the launcher for now, as we don't
> yet have submodules; we can decide between the two choices after we break out
> submodules and improve the config.
>
>
> On Sep 19, 2016 12:19 PM, "Tim Ellison" <t....@gmail.com> wrote:
>
> > On 19/09/16 15:46, Darin Johnson wrote:
> > > Hey guys,
> > >
> > > Thanks for looking at the PR, I apologize if it offended anyone's
> eyes:).
> > >
> > > I'm glad it generated some discussion about the configuration.  I
> didn't
> > > really like where things were heading with the config.  However, didn't
> > want to create too much scope creep.
> > >
> > > I think any hierarchical config (TypeSafe or yaml) would make things
> much
> > > more maintainable, the plugin could simply grab the appropriate part of
> > the
> > > config and handle accordingly.  I'd also cut down the number of command
> > > line options to only those that change between runs often (like
> > > input/output)
> > >
> > >> One option is to make Pirk pluggable, so that a Pirk installation
> could
> > >> use one or more of these in an extensible fashion by adding JAR files.
> > >> That would still require selecting one by command-line argument.
> > >
> > > An argument for this approach is for lambda architecture approaches
> (say
> > > spark/spark-streaming) where the contents of the jars would be so
> similar
> > it
> > > seems like too much trouble to create separate jars.
> > >
> > > Happy to continue working on this given some direction on where you'd
> > like
> > > it to go.  Also, it's a bit of a blocker to refactoring the build into
> > > submodules.
> >
> > FWIW my 2c is to not try and fix all the problems in one go, and rather
> > take a compromise on the configurations while you tease apart the
> > submodules into separate source code trees, poms, etc; then come back
> > and fix the runtime configs.
> >
> > Once the submodules are in place it will open up more work for release
> > engineering and tinkering that can be done in parallel with the config
> > polishing.
> >
> > Just a thought.
> > Tim
> >
> >
> > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <t....@gmail.com>
> > wrote:
> > >
> > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > >>> It seems that it's the same idea as the ResponderLauncher with the
> > >> service
> > >>> component added to maintain something akin to the 'platform'. I would
> > >>> prefer that we just did away with the platform notion altogether and
> > make
> > >>> the ResponderDriver 'dumb'. We get around needing a platform-aware
> > >> service
> > >>> by requiring the ResponderLauncher implementation to be passed as a
> CLI
> > >> to
> > >>> the ResponderDriver.
> > >>
> > >> Let me check I understand what you are saying here.
> > >>
> > >> At the moment, there is a monolithic Pirk that hard codes how to
> respond
> > >> using lots of different backends (mapreduce, spark, sparkstreaming,
> > >> storm , standalone), and that is selected by command-line argument.
> > >>
> > >> One option is to make Pirk pluggable, so that a Pirk installation
> could
> > >> use one or more of these in an extensible fashion by adding JAR files.
> > >> That would still require selecting one by command-line argument.
> > >>
> > >> A second option is to simply pass in the required backend JAR to
> select
> > >> the particular implementation you choose, as a specific Pirk
> > >> installation doesn't need to use multiple backends simultaneously.
> > >>
> > >> ...and you are leaning towards the second option.  Do I have that
> > correct?
> > >>
> > >> Regards,
> > >> Tim
> > >>
> > >>> Am I missing something? Is there a good reason to provide a service
> by
> > >>> which platforms are registered? I'm open...
> > >>>
> > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <t....@gmail.com>
> > >> wrote:
> > >>>
> > >>>> How about an approach like this?
> > >>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
> > >>>>
> > >>>> The "on-ramp" is the driver [1], which calls upon the service to
> find
> > a
> > >>>> plug-in [2] that claims to implement the required platform
> responder,
> > >>>> e.g. [3].
> > >>>>
> > >>>> The list of plug-ins is given in the provider's JAR file, so the
> ones
> > we
> > >>>> provide in Pirk are listed together [4], but if you split these into
> > >>>> modules, or somebody brings their own JAR alongside, these would be
> > >>>> listed in each JAR's services/ directory.
> > >>>>
> > >>>> [1]
> > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >>>> src/main/java/org/apache/pirk/responder/wideskies/
> > ResponderDriver.java
> > >>>> [2]
> > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >>>> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
> > >>>> [3]
> > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
> > >>>> StormResponder.java
> > >>>> [4]
> > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >>>> src/main/services/org.apache.responder.spi.Responder
> > >>>>
> > >>>> I'm not even going to dignify this with a WIP PR, it is far from
> > ready,
> > >>>> so proceed with caution.  There is hopefully enough there to show
> the
> > >>>> approach, and if it is worth continuing I'm happy to do so.
> > >>>>
> > >>>> Regards,
> > >>>> Tim
> > >>>>
> > >>>>
> > >>>
> > >>
> > >
> >
>

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Tim Ellison <t....@gmail.com>.
On 19/09/16 21:22, Darin Johnson wrote:
> Alright, that was in the spirit of what I was thinking when I did this.
> 
> Why don't we take Tim's suggested improvements to my PR (I'll do the
> necessary cleanup) and at the same time just remove the platform argument
> altogether since backwards compatibility isn't upsetting anyone.
> 
> We'll still need a command line option for the launcher for now, as we don't
> yet have submodules; we can decide between the two choices after we break out
> submodules and improve the config.

Sounds great - let me know how I can help.

Regards,
Tim


> On Sep 19, 2016 12:19 PM, "Tim Ellison" <t....@gmail.com> wrote:
> 
>> On 19/09/16 15:46, Darin Johnson wrote:
>>> Hey guys,
>>>
>>> Thanks for looking at the PR, I apologize if it offended anyone's eyes:).
>>>
>>> I'm glad it generated some discussion about the configuration.  I didn't
>>> really like where things were heading with the config.  However, didn't
>>> want to create too much scope creep.
>>>
>>> I think any hierarchical config (TypeSafe or yaml) would make things much
>>> more maintainable, the plugin could simply grab the appropriate part of
>> the
>>> config and handle accordingly.  I'd also cut down the number of command
>>> line options to only those that change between runs often (like
>>> input/output)
>>>
>>>> One option is to make Pirk pluggable, so that a Pirk installation could
>>>> use one or more of these in an extensible fashion by adding JAR files.
>>>> That would still require selecting one by command-line argument.
>>>
>>> An argument for this approach is for lambda architecture approaches (say
>>> spark/spark-streaming) where the contents of the jars would be so similar
>> it
>>> seems like too much trouble to create separate jars.
>>>
>>> Happy to continue working on this given some direction on where you'd
>> like
>>> it to go.  Also, it's a bit of a blocker to refactoring the build into
>>> submodules.
>>
>> FWIW my 2c is to not try and fix all the problems in one go, and rather
>> take a compromise on the configurations while you tease apart the
>> submodules into separate source code trees, poms, etc; then come back
>> and fix the runtime configs.
>>
>> Once the submodules are in place it will open up more work for release
>> engineering and tinkering that can be done in parallel with the config
>> polishing.
>>
>> Just a thought.
>> Tim
>>
>>
>>> On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <t....@gmail.com>
>> wrote:
>>>
>>>> On 19/09/16 13:40, Ellison Anne Williams wrote:
>>>>> It seems that it's the same idea as the ResponderLauncher with the
>>>> service
>>>>> component added to maintain something akin to the 'platform'. I would
>>>>> prefer that we just did away with the platform notion altogether and
>> make
>>>>> the ResponderDriver 'dumb'. We get around needing a platform-aware
>>>> service
>>>>> by requiring the ResponderLauncher implementation to be passed as a CLI
>>>> to
>>>>> the ResponderDriver.
>>>>
>>>> Let me check I understand what you are saying here.
>>>>
>>>> At the moment, there is a monolithic Pirk that hard codes how to respond
>>>> using lots of different backends (mapreduce, spark, sparkstreaming,
>>>> storm , standalone), and that is selected by command-line argument.
>>>>
>>>> One option is to make Pirk pluggable, so that a Pirk installation could
>>>> use one or more of these in an extensible fashion by adding JAR files.
>>>> That would still require selecting one by command-line argument.
>>>>
>>>> A second option is to simply pass in the required backend JAR to select
>>>> the particular implementation you choose, as a specific Pirk
>>>> installation doesn't need to use multiple backends simultaneously.
>>>>
>>>> ...and you are leaning towards the second option.  Do I have that
>> correct?
>>>>
>>>> Regards,
>>>> Tim
>>>>
>>>>> Am I missing something? Is there a good reason to provide a service by
>>>>> which platforms are registered? I'm open...
>>>>>
>>>>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <t....@gmail.com>
>>>> wrote:
>>>>>
>>>>>> How about an approach like this?
>>>>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
>>>>>>
>>>>>> The "on-ramp" is the driver [1], which calls upon the service to find
>> a
>>>>>> plug-in [2] that claims to implement the required platform responder,
>>>>>> e.g. [3].
>>>>>>
>>>>>> The list of plug-ins is given in the provider's JAR file, so the ones
>> we
>>>>>> provide in Pirk are listed together [4], but if you split these into
>>>>>> modules, or somebody brings their own JAR alongside, these would be
>>>>>> listed in each JAR's services/ directory.
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>>>>>> src/main/java/org/apache/pirk/responder/wideskies/
>> ResponderDriver.java
>>>>>> [2]
>>>>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>>>>>> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
>>>>>> [3]
>>>>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>>>>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
>>>>>> StormResponder.java
>>>>>> [4]
>>>>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>>>>>> src/main/services/org.apache.responder.spi.Responder
>>>>>>
>>>>>> I'm not even going to dignify this with a WIP PR, it is far from
>> ready,
>>>>>> so proceed with caution.  There is hopefully enough there to show the
>>>>>> approach, and if it is worth continuing I'm happy to do so.
>>>>>>
>>>>>> Regards,
>>>>>> Tim
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Darin Johnson <db...@gmail.com>.
Alright, that was in the spirit of what I was thinking when I did this.

Why don't we take Tim's suggested improvements to my PR (I'll do the
necessary cleanup) and at the same time just remove the platform argument
altogether since backwards compatibility isn't upsetting anyone.

We'll still need a command line option for the launcher for now, as we don't
yet have submodules; we can decide between the two choices after we break out
submodules and improve the config.


On Sep 19, 2016 12:19 PM, "Tim Ellison" <t....@gmail.com> wrote:

> On 19/09/16 15:46, Darin Johnson wrote:
> > Hey guys,
> >
> > Thanks for looking at the PR, I apologize if it offended anyone's eyes:).
> >
> > I'm glad it generated some discussion about the configuration.  I didn't
> > really like where things were heading with the config.  However, didn't
> > want to create too much scope creep.
> >
> > I think any hierarchical config (TypeSafe or yaml) would make things much
> > more maintainable, the plugin could simply grab the appropriate part of
> the
> > config and handle accordingly.  I'd also cut down the number of command
> > line options to only those that change between runs often (like
> > input/output)
> >
> >> One option is to make Pirk pluggable, so that a Pirk installation could
> >> use one or more of these in an extensible fashion by adding JAR files.
> >> That would still require selecting one by command-line argument.
> >
> > An argument for this approach is for lambda architecture approaches (say
> > spark/spark-streaming) where the contents of the jars would be so similar
> it
> > seems like too much trouble to create separate jars.
> >
> > Happy to continue working on this given some direction on where you'd
> like
> > it to go.  Also, it's a bit of a blocker to refactoring the build into
> > submodules.
>
> FWIW my 2c is to not try and fix all the problems in one go, and rather
> take a compromise on the configurations while you tease apart the
> submodules into separate source code trees, poms, etc; then come back
> and fix the runtime configs.
>
> Once the submodules are in place it will open up more work for release
> engineering and tinkering that can be done in parallel with the config
> polishing.
>
> Just a thought.
> Tim
>
>
> > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <t....@gmail.com>
> wrote:
> >
> >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> >>> It seems that it's the same idea as the ResponderLauncher with the
> >> service
> >>> component added to maintain something akin to the 'platform'. I would
> >>> prefer that we just did away with the platform notion altogether and
> make
> >>> the ResponderDriver 'dumb'. We get around needing a platform-aware
> >> service
> >>> by requiring the ResponderLauncher implementation to be passed as a CLI
> >> to
> >>> the ResponderDriver.
> >>
> >> Let me check I understand what you are saying here.
> >>
> >> At the moment, there is a monolithic Pirk that hard codes how to respond
> >> using lots of different backends (mapreduce, spark, sparkstreaming,
> >> storm , standalone), and that is selected by command-line argument.
> >>
> >> One option is to make Pirk pluggable, so that a Pirk installation could
> >> use one or more of these in an extensible fashion by adding JAR files.
> >> That would still require selecting one by command-line argument.
> >>
> >> A second option is to simply pass in the required backend JAR to select
> >> the particular implementation you choose, as a specific Pirk
> >> installation doesn't need to use multiple backends simultaneously.
> >>
> >> ...and you are leaning towards the second option.  Do I have that
> correct?
> >>
> >> Regards,
> >> Tim
> >>
> >>> Am I missing something? Is there a good reason to provide a service by
> >>> which platforms are registered? I'm open...
> >>>
> >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <t....@gmail.com>
> >> wrote:
> >>>
> >>>> How about an approach like this?
> >>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
> >>>>
> >>>> The "on-ramp" is the driver [1], which calls upon the service to find
> a
> >>>> plug-in [2] that claims to implement the required platform responder,
> >>>> e.g. [3].
> >>>>
> >>>> The list of plug-ins is given in the provider's JAR file, so the ones
> we
> >>>> provide in Pirk are listed together [4], but if you split these into
> >>>> modules, or somebody brings their own JAR alongside, these would be
> >>>> listed in each JAR's services/ directory.
> >>>>
> >>>> [1]
> >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> >>>> src/main/java/org/apache/pirk/responder/wideskies/
> ResponderDriver.java
> >>>> [2]
> >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> >>>> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
> >>>> [3]
> >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> >>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
> >>>> StormResponder.java
> >>>> [4]
> >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> >>>> src/main/services/org.apache.responder.spi.Responder
> >>>>
> >>>> I'm not even going to dignify this with a WIP PR, it is far from
> ready,
> >>>> so proceed with caution.  There is hopefully enough there to show the
> >>>> approach, and if it is worth continuing I'm happy to do so.
> >>>>
> >>>> Regards,
> >>>> Tim
> >>>>
> >>>>
> >>>
> >>
> >
>

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Suneel Marthi <su...@gmail.com>.
Pirk-ES is more of a Source input (to be more technically precise), so it
doesn't warrant being a separate module of its own.

But I am not sure about pirk-hadoop and pirk-storm - aren't they both
different responder impls?

ElasticSearch should just be an input source like HDFS and doesn't need to
be a separate module. Flink has a plugin for an Elasticsearch sink;
similarly, Spark has one for an Elasticsearch source and sink.

*Question:* Is it ever possible, now or in the future, that Pirk could be
used for its Query only, for its Responder only, or in applications that may
need both?

If so, then we should be looking at generating separate artifacts for
pirk-query, pirk-spark-responder, pirk-storm-responder etc...

It may help to start a shared Google doc and gather all comments; it's so
hard to keep track of all this in emails.

Most other projects like Kafka and Flink do it that way - if it's ok with
everyone, we should start doing the same on Pirk.




On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <db...@gmail.com>
wrote:

> Suneel, I'll try to put a couple jiras on it tonight with my thoughts.
> Based off my pirk-63 I was able to pull spark and storm out with no
> issues.  I was planning to pull them out, then tackle elastic search,
> then hadoop as it's a little entrenched.  This should keep most PRs to
> manageable chunks. I think once that's done addressing the configs will
> make more sense.
>
> I'm open to suggestions. But the hope would be:
> Pirk-parent
> Pirk-core
> Pirk-hadoop
> Pirk-storm
> Pirk-parent
>
> Pirk-es is a little weird as it's really just an inputformat, seems like
> there's a more general solution here than creating submodules for every
> inputformat.
>
> Darin
>
> On Sep 19, 2016 1:00 PM, "Suneel Marthi" <sm...@apache.org> wrote:
>
> >
>
> > Refactor is definitely a first priority.  Is there a design/proposal
> draft
> > that we could comment on about how to go about refactoring the code.  I
> > have been trying to keep up with the emails but definitely would have
> > missed some.
> >
> >
> >
> > On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
> > eawilliams@apache.org <ea...@apache.org>> wrote:
> >
> > > Agree - let's leave the config/CLI the way it is for now and tackle
> that as
> > > a subsequent design discussion and PR.
> > >
> > > Also, I think that we should leave the ResponderDriver and the
> > > ResponderProps alone for this PR and push to a subsequent PR (once we
> > > decide if and how we would like to delegate each).
> > >
> > > I vote to remove the 'platform' option and the backwards compatibility
> in
> > > this PR and proceed with having a ResponderLauncher interface and
> forcing
> > > its implementation by the ResponderDriver.
> > >
> > > And, I am not so concerned with having one fat jar vs. multiple jars
> right
> > > now - to me, at this point, it's a 'nice to have' and not a 'must have'
> for
> > > Pirk functionality. We do need to break out Pirk into more clearly
> defined
> > > submodules (which is in progress) - via this re-factor, I think that we
> > > will gain some ability to generate multiple jars which is nice.
> > >
> > >
> > >
> > > On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <t....@gmail.com>
> > > wrote:
> > >
> > > > On 19/09/16 15:46, Darin Johnson wrote:
> > > > > Hey guys,
> > > > >
> > > > > Thanks for looking at the PR, I apologize if it offended anyone's
> > > eyes:).
> > > > >
> > > > > I'm glad it generated some discussion about the configuration.  I
> > > didn't
> > > > > really like where things were heading with the config.  However,
> didn't
> > > > > want to create too much scope creep.
> > > > >
> > > > > I think any hierarchical config (TypeSafe or yaml) would make
> things
> > > much
> > > > > more maintainable, the plugin could simply grab the appropriate
> part of
> > > > the
> > > > > config and handle accordingly.  I'd also cut down the number of
> command
> > > > > line options to only those that change between runs often (like
> > > > > input/output)
> > > > >
> > > > >> One option is to make Pirk pluggable, so that a Pirk installation
> > > could
> > > > >> use one or more of these in an extensible fashion by adding JAR
> files.
> > > > >> That would still require selecting one by command-line argument.
> > > > >
> > > > > An argument for this approach is for lambda architecture approaches
> > > (say
> > > > > spark/spark-streaming) where the contents of the jars would be so
> > > similar
> > > > it
> > > > > seems like too much trouble to create separate jars.
> > > > >
> > > > > Happy to continue working on this given some direction on where
> you'd
> > > > like
> > > > > it to go.  Also, it's a bit of a blocker to refactoring the build
> into
> > > > > submodules.
> > > >
> > > > FWIW my 2c is to not try and fix all the problems in one go, and
> rather
> > > > take a compromise on the configurations while you tease apart the
> > > > submodules into separate source code trees, poms, etc; then come
> back
> > > > and fix the runtime configs.
> > > >
> > > > Once the submodules are in place it will open up more work for
> release
> > > > engineering and tinkering that can be done in parallel with the
> config
> > > > polishing.
> > > >
> > > > Just a thought.
> > > > Tim
> > > >
> > > >
> > > > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <
> t.p.ellison@gmail.com>
> > > > wrote:
> > > > >
> > > > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > > > >>> It seems that it's the same idea as the ResponderLauncher with
> the
> > > > >> service
> > > > >>> component added to maintain something akin to the 'platform'. I
> would
> > > > >>> prefer that we just did away with the platform notion altogether
> and
> > > > make
> > > > >>> the ResponderDriver 'dumb'. We get around needing a
> platform-aware
> > > > >> service
> > > > >>> by requiring the ResponderLauncher implementation to be passed as
> a
> > > CLI
> > > > >> to
> > > > >>> the ResponderDriver.
> > > > >>
> > > > >> Let me check I understand what you are saying here.
> > > > >>
> > > > >> At the moment, there is a monolithic Pirk that hard codes how to
> > > respond
> > > > >> using lots of different backends (mapreduce, spark,
> sparkstreaming,
> > > > >> storm , standalone), and that is selected by command-line
> argument.
> > > > >>
> > > > >> One option is to make Pirk pluggable, so that a Pirk installation
> > > could
> > > > >> use one or more of these in an extensible fashion by adding JAR
> files.
> > > > >> That would still require selecting one by command-line argument.
> > > > >>
> > > > >> A second option is to simply pass in the required backend JAR to
> > > select
> > > > >> the particular implementation you choose, as a specific Pirk
> > > > >> installation doesn't need to use multiple backends simultaneously.
> > > > >>
> > > > >> ...and you are leaning towards the second option.  Do I have that
> > > > correct?
> > > > >>
> > > > >> Regards,
> > > > >> Tim
> > > > >>
> > > > >>> Am I missing something? Is there a good reason to provide a
> service
> > > by
> > > > >>> which platforms are registered? I'm open...
> > > > >>>
> > > > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <
> t.p.ellison@gmail.com>
> > > > >> wrote:
> > > > >>>
> > > > >>>> How about an approach like this?
> > > > >>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
> > > > >>>>
> > > > >>>> The "on-ramp" is the driver [1], which calls upon the service to
> > > find
> > > > a
> > > > >>>> plug-in [2] that claims to implement the required platform
> > > responder,
> > > > >>>> e.g. [3].
> > > > >>>>
> > > > >>>> The list of plug-ins is given in the provider's JAR file, so the
> > > ones
> > > > we
> > > > >>>> provide in Pirk are listed together [4], but if you split these
> into
> > > > >>>> modules, or somebody brings their own JAR alongside, these would
> be
> > > > >>>> listed in each JAR's services/ directory.
> > > > >>>>
> > > > >>>> [1]
> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/
> > > > ResponderDriver.java
> > > > >>>> [2]
> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > >>>> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.
> java
> > > > >>>> [3]
> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
> > > > >>>> StormResponder.java
> > > > >>>> [4]
> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > >>>> src/main/services/org.apache.responder.spi.Responder
> > > > >>>>
> > > > >>>> I'm not even going to dignify this with a WIP PR, it is far from
> > > > ready,
> > > > >>>> so proceed with caution.  There is hopefully enough there to
> show
> > > the
> > > > >>>> approach, and if it is worth continuing I'm happy to do so.
> > > > >>>>
> > > > >>>> Regards,
> > > > >>>> Tim
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>
> > > > >
> > > >
> > >
>

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Darin Johnson <db...@gmail.com>.
Sure will do tonight.

On Sep 19, 2016 5:07 PM, "Suneel Marthi" <su...@gmail.com> wrote:

> A shared Google doc would be more convenient than a bunch of Jiras. It's
> easier to comment and add notes that way.
>
>
> On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <db...@gmail.com>
> wrote:
>
> > Suneel, I'll try to put a couple jiras on it tonight with my thoughts.
> > Based off my pirk-63 I was able to pull spark and storm out with no
> > issues.  I was planning to pull them out, then tackle elastic search,
> > then hadoop as it's a little entrenched.  This should keep most PRs to
> > manageable chunks. I think once that's done addressing the configs will
> > make more sense.
> >
> > I'm open to suggestions. But the hope would be:
> > Pirk-parent
> > Pirk-core
> > Pirk-hadoop
> > Pirk-storm
> > Pirk-parent
> >
> > Pirk-es is a little weird as it's really just an inputformat, seems like
> > there's a more general solution here than creating submodules for every
> > inputformat.
> >
> > Darin
> >
> > On Sep 19, 2016 1:00 PM, "Suneel Marthi" <sm...@apache.org> wrote:
> >
> > >
> >
> > > Refactor is definitely a first priority.  Is there a design/proposal
> > draft
> > > that we could comment on about how to go about refactoring the code.  I
> > > have been trying to keep up with the emails but definitely would have
> > > missed some.
> > >
> > >
> > >
> > > On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
> > > eawilliams@apache.org <ea...@apache.org>> wrote:
> > >
> > > > Agree - let's leave the config/CLI the way it is for now and tackle
> > that as
> > > > a subsequent design discussion and PR.
> > > >
> > > > Also, I think that we should leave the ResponderDriver and the
> > > > ResponderProps alone for this PR and push to a subsequent PR (once we
> > > > decide if and how we would like to delegate each).
> > > >
> > > > I vote to remove the 'platform' option and the backwards
> compatibility
> > in
> > > > this PR and proceed with having a ResponderLauncher interface and
> > forcing
> > > > its implementation by the ResponderDriver.
> > > >
> > > > And, I am not so concerned with having one fat jar vs. multiple jars
> > right
> > > > now - to me, at this point, it's a 'nice to have' and not a 'must
> have'
> > for
> > > > Pirk functionality. We do need to break out Pirk into more clearly
> > defined
> > > > submodules (which is in progress) - via this re-factor, I think that
> we
> > > > will gain some ability to generate multiple jars which is nice.
> > > >
> > > >
> > > >
> > > > On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <t.p.ellison@gmail.com
> >
> > > > wrote:
> > > >
> > > > > On 19/09/16 15:46, Darin Johnson wrote:
> > > > > > Hey guys,
> > > > > >
> > > > > > Thanks for looking at the PR, I apologize if it offended anyone's
> > > > eyes:).
> > > > > >
> > > > > > I'm glad it generated some discussion about the configuration.  I
> > > > > > didn't really like where things were heading with the config.
> > > > > > However, I didn't want to create too much scope creep.
> > > > > >
> > > > > > I think any hierarchical config (Typesafe or YAML) would make
> > > > > > things much more maintainable; the plugin could simply grab the
> > > > > > appropriate part of the config and handle it accordingly.  I'd also
> > > > > > cut down the number of command-line options to only those that
> > > > > > change often between runs (like input/output).
> > > > > >
> > > > > >> One option is to make Pirk pluggable, so that a Pirk
> installation
> > > > could
> > > > > >> use one or more of these in an extensible fashion by adding JAR
> > files.
> > > > > >> That would still require selecting one by command-line argument.
> > > > > >
> > > > > > An argument for this approach comes from lambda architectures (say
> > > > > > spark/spark-streaming), where the contents of the jars would be so
> > > > > > similar that it seems like too much trouble to create separate jars.
> > > > > >
> > > > > > Happy to continue working on this given some direction on where
> > you'd
> > > > > like
> > > > > > it to go.  Also, it's a bit of a blocker to refactoring the build
> > into
> > > > > > submodules.
> > > > >
> > > > > FWIW my 2c is to not try and fix all the problems in one go, and
> > rather
> > > > > take a compromise on the configurations while you tease apart the
> > > > > submodules into separate source code trees, poms, etc.; then come
> > back
> > > > > and fix the runtime configs.
> > > > >
> > > > > Once the submodules are in place it will open up more work for
> > release
> > > > > engineering and tinkering that can be done in parallel with the
> > config
> > > > > polishing.
> > > > >
> > > > > Just a thought.
> > > > > Tim
> > > > >
> > > > >
> > > > > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <
> > t.p.ellison@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > > > > >>> It seems that it's the same idea as the ResponderLauncher with
> > the
> > > > > >> service
> > > > > >>> component added to maintain something akin to the 'platform'. I
> > would
> > > > > >>> prefer that we just did away with the platform notion
> altogether
> > and
> > > > > make
> > > > > >>> the ResponderDriver 'dumb'. We get around needing a
> > platform-aware
> > > > > >> service
> > > > > >>> by requiring the ResponderLauncher implementation to be passed
> as
> > a
> > > > CLI
> > > > > >> to
> > > > > >>> the ResponderDriver.
> > > > > >>
> > > > > >> Let me check I understand what you are saying here.
> > > > > >>
> > > > > >> At the moment, there is a monolithic Pirk that hard codes how to
> > > > respond
> > > > > >> using lots of different backends (mapreduce, spark,
> > sparkstreaming,
> > > > > >> storm , standalone), and that is selected by command-line
> > argument.
> > > > > >>
> > > > > >> One option is to make Pirk pluggable, so that a Pirk
> installation
> > > > could
> > > > > >> use one or more of these in an extensible fashion by adding JAR
> > files.
> > > > > >> That would still require selecting one by command-line argument.
> > > > > >>
> > > > > >> A second option is to simply pass in the required backend JAR to
> > > > select
> > > > > >> the particular implementation you choose, as a specific Pirk
> > > > > >> installation doesn't need to use multiple backends
> simultaneously.
> > > > > >>
> > > > > >> ...and you are leaning towards the second option.  Do I have
> that
> > > > > correct?
> > > > > >>
> > > > > >> Regards,
> > > > > >> Tim
> > > > > >>
> > > > > >>> Am I missing something? Is there a good reason to provide a
> > service
> > > > by
> > > > > >>> which platforms are registered? I'm open...
> > > > > >>>
> > > > > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <
> > t.p.ellison@gmail.com>
> > > > > >> wrote:
> > > > > >>>
> > > > > >>>> How about an approach like this?
> > > > > >>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
> > > > > >>>>
> > > > > >>>> The "on-ramp" is the driver [1], which calls upon the service
> to
> > > > find
> > > > > a
> > > > > >>>> plug-in [2] that claims to implement the required platform
> > > > responder,
> > > > > >>>> e.g. [3].
> > > > > >>>>
> > > > > >>>> The list of plug-ins is given in the provider's JAR file, so
> the
> > > > ones
> > > > > we
> > > > > >>>> provide in Pirk are listed together [4], but if you split
> these
> > into
> > > > > >>>> modules, or somebody brings their own JAR alongside, these
> would
> > be
> > > > > >>>> listed in each JAR's services/ directory.
> > > > > >>>>
> > > > > >>>> [1]
> > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/
> > > > > ResponderDriver.java
> > > > > >>>> [2]
> > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > >>>> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.
> > java
> > > > > >>>> [3]
> > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
> > > > > >>>> StormResponder.java
> > > > > >>>> [4]
> > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > >>>> src/main/services/org.apache.responder.spi.Responder
> > > > > >>>>
> > > > > >>>> I'm not even going to dignify this with a WIP PR, it is far
> from
> > > > > ready,
> > > > > >>>> so proceed with caution.  There is hopefully enough there to
> > show
> > > > the
> > > > > >>>> approach, and if it is worth continuing I'm happy to do so.
> > > > > >>>>
> > > > > >>>> Regards,
> > > > > >>>> Tim
> > > > > >>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > > >
> > > > >
> > > >
> >
>
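
[Editorial sketch] The "dumb driver plus launcher class" option favored in the thread can be sketched as follows. The `ResponderLauncher` interface and its `run` method are named in the discussion; the driver logic and the stand-in launcher class below are hypothetical, not Pirk's actual code.

```java
// Sketch of the "launcher" CLI option: the driver has no compile-time
// knowledge of any backend, and instantiates via reflection whatever
// ResponderLauncher implementation is named on the command line.
interface ResponderLauncher {
    void run() throws Exception;
}

// Stand-in for a backend launcher that a separate responder jar would provide.
class StandaloneLauncher implements ResponderLauncher {
    @Override
    public void run() {
        System.out.println("standalone responder running");
    }
}

public class ResponderDriverSketch {
    public static void main(String[] args) throws Exception {
        // In Pirk this value would come from the "launcher" CLI option.
        String launcherClass = args.length > 0 ? args[0] : "StandaloneLauncher";

        // Any jar on the classpath can supply the named class, so the driver
        // needs no explicit dependency on Storm, Spark, MapReduce, etc.
        ResponderLauncher launcher = (ResponderLauncher)
            Class.forName(launcherClass).getDeclaredConstructor().newInstance();
        launcher.run();
    }
}
```

Under this scheme a third-party responder only needs its jar on the classpath and the fully qualified launcher class name passed to the driver, which is the extensibility goal described in the PR.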

Re: Google docs

Posted by Darin Johnson <db...@gmail.com>.
Updated per Tim's comments in the document, plus the Apache footer. I didn't
see anything in legal for these types of documents, so I just used the
Apache copyright footer; if that is incorrect, let me know.

On Wed, Sep 21, 2016 at 8:23 AM, Tim Ellison <t....@gmail.com> wrote:

> On 21/09/16 13:13, Suneel Marthi wrote:
> > On Wed, Sep 21, 2016 at 1:10 PM, Tim Ellison <t....@gmail.com>
> wrote:
> >
> >> On 21/09/16 11:36, Suneel Marthi wrote:
> >>> On Wed, Sep 21, 2016 at 10:43 AM, Tim Ellison <t....@gmail.com>
> >> wrote:
> >>>
> >>>> On 21/09/16 02:40, Darin Johnson wrote:
> >>>>> Suneel, a google doc as promised, only a day late (sorry - sick kid).
> >>>>>
> >>>>> https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_
> >>>> mMrRQyynQ-Q6MFbI/edit?usp=sharing
> >>>>>
> >>>>> I was planning on working on this, but I'm going to take a day or two
> >> to
> >>>>> let others comment.
> >>>>
> >>>> Just swapping to my mentoring hat for a moment...
> >>>>
> >>>> If you are going to use non-ASF resources for project materials, you
> >>>> should make it clear under what terms those materials are offered.
> >>>>
> >>>> When we all discuss code and ideas on the ASF's mailing lists, JIRAs,
> >>>> wiki pages, etc, it is understood that these are contributions under
> the
> >>>> ALv2 for inclusion in the project; but if I point you to Pirk related
> >>>> material on my Google account, personal blog, or corporate website
> then
> >>>> it can become ambiguous as to how that material may be used.
> >>>>
> >>>> That's why I'm a proponent for keeping things on ASF infrastructure
> >>>> unless there is a big win for doing it elsewhere that justifies making
> >>>> the terms of that new location explicit.
> >>>>
> >>>> In this particular case, if you need the ability to simultaneously
> edit
> >>>> the document (or some other Google docs feature) then I'd advise
> adding
> >>>> an ALv2 footer [1]; otherwise why not just use our wiki? [2]
> >>>>
> >>>> [1] e.g. Licensed under the Apache License, Version 2.0.
> >>>> [2] https://cwiki.apache.org/confluence/display/PIRK
> >>>>
> >>>> Regards,
> >>>> Tim
> >>>>
> >>>
> >>> Point noted. Between creating a Wiki page and a Google Doc, to me
> Google
> >>> doc takes a lot less time and its easier for others to comment and
> >> modify.
> >>> The Wiki has problems of its own, know all too well from having to
> >> approve
> >>> the monthly incubator reports and see how much time u spend waiting for
> >> the
> >>> login screen to show up and before u r granted access :-)
> >>>
> >>> The actual discussions are still happening on mailing lists which point
> >> to
> >>> the Google docs, so what's the issue here?  If adding an ALv2 footer
> >> makes
> >>> a google doc ASF legitimate - fine.
> >>>
> >>> I pointed to an example of how Flink is doing process and feature
> >>> improvements which is using - project Wiki, Google Docs and all
> >> discussions
> >>> happening on mailing lists -
> >>> https://cwiki.apache.org/confluence/pages/viewpage.
> >> action?pageId=65870673
> >>>
> >>> There are other projects doing things similarly - projects like Kudu,
> and
> >>> others even have public slack channels. We can go and start picking on
> >>> every one of those and start crying foul.  We can still use the old -
> >>> fashioned IRC channels (or its ultra modern avatars like Slack which I
> >>> would very much prefer).
> >>
> >> I'm no apologist for ASF infrastructure, which tends to be a lowest
> >> common denominator.  There is always lots of discussion on the infra
> >> lists about moving things on.
> >>
> >> My point is that putting a link on this list to material hosted
> >> elsewhere does not make it a contribution.  I am simply doing my mentor
> >> duty to ensure people are aware of that and (a) mark things as
> >> contributions where they are so, and (b) don't take things from
> >> elsewhere without checking the terms first.
> >>
> >> Of course, I'm not suggesting anything bad has occurred here.
> >>
> >> If other PMCs have less focus on this, that's for them to consider, but
> >> I have direct personal experience of an Apache project being part of a
> >> very expensive court case over its contributions so I am hyper-sensitive
> >> to it.
> >>
> >> Regards,
> >> Tim
> >>
> >
> > Apologies for my response. I see your point of view and concur. Would it
> > suffice to add an ASF legal footer to the Google doc? The Google doc
> lives
> > as long as we are hashing out the design, once implemented the Wiki would
> > be the permanent home and the Google doc ceases to exist.
>
> Yes to all.  I'll let Darin add it as he wrote the doc.
>
> I didn't mean this to be a big deal :-( and as this is all material that
> is on the list anyway it's a poor example.
>
> Regards,
> Tim
>
>

Re: Google docs

Posted by Tim Ellison <t....@gmail.com>.
On 21/09/16 13:13, Suneel Marthi wrote:
> On Wed, Sep 21, 2016 at 1:10 PM, Tim Ellison <t....@gmail.com> wrote:
> 
>> On 21/09/16 11:36, Suneel Marthi wrote:
>>> On Wed, Sep 21, 2016 at 10:43 AM, Tim Ellison <t....@gmail.com>
>> wrote:
>>>
>>>> On 21/09/16 02:40, Darin Johnson wrote:
>>>>> Suneel, a google doc as promised, only a day late (sorry - sick kid).
>>>>>
>>>>> https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_
>>>> mMrRQyynQ-Q6MFbI/edit?usp=sharing
>>>>>
>>>>> I was planning on working on this, but I'm going to take a day or two
>> to
>>>>> let others comment.
>>>>
>>>> Just swapping to my mentoring hat for a moment...
>>>>
>>>> If you are going to use non-ASF resources for project materials, you
>>>> should make it clear under what terms those materials are offered.
>>>>
>>>> When we all discuss code and ideas on the ASF's mailing lists, JIRAs,
>>>> wiki pages, etc, it is understood that these are contributions under the
>>>> ALv2 for inclusion in the project; but if I point you to Pirk related
>>>> material on my Google account, personal blog, or corporate website then
>>>> it can become ambiguous as to how that material may be used.
>>>>
>>>> That's why I'm a proponent for keeping things on ASF infrastructure
>>>> unless there is a big win for doing it elsewhere that justifies making
>>>> the terms of that new location explicit.
>>>>
>>>> In this particular case, if you need the ability to simultaneously edit
>>>> the document (or some other Google docs feature) then I'd advise adding
>>>> an ALv2 footer [1]; otherwise why not just use our wiki? [2]
>>>>
>>>> [1] e.g. Licensed under the Apache License, Version 2.0.
>>>> [2] https://cwiki.apache.org/confluence/display/PIRK
>>>>
>>>> Regards,
>>>> Tim
>>>>
>>>
>>> Point noted. Between creating a Wiki page and a Google Doc, to me Google
>>> doc takes a lot less time and its easier for others to comment and
>> modify.
>>> The Wiki has problems of its own, know all too well from having to
>> approve
>>> the monthly incubator reports and see how much time u spend waiting for
>> the
>>> login screen to show up and before u r granted access :-)
>>>
>>> The actual discussions are still happening on mailing lists which point
>> to
>>> the Google docs, so what's the issue here?  If adding an ALv2 footer
>> makes
>>> a google doc ASF legitimate - fine.
>>>
>>> I pointed to an example of how Flink is doing process and feature
>>> improvements which is using - project Wiki, Google Docs and all
>> discussions
>>> happening on mailing lists -
>>> https://cwiki.apache.org/confluence/pages/viewpage.
>> action?pageId=65870673
>>>
>>> There are other projects doing things similarly - projects like Kudu, and
>>> others even have public slack channels. We can go and start picking on
>>> every one of those and start crying foul.  We can still use the old -
>>> fashioned IRC channels (or its ultra modern avatars like Slack which I
>>> would very much prefer).
>>
>> I'm no apologist for ASF infrastructure, which tends to be a lowest
>> common denominator.  There is always lots of discussion on the infra
>> lists about moving things on.
>>
>> My point is that putting a link on this list to material hosted
>> elsewhere does not make it a contribution.  I am simply doing my mentor
>> duty to ensure people are aware of that and (a) mark things as
>> contributions where they are so, and (b) don't take things from
>> elsewhere without checking the terms first.
>>
>> Of course, I'm not suggesting anything bad has occurred here.
>>
>> If other PMCs have less focus on this, that's for them to consider, but
>> I have direct personal experience of an Apache project being part of a
>> very expensive court case over its contributions so I am hyper-sensitive
>> to it.
>>
>> Regards,
>> Tim
>>
> 
> Apologies for my response. I see your point of view and concur. Would it
> suffice to add an ASF legal footer to the Google doc? The Google doc lives
> as long as we are hashing out the design, once implemented the Wiki would
> be the permanent home and the Google doc ceases to exist.

Yes to all.  I'll let Darin add it as he wrote the doc.

I didn't mean this to be a big deal :-( and as this is all material that
is on the list anyway it's a poor example.

Regards,
Tim


Re: Google docs

Posted by Ellison Anne Williams <ea...@apache.org>.
Clearly, I will defer to the mentors here - but, to weigh in -

It seems that as long as the Google doc is made publicly available via a
link on the mailing list, has the appropriate legal markings, is moved to
the wiki upon completion, and all discussion still occurs on the mailing
list, we can use the technology as an interim medium. Honestly, with all of
those caveats, it may be easier just to use the wiki - other than not
having to wait a few seconds for updates, I'm not sure what Google Docs
really buys us here.

As a side note, perhaps it's worth a conversation at the ASF level as to how
the ASF can incorporate 'new' technologies such as Slack, Google Docs, etc.
Maybe these conversations are already occurring and I am unaware...

On Wed, Sep 21, 2016 at 8:13 AM, Suneel Marthi <su...@gmail.com>
wrote:

> On Wed, Sep 21, 2016 at 1:10 PM, Tim Ellison <t....@gmail.com>
> wrote:
>
> > On 21/09/16 11:36, Suneel Marthi wrote:
> > > On Wed, Sep 21, 2016 at 10:43 AM, Tim Ellison <t....@gmail.com>
> > wrote:
> > >
> > >> On 21/09/16 02:40, Darin Johnson wrote:
> > >>> Suneel, a google doc as promised, only a day late (sorry - sick kid).
> > >>>
> > >>> https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_
> > >> mMrRQyynQ-Q6MFbI/edit?usp=sharing
> > >>>
> > >>> I was planning on working on this, but I'm going to take a day or two
> > to
> > >>> let others comment.
> > >>
> > >> Just swapping to my mentoring hat for a moment...
> > >>
> > >> If you are going to use non-ASF resources for project materials, you
> > >> should make it clear under what terms those materials are offered.
> > >>
> > >> When we all discuss code and ideas on the ASF's mailing lists, JIRAs,
> > >> wiki pages, etc, it is understood that these are contributions under
> the
> > >> ALv2 for inclusion in the project; but if I point you to Pirk related
> > >> material on my Google account, personal blog, or corporate website
> then
> > >> it can become ambiguous as to how that material may be used.
> > >>
> > >> That's why I'm a proponent for keeping things on ASF infrastructure
> > >> unless there is a big win for doing it elsewhere that justifies making
> > >> the terms of that new location explicit.
> > >>
> > >> In this particular case, if you need the ability to simultaneously
> edit
> > >> the document (or some other Google docs feature) then I'd advise
> adding
> > >> an ALv2 footer [1]; otherwise why not just use our wiki? [2]
> > >>
> > >> [1] e.g. Licensed under the Apache License, Version 2.0.
> > >> [2] https://cwiki.apache.org/confluence/display/PIRK
> > >>
> > >> Regards,
> > >> Tim
> > >>
> > >
> > > Point noted. Between creating a Wiki page and a Google Doc, to me
> Google
> > > doc takes a lot less time and its easier for others to comment and
> > modify.
> > > The Wiki has problems of its own, know all too well from having to
> > approve
> > > the monthly incubator reports and see how much time u spend waiting for
> > the
> > > login screen to show up and before u r granted access :-)
> > >
> > > The actual discussions are still happening on mailing lists which point
> > to
> > > the Google docs, so what's the issue here?  If adding an ALv2 footer
> > makes
> > > a google doc ASF legitimate - fine.
> > >
> > > I pointed to an example of how Flink is doing process and feature
> > > improvements which is using - project Wiki, Google Docs and all
> > discussions
> > > happening on mailing lists -
> > > https://cwiki.apache.org/confluence/pages/viewpage.
> > action?pageId=65870673
> > >
> > > There are other projects doing things similarly - projects like Kudu,
> and
> > > others even have public slack channels. We can go and start picking on
> > > every one of those and start crying foul.  We can still use the old -
> > > fashioned IRC channels (or its ultra modern avatars like Slack which I
> > > would very much prefer).
> >
> > I'm no apologist for ASF infrastructure, which tends to be a lowest
> > common denominator.  There is always lots of discussion on the infra
> > lists about moving things on.
> >
> > My point is that putting a link on this list to material hosted
> > elsewhere does not make it a contribution.  I am simply doing my mentor
> > duty to ensure people are aware of that and (a) mark things as
> > contributions where they are so, and (b) don't take things from
> > elsewhere without checking the terms first.
> >
> > Of course, I'm not suggesting anything bad has occurred here.
> >
> > If other PMCs have less focus on this, that's for them to consider, but
> > I have direct personal experience of an Apache project being part of a
> > very expensive court case over its contributions so I am hyper-sensitive
> > to it.
> >
> > Regards,
> > Tim
> >
>
> Apologies for my response. I see your point of view and concur. Would it
> suffice to add an ASF legal footer to the Google doc? The Google doc lives
> as long as we are hashing out the design, once implemented the Wiki would
> be the permanent home and the Google doc ceases to exist.
>

Re: Google docs

Posted by Suneel Marthi <su...@gmail.com>.
On Wed, Sep 21, 2016 at 1:10 PM, Tim Ellison <t....@gmail.com> wrote:

> On 21/09/16 11:36, Suneel Marthi wrote:
> > On Wed, Sep 21, 2016 at 10:43 AM, Tim Ellison <t....@gmail.com>
> wrote:
> >
> >> On 21/09/16 02:40, Darin Johnson wrote:
> >>> Suneel, a google doc as promised, only a day late (sorry - sick kid).
> >>>
> >>> https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_
> >> mMrRQyynQ-Q6MFbI/edit?usp=sharing
> >>>
> >>> I was planning on working on this, but I'm going to take a day or two
> to
> >>> let others comment.
> >>
> >> Just swapping to my mentoring hat for a moment...
> >>
> >> If you are going to use non-ASF resources for project materials, you
> >> should make it clear under what terms those materials are offered.
> >>
> >> When we all discuss code and ideas on the ASF's mailing lists, JIRAs,
> >> wiki pages, etc, it is understood that these are contributions under the
> >> ALv2 for inclusion in the project; but if I point you to Pirk related
> >> material on my Google account, personal blog, or corporate website then
> >> it can become ambiguous as to how that material may be used.
> >>
> >> That's why I'm a proponent for keeping things on ASF infrastructure
> >> unless there is a big win for doing it elsewhere that justifies making
> >> the terms of that new location explicit.
> >>
> >> In this particular case, if you need the ability to simultaneously edit
> >> the document (or some other Google docs feature) then I'd advise adding
> >> an ALv2 footer [1]; otherwise why not just use our wiki? [2]
> >>
> >> [1] e.g. Licensed under the Apache License, Version 2.0.
> >> [2] https://cwiki.apache.org/confluence/display/PIRK
> >>
> >> Regards,
> >> Tim
> >>
> >
> > Point noted. Between creating a Wiki page and a Google Doc, to me Google
> > doc takes a lot less time and its easier for others to comment and
> modify.
> > The Wiki has problems of its own, know all too well from having to
> approve
> > the monthly incubator reports and see how much time u spend waiting for
> the
> > login screen to show up and before u r granted access :-)
> >
> > The actual discussions are still happening on mailing lists which point
> to
> > the Google docs, so what's the issue here?  If adding an ALv2 footer
> makes
> > a google doc ASF legitimate - fine.
> >
> > I pointed to an example of how Flink is doing process and feature
> > improvements which is using - project Wiki, Google Docs and all
> discussions
> > happening on mailing lists -
> > https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=65870673
> >
> > There are other projects doing things similarly - projects like Kudu, and
> > others even have public slack channels. We can go and start picking on
> > every one of those and start crying foul.  We can still use the old -
> > fashioned IRC channels (or its ultra modern avatars like Slack which I
> > would very much prefer).
>
> I'm no apologist for ASF infrastructure, which tends to be a lowest
> common denominator.  There is always lots of discussion on the infra
> lists about moving things on.
>
> My point is that putting a link on this list to material hosted
> elsewhere does not make it a contribution.  I am simply doing my mentor
> duty to ensure people are aware of that and (a) mark things as
> contributions where they are so, and (b) don't take things from
> elsewhere without checking the terms first.
>
> Of course, I'm not suggesting anything bad has occurred here.
>
> If other PMCs have less focus on this, that's for them to consider, but
> I have direct personal experience of an Apache project being part of a
> very expensive court case over its contributions so I am hyper-sensitive
> to it.
>
> Regards,
> Tim
>

Apologies for my response. I see your point of view and concur. Would it
suffice to add an ASF legal footer to the Google doc? The Google doc lives
as long as we are hashing out the design; once implemented, the Wiki would
be the permanent home and the Google doc would cease to exist.

Re: Google docs

Posted by Tim Ellison <t....@gmail.com>.
On 21/09/16 11:36, Suneel Marthi wrote:
> On Wed, Sep 21, 2016 at 10:43 AM, Tim Ellison <t....@gmail.com> wrote:
> 
>> On 21/09/16 02:40, Darin Johnson wrote:
>>> Suneel, a google doc as promised, only a day late (sorry - sick kid).
>>>
>>> https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_
>> mMrRQyynQ-Q6MFbI/edit?usp=sharing
>>>
>>> I was planning on working on this, but I'm going to take a day or two to
>>> let others comment.
>>
>> Just swapping to my mentoring hat for a moment...
>>
>> If you are going to use non-ASF resources for project materials, you
>> should make it clear under what terms those materials are offered.
>>
>> When we all discuss code and ideas on the ASF's mailing lists, JIRAs,
>> wiki pages, etc, it is understood that these are contributions under the
>> ALv2 for inclusion in the project; but if I point you to Pirk related
>> material on my Google account, personal blog, or corporate website then
>> it can become ambiguous as to how that material may be used.
>>
>> That's why I'm a proponent for keeping things on ASF infrastructure
>> unless there is a big win for doing it elsewhere that justifies making
>> the terms of that new location explicit.
>>
>> In this particular case, if you need the ability to simultaneously edit
>> the document (or some other Google docs feature) then I'd advise adding
>> an ALv2 footer [1]; otherwise why not just use our wiki? [2]
>>
>> [1] e.g. Licensed under the Apache License, Version 2.0.
>> [2] https://cwiki.apache.org/confluence/display/PIRK
>>
>> Regards,
>> Tim
>>
> 
> Point noted. Between creating a Wiki page and a Google Doc, to me Google
> doc takes a lot less time and its easier for others to comment and modify.
> The Wiki has problems of its own, know all too well from having to approve
> the monthly incubator reports and see how much time u spend waiting for the
> login screen to show up and before u r granted access :-)
> 
> The actual discussions are still happening on mailing lists which point to
> the Google docs, so what's the issue here?  If adding an ALv2 footer makes
> a google doc ASF legitimate - fine.
> 
> I pointed to an example of how Flink is doing process and feature
> improvements which is using - project Wiki, Google Docs and all discussions
> happening on mailing lists -
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673
> 
> There are other projects doing things similarly - projects like Kudu, and
> others even have public slack channels. We can go and start picking on
> every one of those and start crying foul.  We can still use the old -
> fashioned IRC channels (or its ultra modern avatars like Slack which I
> would very much prefer).

I'm no apologist for ASF infrastructure, which tends to be a lowest
common denominator.  There is always lots of discussion on the infra
lists about moving things on.

My point is that putting a link on this list to material hosted
elsewhere does not make it a contribution.  I am simply doing my mentor
duty to ensure people are aware of that and (a) mark things as
contributions where they are so, and (b) don't take things from
elsewhere without checking the terms first.

Of course, I'm not suggesting anything bad has occurred here.

If other PMCs have less focus on this, that's for them to consider, but
I have direct personal experience of an Apache project being part of a
very expensive court case over its contributions so I am hyper-sensitive
to it.

Regards,
Tim

Re: Google docs (was: Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE)

Posted by Suneel Marthi <sm...@apache.org>.
On Wed, Sep 21, 2016 at 10:43 AM, Tim Ellison <t....@gmail.com> wrote:

> On 21/09/16 02:40, Darin Johnson wrote:
> > Suneel, a google doc as promised, only a day late (sorry - sick kid).
> >
> > https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_
> mMrRQyynQ-Q6MFbI/edit?usp=sharing
> >
> > I was planning on working on this, but I'm going to take a day or two to
> > let others comment.
>
> Just swapping to my mentoring hat for a moment...
>
> If you are going to use non-ASF resources for project materials, you
> should make it clear under what terms those materials are offered.
>
> When we all discuss code and ideas on the ASF's mailing lists, JIRAs,
> wiki pages, etc, it is understood that these are contributions under the
> ALv2 for inclusion in the project; but if I point you to Pirk related
> material on my Google account, personal blog, or corporate website then
> it can become ambiguous as to how that material may be used.
>
> That's why I'm a proponent for keeping things on ASF infrastructure
> unless there is a big win for doing it elsewhere that justifies making
> the terms of that new location explicit.
>
> In this particular case, if you need the ability to simultaneously edit
> the document (or some other Google docs feature) then I'd advise adding
> an ALv2 footer [1]; otherwise why not just use our wiki? [2]
>
> [1] e.g. Licensed under the Apache License, Version 2.0.
> [2] https://cwiki.apache.org/confluence/display/PIRK
>
> Regards,
> Tim
>

Point noted. Between creating a Wiki page and a Google Doc, to me a Google
Doc takes a lot less time and it's easier for others to comment and modify.
The Wiki has problems of its own; I know them all too well from having to
approve the monthly incubator reports and seeing how much time you spend
waiting for the login screen to show up before you are granted access :-)

The actual discussions are still happening on mailing lists which point to
the Google docs, so what's the issue here?  If adding an ALv2 footer makes
a Google Doc ASF-legitimate - fine.

I pointed to an example of how Flink is doing process and feature
improvements, using the project Wiki and Google Docs, with all discussions
happening on the mailing lists -
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673

There are other projects doing things similarly - projects like Kudu and
others even have public Slack channels. We can go and start picking on
every one of those and start crying foul.  We can still use the
old-fashioned IRC channels (or their ultra-modern avatars like Slack, which
I would very much prefer).

Google docs (was: Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE)

Posted by Tim Ellison <t....@gmail.com>.
On 21/09/16 02:40, Darin Johnson wrote:
> Suneel, a google doc as promised, only a day late (sorry - sick kid).
> 
> https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_mMrRQyynQ-Q6MFbI/edit?usp=sharing
> 
> I was planning on working on this, but I'm going to take a day or two to
> let others comment.

Just swapping to my mentoring hat for a moment...

If you are going to use non-ASF resources for project materials, you
should make it clear under what terms those materials are offered.

When we all discuss code and ideas on the ASF's mailing lists, JIRAs,
wiki pages, etc, it is understood that these are contributions under the
ALv2 for inclusion in the project; but if I point you to Pirk related
material on my Google account, personal blog, or corporate website then
it can become ambiguous as to how that material may be used.

That's why I'm a proponent for keeping things on ASF infrastructure
unless there is a big win for doing it elsewhere that justifies making
the terms of that new location explicit.

In this particular case, if you need the ability to simultaneously edit
the document (or some other Google docs feature) then I'd advise adding
an ALv2 footer [1]; otherwise why not just use our wiki? [2]

[1] e.g. Licensed under the Apache License, Version 2.0.
[2] https://cwiki.apache.org/confluence/display/PIRK

Regards,
Tim

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Tim Ellison <t....@gmail.com>.
On 21/09/16 03:22, Ellison Anne Williams wrote:
> I am in favor of breaking out pirk-core as specified so that our initial
> submodule structure would be as follows:
> 
> pirk-core (encryption,query, inputformat, serialization, utils)
> 
> pirk-responder (core responder incl. standalone)
> 
> pirk-querier
> 
> pirk-storm
> 
> pirk-mapreduce
> 
> pirk-spark
> 
> pirk-benchmark
> 
> pirk-distributed-test

Yes, I certainly wouldn't split it up any more than this yet.

> One thing to note is that under this breakdown, pirk-core would not include
> the Elasticsearch dependency (es-hadoop). The only submodules that would
> have the es-hadoop dependency (those which need it) currently are
> pirk-mapreduce, pirk-spark, and pirk-distributed-test.
> 
> 
> I believe that we agreed (somewhere :)) in this thread to go ahead and
> remove the platform 'backwards compatibility' for PIRK-63. Please holler if
> this is not correct.

I agree.  While it is trivial to maintain that compatibility, it feels
like we still are in an era where we should use the freedom to drop it.

Regards,
Tim
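
The launcher mechanism under discussion - a "dumb" ResponderDriver that
reflectively instantiates whatever ResponderLauncher implementation is named
on the command line - can be sketched roughly as follows. All class and
option names here are illustrative stand-ins, not Pirk's actual API.

```java
// Sketch of the reflection-based launcher idea: the driver names no backend
// at compile time; it instantiates whatever class the CLI supplies.
public class LauncherSketch {

  /** Hypothetical SPI that each backend artifact would implement. */
  public interface ResponderLauncher {
    void run();
  }

  /** Stand-in backend, playing the role of a Storm/Spark/MapReduce responder. */
  public static class StandaloneLauncher implements ResponderLauncher {
    @Override
    public void run() {
      System.out.println("standalone responder running");
    }
  }

  /** The "dumb" driver: load the named class, check the type, and run it. */
  public static void launch(String className) throws Exception {
    Object candidate = Class.forName(className).getDeclaredConstructor().newInstance();
    if (!(candidate instanceof ResponderLauncher)) {
      throw new IllegalArgumentException(className + " does not implement ResponderLauncher");
    }
    ((ResponderLauncher) candidate).run();
  }

  public static void main(String[] args) throws Exception {
    // In practice the name would come from a CLI option such as -launcher;
    // here we use the built-in stub's binary name.
    launch(LauncherSketch.class.getName() + "$StandaloneLauncher");
  }
}
```

A backend developer would then only need to place a jar containing their
implementation on the classpath and hand its class name to the driver.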

> On Tue, Sep 20, 2016 at 9:40 PM, Darin Johnson <db...@gmail.com>
> wrote:
> 
>> Suneel, a google doc as promised, only a day late (sorry - sick kid).
>>
>> https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_mMrRQyynQ-Q6MFbI/edit?usp=sharing
>>
>> I was planning on working on this, but I'm going to take a day or two to
>> let others comment.
>>
>> Darin
>>
>> On Mon, Sep 19, 2016 at 5:07 PM, Suneel Marthi <su...@gmail.com>
>> wrote:
>>
>>> A shared Google doc would be more convenient than a bunch of JIRAs. It's
>>> easier to comment and add notes that way.
>>>
>>>
>>> On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <dbjohnson1978@gmail.com
>>>
>>> wrote:
>>>
>>>> Suneel, I'll try to put a couple jiras on it tonight with my thoughts.
>>>> Based off my pirk-63 I was able to pull spark and storm out with no
>>>> issues.  I was planning to pull them out, then tackle Elasticsearch,
>>>> then Hadoop, as it's a little entrenched.  This should keep most PRs to
>>>> manageable chunks. I think once that's done addressing the configs will
>>>> make more sense.
>>>>
>>>> I'm open to suggestions. But the hope would be:
>>>> Pirk-parent
>>>> Pirk-core
>>>> Pirk-hadoop
>>>> Pirk-storm
>>>> Pirk-parent
>>>>
>>>> Pirk-es is a little weird as it's really just an inputformat, seems
>> like
>>>> there's a more general solution here than creating submodules for every
>>>> inputformat.
>>>>
>>>> Darin
>>>>
>>>> On Sep 19, 2016 1:00 PM, "Suneel Marthi" <sm...@apache.org> wrote:
>>>>
>>>>>
>>>>
>>>>> Refactor is definitely a first priority.  Is there a design/proposal
>>>> draft
>>>>> that we could comment on about how to go about refactoring the
>> code.  I
>>>>> have been trying to keep up with the emails but definitely would have
>>>>> missed some.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
>>>>> eawilliams@apache.org <ea...@apache.org>> wrote:
>>>>>
>>>>>> Agree - let's leave the config/CLI the way it is for now and tackle
>>>> that as
>>>>>> a subsequent design discussion and PR.
>>>>>>
>>>>>> Also, I think that we should leave the ResponderDriver and the
>>>>>> ResponderProps alone for this PR and push to a subsequent PR (once
>> we
>>>>>> decide if and how we would like to delegate each).
>>>>>>
>>>>>> I vote to remove the 'platform' option and the backwards
>>> compatibility
>>>> in
>>>>>> this PR and proceed with having a ResponderLauncher interface and
>>>> forcing
>>>>>> its implementation by the ResponderDriver.
>>>>>>
>>>>>> And, I am not so concerned with having one fat jar vs. multiple
>> jars
>>>> right
>>>>>> now - to me, at this point, it's a 'nice to have' and not a 'must
>>> have'
>>>> for
>>>>>> Pirk functionality. We do need to break out Pirk into more clearly
>>>> defined
>>>>>> submodules (which is in progress) - via this re-factor, I think
>> that
>>> we
>>>>>> will gain some ability to generate multiple jars which is nice.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <
>> t.p.ellison@gmail.com
>>>>
>>>>>> wrote:
>>>>>>
>>>>>>> On 19/09/16 15:46, Darin Johnson wrote:
>>>>>>>> Hey guys,
>>>>>>>>
>>>>>>>> Thanks for looking at the PR, I apologize if it offended
>> anyone's
>>>>>> eyes:).
>>>>>>>>
>>>>>>>> I'm glad it generated some discussion about the
>> configuration.  I
>>>>>> didn't
>>>>>>>> really like where things were heading with the config.
>> However,
>>>> didn't
>>>>>>>> want to create too much scope creep.
>>>>>>>>
>>>>>>>> I think any hierarchical config (TypeSafe or yaml) would make
>>>> things
>>>>>> much
>>>>>>>> more maintainable, the plugin could simply grab the appropriate
>>>> part of
>>>>>>> the
>>>>>>>> config and handle accordingly.  I'd also cut down the number of
>>>> command
>>>>>>>> line options to only those that change between runs often (like
>>>>>>>> input/output)
>>>>>>>>
>>>>>>>>> One option is to make Pirk pluggable, so that a Pirk
>>> installation
>>>>>> could
>>>>>>>>> use one or more of these in an extensible fashion by adding
>> JAR
>>>> files.
>>>>>>>>> That would still require selecting one by command-line
>> argument.
>>>>>>>>
>>>>>>>> An argument for this approach is for lambda architecture
>>> approaches
>>>>>> (say
>>>>>>>> spark/spark-streaming) where the contents of the jars would be
>> so
>>>>>> similar
>>>>>>> it
>>>>>>>> seems like too much trouble to create separate jars.
>>>>>>>>
>>>>>>>> Happy to continue working on this given some direction on where
>>>> you'd
>>>>>>> like
>>>>>>>> it to go.  Also, it's a bit of a blocker to refactoring the
>> build
>>>> into
>>>>>>>> submodules.
>>>>>>>
>>>>>>> FWIW my 2c is to not try and fix all the problems in one go, and
>>>> rather
>>>>>>> take a compromise on the configurations while you tease apart the
>>>>>>> submodules in to separate source code trees, poms, etc; then come
>>>> back
>>>>>>> and fix the runtime configs.
>>>>>>>
>>>>>>> Once the submodules are in place it will open up more work for
>>>> release
>>>>>>> engineering and tinkering that can be done in parallel with the
>>>> config
>>>>>>> polishing.
>>>>>>>
>>>>>>> Just a thought.
>>>>>>> Tim
>>>>>>>
>>>>>>>
>>>>>>>> On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <
>>>> t.p.ellison@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> On 19/09/16 13:40, Ellison Anne Williams wrote:
>>>>>>>>>> It seems that it's the same idea as the ResponderLauncher
>> with
>>>> the
>>>>>>>>> service
>>>>>>>>>> component added to maintain something akin to the
>> 'platform'. I
>>>> would
>>>>>>>>>> prefer that we just did away with the platform notion
>>> altogether
>>>> and
>>>>>>> make
>>>>>>>>>> the ResponderDriver 'dumb'. We get around needing a
>>>> platform-aware
>>>>>>>>> service
>>>>>>>>>> by requiring the ResponderLauncher implementation to be
>> passed
>>> as
>>>> a
>>>>>> CLI
>>>>>>>>> to
>>>>>>>>>> the ResponderDriver.
>>>>>>>>>
>>>>>>>>> Let me check I understand what you are saying here.
>>>>>>>>>
>>>>>>>>> At the moment, there is a monolithic Pirk that hard codes how
>> to
>>>>>> respond
>>>>>>>>> using lots of different backends (mapreduce, spark,
>>>> sparkstreaming,
>>>>>>>>> storm , standalone), and that is selected by command-line
>>>> argument.
>>>>>>>>>
>>>>>>>>> One option is to make Pirk pluggable, so that a Pirk
>>> installation
>>>>>> could
>>>>>>>>> use one or more of these in an extensible fashion by adding
>> JAR
>>>> files.
>>>>>>>>> That would still require selecting one by command-line
>> argument.
>>>>>>>>>
>>>>>>>>> A second option is to simply pass in the required backend JAR
>> to
>>>>>> select
>>>>>>>>> the particular implementation you choose, as a specific Pirk
>>>>>>>>> installation doesn't need to use multiple backends
>>> simultaneously.
>>>>>>>>>
>>>>>>>>> ...and you are leaning towards the second option.  Do I have
>>> that
>>>>>>> correct?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Tim
>>>>>>>>>
>>>>>>>>>> Am I missing something? Is there a good reason to provide a
>>>> service
>>>>>> by
>>>>>>>>>> which platforms are registered? I'm open...
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <
>>>> t.p.ellison@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> How about an approach like this?
>>>>>>>>>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
>>>>>>>>>>>
>>>>>>>>>>> The "on-ramp" is the driver [1], which calls upon the
>> service
>>> to
>>>>>> find
>>>>>>> a
>>>>>>>>>>> plug-in [2] that claims to implement the required platform
>>>>>> responder,
>>>>>>>>>>> e.g. [3].
>>>>>>>>>>>
>>>>>>>>>>> The list of plug-ins is given in the provider's JAR file, so
>>> the
>>>>>> ones
>>>>>>> we
>>>>>>>>>>> provide in Pirk are listed together [4], but if you split
>>> these
>>>> into
>>>>>>>>>>> modules, or somebody brings their own JAR alongside, these
>>> would
>>>> be
>>>>>>>>>>> listed in each JAR's services/ directory.
>>>>>>>>>>>
>>>>>>>>>>> [1] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
>>>>>>>>>>> [2] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
>>>>>>>>>>> [3] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/storm/StormResponder.java
>>>>>>>>>>> [4] https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/services/org.apache.responder.spi.Responder
>>>>>>>>>>>
>>>>>>>>>>> I'm not even going to dignify this with a WIP PR, it is far
>>> from
>>>>>>> ready,
>>>>>>>>>>> so proceed with caution.  There is hopefully enough there to
>>>> show
>>>>>> the
>>>>>>>>>>> approach, and if it is worth continuing I'm happy to do so.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Tim
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
> 
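
Tim's services-file variant above maps onto Java's standard ServiceLoader
mechanism: each JAR lists its implementations under META-INF/services/, and
the driver asks the loader for the plugin claiming the requested platform.
A self-contained illustration follows; the interface and method names are
invented for the sketch (Pirk's actual spi classes may differ), and the
temp directory merely simulates a provider JAR's classpath root.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.ServiceLoader;

public class PluginSketch {

  /** Hypothetical plugin SPI; stands in for a class in org.apache.pirk.responder.spi. */
  public interface ResponderPlugin {
    String platformName();
    void run();
  }

  /** Example provider, as a backend JAR might ship it. */
  public static class StormPlugin implements ResponderPlugin {
    public String platformName() { return "storm"; }
    public void run() { System.out.println("storm responder launched"); }
  }

  /** Ask ServiceLoader for the plugin claiming the requested platform, or null. */
  public static ResponderPlugin find(String platform, ClassLoader loader) {
    for (ResponderPlugin p : ServiceLoader.load(ResponderPlugin.class, loader)) {
      if (p.platformName().equalsIgnoreCase(platform)) {
        return p;
      }
    }
    return null;
  }

  public static void main(String[] args) throws IOException {
    // Simulate a provider JAR: write the services registration file into a
    // temp "classpath root" and layer it over the current classloader.
    Path root = Files.createTempDirectory("pirk-plugin");
    Path services = root.resolve("META-INF/services/" + ResponderPlugin.class.getName());
    Files.createDirectories(services.getParent());
    Files.write(services, List.of(StormPlugin.class.getName()));
    ClassLoader cl = new URLClassLoader(new URL[] { root.toUri().toURL() },
        PluginSketch.class.getClassLoader());
    find("storm", cl).run();
  }
}
```

Splitting into submodules would then just mean each module carrying its own
services/ registration file, exactly as Tim describes for [4].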

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Suneel Marthi <su...@gmail.com>.
A potential use case (please correct me here) could be that someone needs a
Storm impl of Responder which means we need to generate an artifact like -
'pirk-responder-storm.jar'
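
Under the launcher model, such an artifact could reduce to little more than
one class. The names below are purely illustrative; a real
pirk-responder-storm.jar would also bundle the actual Storm topology code.

```java
// Hypothetical sole entry point of a separate 'pirk-responder-storm' artifact.
// Core Pirk never references this class at compile time; only its name crosses
// the boundary, via something like: -launcher StormResponderLauncher
public class StormResponderLauncher {

  /** In a real artifact this would build and submit the Storm topology. */
  public void run() {
    System.out.println("submitting storm topology");
  }

  public static void main(String[] args) throws Exception {
    // Driver's-eye view: resolve purely by name, then invoke run() reflectively.
    Object launcher = Class.forName("StormResponderLauncher")
        .getDeclaredConstructor().newInstance();
    launcher.getClass().getMethod("run").invoke(launcher);
  }
}
```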



On Wed, Sep 21, 2016 at 5:41 AM, Darin Johnson <db...@gmail.com>
wrote:

> So along these lines I want to ask one more set of questions.
>
> Since storm/spark/hadoop are just responders do we want to put them as
> modules below responder? I'm not in favor of this but thought I'd ask.
>
> Does it make sense to put some responders in a contrib section?  Storm does
> this for a lot of spouts and things.  I think it will make sense
> eventually.
>
>
>

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Darin Johnson <db...@gmail.com>.
So along these lines I want to ask one more set of questions.

Since storm/spark/hadoop are just responders do we want to put them as
modules below responder? I'm not in favor of this but thought I'd ask.

Does it make sense to put some responders in a contrib section?  Storm does
this for a lot of spouts and things.  I think it will make sense eventually.



> > > > > > > a
> > > > > > > >>>> plug-in [2] that claims to implement the required platform
> > > > > > responder,
> > > > > > > >>>> e.g. [3].
> > > > > > > >>>>
> > > > > > > >>>> The list of plug-ins is given in the provider's JAR file,
> so
> > > the
> > > > > > ones
> > > > > > > we
> > > > > > > >>>> provide in Pirk are listed together [4], but if you split
> > > these
> > > > into
> > > > > > > >>>> modules, or somebody brings their own JAR alongside, these
> > > would
> > > > be
> > > > > > > >>>> listed in each JAR's services/ directory.
> > > > > > > >>>>
> > > > > > > >>>> [1]
> > > > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/
> > > > > > > ResponderDriver.java
> > > > > > > >>>> [2]
> > > > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > > > >>>> src/main/java/org/apache/pirk/
> > responder/spi/ResponderPlugin.
> > > > java
> > > > > > > >>>> [3]
> > > > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
> > > > > > > >>>> StormResponder.java
> > > > > > > >>>> [4]
> > > > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > > > >>>> src/main/services/org.apache.responder.spi.Responder
> > > > > > > >>>>
> > > > > > > >>>> I'm not even going to dignify this with a WIP PR, it is
> far
> > > from
> > > > > > > ready,
> > > > > > > >>>> so proceed with caution.  There is hopefully enough there
> to
> > > > show
> > > > > > the
> > > > > > > >>>> approach, and if it is worth continuing I'm happy to do
> so.
> > > > > > > >>>>
> > > > > > > >>>> Regards,
> > > > > > > >>>> Tim
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>
> > > > > > > >
> > > > > > >
> > > > > >
> > > >
> > >
> >
>
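The services-file mechanism described in the quoted proposal above maps onto Java's built-in ServiceLoader. A minimal sketch, assuming a hypothetical ResponderPlugin interface (the type name comes from the linked branch; the method names here are illustrative, not the actual Pirk API):

```java
import java.util.ServiceLoader;

// Hypothetical SPI; providers would be listed in each JAR under
// META-INF/services/ so a new backend registers simply by being on the
// classpath.
interface ResponderPlugin {
  String platformName();   // e.g. "storm", "mapreduce", "standalone"
  void run() throws Exception;
}

public class PluginFinder {
  // Scan every JAR on the classpath for ResponderPlugin providers and
  // return the one claiming the requested platform.
  static ResponderPlugin find(String platform) {
    for (ResponderPlugin plugin : ServiceLoader.load(ResponderPlugin.class)) {
      if (plugin.platformName().equalsIgnoreCase(platform)) {
        return plugin;
      }
    }
    throw new IllegalArgumentException("No responder plugin for: " + platform);
  }
}
```

With something like this in place the driver stays "dumb": it only parses the platform argument and delegates, and splitting backends into submodule JARs requires no driver changes.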

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Ellison Anne Williams <ea...@apache.org>.
I am in favor of breaking out pirk-core as specified so that our initial
submodule structure would be as follows:

pirk-core (encryption,query, inputformat, serialization, utils)

pirk-responder (core responder incl. standalone)

pirk-querier

pirk-storm

pirk-mapreduce

pirk-spark

pirk-benchmark

pirk-distributed-test


One thing to note is that under this breakdown, pirk-core would not include
the Elasticsearch dependency (es-hadoop). The only submodules that currently
need es-hadoop are pirk-mapreduce, pirk-spark, and pirk-distributed-test.
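As a rough illustration, a parent pom for this breakdown might declare the submodules as below; the module names follow the list above, but the artifact IDs and layout are assumptions, not a committed design:

```xml
<!-- Hypothetical parent pom.xml fragment; names are the proposed
     breakdown and are not final. -->
<modules>
  <module>pirk-core</module>
  <module>pirk-responder</module>
  <module>pirk-querier</module>
  <module>pirk-storm</module>
  <module>pirk-mapreduce</module>
  <module>pirk-spark</module>
  <module>pirk-benchmark</module>
  <module>pirk-distributed-test</module>
</modules>
```

The es-hadoop dependency would then be declared only in the pirk-mapreduce, pirk-spark, and pirk-distributed-test poms, keeping it out of pirk-core.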


I believe that we agreed (somewhere :)) in this thread to go ahead and
remove the platform 'backwards compatibility' for PIRK-63. Please holler if
this is not correct.
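The launcher scheme being agreed here could look roughly like the following. The interface name ResponderLauncher is taken from the PR discussion, but the method signature and driver wiring are assumptions, not the actual code:

```java
import java.lang.reflect.Constructor;

// Hypothetical interface from the PR discussion; the real signature may differ.
interface ResponderLauncher {
  void run() throws Exception;
}

// Toy implementation standing in for a real backend launcher.
class StandaloneLauncher implements ResponderLauncher {
  public void run() {
    System.out.println("standalone responder running");
  }
}

public class ResponderDriver {
  // The 'dumb' driver: no platform switch, just instantiate whatever class
  // name was passed via the launcher CLI option and call run().
  static ResponderLauncher load(String className) throws Exception {
    Class<?> cls = Class.forName(className);
    Constructor<?> ctor = cls.getDeclaredConstructor();
    return (ResponderLauncher) ctor.newInstance();
  }

  public static void main(String[] args) throws Exception {
    load(args[0]).run();  // e.g. pass "StandaloneLauncher" on the CLI
  }
}
```

A backend shipped in a separate JAR only needs to be on the classpath; no registry or platform-aware service is required.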




On Tue, Sep 20, 2016 at 9:40 PM, Darin Johnson <db...@gmail.com>
wrote:

> Suneel, a google doc as promised, only a day late (sorry - sick kid).
>
> https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_
> mMrRQyynQ-Q6MFbI/edit?usp=sharing
>
> I was planning on working on this, but I'm going to take a day or two to
> let others comment.
>
> Darin
>
> On Mon, Sep 19, 2016 at 5:07 PM, Suneel Marthi <su...@gmail.com>
> wrote:
>
> > A shared Google doc would be more convenient than a bunch of Jiras. Its
> > easier to comment and add notes that way.
> >
> >
> > On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <dbjohnson1978@gmail.com
> >
> > wrote:
> >
> > > Suneel, I'll try to put a couple jiras on it tonight with my thoughts.
> > > Based off my pirk-63 I was able to pull spark and storm out with no
> > > issues.  I was planning to pull them out, then tackle Elasticsearch,
> > > then Hadoop as it's a little entrenched.  This should keep most PRs to
> > > manageable chunks. I think once that's done addressing the configs will
> > > make more sense.
> > >
> > > I'm open to suggestions. But the hope would be:
> > > Pirk-parent
> > > Pirk-core
> > > Pirk-hadoop
> > > Pirk-storm
> > > Pirk-parent
> > >
> > > Pirk-es is a little weird as it's really just an inputformat, seems
> like
> > > there's a more general solution here than creating submodules for every
> > > inputformat.
> > >
> > > Darin
> > >
> > > On Sep 19, 2016 1:00 PM, "Suneel Marthi" <sm...@apache.org> wrote:
> > >
> > > >
> > >
> > > > Refactor is definitely a first priority.  Is there a design/proposal
> > > draft
> > > > that we could comment on about how to go about refactoring the
> code.  I
> > > > have been trying to keep up with the emails but definitely would have
> > > > missed some.
> > > >
> > > >
> > > >
> > > > On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
> > > > eawilliams@apache.org <ea...@apache.org>> wrote:
> > > >
> > > > > Agree - let's leave the config/CLI the way it is for now and tackle
> > > that as
> > > > > a subsequent design discussion and PR.
> > > > >
> > > > > Also, I think that we should leave the ResponderDriver and the
> > > > > ResponderProps alone for this PR and push to a subsequent PR (once
> we
> > > > > decide if and how we would like to delegate each).
> > > > >
> > > > > I vote to remove the 'platform' option and the backwards
> > compatibility
> > > in
> > > > > this PR and proceed with having a ResponderLauncher interface and
> > > forcing
> > > > > its implementation by the ResponderDriver.
> > > > >
> > > > > And, I am not so concerned with having one fat jar vs. multiple
> jars
> > > right
> > > > > now - to me, at this point, it's a 'nice to have' and not a 'must
> > have'
> > > for
> > > > > Pirk functionality. We do need to break out Pirk into more clearly
> > > defined
> > > > > submodules (which is in progress) - via this re-factor, I think
> that
> > we
> > > > > will gain some ability to generate multiple jars which is nice.
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <
> t.p.ellison@gmail.com
> > >
> > > > > wrote:
> > > > >
> > > > > > On 19/09/16 15:46, Darin Johnson wrote:
> > > > > > > Hey guys,
> > > > > > >
> > > > > > > Thanks for looking at the PR, I apologize if it offended
> anyone's
> > > > > eyes:).
> > > > > > >
> > > > > > > I'm glad it generated some discussion about the
> configuration.  I
> > > > > didn't
> > > > > > > really like where things were heading with the config.
> However,
> > > didn't
> > > > > > > want to create too much scope creep.
> > > > > > >
> > > > > > > I think any hierarchical config (TypeSafe or yaml) would make
> > > things
> > > > > much
> > > > > > > more maintainable, the plugin could simply grab the appropriate
> > > part of
> > > > > > the
> > > > > > > config and handle accordingly.  I'd also cut down the number of
> > > command
> > > > > > > line options to only those that change between runs often (like
> > > > > > > input/output)
> > > > > > >
> > > > > > >> One option is to make Pirk pluggable, so that a Pirk
> > installation
> > > > > could
> > > > > > >> use one or more of these in an extensible fashion by adding
> JAR
> > > files.
> > > > > > >> That would still require selecting one by command-line
> argument.
> > > > > > >
> > > > > > > An argument for this approach is for lambda architecture
> > approaches
> > > > > (say
> > > > > > > spark/spark-streaming) where the contents of the jars would be
> so
> > > > > similar
> > > > > > it
> > > > > > > seems like too much trouble to create separate jars.
> > > > > > >
> > > > > > > Happy to continue working on this given some direction on where
> > > you'd
> > > > > > like
> > > > > > > it to go.  Also, it's a bit of a blocker to refactoring the
> build
> > > into
> > > > > > > submodules.
> > > > > >
> > > > > > FWIW my 2c is to not try and fix all the problems in one go, and
> > > rather
> > > > > > take a compromise on the configurations while you tease apart the
> > > > > > submodules into separate source code trees, poms, etc; then come
> > > back
> > > > > > and fix the runtime configs.
> > > > > >
> > > > > > Once the submodules are in place it will open up more work for
> > > release
> > > > > > engineering and tinkering that can be done in parallel with the
> > > config
> > > > > > polishing.
> > > > > >
> > > > > > Just a thought.
> > > > > > Tim
> > > > > >
> > > > > >
> > > > > > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <
> > > t.p.ellison@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > > > > > >>> It seems that it's the same idea as the ResponderLauncher
> with
> > > the
> > > > > > >> service
> > > > > > >>> component added to maintain something akin to the
> 'platform'. I
> > > would
> > > > > > >>> prefer that we just did away with the platform notion
> > altogether
> > > and
> > > > > > make
> > > > > > >>> the ResponderDriver 'dumb'. We get around needing a
> > > platform-aware
> > > > > > >> service
> > > > > > >>> by requiring the ResponderLauncher implementation to be
> passed
> > as
> > > a
> > > > > CLI
> > > > > > >> to
> > > > > > >>> the ResponderDriver.
> > > > > > >>
> > > > > > >> Let me check I understand what you are saying here.
> > > > > > >>
> > > > > > >> At the moment, there is a monolithic Pirk that hard codes how
> to
> > > > > respond
> > > > > > >> using lots of different backends (mapreduce, spark,
> > > sparkstreaming,
> > > > > > >> storm , standalone), and that is selected by command-line
> > > argument.
> > > > > > >>
> > > > > > >> One option is to make Pirk pluggable, so that a Pirk
> > installation
> > > > > could
> > > > > > >> use one or more of these in an extensible fashion by adding
> JAR
> > > files.
> > > > > > >> That would still require selecting one by command-line
> argument.
> > > > > > >>
> > > > > > >> A second option is to simply pass in the required backend JAR
> to
> > > > > select
> > > > > > >> the particular implementation you choose, as a specific Pirk
> > > > > > >> installation doesn't need to use multiple backends
> > simultaneously.
> > > > > > >>
> > > > > > >> ...and you are leaning towards the second option.  Do I have
> > that
> > > > > > correct?
> > > > > > >>
> > > > > > >> Regards,
> > > > > > >> Tim
> > > > > > >>
> > > > > > >>> Am I missing something? Is there a good reason to provide a
> > > service
> > > > > by
> > > > > > >>> which platforms are registered? I'm open...
> > > > > > >>>
> > > > > > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <
> > > t.p.ellison@gmail.com>
> > > > > > >> wrote:
> > > > > > >>>
> > > > > > >>>> How about an approach like this?
> > > > > > >>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
> > > > > > >>>>
> > > > > > >>>> The "on-ramp" is the driver [1], which calls upon the
> service
> > to
> > > > > find
> > > > > > a
> > > > > > >>>> plug-in [2] that claims to implement the required platform
> > > > > responder,
> > > > > > >>>> e.g. [3].
> > > > > > >>>>
> > > > > > >>>> The list of plug-ins is given in the provider's JAR file, so
> > the
> > > > > ones
> > > > > > we
> > > > > > >>>> provide in Pirk are listed together [4], but if you split
> > these
> > > into
> > > > > > >>>> modules, or somebody brings their own JAR alongside, these
> > would
> > > be
> > > > > > >>>> listed in each JAR's services/ directory.
> > > > > > >>>>
> > > > > > >>>> [1]
> > > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/
> > > > > > ResponderDriver.java
> > > > > > >>>> [2]
> > > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > > >>>> src/main/java/org/apache/pirk/
> responder/spi/ResponderPlugin.
> > > java
> > > > > > >>>> [3]
> > > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
> > > > > > >>>> StormResponder.java
> > > > > > >>>> [4]
> > > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > > >>>> src/main/services/org.apache.responder.spi.Responder
> > > > > > >>>>
> > > > > > >>>> I'm not even going to dignify this with a WIP PR, it is far
> > from
> > > > > > ready,
> > > > > > >>>> so proceed with caution.  There is hopefully enough there to
> > > show
> > > > > the
> > > > > > >>>> approach, and if it is worth continuing I'm happy to do so.
> > > > > > >>>>
> > > > > > >>>> Regards,
> > > > > > >>>> Tim
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > >
> >
>
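For reference, the hierarchical configuration idea floated in the quoted discussion (TypeSafe/HOCON or YAML, with each plugin reading only its own subtree) might look like this; every key name here is illustrative, not an actual Pirk property:

```hocon
# Illustrative only: key names are assumptions, not real Pirk settings.
pirk {
  responder {
    launcher = "org.apache.pirk.responder.standalone.StandaloneLauncher"
    # Frequently-changed values could stay on the CLI; the rest lives here.
    input  = "hdfs:///pirk/input"
    output = "hdfs:///pirk/output"
  }
  # Each backend plugin reads only its own subtree.
  spark { executor-memory = "4g" }
  storm { workers = 2 }
}
```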

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Darin Johnson <db...@gmail.com>.
Suneel, a google doc as promised, only a day late (sorry - sick kid).

https://docs.google.com/document/d/1K8E0TridC1hBfqDwWCqdZ8Dj_5_mMrRQyynQ-Q6MFbI/edit?usp=sharing

I was planning on working on this, but I'm going to take a day or two to
let others comment.

Darin

On Mon, Sep 19, 2016 at 5:07 PM, Suneel Marthi <su...@gmail.com>
wrote:

> A shared Google doc would be more convenient than a bunch of Jiras. Its
> easier to comment and add notes that way.
>
>
> On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <db...@gmail.com>
> wrote:
>
> > Suneel, I'll try to put a couple jiras on it tonight with my thoughts.
> > Based off my pirk-63 I was able to pull spark and storm out with no
> > issues.  I was planning to pull them out, then tackle Elasticsearch,
> > then Hadoop as it's a little entrenched.  This should keep most PRs to
> > manageable chunks. I think once that's done addressing the configs will
> > make more sense.
> >
> > I'm open to suggestions. But the hope would be:
> > Pirk-parent
> > Pirk-core
> > Pirk-hadoop
> > Pirk-storm
> > Pirk-parent
> >
> > Pirk-es is a little weird as it's really just an inputformat, seems like
> > there's a more general solution here than creating submodules for every
> > inputformat.
> >
> > Darin
> >
> > On Sep 19, 2016 1:00 PM, "Suneel Marthi" <sm...@apache.org> wrote:
> >
> > >
> >
> > > Refactor is definitely a first priority.  Is there a design/proposal
> > draft
> > > that we could comment on about how to go about refactoring the code.  I
> > > have been trying to keep up with the emails but definitely would have
> > > missed some.
> > >
> > >
> > >
> > > On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
> > > eawilliams@apache.org <ea...@apache.org>> wrote:
> > >
> > > > Agree - let's leave the config/CLI the way it is for now and tackle
> > that as
> > > > a subsequent design discussion and PR.
> > > >
> > > > Also, I think that we should leave the ResponderDriver and the
> > > > ResponderProps alone for this PR and push to a subsequent PR (once we
> > > > decide if and how we would like to delegate each).
> > > >
> > > > I vote to remove the 'platform' option and the backwards
> compatibility
> > in
> > > > this PR and proceed with having a ResponderLauncher interface and
> > forcing
> > > > its implementation by the ResponderDriver.
> > > >
> > > > And, I am not so concerned with having one fat jar vs. multiple jars
> > right
> > > > now - to me, at this point, it's a 'nice to have' and not a 'must
> have'
> > for
> > > > Pirk functionality. We do need to break out Pirk into more clearly
> > defined
> > > > submodules (which is in progress) - via this re-factor, I think that
> we
> > > > will gain some ability to generate multiple jars which is nice.
> > > >
> > > >
> > > >
> > > > On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <t.p.ellison@gmail.com
> >
> > > > wrote:
> > > >
> > > > > On 19/09/16 15:46, Darin Johnson wrote:
> > > > > > Hey guys,
> > > > > >
> > > > > > Thanks for looking at the PR, I apologize if it offended anyone's
> > > > eyes:).
> > > > > >
> > > > > > I'm glad it generated some discussion about the configuration.  I
> > > > didn't
> > > > > > really like where things were heading with the config.  However,
> > didn't
> > > > > > want to create too much scope creep.
> > > > > >
> > > > > > I think any hierarchical config (TypeSafe or yaml) would make
> > things
> > > > much
> > > > > > more maintainable, the plugin could simply grab the appropriate
> > part of
> > > > > the
> > > > > > config and handle accordingly.  I'd also cut down the number of
> > command
> > > > > > line options to only those that change between runs often (like
> > > > > > input/output)
> > > > > >
> > > > > >> One option is to make Pirk pluggable, so that a Pirk
> installation
> > > > could
> > > > > >> use one or more of these in an extensible fashion by adding JAR
> > files.
> > > > > >> That would still require selecting one by command-line argument.
> > > > > >
> > > > > > An argument for this approach is for lambda architecture
> approaches
> > > > (say
> > > > > > spark/spark-streaming) where the contents of the jars would be so
> > > > similar
> > > > > it
> > > > > > seems like too much trouble to create separate jars.
> > > > > >
> > > > > > Happy to continue working on this given some direction on where
> > you'd
> > > > > like
> > > > > > it to go.  Also, it's a bit of a blocker to refactoring the build
> > into
> > > > > > submodules.
> > > > >
> > > > > FWIW my 2c is to not try and fix all the problems in one go, and
> > rather
> > > > > take a compromise on the configurations while you tease apart the
> > > > > submodules into separate source code trees, poms, etc; then come
> > back
> > > > > and fix the runtime configs.
> > > > >
> > > > > Once the submodules are in place it will open up more work for
> > release
> > > > > engineering and tinkering that can be done in parallel with the
> > config
> > > > > polishing.
> > > > >
> > > > > Just a thought.
> > > > > Tim
> > > > >
> > > > >
> > > > > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <
> > t.p.ellison@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > > > > >>> It seems that it's the same idea as the ResponderLauncher with
> > the
> > > > > >> service
> > > > > >>> component added to maintain something akin to the 'platform'. I
> > would
> > > > > >>> prefer that we just did away with the platform notion
> altogether
> > and
> > > > > make
> > > > > >>> the ResponderDriver 'dumb'. We get around needing a
> > platform-aware
> > > > > >> service
> > > > > >>> by requiring the ResponderLauncher implementation to be passed
> as
> > a
> > > > CLI
> > > > > >> to
> > > > > >>> the ResponderDriver.
> > > > > >>
> > > > > >> Let me check I understand what you are saying here.
> > > > > >>
> > > > > >> At the moment, there is a monolithic Pirk that hard codes how to
> > > > respond
> > > > > >> using lots of different backends (mapreduce, spark,
> > sparkstreaming,
> > > > > >> storm , standalone), and that is selected by command-line
> > argument.
> > > > > >>
> > > > > >> One option is to make Pirk pluggable, so that a Pirk
> installation
> > > > could
> > > > > >> use one or more of these in an extensible fashion by adding JAR
> > files.
> > > > > >> That would still require selecting one by command-line argument.
> > > > > >>
> > > > > >> A second option is to simply pass in the required backend JAR to
> > > > select
> > > > > >> the particular implementation you choose, as a specific Pirk
> > > > > >> installation doesn't need to use multiple backends
> simultaneously.
> > > > > >>
> > > > > >> ...and you are leaning towards the second option.  Do I have
> that
> > > > > correct?
> > > > > >>
> > > > > >> Regards,
> > > > > >> Tim
> > > > > >>
> > > > > >>> Am I missing something? Is there a good reason to provide a
> > service
> > > > by
> > > > > >>> which platforms are registered? I'm open...
> > > > > >>>
> > > > > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <
> > t.p.ellison@gmail.com>
> > > > > >> wrote:
> > > > > >>>
> > > > > >>>> How about an approach like this?
> > > > > >>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
> > > > > >>>>
> > > > > >>>> The "on-ramp" is the driver [1], which calls upon the service
> to
> > > > find
> > > > > a
> > > > > >>>> plug-in [2] that claims to implement the required platform
> > > > responder,
> > > > > >>>> e.g. [3].
> > > > > >>>>
> > > > > >>>> The list of plug-ins is given in the provider's JAR file, so
> the
> > > > ones
> > > > > we
> > > > > >>>> provide in Pirk are listed together [4], but if you split
> these
> > into
> > > > > >>>> modules, or somebody brings their own JAR alongside, these
> would
> > be
> > > > > >>>> listed in each JAR's services/ directory.
> > > > > >>>>
> > > > > >>>> [1]
> > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/
> > > > > ResponderDriver.java
> > > > > >>>> [2]
> > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > >>>> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.
> > java
> > > > > >>>> [3]
> > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
> > > > > >>>> StormResponder.java
> > > > > >>>> [4]
> > > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > > > >>>> src/main/services/org.apache.responder.spi.Responder
> > > > > >>>>
> > > > > >>>> I'm not even going to dignify this with a WIP PR, it is far
> from
> > > > > ready,
> > > > > >>>> so proceed with caution.  There is hopefully enough there to
> > show
> > > > the
> > > > > >>>> approach, and if it is worth continuing I'm happy to do so.
> > > > > >>>>
> > > > > >>>> Regards,
> > > > > >>>> Tim
> > > > > >>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > > >
> > > > >
> > > >
> >
>

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Tim Ellison <t....@gmail.com>.
On 19/09/16 22:10, Suneel Marthi wrote:
> Here's an example from the Flink project for how they go about new features
> or system breaking API changes, we could start a similar process. The Flink
> guys call these FLIP (Flink Improvement Proposal) and Kafka community
> similarly has something called KLIP.
> 
> We could start a PLIP (??? :-) )

I expect a change 'process' will evolve as required, but +1 for
starting to document some of the architecture on the Pirk wiki.

My notebook has a few scrappy line drawings at the moment.

Regards,
Tim


> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673
> 
> 
> On Mon, Sep 19, 2016 at 11:07 PM, Suneel Marthi <su...@gmail.com>
> wrote:
> 
>> A shared Google doc would be more convenient than a bunch of Jiras. Its
>> easier to comment and add notes that way.
>>
>>
>> On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <db...@gmail.com>
>> wrote:
>>
>>> Suneel, I'll try to put a couple jiras on it tonight with my thoughts.
>>> Based off my pirk-63 I was able to pull spark and storm out with no
>>> issues.  I was planning to pull them out, then tackle Elasticsearch,
>>> then Hadoop as it's a little entrenched.  This should keep most PRs to
>>> manageable chunks. I think once that's done addressing the configs will
>>> make more sense.
>>>
>>> I'm open to suggestions. But the hope would be:
>>> Pirk-parent
>>> Pirk-core
>>> Pirk-hadoop
>>> Pirk-storm
>>> Pirk-parent
>>>
>>> Pirk-es is a little weird as it's really just an inputformat, seems like
>>> there's a more general solution here than creating submodules for every
>>> inputformat.
>>>
>>> Darin
>>>
>>> On Sep 19, 2016 1:00 PM, "Suneel Marthi" <sm...@apache.org> wrote:
>>>
>>>>
>>>
>>>> Refactor is definitely a first priority.  Is there a design/proposal
>>> draft
>>>> that we could comment on about how to go about refactoring the code.  I
>>>> have been trying to keep up with the emails but definitely would have
>>>> missed some.
>>>>
>>>>
>>>>
>>>> On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
>>>> eawilliams@apache.org <ea...@apache.org>> wrote:
>>>>
>>>>> Agree - let's leave the config/CLI the way it is for now and tackle
>>> that as
>>>>> a subsequent design discussion and PR.
>>>>>
>>>>> Also, I think that we should leave the ResponderDriver and the
>>>>> ResponderProps alone for this PR and push to a subsequent PR (once we
>>>>> decide if and how we would like to delegate each).
>>>>>
>>>>> I vote to remove the 'platform' option and the backwards compatibility
>>> in
>>>>> this PR and proceed with having a ResponderLauncher interface and
>>> forcing
>>>>> its implementation by the ResponderDriver.
>>>>>
>>>>> And, I am not so concerned with having one fat jar vs. multiple jars
>>> right
>>>>> now - to me, at this point, it's a 'nice to have' and not a 'must
>>> have'
>>> for
>>>>> Pirk functionality. We do need to break out Pirk into more clearly
>>> defined
>>>>> submodules (which is in progress) - via this re-factor, I think that
>>> we
>>>>> will gain some ability to generate multiple jars which is nice.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <t....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> On 19/09/16 15:46, Darin Johnson wrote:
>>>>>>> Hey guys,
>>>>>>>
>>>>>>> Thanks for looking at the PR, I apologize if it offended anyone's
>>>>> eyes:).
>>>>>>>
>>>>>>> I'm glad it generated some discussion about the configuration.  I
>>>>> didn't
>>>>>>> really like where things were heading with the config.  However,
>>> didn't
>>>>>>> want to create too much scope creep.
>>>>>>>
>>>>>>> I think any hierarchical config (TypeSafe or yaml) would make
>>> things
>>>>> much
>>>>>>> more maintainable, the plugin could simply grab the appropriate
>>> part of
>>>>>> the
>>>>>>> config and handle accordingly.  I'd also cut down the number of
>>> command
>>>>>>> line options to only those that change between runs often (like
>>>>>>> input/output)
>>>>>>>
>>>>>>>> One option is to make Pirk pluggable, so that a Pirk installation
>>>>> could
>>>>>>>> use one or more of these in an extensible fashion by adding JAR
>>> files.
>>>>>>>> That would still require selecting one by command-line argument.
>>>>>>>
>>>>>>> An argument for this approach is for lambda architecture
>>> approaches
>>>>> (say
>>>>>>> spark/spark-streaming) where the contents of the jars would be so
>>>>> similar
>>>>>> it
>>>>>>> seems like too much trouble to create separate jars.
>>>>>>>
>>>>>>> Happy to continue working on this given some direction on where
>>> you'd
>>>>>> like
>>>>>>> it to go.  Also, it's a bit of a blocker to refactoring the build
>>> into
>>>>>>> submodules.
>>>>>>
>>>>>> FWIW my 2c is to not try and fix all the problems in one go, and
>>> rather
>>>>>> take a compromise on the configurations while you tease apart the
>>>>>> submodules into separate source code trees, poms, etc; then come
>>> back
>>>>>> and fix the runtime configs.
>>>>>>
>>>>>> Once the submodules are in place it will open up more work for
>>> release
>>>>>> engineering and tinkering that can be done in parallel with the
>>> config
>>>>>> polishing.
>>>>>>
>>>>>> Just a thought.
>>>>>> Tim
>>>>>>
>>>>>>
>>>>>>> On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <
>>> t.p.ellison@gmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>> On 19/09/16 13:40, Ellison Anne Williams wrote:
>>>>>>>>> It seems that it's the same idea as the ResponderLauncher with
>>> the
>>>>>>>> service
>>>>>>>>> component added to maintain something akin to the 'platform'. I
>>> would
>>>>>>>>> prefer that we just did away with the platform notion altogether
>>> and
>>>>>> make
>>>>>>>>> the ResponderDriver 'dumb'. We get around needing a
>>> platform-aware
>>>>>>>> service
>>>>>>>>> by requiring the ResponderLauncher implementation to be passed
>>> as
>>> a
>>>>> CLI
>>>>>>>> to
>>>>>>>>> the ResponderDriver.
>>>>>>>>
>>>>>>>> Let me check I understand what you are saying here.
>>>>>>>>
>>>>>>>> At the moment, there is a monolithic Pirk that hard codes how to
>>>>> respond
>>>>>>>> using lots of different backends (mapreduce, spark,
>>> sparkstreaming,
>>>>>>>> storm , standalone), and that is selected by command-line
>>> argument.
>>>>>>>>
>>>>>>>> One option is to make Pirk pluggable, so that a Pirk installation
>>>>> could
>>>>>>>> use one or more of these in an extensible fashion by adding JAR
>>> files.
>>>>>>>> That would still require selecting one by command-line argument.
>>>>>>>>
>>>>>>>> A second option is to simply pass in the required backend JAR to
>>>>> select
>>>>>>>> the particular implementation you choose, as a specific Pirk
>>>>>>>> installation doesn't need to use multiple backends
>>> simultaneously.
>>>>>>>>
>>>>>>>> ...and you are leaning towards the second option.  Do I have that
>>>>>> correct?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Tim
>>>>>>>>
>>>>>>>>> Am I missing something? Is there a good reason to provide a
>>> service
>>>>> by
>>>>>>>>> which platforms are registered? I'm open...
>>>>>>>>>
>>>>>>>>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <
>>> t.p.ellison@gmail.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> How about an approach like this?
>>>>>>>>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
>>>>>>>>>>
>>>>>>>>>> The "on-ramp" is the driver [1], which calls upon the service
>>> to
>>>>> find
>>>>>> a
>>>>>>>>>> plug-in [2] that claims to implement the required platform
>>>>> responder,
>>>>>>>>>> e.g. [3].
>>>>>>>>>>
>>>>>>>>>> The list of plug-ins is given in the provider's JAR file, so
>>> the
>>>>> ones
>>>>>> we
>>>>>>>>>> provide in Pirk are listed together [4], but if you split these
>>> into
>>>>>>>>>> modules, or somebody brings their own JAR alongside, these
>>> would
>>> be
>>>>>>>>>> listed in each JAR's services/ directory.
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>>>>>>>>>> src/main/java/org/apache/pirk/responder/wideskies/
>>>>>> ResponderDriver.java
>>>>>>>>>> [2]
>>>>>>>>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>>>>>>>>>> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.
>>> java
>>>>>>>>>> [3]
>>>>>>>>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>>>>>>>>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
>>>>>>>>>> StormResponder.java
>>>>>>>>>> [4]
>>>>>>>>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>>>>>>>>>> src/main/services/org.apache.responder.spi.Responder
>>>>>>>>>>
>>>>>>>>>> I'm not even going to dignify this with a WIP PR, it is far
>>> from
>>>>>> ready,
>>>>>>>>>> so proceed with caution.  There is hopefully enough there to
>>> show
>>>>> the
>>>>>>>>>> approach, and if it is worth continuing I'm happy to do so.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Tim
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>>
> 
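
The service-registration approach Tim describes above (plug-ins listed in each JAR's services/ directory) is Java's standard `ServiceLoader` mechanism. A minimal sketch follows; the interface, method, and platform names here are illustrative assumptions, not Pirk's actual SPI:

```java
// Hedged sketch of a ServiceLoader-based plugin lookup. A provider JAR would
// register implementations in META-INF/services/<interface-binary-name>.
import java.util.ServiceLoader;

public class PluginFinderSketch {

  // Hypothetical plugin contract, loosely modeled on the ResponderPlugin idea.
  public interface ResponderPlugin {
    String getPlatformName();
    void run();
  }

  // Scans classpath service registrations and returns the first plugin
  // claiming the requested platform name.
  public static ResponderPlugin find(String platform) {
    for (ResponderPlugin plugin : ServiceLoader.load(ResponderPlugin.class)) {
      if (plugin.getPlatformName().equalsIgnoreCase(platform)) {
        return plugin;
      }
    }
    throw new IllegalArgumentException("No responder plugin found for: " + platform);
  }

  public static void main(String[] args) {
    try {
      find("storm");  // no provider is registered in this standalone sketch
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With this scheme, bringing a new backend means dropping a JAR on the classpath whose services file names the implementation class; the driver never needs a compile-time dependency on it.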

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Darin Johnson <db...@gmail.com>.
Great, will write up the doc, link it here, finish Pirk-63, then start this.

On Sep 19, 2016 5:34 PM, "Suneel Marthi" <su...@gmail.com> wrote:

> +100
>
> On Mon, Sep 19, 2016 at 11:24 PM, Ellison Anne Williams <
> eawilliams@apache.org> wrote:
>
> > Yes, ES is just an inputformat (like HDFS, Kafka, etc) - we don't need a
> > separate submodule.
> >
> > Aside from pirk-core, it seems that we would want to break the responder
> > implementations out into submodules. This would leave us with something
> > along the lines of the following (at this point):
> >
> > pirk-core (encryption, core responder incl. standalone, core querier,
> > query, inputformat, serialization, utils)
> > pirk-storm
> > pirk-mapreduce
> > pirk-spark
> > pirk-benchmark
> > pirk-distributed-test
> >
> > Once we add other responder implementations, we can add them as
> submodules
> > - i.e. for Flink, we would have pirk-flink; for Beam, pirk-beam, etc.
> >
> > We could break 'pirk-core' down further...
> >
> > On Mon, Sep 19, 2016 at 5:10 PM, Suneel Marthi <su...@gmail.com>
> > wrote:
> >
> > > Here's an example from the Flink project for how they go about new
> > features
> > > or system breaking API changes, we could start a similar process. The
> > Flink
> > > guys call these FLIP (Flink Improvement Proposal) and Kafka community
> > > similarly has something called KLIP.
> > >
> > > We could start a PLIP (??? :-) )
> > >
> > > https://cwiki.apache.org/confluence/pages/viewpage.
> > action?pageId=65870673
> > >
> > >
> > > On Mon, Sep 19, 2016 at 11:07 PM, Suneel Marthi <
> suneel.marthi@gmail.com
> > >
> > > wrote:
> > >
> > > > A shared Google doc would be more convenient than a bunch of Jiras.
> It's
> > > > easier to comment and add notes that way.
> > > >
> > > >
> > > > On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <
> > dbjohnson1978@gmail.com
> > > >
> > > > wrote:
> > > >
> > > >> Suneel, I'll try to put a couple jiras on it tonight with my
> thoughts.
> > > >> Based off my pirk-63 I was able to pull spark and storm out with no
> > > >> issues.  I was planning to pull them out, then tackle elastic
> > search,
> > > >> then hadoop as it's a little entrenched.  This should keep most PRs
> to
> > > >> manageable chunks. I think once that's done addressing the configs
> > will
> > > >> make more sense.
> > > >>
> > > >> I'm open to suggestions. But the hope would be:
> > > >> Pirk-parent
> > > >> Pirk-core
> > > >> Pirk-hadoop
> > > >> Pirk-storm
> > > >> Pirk-parent
> > > >>
> > > >> Pirk-es is a little weird as it's really just an inputformat, seems
> > like
> > > >> there's a more general solution here than creating submodules for
> > every
> > > >> inputformat.
> > > >>
> > > >> Darin
> > > >>
> > > >> On Sep 19, 2016 1:00 PM, "Suneel Marthi" <sm...@apache.org>
> wrote:
> > > >>
> > > >> >
> > > >>
> > > >> > Refactor is definitely a first priority.  Is there a
> design/proposal
> > > >> draft
> > > >> > that we could comment on about how to go about refactoring the
> code.
> > > I
> > > >> > have been trying to keep up with the emails but definitely would
> > have
> > > >> > missed some.
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
> > > >> > eawilliams@apache.org <ea...@apache.org>> wrote:
> > > >> >
> > > >> > > Agree - let's leave the config/CLI the way it is for now and
> > tackle
> > > >> that as
> > > >> > > a subsequent design discussion and PR.
> > > >> > >
> > > >> > > Also, I think that we should leave the ResponderDriver and the
> > > >> > > ResponderProps alone for this PR and push to a subsequent PR
> (once
> > > we
> > > >> > > decide if and how we would like to delegate each).
> > > >> > >
> > > >> > > I vote to remove the 'platform' option and the backwards
> > > compatibility
> > > >> in
> > > >> > > this PR and proceed with having a ResponderLauncher interface
> and
> > > >> forcing
> > > >> > > its implementation by the ResponderDriver.
> > > >> > >
> > > >> > > And, I am not so concerned with having one fat jar vs. multiple
> > jars
> > > >> right
> > > >> > > now - to me, at this point, it's a 'nice to have' and not a
> 'must
> > > >> have'
> > > >> for
> > > >> > > Pirk functionality. We do need to break out Pirk into more
> clearly
> > > >> defined
> > > >> > > submodules (which is in progress) - via this re-factor, I think
> > that
> > > >> we
> > > >> > > will gain some ability to generate multiple jars which is nice.
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <
> > > t.p.ellison@gmail.com>
> > > >> > > wrote:
> > > >> > >
> > > >> > > > On 19/09/16 15:46, Darin Johnson wrote:
> > > >> > > > > Hey guys,
> > > >> > > > >
> > > >> > > > > Thanks for looking at the PR, I apologize if it offended
> > > anyone's
> > > >> > > eyes:).
> > > >> > > > >
> > > >> > > > > I'm glad it generated some discussion about the
> configuration.
> > > I
> > > >> > > didn't
> > > >> > > > > really like where things were heading with the config.
> > However,
> > > >> didn't
> > > >> > > > > want to create too much scope creep.
> > > >> > > > >
> > > >> > > > > I think any hierarchical config (TypeSafe or yaml) would
> make
> > > >> things
> > > >> > > much
> > > >> > > > > more maintainable, the plugin could simply grab the
> > appropriate
> > > >> part of
> > > >> > > > the
> > > >> > > > > config and handle accordingly.  I'd also cut down the number
> > of
> > > >> command
> > > >> > > > > line options to only those that change between runs often
> > (like
> > > >> > > > > input/output)
> > > >> > > > >
> > > >> > > > >> One option is to make Pirk pluggable, so that a Pirk
> > > installation
> > > >> > > could
> > > >> > > > >> use one or more of these in an extensible fashion by adding
> > JAR
> > > >> files.
> > > >> > > > >> That would still require selecting one by command-line
> > > argument.
> > > >> > > > >
> > > >> > > > > An argument for this approach is for lambda architecture
> > > >> approaches
> > > >> > > (say
> > > >> > > > > spark/spark-streaming) where the contents of the jars would
> be
> > so
> > > >> > > similar
> > > >> > > > it
> > > >> > > > > seems like too much trouble to create separate jars.
> > > >> > > > >
> > > >> > > > > Happy to continue working on this given some direction on
> > where
> > > >> you'd
> > > >> > > > like
> > > >> > > > > it to go.  Also, it's a bit of a blocker to refactoring the
> > > build
> > > >> into
> > > >> > > > > submodules.
> > > >> > > >
> > > >> > > > FWIW my 2c is to not try and fix all the problems in one go,
> and
> > > >> rather
> > > >> > > > take a compromise on the configurations while you tease apart
> > the
> > > >> > > > submodules into separate source code trees, poms, etc; then
> > come
> > > >> back
> > > >> > > > and fix the runtime configs.
> > > >> > > >
> > > >> > > > Once the submodules are in place it will open up more work for
> > > >> release
> > > >> > > > engineering and tinkering that can be done in parallel with
> the
> > > >> config
> > > >> > > > polishing.
> > > >> > > >
> > > >> > > > Just a thought.
> > > >> > > > Tim
> > > >> > > >
> > > >> > > >
> > > >> > > > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <
> > > >> t.p.ellison@gmail.com>
> > > >> > > > wrote:
> > > >> > > > >
> > > >> > > > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > > >> > > > >>> It seems that it's the same idea as the ResponderLauncher
> > with
> > > >> the
> > > >> > > > >> service
> > > >> > > > >>> component added to maintain something akin to the
> > 'platform'.
> > > I
> > > >> would
> > > >> > > > >>> prefer that we just did away with the platform notion
> > > altogether
> > > >> and
> > > >> > > > make
> > > >> > > > >>> the ResponderDriver 'dumb'. We get around needing a
> > > >> platform-aware
> > > >> > > > >> service
> > > >> > > > >>> by requiring the ResponderLauncher implementation to be
> > passed
> > > >> as
> > > >> a
> > > >> > > CLI
> > > >> > > > >> to
> > > >> > > > >>> the ResponderDriver.
> > > >> > > > >>
> > > >> > > > >> Let me check I understand what you are saying here.
> > > >> > > > >>
> > > >> > > > >> At the moment, there is a monolithic Pirk that hard codes
> how
> > > to
> > > >> > > respond
> > > >> > > > >> using lots of different backends (mapreduce, spark,
> > > >> sparkstreaming,
> > > >> > > > >> storm, standalone), and that is selected by command-line
> > > >> argument.
> > > >> > > > >>
> > > >> > > > >> One option is to make Pirk pluggable, so that a Pirk
> > > installation
> > > >> > > could
> > > >> > > > >> use one or more of these in an extensible fashion by adding
> > JAR
> > > >> files.
> > > >> > > > >> That would still require selecting one by command-line
> > > argument.
> > > >> > > > >>
> > > >> > > > >> A second option is to simply pass in the required backend
> JAR
> > > to
> > > >> > > select
> > > >> > > > >> the particular implementation you choose, as a specific
> Pirk
> > > >> > > > >> installation doesn't need to use multiple backends
> > > >> simultaneously.
> > > >> > > > >>
> > > >> > > > >> ...and you are leaning towards the second option.  Do I
> have
> > > that
> > > >> > > > correct?
> > > >> > > > >>
> > > >> > > > >> Regards,
> > > >> > > > >> Tim
> > > >> > > > >>
> > > >> > > > >>> Am I missing something? Is there a good reason to provide
> a
> > > >> service
> > > >> > > by
> > > >> > > > >>> which platforms are registered? I'm open...
> > > >> > > > >>>
> > > >> > > > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <
> > > >> t.p.ellison@gmail.com>
> > > >> > > > >> wrote:
> > > >> > > > >>>
> > > >> > > > >>>> How about an approach like this?
> > > >> > > > >>>>    https://github.com/tellison/
> incubator-pirk/tree/pirk-63
> > > >> > > > >>>>
> > > >> > > > >>>> The "on-ramp" is the driver [1], which calls upon the
> > service
> > > >> to
> > > >> > > find
> > > >> > > > a
> > > >> > > > >>>> plug-in [2] that claims to implement the required
> platform
> > > >> > > responder,
> > > >> > > > >>>> e.g. [3].
> > > >> > > > >>>>
> > > >> > > > >>>> The list of plug-ins is given in the provider's JAR file,
> > so
> > > >> the
> > > >> > > ones
> > > >> > > > we
> > > >> > > > >>>> provide in Pirk are listed together [4], but if you split
> > > these
> > > >> into
> > > >> > > > >>>> modules, or somebody brings their own JAR alongside,
> these
> > > >> would
> > > >> be
> > > >> > > > >>>> listed in each JAR's services/ directory.
> > > >> > > > >>>>
> > > >> > > > >>>> [1]
> > > >> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > >> > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/
> > > >> > > > ResponderDriver.java
> > > >> > > > >>>> [2]
> > > >> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > >> > > > >>>> src/main/java/org/apache/pirk/
> > responder/spi/ResponderPlugin.
> > > >> java
> > > >> > > > >>>> [3]
> > > >> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > >> > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
> > > >> > > > >>>> StormResponder.java
> > > >> > > > >>>> [4]
> > > >> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > > >> > > > >>>> src/main/services/org.apache.responder.spi.Responder
> > > >> > > > >>>>
> > > >> > > > >>>> I'm not even going to dignify this with a WIP PR, it is
> far
> > > >> from
> > > >> > > > ready,
> > > >> > > > >>>> so proceed with caution.  There is hopefully enough there
> > to
> > > >> show
> > > >> > > the
> > > >> > > > >>>> approach, and if it is worth continuing I'm happy to do
> so.
> > > >> > > > >>>>
> > > >> > > > >>>> Regards,
> > > >> > > > >>>> Tim
> > > >> > > > >>>>
> > > >> > > > >>>>
> > > >> > > > >>>
> > > >> > > > >>
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >>
> > > >
> > > >
> > >
> >
>
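
The "launcher" CLI idea from the PR description and the vote above (a `ResponderLauncher` interface whose implementation the driver instantiates reflectively) can be sketched as follows. The class and method names are assumptions drawn from the thread, not the final API:

```java
// Minimal sketch: the driver loads whatever ResponderLauncher implementation
// is named on the command line and invokes its run() method via reflection.
public class DriverSketch {

  public interface ResponderLauncher {
    void run();
  }

  // Loads the named class from the classpath and runs it. A real driver
  // would validate the class and surface a friendly error on failure.
  static void launch(String className) throws Exception {
    Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
    ((ResponderLauncher) instance).run();
  }

  // Stand-in for a framework-specific launcher shipped in its own JAR.
  public static class StandaloneLauncher implements ResponderLauncher {
    @Override
    public void run() {
      System.out.println("standalone responder launched");
    }
  }

  public static void main(String[] args) throws Exception {
    // In Pirk this binary class name would come from the "launcher" CLI option.
    launch("DriverSketch$StandaloneLauncher");
  }
}
```

Unlike the ServiceLoader variant, there is no registry to consult: the operator names the implementation class directly, which keeps the driver "dumb" and removes the platform enum entirely.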

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Suneel Marthi <su...@gmail.com>.
+100

On Mon, Sep 19, 2016 at 11:24 PM, Ellison Anne Williams <
eawilliams@apache.org> wrote:

> Yes, ES is just an inputformat (like HDFS, Kafka, etc) - we don't need a
> separate submodule.
>
> Aside from pirk-core, it seems that we would want to break the responder
> implementations out into submodules. This would leave us with something
> along the lines of the following (at this point):
>
> pirk-core (encryption, core responder incl. standalone, core querier,
> query, inputformat, serialization, utils)
> pirk-storm
> pirk-mapreduce
> pirk-spark
> pirk-benchmark
> pirk-distributed-test
>
> Once we add other responder implementations, we can add them as submodules
> - i.e. for Flink, we would have pirk-flink; for Beam, pirk-beam, etc.
>
> We could break 'pirk-core' down further...
>
> On Mon, Sep 19, 2016 at 5:10 PM, Suneel Marthi <su...@gmail.com>
> wrote:
>
> > Here's an example from the Flink project for how they go about new
> features
> > or system breaking API changes, we could start a similar process. The
> Flink
> > guys call these FLIP (Flink Improvement Proposal) and Kafka community
> > similarly has something called KLIP.
> >
> > We could start a PLIP (??? :-) )
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=65870673
> >
> >
> > On Mon, Sep 19, 2016 at 11:07 PM, Suneel Marthi <suneel.marthi@gmail.com
> >
> > wrote:
> >
> > > A shared Google doc would be more convenient than a bunch of Jiras. It's
> > > easier to comment and add notes that way.
> > >
> > >
> > > On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <
> dbjohnson1978@gmail.com
> > >
> > > wrote:
> > >
> > >> Suneel, I'll try to put a couple jiras on it tonight with my thoughts.
> > >> Based off my pirk-63 I was able to pull spark and storm out with no
> > >> issues.  I was planning to pull them out, then tackle elastic
> search,
> > >> then hadoop as it's a little entrenched.  This should keep most PRs to
> > >> manageable chunks. I think once that's done addressing the configs
> will
> > >> make more sense.
> > >>
> > >> I'm open to suggestions. But the hope would be:
> > >> Pirk-parent
> > >> Pirk-core
> > >> Pirk-hadoop
> > >> Pirk-storm
> > >> Pirk-parent
> > >>
> > >> Pirk-es is a little weird as it's really just an inputformat, seems
> like
> > >> there's a more general solution here than creating submodules for
> every
> > >> inputformat.
> > >>
> > >> Darin
> > >>
> > >> On Sep 19, 2016 1:00 PM, "Suneel Marthi" <sm...@apache.org> wrote:
> > >>
> > >> >
> > >>
> > >> > Refactor is definitely a first priority.  Is there a design/proposal
> > >> draft
> > >> > that we could comment on about how to go about refactoring the code.
> > I
> > >> > have been trying to keep up with the emails but definitely would
> have
> > >> > missed some.
> > >> >
> > >> >
> > >> >
> > >> > On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
> > >> > eawilliams@apache.org <ea...@apache.org>> wrote:
> > >> >
> > >> > > Agree - let's leave the config/CLI the way it is for now and
> tackle
> > >> that as
> > >> > > a subsequent design discussion and PR.
> > >> > >
> > >> > > Also, I think that we should leave the ResponderDriver and the
> > >> > > ResponderProps alone for this PR and push to a subsequent PR (once
> > we
> > >> > > decide if and how we would like to delegate each).
> > >> > >
> > >> > > I vote to remove the 'platform' option and the backwards
> > compatibility
> > >> in
> > >> > > this PR and proceed with having a ResponderLauncher interface and
> > >> forcing
> > >> > > its implementation by the ResponderDriver.
> > >> > >
> > >> > > And, I am not so concerned with having one fat jar vs. multiple
> jars
> > >> right
> > >> > > now - to me, at this point, it's a 'nice to have' and not a 'must
> > >> have'
> > >> for
> > >> > > Pirk functionality. We do need to break out Pirk into more clearly
> > >> defined
> > >> > > submodules (which is in progress) - via this re-factor, I think
> that
> > >> we
> > >> > > will gain some ability to generate multiple jars which is nice.
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <
> > t.p.ellison@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > On 19/09/16 15:46, Darin Johnson wrote:
> > >> > > > > Hey guys,
> > >> > > > >
> > >> > > > > Thanks for looking at the PR, I apologize if it offended
> > anyone's
> > >> > > eyes:).
> > >> > > > >
> > >> > > > > I'm glad it generated some discussion about the configuration.
> > I
> > >> > > didn't
> > >> > > > > really like where things were heading with the config.
> However,
> > >> didn't
> > >> > > > > want to create too much scope creep.
> > >> > > > >
> > >> > > > > I think any hierarchical config (TypeSafe or yaml) would make
> > >> things
> > >> > > much
> > >> > > > > more maintainable, the plugin could simply grab the
> appropriate
> > >> part of
> > >> > > > the
> > >> > > > > config and handle accordingly.  I'd also cut down the number
> of
> > >> command
> > >> > > > > line options to only those that change between runs often
> (like
> > >> > > > > input/output)
> > >> > > > >
> > >> > > > >> One option is to make Pirk pluggable, so that a Pirk
> > installation
> > >> > > could
> > >> > > > >> use one or more of these in an extensible fashion by adding
> JAR
> > >> files.
> > >> > > > >> That would still require selecting one by command-line
> > argument.
> > >> > > > >
> > >> > > > > An argument for this approach is for lambda architecture
> > >> approaches
> > >> > > (say
> > >> > > > > spark/spark-streaming) where the contents of the jars would be
> so
> > >> > > similar
> > >> > > > it
> > >> > > > > seems like too much trouble to create separate jars.
> > >> > > > >
> > >> > > > > Happy to continue working on this given some direction on
> where
> > >> you'd
> > >> > > > like
> > >> > > > > it to go.  Also, it's a bit of a blocker to refactoring the
> > build
> > >> into
> > >> > > > > submodules.
> > >> > > >
> > >> > > > FWIW my 2c is to not try and fix all the problems in one go, and
> > >> rather
> > >> > > > take a compromise on the configurations while you tease apart
> the
> > >> > > > submodules into separate source code trees, poms, etc; then
> come
> > >> back
> > >> > > > and fix the runtime configs.
> > >> > > >
> > >> > > > Once the submodules are in place it will open up more work for
> > >> release
> > >> > > > engineering and tinkering that can be done in parallel with the
> > >> config
> > >> > > > polishing.
> > >> > > >
> > >> > > > Just a thought.
> > >> > > > Tim
> > >> > > >
> > >> > > >
> > >> > > > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <
> > >> t.p.ellison@gmail.com>
> > >> > > > wrote:
> > >> > > > >
> > >> > > > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > >> > > > >>> It seems that it's the same idea as the ResponderLauncher
> with
> > >> the
> > >> > > > >> service
> > >> > > > >>> component added to maintain something akin to the
> 'platform'.
> > I
> > >> would
> > >> > > > >>> prefer that we just did away with the platform notion
> > altogether
> > >> and
> > >> > > > make
> > >> > > > >>> the ResponderDriver 'dumb'. We get around needing a
> > >> platform-aware
> > >> > > > >> service
> > >> > > > >>> by requiring the ResponderLauncher implementation to be
> passed
> > >> as
> > >> a
> > >> > > CLI
> > >> > > > >> to
> > >> > > > >>> the ResponderDriver.
> > >> > > > >>
> > >> > > > >> Let me check I understand what you are saying here.
> > >> > > > >>
> > >> > > > >> At the moment, there is a monolithic Pirk that hard codes how
> > to
> > >> > > respond
> > >> > > > >> using lots of different backends (mapreduce, spark,
> > >> sparkstreaming,
> > >> > > > >> storm, standalone), and that is selected by command-line
> > >> argument.
> > >> > > > >>
> > >> > > > >> One option is to make Pirk pluggable, so that a Pirk
> > installation
> > >> > > could
> > >> > > > >> use one or more of these in an extensible fashion by adding
> JAR
> > >> files.
> > >> > > > >> That would still require selecting one by command-line
> > argument.
> > >> > > > >>
> > >> > > > >> A second option is to simply pass in the required backend JAR
> > to
> > >> > > select
> > >> > > > >> the particular implementation you choose, as a specific Pirk
> > >> > > > >> installation doesn't need to use multiple backends
> > >> simultaneously.
> > >> > > > >>
> > >> > > > >> ...and you are leaning towards the second option.  Do I have
> > that
> > >> > > > correct?
> > >> > > > >>
> > >> > > > >> Regards,
> > >> > > > >> Tim
> > >> > > > >>
> > >> > > > >>> Am I missing something? Is there a good reason to provide a
> > >> service
> > >> > > by
> > >> > > > >>> which platforms are registered? I'm open...
> > >> > > > >>>
> > >> > > > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <
> > >> t.p.ellison@gmail.com>
> > >> > > > >> wrote:
> > >> > > > >>>
> > >> > > > >>>> How about an approach like this?
> > >> > > > >>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
> > >> > > > >>>>
> > >> > > > >>>> The "on-ramp" is the driver [1], which calls upon the
> service
> > >> to
> > >> > > find
> > >> > > > a
> > >> > > > >>>> plug-in [2] that claims to implement the required platform
> > >> > > responder,
> > >> > > > >>>> e.g. [3].
> > >> > > > >>>>
> > >> > > > >>>> The list of plug-ins is given in the provider's JAR file,
> so
> > >> the
> > >> > > ones
> > >> > > > we
> > >> > > > >>>> provide in Pirk are listed together [4], but if you split
> > these
> > >> into
> > >> > > > >>>> modules, or somebody brings their own JAR alongside, these
> > >> would
> > >> be
> > >> > > > >>>> listed in each JAR's services/ directory.
> > >> > > > >>>>
> > >> > > > >>>> [1]
> > >> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >> > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/
> > >> > > > ResponderDriver.java
> > >> > > > >>>> [2]
> > >> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >> > > > >>>> src/main/java/org/apache/pirk/
> responder/spi/ResponderPlugin.
> > >> java
> > >> > > > >>>> [3]
> > >> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >> > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
> > >> > > > >>>> StormResponder.java
> > >> > > > >>>> [4]
> > >> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >> > > > >>>> src/main/services/org.apache.responder.spi.Responder
> > >> > > > >>>>
> > >> > > > >>>> I'm not even going to dignify this with a WIP PR, it is far
> > >> from
> > >> > > > ready,
> > >> > > > >>>> so proceed with caution.  There is hopefully enough there
> to
> > >> show
> > >> > > the
> > >> > > > >>>> approach, and if it is worth continuing I'm happy to do so.
> > >> > > > >>>>
> > >> > > > >>>> Regards,
> > >> > > > >>>> Tim
> > >> > > > >>>>
> > >> > > > >>>>
> > >> > > > >>>
> > >> > > > >>
> > >> > > > >
> > >> > > >
> > >> > >
> > >>
> > >
> > >
> >
>
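
Darin's hierarchical-config point quoted above (each plugin grabs only the appropriate part of the config) can be approximated even without a TypeSafe/YAML dependency. The sketch below uses flat `java.util.Properties` with a dotted-prefix convention; the property names are made up for illustration:

```java
// Sketch: carve a plugin-specific slice out of a flat Properties set so a
// backend never sees (or depends on) other backends' settings.
import java.util.Properties;

public class ConfigSliceSketch {

  // Returns a new Properties holding only entries under "prefix.", with the
  // prefix stripped from each key.
  static Properties subtree(Properties all, String prefix) {
    Properties slice = new Properties();
    String dotted = prefix + ".";
    for (String name : all.stringPropertyNames()) {
      if (name.startsWith(dotted)) {
        slice.setProperty(name.substring(dotted.length()), all.getProperty(name));
      }
    }
    return slice;
  }

  public static void main(String[] args) {
    Properties all = new Properties();
    all.setProperty("pirk.storm.numWorkers", "4");
    all.setProperty("pirk.spark.numExecutors", "8");

    // The storm plugin asks only for its own subtree.
    Properties storm = subtree(all, "pirk.storm");
    System.out.println("numWorkers=" + storm.getProperty("numWorkers"));
  }
}
```

A hierarchical format (HOCON or YAML) gives the same isolation natively, which is why the thread leans that way once the submodule split lands.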

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Darin Johnson <db...@gmail.com>.
So my concern with Elasticsearch isn't really that I want it as a
submodule; it's more that I don't think it belongs in any of the jars.
Otherwise one can argue every inputformat belongs. It seems more prudent
to have those included via some abstraction devs can add on their own
(Cascading's Tap comes to mind).

On Sep 19, 2016 5:24 PM, "Ellison Anne Williams" <ea...@apache.org>
wrote:

> Yes, ES is just an inputformat (like HDFS, Kafka, etc) - we don't need a
> separate submodule.
>
> Aside from pirk-core, it seems that we would want to break the responder
> implementations out into submodules. This would leave us with something
> along the lines of the following (at this point):
>
> pirk-core (encryption, core responder incl. standalone, core querier,
> query, inputformat, serialization, utils)
> pirk-storm
> pirk-mapreduce
> pirk-spark
> pirk-benchmark
> pirk-distributed-test
>
> Once we add other responder implementations, we can add them as submodules
> - i.e. for Flink, we would have pirk-flink; for Beam, pirk-beam, etc.
>
> We could break 'pirk-core' down further...
>
> On Mon, Sep 19, 2016 at 5:10 PM, Suneel Marthi <su...@gmail.com>
> wrote:
>
> > Here's an example from the Flink project for how they go about new
> features
> > or system breaking API changes, we could start a similar process. The
> Flink
> > guys call these FLIP (Flink Improvement Proposal) and Kafka community
> > similarly has something called KLIP.
> >
> > We could start a PLIP (??? :-) )
> >
> > https://cwiki.apache.org/confluence/pages/viewpage.
> action?pageId=65870673
> >
> >
> > On Mon, Sep 19, 2016 at 11:07 PM, Suneel Marthi <suneel.marthi@gmail.com
> >
> > wrote:
> >
> > > A shared Google doc would be more convenient than a bunch of Jiras. It's
> > > easier to comment and add notes that way.
> > >
> > >
> > > On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <
> dbjohnson1978@gmail.com
> > >
> > > wrote:
> > >
> > >> Suneel, I'll try to put a couple jiras on it tonight with my thoughts.
> > >> Based off my pirk-63 I was able to pull spark and storm out with no
> > >> issues.  I was planning to pull them out, then tackle elastic
> search,
> > >> then hadoop as it's a little entrenched.  This should keep most PRs to
> > >> manageable chunks. I think once that's done addressing the configs
> will
> > >> make more sense.
> > >>
> > >> I'm open to suggestions. But the hope would be:
> > >> Pirk-parent
> > >> Pirk-core
> > >> Pirk-hadoop
> > >> Pirk-storm
> > >> Pirk-parent
> > >>
> > >> Pirk-es is a little weird as it's really just an inputformat, seems
> like
> > >> there's a more general solution here than creating submodules for
> every
> > >> inputformat.
> > >>
> > >> Darin
> > >>
> > >> On Sep 19, 2016 1:00 PM, "Suneel Marthi" <sm...@apache.org> wrote:
> > >>
> > >> >
> > >>
> > >> > Refactor is definitely a first priority.  Is there a design/proposal
> > >> draft
> > >> > that we could comment on about how to go about refactoring the code.
> > I
> > >> > have been trying to keep up with the emails but definitely would
> have
> > >> > missed some.
> > >> >
> > >> >
> > >> >
> > >> > On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
> > >> > eawilliams@apache.org <ea...@apache.org>> wrote:
> > >> >
> > >> > > Agree - let's leave the config/CLI the way it is for now and
> tackle
> > >> that as
> > >> > > a subsequent design discussion and PR.
> > >> > >
> > >> > > Also, I think that we should leave the ResponderDriver and the
> > >> > > ResponderProps alone for this PR and push to a subsequent PR (once
> > we
> > >> > > decide if and how we would like to delegate each).
> > >> > >
> > >> > > I vote to remove the 'platform' option and the backwards
> > compatibility
> > >> in
> > >> > > this PR and proceed with having a ResponderLauncher interface and
> > >> forcing
> > >> > > its implementation by the ResponderDriver.
> > >> > >
> > >> > > And, I am not so concerned with having one fat jar vs. multiple
> jars
> > >> right
> > >> > > now - to me, at this point, it's a 'nice to have' and not a 'must
> > >> have'
> > >> for
> > >> > > Pirk functionality. We do need to break out Pirk into more clearly
> > >> defined
> > >> > > submodules (which is in progress) - via this re-factor, I think
> that
> > >> we
> > >> > > will gain some ability to generate multiple jars which is nice.
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <
> > t.p.ellison@gmail.com>
> > >> > > wrote:
> > >> > >
> > >> > > > On 19/09/16 15:46, Darin Johnson wrote:
> > >> > > > > Hey guys,
> > >> > > > >
> > >> > > > > Thanks for looking at the PR, I apologize if it offended
> > anyone's
> > >> > > eyes:).
> > >> > > > >
> > >> > > > > I'm glad it generated some discussion about the configuration.
> > I
> > >> > > didn't
> > >> > > > > really like where things were heading with the config.
> However,
> > >> didn't
> > >> > > > > want to create too much scope creep.
> > >> > > > >
> > >> > > > > I think any hierarchical config (TypeSafe or yaml) would make
> > >> things
> > >> > > much
> > >> > > > > more maintainable, the plugin could simply grab the
> appropriate
> > >> part of
> > >> > > > the
> > >> > > > > config and handle accordingly.  I'd also cut down the number
> of
> > >> command
> > >> > > > > line options to only those that change between runs often
> (like
> > >> > > > > input/output)
> > >> > > > >
> > >> > > > >> One option is to make Pirk pluggable, so that a Pirk
> > installation
> > >> > > could
> > >> > > > >> use one or more of these in an extensible fashion by adding
> JAR
> > >> files.
> > >> > > > >> That would still require selecting one by command-line
> > argument.
> > >> > > > >
> > >> > > > > An argument for this approach is for lambda architecture
> > >> approaches
> > >> > > (say
> > >> > > > > spark/spark-streaming) where the contents of the jars would be
> so
> > >> > > similar
> > >> > > > it
> > >> > > > > seems like too much trouble to create separate jars.
> > >> > > > >
> > >> > > > > Happy to continue working on this given some direction on
> where
> > >> you'd
> > >> > > > like
> > >> > > > > it to go.  Also, it's a bit of a blocker to refactoring the
> > build
> > >> into
> > >> > > > > submodules.
> > >> > > >
> > >> > > > FWIW my 2c is to not try and fix all the problems in one go, and
> > >> rather
> > >> > > > take a compromise on the configurations while you tease apart
> the
> > >> > > > submodules into separate source code trees, poms, etc; then
> come
> > >> back
> > >> > > > and fix the runtime configs.
> > >> > > >
> > >> > > > Once the submodules are in place it will open up more work for
> > >> release
> > >> > > > engineering and tinkering that can be done in parallel with the
> > >> config
> > >> > > > polishing.
> > >> > > >
> > >> > > > Just a thought.
> > >> > > > Tim
> > >> > > >
> > >> > > >
> > >> > > > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <
> > >> t.p.ellison@gmail.com>
> > >> > > > wrote:
> > >> > > > >
> > >> > > > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > >> > > > >>> It seems that it's the same idea as the ResponderLauncher
> with
> > >> the
> > >> > > > >> service
> > >> > > > >>> component added to maintain something akin to the
> 'platform'.
> > I
> > >> would
> > >> > > > >>> prefer that we just did away with the platform notion
> > altogether
> > >> and
> > >> > > > make
> > >> > > > >>> the ResponderDriver 'dumb'. We get around needing a
> > >> platform-aware
> > >> > > > >> service
> > >> > > > >>> by requiring the ResponderLauncher implementation to be
> passed
> > >> as
> > >> a
> > >> > > CLI
> > >> > > > >> to
> > >> > > > >>> the ResponderDriver.
> > >> > > > >>
> > >> > > > >> Let me check I understand what you are saying here.
> > >> > > > >>
> > >> > > > >> At the moment, there is a monolithic Pirk that hard codes how
> > to
> > >> > > respond
> > >> > > > >> using lots of different backends (mapreduce, spark,
> > >> sparkstreaming,
> > >> > > > >> storm, standalone), and that is selected by command-line
> > >> argument.
> > >> > > > >>
> > >> > > > >> One option is to make Pirk pluggable, so that a Pirk
> > installation
> > >> > > could
> > >> > > > >> use one or more of these in an extensible fashion by adding
> JAR
> > >> files.
> > >> > > > >> That would still require selecting one by command-line
> > argument.
> > >> > > > >>
> > >> > > > >> A second option is to simply pass in the required backend JAR
> > to
> > >> > > select
> > >> > > > >> the particular implementation you choose, as a specific Pirk
> > >> > > > >> installation doesn't need to use multiple backends
> > >> simultaneously.
> > >> > > > >>
> > >> > > > >> ...and you are leaning towards the second option.  Do I have
> > that
> > >> > > > correct?
> > >> > > > >>
> > >> > > > >> Regards,
> > >> > > > >> Tim
> > >> > > > >>
> > >> > > > >>> Am I missing something? Is there a good reason to provide a
> > >> service
> > >> > > by
> > >> > > > >>> which platforms are registered? I'm open...
> > >> > > > >>>
> > >> > > > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <
> > >> t.p.ellison@gmail.com>
> > >> > > > >> wrote:
> > >> > > > >>>
> > >> > > > >>>> How about an approach like this?
> > >> > > > >>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
> > >> > > > >>>>
> > >> > > > >>>> The "on-ramp" is the driver [1], which calls upon the
> service
> > >> to
> > >> > > find
> > >> > > > a
> > >> > > > >>>> plug-in [2] that claims to implement the required platform
> > >> > > responder,
> > >> > > > >>>> e.g. [3].
> > >> > > > >>>>
> > >> > > > >>>> The list of plug-ins is given in the provider's JAR file,
> so
> > >> the
> > >> > > ones
> > >> > > > we
> > >> > > > >>>> provide in Pirk are listed together [4], but if you split
> > these
> > >> into
> > >> > > > >>>> modules, or somebody brings their own JAR alongside, these
> > >> would
> > >> be
> > >> > > > >>>> listed in each JAR's services/ directory.
> > >> > > > >>>>
> > >> > > > >>>> [1]
> > >> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >> > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/
> > >> > > > ResponderDriver.java
> > >> > > > >>>> [2]
> > >> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >> > > > >>>> src/main/java/org/apache/pirk/
> responder/spi/ResponderPlugin.
> > >> java
> > >> > > > >>>> [3]
> > >> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >> > > > >>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
> > >> > > > >>>> StormResponder.java
> > >> > > > >>>> [4]
> > >> > > > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >> > > > >>>> src/main/services/org.apache.responder.spi.Responder
> > >> > > > >>>>
> > >> > > > >>>> I'm not even going to dignify this with a WIP PR, it is far
> > >> from
> > >> > > > ready,
> > >> > > > >>>> so proceed with caution.  There is hopefully enough there
> to
> > >> show
> > >> > > the
> > >> > > > >>>> approach, and if it is worth continuing I'm happy to do so.
> > >> > > > >>>>
> > >> > > > >>>> Regards,
> > >> > > > >>>> Tim
> > >> > > > >>>>
> > >> > > > >>>>
> > >> > > > >>>
> > >> > > > >>
> > >> > > > >
> > >> > > >
> > >> > >
> > >>
> > >
> > >
> >
>
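The ServiceLoader-based lookup described in the quoted thread above (a driver that asks the JDK service registry for a plug-in claiming the requested platform) could be sketched roughly as follows. This is a minimal illustration, not Pirk's actual API: the `ResponderPlugin` name matches the SPI file in Tim's branch, but the method names and the `PluginLoader` helper are assumptions.

```java
import java.util.ServiceLoader;

// The SPI each backend JAR would implement. Providers are registered by listing
// their class names in META-INF/services/<fully.qualified.ResponderPlugin> inside the JAR.
interface ResponderPlugin {
  String getPlatformName();     // e.g. "storm", "mapreduce", "spark"
  void run() throws Exception;  // launch this backend's responder
}

final class PluginLoader {
  // Scan every registered provider on the classpath for one matching the
  // requested platform; fail loudly if none is found.
  static ResponderPlugin findPlugin(String platform) {
    for (ResponderPlugin p : ServiceLoader.load(ResponderPlugin.class)) {
      if (p.getPlatformName().equalsIgnoreCase(platform)) {
        return p;
      }
    }
    throw new IllegalArgumentException("No responder plugin found for platform: " + platform);
  }
}
```

With this shape, splitting the backends into submodules costs nothing extra: each submodule JAR carries its own services/ entry, and dropping a third-party JAR on the classpath makes its platform selectable with no change to the driver.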

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Ellison Anne Williams <ea...@apache.org>.
Yes, ES is just an inputformat (like HDFS, Kafka, etc) - we don't need a
separate submodule.

Aside from pirk-core, it seems that we would want to break the responder
implementations out into submodules. This would leave us with something
along the lines of the following (at this point):

pirk-core (encryption, core responder incl. standalone, core querier,
query, inputformat, serialization, utils)
pirk-storm
pirk-mapreduce
pirk-spark
pirk-benchmark
pirk-distributed-test

Once we add other responder implementations, we can add them as submodules
- i.e. for Flink, we would have pirk-flink; for Beam, pirk-beam, etc.

We could break 'pirk-core' down further...
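The alternative favored earlier in the thread — a "dumb" ResponderDriver that takes the launcher class name as a CLI option and instantiates it by reflection — could be sketched like this. Interface and class names here are illustrative assumptions, not Pirk's actual code.

```java
// The contract a backend supplies; the driver knows nothing about platforms.
interface ResponderLauncher {
  void run() throws Exception;
}

final class ResponderDriverSketch {
  // Load whatever launcher implementation the user named on the command line,
  // e.g. --launcher com.example.StormResponderLauncher, from the classpath.
  static ResponderLauncher load(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      return (ResponderLauncher) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot load ResponderLauncher: " + className, e);
    }
  }
}

// Example of a launcher a backend JAR might provide:
final class StandaloneLauncher implements ResponderLauncher {
  public void run() {
    System.out.println("standalone responder running");
  }
}
```

The trade-off versus the service-registry approach is that nothing is auto-discovered — the user must name the implementation class — but the driver stays free of any platform registry.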

On Mon, Sep 19, 2016 at 5:10 PM, Suneel Marthi <su...@gmail.com>
wrote:

> Here's an example from the Flink project for how they go about new features
> or system breaking API changes, we could start a similar process. The Flink
> guys call these FLIP (Flink Improvement Proposal) and Kafka community
> similarly has something called KLIP.
>
> We could start a PLIP (??? :-) )
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Suneel Marthi <su...@gmail.com>.
Here's an example from the Flink project for how they go about new features
or system breaking API changes, we could start a similar process. The Flink
guys call these FLIP (Flink Improvement Proposal) and Kafka community
similarly has something called KLIP.

We could start a PLIP (??? :-) )

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65870673


On Mon, Sep 19, 2016 at 11:07 PM, Suneel Marthi <su...@gmail.com>
wrote:

> A shared Google doc would be more convenient than a bunch of Jiras. Its
> easier to comment and add notes that way.

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Suneel Marthi <su...@gmail.com>.
A shared Google doc would be more convenient than a bunch of Jiras. Its
easier to comment and add notes that way.


On Mon, Sep 19, 2016 at 10:38 PM, Darin Johnson <db...@gmail.com>
wrote:

> Suneel, I'll try to put a couple jiras on it tonight with my thoughts.
> Based off my pirk-63 I was able to pull spark and storm out with no
> issues.  I was planning to pull them out, then tackling elastic search,
> then hadoop as it's a little entrenched.  This should keep most PRs to
> manageable chunks. I think once that's done addressing the configs will
> make more sense.
>
> I'm open to suggestions. But the hope would be:
> Pirk-parent
> Pirk-core
> Pirk-hadoop
> Pirk-storm
> Pirk-parent
>
> Pirk-es is a little weird as it's really just an inputformat, seems like
> there's a more general solution here than creating submodules for every
> inputformat.
>
> Darin

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Darin Johnson <db...@gmail.com>.
Suneel, I'll try to put a couple of JIRAs on it tonight with my thoughts.
Based off my pirk-63 I was able to pull Spark and Storm out with no
issues.  I was planning to pull them out first, then tackle Elasticsearch,
then Hadoop, as it's a little entrenched.  This should keep most PRs to
manageable chunks.  I think once that's done, addressing the configs will
make more sense.

I'm open to suggestions, but the hope would be:
Pirk-parent
Pirk-core
Pirk-hadoop
Pirk-storm
Pirk-spark

Pirk-es is a little weird as it's really just an inputformat; it seems like
there's a more general solution here than creating a submodule for every
inputformat.

Darin


Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Suneel Marthi <sm...@apache.org>.
Refactoring is definitely a first priority.  Is there a design/proposal
draft that we could comment on about how to go about refactoring the code?
I have been trying to keep up with the emails but have definitely missed
some.



On Mon, Sep 19, 2016 at 6:57 PM, Ellison Anne Williams <
eawilliams@apache.org> wrote:

> Agree - let's leave the config/CLI the way it is for now and tackle that as
> a subsequent design discussion and PR.
>
> Also, I think that we should leave the ResponderDriver and the
> ResponderProps alone for this PR and push to a subsequent PR (once we
> decide if and how we would like to delegate each).
>
> I vote to remove the 'platform' option and the backwards compatibility in
> this PR and proceed with having a ResponderLauncher interface and forcing
> its implementation by the ResponderDriver.
>
> And, I am not so concerned with having one fat jar vs. multiple jars right
> now - to me, at this point, it's a 'nice to have' and not a 'must have' for
> Pirk functionality. We do need to break out Pirk into more clearly defined
> submodules (which is in progress) - via this re-factor, I think that we
> will gain some ability to generate multiple jars which is nice.
>
>
>
> On Mon, Sep 19, 2016 at 12:19 PM, Tim Ellison <t....@gmail.com>
> wrote:
>
> > On 19/09/16 15:46, Darin Johnson wrote:
> > > Hey guys,
> > >
> > > Thanks for looking at the PR, I apologize if it offended anyone's
> eyes:).
> > >
> > > I'm glad it generated some discussion about the configuration.  I
> didn't
> > > really like where things were heading with the config.  However, didn't
> > > want to create to much scope creep.
> > >
> > > I think any hierarchical config (TypeSafe or yaml) would make things
> much
> > > more maintainable, the plugin could simply grab the appropriate part of
> > the
> > > config and handle accordingly.  I'd also cut down the number of command
> > > line options to only those that change between runs often (like
> > > input/output)
> > >
> > >> One option is to make Pirk pluggable, so that a Pirk installation
> could
> > >> use one or more of these in an extensible fashion by adding JAR files.
> > >> That would still require selecting one by command-line argument.
> > >
> > > An argument for this approach is for lambda architecture approaches
> (say
> > > spark/spark-streaming) were the contents of the jars would be so
> similar
> > it
> > > seems like to much trouble to create separate jars.
> > >
> > > Happy to continue working on this given some direction on where you'd
> > like
> > > it to go.  Also, it's a bit of a blocker to refactoring the build into
> > > submodules.
> >
> > FWIW my 2c is to not try and fix all the problems in one go, and rather
> > take a compromise on the configurations while you tease apart the
> > submodules in to separate source code trees, poms, etc; then come back
> > and fix the runtime configs.
> >
> > Once the submodules are in place it will open up more work for release
> > engineering and tinkering that can be done in parallel with the config
> > polishing.
> >
> > Just a thought.
> > Tim
> >
> >
> > > On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <t....@gmail.com>
> > wrote:
> > >
> > >> On 19/09/16 13:40, Ellison Anne Williams wrote:
> > >>> It seems that it's the same idea as the ResponderLauncher with the
> > >> service
> > >>> component added to maintain something akin to the 'platform'. I would
> > >>> prefer that we just did away with the platform notion altogether and
> > make
> > >>> the ResponderDriver 'dumb'. We get around needing a platform-aware
> > >> service
> > >>> by requiring the ResponderLauncher implementation to be passed as a
> CLI
> > >> to
> > >>> the ResponderDriver.
> > >>
> > >> Let me check I understand what you are saying here.
> > >>
> > >> At the moment, there is a monolithic Pirk that hard codes how to
> respond
> > >> using lots of different backends (mapreduce, spark, sparkstreaming,
> > >> storm , standalone), and that is selected by command-line argument.
> > >>
> > >> One option is to make Pirk pluggable, so that a Pirk installation
> could
> > >> use one or more of these in an extensible fashion by adding JAR files.
> > >> That would still require selecting one by command-line argument.
> > >>
> > >> A second option is to simply pass in the required backend JAR to
> select
> > >> the particular implementation you choose, as a specific Pirk
> > >> installation doesn't need to use multiple backends simultaneously.
> > >>
> > >> ...and you are leaning towards the second option.  Do I have that
> > correct?
> > >>
> > >> Regards,
> > >> Tim
> > >>
> > >>> Am I missing something? Is there a good reason to provide a service
> by
> > >>> which platforms are registered? I'm open...
> > >>>
> > >>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <t....@gmail.com>
> > >> wrote:
> > >>>
> > >>>> How about an approach like this?
> > >>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
> > >>>>
> > >>>> The "on-ramp" is the driver [1], which calls upon the service to
> find
> > a
> > >>>> plug-in [2] that claims to implement the required platform
> responder,
> > >>>> e.g. [3].
> > >>>>
> > >>>> The list of plug-ins is given in the provider's JAR file, so the
> ones
> > we
> > >>>> provide in Pirk are listed together [4], but if you split these into
> > >>>> modules, or somebody brings their own JAR alongside, these would be
> > >>>> listed in each JAR's services/ directory.
> > >>>>
> > >>>> [1]
> > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >>>> src/main/java/org/apache/pirk/responder/wideskies/
> > ResponderDriver.java
> > >>>> [2]
> > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >>>> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
> > >>>> [3]
> > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
> > >>>> StormResponder.java
> > >>>> [4]
> > >>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> > >>>> src/main/services/org.apache.responder.spi.Responder
> > >>>>
> > >>>> I'm not even going to dignify this with a WIP PR, it is far from
> > ready,
> > >>>> so proceed with caution.  There is hopefully enough there to show
> the
> > >>>> approach, and if it is worth continuing I'm happy to do so.
> > >>>>
> > >>>> Regards,
> > >>>> Tim
> > >>>>
> > >>>>
> > >>>
> > >>
> > >
> >
>

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Ellison Anne Williams <ea...@apache.org>.
Agree - let's leave the config/CLI the way it is for now and tackle that as
a subsequent design discussion and PR.

Also, I think that we should leave the ResponderDriver and the
ResponderProps alone for this PR and push to a subsequent PR (once we
decide if and how we would like to delegate each).

I vote to remove the 'platform' option and the backwards compatibility in
this PR and proceed with having a ResponderLauncher interface and forcing
its implementation by the ResponderDriver.

And, I am not so concerned with having one fat jar vs. multiple jars right
now - to me, at this point, it's a 'nice to have' and not a 'must have' for
Pirk functionality. We do need to break out Pirk into more clearly defined
submodules (which is in progress) - via this refactor, I think that we
will gain the ability to generate multiple jars, which is nice.




Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Tim Ellison <t....@gmail.com>.
FWIW my 2c is to not try to fix all the problems in one go, and rather
take a compromise on the configurations while you tease apart the
submodules into separate source code trees, poms, etc.; then come back
and fix the runtime configs.

Once the submodules are in place it will open up more work for release
engineering and tinkering that can be done in parallel with the config
polishing.

Just a thought.
Tim


> On Mon, Sep 19, 2016 at 9:33 AM, Tim Ellison <t....@gmail.com> wrote:
> 
>> On 19/09/16 13:40, Ellison Anne Williams wrote:
>>> It seems that it's the same idea as the ResponderLauncher with the
>> service
>>> component added to maintain something akin to the 'platform'. I would
>>> prefer that we just did away with the platform notion altogether and make
>>> the ResponderDriver 'dumb'. We get around needing a platform-aware
>> service
>>> by requiring the ResponderLauncher implementation to be passed as a CLI
>> to
>>> the ResponderDriver.
>>
>> Let me check I understand what you are saying here.
>>
>> At the moment, there is a monolithic Pirk that hard codes how to respond
>> using lots of different backends (mapreduce, spark, sparkstreaming,
>> storm , standalone), and that is selected by command-line argument.
>>
>> One option is to make Pirk pluggable, so that a Pirk installation could
>> use one or more of these in an extensible fashion by adding JAR files.
>> That would still require selecting one by command-line argument.
>>
>> A second option is to simply pass in the required backend JAR to select
>> the particular implementation you choose, as a specific Pirk
>> installation doesn't need to use multiple backends simultaneously.
>>
>> ...and you are leaning towards the second option.  Do I have that correct?
>>
>> Regards,
>> Tim
>>
>>> Am I missing something? Is there a good reason to provide a service by
>>> which platforms are registered? I'm open...
>>>
>>> On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <t....@gmail.com>
>> wrote:
>>>
>>>> How about an approach like this?
>>>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
>>>>
>>>> The "on-ramp" is the driver [1], which calls upon the service to find a
>>>> plug-in [2] that claims to implement the required platform responder,
>>>> e.g. [3].
>>>>
>>>> The list of plug-ins is given in the provider's JAR file, so the ones we
>>>> provide in Pirk are listed together [4], but if you split these into
>>>> modules, or somebody brings their own JAR alongside, these would be
>>>> listed in each JAR's services/ directory.
>>>>
>>>> [1]
>>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>>>> src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
>>>> [2]
>>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>>>> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
>>>> [3]
>>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>>>> src/main/java/org/apache/pirk/responder/wideskies/storm/
>>>> StormResponder.java
>>>> [4]
>>>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>>>> src/main/services/org.apache.responder.spi.Responder
>>>>
>>>> I'm not even going to dignify this with a WIP PR, it is far from ready,
>>>> so proceed with caution.  There is hopefully enough there to show the
>>>> approach, and if it is worth continuing I'm happy to do so.
>>>>
>>>> Regards,
>>>> Tim
>>>>
>>>>
>>>
>>
> 

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Darin Johnson <db...@gmail.com>.
Hey guys,

Thanks for looking at the PR; I apologize if it offended anyone's eyes :).

I'm glad it generated some discussion about the configuration.  I didn't
really like where things were heading with the config; however, I didn't
want to create too much scope creep.

I think any hierarchical config (Typesafe or YAML) would make things much
more maintainable; the plugin could simply grab the appropriate part of the
config and handle it accordingly.  I'd also cut down the number of
command-line options to only those that change often between runs (like
input/output).
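
A hierarchical config of the sort described above might look like this (a
hypothetical HOCON sketch; the keys are illustrative, not an actual Pirk
schema), with each plugin reading only its own subtree:

```hocon
pirk {
  # options that change between runs stay on the command line;
  # everything else lives here
  query  = "hdfs:///pirk/query"
  output = "hdfs:///pirk/results"

  spark {
    master        = "yarn"
    num-executors = 8
  }

  storm {
    workers       = 4
    topology-name = "pirk-responder"
  }
}
```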

>One option is to make Pirk pluggable, so that a Pirk installation could
>use one or more of these in an extensible fashion by adding JAR files.
>That would still require selecting one by command-line argument.

An argument for this approach is lambda-architecture deployments (say
spark/spark-streaming), where the contents of the jars would be so similar
that it seems like too much trouble to create separate jars.

Happy to continue working on this given some direction on where you'd like
it to go.  Also, it's a bit of a blocker to refactoring the build into
submodules.

Darin





Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Tim Ellison <t....@gmail.com>.
On 19/09/16 13:40, Ellison Anne Williams wrote:
> It seems that it's the same idea as the ResponderLauncher with the service
> component added to maintain something akin to the 'platform'. I would
> prefer that we just did away with the platform notion altogether and make
> the ResponderDriver 'dumb'. We get around needing a platform-aware service
> by requiring the ResponderLauncher implementation to be passed as a CLI to
> the ResponderDriver.

Let me check I understand what you are saying here.

At the moment, there is a monolithic Pirk that hard codes how to respond
using lots of different backends (mapreduce, spark, sparkstreaming,
storm , standalone), and that is selected by command-line argument.

One option is to make Pirk pluggable, so that a Pirk installation could
use one or more of these in an extensible fashion by adding JAR files.
That would still require selecting one by command-line argument.

A second option is to simply pass in the required backend JAR to select
the particular implementation you choose, as a specific Pirk
installation doesn't need to use multiple backends simultaneously.

...and you are leaning towards the second option.  Do I have that correct?

Regards,
Tim


Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Ellison Anne Williams <ea...@apache.org>.
It seems that it's the same idea as the ResponderLauncher with the service
component added to maintain something akin to the 'platform'. I would
prefer that we just did away with the platform notion altogether and make
the ResponderDriver 'dumb'. We get around needing a platform-aware service
by requiring the ResponderLauncher implementation to be passed as a CLI to
the ResponderDriver.

Am I missing something? Is there a good reason to provide a service by
which platforms are registered? I'm open...
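
As a minimal sketch of the ResponderLauncher idea above (all class names
and the CLI option are illustrative here, not the actual Pirk code): the
driver stays "dumb" and simply instantiates, via reflection, whatever
implementation class the "launcher" option names.

```java
// Hypothetical sketch of the proposed ResponderLauncher SPI.
public class LauncherSketch {

  /** The interface a backend responder would implement. */
  public interface ResponderLauncher {
    void run() throws Exception;
  }

  /** Example backend: a developer just drops a class like this on the classpath. */
  public static class StandaloneLauncher implements ResponderLauncher {
    static boolean ran = false;

    @Override
    public void run() {
      ran = true; // a real launcher would start the standalone responder here
    }
  }

  /** All a "dumb" ResponderDriver needs to do with the launcher option value. */
  public static void launch(String launcherClassName) throws Exception {
    ResponderLauncher launcher = (ResponderLauncher) Class
        .forName(launcherClassName).getDeclaredConstructor().newInstance();
    launcher.run();
  }

  public static void main(String[] args) throws Exception {
    // The class name would come from the "launcher" CLI option:
    launch("LauncherSketch$StandaloneLauncher");
    System.out.println(StandaloneLauncher.ran); // prints "true"
  }
}
```

No platform registry is needed: adding a backend means adding a jar and
passing its launcher class name on the command line.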

On Mon, Sep 19, 2016 at 8:28 AM, Tim Ellison <t....@gmail.com> wrote:

> How about an approach like this?
>    https://github.com/tellison/incubator-pirk/tree/pirk-63
>
> The "on-ramp" is the driver [1], which calls upon the service to find a
> plug-in [2] that claims to implement the required platform responder,
> e.g. [3].
>
> The list of plug-ins is given in the provider's JAR file, so the ones we
> provide in Pirk are listed together [4], but if you split these into
> modules, or somebody brings their own JAR alongside, these would be
> listed in each JAR's services/ directory.
>
> [1]
> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
> [2]
> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
> [3]
> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> src/main/java/org/apache/pirk/responder/wideskies/storm/
> StormResponder.java
> [4]
> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> src/main/services/org.apache.responder.spi.Responder
>
> I'm not even going to dignify this with a WIP PR, it is far from ready,
> so proceed with caution.  There is hopefully enough there to show the
> approach, and if it is worth continuing I'm happy to do so.
>
> Regards,
> Tim
>
>

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Tim Ellison <t....@gmail.com>.
On 19/09/16 13:39, Suneel Marthi wrote:
> The way this PR stands now is very similar to IBM SystemML, which is a
> hack job hurriedly put together and something I have often pointed out
> to others as a clear example of how not to design software.  See
> this gist of an example code snippet from IBM SystemML -
> https://gist.github.com/smarthi/eb848e46621b7444924f

Not sure if you are looking at PR93, or the URL I sent you.

I agree that a large, explicit enumeration via a switch/if statement is
not conducive to extensibility, and that is what PIRK-63 is trying to
address.

> First things for the project:
> 
> 1. Move away from using Java properties (a very 2002 way of doing
> things) to TypeSafe-style configurations, which allow for structured
> properties.

From a quick look, that covers a different level, namely how the
configurations are represented.  First we need to look at the responder
architecture to allow for different responder types to be plugged in to
the Pirk framework.

Each plug-in responder type can figure out how to represent its configuration.

> 2. From a Responder design, there would be a Responder-impl-class property
> which would be read from TypeSafe config and the appropriate driver class
> invoked.

I've not used TypeSafe-style configurations before.  I think they overlap with
the SystemConfiguration a bit.  It would be interesting to see what changes.

> As an example of the above two points, please look at the Oryx 2.0 project
> for reference
> 
> https://github.com/oryxproject/oryx

I'd rather look at a proposed change to Pirk ;-)

Regards,
Tim

> On Mon, Sep 19, 2016 at 2:28 PM, Tim Ellison <t....@gmail.com> wrote:
> 
>> How about an approach like this?
>>    https://github.com/tellison/incubator-pirk/tree/pirk-63
>>
>> The "on-ramp" is the driver [1], which calls upon the service to find a
>> plug-in [2] that claims to implement the required platform responder,
>> e.g. [3].
>>
>> The list of plug-ins is given in the provider's JAR file, so the ones we
>> provide in Pirk are listed together [4], but if you split these into
>> modules, or somebody brings their own JAR alongside, these would be
>> listed in each JAR's services/ directory.
>>
>> [1]
>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>> src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
>> [2]
>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
>> [3]
>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>> src/main/java/org/apache/pirk/responder/wideskies/storm/
>> StormResponder.java
>> [4]
>> https://github.com/tellison/incubator-pirk/blob/pirk-63/
>> src/main/services/org.apache.responder.spi.Responder
>>
>> I'm not even going to dignify this with a WIP PR, it is far from ready,
>> so proceed with caution.  There is hopefully enough there to show the
>> approach, and if it is worth continuing I'm happy to do so.
>>
>> Regards,
>> Tim
>>
>>
> 

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Suneel Marthi <su...@gmail.com>.
The way this PR stands now is very similar to IBM SystemML, which is a
hack job hurriedly put together and something I have often pointed out
to others as a clear example of how not to design software.  See
this gist of an example code snippet from IBM SystemML -
https://gist.github.com/smarthi/eb848e46621b7444924f

First things for the project:

1. Move away from using Java properties (a very 2002 way of doing
things) to TypeSafe-style configurations, which allow for structured
properties.

2. From a Responder design, there would be a Responder-impl-class property
which would be read from TypeSafe config and the appropriate driver class
invoked.
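
For illustration only, a structured configuration in that style might look
like the following HOCON fragment, as read by the Typesafe Config library's
`ConfigFactory.load()`. The key names here are hypothetical, not an agreed
Pirk schema:

```hocon
# Hypothetical structured responder configuration (HOCON)
pirk {
  responder {
    # fully qualified class of the responder implementation to invoke
    impl-class = "org.apache.pirk.responder.wideskies.standalone.StandaloneResponder"
    query-input = "/path/to/query"
  }
}
```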

As an example of the above two points, please look at the Oryx 2.0 project
for reference

https://github.com/oryxproject/oryx

On Mon, Sep 19, 2016 at 2:28 PM, Tim Ellison <t....@gmail.com> wrote:

> How about an approach like this?
>    https://github.com/tellison/incubator-pirk/tree/pirk-63
>
> The "on-ramp" is the driver [1], which calls upon the service to find a
> plug-in [2] that claims to implement the required platform responder,
> e.g. [3].
>
> The list of plug-ins is given in the provider's JAR file, so the ones we
> provide in Pirk are listed together [4], but if you split these into
> modules, or somebody brings their own JAR alongside, these would be
> listed in each JAR's services/ directory.
>
> [1]
> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
> [2]
> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
> [3]
> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> src/main/java/org/apache/pirk/responder/wideskies/storm/
> StormResponder.java
> [4]
> https://github.com/tellison/incubator-pirk/blob/pirk-63/
> src/main/services/org.apache.responder.spi.Responder
>
> I'm not even going to dignify this with a WIP PR, it is far from ready,
> so proceed with caution.  There is hopefully enough there to show the
> approach, and if it is worth continuing I'm happy to do so.
>
> Regards,
> Tim
>
>

Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Tim Ellison <t....@gmail.com>.
How about an approach like this?
   https://github.com/tellison/incubator-pirk/tree/pirk-63

The "on-ramp" is the driver [1], which calls upon the service to find a
plug-in [2] that claims to implement the required platform responder,
e.g. [3].

The list of plug-ins is given in the provider's JAR file, so the ones we
provide in Pirk are listed together [4], but if you split these into
modules, or somebody brings their own JAR alongside, these would be
listed in each JAR's services/ directory.

[1]
https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java
[2]
https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/spi/ResponderPlugin.java
[3]
https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/java/org/apache/pirk/responder/wideskies/storm/StormResponder.java
[4]
https://github.com/tellison/incubator-pirk/blob/pirk-63/src/main/services/org.apache.responder.spi.Responder

I'm not even going to dignify this with a WIP PR, it is far from ready,
so proceed with caution.  There is hopefully enough there to show the
approach, and if it is worth continuing I'm happy to do so.

Regards,
Tim


[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by DarinJ <gi...@git.apache.org>.
Github user DarinJ commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r79377189
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
    @@ -49,83 +41,111 @@
     public class ResponderDriver
     {
       private static final Logger logger = LoggerFactory.getLogger(ResponderDriver.class);
    +  // ClassNames to instantiate Platforms using the platform CLI
    +  private final static String MAPREDUCE_LAUNCHER = "org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponderLauncher";
    +  private final static String SPARK_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher";
    +  private final static String SPARKSTREAMING_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.streaming.SparkStreamingResponderLauncher";
    +  private final static String STANDALONE_LAUNCHER = "org.apache.pirk.responder.wideskies.standalone.StandaloneResponderLauncher";
    +  private final static String STORM_LAUNCHER = "org.apache.pirk.responder.wideskies.storm.StormResponderLauncher";
     
       private enum Platform
       {
         MAPREDUCE, SPARK, SPARKSTREAMING, STORM, STANDALONE, NONE
       }
     
    -  public static void main(String[] args) throws Exception
    +  private static void launch(String launcherClassName)
    +  {
    +    logger.info("Launching Responder with {}", launcherClassName);
    +    try
    +    {
    +      Class<?> clazz = Class.forName(launcherClassName);
    +      if (ResponderLauncher.class.isAssignableFrom(clazz))
    +      {
    +        Object launcherInstance = clazz.newInstance();
    +        Method m = launcherInstance.getClass().getDeclaredMethod("run");
    +        m.invoke(launcherInstance);
    +      }
    +      else
    +      {
    +        logger.error("Class {} does not implement ResponderLauncher", launcherClassName);
    +      }
    +    }
    +    catch (ClassNotFoundException e)
    +    {
    +      logger.error("Class {} not found, check launcher property: {}", launcherClassName, e);
    +    }
    +    catch (NoSuchMethodException e)
    +    {
    +      logger.error("In {} run method not found: {}", launcherClassName, e);
    +    }
    +    catch (InvocationTargetException e)
    +    {
    +      logger.error("In {} run method could not be invoked: {}", launcherClassName, e);
    +    }
    +    catch (InstantiationException e)
    +    {
    +      logger.error("Instantiation exception within {}: {}", launcherClassName, e);
    +    }
    +    catch (IllegalAccessException e)
    +    {
    +      logger.error("Illegal access instantiating {}: {}", launcherClassName, e);
    +    }
    +  }
    +
    +  public static void main(String[] args)
       {
         ResponderCLI responderCLI = new ResponderCLI(args);
     
         // For handling System.exit calls from Spark Streaming
         System.setSecurityManager(new SystemExitManager());
     
    -    Platform platform = Platform.NONE;
    -    String platformString = SystemConfiguration.getProperty(ResponderProps.PLATFORM);
    -    try
    -    {
    -      platform = Platform.valueOf(platformString.toUpperCase());
    -    } catch (IllegalArgumentException e)
    +    String launcherClassName = SystemConfiguration.getProperty(ResponderProps.LAUNCHER);
    +    if (launcherClassName != null)
         {
    -      logger.error("platform " + platformString + " not found.");
    +      launch(launcherClassName);
         }
    -
    -    logger.info("platform = " + platform);
    -    switch (platform)
    +    else
         {
    -      case MAPREDUCE:
    -        logger.info("Launching MapReduce ResponderTool:");
    -
    -        ComputeResponseTool pirWLTool = new ComputeResponseTool();
    -        ToolRunner.run(pirWLTool, new String[] {});
    -        break;
    -
    -      case SPARK:
    -        logger.info("Launching Spark ComputeResponse:");
    -
    -        ComputeResponse computeResponse = new ComputeResponse(FileSystem.get(new Configuration()));
    -        computeResponse.performQuery();
    -        break;
    -
    -      case SPARKSTREAMING:
    -        logger.info("Launching Spark ComputeStreamingResponse:");
    -
    -        ComputeStreamingResponse computeSR = new ComputeStreamingResponse(FileSystem.get(new Configuration()));
    -        try
    -        {
    -          computeSR.performQuery();
    -        } catch (SystemExitException e)
    -        {
    -          // If System.exit(0) is not caught from Spark Streaming,
    -          // the application will complete with a 'failed' status
    -          logger.info("Exited with System.exit(0) from Spark Streaming");
    -        }
    -
    -        // Teardown the context
    -        computeSR.teardown();
    -        break;
    -
    -      case STORM:
    -        logger.info("Launching Storm PirkTopology:");
    -        PirkTopology.runPirkTopology();
    -        break;
    -
    -      case STANDALONE:
    -        logger.info("Launching Standalone Responder:");
    -
    -        String queryInput = SystemConfiguration.getProperty("pir.queryInput");
    -        Query query = new LocalFileSystemStore().recall(queryInput, Query.class);
    -
    -        Responder pirResponder = new Responder(query);
    -        pirResponder.computeStandaloneResponse();
    -        break;
    +      logger.warn("platform is being deprecated in favor of launcher");
    --- End diff --
    
    You figured it out :).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r80055483
  
    --- Diff: src/main/resources/META-INF/services/org.apache.pirk.responder.wideskies.spi.ResponderPlugin ---
    @@ -0,0 +1,5 @@
    +org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponder
    --- End diff --
    
    > not only does the user have to specify the class/platform
    
    Yes.  Do you envisage some way in which the user could **not** have to specify the platform they want Pirk to use?
    
    > the code has to be aware, via a static list, of the possible plugin
    > implementations (as listed in org.apache.pirk.responder.wideskies.spi.ResponderPlugin)
    
    The static list of plugins is in the ```META-INF/services``` file that lists all the known implementers of that interface _in this JAR_.  Today, because we have one large JAR, all the responders are listed, but (i) when we split into submodules each impl will move to its corresponding JAR artefact, and (ii) anyone can now drop a JAR on the classpath with an impl declared in its services list, and Pirk will find it by getPlatformName.
    
    How do you propose pirk-core finds the responder implementers at runtime?
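
The lookup described above can be sketched with the JDK's
`java.util.ServiceLoader`. The interface mirrors the one in the PR; the
sketch is illustrative, not the actual Pirk code. With no provider file
on the classpath, `ServiceLoader` simply finds nothing.

```java
import java.util.ServiceLoader;

// Illustrative sketch of the META-INF/services lookup: each provider JAR
// lists its ResponderPlugin implementations, and the driver selects one
// by the platform name it claims.
public class PluginFinderSketch
{
  public interface ResponderPlugin
  {
    String getPlatformName();

    void run() throws Exception;
  }

  /** Return the plugin claiming the given platform name, or null. */
  public static ResponderPlugin findPlugin(String platformName)
  {
    for (ResponderPlugin plugin : ServiceLoader.load(ResponderPlugin.class))
    {
      if (plugin.getPlatformName().equalsIgnoreCase(platformName))
      {
        return plugin;
      }
    }
    return null; // no registered plugin claims this platform
  }
}
```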



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r79352002
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
    @@ -49,83 +41,111 @@
     public class ResponderDriver
     {
       private static final Logger logger = LoggerFactory.getLogger(ResponderDriver.class);
    +  // ClassNames to instantiate Platforms using the platform CLI
    +  private final static String MAPREDUCE_LAUNCHER = "org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponderLauncher";
    +  private final static String SPARK_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher";
    +  private final static String SPARKSTREAMING_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.streaming.SparkStreamingResponderLauncher";
    +  private final static String STANDALONE_LAUNCHER = "org.apache.pirk.responder.wideskies.standalone.StandaloneResponderLauncher";
    +  private final static String STORM_LAUNCHER = "org.apache.pirk.responder.wideskies.storm.StormResponderLauncher";
     
    --- End diff --
    
    I'm confused by this, I though the goal of PIRK-63 was to avoid having to change the ResponderDriver each time a new responder type is introduced?



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by DarinJ <gi...@git.apache.org>.
Github user DarinJ commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r80026414
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/spi/ResponderPlugin.java ---
    @@ -0,0 +1,40 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +package org.apache.pirk.responder.wideskies.spi;
    +
    +/**
    + * Interface which launches a responder
    + * <p>
    + * Implement this interface to start the execution of a framework responder; the run method will be called via reflection by the ResponderDriver.
    + * </p>
    + */
    +public interface ResponderPlugin
    +{
    +  /**
    +   * Returns the plugin name for your framework
    +   * This will be the platform argument
    +   * @return
    +   */
    +  public String getPlatformName();
    +  /**
    +   * This method launches your framework responder.
    +   */
    +  public void run() throws Exception;
    --- End diff --
    
    Please do, I almost did.



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by ellisonanne <gi...@git.apache.org>.
Github user ellisonanne commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r80003616
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/spi/ResponderPlugin.java ---
    @@ -0,0 +1,40 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +
    +package org.apache.pirk.responder.wideskies.spi;
    +
    +/**
    + * Interface which launches a responder
    + * <p>
    + * Implement this interface to start the execution of a framework responder; the run method will be called via reflection by the ResponderDriver.
    + * </p>
    + */
    +public interface ResponderPlugin
    +{
    +  /**
    +   * Returns the plugin name for your framework
    +   * This will be the platform argument
    +   * @return
    +   */
    +  public String getPlatformName();
    +  /**
    +   * This method launches your framework responder.
    +   */
    +  public void run() throws Exception;
    --- End diff --
    
    Lol :)



[GitHub] incubator-pirk issue #93: WIP-Pirk 63-DO NOT MERGE

Posted by ellisonanne <gi...@git.apache.org>.
Github user ellisonanne commented on the issue:

    https://github.com/apache/incubator-pirk/pull/93
  
    A few other comments for discussion:
    
    First, I am not opposed to having separate ResponderDrivers for each responder, but let's think it through and see if we really need to go down that path. 
    
    I think that the main concern with having a single ResponderDriver vs. delegating the ResponderDrivers to each responder is the bloating of the main CLI and ResponderProps. Other than keeping the CLI/Props under control, I can't see a particularly good, material (i.e. not stylistic) reason to delegate now that we are rolling in a ResponderLauncher.  
    
    The ResponderProps can go ahead and be delegated down into the specific responders independently of whether or not the ResponderDrivers get delegated. The ResponderLauncher for each responder can be responsible for implementing the 'validateResponderProperties' method that is currently in the central ResponderProps - since the CLI loads the properties from the properties files into SystemConfiguration, it will not require passing anything extra to the launchers.
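
One way to read that delegation, sketched with hypothetical names: each
launcher validates its own properties, so the central ResponderProps no
longer needs per-platform knowledge. A plain `Properties` object stands
in here for Pirk's SystemConfiguration, which the CLI would already have
populated.

```java
import java.util.Properties;

// Sketch of delegating 'validateResponderProperties' down to each
// responder's launcher. All names are illustrative; a Properties object
// substitutes for Pirk's SystemConfiguration.
public class PropsDelegationSketch
{
  public interface ResponderLauncher
  {
    /** Each responder checks only the properties it cares about. */
    boolean validateResponderProperties(Properties config);

    void run() throws Exception;
  }

  public static class StandaloneLauncher implements ResponderLauncher
  {
    @Override
    public boolean validateResponderProperties(Properties config)
    {
      // standalone mode only needs the query input location
      return config.getProperty("pir.queryInput") != null;
    }

    @Override
    public void run()
    {
      // real code would recall the query and compute the response
    }
  }
}
```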
    
    One design alternative to breaking out into specific ResponderDrivers (which I am not opposed to BTW) would be to only allow the core properties in the main CLI and force everything else to be specified via properties files. This is somewhat limiting in some (contrived) cases that I can think of, but it would allow for a main CLI and prevent the bloat since responder-specific CLI options would not need to be added to the main CLI. 
    
    Thoughts?



Re: [GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by Tim Ellison <t....@gmail.com>.
Darin,

Unless I'm reading this wrong, the patch still has many references from
the ResponderDriver to the set of currently supported responders.  This
code will have to change when somebody wants to add a new responder type.

I thought the plan was to keep the responder driver agnostic of the
responders available?  So, for example, having the driver maintain a
list of responders by name, and letting people specify the name on the
command line.

Each responder would then be responsible for implementing a standardised
interface, and registering themselves with the driver by name.

In that model the responders would each know about (a) the driver, and
how to register themselves by name, and (b) implement a standard
life-cycle for building a response.

The driver would be responsible for (a) collecting and maintaining the
registrations of any responder being loaded, and (b) invoking the
correct responder based on user selection.
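
The registration model outlined above could look roughly like this (all
class and method names are hypothetical): responders register themselves
by name, and the driver only ever consults the registry.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the register-by-name model: responders know the driver and
// register themselves; the driver knows no concrete responder types.
public class ResponderRegistrySketch
{
  public interface Responder
  {
    void run() throws Exception;
  }

  private static final Map<String, Responder> REGISTRY = new HashMap<>();

  /** Called by each responder (or its service wrapper) at load time. */
  public static void register(String name, Responder responder)
  {
    REGISTRY.put(name.toLowerCase(), responder);
  }

  /** Invoke the responder the user selected on the command line. */
  public static void launch(String name) throws Exception
  {
    Responder responder = REGISTRY.get(name.toLowerCase());
    if (responder == null)
    {
      throw new IllegalArgumentException("No responder registered for " + name);
    }
    responder.run();
  }
}
```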

Make sense?

I can hack something together to show what I mean.

Regards,
Tim



On 19/09/16 07:05, DarinJ wrote:
> GitHub user DarinJ opened a pull request:
> 
>     https://github.com/apache/incubator-pirk/pull/93
> 
>     WIP-Pirk 63-DO NOT MERGE
> 
>     This is a WIP for [PIRK-63](https://issues.apache.org/jira/browse/PIRK-63) to open the door to other responders without having to modify the actual code of Pirk.  It's submitted for feedback only, please DO NOT MERGE.  I've only tested standalone mode.
>         
>     It deprecates the "platform" CLI option in favor of the "launcher" option, which is the name of a class implementing the `ResponderLauncher` interface whose run method will be invoked via reflection.  This allows a developer of a different responder to merely place a jar on the classpath and specify the appropriate `ResponderLauncher` via the CLI.
>         
>     The "platform" CLI option is still made available.  However, I removed the explicit dependencies in favor of using reflection.  This was done in anticipation of refactoring the build into submodules, though this does admittedly make the code more fragile.
>     
>     ResponderDriver had no unit tests, and unfortunately I saw no good way to create good ones for this particular change, especially as it required multiple frameworks to run.
>     
>     I should say that another possible route here is to have each framework responder implement their own ResponderDriver.  We could provide some utilities to check the minimum Pirk required options are set, but leave the rest to the implementation of the responder.  It would clean up the ResponderCLI and ResponderProps which are rather bloated and might continue to grow if left unchecked.
> 
> You can merge this pull request into a Git repository by running:
> 
>     $ git pull https://github.com/DarinJ/incubator-pirk Pirk-63
> 
> Alternatively you can review and apply these changes as the patch at:
> 
>     https://github.com/apache/incubator-pirk/pull/93.patch
> 
> To close this pull request, make a commit to your master/trunk branch
> with (at least) the following in the commit message:
> 
>     This closes #93
>     
> ----
> commit dda458bb2ae77fd9e3dc686d17dd8b49095b3395
> Author: Darin Johnson <da...@apache.org>
> Date:   2016-09-13T03:19:12Z
> 
>     This is a WIP for [PIRK-63](https://issues.apache.org/jira/browse/PIRK-63) to open the door to other responders without having to modify the actual code of Pirk.  It's submitted for feedback only, please DO NOT MERGE.
>     
>     It deprecates the "platform" CLI option in favor of the "launcher" option, which is the name of a class implementing the `ResponderLauncher` interface whose run method will be invoked via reflection.  This allows a developer of a different responder to merely place a jar on the classpath and specify the appropriate `ResponderLauncher` via the CLI.
>     
>     The "platform" CLI option is still made available.  However, I removed the explicit dependencies in favor of using reflection.  This was done in anticipation of refactoring the build into submodules, though this does admittedly make the code more fragile.
> 
> ----
> 
> 
> 

[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-pirk/pull/93



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r79989030
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/standalone/StandaloneResponder.java ---
    @@ -0,0 +1,58 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.pirk.responder.wideskies.standalone;
    +
    +import org.apache.pirk.query.wideskies.Query;
    +import org.apache.pirk.responder.wideskies.spi.ResponderPlugin;
    +import org.apache.pirk.serialization.LocalFileSystemStore;
    +import org.apache.pirk.utils.SystemConfiguration;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.io.IOException;
    +
    +/**
    + * Class to launch stand alone responder
    + */
    +public class StandaloneResponder implements ResponderPlugin
    +{
    +  private static final Logger logger = LoggerFactory.getLogger(StandaloneResponder.class);
    +
    +  @Override
    +  public String getPlatformName() {
    +    return "standalone";
    +  }
    +
    +  @Override
    +  public void run()
    +  {
    +    logger.info("Launching Standalone Responder:");
    +    String queryInput = SystemConfiguration.getProperty("pir.queryInput");
    +    try
    +    {
    +      Query query = new LocalFileSystemStore().recall(queryInput, Query.class);
    +      Responder pirResponder = new Responder(query);
    +      pirResponder.computeStandaloneResponse();
    +    }
    +    catch (IOException e)
    +    {
    +      logger.error("Error reading {}, {}", queryInput, e.getMessage());
    --- End diff --
    
    Re-throw exception?



[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by DarinJ <gi...@git.apache.org>.
Github user DarinJ commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r80025462
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/spark/SparkResponder.java ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.pirk.responder.wideskies.spark;
    +
    +import java.io.IOException;
    +
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.pirk.responder.wideskies.spi.ResponderPlugin;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/**
    + * Class to launch spark responder
    + */
    +public class SparkResponder implements ResponderPlugin
    +{
    +  private static final Logger logger = LoggerFactory.getLogger(SparkResponder.class);
    +
    +  @Override
    +  public String getPlatformName() {
    +    return "spark";
    +  }
    +
    +  @Override
    +  public void run() throws Exception
    +  {
    +    logger.info("Launching Spark ComputeResponse:");
    +    try
    +    {
    +      ComputeResponse computeResponse = new ComputeResponse(FileSystem.get(new Configuration()));
    +      computeResponse.performQuery();
    +    }
    +    catch (IOException e)
    +    {
    +      logger.error("Unable to open filesystem: {}", e);
    --- End diff --
    
    Lazy, will fix.


---

[GitHub] incubator-pirk issue #93: WIP-Pirk 63-DO NOT MERGE

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on the issue:

    https://github.com/apache/incubator-pirk/pull/93
  
    I've merged the PR, Darin.  That's not to say that the debate is over, but this does move us forward and we can use new PRs to address ongoing improvements.


---

[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by ellisonanne <gi...@git.apache.org>.
Github user ellisonanne commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r80037087
  
    --- Diff: src/main/resources/META-INF/services/org.apache.pirk.responder.wideskies.spi.ResponderPlugin ---
    @@ -0,0 +1,5 @@
    +org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponder
    --- End diff --
    
    My concern is that instead of keeping this code 'dumb' and forcing the user to give the specific ResponderLauncher implementation via the CLI (or props file), the PR seems to have moved to a service-plugin model where not only does the user have to specify the class/platform, the code has to be aware, via a static list, of the possible plugin implementations (as listed in org.apache.pirk.responder.wideskies.spi.ResponderPlugin). While this is far more flexible than the way that the platform selection was previously implemented, it does not remove the need for the codebase to understand (in one location) all of its possible platforms. I am not opposed (as you pointed out, it is a well known model), but I thought that one of the primary goals was to remove the need for a static, central awareness of all possible platforms.
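The service-plugin model being debated is Java's standard `ServiceLoader` mechanism: providers are listed in `META-INF/services/<interface-name>` files on the classpath and discovered at runtime. A minimal illustrative sketch follows; the local `ResponderPlugin` interface and `PluginFinder` class are stand-ins for the PR's `org.apache.pirk.responder.wideskies.spi.ResponderPlugin`, not the merged code.

```java
import java.util.ServiceLoader;

// Local stand-in for the SPI interface, to keep the sketch self-contained
interface ResponderPlugin
{
  String getPlatformName();
  void run() throws Exception;
}

public class PluginFinder
{
  // Scans META-INF/services entries on the classpath for ResponderPlugin
  // providers and returns the one whose platform name matches, or null.
  // Note: only plugins registered via a services file are found here,
  // which is the "central awareness" trade-off discussed above.
  public static ResponderPlugin findPlugin(String platformName)
  {
    ServiceLoader<ResponderPlugin> loader = ServiceLoader.load(ResponderPlugin.class);
    for (ResponderPlugin plugin : loader)
    {
      if (plugin.getPlatformName().equalsIgnoreCase(platformName))
      {
        return plugin;
      }
    }
    return null;
  }
}
```

With no providers registered on the classpath, `findPlugin` simply returns null, which is why the driver still needs a fallback (or an error path) when the named platform is absent.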



---

[GitHub] incubator-pirk issue #93: WIP-Pirk 63-DO NOT MERGE

Posted by DarinJ <gi...@git.apache.org>.
Github user DarinJ commented on the issue:

    https://github.com/apache/incubator-pirk/pull/93
  
    Looks like everyone is OK with this minus some sloppy exceptions.  If that's the case I'll fix the issues mentioned.  Would you prefer I just squash the commits and push back to here or close this and resubmit?


---

[GitHub] incubator-pirk issue #93: WIP-Pirk 63-DO NOT MERGE

Posted by ellisonanne <gi...@git.apache.org>.
Github user ellisonanne commented on the issue:

    https://github.com/apache/incubator-pirk/pull/93
  
    +1 - looks good so far. 
    
    One item for consideration: I am in favor of *not* providing backwards compatibility with the 'platform' option at this point, i.e. removing it altogether in favor of just the launcher. Since we just completed our first release, I think that we can go ahead and change the API - this would only require an argument change in current command lines and a deployment of the new jar - completely doable. 


---

[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r79988842
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/spark/SparkResponder.java ---
    @@ -0,0 +1,55 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *   http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing,
    + * software distributed under the License is distributed on an
    + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    + * KIND, either express or implied.  See the License for the
    + * specific language governing permissions and limitations
    + * under the License.
    + */
    +package org.apache.pirk.responder.wideskies.spark;
    +
    +import java.io.IOException;
    +
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.fs.FileSystem;
    +import org.apache.pirk.responder.wideskies.spi.ResponderPlugin;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/**
    + * Class to launch spark responder
    + */
    +public class SparkResponder implements ResponderPlugin
    +{
    +  private static final Logger logger = LoggerFactory.getLogger(SparkResponder.class);
    +
    +  @Override
    +  public String getPlatformName() {
    +    return "spark";
    +  }
    +
    +  @Override
    +  public void run() throws Exception
    +  {
    +    logger.info("Launching Spark ComputeResponse:");
    +    try
    +    {
    +      ComputeResponse computeResponse = new ComputeResponse(FileSystem.get(new Configuration()));
    +      computeResponse.performQuery();
    +    }
    +    catch (IOException e)
    +    {
    +      logger.error("Unable to open filesystem: {}", e);
    --- End diff --
    
    Re-throw exception?


---

[GitHub] incubator-pirk pull request #93: WIP-Pirk 63-DO NOT MERGE

Posted by tellison <gi...@git.apache.org>.
Github user tellison commented on a diff in the pull request:

    https://github.com/apache/incubator-pirk/pull/93#discussion_r79351660
  
    --- Diff: src/main/java/org/apache/pirk/responder/wideskies/ResponderDriver.java ---
    @@ -49,83 +41,111 @@
     public class ResponderDriver
     {
       private static final Logger logger = LoggerFactory.getLogger(ResponderDriver.class);
    +  // ClassNames to instantiate Platforms using the platform CLI
    +  private final static String MAPREDUCE_LAUNCHER = "org.apache.pirk.responder.wideskies.mapreduce.MapReduceResponderLauncher";
    +  private final static String SPARK_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.SparkResponderLauncher";
    +  private final static String SPARKSTREAMING_LAUNCHER = "org.apache.pirk.responder.wideskies.spark.streaming.SparkStreamingResponderLauncher";
    +  private final static String STANDALONE_LAUNCHER = "org.apache.pirk.responder.wideskies.standalone.StandaloneResponderLauncher";
    +  private final static String STORM_LAUNCHER = "org.apache.pirk.responder.wideskies.storm.StormResponderLauncher";
     
       private enum Platform
       {
         MAPREDUCE, SPARK, SPARKSTREAMING, STORM, STANDALONE, NONE
       }
     
    -  public static void main(String[] args) throws Exception
    +  private static void launch(String launcherClassName)
    +  {
    +    logger.info("Launching Responder with {}", launcherClassName);
    +    try
    +    {
    +      Class clazz = Class.forName(launcherClassName);
    +      if (ResponderLauncher.class.isAssignableFrom(clazz))
    +      {
    +        Object launcherInstance = clazz.newInstance();
    +        Method m = launcherInstance.getClass().getDeclaredMethod("run");
    +        m.invoke(launcherInstance);
    +      }
    +      else
    +      {
    +        logger.error("Class {} does not implement ResponderLauncher", launcherClassName);
    +      }
    +    }
    +    catch (ClassNotFoundException e)
    +    {
    +      logger.error("Class {} not found, check launcher property: {}", launcherClassName);
    +    }
    +    catch (NoSuchMethodException e)
    +    {
    +      logger.error("In {} run method not found: {}", launcherClassName);
    +    }
    +    catch (InvocationTargetException e)
    +    {
    +      logger.error("In {} run method could not be invoked: {}: {}", launcherClassName, e);
    +    }
    +    catch (InstantiationException e)
    +    {
    +      logger.error("Instantiation exception within {}: {}", launcherClassName, e);
    +    }
    +    catch (IllegalAccessException e)
    +    {
    +      logger.error("IllegalAccess Exception {}", e);
    +    }
    +  }
    +
    +  public static void main(String[] args)
       {
         ResponderCLI responderCLI = new ResponderCLI(args);
     
         // For handling System.exit calls from Spark Streaming
         System.setSecurityManager(new SystemExitManager());
     
    -    Platform platform = Platform.NONE;
    -    String platformString = SystemConfiguration.getProperty(ResponderProps.PLATFORM);
    -    try
    -    {
    -      platform = Platform.valueOf(platformString.toUpperCase());
    -    } catch (IllegalArgumentException e)
    +    String launcherClassName = SystemConfiguration.getProperty(ResponderProps.LAUNCHER);
    +    if (launcherClassName != null)
         {
    -      logger.error("platform " + platformString + " not found.");
    +      launch(launcherClassName);
         }
    -
    -    logger.info("platform = " + platform);
    -    switch (platform)
    +    else
         {
    -      case MAPREDUCE:
    -        logger.info("Launching MapReduce ResponderTool:");
    -
    -        ComputeResponseTool pirWLTool = new ComputeResponseTool();
    -        ToolRunner.run(pirWLTool, new String[] {});
    -        break;
    -
    -      case SPARK:
    -        logger.info("Launching Spark ComputeResponse:");
    -
    -        ComputeResponse computeResponse = new ComputeResponse(FileSystem.get(new Configuration()));
    -        computeResponse.performQuery();
    -        break;
    -
    -      case SPARKSTREAMING:
    -        logger.info("Launching Spark ComputeStreamingResponse:");
    -
    -        ComputeStreamingResponse computeSR = new ComputeStreamingResponse(FileSystem.get(new Configuration()));
    -        try
    -        {
    -          computeSR.performQuery();
    -        } catch (SystemExitException e)
    -        {
    -          // If System.exit(0) is not caught from Spark Streaming,
    -          // the application will complete with a 'failed' status
    -          logger.info("Exited with System.exit(0) from Spark Streaming");
    -        }
    -
    -        // Teardown the context
    -        computeSR.teardown();
    -        break;
    -
    -      case STORM:
    -        logger.info("Launching Storm PirkTopology:");
    -        PirkTopology.runPirkTopology();
    -        break;
    -
    -      case STANDALONE:
    -        logger.info("Launching Standalone Responder:");
    -
    -        String queryInput = SystemConfiguration.getProperty("pir.queryInput");
    -        Query query = new LocalFileSystemStore().recall(queryInput, Query.class);
    -
    -        Responder pirResponder = new Responder(query);
    -        pirResponder.computeStandaloneResponse();
    -        break;
    +      logger.warn("platform is being deprecaited in flavor of launcher");
    --- End diff --
    
    :-)
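For comparison, a tightened sketch of the reflective `launch(...)` method quoted above: `Class.asSubclass` yields a typed instance, so `run()` can be called directly instead of being looked up via `getDeclaredMethod`/`invoke`, and the mismatched logger placeholders disappear because failures propagate as exceptions. `LaunchSketch` and its nested types are hypothetical stand-ins, not the merged Pirk code.

```java
public class LaunchSketch
{
  // Local stand-in for org.apache.pirk.responder.wideskies.spi.ResponderLauncher
  public interface ResponderLauncher
  {
    void run() throws Exception;
  }

  // Example launcher a third-party jar might provide
  public static class DummyLauncher implements ResponderLauncher
  {
    @Override
    public void run()
    {
      System.out.println("dummy launcher ran");
    }
  }

  // Loads the named class, checks it implements ResponderLauncher
  // (asSubclass throws ClassCastException otherwise), instantiates it,
  // and invokes run() directly on the typed reference.
  static void launch(String launcherClassName) throws Exception
  {
    Class<? extends ResponderLauncher> clazz =
        Class.forName(launcherClassName).asSubclass(ResponderLauncher.class);
    // getDeclaredConstructor().newInstance() is preferred over the
    // deprecated Class.newInstance() on newer JDKs
    ResponderLauncher launcher = clazz.getDeclaredConstructor().newInstance();
    launcher.run();
  }
}
```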


---