Posted to dev@streams.apache.org by sblackmon <sb...@apache.org> on 2016/10/06 18:55:53 UTC

Ease-of-use : minimizing TTHW (time-to-hello-world)

 
TL;DR I’ve found a way to dramatically reduce barriers to using streams as a beginner.

With the streams 0.3 release, getting started is quite a headache for a novice. We have a tutorial on the website, but it’s quite a journey: you have to check out all three repos and install each of them in order before you get a jar file you can use to collect data. Only then can you run a few pre-canned streams, and those are intermediate, not beginner, level.

In an ideal world, anyone would be able to yum or apt-get (or docker pull) individual providers or processors and run them on their own, without building from source or composing them into multi-step streams.

We'd have to increase our build and compliance complexity significantly to publish official binaries. So what can we do to cut the learning curve sharply without doing that?

Providers are really simple to run. The hard part is getting all of the right classes and configuration properties into a JVM. Inspired by how Zeppelin’s %dep interpreter reduces the friction of composing and running a Scala notebook, I wanted to find a way to get the same ability from a Linux shell.

The commands below go from just a Java installation to flat files of Twitter data in a few minutes.

I think until we have binary distributions, this is how our tutorials should tell the world to get started with streams.  

Thoughts?  

-----  

# install sbtx

curl -s https://raw.githubusercontent.com/paulp/sbt-extras/master/sbt > /usr/bin/sbtx && chmod 0755 /usr/bin/sbtx

# create a workspace

mkdir twitter-test; cd twitter-test;

# supply a config file with credentials

cat > application.conf << EOF
twitter {
  oauth {
    consumerKey = ""
    consumerSecret = ""
    accessToken = ""
    accessTokenSecret = ""
  }
  retrySleepMs = 5000
  retryMax = 250
  info = [
    18055613
  ]
}
EOF
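
Before starting sbt, it may be worth confirming that the four OAuth values in application.conf were actually filled in, since the template above leaves them empty. A minimal sketch, assuming the values are written as quoted strings the way the template does; check_twitter_conf is a hypothetical helper introduced here, not something streams provides:

```shell
# Hypothetical helper, not part of streams: succeed only when all four
# OAuth keys are present with a non-empty quoted value.
check_twitter_conf() {
  conf="$1"
  if [ ! -r "$conf" ]; then
    echo "cannot read $conf" >&2
    return 2
  fi
  status=0
  for key in consumerKey consumerSecret accessToken accessTokenSecret; do
    # Matches lines such as:   consumerKey = "abc123"
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\"[^\"]+\"" "$conf"; then
      echo "$key is missing or empty in $conf" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running check_twitter_conf application.conf returns non-zero and names each offending key on stderr, so a bad config is caught before any dependencies are downloaded.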

sbtx -210 -sbt-create

set resolvers += "Local Maven Repository" at "file://"+Path.userHome.absolutePath+"/.m2/repository"

set libraryDependencies += "org.apache.streams" % "streams-provider-twitter" % "0.4-incubating-SNAPSHOT"

set fork := true

run-main org.apache.streams.twitter.provider.TwitterUserInformationProvider application.conf users.txt

run-main org.apache.streams.twitter.provider.TwitterTimelineProvider application.conf statuses.txt

set javaOptions += "-Dtwitter.endpoint=friends"

run-main org.apache.streams.twitter.provider.TwitterFollowingProvider application.conf friends.txt

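set javaOptions += "-Dtwitter.endpoint=followers"

run-main org.apache.streams.twitter.provider.TwitterFollowingProvider application.conf followers.txt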

exit

ls -l

Steves-MacBook-Pro-3:twitter sblackmon$ ls -l
-rw-r--r--@ 1 sblackmon staff 356 Oct 6 11:54 application.conf
-rw-r--r-- 1 sblackmon staff 293780 Oct 6 13:42 followers.txt
-rw-r--r-- 1 sblackmon staff 6260 Oct 6 13:43 friends.txt
drwxr-xr-x 3 sblackmon staff 102 Oct 6 10:17 project
-rw-r--r-- 1 sblackmon staff 3339460 Oct 6 13:43 statuses.txt
drwxr-xr-x 6 sblackmon staff 204 Oct 6 10:19 target
-rw-r--r-- 1 sblackmon staff 3321 Oct 6 13:43 users.txt
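
The interactive session above can probably also be driven in one shot, because sbt treats each quoted argument on its command line as a command and runs them in order. A sketch under that assumption, reusing the same settings; run_provider and the SBT_CMD override are hypothetical names introduced here, and this has not been verified against a live snapshot build:

```shell
# Sketch: run a single provider without typing into the sbt prompt.
# Assumes sbtx (sbt-extras) is on PATH, as installed above.
run_provider() {
  main_class="$1"
  conf="$2"
  out="$3"
  # SBT_CMD is a hypothetical override; SBT_CMD=echo prints the command
  # line instead of starting sbt.
  "${SBT_CMD:-sbtx}" -210 -sbt-create \
    'set resolvers += "Local Maven Repository" at "file://"+Path.userHome.absolutePath+"/.m2/repository"' \
    'set libraryDependencies += "org.apache.streams" % "streams-provider-twitter" % "0.4-incubating-SNAPSHOT"' \
    'set fork := true' \
    "run-main $main_class $conf $out"
}
```

For example, run_provider org.apache.streams.twitter.provider.TwitterUserInformationProvider application.conf users.txt is meant to produce the same users.txt as the interactive steps.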



Re: Ease-of-use : minimizing TTHW (time-to-hello-world)

Posted by sblackmon <sb...@apache.org>.
Trevor,

Awesome, thanks for giving it a shot.  

With some recent changes we’re quite close to making data collection with streams providers turnkey for new users.

I ran the following through my deployment of Zeppelin - it should work for you too.  Please confirm :)

Cheers,
Steve

-----

%dep
z.reset()
z.addRepo("apache-snapshots").url("https://repository.apache.org/content/repositories/snapshots").snapshot()
z.load("org.apache.streams:streams-provider-twitter:0.4-incubating-SNAPSHOT")

import com.typesafe.config._
import org.apache.streams.config._
import org.apache.streams.core._
import java.util.Iterator
import org.apache.streams.twitter.pojo._
import org.apache.streams.twitter.provider._

val hocon = s"""
    twitter {
      oauth {
        consumerKey = ""
        consumerSecret = ""
        accessToken = ""
        accessTokenSecret = ""
      }
      retrySleepMs = 5000
      retryMax = 250
      info = [
        18055613
      ]
    }
"""

val typesafe = ConfigFactory.parseString(hocon)
val config = new ComponentConfigurator(classOf[TwitterUserInformationConfiguration]).detectConfiguration(typesafe, "twitter");
val provider = new TwitterTimelineProvider(config);
provider.prepare(null)
provider.startStream()
// Drain and print whatever the provider has collected for as long as it is running.
while(provider.isRunning()) {
    val resultSet = provider.readCurrent()
    resultSet.size()
    val iterator = resultSet.iterator();
    while(iterator.hasNext()) {
        val datum = iterator.next();
        println(datum.getDocument)
    }
}
On October 14, 2016 at 8:33:55 AM, Trevor Grant (trevor.d.grant@gmail.com) wrote:

I agree a minimal TTHW would be good, especially for a user who is trying to
create a hello world.

I am a big fan of Apache Zeppelin notebooks for this sort of thing: they are
easy to host and can include Markdown.

If I could get some community assistance getting myself started, I'd be
happy to write it up.

I need to know:
Minimum dependencies-
From the little work I have done so far, I know this can be a murky
subject as we migrate versions. I'd prefer to do the minimal example in
whatever version can be run based on artifacts sitting in Maven now. Happy
to update when a new version is pushed.

Scala-
Zeppelin is, for all intents and purposes, like running in the Spark/Flink
shell. I'll need some help getting things going in this sort of env.

If someone reading this is like "oh that's easy, here's your dependencies,
and then run this code", that would be very helpful; I can get to writing
right away. Otherwise I can hack it out, but again will need some support.

tg  


Trevor Grant  
Data Scientist  
https://github.com/rawkintrevo  
http://stackexchange.com/users/3002022/rawkintrevo  
http://trevorgrant.org  

*"Fortunate is he, who is able to know the causes of things." -Virgil*  


On Tue, Oct 11, 2016 at 11:00 AM, Matt Franklin <m....@gmail.com>  
wrote:  

> On Mon, Oct 10, 2016 at 11:30 AM sblackmon <sb...@apache.org> wrote:  
>  
> > Some other projects are currently looking at publishing docker containers  
> > that people can easily extend. I am totally in favor of this approach.  
> >  
> >  
> > Docker distribution would open up a lot of cool options for this project.  
> >  
> > Which projects are farthest along this road?  
> >  
>  
> https://hub.docker.com/r/apache/  
>  
>  
> >  
> > I think even publishing this as a Docker file example on the website  
> would  
> > be a good start.  
> >  
> > These PRs use a maven docker plugin during verify phase.  
> > https://github.com/apache/incubator-streams-examples/pull/14  
> > https://github.com/apache/incubator-streams/pull/288  
> >  
> > The same plugin can build, tag, and deploy images with the goals docker:build
> > and docker:push.
> >  
>  
> Per policy, the only thing that should make it to repositories like Docker  
> hub and Maven Central should be released convenience binaries.  
>  
>  
> >  
> > Once these merge I’ll take another pass through the examples  
> documentation  
> > and for each describe a few alternative processes (STREAMS-428)  
> >  
> > 1) Build from source, run stream from *nix shell with dist uber-jar.  
> > 2) Run stream with sbt interactive shell using artifacts from maven  
> central  
> > 3) Run stream with docker using artifacts from docker hub  
> >  

Re: Ease-of-use : minimizing TTHW (time-to-hello-world)

Posted by Trevor Grant <tr...@gmail.com>.
I agree a minimal TTHW would be good, especially for a user who is trying to
create a hello world.

I am a big fan of Apache Zeppelin notebooks for this sort of thing: they are
easy to host and can include Markdown.

If I could get some community assistance getting myself started, I'd be
happy to write it up.

I need to know:
Minimum dependencies-
From the little work I have done so far, I know this can be a murky
subject as we migrate versions. I'd prefer to do the minimal example in
whatever version can be run based on artifacts sitting in Maven now. Happy
to update when a new version is pushed.

Scala-
Zeppelin is, for all intents and purposes, like running in the Spark/Flink
shell. I'll need some help getting things going in this sort of env.

If someone reading this is like "oh that's easy, here's your dependencies,
and then run this code", that would be very helpful; I can get to writing
right away. Otherwise I can hack it out, but again will need some support.

tg


Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Tue, Oct 11, 2016 at 11:00 AM, Matt Franklin <m....@gmail.com>
wrote:

> On Mon, Oct 10, 2016 at 11:30 AM sblackmon <sb...@apache.org> wrote:
>
> > Some other projects are currently looking at publishing docker containers
> > that people can easily extend. I am totally in favor of this approach.
> >
> >
> > Docker distribution would open up a lot of cool options for this project.
> >
> > Which projects are farthest along this road?
> >
>
> https://hub.docker.com/r/apache/
>
>
> >
> > I think even publishing this as a Docker file example on the website
> would
> > be a good start.
> >
> > These PRs use a maven docker plugin during verify phase.
> > https://github.com/apache/incubator-streams-examples/pull/14
> > https://github.com/apache/incubator-streams/pull/288
> >
> > The same plugin can build, tag, and deploy images with the goals docker:build
> > and docker:push.
> >
>
> Per policy, the only thing that should make it to repositories like Docker
> hub and Maven Central should be released convenience binaries.
>
>
> >
> > Once these merge I’ll take another pass through the examples
> documentation
> > and for each describe a few alternative processes (STREAMS-428)
> >
> > 1) Build from source, run stream from *nix shell with dist uber-jar.
> > 2) Run stream with sbt interactive shell using artifacts from maven
> central
> > 3) Run stream with docker using artifacts from docker hub
> >

Re: Distribution / Docker next steps

Posted by Matt Franklin <m....@gmail.com>.
On Mon, Oct 17, 2016 at 11:36 AM sblackmon <sb...@apache.org> wrote:

>
> On October 11, 2016 at 11:01:18 AM, Matt Franklin (
> m.ben.franklin@gmail.com) wrote:
>
> On Mon, Oct 10, 2016 at 11:30 AM sblackmon <sb...@apache.org> wrote:
>
> > Some other projects are currently looking at publishing docker
> containers
> > that people can easily extend. I am totally in favor of this approach.
> >
> >
> > Docker distribution would open up a lot of cool options for this
> project.
> >
> > Which projects are farthest along this road?
> >
>
> https://hub.docker.com/r/apache/
>
>
> I had been thinking more along the lines of publishing a distribution for
> each provider, processor, and persister module, each containing a minimal
> uber-jar.  Going this route would probably warrant a dedicated Docker Hub
> organization for streams.  OTOH, if we get to the point of having a binary
> distribution containing all of the classes in streams-project, that could
> be published to a top-level /apache repository and do all of the same work
> (probably with a much larger Docker image).
>

Tomcat (and I think a few others) have their own organization on Docker
Hub, so it is definitely a possibility.


>
>
> >
> > I think even publishing this as a Docker file example on the website
> would
> > be a good start.
> >
> > These PRs use a maven docker plugin during verify phase.
> > https://github.com/apache/incubator-streams-examples/pull/14
> > https://github.com/apache/incubator-streams/pull/288
> >
> > The same plugin can build, tag, and deploy images with the goals docker:build
> > and docker:push.
> >
>
> Per policy, the only thing that should make it to repositories like Docker
> hub and Maven Central should be released convenience binaries.
>
>
> I think the next step is to figure out what would need to happen to build,
> certify, and publish a convenience binary and Docker image for (initially)
> just one individual provider module in an upcoming release.  The
> dependency tree for a single provider will be more tractable than for the
> whole project, and there’s a clear user benefit: a greatly simplified
> project tutorial.
>

I would submit an Infra ticket.


>
>
> >
> > Once these merge I’ll take another pass through the examples
> documentation
> > and for each describe a few alternative processes (STREAMS-428)
> >
> > 1) Build from source, run stream from *nix shell with dist uber-jar.
> > 2) Run stream with sbt interactive shell using artifacts from maven
> central
> > 3) Run stream with docker using artifacts from docker hub
> >

Distribution / Docker next steps

Posted by sblackmon <sb...@apache.org>.
On October 11, 2016 at 11:01:18 AM, Matt Franklin (m.ben.franklin@gmail.com) wrote:
On Mon, Oct 10, 2016 at 11:30 AM sblackmon <sb...@apache.org> wrote:  

> Some other projects are currently looking at publishing docker containers  
> that people can easily extend. I am totally in favor of this approach.  
>  
>  
> Docker distribution would open up a lot of cool options for this project.  
>  
> Which projects are farthest along this road?  
>  

https://hub.docker.com/r/apache/  


I had been thinking more along the lines of publishing a distribution for each provider, processor, and persister module, each containing a minimal uber-jar.  Going this route would probably warrant a dedicated Docker Hub organization for streams.  OTOH, if we get to the point of having a binary distribution containing all of the classes in streams-project, that could be published to a top-level /apache repository and do all of the same work (probably with a much larger Docker image).


>  
> I think even publishing this as a Docker file example on the website would  
> be a good start.  
>  
> These PRs use a maven docker plugin during verify phase.  
> https://github.com/apache/incubator-streams-examples/pull/14  
> https://github.com/apache/incubator-streams/pull/288  
>  
> The same plugin can build, tag, and deploy images with the goals docker:build
> and docker:push.
>  

Per policy, the only thing that should make it to repositories like Docker  
hub and Maven Central should be released convenience binaries.  


I think the next step is to figure out what would need to happen to build, certify, and publish a convenience binary and Docker image for (initially) just one individual provider module in an upcoming release.  The dependency tree for a single provider will be more tractable than for the whole project, and there’s a clear user benefit: a greatly simplified project tutorial.


>  
> Once these merge I’ll take another pass through the examples documentation  
> and for each describe a few alternative processes (STREAMS-428)  
>  
> 1) Build from source, run stream from *nix shell with dist uber-jar.  
> 2) Run stream with sbt interactive shell using artifacts from maven central  
> 3) Run stream with docker using artifacts from docker hub  
>  
> On October 10, 2016 at 8:09:45 AM, Matt Franklin (m.ben.franklin@gmail.com)  
> wrote:  
>  
> On Thu, Oct 6, 2016 at 2:56 PM sblackmon <sb...@apache.org> wrote:  

Re: Ease-of-use : minimizing TTHW (time-to-hello-world)

Posted by Matt Franklin <m....@gmail.com>.
On Mon, Oct 10, 2016 at 11:30 AM sblackmon <sb...@apache.org> wrote:

> Some other projects are currently looking at publishing docker containers
> that people can easily extend. I am totally in favor of this approach.
>
>
> Docker distribution would open up a lot of cool options for this project.
>
> Which projects are farthest along this road?
>

https://hub.docker.com/r/apache/


>
> I think even publishing this as a Docker file example on the website would
> be a good start.
>
> These PRs use a maven docker plugin during verify phase.
> https://github.com/apache/incubator-streams-examples/pull/14
> https://github.com/apache/incubator-streams/pull/288
>
> The same plugin can build tag and deploy images with goals docker:build
> and docker:push .
>

Per policy, only released convenience binaries should make it to
repositories like Docker Hub and Maven Central.


>
> Once these merge I’ll take another pass through the examples documentation
> and for each describe a few alternative processes (STREAMS-428)
>
> 1) Build from source, run stream from *nix shell with dist uber-jar.
> 2) Run stream with sbt interactive shell using artifacts from maven central
> 3) Run stream with docker using artifacts from docker hub
>
> On October 10, 2016 at 8:09:45 AM, Matt Franklin (m.ben.franklin@gmail.com)
> wrote:
>
> On Thu, Oct 6, 2016 at 2:56 PM sblackmon <sb...@apache.org> wrote:

Re: Ease-of-use : minimizing TTHW (time-to-hello-world)

Posted by sblackmon <sb...@apache.org>.
> Some other projects are currently looking at publishing docker containers
> that people can easily extend. I am totally in favor of this approach.

Docker distribution would open up a lot of cool options for this project.

Which projects are farthest along this road?

> I think even publishing this as a Docker file example on the website would
> be a good start.

These PRs use a maven docker plugin during verify phase.

https://github.com/apache/incubator-streams-examples/pull/14
https://github.com/apache/incubator-streams/pull/288

The same plugin can build, tag, and deploy images with the goals docker:build and docker:push.

Once these merge I’ll take another pass through the examples documentation and, for each example, describe a few alternative processes (STREAMS-428):

1) Build from source, run stream from *nix shell with dist uber-jar.
2) Run stream with sbt interactive shell using artifacts from Maven Central.
3) Run stream with docker using artifacts from Docker Hub.
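
For option 2, the interactive `set` commands from the original post could also be captured in a build file so the session is repeatable. This is only a sketch: it assumes the same resolver, the same SNAPSHOT coordinates, and the same fork setting as the original commands, and writing them to a file named build.sbt in the workspace is my assumption, not something already in the tutorial.

```shell
# Capture the sbt settings from the original post in a build file.
# The quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > build.sbt << 'EOF'
resolvers += "Local Maven Repository" at "file://" + Path.userHome.absolutePath + "/.m2/repository"

libraryDependencies += "org.apache.streams" % "streams-provider-twitter" % "0.4-incubating-SNAPSHOT"

fork := true
EOF

# Show what was written so it can be checked before starting sbt.
cat build.sbt
```

With that file in place, starting sbtx in the same directory should pick up the settings, leaving only the run-main commands to type.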

On October 10, 2016 at 8:09:45 AM, Matt Franklin (m.ben.franklin@gmail.com) wrote:
On Thu, Oct 6, 2016 at 2:56 PM sblackmon <sb...@apache.org> wrote:  


Re: Ease-of-use : minimizing TTHW (time-to-hello-world)

Posted by Matt Franklin <m....@gmail.com>.
On Thu, Oct 6, 2016 at 2:56 PM sblackmon <sb...@apache.org> wrote:

>
>
> TL;DR I’ve found a way to dramatically reduce barriers to using streams as
> a beginner.
>
>
>
> Using the streams 0.3 release, it’s quite a headache for a novice to use
> streams. We have a tutorial on the website, but it’s quite a journey. You
> have to check out all three repos and install them each in order before you
> get a jar file you could use to get data, then you can run a few pre-canned
> streams, and those are intermediate not beginner level.
>
>
>
> In an ideal world, anyone would be able to yum or apt-get (or docker pull)
> individual providers or processors and run them on their own without
> building from source or composing them into multi-step streams.
>
>
>
> We'd have increase our build and compliance complexity significantly to
> publish official binaries. So what can we do to drop the learning curve
> precipitously without doing that?
>

Some other projects are currently looking at publishing docker containers
that people can easily extend.  I am totally in favor of this approach.


>
>
>
> Providers are really simple to run. The hard part is getting all of the
> right classes and configuration properties into a JVM. Inspired by how
> zeppelin’s %dep interpreter reduces the friction in composing and running a
> scala notebook, I wanted to find a way to get the same ability from a linux
> shell.
>
>
>
> The commands below go from just a java installation to flat files of
> twitter data in just a few minutes.
>
>
>
> I think until we have binary distributions, this is how our tutorials
> should tell the world to get started with streams.
>
>
>
> Thoughts?
>

I think even publishing this as a Docker file example on the website would
be a good start.


>
>
>
> -----
>
>
>
> # install sbtx
>
>
>
> curl -s https://raw.githubusercontent.com/paulp/sbt-extras/master/sbt >
> /usr/bin/sbtx && chmod 0755 /usr/bin/sbtx
>
>
>
> # create a workspace
>
>
>
> mkdir twitter-test; cd twitter-test;
>
>
>
> # supply a config file with credentials
>
>
>
> cat > application.conf << EOF
> twitter {
>   oauth {
>     consumerKey = ""
>     consumerSecret = ""
>     accessToken = ""
>     accessTokenSecret = ""
>   }
>   retrySleepMs = 5000
>   retryMax = 250
>   info = [
>     18055613
>   ]
> }
> EOF
>
>
>
> sbtx -210 -sbt-create
>
>
>
> set resolvers += "Local Maven Repository" at
> "file://"+Path.userHome.absolutePath+"/.m2/repository"
>
>
>
> set libraryDependencies += "org.apache.streams" %
> "streams-provider-twitter" % "0.4-incubating-SNAPSHOT"
>
>
>
> set fork := true
>
>
>
> run-main
> org.apache.streams.twitter.provider.TwitterUserInformationProvider
> application.conf users.txt
>
>
>
> run-main org.apache.streams.twitter.provider.TwitterTimelineProvider
> application.conf statuses.txt
>
>
>
> set javaOptions += "-Dtwitter.endpoint=friends"
>
>
>
> run-main org.apache.streams.twitter.provider.TwitterFollowingProvider
> application.conf friends.txt
>
>
>
> set javaOptions += "-Dtwitter.endpoint=followers"
>
>
>
> exit
>
>
>
> ls -l
>
>
>
> Steves-MacBook-Pro-3:twitter sblackmon$ ls -l
> -rw-r--r--@ 1 sblackmon staff 356 Oct 6 11:54 application.conf
> -rw-r--r-- 1 sblackmon staff 293780 Oct 6 13:42 followers.txt
> -rw-r--r-- 1 sblackmon staff 6260 Oct 6 13:43 friends.txt
> drwxr-xr-x 3 sblackmon staff 102 Oct 6 10:17 project
> -rw-r--r-- 1 sblackmon staff 3339460 Oct 6 13:43 statuses.txt
> drwxr-xr-x 6 sblackmon staff 204 Oct 6 10:19 target
> -rw-r--r-- 1 sblackmon staff 3321 Oct 6 13:43 users.txt
>
>
>
>
>
>
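
The quoted application.conf step leaves the four OAuth values blank for the reader to fill in by hand. One possible variation, sketched here, fills them from environment variables instead. The variable names (TWITTER_CONSUMER_KEY and so on) are hypothetical and not part of streams; the rest of the file mirrors the quoted config.

```shell
# Generate application.conf, taking credentials from the environment.
# The TWITTER_* variable names are hypothetical; unset ones expand to "".
cat > application.conf << EOF
twitter {
  oauth {
    consumerKey = "${TWITTER_CONSUMER_KEY:-}"
    consumerSecret = "${TWITTER_CONSUMER_SECRET:-}"
    accessToken = "${TWITTER_ACCESS_TOKEN:-}"
    accessTokenSecret = "${TWITTER_ACCESS_TOKEN_SECRET:-}"
  }
  retrySleepMs = 5000
  retryMax = 250
  info = [
    18055613
  ]
}
EOF

# Warn if any credential is still empty, since the providers need all four.
if grep -q '= ""' application.conf; then
  echo "warning: one or more twitter.oauth values are empty" >&2
fi
```

This keeps secrets out of the tutorial text and shell history while producing the same file layout as the original heredoc.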