Posted to dev@streams.apache.org by sblackmon <sb...@apache.org> on 2016/10/17 15:36:37 UTC

Distribution / Docker next steps

On October 11, 2016 at 11:01:18 AM, Matt Franklin (m.ben.franklin@gmail.com) wrote:
On Mon, Oct 10, 2016 at 11:30 AM sblackmon <sb...@apache.org> wrote:  

> Some other projects are currently looking at publishing docker containers  
> that people can easily extend. I am totally in favor of this approach.  
>  
>  
> Docker distribution would open up a lot of cool options for this project.  
>  
> Which projects are farthest along this road?  
>  

https://hub.docker.com/r/apache/  


I had been thinking more along the lines of publishing a distribution for each provider, processor, and persister module containing a minimal uber-jar.  Going this route would probably warrant a dedicated organization for streams.  OTOH, if we get to the point of having a binary distribution containing all of the classes in streams-project, that could be published to a top-level /apache repository and perform all of the same work (probably with a much larger Docker image).


>  
> I think even publishing this as a Docker file example on the website would  
> be a good start.  
>  
> These PRs use a Maven Docker plugin during the verify phase.
> https://github.com/apache/incubator-streams-examples/pull/14
> https://github.com/apache/incubator-streams/pull/288
>
> The same plugin can build, tag, and deploy images with the goals
> docker:build and docker:push.
>  
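The docker:build and docker:push goals mentioned above can be chained after the normal build in a single Maven invocation. A minimal sketch, assuming the module's pom.xml already configures a Maven Docker plugin with the `docker` goal prefix, as the linked PRs do; the function name and the `MVN` override are illustrative and not part of either PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build, tag, and push one module's Docker image.
# Assumes <module-dir>/pom.xml configures a Maven Docker plugin whose goal
# prefix is "docker" (as in the PRs above). Set MVN=echo for a dry run.
build_and_push_image() {
  local module="${1:?usage: build_and_push_image <module-dir>}"
  # "clean verify" runs the normal build, the tests, and any plugin
  # executions bound to the verify phase; docker:build then builds and tags
  # the image as configured, and docker:push uploads it to the registry.
  "${MVN:-mvn}" -f "$module/pom.xml" clean verify docker:build docker:push
}
```

For snapshot builds, drop `docker:push`: per the policy raised in this thread, only released convenience binaries should reach Docker Hub.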

Per policy, the only things that should make it to repositories like Docker
Hub and Maven Central are released convenience binaries.


I think the next step is to figure out what would need to happen to build, certify, and publish a convenience binary and Docker image for (initially) just one individual provider module in an upcoming release.  The dependency tree for a single provider will be more tractable than for the whole project, and there’s a clear user benefit: a greatly simplified project tutorial.


>  
> Once these merge, I’ll take another pass through the examples documentation
> and, for each example, describe a few alternative processes (STREAMS-428):
>
> 1) Build from source, run stream from *nix shell with dist uber-jar.
> 2) Run stream with sbt interactive shell using artifacts from Maven Central.
> 3) Run stream with Docker using artifacts from Docker Hub.
>  
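Option 1 amounts to putting a provider's shaded jar on the classpath and calling its main class with the same two arguments the sbt session in this thread passes (config file, then output file). A minimal sketch, assuming such an uber-jar has already been built; the jar path, the function name, and the `JAVA` override are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper for option 1: run one provider from a shaded uber-jar.
# Assumes the jar bundles the provider class and all of its dependencies.
# Set JAVA=echo for a dry run.
run_provider_from_jar() {
  local jar="${1:?usage: run_provider_from_jar <jar> <main-class> <output-file>}"
  local main_class="${2:?missing main class}"
  local out="${3:?missing output file}"
  # Same arguments as the sbt session: config file first, then output file.
  "${JAVA:-java}" -cp "$jar" "$main_class" application.conf "$out"
}
```

For example: `run_provider_from_jar <path-to-uber-jar> org.apache.streams.twitter.provider.TwitterUserInformationProvider users.txt`.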
> On October 10, 2016 at 8:09:45 AM, Matt Franklin (m.ben.franklin@gmail.com) wrote:
>
> On Thu, Oct 6, 2016 at 2:56 PM sblackmon <sb...@apache.org> wrote:
>
> > TL;DR I’ve found a way to dramatically reduce barriers to using streams as
> > a beginner.
> >
> > Using the streams 0.3 release, it’s quite a headache for a novice to use
> > streams. We have a tutorial on the website, but it’s quite a journey. You
> > have to check out all three repos and install them each in order before you
> > get a jar file you could use to get data; then you can run a few pre-canned
> > streams, and those are intermediate, not beginner, level.
> >
> > In an ideal world, anyone would be able to yum or apt-get (or docker pull)
> > individual providers or processors and run them on their own without
> > building from source or composing them into multi-step streams.
> >
> > We'd have to increase our build and compliance complexity significantly to
> > publish official binaries. So what can we do to drop the learning curve
> > precipitously without doing that?
>
> Some other projects are currently looking at publishing docker containers
> that people can easily extend. I am totally in favor of this approach.
>
> > Providers are really simple to run. The hard part is getting all of the
> > right classes and configuration properties into a JVM. Inspired by how
> > Zeppelin’s %dep interpreter reduces the friction in composing and running a
> > Scala notebook, I wanted to find a way to get the same ability from a Linux
> > shell.
> >
> > The commands below go from just a Java installation to flat files of
> > Twitter data in just a few minutes.
> >
> > I think until we have binary distributions, this is how our tutorials
> > should tell the world to get started with streams.
> >
> > Thoughts?
>
> I think even publishing this as a Docker file example on the website would
> be a good start.
>
> > -----
> >
> > # install sbtx
> >
> > curl -s https://raw.githubusercontent.com/paulp/sbt-extras/master/sbt > /usr/bin/sbtx && chmod 0755 /usr/bin/sbtx
> >
> > # create a workspace
> >
> > mkdir twitter-test; cd twitter-test;
> >
> > # supply a config file with credentials
> >
> > cat > application.conf << EOF
> > twitter {
> >   oauth {
> >     consumerKey = ""
> >     consumerSecret = ""
> >     accessToken = ""
> >     accessTokenSecret = ""
> >   }
> >   retrySleepMs = 5000
> >   retryMax = 250
> >   info = [
> >     18055613
> >   ]
> > }
> > EOF
> >
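A variant of the config step that writes the same application.conf from environment variables, so credentials do not have to be pasted into the command itself. A sketch only: the `TWITTER_*` variable names and the function name are hypothetical, while the keys and values mirror the config above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write application.conf with credentials taken from
# TWITTER_* environment variables (names are illustrative). Assumes the
# values contain no double quotes or backslashes, which would break HOCON.
write_twitter_conf() {
  local out="${1:-application.conf}"
  # Unquoted EOF so that the ${...} references expand.
  cat > "$out" << EOF
twitter {
  oauth {
    consumerKey = "${TWITTER_CONSUMER_KEY:-}"
    consumerSecret = "${TWITTER_CONSUMER_SECRET:-}"
    accessToken = "${TWITTER_ACCESS_TOKEN:-}"
    accessTokenSecret = "${TWITTER_ACCESS_TOKEN_SECRET:-}"
  }
  retrySleepMs = 5000
  retryMax = 250
  info = [
    18055613
  ]
}
EOF
  # The file now holds secrets, so restrict it to the current user.
  chmod 600 "$out"
}
```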
> > sbtx -210 -sbt-create
> >
> > set resolvers += "Local Maven Repository" at "file://"+Path.userHome.absolutePath+"/.m2/repository"
> >
> > set libraryDependencies += "org.apache.streams" % "streams-provider-twitter" % "0.4-incubating-SNAPSHOT"
> >
> > set fork := true
> >
> > run-main org.apache.streams.twitter.provider.TwitterUserInformationProvider application.conf users.txt
> >
> > run-main org.apache.streams.twitter.provider.TwitterTimelineProvider application.conf statuses.txt
> >
> > set javaOptions += "-Dtwitter.endpoint=friends"
> >
> > run-main org.apache.streams.twitter.provider.TwitterFollowingProvider application.conf friends.txt
> >
> > set javaOptions += "-Dtwitter.endpoint=followers"
> >
> > run-main org.apache.streams.twitter.provider.TwitterFollowingProvider application.conf followers.txt
> >
> > exit
> >
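sbt also accepts commands as arguments and runs them in order, so the interactive session above can be replayed non-interactively, for example from a tutorial script. A sketch under these assumptions: sbt-extras is installed as `sbtx` (the `-210` flag is copied from the session above and may not exist in current sbt-extras releases), the snapshot artifact is in the local Maven repository, and `SBTX` can be overridden for a dry run. The function name is hypothetical. It also runs the followers step that produces `followers.txt`.

```shell
#!/usr/bin/env bash
# Hypothetical non-interactive replay of the sbtx session above.
# Assumes sbt-extras is installed as "sbtx", application.conf exists in the
# current directory, and the snapshot is in ~/.m2. Set SBTX=echo for a dry run.
run_twitter_providers() {
  local version="${STREAMS_VERSION:-0.4-incubating-SNAPSHOT}"
  local pkg="org.apache.streams.twitter.provider"
  # sbt executes each remaining argument as a command, in order.
  "${SBTX:-sbtx}" -210 -sbt-create \
    'set resolvers += "Local Maven Repository" at "file://"+Path.userHome.absolutePath+"/.m2/repository"' \
    "set libraryDependencies += \"org.apache.streams\" % \"streams-provider-twitter\" % \"$version\"" \
    'set fork := true' \
    "runMain $pkg.TwitterUserInformationProvider application.conf users.txt" \
    "runMain $pkg.TwitterTimelineProvider application.conf statuses.txt" \
    'set javaOptions := Seq("-Dtwitter.endpoint=friends")' \
    "runMain $pkg.TwitterFollowingProvider application.conf friends.txt" \
    'set javaOptions := Seq("-Dtwitter.endpoint=followers")' \
    "runMain $pkg.TwitterFollowingProvider application.conf followers.txt"
}
```

Two deliberate differences from the typed session: it uses `runMain`, the camelCase form that newer sbt versions require in place of `run-main`, and it sets `javaOptions` with `:=` so each run sees exactly one `-Dtwitter.endpoint` value rather than an accumulated list.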
> > ls -l
> >
> > Steves-MacBook-Pro-3:twitter sblackmon$ ls -l
> > -rw-r--r--@ 1 sblackmon  staff      356 Oct  6 11:54 application.conf
> > -rw-r--r--  1 sblackmon  staff   293780 Oct  6 13:42 followers.txt
> > -rw-r--r--  1 sblackmon  staff     6260 Oct  6 13:43 friends.txt
> > drwxr-xr-x  3 sblackmon  staff      102 Oct  6 10:17 project
> > -rw-r--r--  1 sblackmon  staff  3339460 Oct  6 13:43 statuses.txt
> > drwxr-xr-x  6 sblackmon  staff      204 Oct  6 10:19 target
> > -rw-r--r--  1 sblackmon  staff     3321 Oct  6 13:43 users.txt
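A quick way to check what each provider produced is to count lines per output file. This assumes, without confirmation from the thread, that each provider writes one JSON document per line; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<name><TAB><line count>" for each output file
# from the session above, or "<name><TAB>missing" if it was not produced.
# Assumes (unconfirmed) one JSON document per line in each file.
summarize_outputs() {
  local dir="${1:-.}" name
  for name in users statuses friends followers; do
    if [ -f "$dir/$name.txt" ]; then
      # tr strips the padding that some wc implementations (e.g. BSD) add.
      printf '%s\t%s\n' "$name" "$(wc -l < "$dir/$name.txt" | tr -d ' ')"
    else
      printf '%s\tmissing\n' "$name"
    fi
  done
}
```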

Re: Distribution / Docker next steps

Posted by Matt Franklin <m....@gmail.com>.
On Mon, Oct 17, 2016 at 11:36 AM sblackmon <sb...@apache.org> wrote:

>
> On October 11, 2016 at 11:01:18 AM, Matt Franklin (m.ben.franklin@gmail.com) wrote:
>
> On Mon, Oct 10, 2016 at 11:30 AM sblackmon <sb...@apache.org> wrote:
>
> > Some other projects are currently looking at publishing docker containers
> > that people can easily extend. I am totally in favor of this approach.
> >
> > Docker distribution would open up a lot of cool options for this project.
> >
> > Which projects are farthest along this road?
> >
>
> https://hub.docker.com/r/apache/
>
>
> I had been thinking more along the lines of publishing a distribution for
> each provider, processor, and persister module containing a minimal
> uber-jar.  Going this route would probably warrant a dedicated organization
> for streams.  OTOH, if we get to the point of having a binary distribution
> containing all of the classes in streams-project, that could be published
> to a top-level /apache repository and perform all of the same work
> (probably with a much larger docker image)
>

Tomcat (and I think a few others) have their own organization on Docker
Hub, so it is definitely a possibility.


>
>
> >
> > I think even publishing this as a Docker file example on the website would
> > be a good start.
> >
> > These PRs use a Maven Docker plugin during the verify phase.
> > https://github.com/apache/incubator-streams-examples/pull/14
> > https://github.com/apache/incubator-streams/pull/288
> >
> > The same plugin can build, tag, and deploy images with the goals
> > docker:build and docker:push.
> >
>
> Per policy, the only things that should make it to repositories like Docker
> Hub and Maven Central are released convenience binaries.
>
>
> I think the next step is to figure out what would need to happen to build,
> certify, and publish a convenience binary and Docker image for (initially)
> just one individual provider module in an upcoming release.  The
> dependency tree for a single provider will be more tractable than for the
> whole project, and there’s a clear user benefit: a greatly simplified
> project tutorial.
>

I would submit an Infra ticket.

