Posted to dev@streams.apache.org by Steve Blackmon <sb...@apache.org> on 2014/01/10 23:47:16 UTC

Substantial commit to new branch

Greetings,

Yesterday I completed a push of code we've been using to ingest data
streams from several major data providers, validate their messages, and
convert them to activitystreams format. There are some new top-level
modules, including
   a) streams-core - standard interfaces for the atomic units of streams -
providers, persisters, and processors
   b) streams-pojo - Jackson-compatible beans generated from
activitystreams json schemas
   c) streams-contrib - a collection of implementation modules, two or more
of which can be imported into a new project and woven together to create a
customized performant data stream to execute with java jar, storm jar,
hadoop jar, yarn jar, etc...
   d) streams-config - a typesafe-based configuration scheme that allows
individual modules and coordinator code to pull the configuration
parameters they require or support from supplied defaults, environment
variables, run-time property files, command line parameters, or accessible
HTTP end-points.
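
To make the three roles in (a) concrete, here is a minimal Java sketch of a provider feeding a chain of processors that ends in a persister. The interface names and method signatures (Provider.readCurrent, Processor.process, Persister.write) and the class PipelineSketch are hypothetical stand-ins chosen for illustration; they are not taken from streams-core, whose actual interfaces may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {

    // Hypothetical stand-ins for the three roles; NOT the streams-core API.
    interface Provider { List<String> readCurrent(); }        // source of raw messages
    interface Processor { List<String> process(String entry); } // 0..n outputs per input
    interface Persister { void write(String entry); }         // sink for finished entries

    // Push every message from the provider through each processor in order,
    // then hand whatever survives to the persister.
    static void run(Provider provider, List<Processor> processors, Persister persister) {
        for (String raw : provider.readCurrent()) {
            List<String> batch = List.of(raw);
            for (Processor processor : processors) {
                List<String> next = new ArrayList<>();
                for (String item : batch) {
                    next.addAll(processor.process(item));
                }
                batch = next;
            }
            batch.forEach(persister::write);
        }
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        Provider provider = () -> List.of("first message", "", "second message");
        // Validation step: drop empty messages.
        Processor validate = s -> s.isEmpty() ? List.of() : List.of(s);
        // Placeholder for a real conversion to activitystreams format.
        Processor convert = s -> List.of("activity:" + s);
        run(provider, List.of(validate, convert), sink::add);
        sink.forEach(System.out::println);
    }
}
```

Returning a list from each processor lets a single step filter (empty list), transform (one element), or fan out (several elements) without a separate interface for each case; that is a design choice of this sketch only.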

I'd love to see this project emerge as a code workspace where social data
vendors and consumers collaborate to ease the process of integration, and
facilitate data interchange with public data schemas and protocols such as
xml and json activitystreams formats.  No jvm-centric social data
interoperability ecosystem exists today to my knowledge.  Hopefully this
code will become a valuable starting point.  We have additional assets we
will commit to streams-contrib in the coming months as we get them cleaned
up, compliant with the streams-core interfaces, unit-tested, and real-world
tested.

I've also created a separate external repository with some reference data
pipelines that demonstrate how to assemble various modules into end-to-end
streams at https://github.com/w2ogroup/streams-examples.  Today it contains
a working twitter gardenhose to activitystreams java process, and a
storm-based firehose processor that is still WIP.  More to come in this
repo as well.

Would love to get feedback on the concepts, patterns, and interfaces
proposed.  Will seek to merge with master in the standard 72 hours unless
anyone objects.

Best,
Steve Blackmon

RE: Substantial commit to new branch

Posted by Danny Sullivan <ds...@hotmail.com>.
Hey Steve,
Cool stuff! Let me know when a new place in svn is set up for the examples you've written; I'd be happy to add running instructions to the wiki for new developers. I needed to change the pom.xml for twitter-sample-standalone to set the main class to <mainClass>org.apache.streams.twitter.example.TwitterSampleStandalone</mainClass>, but I think it should work after that. Perhaps I can submit a pull request for that.
Looking forward to integrating other platforms with Streams,
Danny
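
For readers unfamiliar with the setting mentioned above, a mainClass entry of this kind might look like the following pom.xml fragment. This is only a sketch: the thread does not say which Maven plugin the twitter-sample-standalone pom uses, so the choice of maven-jar-plugin here is an assumption; only the class name comes from the message.

```xml
<!-- Assumption: the jar manifest is configured through maven-jar-plugin.
     The actual pom may use a different plugin (for example a shading or
     assembly plugin), in which case the surrounding elements differ. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <configuration>
    <archive>
      <manifest>
        <mainClass>org.apache.streams.twitter.example.TwitterSampleStandalone</mainClass>
      </manifest>
    </archive>
  </configuration>
</plugin>
```

With a Main-Class entry in the manifest, the resulting jar can be started with java -jar, provided its dependencies are on the classpath.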
> Date: Mon, 13 Jan 2014 09:57:13 -0600
> Subject: Re: Substantial commit to new branch
> From: m.ben.franklin@gmail.com
> To: dev@streams.incubator.apache.org
> 
> On Fri, Jan 10, 2014 at 4:47 PM, Steve Blackmon <sb...@apache.org> wrote:
> 
> > Greetings,
> >
> > Yesterday I completed a push of code we've been using to ingest data
> > streams from several major data providers, validate their messages, and
> > convert them to activitystreams format.
> 
> 
> Very cool.
> 
> 
> > There are some new top-level
> > modules, including
> >    a) streams-core - standard interfaces for the atomic units of streams -
> > providers, persisters, and processors
> >    b) streams-pojo - Jackson-compatible beans generated from
> > activitystreams json schemas
> >
> 
> Tickets need to be created to remove the dependency on the Rave
> ActivityStreams implementation then.
> 
> 
> >    c) streams-contrib - a collection of implementation modules, two or more
> > of which can be imported into a new project and woven together to create a
> > customized performant data stream to execute with java jar, storm jar,
> > hadoop jar, yarn jar, etc...
> >    d) streams-config - a typesafe-based configuration scheme that allows
> > individual modules and coordinator code to pull the configuration
> > parameters they require or support from supplied defaults, environment
> > variables, run-time property files, command line parameters, or accessible
> > HTTP end-points.
> >
> > I'd love to see this project emerge as a code workspace where social data
> > vendors and consumers collaborate to ease the process of integration, and
> > facilitate data interchange with public data schemas and protocols such as
> > xml and json activitystreams formats.  No jvm-centric social data
> > interoperability ecosystem exists today to my knowledge.  Hopefully this
> > code will become a valuable starting point.  We have additional assets we
> > will commit to streams-contrib in the coming months as we get them cleaned
> > up, compliant with the streams-core interfaces, unit-tested, and real-world
> > tested.
> >
> > I've also created a separate external repository with some reference data
> > pipelines that demonstrate how to assemble various modules into end-to-end
> > streams at https://github.com/w2ogroup/streams-examples.  Today it contains
> > a working twitter gardenhose to activitystreams java process, and a
> > storm-based firehose processor that is still WIP.  More to come in this
> > repo as well.
> >
> 
> Do you intend to contribute this to Apache?  If so, we should set up a
> different area in SVN for it.
> 
> 
> >
> > Would love to get feedback on the concepts, patterns, and interfaces
> > proposed.  Will seek to merge with master in the standard 72 hours unless
> > anyone objects.
> >
> > Best,
> > Steve Blackmon
> >

Re: Substantial commit to new branch

Posted by Matt Franklin <m....@gmail.com>.
On Fri, Jan 10, 2014 at 4:47 PM, Steve Blackmon <sb...@apache.org> wrote:

> Greetings,
>
> Yesterday I completed a push of code we've been using to ingest data
> streams from several major data providers, validate their messages, and
> convert them to activitystreams format.


Very cool.


> There are some new top-level
> modules, including
>    a) streams-core - standard interfaces for the atomic units of streams -
> providers, persisters, and processors
>    b) streams-pojo - Jackson-compatible beans generated from
> activitystreams json schemas
>

Tickets need to be created to remove the dependency on the Rave
ActivityStreams implementation then.


>    c) streams-contrib - a collection of implementation modules, two or more
> of which can be imported into a new project and woven together to create a
> customized performant data stream to execute with java jar, storm jar,
> hadoop jar, yarn jar, etc...
>    d) streams-config - a typesafe-based configuration scheme that allows
> individual modules and coordinator code to pull the configuration
> parameters they require or support from supplied defaults, environment
> variables, run-time property files, command line parameters, or accessible
> HTTP end-points.
>
> I'd love to see this project emerge as a code workspace where social data
> vendors and consumers collaborate to ease the process of integration, and
> facilitate data interchange with public data schemas and protocols such as
> xml and json activitystreams formats.  No jvm-centric social data
> interoperability ecosystem exists today to my knowledge.  Hopefully this
> code will become a valuable starting point.  We have additional assets we
> will commit to streams-contrib in the coming months as we get them cleaned
> up, compliant with the streams-core interfaces, unit-tested, and real-world
> tested.
>
> I've also created a separate external repository with some reference data
> pipelines that demonstrate how to assemble various modules into end-to-end
> streams at https://github.com/w2ogroup/streams-examples.  Today it contains
> a working twitter gardenhose to activitystreams java process, and a
> storm-based firehose processor that is still WIP.  More to come in this
> repo as well.
>

Do you intend to contribute this to Apache?  If so, we should set up a
different area in SVN for it.


>
> Would love to get feedback on the concepts, patterns, and interfaces
> proposed.  Will seek to merge with master in the standard 72 hours unless
> anyone objects.
>
> Best,
> Steve Blackmon
>

Re: Substantial commit to new branch

Posted by Jason Letourneau <jl...@gmail.com>.
This is bad@$$ - hoping to spin it up in the next couple of days!

On Tue, Jan 21, 2014 at 5:54 PM, Steve Blackmon <sb...@apache.org> wrote:
> Hello,
>
> Following up to let everyone know this work has been merged with master.
>
> Additionally, a few small changes to storm-core and an initial
> implementation of storm wrappers have been committed.
>
> Anyone interested in how storm would be used to deploy data pipelines can
> pull down the latest trunk, install it, and then build
> https://github.com/w2ogroup/streams-examples/moreover-metabase-storm
>
> At a high level, moreover-metabase-storm/pom.xml and
> MoreoverMetabaseTopology are a template for how easy I think it should
> be to assemble a storm topology from streams components.
>
> Steve Blackmon

Re: Substantial commit to new branch

Posted by Steve Blackmon <sb...@apache.org>.
Hello,

Following up to let everyone know this work has been merged with master.

Additionally, a few small changes to storm-core and an initial
implementation of storm wrappers have been committed.

Anyone interested in how storm would be used to deploy data pipelines can
pull down the latest trunk, install it, and then build
https://github.com/w2ogroup/streams-examples/moreover-metabase-storm

At a high level, moreover-metabase-storm/pom.xml and
MoreoverMetabaseTopology are a template for how easy I think it should
be to assemble a storm topology from streams components.

Steve Blackmon
