Posted to dev@streams.apache.org by Steve Blackmon <sb...@apache.org> on 2014/01/22 08:34:54 UTC

Integration, Documentation, Opportunities

Danny: thanks for trying it out.  It would be excellent if you would
test drive our examples and help with READMEs.  Good catch on the
manifest - I was launching the jar with -cp.  I see no reason we
couldn't formally open source our examples and possibly migrate them
into the streams repository if the list thinks that is a good idea and
will help develop them and add additional examples.  I've seen example
implementations maintained both inside and outside of platform
repositories.


The next persistence modules we are preparing to contribute are hdfs
and elasticsearch, both useful and surprisingly tricky to get right
performance-wise.  A recursive link unwinder, boilerpipe article
extractor, and lucene tagger exist but need some work.


Not hard to come up with other modules that would be useful as part of
a real-time data flow and relatively straightforward to write.  A
Rome-based RSS collector for example.  IRC listener.  OpenNLP?  Who
has other ideas or prototypes we could integrate?


-----Original Message-----
From: Danny Sullivan [mailto:dsullivan7@hotmail.com]
Sent: Tuesday, January 21, 2014 3:08 PM
To: dev@streams.incubator.apache.org
Subject: RE: Substantial commit to new branch


Hey Steve,

Cool stuff! Let me know when a new place in svn is set up for the
examples you've written; I'd be happy to add running instructions to
the wiki for new developers. I needed to change the pom.xml for
twitter-sample-standalone to specify the main class as
<mainClass>org.apache.streams.twitter.example.TwitterSampleStandalone</mainClass>,
but I think it should work after that. Perhaps I can submit a pull
request for that.

Looking forward to integrating other platforms with Streams,
Danny

> Date: Mon, 13 Jan 2014 09:57:13 -0600
> Subject: Re: Substantial commit to new branch
> From: m.ben.franklin@gmail.com
> To: dev@streams.incubator.apache.org
>
> On Fri, Jan 10, 2014 at 4:47 PM, Steve Blackmon <sb...@apache.org> wrote:
>
> > Greetings,
> >
> > Yesterday I completed a push of code we've been using to ingest data
> > streams from several major data providers, validate their messages,
> > and convert them to activitystreams format.
>
> Very cool.
>
> > There are some new top-level
> > modules, including
> >    a) streams-core - standard interfaces for the atomic units of
> >       streams - providers, persisters, and processors
> >    b) streams-pojo - Jackson-compatible beans generated from
> >       activitystreams json schemas
> >
>
> Tickets need to be created to remove the dependency on the Rave
> ActivityStreams implementation then.
>
> >    c) streams-contrib - a collection of implementation modules, two
> >       or more of which can be imported into a new project and woven
> >       together to create a customized performant data stream to execute
> >       with java jar, storm jar, hadoop jar, yarn jar, etc...
> >    d) streams-config - a typesafe-based configuration scheme that
> >       allows individual modules and coordinator code to pull the
> >       configuration parameters they require or support from supplied
> >       defaults, environment variables, run-time property files, command
> >       line parameters, or accessible HTTP end-points.
> >
> > I'd love to see this project emerge as a code workspace where social
> > data vendors and consumers collaborate to ease the process of
> > integration, and facilitate data interchange with public data
> > schemas and protocols such as xml and json activitystreams formats.
> > No jvm-centric social data interoperability ecosystem exists today
> > to my knowledge.  Hopefully this code will become a valuable
> > starting point.  We have additional assets we will commit to
> > streams-contrib in the coming months as we get them cleaned up,
> > compliant with the streams-core interfaces, unit-tested, and
> > real-world tested.
> >
> > I've also created a separate external repository with some reference
> > data pipelines that demonstrate how to assemble various modules into
> > end-to-end streams at
> > http://cp.mcafee.com/d/1jWVIp43qb9EVKOOY-OedTdFEIzDxRQTxNJd5x5Z5dB4s
> > rjhp7f3HFLf6QQm66jhOYYCCrEgfSNlmI0kojGx8zauDYKrc9RgAhBfj-ndLuz_E9LZv
> > ChMVxZYtRXBQQXFY-CYOUCPORQX8FGTKzOEuvkzaT0QSyrhdTVeZXTLuZXCXCOsVHkiP
> > 2cFASO7bUojGhzlJrajz-GQb-7x0_W6QfcBU_dKc2XZPhOnEgfSNlmI3z2tk94pjQ_BMlisvRmDixpIxIWNXlJIj_w09JNdNAS2NF8Qg33p2AZxemDCy1tmkPh1eFEwxVwQg48X2NcTsT34c19HJncxO.
> > Today it contains a working twitter gardenhose to activitystreams
> > java process, and a storm-based firehose processor that is still WIP.
> > More to come in this repo as well.
> >
>
> Do you intend to contribute this to Apache?  If so, we should set up a
> different area in SVN for it.
>
> >
> > Would love to get feedback on the concepts, patterns, and interfaces
> > proposed.  Will seek to merge with master in the standard 72 hours
> > unless anyone objects.
> >
> > Best,
> > Steve Blackmon
> >
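
The provider, processor, and persister roles listed under streams-core
in the quoted message above can be pictured with a minimal Java sketch.
The interface names, method signatures, and the run helper below are
hypothetical, chosen only for illustration; they are not the actual
streams-core API, and plain strings stand in for activitystreams objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PipelineSketch {

    // Hypothetical stand-in for a provider: a source of raw messages.
    interface Provider {
        List<String> readCurrent();
    }

    // Hypothetical stand-in for a processor: maps one input to zero or more outputs.
    interface Processor {
        List<String> process(String datum);
    }

    // Hypothetical stand-in for a persister: a sink for processed messages.
    interface Persister {
        void write(String datum);
    }

    // Weave the three units together in-process: read, transform, store.
    static void run(Provider provider, Processor processor, Persister persister) {
        for (String raw : provider.readCurrent()) {
            for (String out : processor.process(raw)) {
                persister.write(out);
            }
        }
    }

    // Example processor: drop empty messages, upper-case the rest.
    static List<String> dropEmptyAndUpperCase(String datum) {
        List<String> result = new ArrayList<>();
        if (!datum.isEmpty()) {
            result.add(datum.toUpperCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        run(() -> List.of("hello", "", "streams"),
            PipelineSketch::dropEmptyAndUpperCase,
            sink::add);
        System.out.println(sink);
    }
}
```

Running main prints [HELLO, STREAMS]: the empty message is filtered out
and the other two reach the in-memory sink.  A real deployment would
swap the lambdas for concrete modules and hand the wiring to whatever
runtime is in use, which this sketch does not attempt to model.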

Re: Integration, Documentation, Opportunities

Posted by Steve Blackmon <st...@blackmon.org>.
I like it!  I'm more of a dropbox guy, but they also have a java api
(http://dropbox.github.io/dropbox-sdk-java/api-docs/v1.7.x/) with a
method that will enumerate changes to all files in a folder.

This is also an interesting use case to tackle: getting streams ready
to support a front-end that presents links back to source documents
within Google Docs / Dropbox.

If anyone is unfamiliar with ifttt.com, they and their community have
come up with some useful applications by integrating with popular
APIs, normalizing data formats, and applying filters with boolean
logic.

Steve

On Wed, Jan 22, 2014 at 9:42 AM, Jason Letourneau
<jl...@gmail.com> wrote:
> What about Google docs activity publishing they just released?  That
> would be useful in many ways...

Re: Integration, Documentation, Opportunities

Posted by Jason Letourneau <jl...@gmail.com>.
What about Google docs activity publishing they just released?  That
would be useful in many ways...
