Posted to dev@streams.apache.org by Jason Letourneau <jl...@gmail.com> on 2013/01/31 21:00:53 UTC

Streams Subscriptions

I am curious about the group's thinking on subscriptions to activity
streams.  As I stub out the end-to-end heartbeat on my proposed
architecture, I've just been working with URL sources as the
subscription mode.  Obviously this is a gross over-simplification.

I know that for Shindig the social graph can be used, but we don't
necessarily have that.  The current mechanism for establishing a
new subscription stream (defined as aggregated individual activities
pulled from a varying array of sources) is POSTing to the Activity
Streams server to establish the channel (just
subscriptions=url1,url2,url3 for now, which is over-simplified)...what
would people see as a reasonable way to establish subscriptions?  A
list of userIds?  Subjects?  How should these be represented?  I was
thinking of a JSON object, but does anyone have other thoughts?

Jason

Re: Streams Subscriptions

Posted by Jason Letourneau <jl...@gmail.com>.
I've modified the JSON a bit to account for these things.  The filter
interface becomes very simple: it expects a query string and defines
an evaluate method that returns a boolean given an activity, and the
JSON specifies the @class.  So an implementor could add a new OSGi
bundle, implement the filter interface, and tell Streams to create a
subscription with the specified filter class (still some refining to
go for sure, but the latest is below):

{
    "authToken": "token",
    "@class":"org.apache.streams.osgi.components.activitysubscriber.impl.ActivityStreamsSubscriptionImpl",
    "filters": [
        {
            "@class":"org.apache.streams.osgi.components.activitysubscriber.impl.ActivityStreamsSubscriptionLuceneFilterImpl",
            "query": "string represented query of type expected by
filter implementation"
        }
    ],
    "outputs": [
        {
            "output_type": "http",
            "method": "post",
            "url": "http.example.com:8888",
            "delivery_frequency": "60",
            "max_size": "10485760",
            "auth_type": "none",
            "username": "username",
            "password": "password"
        }
    ]
}
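For concreteness, here is a rough sketch of that filter contract in Java. The interface and class names are illustrative guesses, not the actual org.apache.streams types, and the stand-in query logic (require every term to appear) is far simpler than a real Lucene filter:

```java
// Hypothetical filter SPI: a query string plus an evaluate method
// that decides whether a given activity passes the filter.
interface ActivityStreamsSubscriptionFilter {
    void setQuery(String query);
    boolean evaluate(String activityJson);
}

// Trivial stand-in implementation: treats the query as a list of
// required terms.  A Lucene-backed implementation would instead parse
// the query and evaluate it against the activity's fields.
class KeywordFilter implements ActivityStreamsSubscriptionFilter {
    private String[] terms = new String[0];

    public void setQuery(String query) {
        terms = query.toLowerCase().split("\\s+");
    }

    public boolean evaluate(String activityJson) {
        String haystack = activityJson.toLowerCase();
        for (String term : terms) {
            if (!haystack.contains(term)) {
                return false;
            }
        }
        return true;
    }
}
```

The @class field in the JSON above would then name whichever implementation the subscriber's bundle provides.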

On Sat, Feb 2, 2013 at 5:39 PM, Jason Letourneau
<jl...@gmail.com> wrote:
> Really great stuff - I will make sure filtering is componentized enough for this scenario in my next commit - and start being good about JIRA tracking ;)
>
> Sent from my iPhone
>
> On Feb 1, 2013, at 5:22 PM, Craig McClanahan <cr...@gmail.com> wrote:
>
>> On Fri, Feb 1, 2013 at 1:23 PM, Steve Blackmon [W2O Digital] <
>> sblackmon@w2odigital.com> wrote:
>>
>>> One nice thing Lucene offers is support for nested conditional logic
>>> right in the query - so subscribers can request very complicated
>>> filters with a single filter tag in the JSON request.  Lucene is also
>>> the basis for querying Elasticsearch, and for some of the largest
>>> data providers such as Sysomos/Marketwire - within W2O we have a
>>> large library of Lucene queries, and it would be great to use those
>>> with minimal modification to configure Streams.
>>>
>> Lucene syntax makes sense to me ... I'd rather work on the more
>> "interesting" problems than designing a filter syntax :-).
>>
>>
>>> But this brings up a wider topic regarding adoption - many users will
>>> be migrating or integrating solutions where they filter based on
>>> Lucene, or Solr, or Hamcrest, or regex, etc.  So a plug-in
>>> architecture that would let users who can compile Java embed whatever
>>> filtering logic works best for them into Streams, without having to
>>> commit to master, would be advisable.  Bonus points if those plugins
>>> can bring their own classpath via OSGi or a similar approach.
>>>
>> So, a "filter" would become just a set of Lucene (by default) search
>> expressions as strings, with pluggability for how the strings actually
>> get interpreted?  I like it.
>>
>>
>>> Steve Blackmon
>>> Director, Data Sciences
>>>
>>> 101 W. 6th Street
>>> Austin, Texas 78701
>>> cell 512.965.0451 | work 512.402.6366
>>> twitter @steveblackmon
>>
>> Craig

Re: Streams Subscriptions

Posted by Jason Letourneau <jl...@gmail.com>.
Really great stuff - I will make sure filtering is componentized enough for this scenario in my next commit - and start being good about JIRA tracking ;)

Sent from my iPhone

On Feb 1, 2013, at 5:22 PM, Craig McClanahan <cr...@gmail.com> wrote:

> On Fri, Feb 1, 2013 at 1:23 PM, Steve Blackmon [W2O Digital] <
> sblackmon@w2odigital.com> wrote:
> 
>> One nice thing Lucene offers is support for nested conditional logic
>> right in the query - so subscribers can request very complicated
>> filters with a single filter tag in the JSON request.  Lucene is also
>> the basis for querying Elasticsearch, and for some of the largest
>> data providers such as Sysomos/Marketwire - within W2O we have a
>> large library of Lucene queries, and it would be great to use those
>> with minimal modification to configure Streams.
>> 
> Lucene syntax makes sense to me ... I'd rather work on the more
> "interesting" problems than designing a filter syntax :-).
> 
> 
>> But this brings up a wider topic regarding adoption - many users will
>> be migrating or integrating solutions where they filter based on
>> Lucene, or Solr, or Hamcrest, or regex, etc.  So a plug-in
>> architecture that would let users who can compile Java embed whatever
>> filtering logic works best for them into Streams, without having to
>> commit to master, would be advisable.  Bonus points if those plugins
>> can bring their own classpath via OSGi or a similar approach.
>> 
> So, a "filter" would become just a set of Lucene (by default) search
> expressions as strings, with pluggability for how the strings actually
> get interpreted?  I like it.
> 
> 
>> Steve Blackmon
>> Director, Data Sciences
>> 
>> 101 W. 6th Street
>> Austin, Texas 78701
>> cell 512.965.0451 | work 512.402.6366
>> twitter @steveblackmon
> 
> Craig

Re: Streams Subscriptions

Posted by Craig McClanahan <cr...@gmail.com>.
On Fri, Feb 1, 2013 at 1:23 PM, Steve Blackmon [W2O Digital] <
sblackmon@w2odigital.com> wrote:

> One nice thing Lucene offers is support for nested conditional logic
> right in the query - so subscribers can request very complicated
> filters with a single filter tag in the JSON request.  Lucene is also
> the basis for querying Elasticsearch, and for some of the largest
> data providers such as Sysomos/Marketwire - within W2O we have a
> large library of Lucene queries, and it would be great to use those
> with minimal modification to configure Streams.
>
Lucene syntax makes sense to me ... I'd rather work on the more
"interesting" problems than designing a filter syntax :-).


> But this brings up a wider topic regarding adoption - many users will
> be migrating or integrating solutions where they filter based on
> Lucene, or Solr, or Hamcrest, or regex, etc.  So a plug-in
> architecture that would let users who can compile Java embed whatever
> filtering logic works best for them into Streams, without having to
> commit to master, would be advisable.  Bonus points if those plugins
> can bring their own classpath via OSGi or a similar approach.
>
So, a "filter" would become just a set of Lucene (by default) search
expressions as strings, with pluggability for how the strings actually
get interpreted?  I like it.


> Steve Blackmon
> Director, Data Sciences
>
> 101 W. 6th Street
> Austin, Texas 78701
> cell 512.965.0451 | work 512.402.6366
> twitter @steveblackmon

Craig

Re: Streams Subscriptions

Posted by Jason Letourneau <jl...@gmail.com>.
I definitely think we can support the OSGi wish based on our current component breakdown.

Sent from my iPhone

On Feb 1, 2013, at 4:23 PM, "Steve Blackmon [W2O Digital]" <sb...@w2odigital.com> wrote:

> One nice thing Lucene offers is support for nested conditional logic
> right in the query - so subscribers can request very complicated
> filters with a single filter tag in the JSON request.  Lucene is also
> the basis for querying Elasticsearch, and for some of the largest
> data providers such as Sysomos/Marketwire - within W2O we have a
> large library of Lucene queries, and it would be great to use those
> with minimal modification to configure Streams.
> 
> But this brings up a wider topic regarding adoption - many users will
> be migrating or integrating solutions where they filter based on
> Lucene, or Solr, or Hamcrest, or regex, etc.  So a plug-in
> architecture that would let users who can compile Java embed whatever
> filtering logic works best for them into Streams, without having to
> commit to master, would be advisable.  Bonus points if those plugins
> can bring their own classpath via OSGi or a similar approach.
> 
> Steve Blackmon
> Director, Data Sciences
> 
> 101 W. 6th Street
> Austin, Texas 78701
> cell 512.965.0451 | work 512.402.6366
> twitter @steveblackmon
> 
> 
> 
> 
> 
> 
> 
> On 2/1/13 3:08 PM, "Jason Letourneau" <jl...@gmail.com> wrote:
> 
>> that seems like a great place to go - I'm not personally familiar with
>> the DSL syntax of lucene, but I am familiar with the project
>> 
>> Jason
>> 
>> On Fri, Feb 1, 2013 at 2:34 PM, Steve Blackmon [W2O Digital]
>> <sb...@w2odigital.com> wrote:
>>> What do you think about standardizing on lucene (or at least supporting
>>> it
>>> natively) as a DSL to describe textual filters?
>>> 
>>> Steve Blackmon
>>> Director, Data Sciences
>>> 
>>> 101 W. 6th Street
>>> Austin, Texas 78701
>>> cell 512.965.0451 | work 512.402.6366
>>> twitter @steveblackmon
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 2/1/13 1:31 PM, "Jason Letourneau" <jl...@gmail.com> wrote:
>>> 
>>>> slight iteration for clarity:
>>>> 
>>>> {
>>>>   "auth_token": "token",
>>>>   "filters": [
>>>>       {
>>>>           "field": "fieldname",
>>>>           "comparison_operator": "operator",
>>>>           "value_set": [
>>>>               "val1",
>>>>               "val2"
>>>>           ]
>>>>       }
>>>>   ],
>>>>   "outputs": [
>>>>       {
>>>>           "output_type": "http",
>>>>           "method": "post",
>>>>           "url": "http.example.com:8888",
>>>>           "delivery_frequency": "60",
>>>>           "max_size": "10485760",
>>>>           "auth_type": "none",
>>>>           "username": "username",
>>>>           "password": "password"
>>>>       }
>>>>   ]
>>>> }
>>>> 
>>>> On Fri, Feb 1, 2013 at 12:51 PM, Jason Letourneau
>>>> <jl...@gmail.com> wrote:
>>>>> So a subscription URL (result of setting up a subscription) is for all
>>>>> intents and purposes representative of a set of filters.  That
>>>>> subscription can be told to do a variety of things for delivery to the
>>>>> subscriber, but the identity of the subscription is rooted in its
>>>>> filters.  Posting additional filters to the subscription URL or
>>>>> additional output configurations affect the behavior of that
>>>>> subscription by either adding more filters or more outputs (removal as
>>>>> well).
>>>>> 
>>>>> On Fri, Feb 1, 2013 at 12:17 PM, Craig McClanahan <cr...@gmail.com>
>>>>> wrote:
>>>>>> A couple of thoughts.
>>>>>> 
>>>>>> * On "outputs" you list "username" and "password" as possible fields.
>>>>>>  I presume that specifying these would imply using HTTP Basic auth?
>>>>>>  We might want to consider different options as well.
>>>>>> 
>>>>>> * From my (possibly myopic :-) viewpoint, the filtering and delivery
>>>>>>  decisions are different object types.  I'd like to be able to
>>>>>> register
>>>>>>  my set of filters and get a unique identifier for them, and then
>>>>>>  separately be able to say "send the results of subscription 123
>>>>>>  to this webhook URL every 60 minutes".
>>>>>> 
>>>>>> * Regarding query syntax, pretty much any sort of simple patterns
>>>>>>  are probably not going to be sufficient for some use cases.  Maybe
>>>>>>  we should offer that as simple defaults, but also support falling
>>>>>> back
>>>>>>  to some sort of SQL-like syntax (i.e. what JIRA does on the
>>>>>>  advanced search).
>>>>>> 
>>>>>> Craig
>>>>>> 
>>>>>> 
>>>>>> On Fri, Feb 1, 2013 at 8:55 AM, Jason Letourneau
>>>>>> <jl...@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Based on Steve and Craig's feedback, I've come up with something
>>>>>>> that
>>>>>>> I think can work.  Below it specifies that:
>>>>>>> 1) you can set up more than one subscription at a time
>>>>>>> 2) each subscription can have many outputs
>>>>>>> 3) each subscription can have many filters
>>>>>>> 
>>>>>>> The details of the config would do things like determine the
>>>>>>> behavior
>>>>>>> of the stream delivery (is it posted back or is the subscriber
>>>>>>> polling
>>>>>>> for instance).  Also, all subscriptions created in this way would be
>>>>>>> accessed through a single URL.
>>>>>>> 
>>>>>>> {
>>>>>>>    "auth_token": "token",
>>>>>>>    "subscriptions": [
>>>>>>>        {
>>>>>>>            "outputs": [
>>>>>>>                {
>>>>>>>                    "output_type": "http",
>>>>>>>                    "method": "post",
>>>>>>>                    "url": "http.example.com:8888",
>>>>>>>                    "delivery_frequency": "60",
>>>>>>>                    "max_size": "10485760",
>>>>>>>                    "auth_type": "none",
>>>>>>>                    "username": "username",
>>>>>>>                    "password": "password"
>>>>>>>                }
>>>>>>>            ]
>>>>>>>        },
>>>>>>>        {
>>>>>>>            "filters": [
>>>>>>>                {
>>>>>>>                    "field": "fieldname",
>>>>>>>                    "comparison_operator": "operator",
>>>>>>>                    "value_set": [
>>>>>>>                        "val1",
>>>>>>>                        "val2"
>>>>>>>                    ]
>>>>>>>                }
>>>>>>>            ]
>>>>>>>        }
>>>>>>>    ]
>>>>>>> }
>>>>>>> 
>>>>>>> Thoughts?
>>>>>>> 
>>>>>>> Jason
>>>>>>> 
>>>>>>> On Thu, Jan 31, 2013 at 7:53 PM, Craig McClanahan
>>>>>>> <cr...@gmail.com>
>>>>>>> wrote:
>>>>>>>> Welcome Steve!
>>>>>>>> 
>>>>>>>> DataSift's UI to set these things up is indeed pretty cool.  I
>>>>>>> think
>>>>>>>> what
>>>>>>>> we're talking about here is more what the internal REST APIs
>>>>>>> between the
>>>>>>>> UI
>>>>>>>> and the back end might look like.
>>>>>>>> 
>>>>>>>> I also think we should deliberately separate the filter definition
>>>>>>> of a
>>>>>>>> "subscription" from the instructions on how the data gets
>>>>>>> delivered.  I
>>>>>>>> could see use cases for any or all of:
>>>>>>>> * Polling with a filter on oldest date of interest
>>>>>>>> * Webhook that gets updated at some specified interval
>>>>>>>> * URL to which the Streams server would periodically POST
>>>>>>>>  new activities (in case I don't have webhooks set up)
>>>>>>>> 
>>>>>>>> Separately, looking at DataSift is a reminder we will want to be
>>>>>>> able to
>>>>>>>> filter on words inside an activity stream value like "subject" or
>>>>>>>> "content", not just on the entire value.
>>>>>>>> 
>>>>>>>> Craig
>>>>>>>> 
>>>>>>>> On Thu, Jan 31, 2013 at 4:29 PM, Jason Letourneau
>>>>>>>> <jl...@gmail.com>wrote:
>>>>>>>> 
>>>>>>>>> Hi Steve - thanks for the input and congrats on your first post
>>>>>>> - I
>>>>>>>>> think what you are describing is where Craig and I are circling
>>>>>>> around
>>>>>>>>> (or something similar anyways) - the details on that POST request
>>>>>>> are
>>>>>>>>> really helpful in particular.  I'll try and put something
>>>>>>> together
>>>>>>>>> tomorrow that would be a start for the "setup" request (and
>>>>>>> subsequent
>>>>>>>>> additional configuration after the subscription is initialized)
>>>>>>> and
>>>>>>>>> post back to the group.
>>>>>>>>> 
>>>>>>>>> Jason
>>>>>>>>> 
>>>>>>>>> On Thu, Jan 31, 2013 at 7:00 PM, Steve Blackmon [W2O Digital]
>>>>>>>>> <sb...@w2odigital.com> wrote:
>>>>>>>>>> First post from me (btw I am Steve, stoked about this project
>>>>>>> and
>>>>>>>>>> meeting
>>>>>>>>>> everyone eventually.)
>>>>>>>>>> 
>>>>>>>>>> Sorry if I missed the point of the thread, but I think this is
>>>>>>>>>> related
>>>>>>>>> and
>>>>>>>>>> might be educational for some in the group.
>>>>>>>>>> 
>>>>>>>>>> I like the way DataSift's API lets you establish streams - you
>>>>>>> POST a
>>>>>>>>>> definition, it returns a hash, and thereafter their service
>>>>>>> follows
>>>>>>>>>> the
>>>>>>>>>> instructions you gave it as new messages meet the filter you
>>>>>>> defined.
>>>>>>>>>> In
>>>>>>>>>> addition, once a stream exists, then you can set up listeners
>>>>>>> on
>>>>>>> that
>>>>>>>>>> specific hash via web sockets with the hash.
>>>>>>>>>> 
>>>>>>>>>> For example, here is how you instruct DataSift to push new
>>>>>>> messages
>>>>>>>>>> meeting your criteria to a WebHooks end-point.
>>>>>>>>>> 
>>>>>>>>>> curl -X POST 'https://api.datasift.com/push/create' \
>>>>>>>>>> -d 'name=connectorhttp' \
>>>>>>>>>> -d 'hash=dce320ce31a8919784e6e85aecbd040e' \
>>>>>>>>>> -d 'output_type=http' \
>>>>>>>>>> -d 'output_params.method=post' \
>>>>>>>>>> -d 'output_params.url=http.example.com:8888' \
>>>>>>>>>> -d 'output_params.use_gzip' \
>>>>>>>>>> -d 'output_params.delivery_frequency=60' \
>>>>>>>>>> -d 'output_params.max_size=10485760' \
>>>>>>>>>> -d 'output_params.verify_ssl=false' \
>>>>>>>>>> -d 'output_params.auth.type=none' \
>>>>>>>>>> -d 'output_params.auth.username=YourHTTPServerUsername' \
>>>>>>>>>> -d 'output_params.auth.password=YourHTTPServerPassword' \
>>>>>>>>>> -H 'Auth: datasift-user:your-datasift-api-key'
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Now new messages get pushed to me every 60 seconds, and I can
>>>>>>> get the
>>>>>>>>> feed
>>>>>>>>>> in real-time like this:
>>>>>>>>>> 
>>>>>>>>>> var websocketsUser = 'datasift-user';
>>>>>>>>>> var websocketsHost = 'websocket.datasift.com';
>>>>>>>>>> var streamHash = 'dce320ce31a8919784e6e85aecbd040e';
>>>>>>>>>> var apiKey = 'your-datasift-api-key';
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> var ws = new WebSocket('ws://'+websocketsHost+'/'+streamHash
>>>>>>>>>>     +'?username='+websocketsUser+'&api_key='+apiKey);
>>>>>>>>>> 
>>>>>>>>>> ws.onopen = function(evt) {
>>>>>>>>>>    // connection event
>>>>>>>>>>        $("#stream").append('open: '+evt.data+'<br/>');
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> ws.onmessage = function(evt) {
>>>>>>>>>>    // parse received message
>>>>>>>>>>        $("#stream").append('message: '+evt.data+'<br/>');
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> ws.onclose = function(evt) {
>>>>>>>>>>    // parse event
>>>>>>>>>>        $("#stream").append('close: '+evt.data+'<br/>');
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> // No event object is passed to the error callback, so no
>>>>>>>>>> // useful debugging can be done
>>>>>>>>>> ws.onerror = function() {
>>>>>>>>>>    // Some error occurred
>>>>>>>>>>        $("#stream").append('error<br/>');
>>>>>>>>>> }
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> At W2OGroup we have built utility libraries for receiving and
>>>>>>>>>> processing
>>>>>>>>>> Json object streams from data sift in Storm/Kafka that I'm
>>>>>>> interested
>>>>>>>>>> in
>>>>>>>>>> extending to work with Streams, and can probably commit to the
>>>>>>>>>> project if
>>>>>>>>>> the community would find them useful.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Steve Blackmon
>>>>>>>>>> Director, Data Sciences
>>>>>>>>>> 
>>>>>>>>>> 101 W. 6th Street
>>>>>>>>>> Austin, Texas 78701
>>>>>>>>>> cell 512.965.0451 | work 512.402.6366
>>>>>>>>>> twitter @steveblackmon
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On 1/31/13 5:45 PM, "Craig McClanahan" <cr...@gmail.com>
>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> We'll probably want some way to do the equivalent of ">", ">=",
>>>>>>> "<",
>>>>>>>>> "<=",
>>>>>>>>>>> and "!=" in addition to the implicit "equal" that I assume you
>>>>>>> mean
>>>>>>>>>>> in
>>>>>>>>>>> this
>>>>>>>>>>> example.
>>>>>>>>>>> 
>>>>>>>>>>> Craig
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Jan 31, 2013 at 3:39 PM, Jason Letourneau
>>>>>>>>>>> <jl...@gmail.com>wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> I really like this - this is somewhat what I was getting at
>>>>>>> with
>>>>>>>>>>>> the
>>>>>>>>>>>> JSON object i.e. POST:
>>>>>>>>>>>> {
>>>>>>>>>>>> "subscriptions":
>>>>>>>>>>>> [{"activityField":"value"},
>>>>>>>>>>>> {"activityField":"value",
>>>>>>>>>>>> "anotherActivityField":"value" }
>>>>>>>>>>>> ]
>>>>>>>>>>>> }
>>>>>>>>>>>> 
>>>>>>>>>>>> On Thu, Jan 31, 2013 at 4:32 PM, Craig McClanahan
>>>>>>>>>>>> <cr...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> On Thu, Jan 31, 2013 at 12:00 PM, Jason Letourneau
>>>>>>>>>>>>> <jl...@gmail.com>wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I am curious on the group's thinking about subscriptions
>>>>>>> to
>>>>>>>>>>>>>> activity
>>>>>>>>>>>>>> streams.  As I am stubbing out the end-to-end heartbeat on
>>>>>>> my
>>>>>>>>>>>> proposed
>>>>>>>>>>>>>> architecture, I've just been working with URL sources as
>>>>>>> the
>>>>>>>>>>>>>> subscription mode.  Obviously this is a way
>>>>>>> over-simplification.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I know for shindig the social graph can be used, but we
>>>>>>> don't
>>>>>>>>>>>>>> necessarily have that.  Considering the mechanism for
>>>>>>>>>>>>>> establishing a
>>>>>>>>>>>>>> new subscription stream (defined as aggregated individual
>>>>>>>>>>>>>> activities
>>>>>>>>>>>>>> pulled from a varying array of sources) is POSTing to the
>>>>>>>>>>>>>> Activity
>>>>>>>>>>>>>> Streams server to establish the channel (currently just a
>>>>>>>>>>>>>> subscriptions=url1,url2,url3 is the over simplified
>>>>>>>>> mechanism)...what
>>>>>>>>>>>>>> would people see as a reasonable way to establish
>>>>>>> subscriptions?
>>>>>>>>>>>> List
>>>>>>>>>>>>>> of userIds? Subjects?  How should these be represented?  I
>>>>>>> was
>>>>>>>>>>>>>> thinking of a JSON object, but any one have other
>>>>>>> thoughts?
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Jason
>>>>>>>>>>>>> 
>>>>>>>>>>>>> One idea would be take some inspiration from how JIRA lets
>>>>>>> you
>>>>>>>>>>>>> (in
>>>>>>>>>>>> effect)
>>>>>>>>>>>>> create a WHERE clause that looks at any fields (in all the
>>>>>>>>>>>>> activities
>>>>>>>>>>>>> flowing through the server) that you want.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Example filter criteria
>>>>>>>>>>>>> * provider.id = 'xxx' // Filter on a particular provider
>>>>>>>>>>>>> * verb = 'yyy'
>>>>>>>>>>>>> * object.type = 'blogpost'
>>>>>>>>>>>>> and you'd want to accept more than one value (effectively
>>>>>>>>>>>>> creating OR
>>>>>>>>>>>> or
>>>>>>>>>>>> IN
>>>>>>>>>>>>> type clauses).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For completeness, I'd want to be able to specify more than
>>>>>>> one
>>>>>>>>>>>>> filter
>>>>>>>>>>>>> expression in the same subscription.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Craig
> 

Re: Streams Subscriptions

Posted by "Steve Blackmon [W2O Digital]" <sb...@w2odigital.com>.
One nice thing Lucene offers is support for nested conditional logic
right in the query - so subscribers can request very complicated
filters with a single filter tag in the JSON request.  Lucene is also
the basis for querying Elasticsearch, and for some of the largest
data providers such as Sysomos/Marketwire - within W2O we have a
large library of Lucene queries, and it would be great to use those
with minimal modification to configure Streams.

But this brings up a wider topic regarding adoption - many users will
be migrating or integrating solutions where they filter based on
Lucene, or Solr, or Hamcrest, or regex, etc.  So a plug-in
architecture that would let users who can compile Java embed whatever
filtering logic works best for them into Streams, without having to
commit to master, would be advisable.  Bonus points if those plugins
can bring their own classpath via OSGi or a similar approach.
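A minimal sketch of that plug-in idea: load a user-supplied filter class by fully-qualified name, as the @class field in the subscription JSON would direct.  The Filter interface and class names here are hypothetical, and a real deployment would resolve classes through an OSGi bundle classloader rather than the application classpath:

```java
import java.lang.reflect.Constructor;

// Hypothetical filter SPI that user-supplied bundles would implement.
interface Filter {
    boolean evaluate(String activityJson);
}

// An example filter a user might ship in their own bundle.  A real
// implementation might run a Lucene, Solr, Hamcrest, or regex match.
class BlogPostFilter implements Filter {
    public boolean evaluate(String activityJson) {
        return activityJson.contains("\"blogpost\"");
    }
}

class FilterLoader {
    // Instantiate a filter by class name, so subscribers can plug in
    // filtering logic without committing it to the Streams master tree.
    static Filter load(String className) throws Exception {
        Constructor<?> ctor = Class.forName(className).getDeclaredConstructor();
        return (Filter) ctor.newInstance();
    }
}
```

The subscription handler would call FilterLoader.load with the @class value, then apply evaluate to each activity flowing through the stream.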

Steve Blackmon
Director, Data Sciences

101 W. 6th Street
Austin, Texas 78701
cell 512.965.0451 | work 512.402.6366
twitter @steveblackmon







On 2/1/13 3:08 PM, "Jason Letourneau" <jl...@gmail.com> wrote:

>that seems like a great place to go - I'm not personally familiar with
>the DSL syntax of lucene, but I am familiar with the project
>
>Jason
>
>On Fri, Feb 1, 2013 at 2:34 PM, Steve Blackmon [W2O Digital]
><sb...@w2odigital.com> wrote:
>> What do you think about standardizing on lucene (or at least supporting
>>it
>> natively) as a DSL to describe textual filters?
>>
>> Steve Blackmon
>> Director, Data Sciences
>>
>> 101 W. 6th Street
>> Austin, Texas 78701
>> cell 512.965.0451 | work 512.402.6366
>> twitter @steveblackmon
>>
>>
>>
>>
>>
>>
>>
>> On 2/1/13 1:31 PM, "Jason Letourneau" <jl...@gmail.com> wrote:
>>
>>>slight iteration for clarity:
>>>
>>>{
>>>    "auth_token": "token",
>>>    "filters": [
>>>        {
>>>            "field": "fieldname",
>>>            "comparison_operator": "operator",
>>>            "value_set": [
>>>                "val1",
>>>                "val2"
>>>            ]
>>>        }
>>>    ],
>>>    "outputs": [
>>>        {
>>>            "output_type": "http",
>>>            "method": "post",
>>>            "url": "http.example.com:8888",
>>>            "delivery_frequency": "60",
>>>            "max_size": "10485760",
>>>            "auth_type": "none",
>>>            "username": "username",
>>>            "password": "password"
>>>        }
>>>    ]
>>>}
>>>
>>>On Fri, Feb 1, 2013 at 12:51 PM, Jason Letourneau
>>><jl...@gmail.com> wrote:
>>>> So a subscription URL (result of setting up a subscription) is for all
>>>> intents and purposes representative of a set of filters.  That
>>>> subscription can be told to do a variety of things for delivery to the
>>>> subscriber, but the identity of the subscription is rooted in its
>>>> filters.  Posting additional filters to the subscription URL or
>>>> additional output configurations affect the behavior of that
>>>> subscription by either adding more filters or more outputs (removal as
>>>> well).
>>>>
>>>> On Fri, Feb 1, 2013 at 12:17 PM, Craig McClanahan <cr...@gmail.com>
>>>>wrote:
>>>>> A couple of thoughts.
>>>>>
>>>>> * On "outputs" you list "username" and "password" as possible fields.
>>>>>   I presume that specifying these would imply using HTTP Basic auth?
>>>>>   We might want to consider different options as well.
>>>>>
>>>>> * From my (possibly myopic :-) viewpoint, the filtering and delivery
>>>>>   decisions are different object types.  I'd like to be able to
>>>>>register
>>>>>   my set of filters and get a unique identifier for them, and then
>>>>>   separately be able to say "send the results of subscription 123
>>>>>   to this webhook URL every 60 minutes".
>>>>>
>>>>> * Regarding query syntax, pretty much any sort of simple patterns
>>>>>   are probably not going to be sufficient for some use cases.  Maybe
>>>>>   we should offer that as simple defaults, but also support falling
>>>>>back
>>>>>   to some sort of SQL-like syntax (i.e. what JIRA does on the
>>>>>   advanced search).
>>>>>
>>>>> Craig
>>>>>
>>>>>
>>>>> On Fri, Feb 1, 2013 at 8:55 AM, Jason Letourneau
>>>>><jl...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Based on Steve and Craig's feedback, I've come up with something
>>>>>>that
>>>>>> I think can work.  Below it specifies that:
>>>>>> 1) you can set up more than one subscription at a time
>>>>>> 2) each subscription can have many outputs
>>>>>> 3) each subscription can have many filters
>>>>>>
>>>>>> The details of the config would do things like determine the
>>>>>>behavior
>>>>>> of the stream delivery (is it posted back or is the subscriber
>>>>>>polling
>>>>>> for instance).  Also, all subscriptions created in this way would be
>>>>>> accessed through a single URL.
>>>>>>
>>>>>> {
>>>>>>     "auth_token": "token",
>>>>>>     "subscriptions": [
>>>>>>         {
>>>>>>             "outputs": [
>>>>>>                 {
>>>>>>                     "output_type": "http",
>>>>>>                     "method": "post",
>>>>>>                     "url": "http.example.com:8888",
>>>>>>                     "delivery_frequency": "60",
>>>>>>                     "max_size": "10485760",
>>>>>>                     "auth_type": "none",
>>>>>>                     "username": "username",
>>>>>>                     "password": "password"
>>>>>>                 }
>>>>>>             ]
>>>>>>         },
>>>>>>         {
>>>>>>             "filters": [
>>>>>>                 {
>>>>>>                     "field": "fieldname",
>>>>>>                     "comparison_operator": "operator",
>>>>>>                     "value_set": [
>>>>>>                         "val1",
>>>>>>                         "val2"
>>>>>>                     ]
>>>>>>                 }
>>>>>>             ]
>>>>>>         }
>>>>>>     ]
>>>>>> }
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Jason
>>>>>>
>>>>>> On Thu, Jan 31, 2013 at 7:53 PM, Craig McClanahan
>>>>>><cr...@gmail.com>
>>>>>> wrote:
>>>>>> > Welcome Steve!
>>>>>> >
>>>>>> > DataSift's UI to set these things up is indeed pretty cool.  I
>>>>>>think
>>>>>> > what
>>>>>> > we're talking about here is more what the internal REST APIs
>>>>>>between the
>>>>>> > UI
>>>>>> > and the back end might look like.
>>>>>> >
>>>>>> > I also think we should deliberately separate the filter definition
>>>>>>of a
>>>>>> > "subscription" from the instructions on how the data gets
>>>>>>delivered.  I
>>>>>> > could see use cases for any or all of:
>>>>>> > * Polling with a filter on oldest date of interest
>>>>>> > * Webhook that gets updated at some specified interval
>>>>>> > * URL to which the Streams server would periodically POST
>>>>>> >   new activities (in case I don't have webhooks set up)
>>>>>> >
>>>>>> > Separately, looking at DataSift is a reminder we will want to be
>>>>>>able to
>>>>>> > filter on words inside an activity stream value like "subject" or
>>>>>> > "content", not just on the entire value.
>>>>>> >
>>>>>> > Craig
>>>>>> >
>>>>>> > On Thu, Jan 31, 2013 at 4:29 PM, Jason Letourneau
>>>>>> > <jl...@gmail.com>wrote:
>>>>>> >
>>>>>> >> Hi Steve - thanks for the input and congrats on your first post
>>>>>>- I
>>>>>> >> think what you are describing is where Craig and I are circling
>>>>>>around
>>>>>> >> (or something similar anyways) - the details on that POST request
>>>>>>are
>>>>>> >> really helpful in particular.  I'll try and put something
>>>>>>together
>>>>>> >> tomorrow that would be a start for the "setup" request (and
>>>>>>subsequent
>>>>>> >> additional configuration after the subscription is initialized)
>>>>>>and
>>>>>> >> post back to the group.
>>>>>> >>
>>>>>> >> Jason
>>>>>> >>
>>>>>> >> On Thu, Jan 31, 2013 at 7:00 PM, Steve Blackmon [W2O Digital]
>>>>>> >> <sb...@w2odigital.com> wrote:
>>>>>> >> > First post from me (btw I am Steve, stoked about this project
>>>>>>and
>>>>>> >> > meeting
>>>>>> >> > everyone eventually.)
>>>>>> >> >
>>>>>> >> > Sorry if I missed the point of the thread, but I think this is
>>>>>> >> > related
>>>>>> >> and
>>>>>> >> > might be educational for some in the group.
>>>>>> >> >
>>>>>> >> > I like the way DataSift's API lets you establish streams - you
>>>>>>POST a
>>>>>> >> > definition, it returns a hash, and thereafter their service
>>>>>>follows
>>>>>> >> > the
>>>>>> >> > instructions you gave it as new messages meet the filter you
>>>>>>defined.
>>>>>> >> > In
>>>>>> >> > addition, once a stream exists, then you can set up listeners
>>>>>>on
>>>>>>that
>>>>>> >> > specific hash via web sockets with the hash.
>>>>>> >> >
>>>>>> >> > For example, here is how you instruct DataSift to push new
>>>>>>messages
>>>>>> >> > meeting your criteria to a WebHooks end-point.
>>>>>> >> >
>>>>>> >> > curl -X POST 'https://api.datasift.com/push/create' \
>>>>>> >> > -d 'name=connectorhttp' \
>>>>>> >> > -d 'hash=dce320ce31a8919784e6e85aecbd040e' \
>>>>>> >> > -d 'output_type=http' \
>>>>>> >> > -d 'output_params.method=post' \
>>>>>> >> > -d 'output_params.url=http.example.com:8888' \
>>>>>> >> > -d 'output_params.use_gzip' \
>>>>>> >> > -d 'output_params.delivery_frequency=60' \
>>>>>> >> > -d 'output_params.max_size=10485760' \
>>>>>> >> > -d 'output_params.verify_ssl=false' \
>>>>>> >> > -d 'output_params.auth.type=none' \
>>>>>> >> > -d 'output_params.auth.username=YourHTTPServerUsername' \
>>>>>> >> > -d 'output_params.auth.password=YourHTTPServerPassword' \
>>>>>> >> > -H 'Auth: datasift-user:your-datasift-api-key'
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > Now new messages get pushed to me every 60 seconds, and I can
>>>>>>get the
>>>>>> >> feed
>>>>>> >> > in real-time like this:
>>>>>> >> >
>>>>>> >> > var websocketsUser = 'datasift-user';
>>>>>> >> > var websocketsHost = 'websocket.datasift.com';
>>>>>> >> > var streamHash = 'dce320ce31a8919784e6e85aecbd040e';
>>>>>> >> > var apiKey = 'your-datasift-api-key';
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > var ws = new
>>>>>> >> >
>>>>>> >> > WebSocket('ws://'+websocketsHost+'/'+streamHash+'?username='+websocketsUser
>>>>>> >> > +'&api_key='+apiKey);
>>>>>> >> >
>>>>>> >> > ws.onopen = function(evt) {
>>>>>> >> >     // connection event
>>>>>> >> >         $("#stream").append('open: '+evt.data+'<br/>');
>>>>>> >> > }
>>>>>> >> >
>>>>>> >> > ws.onmessage = function(evt) {
>>>>>> >> >     // parse received message
>>>>>> >> >         $("#stream").append('message: '+evt.data+'<br/>');
>>>>>> >> > }
>>>>>> >> >
>>>>>> >> > ws.onclose = function(evt) {
>>>>>> >> >     // parse event
>>>>>> >> >         $("#stream").append('close: '+evt.data+'<br/>');
>>>>>> >> > }
>>>>>> >> >
>>>>>> >> > // No event object is passed to the event callback, so no
>>>>>>useful
>>>>>> >> debugging
>>>>>> >> > can be done
>>>>>> >> > ws.onerror = function() {
>>>>>> >> >     // Some error occurred
>>>>>> >> >         $("#stream").append('error: '+evt.data+'<br/>');
>>>>>> >> > }
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > At W2OGroup we have built utility libraries for receiving and
>>>>>> >> > processing
>>>>>> >> > Json object streams from data sift in Storm/Kafka that I'm
>>>>>>interested
>>>>>> >> > in
>>>>>> >> > extending to work with Streams, and can probably commit to the
>>>>>> >> > project if
>>>>>> >> > the community would find them useful.
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > Steve Blackmon
>>>>>> >> > Director, Data Sciences
>>>>>> >> >
>>>>>> >> > 101 W. 6th Street
>>>>>> >> > Austin, Texas 78701
>>>>>> >> > cell 512.965.0451 | work 512.402.6366
>>>>>> >> > twitter @steveblackmon
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> >
>>>>>> >> > On 1/31/13 5:45 PM, "Craig McClanahan" <cr...@gmail.com>
>>>>>>wrote:
>>>>>> >> >
>>>>>> >> >>We'll probably want some way to do the equivalent of ">", ">=",
>>>>>>"<",
>>>>>> >> "<=",
>>>>>> >> >>and "!=" in addition to the implicit "equal" that I assume you
>>>>>>mean
>>>>>> >> >> in
>>>>>> >> >>this
>>>>>> >> >>example.
>>>>>> >> >>
>>>>>> >> >>Craig
>>>>>> >> >>
>>>>>> >> >>On Thu, Jan 31, 2013 at 3:39 PM, Jason Letourneau
>>>>>> >> >><jl...@gmail.com>wrote:
>>>>>> >> >>
>>>>>> >> >>> I really like this - this is somewhat what I was getting at
>>>>>>with
>>>>>> >> >>> the
>>>>>> >> >>> JSON object i.e. POST:
>>>>>> >> >>> {
>>>>>> >> >>> "subscriptions":
>>>>>> >> >>> [{"activityField":"value"},
>>>>>> >> >>> {"activityField":"value",
>>>>>> >> >>>  "anotherActivityField":"value" }
>>>>>> >> >>> ]
>>>>>> >> >>> }
>>>>>> >> >>>
>>>>>> >> >>> On Thu, Jan 31, 2013 at 4:32 PM, Craig McClanahan
>>>>>> >> >>> <cr...@gmail.com>
>>>>>> >> >>> wrote:
>>>>>> >> >>> > On Thu, Jan 31, 2013 at 12:00 PM, Jason Letourneau
>>>>>> >> >>> > <jl...@gmail.com>wrote:
>>>>>> >> >>> >
>>>>>> >> >>> >> I am curious on the group's thinking about subscriptions
>>>>>>to
>>>>>> >> >>> >> activity
>>>>>> >> >>> >> streams.  As I am stubbing out the end-to-end heartbeat on
>>>>>>my
>>>>>> >> >>>proposed
>>>>>> >> >>> >> architecture, I've just been working with URL sources as
>>>>>>the
>>>>>> >> >>> >> subscription mode.  Obviously this is a way
>>>>>>over-simplification.
>>>>>> >> >>> >>
>>>>>> >> >>> >> I know for shindig the social graph can be used, but we
>>>>>>don't
>>>>>> >> >>> >> necessarily have that.  Considering the mechanism for
>>>>>> >> >>> >> establishing a
>>>>>> >> >>> >> new subscription stream (defined as aggregated individual
>>>>>> >> >>> >> activities
>>>>>> >> >>> >> pulled from a varying array of sources) is POSTing to the
>>>>>> >> >>> >> Activity
>>>>>> >> >>> >> Streams server to establish the channel (currently just a
>>>>>> >> >>> >> subscriptions=url1,url2,url3 is the over simplified
>>>>>> >> mechanism)...what
>>>>>> >> >>> >> would people see as a reasonable way to establish
>>>>>>subscriptions?
>>>>>> >> >>>List
>>>>>> >> >>> >> of userIds? Subjects?  How should these be represented?  I
>>>>>>was
>>>>>> >> >>> >> thinking of a JSON object, but any one have other
>>>>>>thoughts?
>>>>>> >> >>> >>
>>>>>> >> >>> >> Jason
>>>>>> >> >>> >>
>>>>>> >> >>> >
>>>>>> >> >>> > One idea would be take some inspiration from how JIRA lets
>>>>>>you
>>>>>> >> >>> > (in
>>>>>> >> >>> effect)
>>>>>> >> >>> > create a WHERE clause that looks at any fields (in all the
>>>>>> >> >>> > activities
>>>>>> >> >>> > flowing through the server) that you want.
>>>>>> >> >>> >
>>>>>> >> >>> > Example filter criteria
>>>>>> >> >>> > * provider.id = 'xxx' // Filter on a particular provider
>>>>>> >> >>> > * verb = 'yyy'
>>>>>> >> >>> > * object.type = 'blogpost'
>>>>>> >> >>> > and you'd want to accept more than one value (effectively
>>>>>> >> >>> > creating OR
>>>>>> >> >>>or
>>>>>> >> >>> IN
>>>>>> >> >>> > type clauses).
>>>>>> >> >>> >
>>>>>> >> >>> > For completeness, I'd want to be able to specify more than
>>>>>>one
>>>>>> >> >>> > filter
>>>>>> >> >>> > expression in the same subscription.
>>>>>> >> >>> >
>>>>>> >> >>> > Craig
>>>>>> >> >>>
>>>>>> >> >
>>>>>> >>
>>>>>
>>>>>
>>


Re: Streams Subscriptions

Posted by Jason Letourneau <jl...@gmail.com>.
That seems like a great place to go - I'm not personally familiar with
Lucene's DSL syntax, but I am familiar with the project.
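
For anyone else new to it, the classic Lucene query parser syntax is a
compact text DSL. A few illustrative queries (the field names here are
only examples, not a proposed activity schema):

```
verb:post                               single-field term match
verb:post AND object.type:blogpost      boolean operators (AND, OR, NOT)
content:"apache streams"                exact phrase
content:activit*                        wildcard
published:[2013-01-01 TO 2013-02-01]    inclusive range
+content:apache -content:hadoop         required / prohibited terms
```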

Jason

On Fri, Feb 1, 2013 at 2:34 PM, Steve Blackmon [W2O Digital]
<sb...@w2odigital.com> wrote:
> What do you think about standardizing on lucene (or at least supporting it
> natively) as a DSL to describe textual filters?
>
> Steve Blackmon
> Director, Data Sciences
>
> 101 W. 6th Street
> Austin, Texas 78701
> cell 512.965.0451 | work 512.402.6366
> twitter @steveblackmon
>
>
>
>
>
>
>
> On 2/1/13 1:31 PM, "Jason Letourneau" <jl...@gmail.com> wrote:
>
>>slight iteration for clarity:
>>
>>{
>>    "auth_token": "token",
>>    "filters": [
>>        {
>>            "field": "fieldname",
>>            "comparison_operator": "operator",
>>            "value_set": [
>>                "val1",
>>                "val2"
>>            ]
>>        }
>>    ],
>>    "outputs": [
>>        {
>>            "output_type": "http",
>>            "method": "post",
>>            "url": "http.example.com:8888",
>>            "delivery_frequency": "60",
>>            "max_size": "10485760",
>>            "auth_type": "none",
>>            "username": "username",
>>            "password": "password"
>>        }
>>    ]
>>}
>>

Re: Streams Subscriptions

Posted by "Steve Blackmon [W2O Digital]" <sb...@w2odigital.com>.
What do you think about standardizing on lucene (or at least supporting it
natively) as a DSL to describe textual filters?
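
Concretely, each filter entry could then carry an opaque query string in
whatever DSL the chosen filter implementation expects - a hypothetical
sketch of the JSON, not a settled schema:

```
"filters": [
    {
        "query": "verb:post AND object.type:blogpost"
    }
]
```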

Steve Blackmon
Director, Data Sciences

101 W. 6th Street
Austin, Texas 78701
cell 512.965.0451 | work 512.402.6366
twitter @steveblackmon







On 2/1/13 1:31 PM, "Jason Letourneau" <jl...@gmail.com> wrote:

>slight iteration for clarity:
>
>{
>    "auth_token": "token",
>    "filters": [
>        {
>            "field": "fieldname",
>            "comparison_operator": "operator",
>            "value_set": [
>                "val1",
>                "val2"
>            ]
>        }
>    ],
>    "outputs": [
>        {
>            "output_type": "http",
>            "method": "post",
>            "url": "http.example.com:8888",
>            "delivery_frequency": "60",
>            "max_size": "10485760",
>            "auth_type": "none",
>            "username": "username",
>            "password": "password"
>        }
>    ]
>}
>
>On Fri, Feb 1, 2013 at 12:51 PM, Jason Letourneau
><jl...@gmail.com> wrote:
>> So a subscription URL (result of setting up a subscription) is for all
>> intents and purposes representative of a set of filters.  That
>> subscription can be told to do a variety of things for delivery to the
>> subscriber, but the identity of the subscription is rooted in its
>> filters.  Posting additional filters to the subscription URL or
>> additional output configurations affect the behavior of that
>> subscription by either adding more filters or more outputs (removal as
>> well).
>>
>> On Fri, Feb 1, 2013 at 12:17 PM, Craig McClanahan <cr...@gmail.com>
>>wrote:
>>> A couple of thoughts.
>>>
>>> * On "outputs" you list "username" and "password" as possible fields.
>>>   I presume that specifying these would imply using HTTP Basic auth?
>>>   We might want to consider different options as well.
>>>
>>> * From my (possibly myopic :-) viewpoint, the filtering and delivery
>>>   decisions are different object types.  I'd like to be able to
>>>register
>>>   my set of filters and get a unique identifier for them, and then
>>>   separately be able to say "send the results of subscription 123
>>>   to this webhook URL every 60 minutes".
>>>
>>> * Regarding query syntax, pretty much any sort of simple patterns
>>>   are probably not going to be sufficient for some use cases.  Maybe
>>>   we should offer that as simple defaults, but also support falling
>>>back
>>>   to some sort of SQL-like syntax (i.e. what JIRA does on the
>>>   advanced search).
>>>
>>> Craig
>>>
>>>
>>> On Fri, Feb 1, 2013 at 8:55 AM, Jason Letourneau
>>><jl...@gmail.com>
>>> wrote:
>>>>
>>>> Based on Steve and Craig's feedback, I've come up with something that
>>>> I think can work.  Below it specifies that:
>>>> 1) you can set up more than one subscription at a time
>>>> 2) each subscription can have many outputs
>>>> 3) each subscription can have many filters
>>>>
>>>> The details of the config would do things like determine the behavior
>>>> of the stream delivery (is it posted back or is the subscriber polling
>>>> for instance).  Also, all subscriptions created in this way would be
>>>> accessed through a single URL.
>>>>
>>>> {
>>>>     "auth_token": "token",
>>>>     "subscriptions": [
>>>>         {
>>>>             "outputs": [
>>>>                 {
>>>>                     "output_type": "http",
>>>>                     "method": "post",
>>>>                     "url": "http.example.com:8888",
>>>>                     "delivery_frequency": "60",
>>>>                     "max_size": "10485760",
>>>>                     "auth_type": "none",
>>>>                     "username": "username",
>>>>                     "password": "password"
>>>>                 }
>>>>             ]
>>>>         },
>>>>         {
>>>>             "filters": [
>>>>                 {
>>>>                     "field": "fieldname",
>>>>                     "comparison_operator": "operator",
>>>>                     "value_set": [
>>>>                         "val1",
>>>>                         "val2"
>>>>                     ]
>>>>                 }
>>>>             ]
>>>>         }
>>>>     ]
>>>> }
>>>>
>>>> Thoughts?
>>>>
>>>> Jason
>>>>
>>>> On Thu, Jan 31, 2013 at 7:53 PM, Craig McClanahan <cr...@gmail.com>
>>>> wrote:
>>>> > Welcome Steve!
>>>> >
>>>> > DataSift's UI to set these things up is indeed pretty cool.  I think
>>>> > what
>>>> > we're talking about here is more what the internal REST APIs
>>>>between the
>>>> > UI
>>>> > and the back end might look like.
>>>> >
>>>> > I also think we should deliberately separate the filter definition
>>>>of a
>>>> > "subscription" from the instructions on how the data gets
>>>>delivered.  I
>>>> > could see use cases for any or all of:
>>>> > * Polling with a filter on oldest date of interest
>>>> > * Webhook that gets updated at some specified interval
>>>> > * URL to which the Streams server would periodically POST
>>>> >   new activities (in case I don't have webhooks set up)
>>>> >
>>>> > Separately, looking at DataSift is a reminder we will want to be
>>>>able to
>>>> > filter on words inside an activity stream value like "subject" or
>>>> > "content", not just on the entire value.
>>>> >
>>>> > Craig


Re: Streams Subscriptions

Posted by Jason Letourneau <jl...@gmail.com>.
slight iteration for clarity:

{
    "auth_token": "token",
    "filters": [
        {
            "field": "fieldname",
            "comparison_operator": "operator",
            "value_set": [
                "val1",
                "val2"
            ]
        }
    ],
    "outputs": [
        {
            "output_type": "http",
            "method": "post",
            "url": "http.example.com:8888",
            "delivery_frequency": "60",
            "max_size": "10485760",
            "auth_type": "none",
            "username": "username",
            "password": "password"
        }
    ]
}
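For concreteness, here is a minimal sketch (Python, with purely illustrative names and an assumed operator set) of how a server might evaluate one of these filter objects against an incoming activity. The dotted field-path resolution and the operator names are assumptions, not part of the proposal:

```python
# Sketch only: evaluate one "filters" entry from the subscription JSON
# against an activity represented as a nested dict.

def get_field(activity, path):
    """Resolve a dotted field name like 'provider.id' against a nested dict."""
    value = activity
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value

# Hypothetical operator vocabulary; value_set gives OR/IN semantics.
OPERATORS = {
    "equals": lambda v, vs: v in vs,
    "not_equals": lambda v, vs: v not in vs,
    "contains": lambda v, vs: any(s in str(v) for s in vs),
}

def matches(activity, filt):
    """Return True if the activity satisfies a single filter object."""
    value = get_field(activity, filt["field"])
    op = OPERATORS[filt["comparison_operator"]]
    return value is not None and op(value, filt["value_set"])
```

A subscription with several filters could then AND (or OR) the results of matches() per filter, depending on which combination semantics the group settles on.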

On Fri, Feb 1, 2013 at 12:51 PM, Jason Letourneau
<jl...@gmail.com> wrote:
> So a subscription URL (result of setting up a subscription) is for all
> intents and purposes representative of a set of filters.  That
> subscription can be told to do a variety of things for delivery to the
> subscriber, but the identity of the subscription is rooted in its
> filters.  Posting additional filters to the subscription URL or
> additional output configurations affect the behavior of that
> subscription by either adding more filters or more outputs (removal as
> well).
>
> On Fri, Feb 1, 2013 at 12:17 PM, Craig McClanahan <cr...@gmail.com> wrote:
>> A couple of thoughts.
>>
>> * On "outputs" you list "username" and "password" as possible fields.
>>   I presume that specifying these would imply using HTTP Basic auth?
>>   We might want to consider different options as well.
>>
>> * From my (possibly myopic :-) viewpoint, the filtering and delivery
>>   decisions are different object types.  I'd like to be able to register
>>   my set of filters and get a unique identifier for them, and then
>>   separately be able to say "send the results of subscription 123
>>   to this webhook URL every 60 minutes".
>>
>> * Regarding query syntax, pretty much any sort of simple patterns
>>   are probably not going to be sufficient for some use cases.  Maybe
>>   we should offer that as simple defaults, but also support falling back
>>   to some sort of SQL-like syntax (i.e. what JIRA does on the
>>   advanced search).
>>
>> Craig
>>
>>
>> On Fri, Feb 1, 2013 at 8:55 AM, Jason Letourneau <jl...@gmail.com>
>> wrote:
>>>
>>> Based on Steve and Craig's feedback, I've come up with something that
>>> I think can work.  Below it specifies that:
>>> 1) you can set up more than one subscription at a time
>>> 2) each subscription can have many outputs
>>> 3) each subscription can have many filters
>>>
>>> The details of the config would do things like determine the behavior
>>> of the stream delivery (is it posted back or is the subscriber polling
>>> for instance).  Also, all subscriptions created in this way would be
>>> accessed through a single URL.
>>>
>>> {
>>>     "auth_token": "token",
>>>     "subscriptions": [
>>>         {
>>>             "outputs": [
>>>                 {
>>>                     "output_type": "http",
>>>                     "method": "post",
>>>                     "url": "http.example.com:8888",
>>>                     "delivery_frequency": "60",
>>>                     "max_size": "10485760",
>>>                     "auth_type": "none",
>>>                     "username": "username",
>>>                     "password": "password"
>>>                 }
>>>             ]
>>>         },
>>>         {
>>>             "filters": [
>>>                 {
>>>                     "field": "fieldname",
>>>                     "comparison_operator": "operator",
>>>                     "value_set": [
>>>                         "val1",
>>>                         "val2"
>>>                     ]
>>>                 }
>>>             ]
>>>         }
>>>     ]
>>> }
>>>
>>> Thoughts?
>>>
>>> Jason
>>>
>>> On Thu, Jan 31, 2013 at 7:53 PM, Craig McClanahan <cr...@gmail.com>
>>> wrote:
>>> > Welcome Steve!
>>> >
>>> > DataSift's UI to set these things up is indeed pretty cool.  I think
>>> > what
>>> > we're talking about here is more what the internal REST APIs between the
>>> > UI
>>> > and the back end might look like.
>>> >
>>> > I also think we should deliberately separate the filter definition of a
>>> > "subscription" from the instructions on how the data gets delivered.  I
>>> > could see use cases for any or all of:
>>> > * Polling with a filter on oldest date of interest
>>> > * Webhook that gets updated at some specified interval
>>> > * URL to which the Streams server would periodically POST
>>> >   new activities (in case I don't have webhooks set up)
>>> >
>>> > Separately, looking at DataSift is a reminder we will want to be able to
>>> > filter on words inside an activity stream value like "subject" or
>>> > "content", not just on the entire value.
>>> >
>>> > Craig
>>> >
>>> > On Thu, Jan 31, 2013 at 4:29 PM, Jason Letourneau
>>> > <jl...@gmail.com>wrote:
>>> >
>>> >> Hi Steve - thanks for the input and congrats on your first post - I
>>> >> think what you are describing is where Craig and I are circling around
>>> >> (or something similar anyways) - the details on that POST request are
>>> >> really helpful in particular.  I'll try and put something together
>>> >> tomorrow that would be a start for the "setup" request (and subsequent
>>> >> additional configuration after the subscription is initialized) and
>>> >> post back to the group.
>>> >>
>>> >> Jason
>>> >>
>>> >> On Thu, Jan 31, 2013 at 7:00 PM, Steve Blackmon [W2O Digital]
>>> >> <sb...@w2odigital.com> wrote:
>>> >> > First post from me (btw I am Steve, stoked about this project and
>>> >> > meeting
>>> >> > everyone eventually.)
>>> >> >
>>> >> > Sorry if I missed the point of the thread, but I think this is
>>> >> > related
>>> >> and
>>> >> > might be educational for some in the group.
>>> >> >
>>> >> > I like the way DataSift's API lets you establish streams - you POST a
>>> >> > definition, it returns a hash, and thereafter their service follows
>>> >> > the
>>> >> > instructions you gave it as new messages meet the filter you defined.
>>> >> > In
>>> >> > addition, once a stream exists, then you can set up listeners on that
>>> >> > specific hash via web sockets with the hash.
>>> >> >
>>> >> > For example, here is how you instruct DataSift to push new messages
>>> >> > meeting your criteria to a WebHooks end-point.
>>> >> >
>>> >> > curl -X POST 'https://api.datasift.com/push/create' \
>>> >> > -d 'name=connectorhttp' \
>>> >> > -d 'hash=dce320ce31a8919784e6e85aecbd040e' \
>>> >> > -d 'output_type=http' \
>>> >> > -d 'output_params.method=post' \
>>> >> > -d 'output_params.url=http.example.com:8888' \
>>> >> > -d 'output_params.use_gzip' \
>>> >> > -d 'output_params.delivery_frequency=60' \
>>> >> > -d 'output_params.max_size=10485760' \
>>> >> > -d 'output_params.verify_ssl=false' \
>>> >> > -d 'output_params.auth.type=none' \
>>> >> > -d 'output_params.auth.username=YourHTTPServerUsername' \
>>> >> > -d 'output_params.auth.password=YourHTTPServerPassword' \
>>> >> > -H 'Auth: datasift-user:your-datasift-api-key
>>> >> >
>>> >> >
>>> >> > Now new messages get pushed to me every 60 seconds, and I can get the
>>> >> feed
>>> >> > in real-time like this:
>>> >> >
>>> >> > var websocketsUser = 'datasift-user';
>>> >> > var websocketsHost = 'websocket.datasift.com';
>>> >> > var streamHash = 'dce320ce31a8919784e6e85aecbd040e';
>>> >> > var apiKey = 'your-datasift-api-key';
>>> >> >
>>> >> >
>>> >> > var ws = new
>>> >> >
>>> >>
>>> >> WebSocket('ws://'+websocketsHost+'/'+streamHash+'?username='+websocketsUser
>>> >> > +'&api_key='+apiKey);
>>> >> >
>>> >> > ws.onopen = function(evt) {
>>> >> >     // connection event
>>> >> >         $("#stream").append('open: '+evt.data+'<br/>');
>>> >> > }
>>> >> >
>>> >> > ws.onmessage = function(evt) {
>>> >> >     // parse received message
>>> >> >         $("#stream").append('message: '+evt.data+'<br/>');
>>> >> > }
>>> >> >
>>> >> > ws.onclose = function(evt) {
>>> >> >     // parse event
>>> >> >         $("#stream").append('close: '+evt.data+'<br/>');
>>> >> > }
>>> >> >
>>> >> > // No event object is passed to the event callback, so no useful
>>> >> debugging
>>> >> > can be done
>>> >> > ws.onerror = function() {
>>> >> >     // Some error occurred
>>> >> >         $("#stream").append('error: '+evt.data+'<br/>');
>>> >> > }
>>> >> >
>>> >> >
>>> >> > At W2OGroup we have built utility libraries for receiving and
>>> >> > processing
>>> >> > Json object streams from data sift in Storm/Kafka that I'm interested
>>> >> > in
>>> >> > extending to work with Streams, and can probably commit to the
>>> >> > project if
>>> >> > the community would find them useful.
>>> >> >
>>> >> >
>>> >> > Steve Blackmon
>>> >> > Director, Data Sciences
>>> >> >
>>> >> > 101 W. 6th Street
>>> >> > Austin, Texas 78701
>>> >> > cell 512.965.0451 | work 512.402.6366
>>> >> > twitter @steveblackmon
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >> > On 1/31/13 5:45 PM, "Craig McClanahan" <cr...@gmail.com> wrote:
>>> >> >
>>> >> >>We'll probably want some way to do the equivalent of ">", ">=", "<",
>>> >> "<=",
>>> >> >>and "!=" in addition to the implicit "equal" that I assume you mean
>>> >> >> in
>>> >> >>this
>>> >> >>example.
>>> >> >>
>>> >> >>Craig
>>> >> >>
>>> >> >>On Thu, Jan 31, 2013 at 3:39 PM, Jason Letourneau
>>> >> >><jl...@gmail.com>wrote:
>>> >> >>
>>> >> >>> I really like this - this is somewhat what I was getting at with
>>> >> >>> the
>>> >> >>> JSON object i.e. POST:
>>> >> >>> {
>>> >> >>> "subscriptions":
>>> >> >>> [{"activityField":"value"},
>>> >> >>> {"activityField":"value",
>>> >> >>>  "anotherActivityField":"value" }
>>> >> >>> ]
>>> >> >>> }
>>> >> >>>
>>> >> >>> On Thu, Jan 31, 2013 at 4:32 PM, Craig McClanahan
>>> >> >>> <cr...@gmail.com>
>>> >> >>> wrote:
>>> >> >>> > On Thu, Jan 31, 2013 at 12:00 PM, Jason Letourneau
>>> >> >>> > <jl...@gmail.com>wrote:
>>> >> >>> >
>>> >> >>> >> I am curious on the group's thinking about subscriptions to
>>> >> >>> >> activity
>>> >> >>> >> streams.  As I am stubbing out the end-to-end heartbeat on my
>>> >> >>>proposed
>>> >> >>> >> architecture, I've just been working with URL sources as the
>>> >> >>> >> subscription mode.  Obviously this is a way over-simplification.
>>> >> >>> >>
>>> >> >>> >> I know for shindig the social graph can be used, but we don't
>>> >> >>> >> necessarily have that.  Considering the mechanism for
>>> >> >>> >> establishing a
>>> >> >>> >> new subscription stream (defined as aggregated individual
>>> >> >>> >> activities
>>> >> >>> >> pulled from a varying array of sources) is POSTing to the
>>> >> >>> >> Activity
>>> >> >>> >> Streams server to establish the channel (currently just a
>>> >> >>> >> subscriptions=url1,url2,url3 is the over simplified
>>> >> mechanism)...what
>>> >> >>> >> would people see as a reasonable way to establish subscriptions?
>>> >> >>>List
>>> >> >>> >> of userIds? Subjects?  How should these be represented?  I was
>>> >> >>> >> thinking of a JSON object, but any one have other thoughts?
>>> >> >>> >>
>>> >> >>> >> Jason
>>> >> >>> >>
>>> >> >>> >
>>> >> >>> > One idea would be take some inspiration from how JIRA lets you
>>> >> >>> > (in
>>> >> >>> effect)
>>> >> >>> > create a WHERE clause that looks at any fields (in all the
>>> >> >>> > activities
>>> >> >>> > flowing through the server) that you want.
>>> >> >>> >
>>> >> >>> > Example filter criteria
>>> >> >>> > * provider.id = 'xxx' // Filter on a particular provider
>>> >> >>> > * verb = 'yyy'
>>> >> >>> > * object.type = 'blogpost'
>>> >> >>> > and you'd want to accept more than one value (effectively
>>> >> >>> > creating OR
>>> >> >>>or
>>> >> >>> IN
>>> >> >>> > type clauses).
>>> >> >>> >
>>> >> >>> > For completeness, I'd want to be able to specify more than one
>>> >> >>> > filter
>>> >> >>> > expression in the same subscription.
>>> >> >>> >
>>> >> >>> > Craig
>>> >> >>>
>>> >> >
>>> >>
>>
>>

Re: Streams Subscriptions

Posted by Jason Letourneau <jl...@gmail.com>.
So a subscription URL (result of setting up a subscription) is for all
intents and purposes representative of a set of filters.  That
subscription can be told to do a variety of things for delivery to the
subscriber, but the identity of the subscription is rooted in its
filters.  Posting additional filters or output configurations to the
subscription URL affects that subscription's behavior by adding more
filters or more outputs (removal works the same way).
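That model can be sketched roughly as follows (Python, illustrative names only; the identifier scheme is one possible choice, not part of the proposal). The subscription's identity is fixed by the filter set it was created with, while later POSTs just mutate its filters and outputs:

```python
# Sketch: a subscription whose identity is rooted in its initial filter set;
# later additions of filters/outputs change behavior, not identity.

import hashlib
import json

class Subscription:
    def __init__(self, filters):
        self.filters = list(filters)
        self.outputs = []
        # Identity derived once, from the initial filter set (one possible scheme).
        canonical = json.dumps(self.filters, sort_keys=True)
        self.sub_id = hashlib.sha1(canonical.encode()).hexdigest()

    def add_filter(self, filt):
        """POST of an additional filter to the subscription URL."""
        if filt not in self.filters:
            self.filters.append(filt)

    def add_output(self, output):
        """POST of an additional output configuration to the subscription URL."""
        self.outputs.append(output)
```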


Re: Streams Subscriptions

Posted by Craig McClanahan <cr...@gmail.com>.
A couple of thoughts.

* On "outputs" you list "username" and "password" as possible fields.
  I presume that specifying these would imply using HTTP Basic auth?
  We might want to consider different options as well.

* From my (possibly myopic :-) viewpoint, the filtering and delivery
  decisions are different object types.  I'd like to be able to register
  my set of filters and get a unique identifier for them, and then
  separately be able to say "send the results of subscription 123
  to this webhook URL every 60 minutes".

* Regarding query syntax, pretty much any sort of simple patterns
  are probably not going to be sufficient for some use cases.  Maybe
  we should offer that as simple defaults, but also support falling back
  to some sort of SQL-like syntax (i.e. what JIRA does on the
  advanced search).

Craig


Re: Streams Subscriptions

Posted by Jason Letourneau <jl...@gmail.com>.
Based on Steve and Craig's feedback, I've come up with something that
I think can work.  The JSON below specifies that:
1) you can set up more than one subscription at a time
2) each subscription can have many outputs
3) each subscription can have many filters

The details of the config determine the delivery behavior of the stream
(for instance, whether activities are POSTed back to the subscriber or
the subscriber polls).  Also, all subscriptions created in this way
would be accessed through a single URL.

{
    "auth_token": "token",
    "subscriptions": [
        {
            "outputs": [
                {
                    "output_type": "http",
                    "method": "post",
                    "url": "http.example.com:8888",
                    "delivery_frequency": "60",
                    "max_size": "10485760",
                    "auth_type": "none",
                    "username": "username",
                    "password": "password"
                }
            ],
            "filters": [
                {
                    "field": "fieldname",
                    "comparison_operator": "operator",
                    "value_set": [
                        "val1",
                        "val2"
                    ]
                }
            ]
        }
    ]
}
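Purely as a sketch, a subscriber-side filter of this shape might be
evaluated against an incoming activity like the following.  The field
names ("field", "comparison_operator", "value_set") follow the JSON
above; the dotted-path lookup and the particular operator set are my
assumptions, not part of the proposal:

```javascript
// Hypothetical evaluation of one filter entry against one activity.
// Resolve dotted paths like "object.type" against a nested activity object.
function getField(activity, path) {
  return path.split('.').reduce(
    (obj, key) => (obj == null ? undefined : obj[key]), activity);
}

// value_set has OR semantics: any listed value satisfying the
// comparison makes the filter match.
function matches(activity, filter) {
  const actual = getField(activity, filter.field);
  return filter.value_set.some(expected => {
    switch (filter.comparison_operator) {
      case '=':  return actual === expected;
      case '!=': return actual !== expected;
      case '>':  return actual > expected;
      case '<':  return actual < expected;
      default:   return false; // unknown operator: fail closed
    }
  });
}
```

So `matches({object: {type: 'blogpost'}}, {field: 'object.type',
comparison_operator: '=', value_set: ['blogpost', 'note']})` would be
true, and the AND across a subscription's filters would just be
`filters.every(f => matches(activity, f))`.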

Thoughts?

Jason

On Thu, Jan 31, 2013 at 7:53 PM, Craig McClanahan <cr...@gmail.com> wrote:
> Welcome Steve!
>
> DataSift's UI to set these things up is indeed pretty cool.  I think what
> we're talking about here is more what the internal REST APIs between the UI
> and the back end might look like.
>
> I also think we should deliberately separate the filter definition of a
> "subscription" from the instructions on how the data gets delivered.  I
> could see use cases for any or all of:
> * Polling with a filter on oldest date of interest
> * Webhook that gets updated at some specified interval
> * URL to which the Streams server would periodically POST
>   new activities (in case I don't have webhooks set up)
>
> Separately, looking at DataSift is a reminder we will want to be able to
> filter on words inside an activity stream value like "subject" or
> "content", not just on the entire value.
>
> Craig
>
> On Thu, Jan 31, 2013 at 4:29 PM, Jason Letourneau
> <jl...@gmail.com>wrote:
>
>> Hi Steve - thanks for the input and congrats on your first post - I
>> think what you are describing is where Craig and I are circling around
>> (or something similar anyways) - the details on that POST request are
>> really helpful in particular.  I'll try and put something together
>> tomorrow that would be a start for the "setup" request (and subsequent
>> additional configuration after the subscription is initialized) and
>> post back to the group.
>>
>> Jason
>>
>> On Thu, Jan 31, 2013 at 7:00 PM, Steve Blackmon [W2O Digital]
>> <sb...@w2odigital.com> wrote:
>> > First post from me (btw I am Steve, stoked about this project and meeting
>> > everyone eventually.)
>> >
>> > Sorry if I missed the point of the thread, but I think this is related
>> and
>> > might be educational for some in the group.
>> >
>> > I like the way DataSift's API lets you establish streams - you POST a
>> > definition, it returns a hash, and thereafter their service follows the
>> > instructions you gave it as new messages meet the filter you defined.  In
>> > addition, once a stream exists, then you can set up listeners on that
>> > specific hash via web sockets with the hash.
>> >
>> > For example, here is how you instruct DataSift to push new messages
>> > meeting your criteria to a WebHooks end-point.
>> >
>> > curl -X POST 'https://api.datasift.com/push/create' \
>> > -d 'name=connectorhttp' \
>> > -d 'hash=dce320ce31a8919784e6e85aecbd040e' \
>> > -d 'output_type=http' \
>> > -d 'output_params.method=post' \
>> > -d 'output_params.url=http.example.com:8888' \
>> > -d 'output_params.use_gzip' \
>> > -d 'output_params.delivery_frequency=60' \
>> > -d 'output_params.max_size=10485760' \
>> > -d 'output_params.verify_ssl=false' \
>> > -d 'output_params.auth.type=none' \
>> > -d 'output_params.auth.username=YourHTTPServerUsername' \
>> > -d 'output_params.auth.password=YourHTTPServerPassword' \
>> > -H 'Auth: datasift-user:your-datasift-api-key
>> >
>> >
>> > Now new messages get pushed to me every 60 seconds, and I can get the
>> feed
>> > in real-time like this:
>> >
>> > var websocketsUser = 'datasift-user';
>> > var websocketsHost = 'websocket.datasift.com';
>> > var streamHash = 'dce320ce31a8919784e6e85aecbd040e';
>> > var apiKey = 'your-datasift-api-key';
>> >
>> >
>> > var ws = new
>> >
>> WebSocket('ws://'+websocketsHost+'/'+streamHash+'?username='+websocketsUser
>> > +'&api_key='+apiKey);
>> >
>> > ws.onopen = function(evt) {
>> >     // connection event
>> >         $("#stream").append('open: '+evt.data+'<br/>');
>> > }
>> >
>> > ws.onmessage = function(evt) {
>> >     // parse received message
>> >         $("#stream").append('message: '+evt.data+'<br/>');
>> > }
>> >
>> > ws.onclose = function(evt) {
>> >     // parse event
>> >         $("#stream").append('close: '+evt.data+'<br/>');
>> > }
>> >
>> > // No event object is passed to the event callback, so no useful
>> debugging
>> > can be done
>> > ws.onerror = function() {
>> >     // Some error occurred
>> >         $("#stream").append('error: '+evt.data+'<br/>');
>> > }
>> >
>> >
>> > At W2OGroup we have built utility libraries for receiving and processing
>> > Json object streams from data sift in Storm/Kafka that I'm interested in
>> > extending to work with Streams, and can probably commit to the project if
>> > the community would find them useful.
>> >
>> >
>> > Steve Blackmon
>> > Director, Data Sciences
>> >
>> > 101 W. 6th Street
>> > Austin, Texas 78701
>> > cell 512.965.0451 | work 512.402.6366
>> > twitter @steveblackmon
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On 1/31/13 5:45 PM, "Craig McClanahan" <cr...@gmail.com> wrote:
>> >
>> >>We'll probably want some way to do the equivalent of ">", ">=", "<",
>> "<=",
>> >>and "!=" in addition to the implicit "equal" that I assume you mean in
>> >>this
>> >>example.
>> >>
>> >>Craig
>> >>
>> >>On Thu, Jan 31, 2013 at 3:39 PM, Jason Letourneau
>> >><jl...@gmail.com>wrote:
>> >>
>> >>> I really like this - this is somewhat what I was getting at with the
>> >>> JSON object i.e. POST:
>> >>> {
>> >>> "subscriptions":
>> >>> [{"activityField":"value"},
>> >>> {"activityField":"value",
>> >>>  "anotherActivityField":"value" }
>> >>> ]
>> >>> }
>> >>>
>> >>> On Thu, Jan 31, 2013 at 4:32 PM, Craig McClanahan <cr...@gmail.com>
>> >>> wrote:
>> >>> > On Thu, Jan 31, 2013 at 12:00 PM, Jason Letourneau
>> >>> > <jl...@gmail.com>wrote:
>> >>> >
>> >>> >> I am curious on the group's thinking about subscriptions to activity
>> >>> >> streams.  As I am stubbing out the end-to-end heartbeat on my
>> >>>proposed
>> >>> >> architecture, I've just been working with URL sources as the
>> >>> >> subscription mode.  Obviously this is a way over-simplification.
>> >>> >>
>> >>> >> I know for shindig the social graph can be used, but we don't
>> >>> >> necessarily have that.  Considering the mechanism for establishing a
>> >>> >> new subscription stream (defined as aggregated individual activities
>> >>> >> pulled from a varying array of sources) is POSTing to the Activity
>> >>> >> Streams server to establish the channel (currently just a
>> >>> >> subscriptions=url1,url2,url3 is the over simplified
>> mechanism)...what
>> >>> >> would people see as a reasonable way to establish subscriptions?
>> >>>List
>> >>> >> of userIds? Subjects?  How should these be represented?  I was
>> >>> >> thinking of a JSON object, but any one have other thoughts?
>> >>> >>
>> >>> >> Jason
>> >>> >>
>> >>> >
>> >>> > One idea would be take some inspiration from how JIRA lets you (in
>> >>> effect)
>> >>> > create a WHERE clause that looks at any fields (in all the activities
>> >>> > flowing through the server) that you want.
>> >>> >
>> >>> > Example filter criteria
>> >>> > * provider.id = 'xxx' // Filter on a particular provider
>> >>> > * verb = 'yyy'
>> >>> > * object.type = 'blogpost'
>> >>> > and you'd want to accept more than one value (effectively creating OR
>> >>>or
>> >>> IN
>> >>> > type clauses).
>> >>> >
>> >>> > For completeness, I'd want to be able to specify more than one filter
>> >>> > expression in the same subscription.
>> >>> >
>> >>> > Craig
>> >>>
>> >
>>

Re: Streams Subscriptions

Posted by Craig McClanahan <cr...@gmail.com>.
Welcome Steve!

DataSift's UI to set these things up is indeed pretty cool.  I think what
we're talking about here is more what the internal REST APIs between the UI
and the back end might look like.

I also think we should deliberately separate the filter definition of a
"subscription" from the instructions on how the data gets delivered.  I
could see use cases for any or all of:
* Polling with a filter on oldest date of interest
* Webhook that gets updated at some specified interval
* URL to which the Streams server would periodically POST
  new activities (in case I don't have webhooks set up)

Separately, looking at DataSift is a reminder we will want to be able to
filter on words inside an activity stream value like "subject" or
"content", not just on the entire value.
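As a rough illustration of that word-level case (the tokenization here
is an assumption; a real implementation would more likely delegate to a
search library such as Lucene):

```javascript
// Hypothetical word-level match on a string field such as "content",
// as opposed to whole-value equality.
function containsWord(activity, field, word) {
  const value = activity[field];
  if (typeof value !== 'string') return false;
  // Split on non-word characters so punctuation does not block a match.
  return value.toLowerCase()
              .split(/\W+/)
              .includes(word.toLowerCase());
}
```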

Craig

On Thu, Jan 31, 2013 at 4:29 PM, Jason Letourneau
<jl...@gmail.com>wrote:

> Hi Steve - thanks for the input and congrats on your first post - I
> think what you are describing is where Craig and I are circling around
> (or something similar anyways) - the details on that POST request are
> really helpful in particular.  I'll try and put something together
> tomorrow that would be a start for the "setup" request (and subsequent
> additional configuration after the subscription is initialized) and
> post back to the group.
>
> Jason
>
> On Thu, Jan 31, 2013 at 7:00 PM, Steve Blackmon [W2O Digital]
> <sb...@w2odigital.com> wrote:
> > First post from me (btw I am Steve, stoked about this project and meeting
> > everyone eventually.)
> >
> > Sorry if I missed the point of the thread, but I think this is related
> and
> > might be educational for some in the group.
> >
> > I like the way DataSift's API lets you establish streams - you POST a
> > definition, it returns a hash, and thereafter their service follows the
> > instructions you gave it as new messages meet the filter you defined.  In
> > addition, once a stream exists, then you can set up listeners on that
> > specific hash via web sockets with the hash.
> >
> > For example, here is how you instruct DataSift to push new messages
> > meeting your criteria to a WebHooks end-point.
> >
> > curl -X POST 'https://api.datasift.com/push/create' \
> > -d 'name=connectorhttp' \
> > -d 'hash=dce320ce31a8919784e6e85aecbd040e' \
> > -d 'output_type=http' \
> > -d 'output_params.method=post' \
> > -d 'output_params.url=http.example.com:8888' \
> > -d 'output_params.use_gzip' \
> > -d 'output_params.delivery_frequency=60' \
> > -d 'output_params.max_size=10485760' \
> > -d 'output_params.verify_ssl=false' \
> > -d 'output_params.auth.type=none' \
> > -d 'output_params.auth.username=YourHTTPServerUsername' \
> > -d 'output_params.auth.password=YourHTTPServerPassword' \
> > -H 'Auth: datasift-user:your-datasift-api-key
> >
> >
> > Now new messages get pushed to me every 60 seconds, and I can get the
> feed
> > in real-time like this:
> >
> > var websocketsUser = 'datasift-user';
> > var websocketsHost = 'websocket.datasift.com';
> > var streamHash = 'dce320ce31a8919784e6e85aecbd040e';
> > var apiKey = 'your-datasift-api-key';
> >
> >
> > var ws = new
> >
> WebSocket('ws://'+websocketsHost+'/'+streamHash+'?username='+websocketsUser
> > +'&api_key='+apiKey);
> >
> > ws.onopen = function(evt) {
> >     // connection event
> >         $("#stream").append('open: '+evt.data+'<br/>');
> > }
> >
> > ws.onmessage = function(evt) {
> >     // parse received message
> >         $("#stream").append('message: '+evt.data+'<br/>');
> > }
> >
> > ws.onclose = function(evt) {
> >     // parse event
> >         $("#stream").append('close: '+evt.data+'<br/>');
> > }
> >
> > // No event object is passed to the event callback, so no useful
> debugging
> > can be done
> > ws.onerror = function() {
> >     // Some error occurred
> >         $("#stream").append('error: '+evt.data+'<br/>');
> > }
> >
> >
> > At W2OGroup we have built utility libraries for receiving and processing
> > Json object streams from data sift in Storm/Kafka that I'm interested in
> > extending to work with Streams, and can probably commit to the project if
> > the community would find them useful.
> >
> >
> > Steve Blackmon
> > Director, Data Sciences
> >
> > 101 W. 6th Street
> > Austin, Texas 78701
> > cell 512.965.0451 | work 512.402.6366
> > twitter @steveblackmon
> >
> >
> >
> >
> >
> >
> >
> > On 1/31/13 5:45 PM, "Craig McClanahan" <cr...@gmail.com> wrote:
> >
> >>We'll probably want some way to do the equivalent of ">", ">=", "<",
> "<=",
> >>and "!=" in addition to the implicit "equal" that I assume you mean in
> >>this
> >>example.
> >>
> >>Craig
> >>
> >>On Thu, Jan 31, 2013 at 3:39 PM, Jason Letourneau
> >><jl...@gmail.com>wrote:
> >>
> >>> I really like this - this is somewhat what I was getting at with the
> >>> JSON object i.e. POST:
> >>> {
> >>> "subscriptions":
> >>> [{"activityField":"value"},
> >>> {"activityField":"value",
> >>>  "anotherActivityField":"value" }
> >>> ]
> >>> }
> >>>
> >>> On Thu, Jan 31, 2013 at 4:32 PM, Craig McClanahan <cr...@gmail.com>
> >>> wrote:
> >>> > On Thu, Jan 31, 2013 at 12:00 PM, Jason Letourneau
> >>> > <jl...@gmail.com>wrote:
> >>> >
> >>> >> I am curious on the group's thinking about subscriptions to activity
> >>> >> streams.  As I am stubbing out the end-to-end heartbeat on my
> >>>proposed
> >>> >> architecture, I've just been working with URL sources as the
> >>> >> subscription mode.  Obviously this is a way over-simplification.
> >>> >>
> >>> >> I know for shindig the social graph can be used, but we don't
> >>> >> necessarily have that.  Considering the mechanism for establishing a
> >>> >> new subscription stream (defined as aggregated individual activities
> >>> >> pulled from a varying array of sources) is POSTing to the Activity
> >>> >> Streams server to establish the channel (currently just a
> >>> >> subscriptions=url1,url2,url3 is the over simplified
> mechanism)...what
> >>> >> would people see as a reasonable way to establish subscriptions?
> >>>List
> >>> >> of userIds? Subjects?  How should these be represented?  I was
> >>> >> thinking of a JSON object, but any one have other thoughts?
> >>> >>
> >>> >> Jason
> >>> >>
> >>> >
> >>> > One idea would be take some inspiration from how JIRA lets you (in
> >>> effect)
> >>> > create a WHERE clause that looks at any fields (in all the activities
> >>> > flowing through the server) that you want.
> >>> >
> >>> > Example filter criteria
> >>> > * provider.id = 'xxx' // Filter on a particular provider
> >>> > * verb = 'yyy'
> >>> > * object.type = 'blogpost'
> >>> > and you'd want to accept more than one value (effectively creating OR
> >>>or
> >>> IN
> >>> > type clauses).
> >>> >
> >>> > For completeness, I'd want to be able to specify more than one filter
> >>> > expression in the same subscription.
> >>> >
> >>> > Craig
> >>>
> >
>

Re: Streams Subscriptions

Posted by Jason Letourneau <jl...@gmail.com>.
Hi Steve - thanks for the input and congrats on your first post - I
think what you are describing is where Craig and I are circling around
(or something similar anyways) - the details on that POST request are
really helpful in particular.  I'll try and put something together
tomorrow that would be a start for the "setup" request (and subsequent
additional configuration after the subscription is initialized) and
post back to the group.

Jason

On Thu, Jan 31, 2013 at 7:00 PM, Steve Blackmon [W2O Digital]
<sb...@w2odigital.com> wrote:
> First post from me (btw I am Steve, stoked about this project and meeting
> everyone eventually.)
>
> Sorry if I missed the point of the thread, but I think this is related and
> might be educational for some in the group.
>
> I like the way DataSift's API lets you establish streams - you POST a
> definition, it returns a hash, and thereafter their service follows the
> instructions you gave it as new messages meet the filter you defined.  In
> addition, once a stream exists, then you can set up listeners on that
> specific hash via web sockets with the hash.
>
> For example, here is how you instruct DataSift to push new messages
> meeting your criteria to a WebHooks end-point.
>
> curl -X POST 'https://api.datasift.com/push/create' \
> -d 'name=connectorhttp' \
> -d 'hash=dce320ce31a8919784e6e85aecbd040e' \
> -d 'output_type=http' \
> -d 'output_params.method=post' \
> -d 'output_params.url=http.example.com:8888' \
> -d 'output_params.use_gzip' \
> -d 'output_params.delivery_frequency=60' \
> -d 'output_params.max_size=10485760' \
> -d 'output_params.verify_ssl=false' \
> -d 'output_params.auth.type=none' \
> -d 'output_params.auth.username=YourHTTPServerUsername' \
> -d 'output_params.auth.password=YourHTTPServerPassword' \
> -H 'Auth: datasift-user:your-datasift-api-key
>
>
> Now new messages get pushed to me every 60 seconds, and I can get the feed
> in real-time like this:
>
> var websocketsUser = 'datasift-user';
> var websocketsHost = 'websocket.datasift.com';
> var streamHash = 'dce320ce31a8919784e6e85aecbd040e';
> var apiKey = 'your-datasift-api-key';
>
>
> var ws = new
> WebSocket('ws://'+websocketsHost+'/'+streamHash+'?username='+websocketsUser
> +'&api_key='+apiKey);
>
> ws.onopen = function(evt) {
>     // connection event
>         $("#stream").append('open: '+evt.data+'<br/>');
> }
>
> ws.onmessage = function(evt) {
>     // parse received message
>         $("#stream").append('message: '+evt.data+'<br/>');
> }
>
> ws.onclose = function(evt) {
>     // parse event
>         $("#stream").append('close: '+evt.data+'<br/>');
> }
>
> // No event object is passed to the event callback, so no useful debugging
> can be done
> ws.onerror = function() {
>     // Some error occurred
>         $("#stream").append('error: '+evt.data+'<br/>');
> }
>
>
> At W2OGroup we have built utility libraries for receiving and processing
> Json object streams from data sift in Storm/Kafka that I'm interested in
> extending to work with Streams, and can probably commit to the project if
> the community would find them useful.
>
>
> Steve Blackmon
> Director, Data Sciences
>
> 101 W. 6th Street
> Austin, Texas 78701
> cell 512.965.0451 | work 512.402.6366
> twitter @steveblackmon
>
>
>
>
>
>
>
> On 1/31/13 5:45 PM, "Craig McClanahan" <cr...@gmail.com> wrote:
>
>>We'll probably want some way to do the equivalent of ">", ">=", "<", "<=",
>>and "!=" in addition to the implicit "equal" that I assume you mean in
>>this
>>example.
>>
>>Craig
>>
>>On Thu, Jan 31, 2013 at 3:39 PM, Jason Letourneau
>><jl...@gmail.com>wrote:
>>
>>> I really like this - this is somewhat what I was getting at with the
>>> JSON object i.e. POST:
>>> {
>>> "subscriptions":
>>> [{"activityField":"value"},
>>> {"activityField":"value",
>>>  "anotherActivityField":"value" }
>>> ]
>>> }
>>>
>>> On Thu, Jan 31, 2013 at 4:32 PM, Craig McClanahan <cr...@gmail.com>
>>> wrote:
>>> > On Thu, Jan 31, 2013 at 12:00 PM, Jason Letourneau
>>> > <jl...@gmail.com>wrote:
>>> >
>>> >> I am curious on the group's thinking about subscriptions to activity
>>> >> streams.  As I am stubbing out the end-to-end heartbeat on my
>>>proposed
>>> >> architecture, I've just been working with URL sources as the
>>> >> subscription mode.  Obviously this is a way over-simplification.
>>> >>
>>> >> I know for shindig the social graph can be used, but we don't
>>> >> necessarily have that.  Considering the mechanism for establishing a
>>> >> new subscription stream (defined as aggregated individual activities
>>> >> pulled from a varying array of sources) is POSTing to the Activity
>>> >> Streams server to establish the channel (currently just a
>>> >> subscriptions=url1,url2,url3 is the over simplified mechanism)...what
>>> >> would people see as a reasonable way to establish subscriptions?
>>>List
>>> >> of userIds? Subjects?  How should these be represented?  I was
>>> >> thinking of a JSON object, but any one have other thoughts?
>>> >>
>>> >> Jason
>>> >>
>>> >
>>> > One idea would be take some inspiration from how JIRA lets you (in
>>> effect)
>>> > create a WHERE clause that looks at any fields (in all the activities
>>> > flowing through the server) that you want.
>>> >
>>> > Example filter criteria
>>> > * provider.id = 'xxx' // Filter on a particular provider
>>> > * verb = 'yyy'
>>> > * object.type = 'blogpost'
>>> > and you'd want to accept more than one value (effectively creating OR
>>>or
>>> IN
>>> > type clauses).
>>> >
>>> > For completeness, I'd want to be able to specify more than one filter
>>> > expression in the same subscription.
>>> >
>>> > Craig
>>>
>

Re: Streams Subscriptions

Posted by Jason Letourneau <jl...@gmail.com>.
Great points

Jason

On Thu, Jan 31, 2013 at 6:45 PM, Craig McClanahan <cr...@gmail.com> wrote:
> We'll probably want some way to do the equivalent of ">", ">=", "<", "<=",
> and "!=" in addition to the implicit "equal" that I assume you mean in this
> example.
>
> Craig
>
>
> On Thu, Jan 31, 2013 at 3:39 PM, Jason Letourneau <jl...@gmail.com>
> wrote:
>>
>> I really like this - this is somewhat what I was getting at with the
>> JSON object i.e. POST:
>> {
>> "subscriptions":
>> [{"activityField":"value"},
>> {"activityField":"value",
>>  "anotherActivityField":"value" }
>> ]
>> }
>>
>> On Thu, Jan 31, 2013 at 4:32 PM, Craig McClanahan <cr...@gmail.com>
>> wrote:
>> > On Thu, Jan 31, 2013 at 12:00 PM, Jason Letourneau
>> > <jl...@gmail.com>wrote:
>> >
>> >> I am curious on the group's thinking about subscriptions to activity
>> >> streams.  As I am stubbing out the end-to-end heartbeat on my proposed
>> >> architecture, I've just been working with URL sources as the
>> >> subscription mode.  Obviously this is a way over-simplification.
>> >>
>> >> I know for shindig the social graph can be used, but we don't
>> >> necessarily have that.  Considering the mechanism for establishing a
>> >> new subscription stream (defined as aggregated individual activities
>> >> pulled from a varying array of sources) is POSTing to the Activity
>> >> Streams server to establish the channel (currently just a
>> >> subscriptions=url1,url2,url3 is the over simplified mechanism)...what
>> >> would people see as a reasonable way to establish subscriptions?  List
>> >> of userIds? Subjects?  How should these be represented?  I was
>> >> thinking of a JSON object, but any one have other thoughts?
>> >>
>> >> Jason
>> >>
>> >
>> > One idea would be take some inspiration from how JIRA lets you (in
>> > effect)
>> > create a WHERE clause that looks at any fields (in all the activities
>> > flowing through the server) that you want.
>> >
>> > Example filter criteria
>> > * provider.id = 'xxx' // Filter on a particular provider
>> > * verb = 'yyy'
>> > * object.type = 'blogpost'
>> > and you'd want to accept more than one value (effectively creating OR or
>> > IN
>> > type clauses).
>> >
>> > For completeness, I'd want to be able to specify more than one filter
>> > expression in the same subscription.
>> >
>> > Craig
>
>

Re: Streams Subscriptions

Posted by "Steve Blackmon [W2O Digital]" <sb...@w2odigital.com>.
First post from me (btw I am Steve, stoked about this project and meeting
everyone eventually.)

Sorry if I missed the point of the thread, but I think this is related and
might be educational for some in the group.

I like the way DataSift's API lets you establish streams - you POST a
definition, it returns a hash, and thereafter their service follows the
instructions you gave it as new messages meet the filter you defined.  In
addition, once a stream exists, then you can set up listeners on that
specific hash via web sockets with the hash.

For example, here is how you instruct DataSift to push new messages
meeting your criteria to a WebHooks end-point.

curl -X POST 'https://api.datasift.com/push/create' \
-d 'name=connectorhttp' \
-d 'hash=dce320ce31a8919784e6e85aecbd040e' \
-d 'output_type=http' \
-d 'output_params.method=post' \
-d 'output_params.url=http.example.com:8888' \
-d 'output_params.use_gzip' \
-d 'output_params.delivery_frequency=60' \
-d 'output_params.max_size=10485760' \
-d 'output_params.verify_ssl=false' \
-d 'output_params.auth.type=none' \
-d 'output_params.auth.username=YourHTTPServerUsername' \
-d 'output_params.auth.password=YourHTTPServerPassword' \
-H 'Auth: datasift-user:your-datasift-api-key'


Now new messages get pushed to me every 60 seconds, and I can get the feed
in real-time like this:

var websocketsUser = 'datasift-user';
var websocketsHost = 'websocket.datasift.com';
var streamHash = 'dce320ce31a8919784e6e85aecbd040e';
var apiKey = 'your-datasift-api-key';


var ws = new 
WebSocket('ws://'+websocketsHost+'/'+streamHash+'?username='+websocketsUser
+'&api_key='+apiKey);
 
ws.onopen = function(evt) {
    // connection event
	$("#stream").append('open: '+evt.data+'<br/>');
}
 
ws.onmessage = function(evt) {
    // parse received message
	$("#stream").append('message: '+evt.data+'<br/>');
}
 
ws.onclose = function(evt) {
    // parse event
	$("#stream").append('close: '+evt.data+'<br/>');
}
 
// No event object is passed to the onerror callback, so no useful
// debugging can be done
ws.onerror = function() {
    // Some error occurred
	$("#stream").append('error<br/>');
}


At W2OGroup we have built utility libraries for receiving and processing
JSON object streams from DataSift in Storm/Kafka that I'm interested in
extending to work with Streams, and can probably commit to the project if
the community would find them useful.


Steve Blackmon
Director, Data Sciences

101 W. 6th Street
Austin, Texas 78701
cell 512.965.0451 | work 512.402.6366
twitter @steveblackmon







On 1/31/13 5:45 PM, "Craig McClanahan" <cr...@gmail.com> wrote:

>We'll probably want some way to do the equivalent of ">", ">=", "<", "<=",
>and "!=" in addition to the implicit "equal" that I assume you mean in
>this
>example.
>
>Craig
>
>On Thu, Jan 31, 2013 at 3:39 PM, Jason Letourneau
><jl...@gmail.com>wrote:
>
>> I really like this - this is somewhat what I was getting at with the
>> JSON object i.e. POST:
>> {
>> "subscriptions":
>> [{"activityField":"value"},
>> {"activityField":"value",
>>  "anotherActivityField":"value" }
>> ]
>> }
>>
>> On Thu, Jan 31, 2013 at 4:32 PM, Craig McClanahan <cr...@gmail.com>
>> wrote:
>> > On Thu, Jan 31, 2013 at 12:00 PM, Jason Letourneau
>> > <jl...@gmail.com>wrote:
>> >
>> >> I am curious on the group's thinking about subscriptions to activity
>> >> streams.  As I am stubbing out the end-to-end heartbeat on my
>>proposed
>> >> architecture, I've just been working with URL sources as the
>> >> subscription mode.  Obviously this is a way over-simplification.
>> >>
>> >> I know for shindig the social graph can be used, but we don't
>> >> necessarily have that.  Considering the mechanism for establishing a
>> >> new subscription stream (defined as aggregated individual activities
>> >> pulled from a varying array of sources) is POSTing to the Activity
>> >> Streams server to establish the channel (currently just a
>> >> subscriptions=url1,url2,url3 is the over simplified mechanism)...what
>> >> would people see as a reasonable way to establish subscriptions?
>>List
>> >> of userIds? Subjects?  How should these be represented?  I was
>> >> thinking of a JSON object, but any one have other thoughts?
>> >>
>> >> Jason
>> >>
>> >
>> > One idea would be take some inspiration from how JIRA lets you (in
>> effect)
>> > create a WHERE clause that looks at any fields (in all the activities
>> > flowing through the server) that you want.
>> >
>> > Example filter criteria
>> > * provider.id = 'xxx' // Filter on a particular provider
>> > * verb = 'yyy'
>> > * object.type = 'blogpost'
>> > and you'd want to accept more than one value (effectively creating OR
>>or
>> IN
>> > type clauses).
>> >
>> > For completeness, I'd want to be able to specify more than one filter
>> > expression in the same subscription.
>> >
>> > Craig
>>


Re: Streams Subscriptions

Posted by Craig McClanahan <cr...@gmail.com>.
We'll probably want some way to do the equivalent of ">", ">=", "<", "<=",
and "!=" in addition to the implicit "equal" that I assume you mean in this
example.

Craig

On Thu, Jan 31, 2013 at 3:39 PM, Jason Letourneau
<jl...@gmail.com>wrote:

> I really like this - this is somewhat what I was getting at with the
> JSON object i.e. POST:
> {
> "subscriptions":
> [{"activityField":"value"},
> {"activityField":"value",
>  "anotherActivityField":"value" }
> ]
> }
>
> On Thu, Jan 31, 2013 at 4:32 PM, Craig McClanahan <cr...@gmail.com>
> wrote:
> > On Thu, Jan 31, 2013 at 12:00 PM, Jason Letourneau
> > <jl...@gmail.com>wrote:
> >
> >> I am curious on the group's thinking about subscriptions to activity
> >> streams.  As I am stubbing out the end-to-end heartbeat on my proposed
> >> architecture, I've just been working with URL sources as the
> >> subscription mode.  Obviously this is a way over-simplification.
> >>
> >> I know for shindig the social graph can be used, but we don't
> >> necessarily have that.  Considering the mechanism for establishing a
> >> new subscription stream (defined as aggregated individual activities
> >> pulled from a varying array of sources) is POSTing to the Activity
> >> Streams server to establish the channel (currently just a
> >> subscriptions=url1,url2,url3 is the over simplified mechanism)...what
> >> would people see as a reasonable way to establish subscriptions?  List
> >> of userIds? Subjects?  How should these be represented?  I was
> >> thinking of a JSON object, but any one have other thoughts?
> >>
> >> Jason
> >>
> >
> > One idea would be take some inspiration from how JIRA lets you (in
> effect)
> > create a WHERE clause that looks at any fields (in all the activities
> > flowing through the server) that you want.
> >
> > Example filter criteria
> > * provider.id = 'xxx' // Filter on a particular provider
> > * verb = 'yyy'
> > * object.type = 'blogpost'
> > and you'd want to accept more than one value (effectively creating OR or
> IN
> > type clauses).
> >
> > For completeness, I'd want to be able to specify more than one filter
> > expression in the same subscription.
> >
> > Craig
>

Re: Streams Subscriptions

Posted by Jason Letourneau <jl...@gmail.com>.
I really like this - this is somewhat what I was getting at with the
JSON object i.e. POST:
{
"subscriptions":
[{"activityField":"value"},
{"activityField":"value",
 "anotherActivityField":"value" }
]
}

On Thu, Jan 31, 2013 at 4:32 PM, Craig McClanahan <cr...@gmail.com> wrote:
> On Thu, Jan 31, 2013 at 12:00 PM, Jason Letourneau
> <jl...@gmail.com>wrote:
>
>> I am curious on the group's thinking about subscriptions to activity
>> streams.  As I am stubbing out the end-to-end heartbeat on my proposed
>> architecture, I've just been working with URL sources as the
>> subscription mode.  Obviously this is a way over-simplification.
>>
>> I know for shindig the social graph can be used, but we don't
>> necessarily have that.  Considering the mechanism for establishing a
>> new subscription stream (defined as aggregated individual activities
>> pulled from a varying array of sources) is POSTing to the Activity
>> Streams server to establish the channel (currently just a
>> subscriptions=url1,url2,url3 is the over simplified mechanism)...what
>> would people see as a reasonable way to establish subscriptions?  List
>> of userIds? Subjects?  How should these be represented?  I was
>> thinking of a JSON object, but does anyone have other thoughts?
>>
>> Jason
>>
>
> One idea would be to take some inspiration from how JIRA lets you (in effect)
> create a WHERE clause that looks at any fields (in all the activities
> flowing through the server) that you want.
>
> Example filter criteria
> * provider.id = 'xxx' // Filter on a particular provider
> * verb = 'yyy'
> * object.type = 'blogpost'
> and you'd want to accept more than one value (effectively creating OR or IN
> type clauses).
>
> For completeness, I'd want to be able to specify more than one filter
> expression in the same subscription.
>
> Craig

Re: Streams Subscriptions

Posted by Craig McClanahan <cr...@gmail.com>.
On Thu, Jan 31, 2013 at 12:00 PM, Jason Letourneau
<jl...@gmail.com>wrote:

> I am curious on the group's thinking about subscriptions to activity
> streams.  As I am stubbing out the end-to-end heartbeat on my proposed
> architecture, I've just been working with URL sources as the
> subscription mode.  Obviously this is a way over-simplification.
>
> I know for shindig the social graph can be used, but we don't
> necessarily have that.  Considering the mechanism for establishing a
> new subscription stream (defined as aggregated individual activities
> pulled from a varying array of sources) is POSTing to the Activity
> Streams server to establish the channel (currently just a
> subscriptions=url1,url2,url3 is the over simplified mechanism)...what
> would people see as a reasonable way to establish subscriptions?  List
> of userIds? Subjects?  How should these be represented?  I was
> thinking of a JSON object, but does anyone have other thoughts?
>
> Jason
>

One idea would be to take some inspiration from how JIRA lets you (in effect)
create a WHERE clause that looks at any fields (in all the activities
flowing through the server) that you want.

Example filter criteria
* provider.id = 'xxx' // Filter on a particular provider
* verb = 'yyy'
* object.type = 'blogpost'
and you'd want to accept more than one value (effectively creating OR or IN
type clauses).

For completeness, I'd want to be able to specify more than one filter
expression in the same subscription.

Craig
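To make the OR/IN semantics above concrete, here is a rough stdlib-only
Java sketch (all names hypothetical): an activity matches a subscription
when ANY filter expression matches, and an expression matches when EVERY
field it names holds one of that field's accepted values.

```java
import java.util.List;
import java.util.Map;

// Sketch of the proposed subscription filter: a subscription is a list of
// filter expressions; each expression maps a flattened activity field path
// (e.g. "provider.id") to the set of values it accepts.
public class SubscriptionFilter {

    // true if the flattened activity satisfies at least one expression
    // (OR across expressions, AND within an expression, IN across values)
    static boolean matches(Map<String, String> activity,
                           List<Map<String, List<String>>> expressions) {
        for (Map<String, List<String>> expr : expressions) {
            boolean allFieldsMatch = true;
            for (Map.Entry<String, List<String>> clause : expr.entrySet()) {
                String actual = activity.get(clause.getKey());
                if (actual == null || !clause.getValue().contains(actual)) {
                    allFieldsMatch = false;
                    break;
                }
            }
            if (allFieldsMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> subscription = List.of(
            Map.of("provider.id", List.of("xxx")),
            Map.of("verb", List.of("post", "share"),
                   "object.type", List.of("blogpost")));

        Map<String, String> activity = Map.of(
            "provider.id", "yyy",
            "verb", "share",
            "object.type", "blogpost");

        // second expression matches (verb IN (post, share) AND
        // object.type = blogpost), so the activity passes
        System.out.println(matches(activity, subscription)); // true
    }
}
```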