Posted to dev@community.apache.org by sblackmon <sb...@apache.org> on 2016/11/21 17:48:26 UTC

Social Media Metrics using Apache stack

Hello ComDev,

The Streams podling has been brainstorming ways to increase awareness of the project and its capabilities. We’ve also been working to make it easier to get started as a user without first having to download the JDK, Maven, and friends. Using the software to benefit the Foundation seems like a good thing to try.

One use case for Streams is to build personal or organizational datasets of social media profiles and content for internal development and analysis, using the technologies and tools you and your organization prefer, rather than those provided by the upstream system.

I took the liberty of creating a few Zeppelin notebooks that collect Apache project profiles and posts, normalize them to Activity Streams format, and interact with them using Spark DataFrames.

The notebooks are currently hosted in my ZeppelinHub account, and anyone with the links below can access them.

https://www.zeppelinhub.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC84YjQ5YmY3MWIxYTU0ZTE2YjlkMDQyMTliMzNlMjQzYS9ub3RlLmpzb24

https://www.zeppelinhub.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC9lNzQzZjRkZGVkMGY0YjA3YTkzZTQ2NWFkYjU2ZTQxOS9ub3RlLmpzb24

https://www.zeppelinhub.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC8zZmQ3M2Y1OWEzOGE0YmM2YjFkMGM4MzBkNTczZDU0Mi9ub3RlLmpzb24

If this group sees potential benefit, I’d be happy to work to set them up for use by anyone at Apache in a dedicated Zeppelin deployment and take the lead on maintaining them going forward.

In any case, we’d appreciate any feedback on what would make this prototype more valuable.

Background on Streams:

Apache Streams (incubating) unifies a diverse world of digital profiles and online activities into common formats and vocabularies, and makes these datasets accessible across a variety of databases, devices, and platforms for streaming, browsing, search, sharing, and analytics use-cases.

Streams contains libraries and patterns for specifying, publishing, and inter-linking schemas, and assists with conversion of activities (posts, shares, likes, follows, etc.) and objects (profiles, pages, photos, videos, etc.) between the representation, format, and encoding preferred by supported data providers (Twitter, Instagram, etc.) and storage services (Cassandra, Elasticsearch, HBase, HDFS, Neo4j, etc.).

In theory, pretty much any JSON or XML API that uses a “look-up by ID and type” model can be coerced into collections of Activity Streams-normalized profiles and posts. Systems such as GitHub, JIRA, and Meetup could be added to the roadmap and have notebooks created once those providers are built.
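
To make the "look-up by ID and type" idea concrete, here is a minimal Python sketch. It is not the Streams implementation: the in-memory FAKE_API, the lookup/to_actor/to_activity helpers, the "example" provider name, and the "id:provider:type:id" identifier scheme are all hypothetical and chosen only for illustration. The output field names (verb, published, actor, object, objectType, displayName) follow the Activity Streams 1.0 JSON vocabulary.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for an upstream API that supports look-up by
# (type, ID). A real provider would issue HTTP requests instead.
FAKE_API = {
    ("user", "42"): {"id": "42", "name": "Apache Example"},
    ("post", "1001"): {
        "id": "1001",
        "user_id": "42",
        "text": "Release 1.0 is out",
        "created": 1479750506,  # seconds since the Unix epoch
    },
}


def lookup(kind, item_id):
    """Return the raw provider document for the given type and ID."""
    return FAKE_API[(kind, item_id)]


def to_actor(provider, raw_user):
    """Normalize a raw profile into an Activity Streams-style actor."""
    return {
        "objectType": "person",
        "id": f"id:{provider}:user:{raw_user['id']}",  # assumed ID scheme
        "displayName": raw_user["name"],
    }


def to_activity(provider, raw_post):
    """Normalize a raw post into an Activity Streams-style activity."""
    author = lookup("user", raw_post["user_id"])
    published = datetime.fromtimestamp(raw_post["created"], tz=timezone.utc)
    return {
        "id": f"id:{provider}:post:{raw_post['id']}",  # assumed ID scheme
        "verb": "post",
        "published": published.isoformat(),
        "actor": to_actor(provider, author),
        "object": {
            "objectType": "note",
            "id": f"id:{provider}:note:{raw_post['id']}",
            "content": raw_post["text"],
        },
    }


activity = to_activity("example", lookup("post", "1001"))
```

Because every provider is reduced to the same two operations (look up a document, map it to an actor or activity), the downstream analysis code only ever sees one shape of record, whichever system the data came from.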

Re: Social Media Metrics using Apache stack

Posted by Franco Perruna <fr...@gmail.com>.
Yes Yes

On 15.12.2016 at 11:41 PM, "sblackmon" <sb...@apache.org> wrote:

> Hello,
>
> Writing to let you all know I’ve added a Google+ notebook that collects
> Apache-related profiles and posts:
>
> https://www.zeppelinhub.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC84ZGY0NzQwNWNhMGY0Mjg0OGFlZmJmNTgxMjI5ZjhlZi9ub3RlLmpzb24
>
> Just checked and the Twitter, Facebook, and YouTube notebooks are still
> published.
>
> There are several people on the Streams list who know how to update these,
> whether that means just refreshing the statistics or altering the code.
>
> We have providers for github.com and meetup.com on our roadmap, both of
> which could be interesting to this group.
>
> If there are other third-party APIs where metrics about Apache projects
> and/or contributors are stored, we’d be happy to add those to the roadmap
> as well.
>
> Best,
> Steve

Re: Social Media Metrics using Apache stack

Posted by sblackmon <sb...@apache.org>.
Hello,

Writing to let you all know I’ve added a Google+ notebook that collects Apache-related profiles and posts:

https://www.zeppelinhub.com/viewer/notebooks/bm90ZTovL3N0ZXZlYmxhY2ttb24vYXBhY2hlLXplcHBlbGluLWRhc2hib2FyZC84ZGY0NzQwNWNhMGY0Mjg0OGFlZmJmNTgxMjI5ZjhlZi9ub3RlLmpzb24

Just checked and the Twitter, Facebook, and YouTube notebooks are still published.  

There are several people on the Streams list who know how to update these, whether that means just refreshing the statistics or altering the code.

We have providers for github.com and meetup.com on our roadmap, both of which could be interesting to this group.

If there are other third-party APIs where metrics about Apache projects and/or contributors are stored, we’d be happy to add those to the roadmap as well.

Best,
Steve

Re: Social Media Metrics using Apache stack

Posted by Franco Perruna <fr...@gmail.com>.

Re: Social Media Metrics using Apache stack

Posted by Rimon Chowdhury <ri...@gmail.com>.