Posted to dev@metron.apache.org by Bryan Taylor <bt...@rackspace.com> on 2015/12/09 18:52:28 UTC

Hello

Hi Folks,

I just joined the list and thought I'd say "hi". I work at Rackspace and will be joining Andrew Hartnett's team there and hacking on Metron. This is my first ASF project and I'm looking forward to being part of this community.

I'm curious what the development vision is for Metron. What do people like and dislike about the codebase? I gather this code transitioned from a Cisco internal project and is incubating now at the ASF. Are there any code changes that need to be made to support incubation?

Bryan

Re: Hello

Posted by "Marc Solanas Tarre (msolanas)" <ms...@cisco.com>.
Hey!
Also from Cisco, worked with Debo implementing a similar pipeline and looking forward to contributing back to Metron.

-M.




On 12/9/15, 11:02 AM, "Debo Dutta (dedutta)" <de...@cisco.com> wrote:

>Another hello: Am from Cisco and I have interacted with James Sirota and
>the team on OpenSOC in Summer 2014 when we (a small team) were building a
>similar pipeline for our ops. Now that Metron has incubated, we would like
>to contribute back.
>
>debo
>
>On 12/9/15, 9:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>
>>Hi Folks,
>>
>>I just joined the list and thought I'd say "hi". I work at Rackspace and
>>will be joining Andrew Hartnett's team there and hacking on metron. This
>>is my first ASF project and I'm looking forward to being part of this
>>community.
>>
>>I'm curious what the development vision is for metron. What do people
>>like and not like about the codebase? I gather this code transitioned
>>from a Cisco internal project and is incubating now at the ASF. Are there
>>any code changes that need to be made to support incubation?
>>
>>Bryan
>

Re: Hello

Posted by "Johnu George (johnugeo)" <jo...@cisco.com>.
Hey All,
I am from Cisco and worked under Debo building a similar pipeline. I am looking forward to contributing and being part of the community.



Thanks,
Johnu

On 12/9/15, 11:02 AM, "Debo Dutta (dedutta)" <de...@cisco.com> wrote:

>Another hello: Am from Cisco and I have interacted with James Sirota and
>the team on OpenSOC in Summer 2014 when we (a small team) were building a
>similar pipeline for our ops. Now that Metron has incubated, we would like
>to contribute back.
>
>debo
>
>On 12/9/15, 9:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>
>>Hi Folks,
>>
>>I just joined the list and thought I'd say "hi". I work at Rackspace and
>>will be joining Andrew Hartnett's team there and hacking on metron. This
>>is my first ASF project and I'm looking forward to being part of this
>>community.
>>
>>I'm curious what the development vision is for metron. What do people
>>like and not like about the codebase? I gather this code transitioned
>>from a Cisco internal project and is incubating now at the ASF. Are there
>>any code changes that need to be made to support incubation?
>>
>>Bryan
>

Re: Hello

Posted by "Kai Zhang (kazhang2)" <ka...@cisco.com>.
Hi Folks,

Yet another hello: I am from Cisco and also worked under Debo building a data pipeline.
Looking forward to contributing to the community.

Thanks,
Kai



On 12/9/15, 11:02 AM, "Debo Dutta (dedutta)" <de...@cisco.com> wrote:

>Another hello: Am from Cisco and I have interacted with James Sirota and
>the team on OpenSOC in Summer 2014 when we (a small team) were building a
>similar pipeline for our ops. Now that Metron has incubated, we would like
>to contribute back.
>
>debo
>
>On 12/9/15, 9:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>
>>Hi Folks,
>>
>>I just joined the list and thought I'd say "hi". I work at Rackspace and
>>will be joining Andrew Hartnett's team there and hacking on metron. This
>>is my first ASF project and I'm looking forward to being part of this
>>community.
>>
>>I'm curious what the development vision is for metron. What do people
>>like and not like about the codebase? I gather this code transitioned
>>from a Cisco internal project and is incubating now at the ASF. Are there
>>any code changes that need to be made to support incubation?
>>
>>Bryan
>

Re: Hello

Posted by "Debo Dutta (dedutta)" <de...@cisco.com>.
Another hello: Am from Cisco and I have interacted with James Sirota and
the team on OpenSOC in Summer 2014 when we (a small team) were building a
similar pipeline for our ops. Now that Metron has incubated, we would like
to contribute back.

debo

On 12/9/15, 9:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:

>Hi Folks,
>
>I just joined the list and thought I'd say "hi". I work at Rackspace and
>will be joining Andrew Hartnett's team there and hacking on metron. This
>is my first ASF project and I'm looking forward to being part of this
>community.
>
>I'm curious what the development vision is for metron. What do people
>like and not like about the codebase? I gather this code transitioned
>from a Cisco internal project and is incubating now at the ASF. Are there
>any code changes that need to be made to support incubation?
>
>Bryan


Re: Hello

Posted by Sheetal Dolas <sh...@hortonworks.com>.
You would need to think in terms of the JVM memory of a task (a Storm bolt in this case). For various practical reasons, the heap of the JVM a bolt's tasks run in will typically be in the range of 2-8 GB, and that heap is shared by the multiple instances/threads of the bolt in that JVM. With more memory on a host, you would probably run more tasks per node rather than giving one task a very large heap.
So I would say that if the system can run reliably with 2-4 GB of heap, it has a better chance of running in most environments, including dev and test, where users may have limited resources.

- Sheetal
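
A minimal sketch of the sizing knobs described above, assuming the backtype.storm 0.9.x API referenced elsewhere in this thread; the worker count and -Xmx value are illustrative placeholders, not recommendations:

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;
    import backtype.storm.generated.StormTopology;

    public class WorkerHeapSizingSketch {
        /** Submit a topology with a few moderately sized worker JVMs per node. */
        public static void submitWithModestHeaps(String name, StormTopology topology)
                throws Exception {
            Config conf = new Config();
            // Each worker JVM's heap is shared by all bolt executors (threads)
            // assigned to it, so prefer several 2-4 GB workers over one huge heap.
            conf.setNumWorkers(4);                                // placeholder
            conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx4g"); // placeholder
            StormSubmitter.submitTopology(name, conf, topology);
        }
    }

With this layout, adding memory to a host translates into more workers (or more executors per worker) rather than one task with an oversized heap.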





On 12/9/15, 3:35 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:

>
>The GeoLite site says they update once a month, so I assume something can
>check for this and grab the new file. It seems like a fun problem to have
>this also trigger a rebuild of the in-memory cache and swap it out live.
>This seems like it would be a useful streaming enrichment pattern, where
>the configuration data for the enrichment changes.
>
>This does raise another interesting question about what do we expect the
>memory profile of the stream processing to be. 70Mb + 40Mb isn't that big
>by itself, but when is it worth it? and how do operators take advantage of
>more system memory if they have it.
>
>
>On 12/9/15 4:32 PM, "James Sirota" <js...@hortonworks.com> wrote:
>
>>Hi Bryan,
>>
>>We had HSQLDB at one point, but we were struggling to make these bolts
>>reliable.  Also, the geo data needs to be periodically updated and it's
>>easier to do when it's decoupled.
>>
>>Thanks,
>>James
>>
>>
>>
>>On 12/9/15, 4:26 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>
>>>[Sorry about the stupid COMMERCIAL: tag being added - I'm trying to fix
>>>with our email folks]
>>>
>>>Nice.
>>>
>>>I was just looking around at the geotagging enrichment adapter. The city
>>>data is split between the two files of 70Mb and 40Mb sizes. It seems like
>>>the data is small enough to just load it all into memory. This would
>>>eliminate two SQL queries for every event.
>>>
>>>Bryan
>>>
>>>On 12/9/15 2:14 PM, "Mark Bittmann" <ma...@b23.io> wrote:
>>>
>>>>Hi Bryan,
>>>>
>>>>For automation, B23 is planning to contribute Ansible scripts to deploy
>>>>the Metron stack. The playbooks use Ambari blueprints for the hadoop
>>>>ecosystem. We also install Elasticsearch, configure the legacy OpenSOC
>>>>UI
>>>>(based on Kibana/nodejs), create directories in hdfs, populate a MySQL
>>>>database for geotagging. We template the OpenSOC_Config files so that we
>>>>can use variable injection for the different services: Zookeeper, Hbase,
>>>>Elasticsearch, MySQL, etc. Everyone's deployment might be slightly
>>>>different, but I think this will be a really good start.
>>>>
>>>>I'm in the process of decoupling the scripts from our internal tooling -
>>>>I should be able to make available the Ambari stuff later this week.
>>>>Once
>>>>we merge the disparate forks of the Cisco codebase, there will be some
>>>>work to bring the playbooks up to date (i.e., use of Storm Flux), but
>>>>not
>>>>a ton.
>>>>
>>>>Mark
>>>>
>>>>
>>>>
>>>>On 12/9/15, 2:48 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>
>>>>>
>>>>>Seems like a good list. I'm probably not your UI guy, but I'll be happy
>>>>>to
>>>>>help with any of the others. Test coverage is probably a good place for
>>>>>me
>>>>>to start learning my way around. Do we have CI set up? I see a travis
>>>>>file
>>>>>in the code.
>>>>>
>>>>>Regarding automation deployments, are we targeting Ambari or something
>>>>>else? 
>>>>>
>>>>>On the hadoop component compatibility, I see from the opens-streaming
>>>>>pom
>>>>>that we are using Storm-0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
>>>>>0.98.0-hadoop2. These are all several iterations old. How aggressive do
>>>>>we
>>>>>want to be, generally, with tracking new releases? Are Hive, Flume, and
>>>>>Spark also going to up rev for us?
>>>>>
>>>>>Bryan
>>>>>
>>>>>On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>>>>
>>>>>>Hi Brian,
>>>>>>
>>>>>>Welcome.  Glad to have you contribute.  There will be changes to the
>>>>>>code
>>>>>>base that the community will contribute back shortly.  We are waiting
>>>>>>for
>>>>>>the Jira to be setup so the backlog can be created and voted on.  I
>>>>>>think
>>>>>>the overall feeling is that we need to make the code base compatible
>>>>>>with
>>>>>>the latest version of HDP, automate deployments, increase test
>>>>>>coverage,
>>>>>>and start working on a new UI.  There may be more significant
>>>>>>architectural changes to the code base, but we need to get the
>>>>>>essential
>>>>>>items knocked out before we go there.
>>>>>>
>>>>>>Thanks,
>>>>>>James
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>>>
>>>>>>>Hi Folks,
>>>>>>>
>>>>>>>I just joined the list and thought I'd say "hi". I work at Rackspace
>>>>>>>and
>>>>>>>will be joining Andrew Hartnett's team there and hacking on metron.
>>>>>>>This
>>>>>>>is my first ASF project and I'm looking forward to being part of this
>>>>>>>community.
>>>>>>>
>>>>>>>I'm curious what the development vision is for metron. What do people
>>>>>>>like and not like about the codebase? I gather this code transitioned
>>>>>>>from a Cisco internal project and is incubating now at the ASF. Are
>>>>>>>there any code changes that need to be made to support incubation?
>>>>>>>
>>>>>>>Bryan
>>>>>
>>>
>>>
>
>

Re: Hello

Posted by Mark Bittmann <ma...@b23.io>.
I'm in favor of removing the MySQL database if we can find a way. We reimplemented the geo-enrichment in Spark (not recommending that, just indicating that it isn't too difficult). One concern with reading into JVM memory is that the full MaxMind GeoIP database is larger than the free GeoLite version. Production deployments would likely use the licensed database. Unfortunately I don't have access to that version anymore and can't remember the total size, so I don't know whether an in-memory solution (JVM or otherwise) would be feasible.
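
For reference, one way to do the lookup fully in process, with no MySQL at all, is MaxMind's own GeoIP2 Java reader over the binary database file. This is a sketch of that alternative, not how OpenSOC/Metron works today; the file path and the fields printed are assumptions for illustration:

    import java.io.File;
    import java.net.InetAddress;

    import com.maxmind.db.CHMCache;
    import com.maxmind.geoip2.DatabaseReader;
    import com.maxmind.geoip2.model.CityResponse;

    public class InProcessGeoLookupSketch {
        public static void main(String[] args) throws Exception {
            // Path is a placeholder; a licensed GeoIP2 City file would drop in here.
            File db = new File("/opt/geo/GeoLite2-City.mmdb");

            // CHMCache keeps recently decoded nodes on the heap so repeated
            // lookups stay cheap without loading the whole database eagerly.
            DatabaseReader reader = new DatabaseReader.Builder(db)
                    .withCache(new CHMCache())
                    .build();

            CityResponse r = reader.city(InetAddress.getByName("8.8.8.8"));
            System.out.println(r.getCity().getName() + ", "
                    + r.getCountry().getIsoCode() + " @ "
                    + r.getLocation().getLatitude() + "," + r.getLocation().getLongitude());
        }
    }

Measuring the resident size of the reader against the licensed database would answer the feasibility question raised above.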



On 12/9/15, 6:35 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:

>
>The GeoLite site says they update once a month, so I assume something can
>check for this and grab the new file. It seems like a fun problem to have
>this also trigger a rebuild of the in-memory cache and swap it out live.
>This seems like it would be a useful streaming enrichment pattern, where
>the configuration data for the enrichment changes.
>
>This does raise another interesting question about what do we expect the
>memory profile of the stream processing to be. 70Mb + 40Mb isn't that big
>by itself, but when is it worth it? and how do operators take advantage of
>more system memory if they have it.
>
>
>On 12/9/15 4:32 PM, "James Sirota" <js...@hortonworks.com> wrote:
>
>>Hi Bryan,
>>
>>We had HSQLDB at one point, but we were struggling to make these bolts
>>reliable.  Also, the geo data needs to be periodically updated and it's
>>easier to do when it's decoupled.
>>
>>Thanks,
>>James
>>
>>
>>
>>On 12/9/15, 4:26 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>
>>>[Sorry about the stupid COMMERCIAL: tag being added - I'm trying to fix
>>>with our email folks]
>>>
>>>Nice.
>>>
>>>I was just looking around at the geotagging enrichment adapter. The city
>>>data is split between the two files of 70Mb and 40Mb sizes. It seems like
>>>the data is small enough to just load it all into memory. This would
>>>eliminate two SQL queries for every event.
>>>
>>>Bryan
>>>
>>>On 12/9/15 2:14 PM, "Mark Bittmann" <ma...@b23.io> wrote:
>>>
>>>>Hi Bryan,
>>>>
>>>>For automation, B23 is planning to contribute Ansible scripts to deploy
>>>>the Metron stack. The playbooks use Ambari blueprints for the hadoop
>>>>ecosystem. We also install Elasticsearch, configure the legacy OpenSOC
>>>>UI
>>>>(based on Kibana/nodejs), create directories in hdfs, populate a MySQL
>>>>database for geotagging. We template the OpenSOC_Config files so that we
>>>>can use variable injection for the different services: Zookeeper, Hbase,
>>>>Elasticsearch, MySQL, etc. Everyone's deployment might be slightly
>>>>different, but I think this will be a really good start.
>>>>
>>>>I'm in the process of decoupling the scripts from our internal tooling -
>>>>I should be able to make available the Ambari stuff later this week.
>>>>Once
>>>>we merge the disparate forks of the Cisco codebase, there will be some
>>>>work to bring the playbooks up to date (i.e., use of Storm Flux), but
>>>>not
>>>>a ton.
>>>>
>>>>Mark
>>>>
>>>>
>>>>
>>>>On 12/9/15, 2:48 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>
>>>>>
>>>>>Seems like a good list. I'm probably not your UI guy, but I'll be happy
>>>>>to
>>>>>help with any of the others. Test coverage is probably a good place for
>>>>>me
>>>>>to start learning my way around. Do we have CI set up? I see a travis
>>>>>file
>>>>>in the code.
>>>>>
>>>>>Regarding automation deployments, are we targeting Ambari or something
>>>>>else? 
>>>>>
>>>>>On the hadoop component compatibility, I see from the opens-streaming
>>>>>pom
>>>>>that we are using Storm-0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
>>>>>0.98.0-hadoop2. These are all several iterations old. How aggressive do
>>>>>we
>>>>>want to be, generally, with tracking new releases? Are Hive, Flume, and
>>>>>Spark also going to up rev for us?
>>>>>
>>>>>Bryan
>>>>>
>>>>>On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>>>>
>>>>>>Hi Brian,
>>>>>>
>>>>>>Welcome.  Glad to have you contribute.  There will be changes to the
>>>>>>code
>>>>>>base that the community will contribute back shortly.  We are waiting
>>>>>>for
>>>>>>the Jira to be setup so the backlog can be created and voted on.  I
>>>>>>think
>>>>>>the overall feeling is that we need to make the code base compatible
>>>>>>with
>>>>>>the latest version of HDP, automate deployments, increase test
>>>>>>coverage,
>>>>>>and start working on a new UI.  There may be more significant
>>>>>>architectural changes to the code base, but we need to get the
>>>>>>essential
>>>>>>items knocked out before we go there.
>>>>>>
>>>>>>Thanks,
>>>>>>James
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>>>
>>>>>>>Hi Folks,
>>>>>>>
>>>>>>>I just joined the list and thought I'd say "hi". I work at Rackspace
>>>>>>>and
>>>>>>>will be joining Andrew Hartnett's team there and hacking on metron.
>>>>>>>This
>>>>>>>is my first ASF project and I'm looking forward to being part of this
>>>>>>>community.
>>>>>>>
>>>>>>>I'm curious what the development vision is for metron. What do people
>>>>>>>like and not like about the codebase? I gather this code transitioned
>>>>>>>from a Cisco internal project and is incubating now at the ASF. Are
>>>>>>>there any code changes that need to be made to support incubation?
>>>>>>>
>>>>>>>Bryan
>>>>>
>>>
>>>
>

Re: Hello

Posted by Bryan Taylor <bt...@rackspace.com>.
Good stuff. I see the patterns you are discussing on those slides as
coming up a lot for us.


On 12/9/15 6:25 PM, "Sheetal Dolas" <sh...@hortonworks.com> wrote:

>Additionally this might give some ideas (slide 14 onwards) for handling
>these types of problems.
>
>http://www.slideshare.net/Hadoop_Summit/design-patterns-for-real-time-streaming-data-analytics
>
>
>
>
>
>
>On 12/9/15, 4:20 PM, "James Sirota" <js...@hortonworks.com> wrote:
>
>>So the nature of the problem was that as we were processing ~1.3 million
>>messages per second, the time it took for the in-memory DB to update
>>caused the Storm tuples to back up to a point where this would bring
>>down the topology.  We also had problems during initialization.  I don’t
>>know if this feature exists now, but at the time we couldn’t figure out
>>a way to have the topology deploy and wait for all the instances of Geo
>>bolt to finish reading their data and signal back that they were ready.
>>So at initialization they would get blasted with tuples and fall over.
>>We solved that problem at the time by delaying our ingest 30 seconds to
>>give the topology a chance to fully come up.  But eventually we decided
>>we needed to simplify things so we abandoned the in-memory route.
>>
>>Thanks,
>>James    
>>
>>
>>
>>On 12/9/15, 5:35 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>
>>>
>>>The GeoLite site says they update once a month, so I assume something
>>>can
>>>check for this and grab the new file. It seems like a fun problem to
>>>have
>>>this also trigger a rebuild of the in-memory cache and swap it out live.
>>>This seems like it would be a useful streaming enrichment pattern, where
>>>the configuration data for the enrichment changes.
>>>
>>>This does raise another interesting question about what do we expect the
>>>memory profile of the stream processing to be. 70Mb + 40Mb isn't that big
>>>by itself, but when is it worth it? and how do operators take advantage
>>>of
>>>more system memory if they have it.
>>>
>>>
>>>On 12/9/15 4:32 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>>
>>>>Hi Bryan,
>>>>
>>>>We had HSQLDB at one point, but we were struggling to make these bolts
>>>>reliable.  Also, the geo data needs to be periodically updated and it's
>>>>easier to do when it's decoupled.
>>>>
>>>>Thanks,
>>>>James
>>>>
>>>>
>>>>
>>>>On 12/9/15, 4:26 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>
>>>>>[Sorry about the stupid COMMERCIAL: tag being added - I'm trying to
>>>>>fix
>>>>>with our email folks]
>>>>>
>>>>>Nice.
>>>>>
>>>>>I was just looking around at the geotagging enrichment adapter. The
>>>>>city
>>>>>data is split between the two files of 70Mb and 40Mb sizes. It seems
>>>>>like
>>>>>the data is small enough to just load it all into memory. This would
>>>>>eliminate two SQL queries for every event.
>>>>>
>>>>>Bryan
>>>>>
>>>>>On 12/9/15 2:14 PM, "Mark Bittmann" <ma...@b23.io> wrote:
>>>>>
>>>>>>Hi Bryan,
>>>>>>
>>>>>>For automation, B23 is planning to contribute Ansible scripts to
>>>>>>deploy
>>>>>>the Metron stack. The playbooks use Ambari blueprints for the hadoop
>>>>>>ecosystem. We also install Elasticsearch, configure the legacy
>>>>>>OpenSOC
>>>>>>UI
>>>>>>(based on Kibana/nodejs), create directories in hdfs, populate a
>>>>>>MySQL
>>>>>>database for geotagging. We template the OpenSOC_Config files so
>>>>>>that we
>>>>>>can use variable injection for the different services: Zookeeper,
>>>>>>Hbase,
>>>>>>Elasticsearch, MySQL, etc. Everyone's deployment might be slightly
>>>>>>different, but I think this will be a really good start.
>>>>>>
>>>>>>I'm in the process of decoupling the scripts from our internal
>>>>>>tooling -
>>>>>>I should be able to make available the Ambari stuff later this week.
>>>>>>Once
>>>>>>we merge the disparate forks of the Cisco codebase, there will be
>>>>>>some
>>>>>>work to bring the playbooks up to date (i.e., use of Storm Flux), but
>>>>>>not
>>>>>>a ton.
>>>>>>
>>>>>>Mark
>>>>>>
>>>>>>
>>>>>>
>>>>>>On 12/9/15, 2:48 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>>>
>>>>>>>
>>>>>>>Seems like a good list. I'm probably not your UI guy, but I'll be
>>>>>>>happy
>>>>>>>to
>>>>>>>help with any of the others. Test coverage is probably a good place
>>>>>>>for
>>>>>>>me
>>>>>>>to start learning my way around. Do we have CI set up? I see a
>>>>>>>travis
>>>>>>>file
>>>>>>>in the code.
>>>>>>>
>>>>>>>Regarding automation deployments, are we targeting Ambari or
>>>>>>>something
>>>>>>>else? 
>>>>>>>
>>>>>>>On the hadoop component compatibility, I see from the
>>>>>>>opens-streaming
>>>>>>>pom
>>>>>>>that we are using Storm-0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
>>>>>>>0.98.0-hadoop2. These are all several iterations old. How
>>>>>>>aggressive do
>>>>>>>we
>>>>>>>want to be, generally, with tracking new releases? Are Hive, Flume,
>>>>>>>and
>>>>>>>Spark also going to up rev for us?
>>>>>>>
>>>>>>>Bryan
>>>>>>>
>>>>>>>On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>>>>>>
>>>>>>>>Hi Brian,
>>>>>>>>
>>>>>>>>Welcome.  Glad to have you contribute.  There will be changes to
>>>>>>>>the
>>>>>>>>code
>>>>>>>>base that the community will contribute back shortly.  We are
>>>>>>>>waiting
>>>>>>>>for
>>>>>>>>the Jira to be setup so the backlog can be created and voted on.  I
>>>>>>>>think
>>>>>>>>the overall feeling is that we need to make the code base
>>>>>>>>compatible
>>>>>>>>with
>>>>>>>>the latest version of HDP, automate deployments, increase test
>>>>>>>>coverage,
>>>>>>>>and start working on a new UI.  There may be more significant
>>>>>>>>architectural changes to the code base, but we need to get the
>>>>>>>>essential
>>>>>>>>items knocked out before we go there.
>>>>>>>>
>>>>>>>>Thanks,
>>>>>>>>James
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>>>>>
>>>>>>>>>Hi Folks,
>>>>>>>>>
>>>>>>>>>I just joined the list and thought I'd say "hi". I work at
>>>>>>>>>Rackspace
>>>>>>>>>and
>>>>>>>>>will be joining Andrew Hartnett's team there and hacking on
>>>>>>>>>metron.
>>>>>>>>>This
>>>>>>>>>is my first ASF project and I'm looking forward to being part of
>>>>>>>>>this
>>>>>>>>>community.
>>>>>>>>>
>>>>>>>>>I'm curious what the development vision is for metron. What do
>>>>>>>>>people
>>>>>>>>>like and not like about the codebase? I gather this code
>>>>>>>>>transitioned
>>>>>>>>>from a Cisco internal project and is incubating now at the ASF.
>>>>>>>>>Are
>>>>>>>>>there any code changes that need to be made to support incubation?
>>>>>>>>>
>>>>>>>>>Bryan
>>>>>>>
>>>>>
>>>>>
>>>
>>>


Re: Hello

Posted by Sheetal Dolas <sh...@hortonworks.com>.
Additionally this might give some ideas (slide 14 onwards) for handling these types of problems.

http://www.slideshare.net/Hadoop_Summit/design-patterns-for-real-time-streaming-data-analytics 






On 12/9/15, 4:20 PM, "James Sirota" <js...@hortonworks.com> wrote:

>So the nature of the problem was that as we were processing ~1.3 million messages per second, the time it took for the in-memory DB to update caused the Storm tuples to back up to a point where this would bring down the topology.  We also had problems during initialization.  I don’t know if this feature exists now, but at the time we couldn’t figure out a way to have the topology deploy and wait for all the instances of Geo bolt to finish reading their data and signal back that they were ready.  So at initialization they would get blasted with tuples and fall over.  We solved that problem at the time by delaying our ingest 30 seconds to give the topology a chance to fully come up.  But eventually we decided we needed to simplify things so we abandoned the in-memory route.
>
>Thanks,
>James    
>
>
>
>On 12/9/15, 5:35 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>
>>
>>The GeoLite site says they update once a month, so I assume something can
>>check for this and grab the new file. It seems like a fun problem to have
>>this also trigger a rebuild of the in-memory cache and swap it out live.
>>This seems like it would be a useful streaming enrichment pattern, where
>>the configuration data for the enrichment changes.
>>
>>This does raise another interesting question about what do we expect the
>>memory profile of the stream processing to be. 70Mb + 40Mb isn't that big
>>by itself, but when is it worth it? and how do operators take advantage of
>>more system memory if they have it.
>>
>>
>>On 12/9/15 4:32 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>
>>>Hi Bryan,
>>>
>>>We had HSQLDB at one point, but we were struggling to make these bolts
>>>reliable.  Also, the geo data needs to be periodically updated and it's
>>>easier to do when it's decoupled.
>>>
>>>Thanks,
>>>James
>>>
>>>
>>>
>>>On 12/9/15, 4:26 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>
>>>>[Sorry about the stupid COMMERCIAL: tag being added - I'm trying to fix
>>>>with our email folks]
>>>>
>>>>Nice.
>>>>
>>>>I was just looking around at the geotagging enrichment adapter. The city
>>>>data is split between the two files of 70Mb and 40Mb sizes. It seems like
>>>>the data is small enough to just load it all into memory. This would
>>>>eliminate two SQL queries for every event.
>>>>
>>>>Bryan
>>>>
>>>>On 12/9/15 2:14 PM, "Mark Bittmann" <ma...@b23.io> wrote:
>>>>
>>>>>Hi Bryan,
>>>>>
>>>>>For automation, B23 is planning to contribute Ansible scripts to deploy
>>>>>the Metron stack. The playbooks use Ambari blueprints for the hadoop
>>>>>ecosystem. We also install Elasticsearch, configure the legacy OpenSOC
>>>>>UI
>>>>>(based on Kibana/nodejs), create directories in hdfs, populate a MySQL
>>>>>database for geotagging. We template the OpenSOC_Config files so that we
>>>>>can use variable injection for the different services: Zookeeper, Hbase,
>>>>>Elasticsearch, MySQL, etc. Everyone's deployment might be slightly
>>>>>different, but I think this will be a really good start.
>>>>>
>>>>>I'm in the process of decoupling the scripts from our internal tooling -
>>>>>I should be able to make available the Ambari stuff later this week.
>>>>>Once
>>>>>we merge the disparate forks of the Cisco codebase, there will be some
>>>>>work to bring the playbooks up to date (i.e., use of Storm Flux), but
>>>>>not
>>>>>a ton.
>>>>>
>>>>>Mark
>>>>>
>>>>>
>>>>>
>>>>>On 12/9/15, 2:48 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>>
>>>>>>
>>>>>>Seems like a good list. I'm probably not your UI guy, but I'll be happy
>>>>>>to
>>>>>>help with any of the others. Test coverage is probably a good place for
>>>>>>me
>>>>>>to start learning my way around. Do we have CI set up? I see a travis
>>>>>>file
>>>>>>in the code.
>>>>>>
>>>>>>Regarding automation deployments, are we targeting Ambari or something
>>>>>>else? 
>>>>>>
>>>>>>On the hadoop component compatibility, I see from the opens-streaming
>>>>>>pom
>>>>>>that we are using Storm-0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
>>>>>>0.98.0-hadoop2. These are all several iterations old. How aggressive do
>>>>>>we
>>>>>>want to be, generally, with tracking new releases? Are Hive, Flume, and
>>>>>>Spark also going to up rev for us?
>>>>>>
>>>>>>Bryan
>>>>>>
>>>>>>On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>>>>>
>>>>>>>Hi Brian,
>>>>>>>
>>>>>>>Welcome.  Glad to have you contribute.  There will be changes to the
>>>>>>>code
>>>>>>>base that the community will contribute back shortly.  We are waiting
>>>>>>>for
>>>>>>>the Jira to be setup so the backlog can be created and voted on.  I
>>>>>>>think
>>>>>>>the overall feeling is that we need to make the code base compatible
>>>>>>>with
>>>>>>>the latest version of HDP, automate deployments, increase test
>>>>>>>coverage,
>>>>>>>and start working on a new UI.  There may be more significant
>>>>>>>architectural changes to the code base, but we need to get the
>>>>>>>essential
>>>>>>>items knocked out before we go there.
>>>>>>>
>>>>>>>Thanks,
>>>>>>>James
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>>>>
>>>>>>>>Hi Folks,
>>>>>>>>
>>>>>>>>I just joined the list and thought I'd say "hi". I work at Rackspace
>>>>>>>>and
>>>>>>>>will be joining Andrew Hartnett's team there and hacking on metron.
>>>>>>>>This
>>>>>>>>is my first ASF project and I'm looking forward to being part of this
>>>>>>>>community.
>>>>>>>>
>>>>>>>>I'm curious what the development vision is for metron. What do people
>>>>>>>>like and not like about the codebase? I gather this code transitioned
>>>>>>>>from a Cisco internal project and is incubating now at the ASF. Are
>>>>>>>>there any code changes that need to be made to support incubation?
>>>>>>>>
>>>>>>>>Bryan
>>>>>>
>>>>
>>>>
>>
>>

Re: Hello

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
Just a suggestion, but this thread has diverged from the original subject line, which is fine and happens all the time.

When it does happen, one thing you can/should do is change the subject line (e.g., "FooBar (was Re: Hello)"). That makes it easier for subscribers to notice the discussion branch, and also makes it easier to follow when viewing the email archives.

Just $.02 from a mentor. :)

-Taylor

> On Dec 9, 2015, at 8:14 PM, Bryan Taylor <bt...@rackspace.com> wrote:
> 
> Hi James,
> 
> It seems the solution to "stop the world while we rebuild the cache" is to
> rebuild the new cache out of band in a separate thread/spout and then
> inject a reference to it once it is complete.
> 
> Actually, I'm curious how updating the data works now with MySQL. The
> "Setting Up GeoLite Data" page describes the initial load using a LOAD
> DATA INFILE command in the MySQL shell, which would duplicate records if
> it was run a second time.
> 
> I suppose you could delete and load all the new rows in one big atomic
> transaction, but there could be cute issues if you commit as you go during
> the table data rebuild, since queries that happen along the way would be
> hitting a blend of the two datasets. I've used partition swapping for this
> with some databases, but I'm not sure if MySQL supports that feature. A
> similar idea is to load the new data into a completely new table, but
> expose it via a view and then recompile the view when the new table is
> done. This has to get a lock on the view, which will stop the world while
> the view definition changes, but that's a pretty short duration.
> 
> Bryan
> 
> 
> 
>> On 12/9/15 6:20 PM, "James Sirota" <js...@hortonworks.com> wrote:
>> 
>> So the nature of the problem was that as we were processing ~1.3 million
>> messages per second, the time it took for the in-memory DB to update
>> caused the Storm tuples to back up to a point where this would bring down
>> the topology.  We also had problems during initialization.  I don’t know
>> if this feature exists now, but at the time we couldn’t figure out a way
>> to have the topology deploy and wait for all the instances of Geo bolt to
>> finish reading their data and signal back that they were ready.  So at
>> initialization they would get blasted with tuples and fall over.  We
>> solved that problem at the time by delaying our ingest 30 seconds to give
>> the topology a chance to fully come up.  But eventually we decided we
>> needed to simplify things so we abandoned the in-memory route.
>> 
>> Thanks,
>> James    
>> 
>> 
>> 
>>> On 12/9/15, 5:35 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>> 
>>> 
>>> The GeoLite site says they update once a month, so I assume something can
>>> check for this and grab the new file. It seems like a fun problem to have
>>> this also trigger a rebuild of the in-memory cache and swap it out live.
>>> This seems like it would be a useful streaming enrichment pattern, where
>>> the configuration data for the enrichment changes.
>>> 
>>> This does raise another interesting question about what do we expect the
>>> memory profile of the stream processing to be. 70Mb + 40Mb isn't that big
>>> by itself, but when is it worth it? and how do operators take advantage
>>> of
>>> more system memory if they have it.
>>> 
>>> 
>>>> On 12/9/15 4:32 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>>> 
>>>> Hi Bryan,
>>>> 
>>>> We had HSQLDB at one point, but we were struggling to make these bolts
>>>> reliable.  Also, the geo data needs to be periodically updated and it's
>>>> easier to do when it's decoupled.
>>>> 
>>>> Thanks,
>>>> James
>>>> 
>>>> 
>>>> 
>>>>> On 12/9/15, 4:26 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>> 
>>>>> [Sorry about the stupid COMMERCIAL: tag being added - I'm trying to fix
>>>>> with our email folks]
>>>>> 
>>>>> Nice.
>>>>> 
>>>>> I was just looking around at the geotagging enrichment adapter. The
>>>>> city
>>>>> data is split between the two files of 70Mb and 40Mb sizes. It seems
>>>>> like
>>>>> the data is small enough to just load it all into memory. This would
>>>>> eliminate two SQL queries for every event.
>>>>> 
>>>>> Bryan
>>>>> 
>>>>>> On 12/9/15 2:14 PM, "Mark Bittmann" <ma...@b23.io> wrote:
>>>>>> 
>>>>>> Hi Bryan,
>>>>>> 
>>>>>> For automation, B23 is planning to contribute Ansible scripts to
>>>>>> deploy
>>>>>> the Metron stack. The playbooks use Ambari blueprints for the hadoop
>>>>>> ecosystem. We also install Elasticsearch, configure the legacy OpenSOC
>>>>>> UI
>>>>>> (based on Kibana/nodejs), create directories in hdfs, populate a MySQL
>>>>>> database for geotagging. We template the OpenSOC_Config files so that
>>>>>> we
>>>>>> can use variable injection for the different services: Zookeeper,
>>>>>> Hbase,
>>>>>> Elasticsearch, MySQL, etc. Everyone's deployment might be slightly
>>>>>> different, but I think this will be a really good start.
>>>>>> 
>>>>>> I'm in the process of decoupling the scripts from our internal
>>>>>> tooling -
>>>>>> I should be able to make available the Ambari stuff later this week.
>>>>>> Once
>>>>>> we merge the disparate forks of the Cisco codebase, there will be some
>>>>>> work to bring the playbooks up to date (i.e., use of Storm Flux), but
>>>>>> not
>>>>>> a ton.
>>>>>> 
>>>>>> Mark
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 12/9/15, 2:48 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Seems like a good list. I'm probably not your UI guy, but I'll be
>>>>>>> happy
>>>>>>> to
>>>>>>> help with any of the others. Test coverage is probably a good place
>>>>>>> for
>>>>>>> me
>>>>>>> to start learning my way around. Do we have CI set up? I see a travis
>>>>>>> file
>>>>>>> in the code.
>>>>>>> 
>>>>>>> Regarding automation deployments, are we targeting Ambari or
>>>>>>> something
>>>>>>> else? 
>>>>>>> 
>>>>>>> On the hadoop component compatibility, I see from the opens-streaming
>>>>>>> pom
>>>>>>> that we are using Storm-0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
>>>>>>> 0.98.0-hadoop2. These are all several iterations old. How aggressive
>>>>>>> do
>>>>>>> we
>>>>>>> want to be, generally, with tracking new releases? Are Hive, Flume,
>>>>>>> and
>>>>>>> Spark also going to up rev for us?
>>>>>>> 
>>>>>>> Bryan
>>>>>>> 
>>>>>>>> On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>>>>>>> 
>>>>>>>> Hi Brian,
>>>>>>>> 
>>>>>>>> Welcome.  Glad to have you contribute.  There will be changes to the
>>>>>>>> code
>>>>>>>> base that the community will contribute back shortly.  We are
>>>>>>>> waiting
>>>>>>>> for
>>>>>>>> the Jira to be setup so the backlog can be created and voted on.  I
>>>>>>>> think
>>>>>>>> the overall feeling is that we need to make the code base compatible
>>>>>>>> with
>>>>>>>> the latest version of HDP, automate deployments, increase test
>>>>>>>> coverage,
>>>>>>>> and start working on a new UI.  There may be more significant
>>>>>>>> architectural changes to the code base, but we need to get the
>>>>>>>> essential
>>>>>>>> items knocked out before we go there.
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> James
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>>>>>> 
>>>>>>>>> Hi Folks,
>>>>>>>>> 
>>>>>>>>> I just joined the list and thought I'd say "hi". I work at
>>>>>>>>> Rackspace
>>>>>>>>> and
>>>>>>>>> will be joining Andrew Hartnett's team there and hacking on metron.
>>>>>>>>> This
>>>>>>>>> is my first ASF project and I'm looking forward to being part of
>>>>>>>>> this
>>>>>>>>> community.
>>>>>>>>> 
>>>>>>>>> I'm curious what the development vision is for metron. What do
>>>>>>>>> people
>>>>>>>>> like and not like about the codebase? I gather this code
>>>>>>>>> transitioned
>>>>>>>>> from a Cisco internal project and is incubating now at the ASF. Are
>>>>>>>>> there any code changes that need to be made to support incubation?
>>>>>>>>> 
>>>>>>>>> Bryan
> 

Re: Hello

Posted by Bryan Taylor <bt...@rackspace.com>.
Hi James,

It seems the solution to "stop the world while we rebuild the cache" is to
rebuild the new cache out of band in a separate thread/spout and then
inject a reference to it once it is complete.

Actually, I'm curious how updating the data works now with MySQL. The
"Setting Up GeoLite Data" page describes the initial load using a LOAD
DATA INFILE command in the MySQL shell, which would duplicate records if
it was run a second time.

I suppose you could delete and load all the new rows in one big atomic
transaction, but there could be cute issues if you commit as you go during
the table data rebuild, since queries that happen along the way would be
hitting a blend of the two datasets. I've used partition swapping for this
with some databases, but I'm not sure if MySQL supports that feature. A
similar idea is to load the new data into a completely new table, but
expose it via a view and then recompile the view when the new table is
done. This has to get a lock on the view, which will stop the world while
the view definition changes, but that's a pretty short duration.

Bryan
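
A close relative of the "load into a completely new table, then expose it" idea above is MySQL's multi-table RENAME, which swaps a staging table in atomically and avoids the view. A sketch over plain JDBC; the connection settings, table names, file path, and LOAD DATA options are invented for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class GeoTableSwapSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost:3306/geo", "user", "password");
                 Statement stmt = conn.createStatement()) {

                // Build the new month's data off to the side in a staging table.
                stmt.execute("DROP TABLE IF EXISTS geo_blocks_staging");
                stmt.execute("CREATE TABLE geo_blocks_staging LIKE geo_blocks");
                stmt.execute("LOAD DATA INFILE '/tmp/GeoLiteCity-Blocks.csv' "
                        + "INTO TABLE geo_blocks_staging "
                        + "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
                        + "IGNORE 2 LINES");

                // Swap it in with one atomic rename: queries see either the old
                // table or the new one, never a blend of the two datasets.
                stmt.execute("RENAME TABLE geo_blocks TO geo_blocks_old, "
                        + "geo_blocks_staging TO geo_blocks");
                stmt.execute("DROP TABLE geo_blocks_old");
            }
        }
    }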



On 12/9/15 6:20 PM, "James Sirota" <js...@hortonworks.com> wrote:

>So the nature of the problem was that as we were processing ~1.3 million
>messages per second, the time it took for the in-memory DB to update
>caused the Storm tuples to back up to a point where this would bring down
>the topology.  We also had problems during initialization.  I don’t know
>if this feature exists now, but at the time we couldn’t figure out a way
>to have the topology deploy and wait for all the instances of Geo bolt to
>finish reading their data and signal back that they were ready.  So at
>initialization they would get blasted with tuples and fall over.  We
>solved that problem at the time by delaying our ingest 30 seconds to give
>the topology a chance to fully come up.  But eventually we decided we
>needed to simplify things so we abandoned the in-memory route.
>
>Thanks,
>James    
>
>
>
>On 12/9/15, 5:35 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>
>>
>>The GeoLite site says they update once a month, so I assume something can
>>check for this and grab the new file. It seems like a fun problem to have
>>this also trigger a rebuild of the in-memory cache and swap it out live.
>>This seems like it would be a useful streaming enrichment pattern, where
>>the configuration data for the enrichment changes.
>>
>>This does raise another interesting question about what do we expect the
>>memory profile of the stream processing to be. 70Mb + 40Mb isn't that big
>>by itself, but when is it worth it? and how do operators take advantage
>>of
>>more system memory if they have it.
>>
>>
>>On 12/9/15 4:32 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>
>>>Hi Bryan,
>>>
>>>We had HSQLDB at one point, but we were struggling to make these bolts
>>>reliable.  Also, the geo data needs to be periodically updated and it's
>>>easier to do when it's decoupled.
>>>
>>>Thanks,
>>>James
>>>
>>>
>>>
>>>On 12/9/15, 4:26 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>
>>>>[Sorry about the stupid COMMERCIAL: tag being added - I'm trying to fix
>>>>with our email folks]
>>>>
>>>>Nice.
>>>>
>>>>I was just looking around at the geotagging enrichment adapter. The
>>>>city
>>>>data is split between the two files of 70Mb and 40Mb sizes. It seems
>>>>like
>>>>the data is small enough to just load it all into memory. This would
>>>>eliminate two SQL queries for every event.
>>>>
>>>>Bryan
>>>>
>>>>On 12/9/15 2:14 PM, "Mark Bittmann" <ma...@b23.io> wrote:
>>>>
>>>>>Hi Bryan,
>>>>>
>>>>>For automation, B23 is planning to contribute Ansible scripts to
>>>>>deploy
>>>>>the Metron stack. The playbooks use Ambari blueprints for the hadoop
>>>>>ecosystem. We also install Elasticsearch, configure the legacy OpenSOC
>>>>>UI
>>>>>(based on Kibana/nodejs), create directories in hdfs, populate a MySQL
>>>>>database for geotagging. We template the OpenSOC_Config files so that
>>>>>we
>>>>>can use variable injection for the different services: Zookeeper,
>>>>>Hbase,
>>>>>Elasticsearch, MySQL, etc. Everyone's deployment might be slightly
>>>>>different, but I think this will be a really good start.
>>>>>
>>>>>I'm in the process of decoupling the scripts from our internal
>>>>>tooling -
>>>>>I should be able to make available the Ambari stuff later this week.
>>>>>Once
>>>>>we merge the disparate forks of the Cisco codebase, there will be some
>>>>>work to bring the playbooks up to date (i.e., use of Storm Flux), but
>>>>>not
>>>>>a ton.
>>>>>
>>>>>Mark
>>>>>
>>>>>
>>>>>
>>>>>On 12/9/15, 2:48 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>>
>>>>>>
>>>>>>Seems like a good list. I'm probably not your UI guy, but I'll be
>>>>>>happy
>>>>>>to
>>>>>>help with any of the others. Test coverage is probably a good place
>>>>>>for
>>>>>>me
>>>>>>to start learning my way around. Do we have CI set up? I see a travis
>>>>>>file
>>>>>>in the code.
>>>>>>
>>>>>>Regarding automation deployments, are we targeting Ambari or
>>>>>>something
>>>>>>else? 
>>>>>>
>>>>>>On the hadoop component compatibility, I see from the opens-streaming
>>>>>>pom
>>>>>>that we are using Storm-0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
>>>>>>0.98.0-hadoop2. These are all several iterations old. How aggressive
>>>>>>do
>>>>>>we
>>>>>>want to be, generally, with tracking new releases? Are Hive, Flume,
>>>>>>and
>>>>>>Spark also going to up rev for us?
>>>>>>
>>>>>>Bryan
>>>>>>
>>>>>>On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>>>>>
>>>>>>>Hi Brian,
>>>>>>>
>>>>>>>Welcome.  Glad to have you contribute.  There will be changes to the
>>>>>>>code
>>>>>>>base that the community will contribute back shortly.  We are
>>>>>>>waiting
>>>>>>>for
>>>>>>>the Jira to be setup so the backlog can be created and voted on.  I
>>>>>>>think
>>>>>>>the overall feeling is that we need to make the code base compatible
>>>>>>>with
>>>>>>>the latest version of HDP, automate deployments, increase test
>>>>>>>coverage,
>>>>>>>and start working on a new UI.  There may be more significant
>>>>>>>architectural changes to the code base, but we need to get the
>>>>>>>essential
>>>>>>>items knocked out before we go there.
>>>>>>>
>>>>>>>Thanks,
>>>>>>>James
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>>>>
>>>>>>>>Hi Folks,
>>>>>>>>
>>>>>>>>I just joined the list and thought I'd say "hi". I work at
>>>>>>>>Rackspace
>>>>>>>>and
>>>>>>>>will be joining Andrew Hartnett's team there and hacking on metron.
>>>>>>>>This
>>>>>>>>is my first ASF project and I'm looking forward to being part of
>>>>>>>>this
>>>>>>>>community.
>>>>>>>>
>>>>>>>>I'm curious what the development vision is for metron. What do
>>>>>>>>people
>>>>>>>>like and not like about the codebase? I gather this code
>>>>>>>>transitioned
>>>>>>>>from a Cisco internal project and is incubating now at the ASF. Are
>>>>>>>>there any code changes that need to be made to support incubation?
>>>>>>>>
>>>>>>>>Bryan
>>>>>>
>>>>
>>>>
>>
>>


Re: Hello

Posted by James Sirota <js...@hortonworks.com>.
So the nature of the problem was that as we were processing ~1.3 million messages per second, the time it took for the in-memory DB to update caused the Storm tuples to back up to a point where this would bring down the topology.  We also had problems during initialization.  I don’t know if this feature exists now, but at the time we couldn’t figure out a way to have the topology deploy and wait for all the instances of Geo bolt to finish reading their data and signal back that they were ready.  So at initialization they would get blasted with tuples and fall over.  We solved that problem at the time by delaying our ingest 30 seconds to give the topology a chance to fully come up.  But eventually we decided we needed to simplify things so we abandoned the in-memory route.

Thanks,
James    
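
One possible mitigation for the initialization race described above (a sketch of an alternative, not how OpenSOC solved it at the time) is to load the reference data synchronously in the bolt's prepare(), so execute() is never called on an instance that has not finished loading; tuples can still queue and time out upstream while workers come up, so an ingest delay may still be needed. The stream field name and CSV path below are placeholders:

    import java.util.HashMap;
    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Tuple;
    import backtype.storm.tuple.Values;

    public class GeoEnrichmentBoltSketch extends BaseRichBolt {
        private OutputCollector collector;
        private Map<String, String> geoData;

        @Override
        public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            // Block until the data is fully loaded; Storm does not call execute()
            // on this instance until prepare() returns.
            this.geoData = loadGeoData("/opt/geo/GeoLiteCity-Blocks.csv"); // placeholder path
        }

        @Override
        public void execute(Tuple tuple) {
            String ip = tuple.getStringByField("ip_src_addr"); // placeholder field name
            collector.emit(tuple, new Values(ip, geoData.getOrDefault(ip, "unknown")));
            collector.ack(tuple);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("ip_src_addr", "geo"));
        }

        private static Map<String, String> loadGeoData(String path) {
            Map<String, String> data = new HashMap<String, String>();
            // Real CSV parsing is elided in this sketch.
            return data;
        }
    }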



On 12/9/15, 5:35 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:

>
>The GeoLite site says they update once a month, so I assume something can
>check for this and grab the new file. It seems like a fun problem to have
>this also trigger a rebuild of the in-memory cache and swap it out live.
>This seems like it would be a useful streaming enrichment pattern, where
>the configuration data for the enrichment changes.
>
>This does raise another interesting question about what do we expect the
>memory profile of the stream processing to be. 70Mb + 40Mb isn't that big
>by itself, but when is it worth it? and how do operators take advantage of
>more system memory if they have it.
>
>
>On 12/9/15 4:32 PM, "James Sirota" <js...@hortonworks.com> wrote:
>
>>Hi Bryan,
>>
>>We had HSQLDB at one point, but we were struggling to make these bolts
>>reliable.  Also, the geo data needs to be periodically updated and it's
>>easier to do when it's decoupled.
>>
>>Thanks,
>>James
>>
>>
>>
>>On 12/9/15, 4:26 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>
>>>[Sorry about the stupid COMMERCIAL: tag being added - I'm trying to fix
>>>with our email folks]
>>>
>>>Nice.
>>>
>>>I was just looking around at the geotagging enrichment adapter. The city
>>>data is split between the two files of 70Mb and 40Mb sizes. It seems like
>>>the data is small enough to just load it all into memory. This would
>>>eliminate two SQL queries for every event.
>>>
>>>Bryan
>>>
>>>On 12/9/15 2:14 PM, "Mark Bittmann" <ma...@b23.io> wrote:
>>>
>>>>Hi Bryan,
>>>>
>>>>For automation, B23 is planning to contribute Ansible scripts to deploy
>>>>the Metron stack. The playbooks use Ambari blueprints for the hadoop
>>>>ecosystem. We also install Elasticsearch, configure the legacy OpenSOC
>>>>UI
>>>>(based on Kibana/nodejs), create directories in hdfs, populate a MySQL
>>>>database for geotagging. We template the OpenSOC_Config files so that we
>>>>can use variable injection for the different services: Zookeeper, Hbase,
>>>>Elasticsearch, MySQL, etc. Everyone's deployment might be slightly
>>>>different, but I think this will be a really good start.
>>>>
>>>>I'm in the process of decoupling the scripts from our internal tooling -
>>>>I should be able to make available the Ambari stuff later this week.
>>>>Once
>>>>we merge the disparate forks of the Cisco codebase, there will be some
>>>>work to bring the playbooks up to date (i.e., use of Storm Flux), but
>>>>not
>>>>a ton.
>>>>
>>>>Mark
>>>>
>>>>
>>>>
>>>>On 12/9/15, 2:48 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>
>>>>>
>>>>>Seems like a good list. I'm probably not your UI guy, but I'll be happy
>>>>>to
>>>>>help with any of the others. Test coverage is probably a good place for
>>>>>me
>>>>>to start learning my way around. Do we have CI set up? I see a travis
>>>>>file
>>>>>in the code.
>>>>>
>>>>>Regarding automation deployments, are we targeting Ambari or something
>>>>>else? 
>>>>>
>>>>>On the hadoop component compatibility, I see from the opens-streaming
>>>>>pom
>>>>>that we are using Storm-0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
>>>>>0.98.0-hadoop2. These are all several iterations old. How aggressive do
>>>>>we
>>>>>want to be, generally, with tracking new releases? Are Hive, Flume, and
>>>>>Spark also going to up rev for us?
>>>>>
>>>>>Bryan
>>>>>
>>>>>On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>>>>
>>>>>>Hi Brian,
>>>>>>
>>>>>>Welcome.  Glad to have you contribute.  There will be changes to the
>>>>>>code
>>>>>>base that the community will contribute back shortly.  We are waiting
>>>>>>for
>>>>>>the Jira to be setup so the backlog can be created and voted on.  I
>>>>>>think
>>>>>>the overall feeling is that we need to make the code base compatible
>>>>>>with
>>>>>>the latest version of HDP, automate deployments, increase test
>>>>>>coverage,
>>>>>>and start working on a new UI.  There may be more significant
>>>>>>architectural changes to the code base, but we need to get the
>>>>>>essential
>>>>>>items knocked out before we go there.
>>>>>>
>>>>>>Thanks,
>>>>>>James
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>>>
>>>>>>>Hi Folks,
>>>>>>>
>>>>>>>I just joined the list and thought I'd say "hi". I work at Rackspace
>>>>>>>and
>>>>>>>will be joining Andrew Hartnett's team there and hacking on metron.
>>>>>>>This
>>>>>>>is my first ASF project and I'm looking forward to being part of this
>>>>>>>community.
>>>>>>>
>>>>>>>I'm curious what the development vision is for metron. What do people
>>>>>>>like and not like about the codebase? I gather this code transitioned
>>>>>>>from a Cisco internal project and is incubating now at the ASF. Are
>>>>>>>there any code changes that need to be made to support incubation?
>>>>>>>
>>>>>>>Bryan
>>>>>
>>>
>>>
>
>

Re: Hello

Posted by Bryan Taylor <bt...@rackspace.com>.
The GeoLite site says they update once a month, so I assume something can
check for this and grab the new file. It seems like a fun problem to have
this also trigger a rebuild of the in-memory cache and swap it out live.
This seems like it would be a useful streaming enrichment pattern, where
the configuration data for the enrichment changes.

This does raise another interesting question about what we expect the
memory profile of the stream processing to be. 70Mb + 40Mb isn't that big
by itself, but when is it worth it? And how do operators take advantage of
more system memory if they have it?
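
A minimal sketch of the rebuild-out-of-band-and-swap idea, in plain JDK with no Storm specifics; the loader callback and refresh interval are placeholders. The rebuilt map is published through an AtomicReference in one step, so lookups never see a half-built cache:

    import java.util.Map;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.Supplier;

    public class HotSwappableGeoCache {
        private final AtomicReference<Map<String, String>> current = new AtomicReference<>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        public HotSwappableGeoCache(Supplier<Map<String, String>> loader) {
            // Initial load completes before the cache is handed to enrichment code.
            current.set(loader.get());
            // Rebuild the whole structure off to the side once a day (placeholder
            // interval) and swap the reference atomically; in-flight lookups keep
            // using the old map until they next read the reference.
            scheduler.scheduleAtFixedRate(() -> current.set(loader.get()), 1, 1, TimeUnit.DAYS);
        }

        public String lookup(String ip) {
            return current.get().getOrDefault(ip, "unknown");
        }

        public void shutdown() {
            scheduler.shutdownNow();
        }
    }

The loader would typically check whether the GeoLite file on disk has actually changed before doing a full rebuild.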


On 12/9/15 4:32 PM, "James Sirota" <js...@hortonworks.com> wrote:

>Hi Bryan,
>
>We had HSQLDB at one point, but we were struggling to make these bolts
>reliable.  Also, the geo data needs to be periodically updated and it's
>easier to do when it's decoupled.
>
>Thanks,
>James
>
>
>
>On 12/9/15, 4:26 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>
>>[Sorry about the stupid COMMERCIAL: tag being added - I'm trying to fix
>>with our email folks]
>>
>>Nice.
>>
>>I was just looking around at the geotagging enrichment adapter. The city
>>data is split between the two files of 70Mb and 40Mb sizes. It seems like
>>the data is small enough to just load it all into memory. This would
>>eliminate two SQL queries for every event.
>>
>>Bryan
>>
>>On 12/9/15 2:14 PM, "Mark Bittmann" <ma...@b23.io> wrote:
>>
>>>Hi Bryan,
>>>
>>>For automation, B23 is planning to contribute Ansible scripts to deploy
>>>the Metron stack. The playbooks use Ambari blueprints for the hadoop
>>>ecosystem. We also install Elasticsearch, configure the legacy OpenSOC
>>>UI
>>>(based on Kibana/nodejs), create directories in hdfs, populate a MySQL
>>>database for geotagging. We template the OpenSOC_Config files so that we
>>>can use variable injection for the different services: Zookeeper, Hbase,
>>>Elasticsearch, MySQL, etc. Everyone's deployment might be slightly
>>>different, but I think this will be a really good start.
>>>
>>>I'm in the process of decoupling the scripts from our internal tooling -
>>>I should be able to make available the Ambari stuff later this week.
>>>Once
>>>we merge the disparate forks of the Cisco codebase, there will be some
>>>work to bring the playbooks up to date (i.e., use of Storm Flux), but
>>>not
>>>a ton.
>>>
>>>Mark
>>>
>>>
>>>
>>>On 12/9/15, 2:48 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>
>>>>
>>>>Seems like a good list. I'm probably not your UI guy, but I'll be happy
>>>>to
>>>>help with any of the others. Test coverage is probably a good place for
>>>>me
>>>>to start learning my way around. Do we have CI set up? I see a travis
>>>>file
>>>>in the code.
>>>>
>>>>Regarding automation deployments, are we targeting Ambari or something
>>>>else? 
>>>>
>>>>On the hadoop component compatibility, I see from the opens-streaming
>>>>pom
>>>>that we are using Storm-0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
>>>>0.98.0-hadoop2. These are all several iterations old. How aggressive do
>>>>we
>>>>want to be, generally, with tracking new releases? Are Hive, Flume, and
>>>>Spark also going to up rev for us?
>>>>
>>>>Bryan
>>>>
>>>>On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>>>
>>>>>Hi Brian,
>>>>>
>>>>>Welcome.  Glad to have you contribute.  There will be changes to the
>>>>>code
>>>>>base that the community will contribute back shortly.  We are waiting
>>>>>for
>>>>>the Jira to be setup so the backlog can be created and voted on.  I
>>>>>think
>>>>>the overall feeling is that we need to make the code base compatible
>>>>>with
>>>>>the latest version of HDP, automate deployments, increase test
>>>>>coverage,
>>>>>and start working on a new UI.  There may be more significant
>>>>>architectural changes to the code base, but we need to get the
>>>>>essential
>>>>>items knocked out before we go there.
>>>>>
>>>>>Thanks,
>>>>>James
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>>
>>>>>>Hi Folks,
>>>>>>
>>>>>>I just joined the list and thought I'd say "hi". I work at Rackspace
>>>>>>and
>>>>>>will be joining Andrew Hartnett's team there and hacking on metron.
>>>>>>This
>>>>>>is my first ASF project and I'm looking forward to being part of this
>>>>>>community.
>>>>>>
>>>>>>I'm curious what the development vision is for metron. What do people
>>>>>>like and not like about the codebase? I gather this code transitioned
>>>>>>from a Cisco internal project and is incubating now at the ASF. Are
>>>>>>there any code changes that need to be made to support incubation?
>>>>>>
>>>>>>Bryan
>>>>
>>
>>


Re: Hello

Posted by James Sirota <js...@hortonworks.com>.
Hi Bryan,

We had HSQLDB at one point, but we were struggling to make these bolts reliable.  Also, the geo data needs to be periodically updated and it’s easier to do when it’s decoupled.  

Thanks,
James



On 12/9/15, 4:26 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:

>[Sorry about the stupid COMMERCIAL: tag being added - I'm trying to fix
>with our email folks]
>
>Nice.
>
>I was just looking around at the geotagging enrichment adapter. The city
>data is split between the two files of 70Mb and 40Mb sizes. It seems like
>the data is small enough to just load it all into memory. This would
>eliminate two SQL queries for every event.
>
>Bryan
>
>On 12/9/15 2:14 PM, "Mark Bittmann" <ma...@b23.io> wrote:
>
>>Hi Bryan,
>>
>>For automation, B23 is planning to contribute Ansible scripts to deploy
>>the Metron stack. The playbooks use Ambari blueprints for the hadoop
>>ecosystem. We also install Elasticsearch, configure the legacy OpenSOC UI
>>(based on Kibana/nodejs), create directories in hdfs, populate a MySQL
>>database for geotagging. We template the OpenSOC_Config files so that we
>>can use variable injection for the different services: Zookeeper, Hbase,
>>Elasticsearch, MySQL, etc. Everyone's deployment might be slightly
>>different, but I think this will be a really good start.
>>
>>I'm in the process of decoupling the scripts from our internal tooling -
>>I should be able to make available the Ambari stuff later this week. Once
>>we merge the disparate forks of the Cisco codebase, there will be some
>>work to bring the playbooks up to date (i.e., use of Storm Flux), but not
>>a ton.
>>
>>Mark
>>
>>
>>
>>On 12/9/15, 2:48 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>
>>>
>>>Seems like a good list. I'm probably not your UI guy, but I'll be happy
>>>to
>>>help with any of the others. Test coverage is probably a good place for
>>>me
>>>to start learning my way around. Do we have CI set up? I see a travis
>>>file
>>>in the code.
>>>
>>>Regarding automation deployments, are we targeting Ambari or something
>>>else? 
>>>
>>>On the hadoop component compatibility, I see from the opens-streaming pom
>>>that we are using Storm-0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
>>>0.98.0-hadoop2. These are all several iterations old. How aggressive do
>>>we
>>>want to be, generally, with tracking new releases? Are Hive, Flume, and
>>>Spark also going to up rev for us?
>>>
>>>Bryan
>>>
>>>On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>>
>>>>Hi Brian,
>>>>
>>>>Welcome.  Glad to have you contribute.  There will be changes to the
>>>>code
>>>>base that the community will contribute back shortly.  We are waiting
>>>>for
>>>>the Jira to be setup so the backlog can be created and voted on.  I
>>>>think
>>>>the overall feeling is that we need to make the code base compatible
>>>>with
>>>>the latest version of HDP, automate deployments, increase test coverage,
>>>>and start working on a new UI.  There may be more significant
>>>>architectural changes to the code base, but we need to get the essential
>>>>items knocked out before we go there.
>>>>
>>>>Thanks,
>>>>James
>>>>
>>>>
>>>>
>>>>
>>>>On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>>
>>>>>Hi Folks,
>>>>>
>>>>>I just joined the list and thought I'd say "hi". I work at Rackspace
>>>>>and
>>>>>will be joining Andrew Hartnett's team there and hacking on metron.
>>>>>This
>>>>>is my first ASF project and I'm looking forward to being part of this
>>>>>community.
>>>>>
>>>>>I'm curious what the development vision is for metron. What do people
>>>>>like and not like about the codebase? I gather this code transitioned
>>>>>from a Cisco internal project and is incubating now at the ASF. Are
>>>>>there any code changes that need to be made to support incubation?
>>>>>
>>>>>Bryan
>>>
>
>

Re: Hello

Posted by Bryan Taylor <bt...@rackspace.com>.
[Sorry about the stupid COMMERCIAL: tag being added - I'm trying to fix
with our email folks]

Nice.

I was just looking around at the geotagging enrichment adapter. The city
data is split between two files, about 70 MB and 40 MB in size. That seems
small enough to just load it all into memory, which would eliminate two
SQL queries for every event.

Bryan

On 12/9/15 2:14 PM, "Mark Bittmann" <ma...@b23.io> wrote:

>Hi Bryan,
>
>For automation, B23 is planning to contribute Ansible scripts to deploy
>the Metron stack. The playbooks use Ambari blueprints for the hadoop
>ecosystem. We also install Elasticsearch, configure the legacy OpenSOC UI
>(based on Kibana/nodejs), create directories in hdfs, populate a MySQL
>database for geotagging. We template the OpenSOC_Config files so that we
>can use variable injection for the different services: Zookeeper, Hbase,
>Elasticsearch, MySQL, etc. Everyone's deployment might be slightly
>different, but I think this will be a really good start.
>
>I'm in the process of decoupling the scripts from our internal tooling -
>I should be able to make available the Ambari stuff later this week. Once
>we merge the disparate forks of the Cisco codebase, there will be some
>work to bring the playbooks up to date (i.e., use of Storm Flux), but not
>a ton.
>
>Mark
>
>
>
>On 12/9/15, 2:48 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>
>>
>>Seems like a good list. I'm probably not your UI guy, but I'll be happy
>>to
>>help with any of the others. Test coverage is probably a good place for
>>me
>>to start learning my way around. Do we have CI set up? I see a travis
>>file
>>in the code.
>>
>>Regarding automation deployments, are we targeting Ambari or something
>>else? 
>>
>>On the hadoop component compatibility, I see from the opens-streaming pom
>>that we are using Storm-0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
>>0.98.0-hadoop2. These are all several iterations old. How aggressive do
>>we
>>want to be, generally, with tracking new releases? Are Hive, Flume, and
>>Spark also going to up rev for us?
>>
>>Bryan
>>
>>On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:
>>
>>>Hi Brian,
>>>
>>>Welcome.  Glad to have you contribute.  There will be changes to the
>>>code
>>>base that the community will contribute back shortly.  We are waiting
>>>for
>>>the Jira to be setup so the backlog can be created and voted on.  I
>>>think
>>>the overall feeling is that we need to make the code base compatible
>>>with
>>>the latest version of HDP, automate deployments, increase test coverage,
>>>and start working on a new UI.  There may be more significant
>>>architectural changes to the code base, but we need to get the essential
>>>items knocked out before we go there.
>>>
>>>Thanks,
>>>James
>>>
>>>
>>>
>>>
>>>On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>>
>>>>Hi Folks,
>>>>
>>>>I just joined the list and thought I'd say "hi". I work at Rackspace
>>>>and
>>>>will be joining Andrew Hartnett's team there and hacking on metron.
>>>>This
>>>>is my first ASF project and I'm looking forward to being part of this
>>>>community.
>>>>
>>>>I'm curious what the development vision is for metron. What do people
>>>>like and not like about the codebase? I gather this code transitioned
>>>>from a Cisco internal project and is incubating now at the ASF. Are
>>>>there any code changes that need to be made to support incubation?
>>>>
>>>>Bryan
>>


Re: Hello

Posted by Mark Bittmann <ma...@b23.io>.
Hi Bryan,

For automation, B23 is planning to contribute Ansible scripts to deploy the Metron stack. The playbooks use Ambari blueprints for the Hadoop ecosystem. We also install Elasticsearch, configure the legacy OpenSOC UI (based on Kibana/nodejs), create directories in HDFS, and populate a MySQL database for geotagging. We template the OpenSOC_Config files so that we can use variable injection for the different services: ZooKeeper, HBase, Elasticsearch, MySQL, etc. Everyone's deployment might be slightly different, but I think this will be a really good start.

I'm in the process of decoupling the scripts from our internal tooling - I should be able to make the Ambari stuff available later this week. Once we merge the disparate forks of the Cisco codebase, there will be some work to bring the playbooks up to date (i.e., use of Storm Flux), but not a ton.

Mark
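
As a rough illustration of the variable-injection idea (not the actual playbooks
or the real OpenSOC_Config keys - the placeholder names below are made up), the
templating step boils down to substituting per-environment values into a config
template:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch of config templating: placeholders such as
    // ${zookeeper.quorum} are replaced with environment-specific values, which is
    // roughly what the Ansible templates do for the OpenSOC config files.
    public class ConfigTemplater {

        public static String render(String template, Map<String, String> values) {
            String out = template;
            for (Map.Entry<String, String> e : values.entrySet()) {
                out = out.replace("${" + e.getKey() + "}", e.getValue());
            }
            return out;
        }

        public static void main(String[] args) {
            String template = "kafka.zk=${zookeeper.quorum}\nes.host=${elasticsearch.host}";
            Map<String, String> values = new HashMap<>();
            values.put("zookeeper.quorum", "zk1:2181,zk2:2181");
            values.put("elasticsearch.host", "es1.example.com");
            System.out.println(render(template, values));
        }
    }

Ansible does this with Jinja2 templates and inventory/group variables rather than
Java, but the effect is the same: one set of config files, many environments.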



On 12/9/15, 2:48 PM, "Bryan Taylor" <bt...@rackspace.com> wrote:

>
>Seems like a good list. I'm probably not your UI guy, but I'll be happy to
>help with any of the others. Test coverage is probably a good place for me
>to start learning my way around. Do we have CI set up? I see a travis file
>in the code.
>
>Regarding automation deployments, are we targeting Ambari or something
>else? 
>
>On the hadoop component compatibility, I see from the opens-streaming pom
>that we are using Storm-0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
>0.98.0-hadoop2. These are all several iterations old. How aggressive do we
>want to be, generally, with tracking new releases? Are Hive, Flume, and
>Spark also going to up rev for us?
>
>Bryan
>
>On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:
>
>>Hi Brian,
>>
>>Welcome.  Glad to have you contribute.  There will be changes to the code
>>base that the community will contribute back shortly.  We are waiting for
>>the Jira to be setup so the backlog can be created and voted on.  I think
>>the overall feeling is that we need to make the code base compatible with
>>the latest version of HDP, automate deployments, increase test coverage,
>>and start working on a new UI.  There may be more significant
>>architectural changes to the code base, but we need to get the essential
>>items knocked out before we go there.
>>
>>Thanks,
>>James
>>
>>
>>
>>
>>On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>
>>>Hi Folks,
>>>
>>>I just joined the list and thought I'd say "hi". I work at Rackspace and
>>>will be joining Andrew Hartnett's team there and hacking on metron. This
>>>is my first ASF project and I'm looking forward to being part of this
>>>community.
>>>
>>>I'm curious what the development vision is for metron. What do people
>>>like and not like about the codebase? I gather this code transitioned
>>>from a Cisco internal project and is incubating now at the ASF. Are
>>>there any code changes that need to be made to support incubation?
>>>
>>>Bryan
>

Re: Hello

Posted by Sheetal Dolas <sh...@hortonworks.com>.
That's an interesting question, and we will have to think about it from a feature/performance perspective as well as about which platform versions we want to support (and this could depend on the immediate users of Metron).

Since we are just starting, it's a good opportunity to think it through and make the better choice. Changing versions later becomes difficult.

I am inclined towards using newer (not necessarily the latest) stable releases, for these reasons:

1. There have been a bunch of performance and stability improvements in almost all components recently (Storm, HDFS, HBase, etc.), and performance could be critical at scale.
2. Most Hadoop projects try to be backward API compatible, so if someone needs to make it work with older versions, it may not be too painful.
3. There are new features, such as Hive streaming ingest, that would be very helpful for both performance and Hive analytics.
4. Since Metron depends on many different platform components, it is only going to be as stable and reliable as its weakest component, so a newer stable release would make it better.

~ Sheetal



On 12/9/15, 11:48 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:

>
>Seems like a good list. I'm probably not your UI guy, but I'll be happy to
>help with any of the others. Test coverage is probably a good place for me
>to start learning my way around. Do we have CI set up? I see a travis file
>in the code.
>
>Regarding automation deployments, are we targeting Ambari or something
>else? 
>
>On the hadoop component compatibility, I see from the opens-streaming pom
>that we are using Storm-0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
>0.98.0-hadoop2. These are all several iterations old. How aggressive do we
>want to be, generally, with tracking new releases? Are Hive, Flume, and
>Spark also going to up rev for us?
>
>Bryan
>
>On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:
>
>>Hi Brian,
>>
>>Welcome.  Glad to have you contribute.  There will be changes to the code
>>base that the community will contribute back shortly.  We are waiting for
>>the Jira to be setup so the backlog can be created and voted on.  I think
>>the overall feeling is that we need to make the code base compatible with
>>the latest version of HDP, automate deployments, increase test coverage,
>>and start working on a new UI.  There may be more significant
>>architectural changes to the code base, but we need to get the essential
>>items knocked out before we go there.
>>
>>Thanks,
>>James
>>
>>
>>
>>
>>On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>>
>>>Hi Folks,
>>>
>>>I just joined the list and thought I'd say "hi". I work at Rackspace and
>>>will be joining Andrew Hartnett's team there and hacking on metron. This
>>>is my first ASF project and I'm looking forward to being part of this
>>>community.
>>>
>>>I'm curious what the development vision is for metron. What do people
>>>like and not like about the codebase? I gather this code transitioned
>>>from a Cisco internal project and is incubating now at the ASF. Are
>>>there any code changes that need to be made to support incubation?
>>>
>>>Bryan
>
>

Re: Hello

Posted by Bryan Taylor <bt...@rackspace.com>.
Seems like a good list. I'm probably not your UI guy, but I'll be happy to
help with any of the others. Test coverage is probably a good place for me
to start learning my way around. Do we have CI set up? I see a travis file
in the code.

Regarding automation deployments, are we targeting Ambari or something
else? 

On the Hadoop component compatibility, I see from the opensoc-streaming pom
that we are using Storm 0.9.2, Kafka 0.8.0, Hadoop 2.2.0, and HBase
0.98.0-hadoop2. These are all several iterations old. How aggressive do we
want to be, generally, with tracking new releases? Are Hive, Flume, and
Spark also going to up-rev for us?

Bryan
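
On the test-coverage point, small focused unit tests around the parsers and
enrichment adapters are an easy way in. A minimal sketch of the shape such a
test could take (the adapter class, method, and fields here are hypothetical
stand-ins, not the real interfaces in the codebase):

    import static org.junit.Assert.assertEquals;
    import static org.junit.Assert.assertNotNull;

    import java.util.HashMap;
    import java.util.Map;

    import org.junit.Test;

    // Hypothetical sketch of a unit test for an enrichment adapter; the adapter
    // below is a stub standing in for whatever class is actually under test.
    public class GeoEnrichmentAdapterTest {

        static class GeoEnrichmentAdapter {
            Map<String, String> enrich(String ip) {
                Map<String, String> fields = new HashMap<>();
                fields.put("ip", ip);
                fields.put("country", "US"); // a real adapter would look this up
                return fields;
            }
        }

        @Test
        public void enrichAddsGeoFieldsForKnownIp() {
            GeoEnrichmentAdapter adapter = new GeoEnrichmentAdapter();
            Map<String, String> result = adapter.enrich("198.51.100.7");
            assertNotNull(result);
            assertEquals("198.51.100.7", result.get("ip"));
            assertEquals("US", result.get("country"));
        }
    }

Tests in this shape run under plain mvn test, so they benefit from whatever CI
ends up being wired to the travis file mentioned above.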

On 12/9/15 12:22 PM, "James Sirota" <js...@hortonworks.com> wrote:

>Hi Brian,
>
>Welcome.  Glad to have you contribute.  There will be changes to the code
>base that the community will contribute back shortly.  We are waiting for
>the Jira to be setup so the backlog can be created and voted on.  I think
>the overall feeling is that we need to make the code base compatible with
>the latest version of HDP, automate deployments, increase test coverage,
>and start working on a new UI.  There may be more significant
>architectural changes to the code base, but we need to get the essential
>items knocked out before we go there.
>
>Thanks,
>James
>
>
>
>
>On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>
>>Hi Folks,
>>
>>I just joined the list and thought I'd say "hi". I work at Rackspace and
>>will be joining Andrew Hartnett's team there and hacking on metron. This
>>is my first ASF project and I'm looking forward to being part of this
>>community.
>>
>>I'm curious what the development vision is for metron. What do people
>>like and not like about the codebase? I gather this code transitioned
>>from a Cisco internal project and is incubating now at the ASF. Are
>>there any code changes that need to be made to support incubation?
>>
>>Bryan


Re: Hello

Posted by larry mccay <la...@gmail.com>.
Welcome aboard, Bryan!

James has articulated the immediate goals rather well - we do need to
target the latest Apache releases of Hadoop in this project, however.
Specific distributions will be testing and certifying in their own special
ways. :)
Since HDP is a company-specific distribution, we will only discuss Apache
Hadoop compatibility here.


On Wed, Dec 9, 2015 at 1:22 PM, James Sirota <js...@hortonworks.com>
wrote:

> Hi Brian,
>
> Welcome.  Glad to have you contribute.  There will be changes to the code
> base that the community will contribute back shortly.  We are waiting for
> the Jira to be setup so the backlog can be created and voted on.  I think
> the overall feeling is that we need to make the code base compatible with
> the latest version of HDP, automate deployments, increase test coverage,
> and start working on a new UI.  There may be more significant architectural
> changes to the code base, but we need to get the essential items knocked
> out before we go there.
>
> Thanks,
> James
>
>
>
>
> On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:
>
> >Hi Folks,
> >
> >I just joined the list and thought I'd say "hi". I work at Rackspace and
> will be joining Andrew Hartnett's team there and hacking on metron. This is
> my first ASF project and I'm looking forward to being part of this
> community.
> >
> >I'm curious what the development vision is for metron. What do people
> like and not like about the codebase? I gather this code transitioned from
> a Cisco internal project and is incubating now at the ASF. Are there any
> code changes that need to be made to support incubation?
> >
> >Bryan
>

Re: Hello

Posted by James Sirota <js...@hortonworks.com>.
Hi Bryan,

Welcome.  Glad to have you contribute.  There will be changes to the code base that the community will contribute back shortly.  We are waiting for the Jira to be set up so the backlog can be created and voted on.  I think the overall feeling is that we need to make the code base compatible with the latest version of HDP, automate deployments, increase test coverage, and start working on a new UI.  There may be more significant architectural changes to the code base, but we need to get the essential items knocked out before we go there.

Thanks,
James




On 12/9/15, 11:52 AM, "Bryan Taylor" <bt...@rackspace.com> wrote:

>Hi Folks,
>
>I just joined the list and thought I'd say "hi". I work at Rackspace and will be joining Andrew Hartnett's team there and hacking on metron. This is my first ASF project and I'm looking forward to being part of this community.
>
>I'm curious what the development vision is for metron. What do people like and not like about the codebase? I gather this code transitioned from a Cisco internal project and is incubating now at the ASF. Are there any code changes that need to be made to support incubation?
>
>Bryan