Posted to dev@metron.apache.org by Nick Allen <ni...@nickallen.org> on 2016/09/23 15:47:16 UTC

[DISCUSS] How to Use Profiles for Training?

The Profiler currently persists data within an HBase table.  There are some
extension points that allow anyone to plug in a different row key or column
structure [1].  The default implementation organizes the data in what should
be a scalable form for most use cases.

We currently have functionality to retrieve Profile data via a Java API [2]
and a Stellar API [3].  The primary use case for both of these is model
scoring.  The Stellar API pairs nicely with the Metron MaaS functionality
or any model scoring that would be done on streaming data within Metron.

Q. How do I access this data for model training?

The default implementation, while scalable, makes it very difficult to pull
the data from HBase using a generic HBase connector for a third-party
platform.  How can we make this data most easily accessible for model
training in Spark, R, etc.?
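
To make the difficulty concrete, here is a rough Java sketch of a salted,
composite row key.  The layout (salt, profile name, entity, period) and the
names RowKeySketch, rowKey and saltDivisor are assumptions made up for
illustration; they are not taken from the actual RowKeyBuilder [1].  The
point is only that a generic connector sees one opaque byte array and has no
way to split it back into its parts without knowing the layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowKeySketch {

    // Hypothetical layout, NOT the real Metron one:
    // [salt (1 byte)][profile name][entity][period (8 bytes)]
    static byte[] rowKey(String profile, String entity, long period, int saltDivisor) {
        byte[] p = profile.getBytes(StandardCharsets.UTF_8);
        byte[] e = entity.getBytes(StandardCharsets.UTF_8);
        // The salt spreads consecutive periods across regions, but it also
        // means rows for one profile are no longer contiguous in the table.
        byte salt = (byte) Math.floorMod(Long.hashCode(period), saltDivisor);
        return ByteBuffer.allocate(1 + p.length + e.length + Long.BYTES)
                .put(salt)
                .put(p)
                .put(e)
                .putLong(period)
                .array();
    }

    public static void main(String[] args) {
        // There are no delimiters or length prefixes, so a reader cannot
        // tell where the profile name ends and the entity begins.
        byte[] key = rowKey("http-requests", "10.0.0.1", 1639L, 10);
        System.out.println(Arrays.toString(key));
    }
}
```

A connector that only understands "row key is a string" or "row key is a
long" cannot scan for one profile and entity over a time range with a key
like this, which is why a purpose-built reader is needed.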

*A1:* Create custom connectors for a variety of third-party platforms like
Spark, R, etc.


*A2:* Provide an alternate persistence layer for the Profiler.  This data
makes the most sense in a TSDB (Time Series Database).  It is much more
likely that third-party platforms will have connectors already available
for a TSDB like OpenTSDB or InfluxDB.
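
As a rough idea of what A2 could look like, the Java sketch below flattens
one profile measurement into an OpenTSDB-style "put" line of the form
"put <metric> <timestamp> <value> <tagk=tagv>".  The class name, the
"metron.profile." metric prefix and the "entity" tag are assumptions chosen
for illustration, not an existing Metron API, and the sketch assumes the
profile value is a single number.

```java
import java.util.Locale;

public class TsdbLineSketch {

    // Formats one measurement as an OpenTSDB telnet-style put line.
    // The metric prefix and the tag name are hypothetical.
    static String toPutLine(String profile, String entity, long epochSeconds, double value) {
        return String.format(Locale.ROOT, "put metron.profile.%s %d %s entity=%s",
                profile, epochSeconds, Double.toString(value), entity);
    }

    public static void main(String[] args) {
        // Illustrative values only.
        System.out.println(toPutLine("http-requests", "10.0.0.1", 1474645636L, 42.5));
    }
}
```

With the data in this shape, the profile name and entity become an ordinary
metric and tag, so an existing TSDB connector can query them without knowing
anything about the Profiler's HBase layout.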

*A3: Something else that is way better*???



--
[1] See org.apache.metron.profiler.hbase.RowKeyBuilder, ColumnBuilder
[2] See org.apache.metron.profiler.client.hbase.HBaseProfilerClient
[3] See org.apache.metron.profiler.client.stellar.ProfileGet

-- 
Nick Allen <ni...@nickallen.org>