Posted to issues@metron.apache.org by "Nick Allen (JIRA)" <ji...@apache.org> on 2016/09/30 15:57:21 UTC

[jira] [Commented] (METRON-477) Support lower fidelity retention of network traffic over time

    [ https://issues.apache.org/jira/browse/METRON-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15536335#comment-15536335 ] 

Nick Allen commented on METRON-477:
-----------------------------------

This could be a generic solution that handles aging of any data set.  The underlying idea here is that I want to keep as much high-fidelity data as I can.  But because I have limited resources, I am willing to trade off fidelity to extend my retention.

PCAP is definitely the driving use case here as no one can afford enough space to store as much pcap as they would like.

* As a user I define two or more Buckets.  
* A Bucket is just a set of similar data.
* The user defines the bucket capacity.  This could be a raw size (1 TB) or a percentage of the underlying data store (20% of available space).
* To keep the bucket from exceeding capacity, the system will choose a subset of the data, apply a transform to that data and move the transformed data to another Bucket.  
* The user defines which data should be harvested first for a Bucket.  It could be the oldest data or the lowest priority data in that Bucket.
* The user defines the type of transform that should occur.  There would be a library of reusable transforms that a user can apply.  One transform would handle Raw Pcap -> Truncated Pcap, another might handle Truncated Pcap -> Daily Summary, another might compress textual data; Raw Text -> Gzip Compressed.  
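
The bucket model above could be sketched roughly as follows.  This is only an illustration, not an existing Metron API: the names Item, Bucket, overflow and truncate are hypothetical, data lives in memory rather than in HDFS, harvesting is oldest-first only, and the "truncation" simply keeps the first few bytes of a payload instead of parsing real pcap records.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Item:
    name: str
    payload: bytes
    timestamp: int  # arrival time; a smaller value means older data


@dataclass
class Bucket:
    name: str
    capacity_bytes: int
    # Transform applied to data harvested from this bucket (hypothetical hook).
    transform: Optional[Callable[[bytes], bytes]] = None
    # Bucket that receives the transformed data; None means the data is dropped.
    overflow: Optional["Bucket"] = None
    items: List[Item] = field(default_factory=list)

    def size(self) -> int:
        return sum(len(item.payload) for item in self.items)

    def add(self, item: Item) -> None:
        self.items.append(item)
        self.enforce_capacity()

    def enforce_capacity(self) -> None:
        # Harvest oldest-first until the bucket is back within its capacity.
        while self.items and self.size() > self.capacity_bytes:
            oldest = min(self.items, key=lambda item: item.timestamp)
            self.items.remove(oldest)
            if self.overflow is not None:
                payload = oldest.payload
                if self.transform is not None:
                    payload = self.transform(payload)
                self.overflow.add(Item(oldest.name, payload, oldest.timestamp))


def truncate(payload: bytes, keep: int = 4) -> bytes:
    # Stand-in for "Raw Pcap -> Truncated Pcap": keep only the leading bytes.
    return payload[:keep]


# Two buckets: raw data ages into a smaller, lower-fidelity bucket.
truncated = Bucket("truncated", capacity_bytes=8)
raw = Bucket("raw", capacity_bytes=20, transform=truncate, overflow=truncated)

for t in range(4):
    raw.add(Item(f"pkt{t}", bytes(10), timestamp=t))

print([item.name for item in raw.items])        # newest raw data stays at full fidelity
print([item.name for item in truncated.items])  # older data survives in truncated form
```

A "lowest priority first" policy would only change the key used to pick the item to harvest, and chaining a third bucket (for example, daily summaries) would mean giving the truncated bucket its own transform and overflow.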

Initially, the storage medium over which this works would likely be HDFS.  We could also attempt to make the underlying storage pluggable so this would work over HDFS, HBase, Elasticsearch indices, etc.  Data of different ages and fidelities might make sense in different storage media.  For example, I may want to start with high-fidelity data that lives in Elasticsearch, but then as that data ages, it moves out to HDFS to reduce storage cost.
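
One way to picture the pluggable storage idea is a small interface that each backend implements.  The sketch below is an assumption for illustration only: Store, InMemoryStore and age_out are hypothetical names, and the in-memory class merely stands in for a real HDFS, HBase or Elasticsearch backend.

```python
from typing import Dict, List, Protocol


class Store(Protocol):
    """Hypothetical minimal interface a storage backend would implement."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def delete(self, key: str) -> None: ...


class InMemoryStore:
    """Stand-in backend; a real one would wrap HDFS, HBase or Elasticsearch."""

    def __init__(self, label: str) -> None:
        self.label = label
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._data[key] = data

    def get(self, key: str) -> bytes:
        return self._data[key]

    def delete(self, key: str) -> None:
        del self._data[key]

    def keys(self) -> List[str]:
        return sorted(self._data)


def age_out(key: str, source: Store, target: Store) -> None:
    # Write to the target first so a failure leaves the source copy intact.
    data = source.get(key)
    target.put(key, data)
    source.delete(key)


hot = InMemoryStore("elasticsearch-like")
cold = InMemoryStore("hdfs-like")
hot.put("doc1", b"high-fidelity record")

age_out("doc1", hot, cold)
print(hot.keys(), cold.keys())
```

Because age_out depends only on the interface, the same aging logic would apply regardless of which backend holds the data at each stage.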


> Support lower fidelity retention of network traffic over time
> -------------------------------------------------------------
>
>                 Key: METRON-477
>                 URL: https://issues.apache.org/jira/browse/METRON-477
>             Project: Metron
>          Issue Type: Improvement
>            Reporter: Jon Zeolla
>
> Currently fastcapa supports full pcap capture.  I would like to see the ability to retain network traffic for longer periods of time but at increasingly lower fidelity.  
> For instance:
>  - Full PCAP is ingested and stored in bucket 1
>  - Transition "Full PCAP" to "Truncated PCAP" after bucket 1 hits X size, stored in bucket 2
>  - Transform the truncated PCAP into flows or daily summaries after bucket 2 hits X size, stored in bucket 3
> This system should be set up so that the transition jobs are highly configurable (as in sizes for each bucket, truncation cutoff lengths, transition ordering, etc.).  In addition, both the full pcap and truncated pcap should be retrievable using the same method (CLI, UI, etc.).  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)