Posted to commits@metron.apache.org by ni...@apache.org on 2018/08/31 19:20:34 UTC
[38/50] [abbrv] metron git commit: METRON-1737: Document Job cleanup (merrimanr via mmiklavc) closes apache/metron#1164

METRON-1737: Document Job cleanup (merrimanr via mmiklavc) closes apache/metron#1164


Project: http://git-wip-us.apache.org/repos/asf/metron/repo
Commit: http://git-wip-us.apache.org/repos/asf/metron/commit/6b70571d
Tree: http://git-wip-us.apache.org/repos/asf/metron/tree/6b70571d
Diff: http://git-wip-us.apache.org/repos/asf/metron/diff/6b70571d

Branch: refs/remotes/apache/feature/METRON-1699-create-batch-profiler
Commit: 6b70571d6de3951c98269bbf5b38e8b69deddfab
Parents: d9e1f38
Author: merrimanr <me...@gmail.com>
Authored: Wed Aug 15 16:00:13 2018 -0600
Committer: Michael Miklavcic <mi...@gmail.com>
Committed: Wed Aug 15 16:00:13 2018 -0600

----------------------------------------------------------------------
 metron-interface/metron-rest/README.md | 11 +++++++++++
 1 file changed, 11 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/metron/blob/6b70571d/metron-interface/metron-rest/README.md
----------------------------------------------------------------------
diff --git a/metron-interface/metron-rest/README.md b/metron-interface/metron-rest/README.md
index 080422d..2c216d1 100644
--- a/metron-interface/metron-rest/README.md
+++ b/metron-interface/metron-rest/README.md
@@ -222,6 +222,17 @@ Out of the box it is a simple wrapper around the tshark command to transform raw
 REST will supply the script with raw pcap data through standard in and expects PDML data serialized as XML.
 
 Pcap query jobs can be configured for submission to a YARN queue.  This setting is exposed as the Spring property `pcap.yarn.queue`.  If configured, the REST application will set the `mapreduce.job.queuename` Hadoop property to that value.
+It is highly recommended that a dedicated YARN queue be created and configured for Pcap queries to prevent a job from consuming too many cluster resources.  More information about setting up YARN queues can be found [here](https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html#Setting_up_queues).
+
+Pcap query results are stored in HDFS.  The location of query results when run through the REST app is determined by a couple of factors.  The root of Pcap query results defaults to `/apps/metron/pcap/output` but can be changed with the
+Spring property `pcap.final.output.path`.  Assuming the default Pcap query output directory, the path to a result page will follow this pattern:
+```
+/apps/metron/pcap/output/{username}/MAP_REDUCE/{job id}/page-{page number}.pcap
+```
+Over time, Pcap query results will accumulate in HDFS.  These results are not currently cleaned up automatically, so cluster administrators should monitor how much space they consume.  It is highly recommended that a process be put in place to
+periodically delete files and directories under the Pcap query results root.
+
+Users should also be mindful of date ranges used in queries so they don't produce result sets that are too large.  Currently there are no limits enforced on date ranges.
 
 Queries can also be configured on a global level for setting the number of results per page via a Spring property `pcap.page.size`. By default, this value is set to 10 pcaps per page, but you may choose to set this value higher
 based on observing frequently-run query result sizes. This setting works in conjunction with the property for setting finalizer threadpool size when optimizing query performance.
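
As a note on the cleanup recommendation in the README change above: the commit does not prescribe a particular cleanup process, so the following is only a minimal sketch of one possible approach. It assumes the job directories are listed with `hdfs dfs -ls` and that directories older than a retention window can be deleted. The function name `stale_job_dirs`, the 30-day default, and the job ids in the sample listing are hypothetical and do not come from Metron.

```python
from datetime import datetime, timedelta


def stale_job_dirs(ls_lines, now, max_age_days=30):
    """Return directory paths from `hdfs dfs -ls` output older than max_age_days.

    The retention window is an assumption; pick one that suits the cluster.
    """
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for line in ls_lines:
        # Expected fields: permissions, replication, owner, group, size,
        # date, time, path. The path is kept whole by limiting the split.
        fields = line.split(None, 7)
        # Skip anything that is not a directory entry, including the
        # "Found N items" header line.
        if len(fields) != 8 or not fields[0].startswith("d"):
            continue
        modified = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M")
        if modified < cutoff:
            stale.append(fields[7])
    return stale


# Hypothetical listing of /apps/metron/pcap/output/user1/MAP_REDUCE
sample_listing = [
    "Found 2 items",
    "drwxr-xr-x   - metron hadoop          0 2018-07-01 10:15 /apps/metron/pcap/output/user1/MAP_REDUCE/job_old",
    "drwxr-xr-x   - metron hadoop          0 2018-08-30 09:00 /apps/metron/pcap/output/user1/MAP_REDUCE/job_new",
]

for path in stale_job_dirs(sample_listing, now=datetime(2018, 8, 31)):
    print(path)
```

Each returned path could then be removed with `hdfs dfs -rm -r <path>`, for example from a scheduled job. Keeping the selection logic separate from the delete step makes it easy to review the candidate list before anything is removed.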