Posted to common-commits@hadoop.apache.org by ha...@apache.org on 2015/03/25 10:10:35 UTC
hadoop git commit: MAPREDUCE-579. Streaming slowmatch documentation.
Repository: hadoop
Updated Branches:
refs/heads/branch-2 ee824cafe -> d85c14afb
MAPREDUCE-579. Streaming slowmatch documentation.
(cherry picked from commit a2e42d2deee715f6255d6fd2c95f34e80888dc5f)
Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/d85c14af
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/d85c14af
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/d85c14af
Branch: refs/heads/branch-2
Commit: d85c14afbfbf31028a7f253e0fa77c5ed3e88f7f
Parents: ee824ca
Author: Harsh J <ha...@cloudera.com>
Authored: Wed Mar 25 14:38:12 2015 +0530
Committer: Harsh J <ha...@cloudera.com>
Committed: Wed Mar 25 14:39:46 2015 +0530
----------------------------------------------------------------------
hadoop-mapreduce-project/CHANGES.txt | 2 ++
.../hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm | 7 +++++++
2 files changed, 9 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/hadoop/blob/d85c14af/hadoop-mapreduce-project/CHANGES.txt
----------------------------------------------------------------------
diff --git a/hadoop-mapreduce-project/CHANGES.txt b/hadoop-mapreduce-project/CHANGES.txt
index a2fcfbe..d913fe5 100644
--- a/hadoop-mapreduce-project/CHANGES.txt
+++ b/hadoop-mapreduce-project/CHANGES.txt
@@ -8,6 +8,8 @@ Release 2.8.0 - UNRELEASED
IMPROVEMENTS
+ MAPREDUCE-579. Streaming "slowmatch" documentation. (harsh)
+
MAPREDUCE-6287. Deprecated methods in org.apache.hadoop.examples.Sort
(Chao Zhang via harsh)
http://git-wip-us.apache.org/repos/asf/hadoop/blob/d85c14af/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
----------------------------------------------------------------------
diff --git a/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm b/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
index 179b1f0..a23d407 100644
--- a/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
+++ b/hadoop-tools/hadoop-streaming/src/site/markdown/HadoopStreaming.md.vm
@@ -546,6 +546,13 @@ You can use the record reader StreamXmlRecordReader to process XML documents.
Anything found between BEGIN\_STRING and END\_STRING would be treated as one record for map tasks.
+The name-value properties that StreamXmlRecordReader understands are:
+
+* (strings) 'begin' - Characters marking beginning of record, and 'end' - Characters marking end of record.
+* (boolean) 'slowmatch' - Toggle to look for begin and end characters, but within CDATA instead of regular tags. Defaults to false.
+* (integer) 'lookahead' - Maximum lookahead bytes to sync CDATA when using 'slowmatch', should be larger than 'maxrec'. Defaults to 2*'maxrec'.
+* (integer) 'maxrec' - Maximum record size to read between each match during 'slowmatch'. Defaults to 50000 bytes.
+
$H3 How do I update counters in streaming applications?
A streaming process can use the stderr to emit counter information. `reporter:counter:<group>,<counter>,<amount>` should be sent to stderr to update the counter.
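
To illustrate the counter line format quoted above, here is a minimal sketch of a streaming mapper written in Python. The group name `WordStats`, the counter name `Words`, and the helper names `format_counter` and `run_mapper` are assumptions made for this example; they are not part of this commit or of Hadoop itself.

```python
import sys


def format_counter(group, counter, amount):
    """Build one counter-update line in the reporter:counter format."""
    return f"reporter:counter:{group},{counter},{amount}"


def run_mapper(stdin=sys.stdin, stdout=sys.stdout, stderr=sys.stderr):
    """Emit (word, 1) pairs on stdout and a word-count counter on stderr."""
    for line in stdin:
        words = line.split()
        for word in words:
            # Regular map output goes to stdout as key<TAB>value.
            stdout.write(f"{word}\t1\n")
        if words:
            # Counter updates go to stderr, one update per line.
            # "WordStats" and "Words" are hypothetical names.
            stderr.write(format_counter("WordStats", "Words", len(words)) + "\n")


if __name__ == "__main__":
    run_mapper()
```

The streams are passed as parameters only so the function can be exercised with in-memory buffers; when run as a script it reads stdin and writes to stdout and stderr.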