Posted to user@hadoop.apache.org by Bin Wang <bi...@gmail.com> on 2014/11/20 17:28:40 UTC

Hadoop Streaming Use Non Printing Character as Key/Value separator

Hi there,

I am writing a Hadoop Streaming job in which certain columns contain
natural-language text. Such text can itself contain tabs, so using the default
'\t' delimiter is not an option for me.

Does anyone know how to pass a non-printing character, such as SOH ('start of
heading', ASCII 0x01), as the key/value separator?

I tried passing several representations of it to the hadoop command, which I
run from a shell script:


-D stream.map.output.field.separator=SOH
-D stream.map.output.field.separator=001
-D stream.map.output.field.separator=^A

None of them works. Can anyone help me with that?
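
One possible explanation is that the shell hands each of the three values above
to the command as ordinary printable text rather than as the single byte 0x01.
The following is a minimal sketch, assuming bash; show_bytes is a hypothetical
helper written only for this illustration. It prints the bytes of each value in
hex and, for comparison, the bytes produced by bash's ANSI-C quoting form
$'\001'. It does not verify whether Hadoop Streaming accepts such a value.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the bytes of its first argument as hex digits.
show_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
}

literal_soh=$(show_bytes 'SOH')    # three letters: S, O, H
literal_001=$(show_bytes '001')    # three digits: 0, 0, 1
literal_caret=$(show_bytes '^A')   # a caret followed by the letter A
real_soh=$(show_bytes $'\001')     # bash ANSI-C quoting: one byte, 0x01

echo "SOH     -> $literal_soh"     # 534f48
echo "001     -> $literal_001"     # 303031
echo "^A      -> $literal_caret"   # 5e41
echo "\$'\\001' -> $real_soh"      # 01
```

Only the last form yields the control character itself; the first three are
the literal strings typed on the command line.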

Best,

Bin