Posted to mapreduce-user@hadoop.apache.org by "Chinni, Ravi" <rc...@syncsort.com> on 2010/06/16 17:47:43 UTC

Changing default separator for streaming application

I am trying to develop a streaming MR application by implementing
korn-shell based mapper and reducer. I want to use 'space - x20' as the
separator between key and value throughout the application.

When invoking the application I specified  -D
stream.map.output.field.separator=" " -D
stream.reduce.output.field.seperator=" " options.



While in the output of my shell script I have a space between key and
value fields, the final output written by the framework to file has a
tab as the separator. It seems that the framework is replacing the space
separator by a tab separator in the output of mapper and reducer
functions.



If anyone has ideas on how I can fix this, please share it.



Thanks,

Ravi Chinni



_____________________________________________________________________________

ATTENTION:

The information contained in this message (including any files transmitted 
with this message) may contain proprietary, trade secret or other 
confidential and/or legally privileged information. Any pricing 
information contained in this message or in any files transmitted with 
this message is always confidential and cannot be shared with any third 
parties without prior written approval from Syncsort. This message is 
intended to be read only by the individual or entity to whom it is 
addressed or by their designee. If the reader of this message is not the 
intended recipient, you are on notice that any use, disclosure, copying or 
distribution of this message, in any form, is strictly prohibited. If you 
have received this message in error, please immediately notify the sender 
and/or Syncsort and destroy all copies of this message in your possession, 
custody or control.

RE: Changing default separator for streaming application

Posted by "Chinni, Ravi" <rc...@syncsort.com>.
Thanks. It helped. Not sure if this is documented anywhere on the hadoop
site.

 

One additional issue I am encountering:

I want the records in the reduce output to be terminated by '\r\n'. Even
though I put a '\r\n' at the end of the value in my reduce script
function, the final output in the file has only '\n'. Again, it seems
that the framework is replacing '\r\n' with '\n'. Any ideas?

 

Ravi

 

 

From: Amareshwari Sri Ramadasu [mailto:amarsri@yahoo-inc.com] 
Sent: Thursday, June 17, 2010 12:26 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Changing default separator for streaming application

 

Final output is written by OutputFormat. By default, TextOutputFormat
will write \t as the key-value separator. You can specify a different
key-value separator for TextOutputFormat by specifying the value for
configuration property "mapred.textoutputformat.separator". Try setting
' ' for the configuration.

Thanks
Amareshwari
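
The behaviour described in the reply above can be mimicked in plain shell. This is only a rough simulation, not Hadoop code: write_record is a hypothetical helper that splits a line at the first occurrence of the streaming output separator and then joins key and value with a second separator, which falls back to a tab when none is given, as TextOutputFormat is said to do. It makes no attempt to mirror how Hadoop treats lines that lack the separator.

```shell
#!/bin/sh
# write_record LINE IN_SEP [OUT_SEP]
#   LINE    one line as printed by the mapper or reducer script
#   IN_SEP  separator used to split the line into key and value
#   OUT_SEP separator used when the record is written (tab if omitted)
write_record() {
    line=$1
    in_sep=$2
    out_sep=${3:-"$(printf '\t')"}

    # Key: everything before the first occurrence of IN_SEP.
    key=${line%%"$in_sep"*}

    # Value: everything after the first occurrence of IN_SEP, if present.
    case "$line" in
        *"$in_sep"*) value=${line#*"$in_sep"} ;;
        *)           value= ;;
    esac

    printf '%s%s%s\n' "$key" "$out_sep" "$value"
}

# Output separator left at its default: the first space turns into a tab.
write_record "apple 3 extra" " "

# Output separator also set to a space: the line comes out unchanged.
write_record "apple 3 extra" " " " "
```

Piping the first call through od -c is a quick way to confirm that the byte after the key is a tab rather than a space.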


Re: Changing default separator for streaming application

Posted by Greg Roelofs <ro...@yahoo-inc.com>.
Ravi Chinni wrote:

> When invoking the application I specified  -D
> stream.map.output.field.separator=" " -D
> stream.reduce.output.field.seperator=" " options.
                             ^^^^^^^^^

> While in the output of my shell script I have a space between key and
> value fields, the final output written by the framework to file has a
> tab as the separator. It seems that the framework is replacing the space
> separator by a tab separator in the output of mapper and reducer
> functions.

Both of them, or just the reducer half?  A mismatch between the mapper's
output and the reducer's expected input format presumably would result in
completely bogus results, wouldn't it?  Or is the value of
stream.map.output.field.separator implicitly also the value of the
reducer's input separator?
(Sorry, I'm still new to this stuff.)

> If anyone has ideas on how I can fix this, please share it.

Spell "separator" correctly? :-)

Greg

Re: Changing default separator for streaming application

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
Final output is written by OutputFormat. By default, TextOutputFormat will write \t as the key-value separator. You can specify a different key-value separator for TextOutputFormat by specifying the value for configuration property "mapred.textoutputformat.separator". Try setting ' ' for the configuration.
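
A sketch of how the invocation might look with all three properties set to a space. The jar location, the input and output paths, and the script names mapper.ksh and reducer.ksh are placeholders rather than details from this thread, and the run_streaming wrapper with its HADOOP_CMD dry-run switch is a hypothetical convenience. Note that the -D generic options have to come before the streaming options, and that the reduce property is spelled "separator" here.

```shell
#!/bin/sh
# Sketch only: the jar location, HDFS paths, and script names are placeholders.
run_streaming() {
    sep=' '   # the single space wanted between key and value

    # HADOOP_CMD defaults to a dry run that just prints the command line;
    # set HADOOP_CMD=hadoop to actually submit the job.
    ${HADOOP_CMD:-echo hadoop} jar "${STREAMING_JAR:-hadoop-streaming.jar}" \
        -D "stream.map.output.field.separator=$sep" \
        -D "stream.reduce.output.field.separator=$sep" \
        -D "mapred.textoutputformat.separator=$sep" \
        -input "$1" \
        -output "$2" \
        -mapper mapper.ksh \
        -reducer reducer.ksh \
        -file mapper.ksh \
        -file reducer.ksh
}

# Placeholder paths; prints the command unless HADOOP_CMD is set.
run_streaming /tmp/streaming-in /tmp/streaming-out
```

Quoting each property as one word ("name=$sep") keeps the shell from dropping the space that forms the value.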

Thanks
Amareshwari
