Posted to common-user@hadoop.apache.org by Ke...@thomsonreuters.com on 2011/03/24 20:22:26 UTC

How do I split input on fixed length keys

I'm using hadoop streaming and currently have these properties in my
command line: 
-Dstream.map.output.field.separator=' ' \ 
-Dstream.num.map.output.key.fields=1 \ 

This works for me as my test data happens to have a space at column 14.
If I want to use a fixed-length split instead, is there a simple cut-like
option I could use, such as leaving the separator undefined and taking the
first 13 bytes as the key?
-Dstream.map.output.field.separator= \ 
-Dstream.num.map.output.key.fields=13 \ 

I have searched the forum for discussions on fixed length or splitting
keys but have not found my answer. Perhaps this is not possible, at
least on the command line? 

Thanks,
Kevin


Re: How do I split input on fixed length keys

Posted by elton sky <el...@gmail.com>.
I agree with Harsh.

I think you need to write your own RecordReader.

On Tue, Apr 5, 2011 at 3:37 PM, Harsh Chouraria <ha...@cloudera.com> wrote:

> Hello Kevin,
>
> On Fri, Mar 25, 2011 at 12:52 AM,  <Ke...@thomsonreuters.com> wrote:
> > -Dstream.map.output.field.separator= \
> > -Dstream.num.map.output.key.fields=13 \
> >
> > I have searched the forum for discussions on fixed length or splitting
> > keys but have not found my answer. Perhaps this is not possible, at
> > least on the command line?
>
> I'm not aware of any functionality provided by streaming that gives
> you this support. Your mapper code will have to achieve this on its
> own before emitting, I think (or your InputFormat can do it at read
> time, perhaps).
>
> --
> Harsh J
> Support Engineer, Cloudera
>

Re: How do I split input on fixed length keys

Posted by Harsh Chouraria <ha...@cloudera.com>.
Hello Kevin,

On Fri, Mar 25, 2011 at 12:52 AM,  <Ke...@thomsonreuters.com> wrote:
> -Dstream.map.output.field.separator= \
> -Dstream.num.map.output.key.fields=13 \
>
> I have searched the forum for discussions on fixed length or splitting
> keys but have not found my answer. Perhaps this is not possible, at
> least on the command line?

I'm not aware of any functionality provided by streaming that gives
you this support. Your mapper code will have to achieve this on its
own before emitting, I think (or your InputFormat can do it at read
time, perhaps).

-- 
Harsh J
Support Engineer, Cloudera
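
As a rough illustration of the "do it in the mapper" suggestion above, here
is a minimal sketch of a streaming mapper that cuts a fixed-width key off
each input line. It is not from the thread: the names KEY_WIDTH, split_fixed
and main are hypothetical, the width of 13 is taken from Kevin's question,
and it counts characters as decoded by Python rather than raw bytes, which
is the same thing only for single-byte encodings. It relies on the default
streaming behavior of treating everything before the first tab as the key,
so it assumes the key itself contains no tab.

```python
#!/usr/bin/env python3
"""Hadoop Streaming mapper sketch: emit a fixed-width prefix as the key."""
import sys

# Assumption: the key occupies the first 13 characters of every record.
KEY_WIDTH = 13


def split_fixed(line, width=KEY_WIDTH):
    """Return (key, value): the first `width` characters and the remainder."""
    record = line.rstrip("\n")
    return record[:width], record[width:]


def main(stdin=sys.stdin, stdout=sys.stdout):
    for line in stdin:
        key, value = split_fixed(line)
        # By default, streaming takes the text before the first tab as the key.
        stdout.write(f"{key}\t{value}\n")


if __name__ == "__main__":
    main()
```

With a mapper like this passed through the -mapper option, the two
-Dstream.* separator properties should no longer be needed, since the tab
written by the script marks the key boundary. Lines shorter than the key
width are emitted whole as the key with an empty value; whether that is
acceptable depends on the data.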