Posted to common-user@hadoop.apache.org by Something Something <ma...@gmail.com> on 2013/05/24 09:55:43 UTC

Reducer that outputs no key

Hello,

I am trying to use Hadoop Streaming to produce output that contains no key,
just the value.

Here's what I am trying:

1)  Created an IdentifierResolver subclass as follows:

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.streaming.io.IdentifierResolver;

public class MyIdentifierResolver extends IdentifierResolver {

    @Override
    public void resolve(String identifier) {
        System.out.println("Entered resolve with identifier: " + identifier);
        // Let the base class set up the defaults first, then
        // override the output key class for the NullWritable case.
        super.resolve(identifier);
        if (identifier.equals("NullWritable")) {
            System.out.println("Setting output key class to NullWritable");
            setOutputKeyClass(NullWritable.class);
        }
    }
}


2)  Set the properties as follows:

-Dstream.io.identifier.resolver.class=com.my.package.MyIdentifierResolver \
-Dstream.map.output=NullWritable \
-Dstream.reduce.output=NullWritable
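
For context, the full invocation looks roughly like this; the streaming jar
location and the HDFS paths are placeholders, and my-resolver.jar is a
hypothetical jar containing MyIdentifierResolver:

hadoop jar /path/to/hadoop-streaming.jar \
    -libjars my-resolver.jar \
    -Dstream.io.identifier.resolver.class=com.my.package.MyIdentifierResolver \
    -Dstream.map.output=NullWritable \
    -Dstream.reduce.output=NullWritable \
    -input /path/to/input \
    -output /path/to/output \
    -mapper /bin/cat \
    -reducer /bin/cat

(The generic -D and -libjars options have to come before streaming options
such as -input and -mapper.)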


This should work, right?  But it is still writing the key in the output.
Is there a better way to do this in Hadoop?

Note:  Basically, we are trying to merge a large number of files (over
2,000) into a smaller number of files (e.g., 500).  The files are too big,
so 'getmerge' does not work; we run into disk space issues.

Please help.  Thanks.

Re: Reducer that outputs no key

Posted by Something Something <ma...@gmail.com>.
You can ignore this for now.  I was able to get the file merging to work
under Hadoop Streaming by using the following two settings:

-mapper "cut -f2-"
-Dmapred.reduce.tasks=0
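
Put together, the merge job looks roughly like this (the streaming jar
location and the HDFS paths are placeholders for our real ones):

hadoop jar /path/to/hadoop-streaming.jar \
    -Dmapred.reduce.tasks=0 \
    -input /path/to/input \
    -output /path/to/merged/output \
    -mapper "cut -f2-"

With zero reduce tasks there is no sort/shuffle; each mapper writes its
output directly to HDFS, so the number of output files equals the number of
map tasks.  The "cut -f2-" mapper drops the first tab-separated field of
each input line (the key column), so only the values end up in the merged
files.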


