Posted to mapreduce-user@hadoop.apache.org by Mapred Learn <ma...@gmail.com> on 2011/06/20 06:59:06 UTC

how to change default name of a sequence file

Hi,
I want the output files of my map-red job (sequence files) to have a
specific name instead of the default part* format.

Has anyone ever tried to override the default file name and set an output
file name per map-red job?

Thanks,
-JJ

Re: how to change default name of a sequence file

Posted by Christoph Schmitz <Ch...@1und1.de>.
int partition = context.getTaskAttemptID().getTaskID().getId();

That will get you the number that is usually appended to the part-r-<number> name.

Regards,
Christoph
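
A minimal sketch of how that number could be turned into a file name such as <OutputFile>_30. The class TaskFileNames and its method forTask are hypothetical names introduced here for illustration; they are not part of Hadoop. The sketch assumes the task ID is zero-based (the first task writes part-r-00000), so the 30th task has ID 29 and one is added to get a 1-based suffix.

```java
// Hypothetical helper for illustration only; not part of the Hadoop API.
final class TaskFileNames {
    private TaskFileNames() {}

    /**
     * Builds "<baseName>_<n><extension>", where n is the 1-based task number.
     * taskId is the zero-based value obtained from
     * context.getTaskAttemptID().getTaskID().getId().
     */
    static String forTask(String baseName, int taskId, String extension) {
        if (taskId < 0) {
            throw new IllegalArgumentException("taskId must be non-negative: " + taskId);
        }
        // Task IDs start at 0, so the 30th task (ID 29) gets the suffix _30.
        return baseName + "_" + (taskId + 1) + extension;
    }
}
```

Inside getDefaultWorkFile, the result of TaskFileNames.forTask("OutputFile", partition, extension) could then be passed as the second argument of new Path(committer.getWorkPath(), ...). Keeping the task number in the name matters because every task needs its own file name within the work directory.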

-----Original Message-----
From: Mapred Learn [mailto:mapred.learn@gmail.com] 
Sent: Monday, 20 June 2011 08:50
To: mapreduce-user@hadoop.apache.org
Subject: Re: how to change default name of a sequence file

Another question about getDefaultWorkFile(): how can I find out the mapper number that is used in the output? E.g., if you have 30 mappers, how can I name the output file (<OutputFile>) of the 30th mapper <OutputFile>_30?


On Sun, Jun 19, 2011 at 11:19 PM, Mapred Learn <ma...@gmail.com> wrote:


	Thanks!
	I will try this!
	
	
	On Sun, Jun 19, 2011 at 11:16 PM, Christoph Schmitz <Ch...@1und1.de> wrote:
	

		Hi JJ,
		
		you can do that by subclassing TextOutputFormat (or whichever output format you're using) and overriding the getDefaultWorkFile method:
		
		public class MyOutputFormat<K, V> extends TextOutputFormat<K, V> {
		   // ...
		   public Path getDefaultWorkFile(TaskAttemptContext context,
		           String extension) throws IOException {
		       FileOutputCommitter committer = (FileOutputCommitter) getOutputCommitter(context);
		       return new Path(committer.getWorkPath(), myOwnMethodToComputeTheFileName(context));
		   }
		}
		
		Regards,
		
		Christoph
		
		-----Original Message-----
		From: Mapred Learn [mailto:mapred.learn@gmail.com]
		Sent: Monday, 20 June 2011 06:59
		To: mapreduce-user@hadoop.apache.org; cdh-user@cloudera.org
		Subject: how to change default name of a sequence file
		

		Hi,
		I want the output files of my map-red job (sequence files) to have a specific name instead of the default part* format.
		
		Has anyone ever tried to override the default file name and set an output file name per map-red job?
		
		Thanks,
		-JJ
		




Re: how to change default name of a sequence file

Posted by Mapred Learn <ma...@gmail.com>.
Another question about getDefaultWorkFile(): how can I find out the mapper
number that is used in the output? E.g., if you have 30 mappers, how can I
name the output file (<OutputFile>) of the 30th mapper <OutputFile>_30?


Re: how to change default name of a sequence file

Posted by Mapred Learn <ma...@gmail.com>.
Thanks!
I will try this!


Re: how to change default name of a sequence file

Posted by Christoph Schmitz <Ch...@1und1.de>.
Hi JJ,

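you can do that by subclassing TextOutputFormat (or whichever output format you're using) and overriding the getDefaultWorkFile method:

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MyOutputFormat<K, V> extends TextOutputFormat<K, V> {
    // ...
    @Override
    public Path getDefaultWorkFile(TaskAttemptContext context,
            String extension) throws IOException {
        FileOutputCommitter committer = (FileOutputCommitter) getOutputCommitter(context);
        // myOwnMethodToComputeTheFileName is a placeholder for your own naming logic.
        return new Path(committer.getWorkPath(), myOwnMethodToComputeTheFileName(context));
    }
}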

Regards,

Christoph
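
For comparison, the default part* names that such an override replaces follow the pattern <name>-<m|r>-<5-digit partition><extension>. The sketch below re-creates that pattern in plain Java so the custom naming method has something to be measured against. DefaultPartName and its method of are illustrative names, not Hadoop classes, and the sketch only mirrors the naming scheme; it does not call Hadoop.

```java
import java.util.Locale;

// Illustrative re-creation of the default naming pattern; not a Hadoop class.
final class DefaultPartName {
    private DefaultPartName() {}

    /**
     * Builds a name of the form <name>-<taskType>-<partition><extension>,
     * with the partition zero-padded to at least five digits.
     * taskType is 'm' for map tasks and 'r' for reduce tasks.
     */
    static String of(String name, char taskType, int partition, String extension) {
        // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
        String padded = String.format(Locale.ROOT, "%05d", partition);
        return name + "-" + taskType + "-" + padded + extension;
    }
}
```

A custom naming method only needs to return a different string in place of this one, as long as the result stays unique per task.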
