Posted to user@hadoop.apache.org by Tony Burton <TB...@SportingIndex.com> on 2012/11/28 12:12:30 UTC

Map output compression in Hadoop 1.0.3

Hi,

Quick question: What's the best way to turn on Map Output Compression in Hadoop 1.0.3? The tutorial at http://hadoop.apache.org/docs/r1.0.3/mapred_tutorial.html says to use JobConf.setCompressMapOutput(boolean), but I'm using o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf.

Is it simply a case of using getConf.set("mapred.output.compress", true) then constructing my Job from the Configuration object, or is there a more direct way that I've missed?

Thanks,

Tony



**********************************************************************
Please consider the environment before printing this email or attachments

This email and any attachments are confidential, protected by copyright and may be legally privileged.  If you are not the intended recipient, then the dissemination or copying of this email is prohibited. If you have received this in error, please notify the sender by replying by email and then delete the email completely from your system.  Neither Sporting Index nor the sender accepts responsibility for any virus, or any other defect which might affect any computer or IT system into which the email is received and/or opened.  It is the responsibility of the recipient to scan the email and no responsibility is accepted for any loss or damage arising in any way from receipt or use of this email.  Sporting Index Ltd is a company registered in England and Wales with company number 2636842, whose registered office is at Gateway House, Milverton Street, London, SE11 4AP.  Sporting Index Ltd is authorised and regulated by the UK Financial Services Authority (reg. no. 150404) and Gambling Commission (reg. no. 000-027343-R-308898-001).  Any financial promotion contained herein has been issued
and approved by Sporting Index Ltd.

Outbound email has been scanned for viruses and SPAM

RE: Map output compression in Hadoop 1.0.3

Posted by Tony Burton <TB...@SportingIndex.com>.
Hi Andy and list,

Apologies - I've not been looking at my list inbox for a while, so I missed this request. I'm running some tests as I type, and will report back when they're done. I'm running the same job with the bzip2, gzip and snappy codecs versus no map output compression. I guess I should include LZO in the comparison too, but the codec wasn't obvious in the o.a.h.io.compress.* areas of Hadoop. If someone could point out where to find this codec, that'd be really handy. If not I could always google it :)

Tony



________________________________________
From: Kartashov, Andy [Andy.Kartashov@mpac.ca]
Sent: 29 November 2012 16:09
To: user@hadoop.apache.org
Subject: RE: Map output compression in Hadoop 1.0.3

Tony,

Can you please share with us the performance improvement (if any) after using compression on map output? I was about to start looking into it myself. What compression codec did you use?

Rgds,
AK

-----Original Message-----
From: Tony Burton [mailto:TBurton@SportingIndex.com]
Sent: Wednesday, November 28, 2012 6:38 AM
To: <us...@hadoop.apache.org>
Subject: RE: Map output compression in Hadoop 1.0.3

Also, another point that prompted my initial question: I'd come across "mapred.compress.map.output" in the documentation, but I wasn't 100% sure whether there has been or will be any equivalence or correspondence between config settings like this one and the naming of the stable and new APIs.

For example, we've got o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf as previously mentioned, from the "mapred" and "mapreduce" parts of the API.

Are config settings that begin with mapred.* related to the stable API, with the implication that there's a mapreduce.* equivalent (e.g. mapred.compress.map.output vs mapreduce.compress.map.output), or am I seeing a connection that doesn't exist?

(Hope that makes sense!)




-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: 28 November 2012 11:25
To: <us...@hadoop.apache.org>
Subject: Re: Map output compression in Hadoop 1.0.3

Hi,

The property mapred.output.compress, as its name reads, controls job-output compression, not intermediate/transient data compression, which is what you mean by "Map output compression".

Also note that this property is a per-job one and can be toggled on or off for each job individually, if a user wants.

These should be all the ways, for MR1, to turn on "Map output compression":

1. Set "mapred.compress.map.output" to true in your client's mapred-site.xml to turn it on for all jobs run from such a client machine.
2. Set the above cluster-wide, with <final>true</final>, at every node (JT plus TTs) and restart them, to turn it on for all jobs, regardless of what the job itself specifies.
3. Turn it on on a per-job basis:
3.1. Stable API: JobConf.setCompressMapOutput(true);
3.2. New API: Job.getConfiguration().setBoolean("mapred.compress.map.output", true);
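
A minimal sketch of option 3.2, assuming the Hadoop 1.x jars are on the classpath. The class name, the helper method name, the job name and the choice of GzipCodec are illustrative assumptions, not something stated in this thread; "mapred.map.output.compression.codec" is the MR1 property that selects the codec.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;

public class MapOutputCompressionExample {

    // Hypothetical helper: switches on compression of the intermediate
    // (map-side) output, not of the final job output.
    static Configuration enableMapOutputCompression(Configuration conf) {
        // Configuration.set() only accepts String values, so use setBoolean().
        conf.setBoolean("mapred.compress.map.output", true);
        // Optional: choose the codec. GzipCodec is only an example choice.
        conf.setClass("mapred.map.output.compression.codec",
                      GzipCodec.class, CompressionCodec.class);
        return conf;
    }

    public static void main(String[] args) throws Exception {
        // Configure first, then build the Job: the Job keeps its own copy of
        // the Configuration, so later changes to 'conf' are not seen by it.
        Configuration conf = enableMapOutputCompression(new Configuration());
        Job job = new Job(conf, "map-output-compression-example");

        // Settings applied after construction must go through the Job's copy.
        job.getConfiguration().setBoolean("mapred.compress.map.output", true);

        System.out.println("map output compression enabled: "
                + job.getConfiguration().getBoolean("mapred.compress.map.output", false));
    }
}
```

Setting the property on the Configuration before constructing the Job and setting it afterwards through Job.getConfiguration() should have the same effect; the sketch shows both only to make the copy semantics visible.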

On Wed, Nov 28, 2012 at 4:42 PM, Tony Burton <TB...@sportingindex.com> wrote:
> Hi,
>
>
>
> Quick question: What's the best way to turn on Map Output Compression
> in Hadoop 1.0.3? The tutorial at
> http://hadoop.apache.org/docs/r1.0.3/mapred_tutorial.html says to use
> JobConf.setCompressMapOutput(boolean), but I'm using
> o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf.
>
>
>
> Is it simply a case of using getConf.set("mapred.output.compress",
> true) then constructing my Job from the Configuration object, or is
> there more direct way that I've missed?
>
>
>
> Thanks,
>
>
>
> Tony



--
Harsh J


Please consider the environment before printing this email

www.sportingindex.com
Inbound Email has been scanned for viruses and SPAM
NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not the intended recipient, please delete and contact the sender immediately. Please consider the environment before printing this e-mail. AVIS : le présent courriel et toute pièce jointe qui l'accompagne sont confidentiels, protégés par le droit d'auteur et peuvent être couverts par le secret professionnel. Toute utilisation, copie ou divulgation non autorisée est interdite. Si vous n'êtes pas le destinataire prévu de ce courriel, supprimez-le et contactez immédiatement l'expéditeur. Veuillez penser à l'environnement avant d'imprimer le présent courriel

RE: Map output compression in Hadoop 1.0.3

Posted by "Kartashov, Andy" <An...@mpac.ca>.
Tony,

Can you please share with us the performance improvement (if any) after using compression on map output? I was about to start looking into it myself. What compression codec did you use?

Rgds,
AK


RE: Map output compression in Hadoop 1.0.3

Posted by "Kartashov, Andy" <An...@mpac.ca>.
Tony,

Can you please share with us on the permorfmance improvement (if any) after using compression in map.output? I was abpout to start looking into it myself.   What compression codec did you use?

Rgds,
AK

-----Original Message-----
From: Tony Burton [mailto:TBurton@SportingIndex.com]
Sent: Wednesday, November 28, 2012 6:38 AM
To: <us...@hadoop.apache.org>
Subject: RE: Map output compression in Hadoop 1.0.3

Also, another point that prompted my initial question: I'd come across "mapred.compress.map.output" in the documentation, but I wasn't 100% sure if there has been or will be any equivalence or correspondence between config setting like this one and the naming of the stable and new API.

For example, we've got o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf as previously mentioned, from the "mapred" and "mapreduce" parts of the API.

Are config settings that begin with mapred.* related to the stable API with the implication that there's an mapreduce.* equivalent (eg mapred.compress.map.output vs mapreduce.compress.map.output), or am I seeing a connection that doesn't exist?

(Hope that makes sense!)




-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: 28 November 2012 11:25
To: <us...@hadoop.apache.org>
Subject: Re: Map output compression in Hadoop 1.0.3

Hi,

The property mapred.output.compress, as its name reads, controls job-output compression, not intermediate/transient data compression, which is what you mean by "Map output compression".

Also note that this property is per-job and can be toggled on or off for each job individually.

These should be all the ways, for MR1, to turn on "Map output compression":

1. Set "mapred.compress.map.output" to true in your client's mapred-site.xml to turn it on for all jobs run from that client machine.
2. Set the above in the cluster config, with <final>true</final>, at every node (JT plus TTs) and restart them, to turn it on for all jobs, regardless of what the job itself specifies.
3. Turn it on on a per-job basis:
3.1. Stable API: JobConf.setCompressMapOutput(true);
3.2. New API: Job.getConfiguration().setBoolean("mapred.compress.map.output", true);
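
As a rough sketch of options 1 and 2 above, the mapred-site.xml entry might look like the fragment below. The codec property (mapred.map.output.compression.codec) and its DefaultCodec value are not mentioned in this thread; they are included as an assumption based on the MR1 defaults, so check them against your own mapred-default.xml. The <final> element is only needed for option 2.

```xml
<!-- mapred-site.xml (MR1): compress intermediate map output -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
  <!-- Option 2 only: stops individual jobs from overriding the value -->
  <final>true</final>
</property>

<!-- Assumption, not from the thread: choose the codec used for map output -->
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
```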

On Wed, Nov 28, 2012 at 4:42 PM, Tony Burton <TB...@sportingindex.com> wrote:
> Hi,
>
>
>
> Quick question: What's the best way to turn on Map Output Compression
> in Hadoop 1.0.3? The tutorial at
> http://hadoop.apache.org/docs/r1.0.3/mapred_tutorial.html says to use
> JobConf.setCompressMapOutput(boolean), but I'm using
> o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf.
>
>
>
> Is it simply a case of using getConf.set("mapred.output.compress",
> true) then constructing my Job from the Configuration object, or is
> there more direct way that I've missed?
>
>
>
> Thanks,
>
>
>
> Tony



--
Harsh J



RE: Map output compression in Hadoop 1.0.3

Posted by Tony Burton <TB...@SportingIndex.com>.
Got it - thanks Harsh.


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: 28 November 2012 11:41
To: <us...@hadoop.apache.org>
Subject: Re: Map output compression in Hadoop 1.0.3

No, I see your point of confusion and I can think of others who may be confused that way, but the API changes did not trigger the config naming change.

The config naming changes are more simply viewed as an MR1 vs. MR2 thing. So unless you move on to YARN-based MR2, keep using the mapred.* style properties.

On Wed, Nov 28, 2012 at 5:07 PM, Tony Burton <TB...@sportingindex.com> wrote:
> Also, another point that prompted my initial question: I'd come across "mapred.compress.map.output" in the documentation, but I wasn't 100% sure if there has been or will be any equivalence or correspondence between config settings like this one and the naming of the stable and new API.
>
> For example, we've got o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf as previously mentioned, from the "mapred" and "mapreduce" parts of the API.
>
> Are config settings that begin with mapred.* related to the stable API, with the implication that there's a mapreduce.* equivalent (e.g. mapred.compress.map.output vs mapreduce.compress.map.output), or am I seeing a connection that doesn't exist?
>
> (Hope that makes sense!)
>
>
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: 28 November 2012 11:25
> To: <us...@hadoop.apache.org>
> Subject: Re: Map output compression in Hadoop 1.0.3
>
> Hi,
>
> The property mapred.output.compress, as its name reads, controls job-output compression, not intermediate/transient data compression, which is what you mean by "Map output compression".
>
> Also note that this property is per-job and can be toggled on or off for each job individually.
>
> These should be all the ways, for MR1, to turn on "Map output compression":
>
> 1. Set "mapred.compress.map.output" to true in your client's mapred-site.xml to turn it on for all jobs run from that client machine.
> 2. Set the above in the cluster config, with <final>true</final>, at every node (JT plus TTs) and restart them, to turn it on for all jobs, regardless of what the job itself specifies.
> 3. Turn it on on a per-job basis:
> 3.1. Stable API: JobConf.setCompressMapOutput(true);
> 3.2. New API: Job.getConfiguration().setBoolean("mapred.compress.map.output", true);
>
> On Wed, Nov 28, 2012 at 4:42 PM, Tony Burton <TB...@sportingindex.com> wrote:
>> Hi,
>>
>>
>>
>> Quick question: What's the best way to turn on Map Output Compression 
>> in Hadoop 1.0.3? The tutorial at 
>> http://hadoop.apache.org/docs/r1.0.3/mapred_tutorial.html says to use 
>> JobConf.setCompressMapOutput(boolean), but I'm using 
>> o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf.
>>
>>
>>
>> Is it simply a case of using getConf.set("mapred.output.compress",
>> true) then constructing my Job from the Configuration object, or is 
>> there more direct way that I've missed?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Tony
>
>
>
> --
> Harsh J



--
Harsh J

Re: Map output compression in Hadoop 1.0.3

Posted by Harsh J <ha...@cloudera.com>.
No, I see your point of confusion and I can think of others who may be
confused that way, but the API changes did not trigger the config
naming change.

The config naming changes are more simply viewed as an MR1 vs. MR2
thing. So unless you move on to YARN-based MR2, keep using the
mapred.* style properties.
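
For reference, a sketch of how the same switch is spelled under each naming scheme. The MR2 name shown is an assumption taken from Hadoop 2's mapred-default.xml rather than anything stated in this thread, so verify it against the release you run. Note that it is not the mapreduce.compress.map.output spelling guessed at in the quoted question.

```xml
<!-- MR1 (Hadoop 1.x) name, as used throughout this thread -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>

<!-- MR2 (YARN) name; assumption based on Hadoop 2's mapred-default.xml -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
```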

On Wed, Nov 28, 2012 at 5:07 PM, Tony Burton <TB...@sportingindex.com> wrote:
> Also, another point that prompted my initial question: I'd come across "mapred.compress.map.output" in the documentation, but I wasn't 100% sure if there has been or will be any equivalence or correspondence between config settings like this one and the naming of the stable and new API.
>
> For example, we've got o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf as previously mentioned, from the "mapred" and "mapreduce" parts of the API.
>
> Are config settings that begin with mapred.* related to the stable API, with the implication that there's a mapreduce.* equivalent (e.g. mapred.compress.map.output vs mapreduce.compress.map.output), or am I seeing a connection that doesn't exist?
>
> (Hope that makes sense!)
>
>
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: 28 November 2012 11:25
> To: <us...@hadoop.apache.org>
> Subject: Re: Map output compression in Hadoop 1.0.3
>
> Hi,
>
> The property mapred.output.compress, as its name reads, controls job-output compression, not intermediate/transient data compression, which is what you mean by "Map output compression".
>
> Also note that this property is per-job and can be toggled on or off for each job individually.
>
> These should be all the ways, for MR1, to turn on "Map output compression":
>
> 1. Set "mapred.compress.map.output" to true in your client's mapred-site.xml to turn it on for all jobs run from that client machine.
> 2. Set the above in the cluster config, with <final>true</final>, at every node (JT plus TTs) and restart them, to turn it on for all jobs, regardless of what the job itself specifies.
> 3. Turn it on on a per-job basis:
> 3.1. Stable API: JobConf.setCompressMapOutput(true);
> 3.2. New API: Job.getConfiguration().setBoolean("mapred.compress.map.output", true);
>
> On Wed, Nov 28, 2012 at 4:42 PM, Tony Burton <TB...@sportingindex.com> wrote:
>> Hi,
>>
>>
>>
>> Quick question: What's the best way to turn on Map Output Compression
>> in Hadoop 1.0.3? The tutorial at
>> http://hadoop.apache.org/docs/r1.0.3/mapred_tutorial.html says to use
>> JobConf.setCompressMapOutput(boolean), but I'm using
>> o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf.
>>
>>
>>
>> Is it simply a case of using getConf.set("mapred.output.compress",
>> true) then constructing my Job from the Configuration object, or is
>> there more direct way that I've missed?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Tony
>
>
>
> --
> Harsh J



-- 
Harsh J

RE: Map output compression in Hadoop 1.0.3

Posted by "Kartashov, Andy" <An...@mpac.ca>.
Tony,

Can you please share with us the performance improvement (if any) you see after compressing the map output? I was about to start looking into it myself. What compression codec did you use?

Rgds,
AK

-----Original Message-----
From: Tony Burton [mailto:TBurton@SportingIndex.com]
Sent: Wednesday, November 28, 2012 6:38 AM
To: <us...@hadoop.apache.org>
Subject: RE: Map output compression in Hadoop 1.0.3

Also, another point that prompted my initial question: I'd come across "mapred.compress.map.output" in the documentation, but I wasn't 100% sure whether there has been or will be any equivalence or correspondence between config settings like this one and the naming of the stable and new API.

For example, we've got o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf as previously mentioned, from the "mapred" and "mapreduce" parts of the API.

Are config settings that begin with mapred.* related to the stable API, with the implication that there's a mapreduce.* equivalent (e.g. mapred.compress.map.output vs mapreduce.compress.map.output), or am I seeing a connection that doesn't exist?

(Hope that makes sense!)




-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: 28 November 2012 11:25
To: <us...@hadoop.apache.org>
Subject: Re: Map output compression in Hadoop 1.0.3

Hi,

The property mapred.output.compress, as its name suggests, controls job-output compression, not the compression of intermediate/transient data, which is what you mean by "Map output compression".

Also note that this property is per-job and can be toggled on or off for each job individually, if a user wants.

These should be all the ways, for MR1, to turn on "Map output compression":

1. Set "mapred.compress.map.output" to true in your client's mapred-site.xml to turn it on for all jobs run from that client machine.
2. Set the above in the cluster config, with <final>true</final>, on every node (JT plus TTs) and restart them, to turn it on for all jobs, regardless of what the job itself specifies.
3. Turn it on on a per-job basis:
3.1. Stable API: JobConf.setCompressMapOutput(true);
3.2. New API: job.getConfiguration().setBoolean("mapred.compress.map.output", true);
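
Options 1 and 2 above both come down to a <property> entry in mapred-site.xml. As a rough sketch only, the plain-Java snippet below renders that entry with and without the <final> flag. It does not touch the Hadoop API, and the class and method names (SitePropertySketch, renderProperty) are hypothetical, chosen just for illustration.

```java
// Sketch: renders the mapred-site.xml entry described in options 1 and 2.
// SitePropertySketch and renderProperty are hypothetical names, not Hadoop APIs.
final class SitePropertySketch {

    // Builds one <property> element. When markFinal is true, a <final> flag
    // is added, which is what option 2 relies on to stop jobs overriding it.
    static String renderProperty(String name, String value, boolean markFinal) {
        StringBuilder xml = new StringBuilder();
        xml.append("<property>\n");
        xml.append("  <name>").append(name).append("</name>\n");
        xml.append("  <value>").append(value).append("</value>\n");
        if (markFinal) {
            xml.append("  <final>true</final>\n");
        }
        xml.append("</property>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        // Option 1: client-side default, still overridable per job.
        System.out.print(renderProperty("mapred.compress.map.output", "true", false));
        // Option 2: cluster-wide setting, marked final.
        System.out.print(renderProperty("mapred.compress.map.output", "true", true));
    }
}
```

For the per-job route (option 3.2), keep in mind that Configuration.set takes two String arguments, so a boolean value is normally passed through Configuration.setBoolean(name, value) on the object returned by job.getConfiguration().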

On Wed, Nov 28, 2012 at 4:42 PM, Tony Burton <TB...@sportingindex.com> wrote:
> Hi,
>
>
>
> Quick question: What's the best way to turn on Map Output Compression
> in Hadoop 1.0.3? The tutorial at
> http://hadoop.apache.org/docs/r1.0.3/mapred_tutorial.html says to use
> JobConf.setCompressMapOutput(boolean), but I'm using
> o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf.
>
>
>
> Is it simply a case of using getConf.set("mapred.output.compress",
> true) then constructing my Job from the Configuration object, or is
> there more direct way that I've missed?
>
>
>
> Thanks,
>
>
>
> Tony



--
Harsh J


NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use, copying or disclosure is prohibited. If you are not the intended recipient, please delete and contact the sender immediately. Please consider the environment before printing this e-mail. AVIS : le présent courriel et toute pièce jointe qui l'accompagne sont confidentiels, protégés par le droit d'auteur et peuvent être couverts par le secret professionnel. Toute utilisation, copie ou divulgation non autorisée est interdite. Si vous n'êtes pas le destinataire prévu de ce courriel, supprimez-le et contactez immédiatement l'expéditeur. Veuillez penser à l'environnement avant d'imprimer le présent courriel

Re: Map output compression in Hadoop 1.0.3

Posted by Harsh J <ha...@cloudera.com>.
No. I see your point of confusion, and I can think of others who may be
confused the same way, but the API changes did not trigger the config
naming change.

The config naming changes can instead be viewed, for simplicity, as an
MR1 vs. MR2 difference. So unless you move to YARN-based MR2, keep
using the mapred.* style properties.
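
To make the MR1 vs. MR2 naming point concrete, here is a small plain-Java sketch that pairs a few MR1 compression properties with the names I understand MR2 uses for them. The pairs are taken from my reading of Hadoop 2's deprecated-properties table and should be checked against your release. The class and method names (CompressionPropertyNames, nameFor) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: MR1 property names next to the MR2 names believed to replace them.
// Verify the pairs against the deprecated-properties list of your Hadoop release.
final class CompressionPropertyNames {

    static final Map<String, String> MR1_TO_MR2 = new LinkedHashMap<>();
    static {
        MR1_TO_MR2.put("mapred.compress.map.output",
                       "mapreduce.map.output.compress");
        MR1_TO_MR2.put("mapred.map.output.compression.codec",
                       "mapreduce.map.output.compress.codec");
        MR1_TO_MR2.put("mapred.output.compress",
                       "mapreduce.output.fileoutputformat.compress");
    }

    // Returns the MR1 name unchanged on MR1; on MR2 returns the mapped name,
    // falling back to the MR1 name when no mapping is listed here.
    static String nameFor(String mr1Name, boolean onYarnMr2) {
        if (!onYarnMr2) {
            return mr1Name;
        }
        String mr2Name = MR1_TO_MR2.get(mr1Name);
        return mr2Name != null ? mr2Name : mr1Name;
    }

    public static void main(String[] args) {
        System.out.println(nameFor("mapred.compress.map.output", false));
        System.out.println(nameFor("mapred.compress.map.output", true));
    }
}
```

Note that the MR2 name is not a simple prefix swap: as far as I know there is no "mapreduce.compress.map.output" property, which is one more reason to stay with the mapred.* names on 1.0.3.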

On Wed, Nov 28, 2012 at 5:07 PM, Tony Burton <TB...@sportingindex.com> wrote:
> Also, another point that prompted my initial question: I'd come across "mapred.compress.map.output" in the documentation, but I wasn't 100% sure whether there has been or will be any equivalence or correspondence between config settings like this one and the naming of the stable and new API.
>
> For example, we've got o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf as previously mentioned, from the "mapred" and "mapreduce" parts of the API.
>
> Are config settings that begin with mapred.* related to the stable API, with the implication that there's a mapreduce.* equivalent (e.g. mapred.compress.map.output vs mapreduce.compress.map.output), or am I seeing a connection that doesn't exist?
>
> (Hope that makes sense!)
>
>
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: 28 November 2012 11:25
> To: <us...@hadoop.apache.org>
> Subject: Re: Map output compression in Hadoop 1.0.3
>
> Hi,
>
> The property mapred.output.compress, as its name suggests, controls job-output compression, not the compression of intermediate/transient data, which is what you mean by "Map output compression".
>
> Also note that this property is per-job and can be toggled on or off for each job individually, if a user wants.
>
> These should be all the ways, for MR1, to turn on "Map output compression":
>
> 1. Set "mapred.compress.map.output" to true in your client's mapred-site.xml to turn it on for all jobs run from that client machine.
> 2. Set the above in the cluster config, with <final>true</final>, on every node (JT plus TTs) and restart them, to turn it on for all jobs, regardless of what the job itself specifies.
> 3. Turn it on on a per-job basis:
> 3.1. Stable API: JobConf.setCompressMapOutput(true);
> 3.2. New API: job.getConfiguration().setBoolean("mapred.compress.map.output", true);
>
> On Wed, Nov 28, 2012 at 4:42 PM, Tony Burton <TB...@sportingindex.com> wrote:
>> Hi,
>>
>>
>>
>> Quick question: What's the best way to turn on Map Output Compression
>> in Hadoop 1.0.3? The tutorial at
>> http://hadoop.apache.org/docs/r1.0.3/mapred_tutorial.html says to use
>> JobConf.setCompressMapOutput(boolean), but I'm using
>> o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf.
>>
>>
>>
>> Is it simply a case of using getConf.set("mapred.output.compress",
>> true) then constructing my Job from the Configuration object, or is
>> there more direct way that I've missed?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Tony



-- 
Harsh J

RE: Map output compression in Hadoop 1.0.3

Posted by Tony Burton <TB...@SportingIndex.com>.
Also, another point that prompted my initial question: I'd come across "mapred.compress.map.output" in the documentation, but I wasn't 100% sure whether there has been or will be any equivalence or correspondence between config settings like this one and the naming of the stable and new API.

For example, we've got o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf as previously mentioned, from the "mapred" and "mapreduce" parts of the API.

Are config settings that begin with mapred.* related to the stable API, with the implication that there's a mapreduce.* equivalent (e.g. mapred.compress.map.output vs mapreduce.compress.map.output), or am I seeing a connection that doesn't exist?

(Hope that makes sense!)




-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: 28 November 2012 11:25
To: <us...@hadoop.apache.org>
Subject: Re: Map output compression in Hadoop 1.0.3

Hi,

The property mapred.output.compress, as its name suggests, controls job-output compression, not the compression of intermediate/transient data, which is what you mean by "Map output compression".

Also note that this property is per-job and can be toggled on or off for each job individually, if a user wants.

These should be all the ways, for MR1, to turn on "Map output compression":

1. Set "mapred.compress.map.output" to true in your client's mapred-site.xml to turn it on for all jobs run from that client machine.
2. Set the above in the cluster config, with <final>true</final>, on every node (JT plus TTs) and restart them, to turn it on for all jobs, regardless of what the job itself specifies.
3. Turn it on on a per-job basis:
3.1. Stable API: JobConf.setCompressMapOutput(true);
3.2. New API: job.getConfiguration().setBoolean("mapred.compress.map.output", true);

On Wed, Nov 28, 2012 at 4:42 PM, Tony Burton <TB...@sportingindex.com> wrote:
> Hi,
>
>
>
> Quick question: What's the best way to turn on Map Output Compression 
> in Hadoop 1.0.3? The tutorial at 
> http://hadoop.apache.org/docs/r1.0.3/mapred_tutorial.html says to use 
> JobConf.setCompressMapOutput(boolean), but I'm using 
> o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf.
>
>
>
> Is it simply a case of using getConf.set("mapred.output.compress", 
> true) then constructing my Job from the Configuration object, or is 
> there more direct way that I've missed?
>
>
>
> Thanks,
>
>
>
> Tony



--
Harsh J



RE: Map output compression in Hadoop 1.0.3

Posted by Tony Burton <TB...@SportingIndex.com>.
Sorry, my fault about "mapred.output.compress" - I meant "mapred.compress.map.output".

Thanks Harsh for the speedy and comprehensive answer! Very useful. 

Tony



-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: 28 November 2012 11:25
To: <us...@hadoop.apache.org>
Subject: Re: Map output compression in Hadoop 1.0.3

Hi,

The property mapred.output.compress, as its name suggests, controls job-output compression, not the compression of intermediate/transient data, which is what you mean by "Map output compression".

Also note that this property is per-job and can be toggled on or off for each job individually, if a user wants.

These should be all the ways, for MR1, to turn on "Map output compression":

1. Set "mapred.compress.map.output" to true in your client's mapred-site.xml to turn it on for all jobs run from that client machine.
2. Set the above in the cluster config, with <final>true</final>, on every node (JT plus TTs) and restart them, to turn it on for all jobs, regardless of what the job itself specifies.
3. Turn it on on a per-job basis:
3.1. Stable API: JobConf.setCompressMapOutput(true);
3.2. New API: job.getConfiguration().setBoolean("mapred.compress.map.output", true);

On Wed, Nov 28, 2012 at 4:42 PM, Tony Burton <TB...@sportingindex.com> wrote:
> Hi,
>
>
>
> Quick question: What's the best way to turn on Map Output Compression 
> in Hadoop 1.0.3? The tutorial at 
> http://hadoop.apache.org/docs/r1.0.3/mapred_tutorial.html says to use 
> JobConf.setCompressMapOutput(boolean), but I'm using 
> o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf.
>
>
>
> Is it simply a case of using getConf.set("mapred.output.compress", 
> true) then constructing my Job from the Configuration object, or is 
> there more direct way that I've missed?
>
>
>
> Thanks,
>
>
>
> Tony



--
Harsh J



RE: Map output compression in Hadoop 1.0.3

Posted by Tony Burton <TB...@SportingIndex.com>.
Sorry, my fault about "mapred.output.compress" - I meant "mapred.compress.map.output".

Thanks Harsh for the speedy and comprehensive answer! Very useful. 

Tony




RE: Map output compression in Hadoop 1.0.3

Posted by Tony Burton <TB...@SportingIndex.com>.
Also, another point that prompted my initial question: I'd come across "mapred.compress.map.output" in the documentation, but I wasn't 100% sure whether there has been, or will be, any correspondence between config settings like this one and the naming of the stable and new APIs.

For example, we've got o.a.h.mapreduce.Job rather than o.a.h.mapred.JobConf as previously mentioned, from the "mapreduce" and "mapred" parts of the API respectively.

Are config settings that begin with mapred.* related to the stable API, with the implication that there's a mapreduce.* equivalent (e.g. mapred.compress.map.output vs mapreduce.compress.map.output), or am I seeing a connection that doesn't exist?

(Hope that makes sense!)






Re: Map output compression in Hadoop 1.0.3

Posted by Harsh J <ha...@cloudera.com>.
Hi,

The property mapred.output.compress, as its name suggests, controls
job-output compression, not compression of the intermediate/transient
data, which is what you mean by "Map output compression".

Also note that this property is a per-job one and can be toggled
on or off for each job individually, if a user wants.

These should be all the ways, for MR1, to turn on "Map output
compression":

1. Set "mapred.compress.map.output" to true in your client's
mapred-site.xml to turn it on for all jobs run from that client
machine.
2. Set the above in the cluster configuration, with <final>true</final>,
at every node (JT plus TTs) and restart them, to turn it on for all
jobs, regardless of what the job itself specifies.
3. Turn it on on a per-job basis:
3.1. Stable API: JobConf.setCompressMapOutput(true);
3.2. New API: Job.getConfiguration().setBoolean("mapred.compress.map.output", true);
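
As a rough sketch of options 1 and 2 above, the mapred-site.xml entry might look like the following. The enclosing <configuration> element is assumed to be the file's existing root, and the <final> line is only needed for the cluster-wide case in option 2, where jobs should not be able to override the value; for the client-side case in option 1 it can be left out.

```xml
<configuration>
  <!-- Compress intermediate map output before it is shuffled to reducers -->
  <property>
    <name>mapred.compress.map.output</name>
    <value>true</value>
    <!-- Option 2 only: stops individual jobs from overriding this value -->
    <final>true</final>
  </property>
</configuration>
```

A per-job setting (option 3) made through the job's own configuration would normally take precedence over a client-side value that is not marked final.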




-- 
Harsh J
