Posted to mapreduce-issues@hadoop.apache.org by "Ravi Gummadi (JIRA)" <ji...@apache.org> on 2011/03/29 11:31:05 UTC

[jira] [Created] (MAPREDUCE-2408) Make Gridmix emulate usage of data compression

Make Gridmix emulate usage of data compression
----------------------------------------------

                 Key: MAPREDUCE-2408
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2408
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
          Components: contrib/gridmix
            Reporter: Ravi Gummadi
            Assignee: Amar Kamat


Currently, Gridmix emulates only disk I/O load. This JIRA is to make Gridmix also emulate the load due to data compression, as defined by the job trace.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (MAPREDUCE-2408) Make Gridmix emulate usage of data compression

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-2408:
----------------------------------

    Attachment: MR-2408-gridmix-compression-emulation-v1.1.patch

Attaching a patch implementing compression emulation support in Gridmix. test-patch and ant tests passed. Manually tested the patch.

[jira] [Updated] (MAPREDUCE-2408) Make Gridmix emulate usage of data compression

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-2408:
----------------------------------

    Status: Patch Available  (was: Open)

Running through Hudson.

[jira] [Commented] (MAPREDUCE-2408) Make Gridmix emulate usage of data compression

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039821#comment-13039821 ] 

Amar Kamat commented on MAPREDUCE-2408:
---------------------------------------

The goal of this JIRA is to emulate the compression characteristics of a MapReduce job. Emulating compression characteristics involves the following:
1. Generating compressible data. The compression characteristics (e.g. the compression ratio) of the data (map input, map output and reduce output) should be configurable.
2. Extracting compression-related properties from the original job's configuration and history files, and configuring the simulated job to mimic the original job's compression behavior.
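A minimal Java sketch of step 1, assuming (as the later review comment describes) that compressible data is produced by drawing words from a dictionary and that the dictionary size controls how well the output compresses. The class and method names (CompressibleTextSketch, dictionary, generate, gzipRatio) are hypothetical and are not Gridmix's API; only java.util.zip is used to measure the gzip ratio, defined here as compressed size divided by original size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: dictionary-based compressible text and its measured gzip ratio. */
public class CompressibleTextSketch {

  /** Builds a dictionary of {@code size} random lowercase words, each {@code wordLen} letters long. */
  static List<String> dictionary(int size, int wordLen, Random rnd) {
    List<String> words = new ArrayList<>(size);
    for (int i = 0; i < size; i++) {
      StringBuilder w = new StringBuilder(wordLen);
      for (int j = 0; j < wordLen; j++) {
        w.append((char) ('a' + rnd.nextInt(26)));
      }
      words.add(w.toString());
    }
    return words;
  }

  /** Appends uniformly chosen dictionary words (space-separated) until at least {@code bytes} bytes exist. */
  static byte[] generate(List<String> dict, int bytes, Random rnd) {
    StringBuilder sb = new StringBuilder(bytes + 32);
    while (sb.length() < bytes) {
      sb.append(dict.get(rnd.nextInt(dict.size()))).append(' ');
    }
    // All characters are ASCII, so one char maps to one byte.
    return sb.toString().getBytes(StandardCharsets.US_ASCII);
  }

  /** Returns gzip-compressed size divided by uncompressed size. */
  static double gzipRatio(byte[] data) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Closing the gzip stream above flushed the remaining data and trailer into bos.
    return (double) bos.size() / data.length;
  }

  public static void main(String[] args) {
    Random rnd = new Random(42);
    for (int dictSize : new int[] {10, 1000, 100000}) {
      byte[] text = generate(dictionary(dictSize, 6, rnd), 1 << 20, rnd);
      System.out.printf("dictionary=%d words, gzip ratio=%.2f%n", dictSize, gzipRatio(text));
    }
  }
}
```

Smaller dictionaries repeat words more often inside deflate's 32 KB window, so they should compress to lower ratios; sweeping the dictionary size like this is one way to build an empirical ratio table.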

[jira] [Commented] (MAPREDUCE-2408) Make Gridmix emulate usage of data compression

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040730#comment-13040730 ] 

Amar Kamat commented on MAPREDUCE-2408:
---------------------------------------

Opened MAPREDUCE-2542 for tracking LZO codec support in Gridmix.

[jira] [Commented] (MAPREDUCE-2408) Make Gridmix emulate usage of data compression

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040726#comment-13040726 ] 

Amar Kamat commented on MAPREDUCE-2408:
---------------------------------------

Hong,
Thanks a lot for your review. You are right: the compression ratio table will be different for different codecs. The table of empirical values in this patch is computed for the default codec (i.e. Gzip). We have compiled a similar table for LZO, and it seems LZO also shows a similar pattern. The plan is to add other codecs incrementally. I will open a JIRA to track LZO compression emulation.

[jira] [Updated] (MAPREDUCE-2408) Make Gridmix emulate usage of data compression

Posted by "Amar Kamat (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated MAPREDUCE-2408:
----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.23.0
     Release Note: Emulates the MapReduce compression feature in Gridmix. Compression emulation is turned on by default and can be disabled by setting 'gridmix.compression-emulation.enable' to 'false'. Use 'gridmix.compression-emulation.map-input.decompression-ratio', 'gridmix.compression-emulation.map-output.compression-ratio' and 'gridmix.compression-emulation.reduce-output.compression-ratio' to configure the compression ratios on the map input, map output and reduce output side, respectively. Currently, compression ratios in the range [0.07, 0.68] are supported. Gridmix auto-detects whether map input, map output and reduce output should emulate compression based on the original job's compression-related configuration parameters.
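A hedged Java sketch of how a tool could read the settings named in the release note. The four property names and the [0.07, 0.68] range come from the note; plain java.util.Properties stands in for Hadoop's Configuration, and the class name, the fallback ratio and the choice to reject out-of-range values (rather than clamp them) are assumptions, not documented Gridmix behavior.

```java
import java.util.Properties;

/** Hypothetical sketch of reading the compression-emulation settings from the release note. */
public class CompressionEmulationSettings {
  static final String ENABLE = "gridmix.compression-emulation.enable";
  static final String MAP_INPUT_RATIO = "gridmix.compression-emulation.map-input.decompression-ratio";
  static final String MAP_OUTPUT_RATIO = "gridmix.compression-emulation.map-output.compression-ratio";
  static final String REDUCE_OUTPUT_RATIO = "gridmix.compression-emulation.reduce-output.compression-ratio";

  // Supported range quoted in the release note.
  static final double MIN_RATIO = 0.07;
  static final double MAX_RATIO = 0.68;

  /** On by default; any value other than "true" (case-insensitive) disables emulation. */
  static boolean isEnabled(Properties conf) {
    return Boolean.parseBoolean(conf.getProperty(ENABLE, "true"));
  }

  /** Returns the configured ratio, or {@code fallback} if unset; rejects values outside the supported range. */
  static double ratio(Properties conf, String key, double fallback) {
    String raw = conf.getProperty(key);
    double value = (raw == null) ? fallback : Double.parseDouble(raw.trim());
    if (value < MIN_RATIO || value > MAX_RATIO) {
      throw new IllegalArgumentException(
          key + "=" + value + " is outside [" + MIN_RATIO + ", " + MAX_RATIO + "]");
    }
    return value;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(MAP_OUTPUT_RATIO, "0.25");
    System.out.println("enabled: " + isEnabled(conf));
    // Assumed fallback of 0.5 when a ratio is not configured.
    System.out.println("map output ratio: " + ratio(conf, MAP_OUTPUT_RATIO, 0.5));
    System.out.println("reduce output ratio: " + ratio(conf, REDUCE_OUTPUT_RATIO, 0.5));
  }
}
```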
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

I just committed this to trunk. Thanks Ravi for the review!

[jira] [Commented] (MAPREDUCE-2408) Make Gridmix emulate usage of data compression

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040290#comment-13040290 ] 

Hudson commented on MAPREDUCE-2408:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk #692 (See [https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk/692/])
    MAPREDUCE-2408. [Gridmix] Compression emulation in Gridmix. (amarrk)

amarrk : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1128162
Files : 
* /hadoop/mapreduce/trunk/CHANGES.txt
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/DistributedCacheEmulator.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GridmixRecord.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/CompressionEmulationUtil.java
* /hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/gridmix.xml
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestCompressionEmulationUtils.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/AvgRecordFactory.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GridmixJob.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/SleepJob.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/RandomTextDataGenerator.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/LoadJob.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestRandomTextDataGenerator.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/Gridmix.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/FileQueue.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GenerateData.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GenerateDistCacheData.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/InputStriper.java


[jira] [Commented] (MAPREDUCE-2408) Make Gridmix emulate usage of data compression

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040652#comment-13040652 ] 

Hong Tang commented on MAPREDUCE-2408:
--------------------------------------

Looks like I missed it before it got committed. I quickly went through the patch. I like the approach of using a dictionary and empirically matching the compression ratio to the dictionary size. However, I believe the compression ratio would differ across compression codecs (and even for the same codec at different compression levels). It'd be useful if you could extend CompressionRatioLookupTable so that it takes a compression codec as input (you could support only the few most common codecs: LZO, gzip, and bzip2).
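A minimal sketch of the suggestion above: one table per codec, each mapping an empirically measured compression ratio to the dictionary size that produced it, with the nearest entry chosen for a requested ratio. CodecAwareRatioTable is a hypothetical name (it is not the committed CompressionRatioLookupTable), codecs are identified by plain strings for simplicity, and the numbers in main are placeholders, not measured values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical codec-aware lookup: per codec, measured compression ratio -> dictionary size. */
public class CodecAwareRatioTable {
  private final Map<String, TreeMap<Double, Integer>> tables = new HashMap<>();

  /** Records that a dictionary of {@code dictSize} words compressed to {@code ratio} under {@code codec}. */
  public void put(String codec, double ratio, int dictSize) {
    tables.computeIfAbsent(codec, c -> new TreeMap<>()).put(ratio, dictSize);
  }

  /** Returns the dictionary size whose measured ratio is closest to {@code target} for the codec. */
  public int dictionarySizeFor(String codec, double target) {
    TreeMap<Double, Integer> table = tables.get(codec);
    if (table == null || table.isEmpty()) {
      throw new IllegalArgumentException("no ratio table for codec " + codec);
    }
    Map.Entry<Double, Integer> lo = table.floorEntry(target);   // largest ratio <= target
    Map.Entry<Double, Integer> hi = table.ceilingEntry(target); // smallest ratio >= target
    if (lo == null) {
      return hi.getValue();
    }
    if (hi == null) {
      return lo.getValue();
    }
    // Pick whichever neighbour is closer; ties go to the lower ratio.
    return (target - lo.getKey() <= hi.getKey() - target) ? lo.getValue() : hi.getValue();
  }

  public static void main(String[] args) {
    CodecAwareRatioTable table = new CodecAwareRatioTable();
    // Placeholder entries for illustration only; real entries must be measured per codec.
    table.put("gzip", 0.10, 10);
    table.put("gzip", 0.40, 1000);
    table.put("gzip", 0.60, 100000);
    System.out.println(table.dictionarySizeFor("gzip", 0.45));
  }
}
```

Keying the outer map by codec keeps each codec's empirical curve separate, so adding LZO or bzip2 later means adding rows rather than changing the lookup logic.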

[jira] [Commented] (MAPREDUCE-2408) Make Gridmix emulate usage of data compression

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039977#comment-13039977 ] 

Hadoop QA commented on MAPREDUCE-2408:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12480563/MR-2408-gridmix-compression-emulation-v1.1.patch
  against trunk revision 1127444.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 6 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these core unit tests:
                  org.apache.hadoop.cli.TestMRCLI
                  org.apache.hadoop.tools.TestHadoopArchives
                  org.apache.hadoop.tools.TestHarFileSystem

    -1 contrib tests.  The patch failed contrib unit tests.

    +1 system test framework.  The patch passed system test framework compile.

Test results: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/308//testReport/
Findbugs warnings: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/308//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/hudson/job/PreCommit-MAPREDUCE-Build/308//console

This message is automatically generated.

[jira] [Commented] (MAPREDUCE-2408) Make Gridmix emulate usage of data compression

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040085#comment-13040085 ] 

Hudson commented on MAPREDUCE-2408:
-----------------------------------

Integrated in Hadoop-Mapreduce-trunk-Commit #703 (See [https://builds.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/703/])
    MAPREDUCE-2408. [Gridmix] Compression emulation in Gridmix. (amarrk)

amarrk : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1128162
Files : 
* /hadoop/mapreduce/trunk/CHANGES.txt
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/DistributedCacheEmulator.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GridmixRecord.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/CompressionEmulationUtil.java
* /hadoop/mapreduce/trunk/src/docs/src/documentation/content/xdocs/gridmix.xml
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestCompressionEmulationUtils.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/AvgRecordFactory.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GridmixJob.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/SleepJob.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/RandomTextDataGenerator.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/LoadJob.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/test/org/apache/hadoop/mapred/gridmix/TestRandomTextDataGenerator.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/Gridmix.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/FileQueue.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GenerateData.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/GenerateDistCacheData.java
* /hadoop/mapreduce/trunk/src/contrib/gridmix/src/java/org/apache/hadoop/mapred/gridmix/InputStriper.java


[jira] [Commented] (MAPREDUCE-2408) Make Gridmix emulate usage of data compression

Posted by "Ravi Gummadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040072#comment-13040072 ] 

Ravi Gummadi commented on MAPREDUCE-2408:
-----------------------------------------

The failed tests are not related to this patch. The Findbugs warnings reported by Hudson are also not related to this patch.

Patch looks good to me. +1
