Posted to user@pig.apache.org by Irfan Mohammed <ir...@gmail.com> on 2009/09/08 16:58:13 UTC

OutOfMemory Errors when loading a Gzip file

Hi,
I am trying to load a large gzip file and process it using Pig. Every time I
run the following script, I get OutOfMemory errors.

The hadoop-site.xml is attached. The pig and the hadoop jobtracker logs 
are attached as well.

$ pig
 >>>
x1 = LOAD 'file:///mnt/transaction_ar20090907_1102_126.CSV.gz' using 
PigStorage('\u0002');
y1 = LIMIT x1 10;
dump y1;
 >>>

Environment :
hadoop-0.20.0
pig-0.3.0 [ patched with Pig-660-4 to work with hadoop-0.20.0 ]
ec2 [ c1.medium ]

Thanks,
Irfan
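
If per-record memory use really is the problem, one knob in the attached hadoop-site.xml that directly bounds the failing map tasks is the child JVM heap. A minimal sketch for Hadoop 0.20 (the 1 GB value is an illustrative assumption to tune, not something reported in this thread):

```xml
<!-- hadoop-site.xml: raise the heap for map/reduce child JVMs.
     The 0.20 default is -Xmx200m; 1 GB here is a guess sized for
     a c1.medium's 1.7 GB of RAM. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```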


Re: OutOfMemory Errors when loading a Gzip file

Posted by prasenjit mukherjee <pr...@gmail.com>.
I am also trying to grapple with a similar class of problems. In my
case the files are unzipped (and hence, I assume, splittable on
record boundaries). The record size is pretty small, though the total
number of records could be in the hundreds of millions.

I would like to know how Pig splits files for LOAD/STORE operations
specifically. Does the fact that Pig is instructed to use a local file
(LOAD file:///....) make any difference?

-Thanks,
Prasen

On Tue, Sep 8, 2009 at 10:58 AM, Irfan Mohammed<ir...@gmail.com> wrote:

Re: OutOfMemory Errors when loading a Gzip file

Posted by Alan Gates <ga...@yahoo-inc.com>.
How large are the records in your file? Do you expect any single
record to be in the multi-megabyte range?

Have you tried decompressing the file and reading it to see if the
issue is the compression?

Alan.
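
Both checks can be scripted. A sketch using a stand-in file (the real transaction_*.CSV.gz is not available here, so the /tmp path and the tiny Ctrl-B-delimited demo data are assumptions):

```shell
# Stand-in for the real CSV.gz: two short records delimited by \002,
# just to make the commands below runnable.
printf 'a\002bb\002ccc\nshort\n' | gzip > /tmp/sample.csv.gz

# 1) Does plain decompression succeed end to end?
#    This rules out a corrupt gzip stream as the cause.
zcat /tmp/sample.csv.gz > /dev/null && echo "gzip stream OK"

# 2) How long is the longest record? A single multi-hundred-megabyte
#    line (e.g. from a missing newline) would explain an
#    OutOfMemoryError inside PigStorage.readField.
zcat /tmp/sample.csv.gz | awk '{ if (length > max) max = length } END { print "longest line:", max }'
```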

On Sep 8, 2009, at 7:58 AM, Irfan Mohammed wrote:

> ERROR 6016: Out of memory.
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias y1
>        at org.apache.pig.PigServer.openIterator(PigServer.java:469)
>        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:522)
>        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:164)
>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
>        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>        at org.apache.pig.Main.main(Main.java:350)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6016: Out of memory.
>        at java.util.Arrays.copyOf(Arrays.java:2760)
>        at java.util.Arrays.copyOf(Arrays.java:2734)
>        at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
>        at java.util.ArrayList.add(ArrayList.java:351)
>        at org.apache.pig.builtin.PigStorage.readField(PigStorage.java:286)
>        at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:117)
>        at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:104)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:162)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:138)
>        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
>        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
>        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> Caused by: java.lang.OutOfMemoryError
>
> 2009-09-08 10:06:13,551 WARN org.apache.hadoop.conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> 2009-09-08 10:06:13,581 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting JobTracker
> STARTUP_MSG:   host = domU-12-31-39-07-50-C2.compute-1.internal/10.209.83.48
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.0
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20 -r 763504; compiled by 'ndaley' on Thu Apr  9 05:18:40 UTC 2009
> ************************************************************/
> 2009-09-08 10:06:13,815 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=50002
> 2009-09-08 10:06:13,908 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2009-09-08 10:06:14,027 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030
> 2009-09-08 10:06:14,027 INFO org.mortbay.log: jetty-6.1.14
> 2009-09-08 10:06:16,821 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50030
> 2009-09-08 10:06:16,822 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 2009-09-08 10:06:16,872 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 50002
> 2009-09-08 10:06:16,872 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030
> 2009-09-08 10:06:17,045 INFO org.apache.hadoop.mapred.JobTracker: Cleaning up the system directory
> 2009-09-08 10:06:17,274 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
> 2009-09-08 10:06:17,275 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50002: starting
> 2009-09-08 10:06:17,275 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50002: starting
> 2009-09-08 10:06:17,275 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 3 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 5 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 6 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 7 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.ipc.Server: IPC Server handler 8 on 50002: starting
> 2009-09-08 10:06:17,276 INFO org.apache.hadoop.mapred.JobTracker: Starting RUNNING
> 2009-09-08 10:06:17,277 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 50002: starting
> 2009-09-08 10:06:17,716 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/domU-12-31-39-07-50-C2.compute-1.internal
> 2009-09-08 10:13:03,008 INFO org.apache.hadoop.mapred.EagerTaskInitializationListener: Initializing job_200909081006_0001
> 2009-09-08 10:13:03,013 INFO org.apache.hadoop.mapred.JobHistory: Nothing to recover! Generating a new filename domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0001_root_Job4376148819153439144.jar for job job_200909081006_0001
> 2009-09-08 10:13:03,026 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0001_root_Job4376148819153439144.jar doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0001_root_Job4376148819153439144.jar.recover for recovery.
> 2009-09-08 10:13:04,432 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0001_root_Job4376148819153439144.jar doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0001_root_Job4376148819153439144.jar.recover as the master history file for user.
> 2009-09-08 10:13:05,194 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200909081006_0001 = 151
> 2009-09-08 10:13:05,194 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_200909081006_0001 with 1 splits:
> 2009-09-08 10:13:05,195 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/localhost
> 2009-09-08 10:13:05,195 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_200909081006_0001_m_000000 has split on node:/default-rack/localhost
> 2009-09-08 10:13:06,253 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0001_m_000002_0' to tip task_200909081006_0001_m_000002, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:09,306 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0001_m_000002_0' has completed task_200909081006_0001_m_000002 successfully.
> 2009-09-08 10:13:09,310 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0001_m_000000_0' to tip task_200909081006_0001_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:09,312 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0001_m_000000
> 2009-09-08 10:13:15,350 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0001_m_000000_0' has completed task_200909081006_0001_m_000000 successfully.
> 2009-09-08 10:13:15,353 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0001_m_000001_0' to tip task_200909081006_0001_m_000001, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:18,366 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0001_m_000001_0' has completed task_200909081006_0001_m_000001 successfully.
> 2009-09-08 10:13:18,367 INFO org.apache.hadoop.mapred.JobInProgress: Job job_200909081006_0001 has completed successfully.
> 2009-09-08 10:13:18,442 INFO org.apache.hadoop.mapred.JobHistory: Recovered job history filename for job job_200909081006_0001 is domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0001_root_Job4376148819153439144.jar
> 2009-09-08 10:13:18,544 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0001_m_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:18,544 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0001_m_000001_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:18,544 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0001_m_000002_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:27,843 INFO org.apache.hadoop.mapred.EagerTaskInitializationListener: Initializing job_200909081006_0002
> 2009-09-08 10:13:27,845 INFO org.apache.hadoop.mapred.JobHistory: Nothing to recover! Generating a new filename domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0002_root_Job5240765340979271360.jar for job job_200909081006_0002
> 2009-09-08 10:13:27,846 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0002_root_Job5240765340979271360.jar doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0002_root_Job5240765340979271360.jar.recover for recovery.
> 2009-09-08 10:13:27,920 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0002_root_Job5240765340979271360.jar doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0002_root_Job5240765340979271360.jar.recover as the master history file for user.
> 2009-09-08 10:13:28,481 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200909081006_0002 = 151
> 2009-09-08 10:13:28,481 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_200909081006_0002 with 1 splits:
> 2009-09-08 10:13:28,482 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_200909081006_0002_m_000000 has split on node:/default-rack/localhost
> 2009-09-08 10:13:30,555 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0002_m_000002_0' to tip task_200909081006_0002_m_000002, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:33,562 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0002_m_000002_0' has completed task_200909081006_0002_m_000002 successfully.
> 2009-09-08 10:13:33,563 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0002_m_000000_0' to tip task_200909081006_0002_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:33,563 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0002_m_000000
> 2009-09-08 10:13:36,604 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0002_m_000000_0' has completed task_200909081006_0002_m_000000 successfully.
> 2009-09-08 10:13:36,605 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:1  completedMapsInputSize:152  completedMapsOutputSize:409
> 2009-09-08 10:13:36,610 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0002_r_000000_0' to tip task_200909081006_0002_r_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:48,631 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0002_r_000000_0' has completed task_200909081006_0002_r_000000 successfully.
> 2009-09-08 10:13:48,634 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0002_m_000001_0' to tip task_200909081006_0002_m_000001, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:51,638 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0002_m_000001_0' has completed task_200909081006_0002_m_000001 successfully.
> 2009-09-08 10:13:51,638 INFO org.apache.hadoop.mapred.JobInProgress: Job job_200909081006_0002 has completed successfully.
> 2009-09-08 10:13:51,754 INFO org.apache.hadoop.mapred.JobHistory: Recovered job history filename for job job_200909081006_0002 is domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0002_root_Job5240765340979271360.jar
> 2009-09-08 10:13:51,843 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0002_m_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:51,843 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0002_m_000001_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:51,843 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0002_m_000002_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:51,843 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0002_r_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:13:58,252 INFO org.apache.hadoop.mapred.EagerTaskInitializationListener: Initializing job_200909081006_0003
> 2009-09-08 10:13:58,255 INFO org.apache.hadoop.mapred.JobHistory: Nothing to recover! Generating a new filename domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0003_root_PigLatin%3ADefaultJobName for job job_200909081006_0003
> 2009-09-08 10:13:58,258 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0003_root_PigLatin%3ADefaultJobName doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0003_root_PigLatin%3ADefaultJobName.recover for recovery.
> 2009-09-08 10:13:58,337 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0003_root_PigLatin%3ADefaultJobName doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0003_root_PigLatin%3ADefaultJobName.recover as the master history file for user.
> 2009-09-08 10:13:58,890 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200909081006_0003 = 151
> 2009-09-08 10:13:58,890 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_200909081006_0003 with 1 splits:
> 2009-09-08 10:13:58,890 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_200909081006_0003_m_000000 has split on node:/default-rack/localhost
> 2009-09-08 10:14:00,850 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0003_m_000002_0' to tip task_200909081006_0003_m_000002, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:03,854 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0003_m_000002_0' has completed task_200909081006_0003_m_000002 successfully.
> 2009-09-08 10:14:03,855 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0003_m_000000_0' to tip task_200909081006_0003_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:03,855 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0003_m_000000
> 2009-09-08 10:14:06,860 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0003_m_000000_0' has completed task_200909081006_0003_m_000000 successfully.
> 2009-09-08 10:14:06,860 INFO org.apache.hadoop.mapred.ResourceEstimator: completedMapsUpdates:1  completedMapsInputSize:152  completedMapsOutputSize:409
> 2009-09-08 10:14:06,862 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0003_r_000000_0' to tip task_200909081006_0003_r_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:21,888 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0003_r_000000_0' has completed task_200909081006_0003_r_000000 successfully.
> 2009-09-08 10:14:21,889 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0003_m_000001_0' to tip task_200909081006_0003_m_000001, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:24,892 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0003_m_000001_0' has completed task_200909081006_0003_m_000001 successfully.
> 2009-09-08 10:14:24,893 INFO org.apache.hadoop.mapred.JobInProgress: Job job_200909081006_0003 has completed successfully.
> 2009-09-08 10:14:24,979 INFO org.apache.hadoop.mapred.JobHistory: Recovered job history filename for job job_200909081006_0003 is domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0003_root_PigLatin%3ADefaultJobName
> 2009-09-08 10:14:25,052 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0003_m_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:25,052 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0003_m_000001_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:25,052 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0003_m_000002_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:14:25,052 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0003_r_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:15:13,466 INFO org.apache.hadoop.mapred.EagerTaskInitializationListener: Initializing job_200909081006_0004
> 2009-09-08 10:15:13,470 INFO org.apache.hadoop.mapred.JobHistory: Nothing to recover! Generating a new filename domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0004_root_Job3181357076430015345.jar for job job_200909081006_0004
> 2009-09-08 10:15:13,471 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0004_root_Job3181357076430015345.jar doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0004_root_Job3181357076430015345.jar.recover for recovery.
> 2009-09-08 10:15:13,587 INFO org.apache.hadoop.mapred.JobHistory: domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0004_root_Job3181357076430015345.jar doesnt exist! Using domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0004_root_Job3181357076430015345.jar.recover as the master history file for user.
> 2009-09-08 10:15:14,203 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200909081006_0004 = 260104276
> 2009-09-08 10:15:14,203 INFO org.apache.hadoop.mapred.JobInProgress: Split info for job:job_200909081006_0004 with 1 splits:
> 2009-09-08 10:15:14,203 INFO org.apache.hadoop.mapred.JobInProgress: tip:task_200909081006_0004_m_000000 has split on node:/default-rack/localhost
> 2009-09-08 10:15:16,080 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0004_m_000002_0' to tip task_200909081006_0004_m_000002, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:15:19,096 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0004_m_000002_0' has completed task_200909081006_0004_m_000002 successfully.
> 2009-09-08 10:15:19,097 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0004_m_000000_0' to tip task_200909081006_0004_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:15:19,097 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0004_m_000000
> 2009-09-08 10:15:43,146 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200909081006_0004_m_000000_0: java.lang.OutOfMemoryError: Java heap space
> 	at java.util.Arrays.copyOf(Arrays.java:2760)
> 	at java.util.Arrays.copyOf(Arrays.java:2734)
> 	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
> 	at java.util.ArrayList.add(ArrayList.java:351)
> 	at org.apache.pig.builtin.PigStorage.readField(PigStorage.java:286)
> 	at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:117)
> 	at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:104)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:162)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:138)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 2009-09-08 10:15:46,151 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0004_m_000000_1' to tip task_200909081006_0004_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:15:46,151 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0004_m_000000
> 2009-09-08 10:15:46,151 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:16:10,341 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200909081006_0004_m_000000_1: java.lang.OutOfMemoryError: Java heap space
> 	at java.util.Arrays.copyOf(Arrays.java:2760)
> 	at java.util.Arrays.copyOf(Arrays.java:2734)
> 	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
> 	at java.util.ArrayList.add(ArrayList.java:351)
> 	at org.apache.pig.builtin.PigStorage.readField(PigStorage.java:286)
> 	at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:117)
> 	at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:104)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:162)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:138)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 2009-09-08 10:16:13,345 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0004_m_000000_2' to tip task_200909081006_0004_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:16:13,345 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0004_m_000000
> 2009-09-08 10:16:13,345 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000000_1' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:16:37,468 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200909081006_0004_m_000000_2: java.lang.OutOfMemoryError: Java heap space
> 	at java.util.Arrays.copyOf(Arrays.java:2760)
> 	at java.util.Arrays.copyOf(Arrays.java:2734)
> 	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
> 	at java.util.ArrayList.add(ArrayList.java:351)
> 	at org.apache.pig.builtin.PigStorage.readField(PigStorage.java:286)
> 	at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:117)
> 	at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:104)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:162)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:138)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 2009-09-08 10:16:40,472 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0004_m_000000_3' to tip task_200909081006_0004_m_000000, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:16:40,472 INFO org.apache.hadoop.mapred.JobInProgress: Choosing rack-local task task_200909081006_0004_m_000000
> 2009-09-08 10:16:40,472 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000000_2' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:04,550 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200909081006_0004_m_000000_3: java.lang.OutOfMemoryError: Java heap space
> 	at java.util.Arrays.copyOf(Arrays.java:2760)
> 	at java.util.Arrays.copyOf(Arrays.java:2734)
> 	at java.util.ArrayList.ensureCapacity(ArrayList.java:167)
> 	at java.util.ArrayList.add(ArrayList.java:351)
> 	at org.apache.pig.builtin.PigStorage.readField(PigStorage.java:286)
> 	at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:117)
> 	at org.apache.pig.backend.executionengine.PigSlice.next(PigSlice.java:104)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:162)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper$1.next(SliceWrapper.java:138)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
> 	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> 2009-09-08 10:17:07,554 INFO org.apache.hadoop.mapred.TaskInProgress: TaskInProgress task_200909081006_0004_m_000000 has failed 4 times.
> 2009-09-08 10:17:07,555 INFO org.apache.hadoop.mapred.JobInProgress: TaskTracker at 'domU-12-31-39-07-50-C2.compute-1.internal' turned 'flaky'
> 2009-09-08 10:17:07,555 INFO org.apache.hadoop.mapred.JobInProgress: Aborting job job_200909081006_0004
> 2009-09-08 10:17:07,555 INFO org.apache.hadoop.mapred.JobInProgress: Killing job 'job_200909081006_0004'
> 2009-09-08 10:17:07,555 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200909081006_0004_m_000001_0' to tip task_200909081006_0004_m_000001, for tracker 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:07,556 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000000_3' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:10,561 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_200909081006_0004_m_000001_0' has completed task_200909081006_0004_m_000001 successfully.
> 2009-09-08 10:17:10,606 INFO org.apache.hadoop.mapred.JobHistory: Recovered job history filename for job job_200909081006_0004 is domU-12-31-39-07-50-C2.compute-1.internal_1252418773830_job_200909081006_0004_root_Job3181357076430015345.jar
> 2009-09-08 10:17:10,660 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000000_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:10,660 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000000_1' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:10,660 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000000_2' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:10,660 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000000_3' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:10,660 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000001_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
> 2009-09-08 10:17:10,660 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200909081006_0004_m_000002_0' from 'tracker_domU-12-31-39-07-50-C2.compute-1.internal:localhost.localdomain/127.0.0.1:57318'
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>
> <configuration>
>
> <property>
>  <name>hadoop.tmp.dir</name>
>  <value>/mnt/hadoop</value>
> </property>
>
> <property>
>  <name>fs.default.name</name>
>  <value>hdfs://domU-12-31-39-07-50-C2.compute-1.internal:50001</value>
> </property>
>
> <property>
>  <name>mapred.job.tracker</name>
>  <value>hdfs://domU-12-31-39-07-50-C2.compute-1.internal:50002</value>
> </property>
>
> <property>
>  <name>tasktracker.http.threads</name>
>  <value>80</value>
> </property>
>
> <property>
> <name>mapred.reduce.parallel.copies</name>
> <value>1</value>
> </property>
> <property>
> <name>mapred.reduce.tasks</name>
> <value>1</value>
> </property>
>
> <property>
>  <name>mapred.tasktracker.map.tasks.maximum</name>
>  <value>3</value>
> </property>
>
> <property>
>  <name>mapred.tasktracker.reduce.tasks.maximum</name>
>  <value>3</value>
> </property>
>
> <property>
>  <name>mapred.output.compress</name>
>  <value>true</value>
> </property>
>
> <property>
>  <name>mapred.output.compression.type</name>
>  <value>BLOCK</value>
> </property>
>
> <property>
>  <name>dfs.client.block.write.retries</name>
>  <value>3</value>
> </property>
>
> <property>
>  <name>mapred.child.java.opts</name>
>  <value>-Xmx550m</value>
> </property>
>
> <property>
> <name>io.compression.codecs</name>
>
> <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec</value>
> <description>A list of the compression codec classes that can be used for compression/decompression.</description>
> </property>
>
> <property>
>  <name>fs.s3.awsAccessKeyId</name>
>  <value>xxxx</value>
> </property>
> <property>
>  <name>fs.s3.awsSecretAccessKey</name>
>  <value>xxxx</value>
> </property>
> <property>
>  <name>fs.s3n.awsAccessKeyId</name>
>  <value>xxxx</value>
> </property>
> <property>
>  <name>fs.s3n.awsSecretAccessKey</name>
>  <value>xxxx</value>
> </property>
>
> </configuration>
>
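The quoted trace is telling: the heap fills inside PigStorage.readField, via ArrayList.add and Arrays.copyOf. That method buffers a field byte-by-byte in memory until it hits the field or record delimiter, so one enormous record (or a file where '\u0002'/newline never appears, e.g. a wrong delimiter or encoding) will grow the buffer until -Xmx550m is exhausted. A rough sketch of that pattern — not Pig's actual source, class and method names here are illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ReadFieldSketch {
    // Reads one field up to the delimiter, buffering every byte on the heap.
    // This mirrors the ArrayList growth visible in the stack trace: each add()
    // may trigger Arrays.copyOf, and a delimiter-free stream grows without bound.
    static byte[] readField(InputStream in, byte delimiter) throws IOException {
        List<Byte> buf = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1 && b != delimiter) {
            buf.add((byte) b);  // unbounded if the delimiter never appears
        }
        byte[] field = new byte[buf.size()];
        for (int i = 0; i < field.length; i++) {
            field[i] = buf.get(i);
        }
        return field;
    }

    public static void main(String[] args) throws IOException {
        // Two fields separated by the \u0002 delimiter from the original script.
        byte[] data = "hello\u0002world".getBytes("UTF-8");
        byte[] first = readField(new ByteArrayInputStream(data), (byte) 2);
        System.out.println(new String(first, "UTF-8")); // prints "hello"
    }
}
```

If that diagnosis is right, raising mapred.child.java.opts only delays the failure; checking that '\u0002' really is the field separator in the gzipped CSV (e.g. with zcat | head) would confirm it.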