Posted to user@hadoop.apache.org by Sultan <su...@gmail.com> on 2019/04/07 23:01:25 UTC
Hadoop does not process large split size
Hi,
I am trying to process some data in Hadoop. For testing purposes, I want MapReduce to process the whole input as a single split (one map task). My data size is 5368709120 bytes, but MR processes only 20% of it (equivalent to 8 tasks of 128 MB each) and still reports the job as successful.
My data is already divided in HDFS into 40 blocks (128 MB each), and I have set both “mapreduce.input.fileinputformat.split.minsize” and “mapreduce.input.fileinputformat.split.maxsize” to 5368709120 bytes.
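To make my expectation concrete, here is a minimal sketch of how I understand the split size to be derived from these two settings. It assumes FileInputFormat picks max(minSize, min(maxSize, blockSize)); the class and method names are my own for illustration, it has no Hadoop dependency, and the ceiling division for the split count is a simplification.

```java
// Illustrative sketch only: SplitSizeCheck is a hypothetical helper, not Hadoop code.
// It mirrors the formula max(minSize, min(maxSize, blockSize)).
class SplitSizeCheck {
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024; // 134217728 bytes per HDFS block
        long fileSize = 5368709120L;         // total input size from this post
        long minSize = fileSize;             // split.minsize as configured
        long maxSize = fileSize;             // split.maxsize as configured

        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        // Simplified split count: ceiling of fileSize / splitSize.
        long numSplits = (fileSize + splitSize - 1) / splitSize;

        System.out.println("split size      = " + splitSize); // 5368709120
        System.out.println("expected splits = " + numSplits); // 1
    }
}
```

Under that assumption a single split covering all 5368709120 bytes is expected, which matches the "number of splits:1" line in the log.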
Here is the output:
19/04/02 18:29:44 INFO client.RMProxy: Connecting to ResourceManager at dmaster/10.40.0.0:8032
19/04/02 18:29:48 INFO input.FileInputFormat: Total input paths to process : 1
19/04/02 18:29:59 INFO mapreduce.JobSubmitter: number of splits:1
19/04/02 18:30:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1554217373879_0005
19/04/02 18:30:01 INFO impl.YarnClientImpl: Submitted application application_1554217373879_0005
19/04/02 18:30:01 INFO mapreduce.Job: The url to track the job: http://dmaster:8088/proxy/application_1554217373879_0005/
19/04/02 18:30:01 INFO mapreduce.Job: Running job: job_1554217373879_0005
19/04/02 18:31:08 INFO mapreduce.Job: Job job_1554217373879_0005 running in uber mode : false
19/04/02 18:31:08 INFO mapreduce.Job: map 0% reduce 0%
19/04/02 18:32:20 INFO mapreduce.Job: map 1% reduce 0%
19/04/02 18:32:49 INFO mapreduce.Job: map 2% reduce 0%
19/04/02 18:33:20 INFO mapreduce.Job: map 3% reduce 0%
19/04/02 18:33:52 INFO mapreduce.Job: map 4% reduce 0%
19/04/02 18:34:22 INFO mapreduce.Job: map 5% reduce 0%
19/04/02 18:34:51 INFO mapreduce.Job: map 6% reduce 0%
19/04/02 18:35:21 INFO mapreduce.Job: map 7% reduce 0%
19/04/02 18:35:50 INFO mapreduce.Job: map 8% reduce 0%
19/04/02 18:36:20 INFO mapreduce.Job: map 9% reduce 0%
19/04/02 18:36:46 INFO mapreduce.Job: map 10% reduce 0%
19/04/02 18:37:16 INFO mapreduce.Job: map 11% reduce 0%
19/04/02 18:37:42 INFO mapreduce.Job: map 12% reduce 0%
19/04/02 18:38:11 INFO mapreduce.Job: map 13% reduce 0%
19/04/02 18:38:37 INFO mapreduce.Job: map 14% reduce 0%
19/04/02 18:39:08 INFO mapreduce.Job: map 15% reduce 0%
19/04/02 18:39:48 INFO mapreduce.Job: map 16% reduce 0%
19/04/02 18:41:13 INFO mapreduce.Job: map 17% reduce 0%
19/04/02 18:42:31 INFO mapreduce.Job: map 18% reduce 0%
19/04/02 18:43:57 INFO mapreduce.Job: map 19% reduce 0%
19/04/02 18:45:21 INFO mapreduce.Job: map 20% reduce 0%
19/04/02 18:46:01 INFO mapreduce.Job: map 100% reduce 0%
19/04/02 18:46:06 INFO mapreduce.Job: Job job_1554217373879_0005 completed successfully
19/04/02 18:46:07 INFO mapreduce.Job: Counters: 36
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=117516
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1073741936
        HDFS: Number of bytes written=1402482688
        HDFS: Number of read operations=5
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=882218
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=882218
        Total vcore-milliseconds taken by all map tasks=882218
        Total megabyte-milliseconds taken by all map tasks=903391232
    Map-Reduce Framework
        Map input records=22945792
        Map output records=189399040
        Input split bytes=112
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=17859
        CPU time spent (ms)=217250
        Physical memory (bytes) snapshot=189526016
        Virtual memory (bytes) snapshot=2008051712
        Total committed heap usage (bytes)=152043520
    File Input Format Counters
        Bytes Read=1073741824
    File Output Format Counters
        Bytes Written=1402482688
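The 20% figure can be cross-checked against the counters with some simple arithmetic. The sketch below is only that arithmetic; the class and method names are hypothetical, and the 128 MB block size is the one mentioned earlier in this post.

```java
// Illustrative sketch only: CounterCheck is a hypothetical helper for the arithmetic.
class CounterCheck {
    // Integer percentage of the input that was read.
    static long percentOf(long bytesRead, long totalBytes) {
        return bytesRead * 100 / totalBytes;
    }

    // Number of whole blocks contained in the bytes read.
    static long blocksIn(long bytesRead, long blockSize) {
        return bytesRead / blockSize;
    }

    public static void main(String[] args) {
        long totalBytes = 5368709120L;       // input size from this post
        long bytesRead = 1073741824L;        // "Bytes Read" counter
        long blockSize = 128L * 1024 * 1024; // 134217728 bytes per HDFS block

        System.out.println("percent read = " + percentOf(bytesRead, totalBytes)); // 20
        System.out.println("blocks read  = " + blocksIn(bytesRead, blockSize));   // 8
    }
}
```

So the single map task stopped after exactly 1073741824 bytes, which is 8 of the 40 blocks, and the progress then jumped from 20% to 100%.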
Any help is appreciated.