Posted to common-user@hadoop.apache.org by ch huang <ju...@gmail.com> on 2013/12/03 09:28:09 UTC
issue about total input byte of MR job
I ran an MR job, and in its output I see
13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717
Because my data block size is 64M, I expected the total input to be 2717*64M/1024 = 170G.
But the summary at the end (below) shows HDFS bytes read = 126792190158, i.e.
126792190158/1024/1024/1024 = 118G. Why are the two numbers not close?
File System Counters
FILE: Number of bytes read=9642910241
FILE: Number of bytes written=120327706125
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=126792190158
HDFS: Number of bytes written=0
HDFS: Number of read operations=8151
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
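The gap can be checked with a quick sketch; the split count and the counter value are taken from the log above, and the only assumption is that the naive estimate treats every split as a full 64M block (which is exactly what turns out not to hold):

```python
# Compare the naive "splits x block size" estimate with the actual
# HDFS bytes-read counter from the job summary above.

splits = 2717
block_size = 64 * 1024**2            # 64M block size, in bytes

naive_estimate = splits * block_size # assumes every split is a full block
actual_read = 126792190158           # "HDFS: Number of bytes read" counter

print(naive_estimate / 1024**3)      # ~169.8 G -- the "170G" figure
print(actual_read / 1024**3)         # ~118.1 G -- the "118G" figure

# The ~52G gap means many splits cover less than a full 64M block
# (for example, the last block of each input file).
print((naive_estimate - actual_read) / 1024**3)
```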
Re: issue about total input byte of MR job
Posted by Ranjini Rathinam <ra...@gmail.com>.
This is the input; please help with a code example.
<Company>
<Employee>
<id>100</id>
<ename>ert</ename>
<Address>
<home>eewre</home>
<office>wefwef</office>
</Address>
</Employee>
</Company>
On Tue, Dec 3, 2013 at 2:11 PM, Jeff Zhang <je...@gopivotal.com> wrote:
> It depends on your input data. E.g. if your input consists of 10 files, each
> 65M, then each file takes 2 mappers, so overall it costs 20
> mappers, but the input size is actually 650M rather than 20*64M = 1280M.
>
>
> On Tue, Dec 3, 2013 at 4:28 PM, ch huang <ju...@gmail.com> wrote:
>
>> I ran an MR job, and in its output I see
>>
>> 13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717
>>
>> Because my data block size is 64M, I expected the total input to be
>> 2717*64M/1024 = 170G.
>>
>> But the summary at the end (below) shows HDFS bytes read =
>> 126792190158/1024/1024/1024 = 118G. Why are the two numbers not close?
>>
>> File System Counters
>> FILE: Number of bytes read=9642910241
>> FILE: Number of bytes written=120327706125
>> FILE: Number of read operations=0
>> FILE: Number of large read operations=0
>> FILE: Number of write operations=0
>> HDFS: Number of bytes read=126792190158
>> HDFS: Number of bytes written=0
>> HDFS: Number of read operations=8151
>> HDFS: Number of large read operations=0
>> HDFS: Number of write operations=0
>>
>
>
Re: issue about total input byte of MR job
Posted by Jeff Zhang <je...@gopivotal.com>.
It depends on your input data. E.g. if your input consists of 10 files, each
65M, then each file takes 2 mappers, so overall it costs 20
mappers, but the input size is actually 650M rather than 20*64M = 1280M.
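That per-file arithmetic can be sketched as below. The file sizes are Jeff's illustrative numbers, and the simple ceil-based split count is an approximation: real FileInputFormat also allows the last split to run about 10% over the split size, so a 65M file may in practice get a single split.

```python
import math

# Illustrative input, per the example above: 10 files of 65M each,
# with a 64M block/split size (sizes in MB for readability).
block_size = 64
files = [65] * 10

# Simple per-file split count: ceil(size / split size).
# (Hadoop's FileInputFormat additionally tolerates a ~10% overrun on
# the last split, which could merge a 65M file into one split.)
splits = sum(math.ceil(size / block_size) for size in files)
total_input = sum(files)

print(splits)              # 20 mappers
print(total_input)         # 650M actually read from HDFS
print(splits * block_size) # 1280M -- the misleading "splits x block" figure
```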
On Tue, Dec 3, 2013 at 4:28 PM, ch huang <ju...@gmail.com> wrote:
> I ran an MR job, and in its output I see
>
> 13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717
>
> Because my data block size is 64M, I expected the total input to be 2717*64M/1024 = 170G.
>
> But the summary at the end (below) shows HDFS bytes read =
> 126792190158/1024/1024/1024 = 118G. Why are the two numbers not close?
>
> File System Counters
> FILE: Number of bytes read=9642910241
> FILE: Number of bytes written=120327706125
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=126792190158
> HDFS: Number of bytes written=0
> HDFS: Number of read operations=8151
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=0
>
RE: issue about total input byte of MR job
Posted by nishan shetty <ni...@huawei.com>.
Hi Ch Huang
Are you sure your input data size is 170G?
It is not necessarily true that 2717 splits hold 170G (as in your calculation 2717*64M/1024): each file is treated as at least one separate split, and a file (or the last block of a file) may be much smaller than 64M.
Please cross-check the input size using the CLI.
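For example (the path below is a placeholder for the job's actual input directory):

```shell
# Total size of the job's input directory.
hadoop fs -du -s -h /user/hadoop/input

# Per-file listing, to spot files smaller than one 64M block.
hadoop fs -du -h /user/hadoop/input

# Directory/file counts: many small files means many sub-block splits.
hadoop fs -count /user/hadoop/input
```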
Regards
Nishan
From: ch huang [mailto:justlooks@gmail.com]
Sent: 03 December 2013 01:58 PM
To: user@hadoop.apache.org
Subject: issue about total input byte of MR job
I ran an MR job, and in its output I see
13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717
Because my data block size is 64M, I expected the total input to be 2717*64M/1024 = 170G.
But the summary at the end (below) shows HDFS bytes read = 126792190158/1024/1024/1024 = 118G. Why are the two numbers not close?
File System Counters
FILE: Number of bytes read=9642910241
FILE: Number of bytes written=120327706125
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=126792190158
HDFS: Number of bytes written=0
HDFS: Number of read operations=8151
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Re: issue about total input byte of MR job
Posted by Ranjini Rathinam <ra...@gmail.com>.
Hi,
How do I process an XML file via MapReduce and load it into an HBase table?
Please suggest sample code.
Thanks in advance.
On Tue, Dec 3, 2013 at 1:58 PM, ch huang <ju...@gmail.com> wrote:
> I ran an MR job, and in its output I see
>
> 13/12/03 14:02:28 INFO mapreduce.JobSubmitter: number of splits:2717
>
> Because my data block size is 64M, I expected the total input to be 2717*64M/1024 = 170G.
>
> But the summary at the end (below) shows HDFS bytes read =
> 126792190158/1024/1024/1024 = 118G. Why are the two numbers not close?
>
> File System Counters
> FILE: Number of bytes read=9642910241
> FILE: Number of bytes written=120327706125
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=126792190158
> HDFS: Number of bytes written=0
> HDFS: Number of read operations=8151
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=0
>