Posted to user@hadoop.apache.org by Manoj Babu <ma...@gmail.com> on 2014/03/07 12:06:17 UTC
issue with DBInputFormat
Hi,
When using DBInputFormat to unload data from a table to HDFS, I configured
6 map tasks. Map task 0 alone unloads the whole table, while the remaining
5 tasks run properly.
Here are my observations from debugging.
Chunk size=855565
Input Splits:
For split0 the start=0 and the end=855565 and the length=855565
For split1 the start=855565 and the end=1711130 and the length=855565
For split2 the start=1711130 and the end=2566695 and the length=855565
For split3 the start=2566695 and the end=3422260 and the length=855565
For split4 the start=3422260 and the end=4277825 and the length=855565
For split5 the start=4277825 and the end=5133394 and the length=855569
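The split boundaries above are consistent with dividing the row count evenly and giving the remainder to the last split. A minimal sketch of that arithmetic, assuming a total of 5133394 rows (taken from the end of split5); the class and method names are hypothetical and this is not the Hadoop source:

```java
// Hypothetical stand-alone sketch of the split arithmetic; not Hadoop code.
final class DbSplitSketch {

    // Returns one {start, end} pair per split.
    static long[][] computeSplits(long rowCount, int numSplits) {
        long chunkSize = rowCount / numSplits; // integer division drops the remainder
        long[][] splits = new long[numSplits][2];
        for (int i = 0; i < numSplits; i++) {
            long start = i * chunkSize;
            // The last split runs to rowCount, so it absorbs the leftover rows.
            long end = (i + 1 == numSplits) ? rowCount : start + chunkSize;
            splits[i][0] = start;
            splits[i][1] = end;
        }
        return splits;
    }

    public static void main(String[] args) {
        long[][] splits = computeSplits(5133394L, 6);
        for (int i = 0; i < splits.length; i++) {
            long length = splits[i][1] - splits[i][0];
            System.out.println("split" + i + " start=" + splits[i][0]
                + " end=" + splits[i][1] + " length=" + length);
        }
    }
}
```

The point to notice is that split0 always starts at 0, which matters for the condition discussed further down.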
Queries fired from individual map tasks based on the splits created:
Map task 0: Select query: select * from emp
Map task 1: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM (
select * from emp ) a WHERE rownum <= 4277825 + 855569 ) WHERE dbif_rno >=
4277825
Map task 2: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM (
select * from emp ) a WHERE rownum <= 855565 + 855565 ) WHERE dbif_rno >=
855565
Map task 3: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM (
select * from emp ) a WHERE rownum <= 1711130 + 855565 ) WHERE dbif_rno >=
1711130
Map task 4: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM (
select * from emp ) a WHERE rownum <= 2566695 + 855565 ) WHERE dbif_rno >=
2566695
Map task 5: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM (
select * from emp ) a WHERE rownum <= 3422260 + 855565 ) WHERE dbif_rno >=
3422260
The query executed by map task 0 is the culprit: it has no row limits, so
that task reads every row in the table.
The condition below,
in org.apache.hadoop.mapreduce.lib.db.OracleDBRecordReader.getSelectQuery(),
is false for split0 because its start is 0, so the ROWNUM limits are never
added to that task's query:
if (split.getLength() > 0 && split.getStart() > 0) {
...
...}
It should be:
if (split.getLength() > 0 && split.getStart() >= 0) {
...
...}
I was able to work around the issue by overriding getSelectQuery().
Has anybody faced a similar issue?
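To illustrate the effect of the changed condition, here is a minimal stand-alone sketch that builds the same query text as the ones listed above. It does not extend OracleDBRecordReader or use any Hadoop classes; the class name, the method name, and the idea of passing start and length as plain parameters are assumptions made only for this example:

```java
// Hypothetical stand-alone sketch; it mirrors the query text from the post
// and is not the Hadoop implementation.
final class OracleSplitQuerySketch {

    // Wraps baseQuery in ROWNUM limits for the given split.
    static String buildSelectQuery(String baseQuery, long start, long length) {
        // Using ">= 0" instead of "> 0" so the split starting at row 0 is also limited.
        if (length > 0 && start >= 0) {
            return "SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( "
                + baseQuery
                + " ) a WHERE rownum <= " + start + " + " + length
                + " ) WHERE dbif_rno >= " + start;
        }
        // Empty or invalid split: fall back to the unlimited query.
        return baseQuery;
    }

    public static void main(String[] args) {
        // split0 from the post: start=0, length=855565
        System.out.println(buildSelectQuery("select * from emp", 0L, 855565L));
        // split1 from the post: start=855565, length=855565
        System.out.println(buildSelectQuery("select * from emp", 855565L, 855565L));
    }
}
```

With the original "start > 0" check, the first call would return the bare "select * from emp", which matches what map task 0 was observed to run.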
Cheers!
Manoj.
RE: issue with DBInputFormat
Posted by Khaleel Khalid <kh...@suntecgroup.com>.
Hi,
We faced the same with DBInputFormat. Using DataDrivenDBInputFormat fixed the issue.
Regards
Khaleel