Posted to user@hadoop.apache.org by Manoj Babu <ma...@gmail.com> on 2014/03/07 12:06:17 UTC

issue with DBInputFormat

Hi,

When using DBInputFormat to unload data from a table to HDFS, I configured 6 map
tasks, but map task 0 alone unloaded the whole table while the remaining 5 tasks
ran normally. Please find my observations from debugging below.

Chunk size=855565

Input Splits:

For split0 the start=0 and the end=855565 and the length=855565
For split1 the start=855565 and the end=1711130 and the length=855565
For split2 the start=1711130 and the end=2566695 and the length=855565
For split3 the start=2566695 and the end=3422260 and the length=855565
For split4 the start=3422260 and the end=4277825 and the length=855565
For split5 the start=4277825 and the end=5133394 and the length=855569
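The boundaries above are consistent with dividing the table's total row count (5,133,394) evenly across the 6 configured map tasks, with the remainder going to the last split. A minimal, self-contained sketch of that arithmetic (illustrative only, not the actual Hadoop split code):

```java
public class SplitArithmetic {
    public static void main(String[] args) {
        long totalRows = 5133394L;  // total rows in emp
        int numSplits = 6;          // configured map tasks
        long chunkSize = totalRows / numSplits;  // 855565

        for (int i = 0; i < numSplits; i++) {
            long start = i * chunkSize;
            // the last split absorbs the division remainder (4 extra rows)
            long end = (i == numSplits - 1) ? totalRows : start + chunkSize;
            System.out.println("split" + i + " start=" + start
                    + " end=" + end + " length=" + (end - start));
        }
    }
}
```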

Queries fired from individual map tasks based on the splits created:

Map task 0: Select query: select * from emp
Map task 1: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp ) a WHERE rownum <= 4277825 + 855569 ) WHERE dbif_rno >= 4277825
Map task 2: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp ) a WHERE rownum <= 855565 + 855565 ) WHERE dbif_rno >= 855565
Map task 3: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp ) a WHERE rownum <= 1711130 + 855565 ) WHERE dbif_rno >= 1711130
Map task 4: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp ) a WHERE rownum <= 2566695 + 855565 ) WHERE dbif_rno >= 2566695
Map task 5: Select query: SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( select * from emp ) a WHERE rownum <= 3422260 + 855565 ) WHERE dbif_rno >= 3422260

The query executed by map task 0 is the problem: it has no ROWNUM limits, so
that single task queried all the rows of the table.

The cause is the below condition
in org.apache.hadoop.mapreduce.lib.db.OracleDBRecordReader.getSelectQuery():

if (split.getLength() > 0 && split.getStart() > 0) {
...
...}

which should instead be:

if (split.getLength() > 0 && split.getStart() >= 0) {
...
...}
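To illustrate the effect of that condition, here is a simplified, self-contained sketch of the query-building logic (illustrative only, not the actual Hadoop source; the real method also handles table names, field lists, and conditions). With `start > 0`, split 0 falls through to the unbounded query; with `start >= 0`, every split gets ROWNUM bounds:

```java
public class QuerySketch {
    // Simplified stand-in for OracleDBRecordReader.getSelectQuery().
    // 'fixed' selects between the original (>) and corrected (>=) condition.
    static String getSelectQuery(long start, long length, boolean fixed) {
        String base = "select * from emp";
        boolean bounded = fixed ? (length > 0 && start >= 0)
                                : (length > 0 && start > 0);
        if (bounded) {
            return "SELECT * FROM (SELECT a.*,ROWNUM dbif_rno FROM ( " + base
                 + " ) a WHERE rownum <= " + start + " + " + length
                 + " ) WHERE dbif_rno >= " + start;
        }
        return base;  // no ROWNUM limits: reads the whole table
    }

    public static void main(String[] args) {
        // split 0 with the original condition: unbounded, whole table
        System.out.println(getSelectQuery(0, 855565, false));
        // split 0 with the corrected condition: properly bounded
        System.out.println(getSelectQuery(0, 855565, true));
    }
}
```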


By overriding getSelectQuery() I was able to overcome the issue.
Has anybody faced a similar issue?


Cheers!
Manoj.

RE: issue with DBInputFormat

Posted by Khaleel Khalid <kh...@suntecgroup.com>.
Hi,
 
We faced the same issue with DBInputFormat. Switching to DataDrivenDBInputFormat fixed it.
 
 
Regards
 
Khaleel
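
For anyone hitting the same problem, a job setup along these lines switches to DataDrivenDBInputFormat, which computes splits from the MIN/MAX of a split-by column instead of ROWNUM offsets. This is a configuration sketch only: the JDBC URL, credentials, column names, and the EmpWritable class (a user-supplied DBWritable implementation) are placeholders for your environment.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DataDrivenDBInputFormat;

// Placeholder connection details -- substitute your own.
Configuration conf = new Configuration();
DBConfiguration.configureDB(conf,
    "oracle.jdbc.driver.OracleDriver",
    "jdbc:oracle:thin:@//dbhost:1521/orcl", "user", "password");

Job job = Job.getInstance(conf);
job.setInputFormatClass(DataDrivenDBInputFormat.class);
// Splits are derived from the value range of the split-by column
// (here "empno"); EmpWritable is a hypothetical DBWritable for emp rows.
DataDrivenDBInputFormat.setInput(job, EmpWritable.class,
    "emp", null /* conditions */, "empno", "empno", "ename", "sal");
```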
