Posted to dev@sqoop.apache.org by Ruslan Dautkhanov <da...@gmail.com> on 2016/05/03 07:09:02 UTC

sqoop performance gets 20x slower on very wide datasets

https://issues.apache.org/jira/browse/SQOOP-2920

Has anybody experienced this problem before?
Is there any known workaround?

We run sqoop export from our data lake to Oracle quite often.
Whenever we sqoop "narrow" datasets, Oracle is the one with scalability
issues: the 3-node all-flash Oracle RAC normally can't keep up with more
than 45-55 sqoop mappers, and the map-reduce framework shows the sqoop
mappers are not heavily loaded.

On wide datasets the picture is the opposite: Oracle shows 95% of sessions
sitting idle, waiting for new INSERTs, even when we go over a hundred
mappers. Sqoop has serious scalability issues on very wide datasets (and
our company normally has very wide datasets).

For example, on the last sqoop export:
it started ~2.5 hours ago and its 95 mappers have already accumulated
CPU time spent (ms): 1,065,858,760
(looking at this metric through the map-reduce framework stats).

That is about 1 million seconds of CPU time, or 11,219.57 seconds per
mapper, which is roughly 3.11 hours of CPU time per mapper. That exceeds
the ~2.5 hours of elapsed time, so the mappers are pegged at 100% CPU
rather than waiting on Oracle.


-- 
Ruslan Dautkhanov

Re: sqoop performance gets 20x slower on very wide datasets

Posted by Ruslan Dautkhanov <da...@gmail.com>.
Ran the JvmTop profiler for a couple of minutes on one of the sqoop mappers
while the job was still running:

http://code.google.com/p/jvmtop
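
For reference, I ran it in its CPU console-profiler mode against the
mapper's PID (roughly "jvmtop.sh --profile 7246"; I'm quoting the wrapper
script and flag from memory, so treat that as approximate).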

Profiling PID 7246: org.apache.hadoop.mapred.YarnChild 10.20

92.92% ( 127.49s) unified_dim.setField0()
1.98% ( 2.71s) parquet.io.RecordReaderImplementation.read()
1.46% ( 2.00s) unified_dim.getFieldMap0()
1.33% ( 1.83s) unified_dim.setField()
0.58% ( 0.80s) parquet.column.impl.ColumnReaderImpl.readValue()
0.49% ( 0.67s) unified_dim.getFieldMap1()
0.37% ( 0.51s) ...quet.column.values.bitpacking.ByteBitPackingValuesRea()
0.34% ( 0.46s) com.cloudera.sqoop.lib.JdbcWritableBridge.writeString()
0.19% ( 0.26s) ...quet.column.impl.ColumnReaderImpl.writeCurrentValueTo()
0.09% ( 0.12s) ....cloudera.sqoop.lib.JdbcWritableBridge.writeBigDecima()
0.09% ( 0.12s) unified_dim.write1()
0.09% ( 0.12s) ...quet.column.values.rle.RunLengthBitPackingHybridDecod()
0.08% ( 0.10s) parquet.bytes.BytesUtils.readUnsignedVarInt()



unified_dim in the profiling output above is the name of the target table
in Oracle.
Looks like the .setField0() method is the culprit.

Not sure which setField0() that is, but I found this code generation that
might not be as efficient for wider datasets:
https://github.com/anthonycorbacho/sqoop/blob/master/src/java/org/apache/sqoop/orm/ClassWriter.java#L755

If I understand correctly, for 700+ columns the generated code will have
700+ "if" statements.

Perhaps that is it; I didn't find any other setField() methods there.
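
To illustrate what I mean, here is a hand-written paraphrase of the field
dispatch I believe ClassWriter emits; this is a sketch, not the actual
generated class, and the column names and types are made up (the real
class is generated from the ~700-column unified_dim table):

    // Sketch of the generated record class's setField() dispatch.
    // Hypothetical column names; illustrative only.
    public class UnifiedDimSketch {
      private String COL_1;
      private java.math.BigDecimal COL_2;
      // ... ~700 more fields ...

      public void setField(String __fieldName, Object __fieldVal) {
        if ("COL_1".equals(__fieldName)) {
          this.COL_1 = (String) __fieldVal;
        }
        else if ("COL_2".equals(__fieldName)) {
          this.COL_2 = (java.math.BigDecimal) __fieldVal;
        }
        // ... one "else if" doing a String.equals() per column ...
        else {
          throw new RuntimeException("No such field: " + __fieldName);
        }
      }
    }

If the export path calls setField() once per column per row (which the
profile above suggests), the lookup is a linear scan of string
comparisons, so the per-row cost grows with the square of the column
count: roughly 700 * 700 / 2 = ~245,000 String.equals() calls per row just
to dispatch values. Presumably setField0() in the profile is the first
chunk of that dispatch, split into numbered methods to stay under the JVM
method-size limit, but the overall linear scan would be the same.
Something like a map from column name to field index, built once, should
make the lookup O(1) per field and remove most of that CPU time.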

Could anyone please have a look into this?


-- 
Ruslan Dautkhanov
