Posted to dev@sqoop.apache.org by "Ruslan Dautkhanov (JIRA)" <ji...@apache.org> on 2016/07/05 18:51:10 UTC
[jira] [Created] (SQOOP-2981) sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu
Ruslan Dautkhanov created SQOOP-2981:
----------------------------------------
Summary: sqoop import from jdbc: JdbcWritableBridge.readBigDecimal() takes a ton of cpu
Key: SQOOP-2981
URL: https://issues.apache.org/jira/browse/SQOOP-2981
Project: Sqoop
Issue Type: Bug
Components: codegen, connectors/oracle, sqoop2-jdbc-connector
Affects Versions: 1.4.6, 1.4.5
Environment: sqoop import from Oracle; saves as parquet file
Reporter: Ruslan Dautkhanov
The majority of CPU time during a sqoop import from Oracle was spent converting Decimal values: roughly 2.5x more than the next-largest CPU consumer (Snappy compression).
The Sqoop side was at 100% CPU overall, while the Oracle side was mostly idle.
{quote}
JvmTop 0.8.0 alpha - 11:56:45, amd64, 48 cpus, Linux 2.6.32-57, load avg 0.92
http://code.google.com/p/jvmtop
Profiling PID 25489: org.apache.hadoop.mapred.YarnChild 10.20
38.78% ( 7.68s) com.cloudera.sqoop.lib.JdbcWritableBridge.readBigDecimal()
14.27% ( 2.82s) org.xerial.snappy.SnappyNative.rawCompress()
12.67% ( 2.51s) parquet.io.api.Binary$FromStringBinary.encodeUTF8()
10.28% ( 2.04s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
4.80% ( 0.95s) ...quet.column.values.fallback.FallbackValuesWriter.writ()
3.69% ( 0.73s) com.cloudera.sqoop.lib.JdbcWritableBridge.readString()
2.51% ( 0.50s) parquet.avro.AvroWriteSupport.writeRecordFields()
2.30% ( 0.46s) parquet.column.impl.ColumnWriterV1.write()
1.90% ( 0.38s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
1.31% ( 0.26s) ...quet.column.values.rle.RunLengthBitPackingHybridEncod()
1.27% ( 0.25s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
1.22% ( 0.24s) parquet.hadoop.CodecFactory$BytesCompressor.compress()
0.65% ( 0.13s) ...quet.column.values.dictionary.DictionaryValuesWriter$()
0.64% ( 0.13s) ...quet.it.unimi.dsi.fastutil.objects.Object2IntLinkedOp()
0.64% ( 0.13s) parquet.bytes.CapacityByteArrayOutputStream.addSlab()
0.63% ( 0.12s) parquet.io.api.Binary$ByteArrayBackedBinary.getBytes()
0.62% ( 0.12s) ...quet.column.values.dictionary.DictionaryValuesWriter.()
0.58% ( 0.12s) parquet.bytes.CapacityByteArrayOutputStream.setByte()
0.49% ( 0.10s) parquet.hadoop.codec.SnappyUtil.validateBuffer()
0.44% ( 0.09s) parquet.hadoop.InternalParquetRecordWriter.write()
{quote}
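For context (this sketch is not part of the original report): the Sqoop-generated record class calls JdbcWritableBridge.readBigDecimal() once per NUMBER column per row, and that method essentially delegates to ResultSet.getBigDecimal(), materializing a new java.math.BigDecimal each time. A minimal, self-contained illustration of that per-row, per-column cost, with new BigDecimal(String) standing in for the JDBC driver's NUMBER decoding (the real bridge takes a column index and a ResultSet):

```java
import java.math.BigDecimal;

public class ReadBigDecimalSketch {
    // Stand-in for JdbcWritableBridge.readBigDecimal(colNum, resultSet):
    // each call materializes a fresh BigDecimal for one column value.
    static BigDecimal readBigDecimal(String rawColumnValue) {
        // A real driver decodes Oracle NUMBER bytes; new BigDecimal(String)
        // models the same per-call parse-and-allocate work.
        return new BigDecimal(rawColumnValue);
    }

    public static void main(String[] args) {
        // The table above has 7 NUMBER columns, so every imported row
        // triggers 7 of these allocations -- which is consistent with
        // readBigDecimal() dominating the profile on a wide import.
        String[] sampleRow = {"1", "42", "3", "9", "12.5", "100.25", "7"};
        BigDecimal total = BigDecimal.ZERO;
        for (String col : sampleRow) {
            total = total.add(readBigDecimal(col));
        }
        System.out.println(total); // prints 174.75
    }
}
```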
DDL of the table on Oracle side:
{quote}
CREATE TABLE someschema.sometable
(
ID NUMBER NOT NULL,
psn VARCHAR2(50 BYTE),
MERCHID NUMBER,
RECENCY NUMBER,
FREQUENCY NUMBER,
SPEND NUMBER,
SPENDING NUMBER,
FREQ NUMBER
)
{quote}
Sqoop parameters:
{quote}
sqoop import \
-Dmapred.job.name="sqoop import into {out_table}" \
--connect "jdbc:oracle:thin:@{jdbc_tns}" \
--username {username} --password {password} \
--direct \
--compress --compression-codec snappy \
--as-parquetfile \
--target-dir {hdfs_dir1} \
--num-mappers {num_mappers} \
--query "SELECT * FROM someschema.sometable WHERE \$CONDITIONS"
{quote}
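A side note, not from the original report: because every unconstrained Oracle NUMBER column maps to java.sql.Types.NUMERIC and hence to BigDecimal in the generated code, one possible mitigation is to override the type mapping with Sqoop's --map-column-java option. This is a hypothetical variant of the command above, and it trades precision for speed, since Double cannot represent all NUMBER values exactly; verify the loss is acceptable before relying on it.

```shell
# Hypothetical variant of the import above: map the NUMBER columns to
# Java Double so the generated record class skips readBigDecimal().
# WARNING: Double is lossy for high-precision NUMBER values.
sqoop import \
  ... same arguments as above ... \
  --map-column-java ID=Double,MERCHID=Double,RECENCY=Double,FREQUENCY=Double,SPEND=Double,SPENDING=Double,FREQ=Double
```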
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)