Posted to issues@spark.apache.org by "Peng Cheng (Jira)" <ji...@apache.org> on 2020/09/24 22:39:00 UTC
[jira] [Updated] (SPARK-32992) In OracleDialect, "RowID" SQL type
should be converted into "String" Catalyst type
[ https://issues.apache.org/jira/browse/SPARK-32992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peng Cheng updated SPARK-32992:
-------------------------------
Description:
Most JDBC drivers can return a dataset's row ID as a long value, so Spark's default mapping converts the ROWID SQL type to LongType:
(in org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils)
{code:java}
private def getCatalystType(
    sqlType: Int,
    precision: Int,
    scale: Int,
    signed: Boolean): DataType = {
  val answer = sqlType match {
    // scalastyle:off
    ...
    case java.sql.Types.ROWID => LongType
    ...
    case _ =>
      throw new SQLException("Unrecognized SQL type " + sqlType)
    // scalastyle:on
  }
  if (answer == null) {
    throw new SQLException("Unsupported type " + JDBCType.valueOf(sqlType).getName)
  }
  answer
}
{code}
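The Catalyst type chosen here decides which JDBC getter Spark calls for each column: for LongType the row reader calls ResultSet.getLong, which is exactly the method Oracle's ROWID accessor does not implement. A simplified sketch of that dispatch (the real logic lives in JdbcUtils.makeGetter and writes into an InternalRow):
{code:java}
// Simplified sketch of JdbcUtils.makeGetter-style dispatch: the Catalyst
// type picked by getCatalystType decides which ResultSet method is called.
import java.sql.ResultSet
import org.apache.spark.sql.types._

def makeGetter(dt: DataType): (ResultSet, Int) => Any = dt match {
  case LongType   => (rs, pos) => rs.getLong(pos + 1)   // throws for Oracle ROWID columns
  case StringType => (rs, pos) => rs.getString(pos + 1) // supported by Oracle's RowidAccessor
  case _          => (rs, pos) => rs.getObject(pos + 1)
}
{code}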
Oracle JDBC drivers (of all versions) are a rare exception: the ROWID value can only be extracted as a String:
(in oracle.jdbc.driver.RowidAccessor, decompiled bytecode)
{code:java}
...
String getString(int var1) throws SQLException {
  return this.isNull(var1) ? null : this.rowData.getString(this.getOffset(var1),
      this.getLength(var1), this.statement.connection.conversion.getCharacterSet((short)1));
}

Object getObject(int var1) throws SQLException {
  return this.getROWID(var1);
}
...
{code}
This causes an exception to be thrown when importing datasets from an Oracle DB, as reported in [https://stackoverflow.com/questions/52244492/spark-jdbc-dataframereader-fails-to-read-oracle-table-with-datatype-as-rowid]:
{code:java}
18/09/08 11:38:17 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 5.0 (TID 23, gbrdsr000002985.intranet.barcapint.com, executor 21): java.sql.SQLException: Invalid column type: getLong not implemented for class oracle.jdbc.driver.T4CRowidAccessor
at oracle.jdbc.driver.GeneratedAccessor.getLong(GeneratedAccessor.java:440)
at oracle.jdbc.driver.GeneratedStatement.getLong(GeneratedStatement.java:228)
at oracle.jdbc.driver.GeneratedScrollableResultSet.getLong(GeneratedScrollableResultSet.java:620)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:365)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:364)
{code}
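For reference, the failure is triggered by any read whose result set includes a ROWID column; a minimal reproduction sketch (the connection URL, credentials, and table name below are placeholders, and `spark` is an existing SparkSession):
{code:java}
// Hypothetical reproduction: reading Oracle's ROWID pseudo-column through
// the Spark JDBC source. URL, credentials, and table name are placeholders.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")
  .option("dbtable", "(SELECT ROWID AS row_id, t.* FROM some_table t)")
  .option("user", "username")
  .option("password", "password")
  .load()

// With the default ROWID => LongType mapping this fails at read time with:
// java.sql.SQLException: Invalid column type: getLong not implemented for
// class oracle.jdbc.driver.T4CRowidAccessor
df.show()
{code}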
Therefore, the default SQL type => Catalyst type conversion rule should be overridden in OracleDialect. Specifically, the following rule should be added:
{code:java}
case Types.ROWID => Some(StringType)
{code}
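A minimal sketch of where the rule would live, assuming the standard JdbcDialect.getCatalystType hook (all other Oracle-specific cases are elided; returning None defers to the default mapping):
{code:java}
import java.sql.Types

import org.apache.spark.sql.jdbc.JdbcDialect
import org.apache.spark.sql.types.{DataType, MetadataBuilder, StringType}

// Minimal sketch only: the real OracleDialect handles many more types.
private case object OracleDialect extends JdbcDialect {

  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    sqlType match {
      // Oracle's RowidAccessor only supports getString/getObject, so map
      // ROWID to StringType instead of the default LongType.
      case Types.ROWID => Some(StringType)
      case _ => None // None falls back to JdbcUtils' default mapping
    }
}
{code}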
> In OracleDialect, "RowID" SQL type should be converted into "String" Catalyst type
> ----------------------------------------------------------------------------------
>
> Key: SPARK-32992
> URL: https://issues.apache.org/jira/browse/SPARK-32992
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.7, 3.1.0
> Reporter: Peng Cheng
> Priority: Minor
> Labels: jdbc, jdbc_connector
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)