You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Peng Cheng (Jira)" <ji...@apache.org> on 2020/09/24 22:29:00 UTC

[jira] [Created] (SPARK-32992) In OracleDialect, "RowID" SQL type should be converted into "String" Catalyst type

Peng Cheng created SPARK-32992:
----------------------------------

             Summary: In OracleDialect, "RowID" SQL type should be converted into "String" Catalyst type
                 Key: SPARK-32992
                 URL: https://issues.apache.org/jira/browse/SPARK-32992
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.7, 3.1.0
            Reporter: Peng Cheng


Most JDBC drivers use long SQL type for dataset row ID:

 

(in org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils)

 

```scala

private def getCatalystType(
 sqlType: Int,
 precision: Int,
 scale: Int,
 signed: Boolean): DataType = {
 val answer = sqlType match {
 // scalastyle:off
...
 {color:#de350b}case java.sql.Types.ROWID => LongType{color}

...
 case _ =>
 throw new SQLException("Unrecognized SQL type " + sqlType)
 // scalastyle:on
 }

 if (answer == null) {
 throw new SQLException("Unsupported type " + JDBCType.valueOf(sqlType).getName)
 }
 answer
}

```

 

Oracle JDBC drivers (of all versions) are rare exception, only String value can be extracted:

 

(in oracle.jdbc.driver.RowidAccessor, decompiled bytecode)

 

```java

...

String getString(int var1) throws SQLException {
 return this.isNull(var1) ? null : this.rowData.getString(this.getOffset(var1), this.getLength(var1), this.statement.connection.conversion.getCharacterSet((short)1));
}

Object getObject(int var1) throws SQLException {
 return this.getROWID(var1);
}

...

```

 

This caused the common exception to be reported in [https://stackoverflow.com/questions/52244492/spark-jdbc-dataframereader-fails-to-read-oracle-table-with-datatype-as-rowid:]

 

```
 
{{18/09/08 11:38:17 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 5.0 (TID 23, gbrdsr000002985.intranet.barcapint.com, executor 21): java.sql.SQLException: Invalid column type: getLong not implemented for class oracle.jdbc.driver.T4CRowidAccessor    at oracle.jdbc.driver.GeneratedAccessor.getLong(GeneratedAccessor.java:440)
    at oracle.jdbc.driver.GeneratedStatement.getLong(GeneratedStatement.java:228)
    at oracle.jdbc.driver.GeneratedScrollableResultSet.getLong(GeneratedScrollableResultSet.java:620)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:365)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter$8.apply(JdbcUtils.scala:364)}}

```

 

Therefore, the default SQL type => Catalyst type conversion rule should be overriden in OracleDialect. Specifically, the following rule should be added:

 

```
case Types.ROWID => Some(StringType)

```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org