Posted to reviews@spark.apache.org by bdolbeare <gi...@git.apache.org> on 2015/10/05 19:00:23 UTC

[GitHub] spark pull request: [SPARK-10648] Proposed bug fix when oracle ret...

Github user bdolbeare commented on the pull request:

    https://github.com/apache/spark/pull/8780#issuecomment-145598968
  
    The problem with Oracle is that you can define numbers without providing precision or scale:
    
        column_name NUMBER  (this is the only case that Spark's Oracle support does not handle well)
            has a precision of 0 and a scale of -127 in JDBC ResultSetMetaData
            "If a precision is not specified, the column stores values as given."  http://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#i16209

        column_name NUMBER(10)
            has a precision of 10 and a scale of 0 in JDBC ResultSetMetaData

        column_name NUMBER(10,2)
            has a precision of 10 and a scale of 2 in JDBC ResultSetMetaData
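
    For reference, here is a minimal JDBC sketch that prints what the driver reports for each of these declarations (the connection string, table, and column names are placeholders):

        import java.sql.DriverManager

        // Assumes a reachable Oracle instance and a table with the three columns declared as above
        val conn = DriverManager.getConnection("jdbc:oracle:thin:@//dbhost:1521/svc", "user", "password")
        val rs = conn.createStatement().executeQuery("SELECT plain_num, num_10, num_10_2 FROM some_table")
        val md = rs.getMetaData
        for (i <- 1 to md.getColumnCount) {
          // A bare NUMBER column comes back as precision=0, scale=-127
          println(s"${md.getColumnName(i)}: precision=${md.getPrecision(i)}, scale=${md.getScale(i)}")
        }
        conn.close()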
    
    I think the best solution is to handle this in an OracleDialect, since this is a quirk of Oracle.  I've done that in my own code, but it would be nice to have two changes in Spark:
    
    1)  Access to more of the metadata fields (e.g. scale) in the dialect.getCatalystType call (currently the precision is provided but the scale is not)
    2)  Change line 406 in JDBCRDD to support creating a Decimal without a predefined precision/scale.  It seems that this would work both where a field has a consistent precision/scale and for this Oracle nuance where the precision/scale can differ per row (see the sketch just below).
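
    As a sketch of what I mean by change 2 (the variable names follow the existing JDBCRDD conversion code; this is illustrative, not a patch):

        // Context: JDBCRDD's decimal conversion (around line 406 in Spark 1.5); rs is the JDBC
        // ResultSet, pos/i are column indices, and mutableRow is the row being filled in.
        val decimalVal = rs.getBigDecimal(pos)
        if (decimalVal == null) {
          mutableRow.update(i, null)
        } else {
          // FROM (current): precision/scale are fixed up front from the schema
          //   mutableRow.update(i, Decimal(decimalVal, p, s))
          // TO (proposed): let each value carry the precision/scale it came with
          mutableRow.update(i, Decimal(decimalVal))
        }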
    
    
    For now, this is what I've done in my own OracleDialect:
    
        import java.sql.Types

        import org.apache.spark.sql.jdbc.JdbcDialect
        import org.apache.spark.sql.types.{DataType, DecimalType, MetadataBuilder, StringType}

        object OracleDialect extends JdbcDialect {
          // Required by JdbcDialect: claim Oracle JDBC URLs
          override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")

          override def getCatalystType(sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] = {
            // Handle NUMBER fields declared with no precision/scale specially, because JDBC
            // ResultSetMetaData reports them as precision 0 and scale -127.
            if (sqlType == Types.NUMERIC && size == 0) {
              // This is sub-optimal as we have to pick a precision/scale in advance, whereas the data in
              // Oracle is allowed to have a different precision/scale for each value.  This conversion works
              // in our domain for now, though we need a more durable solution.  Look into changing JDBCRDD
              // (line 406):
              //   FROM:  mutableRow.update(i, Decimal(decimalVal, p, s))
              //   TO:    mutableRow.update(i, Decimal(decimalVal))
              Some(DecimalType(DecimalType.MAX_PRECISION, 10))
            } else if (sqlType == -101) {
              // Handle TIMESTAMP WITH TIME ZONE (for now we just convert it to a string in the default format)
              Some(StringType)
            } else None
          }
        }
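
    To use it, the dialect just needs to be registered before creating the JDBC DataFrame, via the public registry in org.apache.spark.sql.jdbc:

        import org.apache.spark.sql.jdbc.JdbcDialects

        // Register once, before calling sqlContext.read.jdbc(...)
        JdbcDialects.registerDialect(OracleDialect)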


