Posted to issues@spark.apache.org by "SHAILENDRA SHAHANE (JIRA)" <ji...@apache.org> on 2018/06/08 12:52:00 UTC

[jira] [Created] (SPARK-24496) CLONE - JSON data source fails to infer floats as decimal when precision is bigger than 38 or scale is bigger than precision.

SHAILENDRA SHAHANE created SPARK-24496:
------------------------------------------

             Summary: CLONE - JSON data source fails to infer floats as decimal when precision is bigger than 38 or scale is bigger than precision.
                 Key: SPARK-24496
                 URL: https://issues.apache.org/jira/browse/SPARK-24496
             Project: Spark
          Issue Type: Bug
          Components: SQL
            Reporter: SHAILENDRA SHAHANE
            Assignee: Hyukjin Kwon
             Fix For: 2.0.0


Currently, the JSON data source supports the {{floatAsBigDecimal}} option, which reads floats as {{DecimalType}}.

I noticed that Spark's {{DecimalType}} has the following restrictions:

1. The precision cannot be bigger than 38.
2. The scale cannot be bigger than the precision.

However, with the option above enabled, the reader produces {{BigDecimal}} values whose precision and scale do not always satisfy these conditions.
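
To illustrate why a value as small as 0.01 can violate the second restriction, here is a minimal sketch using plain {{java.math.BigDecimal}}, with no Spark dependency. The helper {{fitsDecimalType}} and the constant {{MaxPrecision}} are hypothetical names introduced only for this illustration; they are not part of Spark's API. It can be pasted into a Scala REPL or spark-shell.

{code}
import java.math.BigDecimal

// Upper bound on precision, taken from restriction 1 above.
val MaxPrecision = 38

// Hypothetical helper: checks the two restrictions listed above.
def fitsDecimalType(v: BigDecimal): Boolean =
  v.precision <= MaxPrecision && v.scale <= v.precision

// 0.01 is stored as unscaled value 1 with scale 2, so its precision is 1.
val small = new BigDecimal("0.01")
println(s"precision=${small.precision}, scale=${small.scale}, fits=${fitsDecimalType(small)}")

// 1.25 has three significant digits and scale 2, so both restrictions hold.
val ok = new BigDecimal("1.25")
println(s"precision=${ok.precision}, scale=${ok.scale}, fits=${fitsDecimalType(ok)}")

// A 40-digit integer exceeds the precision limit of 38.
val wide = new BigDecimal("1" + "0" * 39)
println(s"precision=${wide.precision}, scale=${wide.scale}, fits=${fitsDecimalType(wide)}")
{code}

For 0.01 the scale (2) exceeds the precision (1), which matches the numbers reported in the exception for this issue.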

This can be reproduced as follows:

{code}
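import org.apache.spark.rdd.RDD

// Two JSON records whose float values have scale 2 but precision 1.
def simpleFloats: RDD[String] =
  sqlContext.sparkContext.parallelize(
    """{"a": 0.01}""" ::
    """{"a": 0.02}""" :: Nil)

// Schema inference with floatAsBigDecimal enabled triggers the failure.
val jsonDF = sqlContext.read
  .option("floatAsBigDecimal", "true")
  .json(simpleFloats)
jsonDF.printSchema()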
{code}

This throws the following exception:

{code}
org.apache.spark.sql.AnalysisException: Decimal scale (2) cannot be greater than precision (1).;
	at org.apache.spark.sql.types.DecimalType.<init>(DecimalType.scala:44)
	at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:144)
	at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:108)
	at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:59)
	at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:57)
	at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2249)
	at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:57)
	at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:55)
	at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
	at scala.collection.Iterator$class.foreach(Iterator.scala:742)
...
{code}

Since the JSON data source falls back to {{StringType}} when it fails to infer a type, such values might have to be inferred as {{StringType}}, or perhaps simply as {{DoubleType}}.
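
The fallback suggested above could be sketched roughly as follows. This is only an illustration of the idea, not the actual fix: {{InferredType}}, {{DecimalLike}}, {{DoubleLike}} and {{inferFloat}} are hypothetical local stand-ins, defined here so that the snippet runs without Spark on the classpath, and the choice of a double fallback over a string fallback is an assumption.

{code}
import java.math.BigDecimal

// Local stand-ins for Spark's DecimalType and DoubleType (hypothetical names).
sealed trait InferredType
case class DecimalLike(precision: Int, scale: Int) extends InferredType
case object DoubleLike extends InferredType

// Hypothetical rule: infer a decimal only when both restrictions hold,
// otherwise fall back to a double instead of throwing.
def inferFloat(v: BigDecimal): InferredType = {
  val p = v.precision
  val s = v.scale
  if (p <= 38 && s <= p) DecimalLike(p, s) else DoubleLike
}

println(inferFloat(new BigDecimal("0.01"))) // prints DoubleLike
println(inferFloat(new BigDecimal("1.25"))) // prints DecimalLike(3,2)
{code}

With a rule of this shape, the two records from the reproduction would be inferred as doubles rather than causing an {{AnalysisException}}.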
