Posted to issues@spark.apache.org by "SHAILENDRA SHAHANE (JIRA)" <ji...@apache.org> on 2018/06/08 13:06:00 UTC
[jira] [Commented] (SPARK-24496) CLONE - JSON data source fails to infer floats as decimal when precision is bigger than 38 or scale is bigger than precision.
[ https://issues.apache.org/jira/browse/SPARK-24496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506001#comment-16506001 ]
SHAILENDRA SHAHANE commented on SPARK-24496:
--------------------------------------------
This issue is still present. I fetched data from MongoDB and hit the same exception while converting the RDD to a DataFrame.
-----------------Code --------------
{code}
SQLContext sparkSQLContext = spark.sqlContext();
DataFrameReader dfr = spark.read()
    .format("com.mongodb.spark.sql")
    .option("floatAsBigDecimal", "true");
Dataset<Row> rbkp = dfr.load();
{code}
------------------ OR ------------------------
{code}
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
JavaMongoRDD<Document> rdd = MongoSpark.load(jsc);
Dataset<Row> rbkp = rdd.toDF();
{code}
--------------------
Spark version: 2.3
MongoDB version: 3.4 and 3.6
----------------Data Sample-------------
{"_id":"5b0d31f892549e10b61d962a","RSEG_MANDT":"800","RSEG_EBELN":"4500017749","RSEG_EBELP":"00020","RSEG_BELNR":"1000000001","RSEG_BUZEI":"000002","RSEG_GJAHR":"2013","RBKP_BUDAT":"2013-10-04","RSEG_MENGE":{"$numberDecimal":"30.000"},"RSEG_LFBNR":"5000000472","RSEG_LFGJA":"2013","RSEG_LFPOS":"0002","NOT_ACCOUNT_MAINTENANCE":{"$numberDecimal":"1.0000000000"},"RBKP_CPUTIMESTAMP":"2013-10-04T10:32:02.000Z","RBKP_WAERS":"USD","RSEG_BNKAN":{"$numberDecimal":"0.00"},"RSEG_WRBTR":{"$numberDecimal":"2340.00"},"RSEG_SHKZG":"S"}
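For what it's worth, the constraint violation can be reproduced from the sample above with plain {{java.math.BigDecimal}}, no Spark or MongoDB involved. Note in particular the {{RSEG_BNKAN}} value {{"0.00"}}: {{BigDecimal}} defines the precision of zero as 1, so that value has scale 2 but precision 1, which is exactly the "scale cannot be greater than precision" case Spark rejects. This is a standalone sketch (the class name and the hard-coded limit 38 are mine, mirroring Spark's documented {{DecimalType}} bounds), not the Spark inference code path:

```java
import java.math.BigDecimal;

// Check the sample's $numberDecimal values against Spark's DecimalType
// rules (precision <= 38, scale <= precision). Standalone, not Spark code.
public class SampleDecimals {
    public static void main(String[] args) {
        String[] samples = {"30.000", "1.0000000000", "0.00", "2340.00"};
        for (String s : samples) {
            BigDecimal d = new BigDecimal(s);
            boolean ok = d.precision() <= 38 && d.scale() <= d.precision();
            System.out.printf("%-14s precision=%d scale=%d valid=%b%n",
                    s, d.precision(), d.scale(), ok);
        }
        // "0.00": BigDecimal reports precision 1 for zero, so scale(2) >
        // precision(1) -- the same condition the AnalysisException names.
    }
}
```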
> CLONE - JSON data source fails to infer floats as decimal when precision is bigger than 38 or scale is bigger than precision.
> -----------------------------------------------------------------------------------------------------------------------------
>
> Key: SPARK-24496
> URL: https://issues.apache.org/jira/browse/SPARK-24496
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: SHAILENDRA SHAHANE
> Assignee: Hyukjin Kwon
> Priority: Minor
> Fix For: 2.0.0
>
> Attachments: SparkJiraIssue08062018.txt
>
>
> Currently, the JSON data source supports the {{floatAsBigDecimal}} option, which reads floats as {{DecimalType}}.
> There are two restrictions on Spark's {{DecimalType}}:
> 1. The precision cannot be bigger than 38.
> 2. The scale cannot be bigger than the precision.
> However, with the option above, values are read as {{BigDecimal}}, which does not enforce these conditions.
> This could be observed as below:
> {code}
> def simpleFloats: RDD[String] =
>   sqlContext.sparkContext.parallelize(
>     """{"a": 0.01}""" ::
>     """{"a": 0.02}""" :: Nil)
> val jsonDF = sqlContext.read
>   .option("floatAsBigDecimal", "true")
>   .json(simpleFloats)
> jsonDF.printSchema()
> {code}
> Running this throws the exception below:
> {code}
> org.apache.spark.sql.AnalysisException: Decimal scale (2) cannot be greater than precision (1).;
> at org.apache.spark.sql.types.DecimalType.<init>(DecimalType.scala:44)
> at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:144)
> at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:108)
> at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:59)
> at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:57)
> at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2249)
> at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:57)
> at org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:55)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
> at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> ...
> {code}
> Since the JSON data source falls back to {{StringType}} when it fails to infer a type, these values should probably be inferred as {{StringType}}, or perhaps simply as {{DoubleType}}.
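The fallback proposed in the quoted description can be sketched with plain {{java.math.BigDecimal}}: emit a decimal type only when the value fits Spark's documented bounds, otherwise widen instead of failing. The class and method names below are mine, and this is a hypothetical illustration of the rule, not the actual {{InferSchema}} code:

```java
import java.math.BigDecimal;

// Hypothetical inference fallback: return "decimal(p,s)" when the value
// satisfies Spark's DecimalType limits (precision <= 38, scale <= precision),
// otherwise widen to "double" rather than throwing an AnalysisException.
public class InferSketch {
    static final int MAX_PRECISION = 38; // Spark's DecimalType.MAX_PRECISION

    static String inferType(BigDecimal d) {
        int p = d.precision();
        int s = d.scale();
        if (p <= MAX_PRECISION && s <= p) {
            return "decimal(" + p + "," + s + ")";
        }
        return "double"; // fallback instead of failing schema inference
    }

    public static void main(String[] args) {
        System.out.println(inferType(new BigDecimal("2340.00")));
        System.out.println(inferType(new BigDecimal("0.01")));
    }
}
```

With this rule, {{"2340.00"}} infers as decimal(6,2), while {{"0.01"}} (precision 1, scale 2) would widen to double instead of raising the exception shown above.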
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org