Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/11/01 08:18:00 UTC
[jira] [Commented] (SPARK-37176) JsonSource's infer should have the
same exception handle logic as JacksonParser's parse logic
[ https://issues.apache.org/jira/browse/SPARK-37176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17436675#comment-17436675 ]
Apache Spark commented on SPARK-37176:
--------------------------------------
User 'advancedxy' has created a pull request for this issue:
https://github.com/apache/spark/pull/34455
> JsonSource's infer should have the same exception handle logic as JacksonParser's parse logic
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-37176
> URL: https://issues.apache.org/jira/browse/SPARK-37176
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.3, 3.1.2, 3.2.0
> Reporter: Xianjin YE
> Priority: Minor
>
> JacksonParser's exception handling logic differs from that of org.apache.spark.sql.catalyst.json.JsonInferSchema#infer; the difference can be seen below:
> {code:java}
> // JacksonParser's parse
> try {
>   Utils.tryWithResource(createParser(factory, record)) { parser =>
>     // a null first token is equivalent to testing for input.trim.isEmpty
>     // but it works on any token stream and not just strings
>     parser.nextToken() match {
>       case null => None
>       case _ => rootConverter.apply(parser) match {
>         case null => throw QueryExecutionErrors.rootConverterReturnNullError()
>         case rows => rows.toSeq
>       }
>     }
>   }
> } catch {
>   case e: SparkUpgradeException => throw e
>   case e @ (_: RuntimeException | _: JsonProcessingException | _: MalformedInputException) =>
>     // JSON parser currently doesn't support partial results for corrupted records.
>     // For such records, all fields other than the field configured by
>     // `columnNameOfCorruptRecord` are set to `null`.
>     throw BadRecordException(() => recordLiteral(record), () => None, e)
>   case e: CharConversionException if options.encoding.isEmpty =>
>     val msg =
>       """JSON parser cannot handle a character in its input.
>         |Specifying encoding as an input option explicitly might help to resolve the issue.
>         |""".stripMargin + e.getMessage
>     val wrappedCharException = new CharConversionException(msg)
>     wrappedCharException.initCause(e)
>     throw BadRecordException(() => recordLiteral(record), () => None, wrappedCharException)
>   case PartialResultException(row, cause) =>
>     throw BadRecordException(
>       record = () => recordLiteral(record),
>       partialResult = () => Some(row),
>       cause)
> }
> {code}
> vs.
> {code:java}
> // JsonInferSchema's infer logic
> val mergedTypesFromPartitions = json.mapPartitions { iter =>
>   val factory = options.buildJsonFactory()
>   iter.flatMap { row =>
>     try {
>       Utils.tryWithResource(createParser(factory, row)) { parser =>
>         parser.nextToken()
>         Some(inferField(parser))
>       }
>     } catch {
>       case e @ (_: RuntimeException | _: JsonProcessingException) => parseMode match {
>         case PermissiveMode =>
>           Some(StructType(Seq(StructField(columnNameOfCorruptRecord, StringType))))
>         case DropMalformedMode =>
>           None
>         case FailFastMode =>
>           throw QueryExecutionErrors.malformedRecordsDetectedInSchemaInferenceError(e)
>       }
>     }
>   }.reduceOption(typeMerger).toIterator
> }
> {code}
> They should share the same exception handling logic; otherwise the inconsistency may confuse users.
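One possible direction for aligning the two paths is to factor the shared exception-to-error translation into a single helper that both the parser and the schema-inference code call. The sketch below is hypothetical: the names `withCommonHandling` and the simplified `BadRecordException` (Spark's real one takes function arguments, as in the excerpt above) are assumptions for illustration, not Spark's actual API.

```scala
// Minimal sketch, assuming a simplified BadRecordException. The idea is that
// both JacksonParser.parse and JsonInferSchema.infer would route parser
// exceptions through one shared handler, so the set of caught exception
// types can never drift between the two call sites.
object JsonExceptionHandling {
  // Simplified stand-in for Spark's internal BadRecordException.
  final case class BadRecordException(record: String, cause: Throwable)
    extends Exception(s"Malformed record: $record", cause)

  // Runs `body`, translating parser failures into BadRecordException
  // identically for every caller.
  def withCommonHandling[T](record: String)(body: => T): T =
    try body
    catch {
      case e: RuntimeException =>
        throw BadRecordException(record, e)
    }
}
```

With this shape, each caller only decides what to do with a `BadRecordException` (permissive, drop, or fail-fast), while the decision of *which* exceptions count as a bad record lives in one place.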
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org