Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2022/01/06 19:38:00 UTC
[jira] [Updated] (SPARK-37176) JsonSource's infer should have the same exception handle logic as JacksonParser's parse logic
[ https://issues.apache.org/jira/browse/SPARK-37176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dongjoon Hyun updated SPARK-37176:
----------------------------------
Affects Version/s: 3.3.0
(was: 3.2.0)
(was: 3.1.2)
(was: 3.0.3)
> JsonSource's infer should have the same exception handle logic as JacksonParser's parse logic
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-37176
> URL: https://issues.apache.org/jira/browse/SPARK-37176
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.3.0
> Reporter: Xianjin YE
> Assignee: Xianjin YE
> Priority: Minor
> Fix For: 3.3.0
>
>
> JacksonParser's exception handling logic differs from the logic in org.apache.spark.sql.catalyst.json.JsonInferSchema#infer; the difference can be seen below:
> {code:java}
> // JacksonParser's parse
> try {
>   Utils.tryWithResource(createParser(factory, record)) { parser =>
>     // a null first token is equivalent to testing for input.trim.isEmpty
>     // but it works on any token stream and not just strings
>     parser.nextToken() match {
>       case null => None
>       case _ => rootConverter.apply(parser) match {
>         case null => throw QueryExecutionErrors.rootConverterReturnNullError()
>         case rows => rows.toSeq
>       }
>     }
>   }
> } catch {
>   case e: SparkUpgradeException => throw e
>   case e @ (_: RuntimeException | _: JsonProcessingException | _: MalformedInputException) =>
>     // JSON parser currently doesn't support partial results for corrupted records.
>     // For such records, all fields other than the field configured by
>     // `columnNameOfCorruptRecord` are set to `null`.
>     throw BadRecordException(() => recordLiteral(record), () => None, e)
>   case e: CharConversionException if options.encoding.isEmpty =>
>     val msg =
>       """JSON parser cannot handle a character in its input.
>         |Specifying encoding as an input option explicitly might help to resolve the issue.
>         |""".stripMargin + e.getMessage
>     val wrappedCharException = new CharConversionException(msg)
>     wrappedCharException.initCause(e)
>     throw BadRecordException(() => recordLiteral(record), () => None, wrappedCharException)
>   case PartialResultException(row, cause) =>
>     throw BadRecordException(
>       record = () => recordLiteral(record),
>       partialResult = () => Some(row),
>       cause)
> }
> {code}
> vs.
> {code:java}
> // JsonInferSchema's infer logic
> val mergedTypesFromPartitions = json.mapPartitions { iter =>
>   val factory = options.buildJsonFactory()
>   iter.flatMap { row =>
>     try {
>       Utils.tryWithResource(createParser(factory, row)) { parser =>
>         parser.nextToken()
>         Some(inferField(parser))
>       }
>     } catch {
>       case e @ (_: RuntimeException | _: JsonProcessingException) => parseMode match {
>         case PermissiveMode =>
>           Some(StructType(Seq(StructField(columnNameOfCorruptRecord, StringType))))
>         case DropMalformedMode =>
>           None
>         case FailFastMode =>
>           throw QueryExecutionErrors.malformedRecordsDetectedInSchemaInferenceError(e)
>       }
>     }
>   }.reduceOption(typeMerger).toIterator
> }
> {code}
> They should share the same exception handling logic; otherwise the inconsistency may confuse users.
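One way to align the two paths is to route both catch blocks through a single exception classifier. The sketch below is a self-contained, dependency-free illustration of that idea only: the names `JsonFailure`, `Malformed`, `Fatal`, and `classify` are hypothetical and not part of Spark's actual API, and the exception set is simplified (plain `RuntimeException` stands in for Jackson's `JsonProcessingException`, which the real code also matches).

```scala
import java.io.CharConversionException

// Hypothetical sketch: one classifier that both JacksonParser.parse and
// JsonInferSchema.infer could consult, so a record that parse treats as
// malformed is also treated as malformed during schema inference.
sealed trait JsonFailure
case object Malformed extends JsonFailure // handle as a corrupt record
case object Fatal extends JsonFailure     // rethrow to the caller

def classify(e: Throwable, encodingIsSet: Boolean): JsonFailure = e match {
  // In the real code this arm would also cover JsonProcessingException
  // and MalformedInputException; simplified here to stay dependency-free.
  case _: RuntimeException => Malformed
  // Mirrors parse's special case: an undecodable character is only a
  // recoverable malformed record when no encoding option was given.
  case _: CharConversionException if !encodingIsSet => Malformed
  case _ => Fatal
}
```

With a shared classifier, each call site keeps only its mode-specific reaction (PERMISSIVE / DROPMALFORMED / FAILFAST), while the decision of *which* exceptions count as malformed lives in one place.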
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org