Posted to reviews@spark.apache.org by mengxr <gi...@git.apache.org> on 2018/06/06 22:20:55 UTC
[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...
Github user mengxr commented on the issue:
https://github.com/apache/spark/pull/20929
@maropu Thanks for updating this PR! It would be easier to maintain the logic in one place. I think it should be feasible to do everything inside `canonicalizeType` without modifying `JsonParser` or other methods in `JsonInferSchema`. The following code outlines my logic, though I didn't test it ...:
~~~scala
/**
 * Canonicalize data types and remove StructTypes with no fields.
 * @return Some(canonicalizedType), or None if nothing is left.
 */
private def canonicalizeType(tpe: DataType, options: JSONOptions): Option[DataType] = tpe match {
  case at @ ArrayType(elementType, _) =>
    canonicalizeType(elementType, options).map(t => at.copy(elementType = t))
  case StructType(fields) =>
    val canonicalizedFields = fields.flatMap { f =>
      canonicalizeType(f.dataType, options).map(t => f.copy(dataType = t))
    }
    // per SPARK-8093: empty structs should be deleted
    if (canonicalizedFields.isEmpty) {
      None
    } else {
      Some(StructType(canonicalizedFields))
    }
  case NullType =>
    if (options.dropFieldIfAllNull) {
      None
    } else {
      Some(StringType)
    }
  case other => Some(other)
}
~~~
In the test, we should also include scenarios with nested "null" fields like `[[], null, [[]]]`.
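To make the nested-null behavior concrete, here is a standalone sketch using a toy mirror of Spark's `DataType` hierarchy (the class names echo Spark's but this is not the real API, and the real `JSONOptions`/`JsonInferSchema` plumbing is omitted). It shows how the recursion would collapse a value like `[[], null, [[]]]`: empty structs vanish unconditionally, and `NullType` leaves vanish only when `dropFieldIfAllNull` is set, which can cascade into dropping the whole column.

~~~scala
// Toy stand-ins for Spark's DataType hierarchy (simplified, for illustration only).
sealed trait DataType
case object NullType extends DataType
case object StringType extends DataType
case class ArrayType(elementType: DataType) extends DataType
case class StructField(name: String, dataType: DataType)
case class StructType(fields: Seq[StructField]) extends DataType

// Same shape as the canonicalizeType outline above, with the option passed as a flag.
def canonicalizeType(tpe: DataType, dropFieldIfAllNull: Boolean): Option[DataType] = tpe match {
  case ArrayType(elementType) =>
    canonicalizeType(elementType, dropFieldIfAllNull).map(t => ArrayType(t))
  case StructType(fields) =>
    val canonicalized = fields.flatMap { f =>
      canonicalizeType(f.dataType, dropFieldIfAllNull).map(t => f.copy(dataType = t))
    }
    // Empty structs are always deleted (SPARK-8093).
    if (canonicalized.isEmpty) None else Some(StructType(canonicalized))
  case NullType =>
    if (dropFieldIfAllNull) None else Some(StringType)
  case other => Some(other)
}

// A schema shaped like `[[], null, [[]]]`: an array of structs mixing
// empty structs, a null field, and a nested array of empty structs.
val nested = ArrayType(StructType(Seq(
  StructField("a", StructType(Nil)),
  StructField("b", NullType),
  StructField("c", ArrayType(StructType(Nil))))))

// With the option on, every field collapses, so the whole type is dropped.
println(canonicalizeType(nested, dropFieldIfAllNull = true))
// With the option off, only "b" survives, widened to StringType.
println(canonicalizeType(nested, dropFieldIfAllNull = false))
~~~

This is exactly the cascade the test should pin down: the option must not only drop top-level null columns but also let emptiness propagate up through arrays and structs.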