Posted to reviews@spark.apache.org by mengxr <gi...@git.apache.org> on 2018/06/06 22:20:55 UTC

[GitHub] spark issue #20929: [SPARK-23772][SQL] Provide an option to ignore column of...

Github user mengxr commented on the issue:

    https://github.com/apache/spark/pull/20929
  
    @maropu Thanks for updating this PR! It would be easier to maintain the logic in one place. I think it should be feasible to do everything inside `canonicalizeType`, without modifying `JsonParser` or other methods in `JsonInferSchema`. The following code outlines my logic, though I haven't tested it:
    
    ~~~scala
     /**
       * Canonicalize data types and remove StructTypes with no fields.
       * @return Some(canonicalizedType) or None if nothing left.
       */
      private def canonicalizeType(tpe: DataType, options: JSONOptions): Option[DataType] = tpe match {
        case at @ ArrayType(elementType, _) =>
          canonicalizeType(elementType, options).map(t => at.copy(elementType = t))
    
        case StructType(fields) =>
          val canonicalizedFields = fields.flatMap { f =>
            canonicalizeType(f.dataType, options).map(t => f.copy(dataType = t))
          }
          // per SPARK-8093: empty structs should be deleted
          if (canonicalizedFields.isEmpty) {
            None
          } else {
            Some(StructType(canonicalizedFields))
          }
    
        case NullType =>
          if (options.dropFieldIfAllNull) {
            None
          } else {
            Some(StringType)
          }
    
        case other => Some(other)
      }  
    ~~~
    
    In the test, we should also include scenarios with nested "null" fields like `[[], null, [[]]]`.
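
    To sanity-check the recursion without a Spark build, here is a self-contained sketch using a toy ADT that only mimics the shapes of Spark's `DataType`/`StructType` (the names mirror Spark's API but this is illustrative, not the real classes). It models how a nested all-null shape like `[[], null, [[]]]` should collapse to nothing when the option is on, and degrade nulls to strings when it is off:

    ~~~scala
    // Toy stand-ins for the relevant Spark SQL types (illustrative only).
    sealed trait DataType
    case object NullType extends DataType
    case object StringType extends DataType
    case class ArrayType(elementType: DataType) extends DataType
    case class StructField(name: String, dataType: DataType)
    case class StructType(fields: Seq[StructField]) extends DataType

    // Same pruning logic as the sketch above, with the option passed as a flag.
    def canonicalizeType(tpe: DataType, dropFieldIfAllNull: Boolean): Option[DataType] = tpe match {
      case ArrayType(elementType) =>
        canonicalizeType(elementType, dropFieldIfAllNull).map(ArrayType(_))
      case StructType(fields) =>
        val canonicalized = fields.flatMap { f =>
          canonicalizeType(f.dataType, dropFieldIfAllNull).map(t => f.copy(dataType = t))
        }
        // per SPARK-8093: empty structs should be deleted
        if (canonicalized.isEmpty) None else Some(StructType(canonicalized))
      case NullType =>
        if (dropFieldIfAllNull) None else Some(StringType)
      case other => Some(other)
    }

    // A nested shape mixing a null field and an empty-struct array.
    val nested = ArrayType(StructType(Seq(
      StructField("a", NullType),
      StructField("b", ArrayType(StructType(Nil))))))

    // Option on: the whole nested type vanishes.
    assert(canonicalizeType(nested, dropFieldIfAllNull = true).isEmpty)
    // Option off: nulls become strings, empty structs are still dropped.
    assert(canonicalizeType(nested, dropFieldIfAllNull = false) ==
      Some(ArrayType(StructType(Seq(StructField("a", StringType))))))
    ~~~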



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org