Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/10/28 02:26:10 UTC

[GitHub] [spark] fuwhu edited a comment on issue #26275: [SPARK-29615][SQL]Add a new insertInto method with byName parameter in DataFrameWriter.

URL: https://github.com/apache/spark/pull/26275#issuecomment-546766877
 
 
   > I'm not 100% sure that this feature is useful for users. Is it not enough just to verify the schemas of the query output and the table before insertion? Or, do any existing database-like systems support this kind of by-name matching on insertion?
   
   I proposed this change because we ran into a related problem in our test environment.
   In our case, a developer inserted a DataFrame into a Hive table. The DataFrame had exactly the same columns as the Hive table, but in a slightly different order. Because of the wrong column order, Spark treated a non-partition column as the table's partition column; that column had huge cardinality, which led to a huge number of files and folders being created in our test environment and crashed our NameNode.
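   
   To make the failure mode concrete, here is a minimal sketch (the table name `events`, its columns, and the data are made up for illustration; Hive places the partition column last in the schema, and insertInto matches columns purely by position):
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
   import spark.implicits._
   
   // Allow dynamic partition inserts for this example.
   spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
   
   // Hypothetical table: two data columns plus a `dt` partition column,
   // which Hive places last in the schema: (id, payload, dt).
   spark.sql(
     "CREATE TABLE events (id BIGINT, payload STRING) PARTITIONED BY (dt STRING)")
   
   // Same column names as the table, but `dt` and `payload` are swapped:
   val df = Seq((1L, "2019-10-28", "some high-cardinality value"))
     .toDF("id", "dt", "payload")
   
   // insertInto matches by position against (id, payload, dt), so `payload`
   // silently lands in the partition column `dt`: one partition directory is
   // created per distinct payload value.
   df.write.insertInto("events")
   ```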
   
   When we write a SQL query to insert data, it is hard to trigger such a problem, but with the DataFrameWriter.insertInto API it is easy to get confused: developers may assume that Spark matches the columns by name automatically.
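   
   For reference, a defensive pattern that works with the current API is to reorder the DataFrame's columns to the table's schema before the positional insert (again using the hypothetical `events` table from above):
   
   ```scala
   import org.apache.spark.sql.functions.col
   
   // Reorder df's columns to the target table's column order, then insert.
   val tableCols = spark.table("events").columns  // Array("id", "payload", "dt")
   df.select(tableCols.map(col): _*).write.insertInto("events")
   ```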
   
   I just want to add a new insertInto method so that Spark users are explicitly aware of how the insertInto API matches columns.
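   For example, usage of the proposed overload might look like this (a sketch only; the exact signature is what this PR is adding):
   
   ```scala
   // Proposed: opt in to by-name column matching explicitly.
   df.write.insertInto("events", byName = true)
   
   // The existing single-argument overload keeps today's positional behavior:
   df.write.insertInto("events")
   ```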
   WDYT?
