Posted to issues@spark.apache.org by "qian wang (Jira)" <ji...@apache.org> on 2021/04/25 08:35:00 UTC

[jira] [Updated] (SPARK-35216) a general auto merge output small files feature for datasource api

     [ https://issues.apache.org/jira/browse/SPARK-35216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

qian wang updated SPARK-35216:
------------------------------
    Summary: a general auto merge output small files feature for datasource api  (was: a general auto merge output files feature for datasource api)

> a general auto merge output small files feature for datasource api
> ------------------------------------------------------------------
>
>                 Key: SPARK-35216
>                 URL: https://issues.apache.org/jira/browse/SPARK-35216
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.0.2
>            Reporter: qian wang
>            Priority: Major
>
> In most cases, users write data to Hive tables or HDFS directories with Spark SQL. Since the release of Spark 3.0, the project has discouraged using the Hive module to read and write Hive tables. It instead prefers switching from the Hive strategy rules to the DataSource API, so that I/O operations are centralized in one module.
> A general ability to automatically merge output files in the DataSource API would therefore solve many users' small-files problems in production. Because it can be bound tightly to the DataSource write framework, the merge step stays transparent to users. It can also handle every kind of write path, such as writing to an HDFS directory, a non-partitioned Hive table, or a dynamic-partition Hive table.
> This is my own implementation of the functionality, and it is stable in my company's production environment.
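The reporter's implementation is not included in this notification. As a purely hypothetical illustration of the planning step an automatic merge might perform after a write job finishes, the Python sketch below groups small output files into merge batches using a greedy first-fit-decreasing heuristic. The function name `plan_merge_groups`, the two size thresholds, and the choice of heuristic are all assumptions. None of them is part of Spark or of the proposed patch.

```python
from typing import List, Tuple

# Hypothetical sketch: NOT the reporter's implementation and NOT a Spark API.
# It only decides WHICH small output files could be merged together; actually
# rewriting the files (and committing them atomically) is out of scope.


def plan_merge_groups(
    files: List[Tuple[str, int]],
    target_bytes: int,
    small_file_bytes: int,
) -> List[List[str]]:
    """Group files smaller than `small_file_bytes` into batches whose total
    size is at most `target_bytes`, using first-fit decreasing.

    Files at or above the small-file threshold are left untouched, and a
    batch holding a single file is dropped because it has nothing to merge.
    """
    small = [(path, size) for path, size in files if size < small_file_bytes]
    # Place the largest small files first so batches fill more tightly.
    # Python's sort is stable, so equal sizes keep their input order.
    small.sort(key=lambda f: f[1], reverse=True)

    bins: List[Tuple[int, List[str]]] = []  # (bytes used, paths in batch)
    for path, size in small:
        for i, (used, paths) in enumerate(bins):
            if used + size <= target_bytes:
                paths.append(path)
                bins[i] = (used + size, paths)
                break
        else:
            # No existing batch has room: open a new one.
            bins.append((size, [path]))

    return [paths for _, paths in bins if len(paths) > 1]
```

For example, with a 128 MB target and a 64 MB small-file threshold, files of 10, 200, 30, 100 and 5 MB yield a single batch containing the 30, 10 and 5 MB files. The 200 MB and 100 MB files are already large enough to be left alone. A real integration with the DataSource write path would also need to handle partition directories and job commit, which this sketch deliberately ignores.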



--
This message was sent by Atlassian Jira
(v8.3.4#803005)