Posted to issues@carbondata.apache.org by QiangCai <gi...@git.apache.org> on 2018/02/01 02:42:24 UTC

[GitHub] carbondata pull request #1898: [CARBONDATA-1880] Documentation for merging s...

Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1898#discussion_r165247013
  
    --- Diff: docs/configuration-parameters.md ---
    @@ -60,6 +60,7 @@ This section provides the details of all the configurations required for CarbonD
     | carbon.options.is.empty.data.bad.record | false | If false, then empty ("" or '' or ,,) data will not be considered as bad record and vice versa. | |
     | carbon.options.bad.record.path |  | Specifies the HDFS path where bad records are stored. By default the value is Null. This path must to be configured by the user if bad record logger is enabled or bad record action redirect. | |
     | carbon.enable.vector.reader | true | This parameter increases the performance of select queries as it fetch columnar batch of size 4*1024 rows instead of fetching data row by row. | |
    +| carbon.task.distribution | merge_small_files | Setting this parameter value to *merge_small_files* will merge all the small files to a size of (128 MB). During data loading, all the small CSV files are combined to a map task to reduce the number of read task. This enhances the performance. | | 
    --- End diff --
    
    1. carbon.task.distribution applies only to queries; it is not used during data loading.
    Global_Sort loading always merges small CSV files, so it does not require this configuration.
    2. It would be better to list all valid values of carbon.task.distribution:
    custom, block (default), blocklet, merge_small_files
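    For reference, a minimal sketch of how the property would be set in a standard carbon.properties file (the value names are the ones listed above; treat the default shown as an assumption to verify against the released docs):
    
        # carbon.properties -- controls how input splits are distributed to tasks at query time
        # assumed valid values: block (default), blocklet, custom, merge_small_files
        carbon.task.distribution=merge_small_files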


---