Posted to issues@carbondata.apache.org by sgururajshetty <gi...@git.apache.org> on 2018/01/31 13:59:23 UTC

[GitHub] carbondata pull request #1898: [CARBONDATA-1880] Documentation for merging s...

GitHub user sgururajshetty opened a pull request:

    https://github.com/apache/carbondata/pull/1898

    [CARBONDATA-1880] Documentation for merging small files

    Added the documentation for merging small files for better performance.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sgururajshetty/carbondata 1880

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1898.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1898
    
----
commit 313586c0bff34405672339d9819260146ae61816
Author: sgururajshetty <sg...@...>
Date:   2018-01-31T13:55:16Z

    Documentation for small files

----


---

[GitHub] carbondata issue #1898: [CARBONDATA-1880] Documentation for merging small fi...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1898
  
    Can one of the admins verify this patch?


---

[GitHub] carbondata pull request #1898: [CARBONDATA-1880] Documentation for merging s...

Posted by sgururajshetty <gi...@git.apache.org>.
Github user sgururajshetty closed the pull request at:

    https://github.com/apache/carbondata/pull/1898


---

[GitHub] carbondata issue #1898: [CARBONDATA-1880] Documentation for merging small fi...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1898
  
    Can one of the admins verify this patch?


---

[GitHub] carbondata pull request #1898: [CARBONDATA-1880] Documentation for merging s...

Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1898#discussion_r165247013
  
    --- Diff: docs/configuration-parameters.md ---
    @@ -60,6 +60,7 @@ This section provides the details of all the configurations required for CarbonD
     | carbon.options.is.empty.data.bad.record | false | If false, then empty ("" or '' or ,,) data will not be considered as bad record and vice versa. | |
     | carbon.options.bad.record.path |  | Specifies the HDFS path where bad records are stored. By default the value is Null. This path must to be configured by the user if bad record logger is enabled or bad record action redirect. | |
     | carbon.enable.vector.reader | true | This parameter increases the performance of select queries as it fetch columnar batch of size 4*1024 rows instead of fetching data row by row. | |
    +| carbon.task.distribution | merge_small_files | Setting this parameter value to *merge_small_files* will merge all the small files to a size of (128 MB). During data loading, all the small CSV files are combined to a map task to reduce the number of read task. This enhances the performance. | | 
    --- End diff --
    
    1. carbon.task.distribution applies only to queries; it is not used by data loading.
    Global_Sort loading always merges small CSV files and does not require this configuration.
    2. It would be better to list all the values of carbon.task.distribution:
    custom, block (default), blocklet, merge_small_files
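
    As a rough illustration of point 2, the four values named in the comment
    could be checked before a setting is written out. This is only a sketch:
    the helper name task_distribution_line is hypothetical, and the key=value
    output line assumes the usual properties-file layout rather than anything
    shown in this thread.

```python
# Values as listed in the review comment; "block" is marked there as the default.
ALLOWED_VALUES = ("custom", "block", "blocklet", "merge_small_files")
DEFAULT_VALUE = "block"


def task_distribution_line(value=DEFAULT_VALUE):
    """Return a key=value line for carbon.task.distribution.

    Hypothetical helper for illustration; it is not part of CarbonData.
    """
    candidate = value.strip()
    if candidate not in ALLOWED_VALUES:
        raise ValueError(
            "unsupported carbon.task.distribution value: %r (expected one of %s)"
            % (value, ", ".join(ALLOWED_VALUES))
        )
    # Assumed properties-file layout: one key=value pair per line.
    return "carbon.task.distribution=" + candidate


if __name__ == "__main__":
    print(task_distribution_line("merge_small_files"))
```

    Per the comment above, this setting affects query task distribution only,
    so the sketch says nothing about data loading.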


---

[GitHub] carbondata issue #1898: [CARBONDATA-1880] Documentation for merging small fi...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/1898
  
    Can one of the admins verify this patch?


---