Posted to issues@carbondata.apache.org by ravipesala <gi...@git.apache.org> on 2018/01/18 04:46:20 UTC

[GitHub] carbondata issue #1825: [CARBONDATA-2032][DataLoad] directly write carbon da...

Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/1825
  
    @xuchuanyin There is a reason why we copy the file instead of writing directly to HDFS.
    1. We make sure that one complete carbondata file goes into a single HDFS block: while copying it from local disk to HDFS we specify the block size. Otherwise query performance suffers a lot.
    2. It removes the overhead of writing to HDFS directly (which internally writes the replicas as well) from the loading path: the copy runs in a different thread, so it does not block the main loading flow.
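
    To illustrate both points, here is a rough sketch and not the actual CarbonData code. The class and method names (AsyncBlockCopySketch, blockSizeFor, copyAsync) are hypothetical, a local file copy stands in for the HDFS write, and the 512-byte checksum chunk is an assumed default. The idea is to pick a block size large enough to hold the whole file and to hand the copy to a worker thread so the loading thread can carry on.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncBlockCopySketch {
    // Assumption: the block size must be a multiple of the checksum chunk size
    // (512 bytes is the usual HDFS default). A real cluster may also enforce a
    // minimum block size, which this sketch ignores.
    static final long CHECKSUM_CHUNK = 512L;

    /** Smallest multiple of CHECKSUM_CHUNK that is >= fileSize, so the file fits in one block. */
    static long blockSizeFor(long fileSize) {
        long chunks = (fileSize + CHECKSUM_CHUNK - 1) / CHECKSUM_CHUNK;
        return Math.max(1L, chunks) * CHECKSUM_CHUNK;
    }

    /** Submits the copy to a worker thread and returns immediately with a Future. */
    static Future<Long> copyAsync(ExecutorService pool, Path local, Path target) {
        return pool.submit(() -> {
            long blockSize = blockSizeFor(Files.size(local));
            // With a real HDFS client, blockSize would be passed as the last argument of
            // FileSystem.create(path, overwrite, bufferSize, replication, blockSize).
            // Here a plain local copy stands in for that write.
            Files.copy(local, target, StandardCopyOption.REPLACE_EXISTING);
            return blockSize;
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Path local = Files.createTempFile("part-", ".carbondata");
            Path target = Files.createTempFile("copied-", ".carbondata");
            Files.write(local, new byte[1000]);

            Future<Long> pending = copyAsync(pool, local, target);
            // The loading thread could keep producing the next file here.
            System.out.println("block size used: " + pending.get());
        } finally {
            pool.shutdown();
        }
    }
}
```

    The caller only waits on the Future when it actually needs the copy to be finished, which is what keeps the replication cost off the main loading flow.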


---