Posted to issues@carbondata.apache.org by rahulforallp <gi...@git.apache.org> on 2018/07/02 05:40:03 UTC

[GitHub] carbondata pull request #2434: [CARBONDATA-2625] Optimize the performance of...

GitHub user rahulforallp opened a pull request:

    https://github.com/apache/carbondata/pull/2434

    [CARBONDATA-2625] Optimize the performance of CarbonReader read many files

    
    Ref: https://github.com/apache/carbondata/pull/2391
    
    About the issue: reading more than 10 million rows from 140 files times out, with no result within 8 minutes. Increasing each carbon writer to 200000 rows reduces the number of index files and data files when the total is 13000000 rows, but with 1 billion rows or more the number of files is still large. I checked the code and found that reading more than 140 files can be optimized:
    
    First, in cache.getAll, the number of IO operations exceeds 140 when there are 140 carbon files; in fact, there are more than 70 * 140 of them. This is slow and can be optimized.
    
    Secondly, there are some duplicate operations in getDataMaps, which can be optimized.
    
    Thirdly, the SDK needs a lot of time to create multiple carbonRecordReader instances: in a test with 150 files and 15 million rows, it took more than 8 minutes when more than 16 carbonRecordReader instances were created on a machine with 8 cores. This can be optimized.
    
    With these three points optimized (cache.getAll, getDataMaps, and creation of carbonRecordReader), the SDK can now read 150 files with 15 million rows within 8 minutes; it took about 340 seconds in testing.
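
As an illustration of the third point only, here is a minimal sketch of building per-file readers on a bounded thread pool instead of one after another. It is not the code from this pull request, which is not shown in this thread. `StubReader`, `open` and `buildReaders` are hypothetical names and are not part of the CarbonData SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReaderBuild {

  /** Hypothetical stand-in for a per-file record reader; not a CarbonData class. */
  static final class StubReader {
    final String path;

    StubReader(String path) {
      this.path = path;
    }
  }

  /** Hypothetical per-file initialization; in a real reader this is the slow part. */
  static StubReader open(String path) {
    return new StubReader(path);
  }

  /** Builds one reader per file on a fixed-size pool, preserving the input order. */
  static List<StubReader> buildReaders(List<String> paths, int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<StubReader>> tasks = new ArrayList<>();
      for (String path : paths) {
        tasks.add(() -> open(path));
      }
      List<StubReader> readers = new ArrayList<>();
      // invokeAll returns the futures in the same order as the submitted tasks.
      for (Future<StubReader> future : pool.invokeAll(tasks)) {
        readers.add(future.get());
      }
      return readers;
    } finally {
      // This pool is local to the call, so shutting it down affects nobody else.
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> paths = new ArrayList<>();
    for (int i = 0; i < 150; i++) {
      paths.add("part-" + i + ".carbondata"); // hypothetical file names
    }
    int threads = Runtime.getRuntime().availableProcessors();
    System.out.println(buildReaders(paths, threads).size());
  }
}
```

The pool size is passed as a parameter here because the commit log mentions a configurable thread number. How the PR actually wires that setting is an assumption this sketch does not cover.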
    
    One case: 150 files, each file has 200000 rows, total rows is 15000000
    Finished write data time: 449.102 s
    Finished build reader time:192.596 s
    Read first row time: 192.597 s, including build reader
    Read time:341.556 s, including build reader
    
    Another case: 15 files, each file has 2000000 rows, total rows is 15000000
    Finished write data time: 286.907 s
    Finished build reader time: 134.665 s
    Read first row time: 134.666 s, including build reader
    Finished read, the count of rows is:15000000
    Read time:156.427 s, including build reader
    
    Be sure to do all of the following checklist to help us incorporate
    your contribution quickly and easily:
    
        Any interfaces changed?
        Yes, a new one is added to optimize performance
    
        Any backward compatibility impacted?
        NA
    
        Document update required?
        NO
    
        Testing done
        An example is added for it
    
        For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
        NO


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rahulforallp/incubator-carbondata xuboPRsynch2391

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2434.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2434
    
----
commit 28a0b0f40c45967e586d7a5e703dce3cfaa48c99
Author: xubo245 <60...@...>
Date:   2018-06-21T04:25:27Z

    [CARBONDATA-2625] Optimize the performance of CarbonReader read many files
    
    optimize the build process, including cache.getAll, getDatamaps and create carbonRecordReader
    
    fix CI error
    
    add config to change the carbonreader thread number for SDKDetailQueryExecutor
    
    optimize
    
    optimize
    
    try to fix sdv error
    
    optimize
    
    optimize
    
    fix
    
    fix again
    
    optimize

commit 9d1c825768cce1ca7e5d0f0aa9eb354ef166e2c9
Author: xubo245 <xu...@...>
Date:   2018-06-30T02:40:45Z

    optimize

commit 69210f8ac7e64ed8a5c6a0c0a586e0cf8fc95812
Author: xubo245 <xu...@...>
Date:   2018-06-30T02:53:31Z

    remove unused import

commit ac3f70c081171eaab0163f6b89901117759d9fdf
Author: xubo245 <xu...@...>
Date:   2018-06-30T08:33:52Z

    optimize

commit 9306daea8be158a6cdfed2387fd94100ceca13ca
Author: rahul <ra...@...>
Date:   2018-07-02T05:37:51Z

    removed unnecessary properties

----


---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5529/



---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6702/



---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6726/



---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5563/



---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5556/



---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6731/



---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5574/



---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5573/



---

[GitHub] carbondata pull request #2434: [CARBONDATA-2625] Optimize the performance of...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2434#discussion_r200014195
  
    --- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
    @@ -121,6 +122,7 @@ public static CarbonReaderBuilder builder(String tablePath) {
        */
       public void close() throws IOException {
         validateReader();
    +    SDKDetailQueryExecutor.shutdownThreadPool();
    --- End diff --
    
    We cannot shut down a static thread pool here, because another CarbonReader might be reading the same or a different table in the same process.
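
The concern in this comment can be reproduced with a small standalone sketch. It assumes a process-wide static executor shared by all readers and is not the CarbonData code itself. `sharedPool`, `Reader` and `secondReaderBreaks` are hypothetical names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class SharedPoolHazard {

  /** Hypothetical stand-in for a static, process-wide executor. */
  static ExecutorService sharedPool = Executors.newFixedThreadPool(2);

  /** Hypothetical reader whose close() shuts down the shared static pool. */
  static final class Reader implements AutoCloseable {
    int read() throws Exception {
      // Every reader submits its work to the same static pool.
      return sharedPool.submit(() -> 1).get();
    }

    @Override
    public void close() {
      // Problematic: this also stops the pool for every other open reader.
      sharedPool.shutdown();
    }
  }

  /** Returns true if closing one reader makes a second, still-open reader fail. */
  static boolean secondReaderBreaks() throws Exception {
    sharedPool = Executors.newFixedThreadPool(2); // fresh pool for each run
    Reader first = new Reader();
    Reader second = new Reader();
    first.read();
    first.close();
    try {
      second.read();
      return false;
    } catch (RejectedExecutionException e) {
      // A pool that has been shut down rejects newly submitted tasks.
      return true;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(secondReaderBreaks());
  }
}
```

Giving each reader its own pool, or shutting the shared pool down only at process exit, would avoid the rejection. Which of these fits CarbonReader is a design decision this thread does not settle.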


---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5559/



---

[GitHub] carbondata pull request #2434: [CARBONDATA-2625] Optimize the performance of...

Posted by rahulforallp <gi...@git.apache.org>.
Github user rahulforallp closed the pull request at:

    https://github.com/apache/carbondata/pull/2434


---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5570/



---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5577/



---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5555/



---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5575/



---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by rahulforallp <gi...@git.apache.org>.
Github user rahulforallp commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    retest sdv please


---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5556/



---

[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2434
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6729/



---