Posted to issues@carbondata.apache.org by rahulforallp <gi...@git.apache.org> on 2018/07/02 05:40:03 UTC
[GitHub] carbondata pull request #2434: [CARBONDATA-2625] Optimize the performance of...
GitHub user rahulforallp opened a pull request:
https://github.com/apache/carbondata/pull/2434
[CARBONDATA-2625] Optimize the performance of CarbonReader read many files
Ref: https://github.com/apache/carbondata/pull/2391
About the issue: reading more than 10 million rows spread across 140 files times out, with no result within 8 minutes. Raising the row count for each CarbonWriter to 200000 reduces the number of index files and data files when the total is 13000000 rows, but with 1 billion rows or more the number of files is still large. I checked the code and found that reading more than 140 files can be optimized:
First, in cache.getAll, there are more than 140 IO operations when there are 140 carbon files; in fact, there are more than 70 * 140 IO operations. This is slow and can be optimized.
Secondly, there are some duplicate operations in getDataMaps that can be optimized.
Thirdly, the SDK needs a lot of time to create multiple carbonRecordReader instances: in a test with 150 files and 15 million rows, it took more than 8 minutes when creating more than 16 carbonRecordReader instances on a machine with 8 cores. This can be optimized.
By optimizing these three points (cache.getAll, getDataMaps, and creating carbonRecordReader), the SDK can now read 150 files and 15 million rows within 8 minutes; it took about 340 seconds in testing.
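As an illustration of the third point only, here is a minimal sketch of opening one reader per file on a bounded thread pool, so that the number of concurrent opens is capped instead of growing with the number of files. The class ParallelReaderBuild, the method buildAll, and the openReader function are hypothetical names for this example; they are not CarbonData SDK APIs, and this is not necessarily how the PR implements the change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper for illustration; not part of the CarbonData SDK.
final class ParallelReaderBuild {

  // Opens one reader per file using at most `threads` worker threads.
  // `openReader` stands in for whatever creates a record reader for a file.
  static <R> List<R> buildAll(List<String> files, Function<String, R> openReader, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<R>> tasks = new ArrayList<>();
      for (String file : files) {
        tasks.add(() -> openReader.apply(file));
      }
      List<R> readers = new ArrayList<>();
      // invokeAll blocks until every task is done and keeps the task order.
      for (Future<R> future : pool.invokeAll(tasks)) {
        readers.add(future.get());
      }
      return readers;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while opening readers", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("failed to open a reader", e.getCause());
    } finally {
      // This pool is local to the call, so shutting it down here is safe.
      pool.shutdown();
    }
  }
}
```

A caller would pass the file list, a function that opens a reader for one path, and a thread count chosen relative to the number of cores.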
One case: 150 files, each file has 200000 rows, total rows is 15000000
Finished write data time: 449.102 s
Finished build reader time: 192.596 s
Read first row time: 192.597 s, including build reader
Read time: 341.556 s, including build reader
Another case: 15 files, each file has 2000000 rows, total rows is 15000000
Finished write data time: 286.907 s
Finished build reader time: 134.665 s
Read first row time: 134.666 s, including build reader
Finished read, the count of rows is:15000000
Read time: 156.427 s, including build reader
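Since both read times include building the reader, subtracting the build time gives the time spent after the reader is ready: 341.556 - 192.596 = 148.960 s for 150 files, and 156.427 - 134.665 = 21.762 s for 15 files. Building the reader therefore accounts for more than half of the total read time in both cases. The small sketch below reproduces this arithmetic; the class and method names are made up for the example.

```java
// Hypothetical helper that reproduces the arithmetic on the timings reported above.
final class ReadPhaseBreakdown {

  // Seconds spent after the reader is built (total read time minus build time).
  static double scanSeconds(double totalReadSeconds, double buildReaderSeconds) {
    return totalReadSeconds - buildReaderSeconds;
  }

  // Fraction of the total read time that is spent building the reader.
  static double buildShare(double totalReadSeconds, double buildReaderSeconds) {
    return buildReaderSeconds / totalReadSeconds;
  }

  public static void main(String[] args) {
    System.out.printf("150 files: scan %.3f s, build share %.2f%n",
        scanSeconds(341.556, 192.596), buildShare(341.556, 192.596));
    System.out.printf("15 files: scan %.3f s, build share %.2f%n",
        scanSeconds(156.427, 134.665), buildShare(156.427, 134.665));
  }
}
```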
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
Any interfaces changed?
Yes, a new interface is added to optimize performance
Any backward compatibility impacted?
NA
Document update required?
NO
Testing done
An example was added for it
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
NO
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rahulforallp/incubator-carbondata xuboPRsynch2391
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2434.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2434
----
commit 28a0b0f40c45967e586d7a5e703dce3cfaa48c99
Author: xubo245 <60...@...>
Date: 2018-06-21T04:25:27Z
[CARBONDATA-2625] Optimize the performance of CarbonReader read many files
optimize the build process, including cache.getAll, getDatamaps and create carbonRecordReader
fix CI error
add config to change the carbonreader thread number for SDKDetailQueryExecutor
optimize
optimize
try to fix sdv error
optimize
optimize
fix
fix again
optimize
commit 9d1c825768cce1ca7e5d0f0aa9eb354ef166e2c9
Author: xubo245 <xu...@...>
Date: 2018-06-30T02:40:45Z
optimize
commit 69210f8ac7e64ed8a5c6a0c0a586e0cf8fc95812
Author: xubo245 <xu...@...>
Date: 2018-06-30T02:53:31Z
remove unused import
commit ac3f70c081171eaab0163f6b89901117759d9fdf
Author: xubo245 <xu...@...>
Date: 2018-06-30T08:33:52Z
optimize
commit 9306daea8be158a6cdfed2387fd94100ceca13ca
Author: rahul <ra...@...>
Date: 2018-07-02T05:37:51Z
removed unnecessary properties
----
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2434
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5529/
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2434
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6702/
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2434
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6726/
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2434
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5563/
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2434
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5556/
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2434
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6731/
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2434
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5574/
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2434
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5573/
---
[GitHub] carbondata pull request #2434: [CARBONDATA-2625] Optimize the performance of...
Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2434#discussion_r200014195
--- Diff: store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReader.java ---
@@ -121,6 +122,7 @@ public static CarbonReaderBuilder builder(String tablePath) {
*/
public void close() throws IOException {
validateReader();
+ SDKDetailQueryExecutor.shutdownThreadPool();
--- End diff --
A static thread pool cannot be shut down here, because another CarbonReader might be reading the same or a different table in the same process.
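One possible way to address this, sketched under the assumption that the pool stays shared across readers, is to count the readers using it and shut it down only when the last one closes. SharedReaderPool and its methods are hypothetical names for this example; they are not existing CarbonData classes, and the PR may resolve the concern differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical reference-counted holder for a process-wide thread pool.
final class SharedReaderPool {
  private static ExecutorService pool;
  private static int users;

  // Called when a reader starts using the pool; creates the pool on first use.
  // The thread count only takes effect when the pool is created.
  static synchronized ExecutorService acquire(int threads) {
    if (pool == null) {
      pool = Executors.newFixedThreadPool(threads);
    }
    users++;
    return pool;
  }

  // Called from a reader's close(); shuts the pool down only for the last user.
  static synchronized void release() {
    if (users == 0) {
      return; // nothing acquired, nothing to release
    }
    users--;
    if (users == 0) {
      pool.shutdown();
      pool = null;
    }
  }

  // True while at least one reader still holds the pool.
  static synchronized boolean isActive() {
    return pool != null;
  }
}
```

With this shape, closing one reader leaves the pool running for any other reader in the same process.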
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2434
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5559/
---
[GitHub] carbondata pull request #2434: [CARBONDATA-2625] Optimize the performance of...
Posted by rahulforallp <gi...@git.apache.org>.
Github user rahulforallp closed the pull request at:
https://github.com/apache/carbondata/pull/2434
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2434
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5570/
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2434
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5577/
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2434
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5555/
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2434
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5575/
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by rahulforallp <gi...@git.apache.org>.
Github user rahulforallp commented on the issue:
https://github.com/apache/carbondata/pull/2434
retest sdv please
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2434
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5556/
---
[GitHub] carbondata issue #2434: [CARBONDATA-2625] Optimize the performance of Carbon...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2434
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6729/
---