You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by NamanRastogi <gi...@git.apache.org> on 2018/11/22 10:50:53 UTC
[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...
GitHub user NamanRastogi opened a pull request:
https://github.com/apache/carbondata/pull/2942
[CARBONDATA-3121] Improvement of CarbonReader build time
CarbonReader builder is taking huge time.
**Reason**
Initialization of ChunkRowIterator is triggring actual I/O operation, and thus huge build time.
**Solution**
remove CarbonIterator.hasNext() from build.
- [x] Any interfaces changed?
No
- [x] Any backward compatibility impacted?
No
- [x] Document update required?
No
- [x] Testing done
Yes
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/NamanRastogi/carbondata build_improv
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2942.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2942
----
commit 721b9bb6a4d408ba6658ba55ff1fe431f4e2523a
Author: Naman Rastogi <na...@...>
Date: 2018-11-22T08:27:50Z
Improvement of CarbonRecord build time
----
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1721/
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by NamanRastogi <gi...@git.apache.org>.
Github user NamanRastogi commented on the issue:
https://github.com/apache/carbondata/pull/2942
retest this please
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1751/
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9781/
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2942
retest this please
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on the issue:
https://github.com/apache/carbondata/pull/2942
LGTM
---
[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...
Posted by NamanRastogi <gi...@git.apache.org>.
Github user NamanRastogi commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2942#discussion_r235901284
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java ---
@@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
* @return {@code true} if the iteration has more elements
*/
@Override public boolean hasNext() {
- if (null != currentChunk) {
- if ((currentChunk.hasNext())) {
- return true;
- } else if (!currentChunk.hasNext()) {
- while (iterator.hasNext()) {
- currentChunk = iterator.next();
- if (currentChunk != null && currentChunk.hasNext()) {
- return true;
- }
- }
- }
+ if (currentChunk != null && currentChunk.hasNext()) {
--- End diff --
Done.
---
[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...
Posted by NamanRastogi <gi...@git.apache.org>.
Github user NamanRastogi commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2942#discussion_r235892844
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java ---
@@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
* @return {@code true} if the iteration has more elements
*/
@Override public boolean hasNext() {
- if (null != currentChunk) {
- if ((currentChunk.hasNext())) {
- return true;
- } else if (!currentChunk.hasNext()) {
- while (iterator.hasNext()) {
- currentChunk = iterator.next();
- if (currentChunk != null && currentChunk.hasNext()) {
- return true;
- }
- }
- }
+ if (currentChunk != null && currentChunk.hasNext()) {
--- End diff --
The following tables are the build times of CarbonReader. The exact code is:
```scala
import org.apache.carbondata.sdk.file._
val startTime = System.currentTimeMillis
val reader = CarbonReader
.builder("hdfs://hacluster/carbonfiles", "t1")
.withBatch(50000)
.build
println(s"Time to build: ${System.currentTimeMillis - startTime}")
reader.close
```
**Before**
| DETAIL_QUERY_BATCH_SIZE | 4000 | 50000 |
| -- | -- | -- |
| With Vector Reader | 1870 ms | 1867 ms |
| Without Vector Reader | 12961 ms | 28539 ms |
**After**
| DETAIL_QUERY_BATCH_SIZE | 4000 | 50000 |
| -- | -- | -- |
| With Vector Reader | 1238 ms | 1279 ms |
| Without Vector Reader | 1878 ms | 1913 ms |
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942
Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1733/
---
[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...
Posted by xubo245 <gi...@git.apache.org>.
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2942#discussion_r235835735
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java ---
@@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
* @return {@code true} if the iteration has more elements
*/
@Override public boolean hasNext() {
- if (null != currentChunk) {
- if ((currentChunk.hasNext())) {
- return true;
- } else if (!currentChunk.hasNext()) {
- while (iterator.hasNext()) {
- currentChunk = iterator.next();
- if (currentChunk != null && currentChunk.hasNext()) {
- return true;
- }
- }
- }
+ if (currentChunk != null && currentChunk.hasNext()) {
--- End diff --
Have you tested/compare the performance before/after this change?
---
[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/carbondata/pull/2942
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1511/
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942
Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1750/
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1540/
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9798/
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9799/
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1539/
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942
Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9769/
---
[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...
Posted by xubo245 <gi...@git.apache.org>.
Github user xubo245 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2942#discussion_r235897415
--- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java ---
@@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
* @return {@code true} if the iteration has more elements
*/
@Override public boolean hasNext() {
- if (null != currentChunk) {
- if ((currentChunk.hasNext())) {
- return true;
- } else if (!currentChunk.hasNext()) {
- while (iterator.hasNext()) {
- currentChunk = iterator.next();
- if (currentChunk != null && currentChunk.hasNext()) {
- return true;
- }
- }
- }
+ if (currentChunk != null && currentChunk.hasNext()) {
--- End diff --
nice, can you put this in content of this PR?
---
[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2942
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1523/
---