You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by NamanRastogi <gi...@git.apache.org> on 2018/11/22 10:50:53 UTC

[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...

GitHub user NamanRastogi opened a pull request:

    https://github.com/apache/carbondata/pull/2942

    [CARBONDATA-3121] Improvement of CarbonReader build time

    CarbonReader builder is taking huge time.
    
    **Reason**
    Initialization of ChunkRowIterator is triggring actual I/O operation, and thus huge build time.
    
    **Solution**
    remove CarbonIterator.hasNext() from build.
    
    
     - [x] Any interfaces changed?
         No 
     - [x] Any backward compatibility impacted?
           No
     - [x] Document update required?
            No
     - [x] Testing done
          Yes
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NamanRastogi/carbondata build_improv

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2942.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2942
    
----
commit 721b9bb6a4d408ba6658ba55ff1fe431f4e2523a
Author: Naman Rastogi <na...@...>
Date:   2018-11-22T08:27:50Z

    Improvement of CarbonRecord build time

----


---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1721/



---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by NamanRastogi <gi...@git.apache.org>.
Github user NamanRastogi commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    retest this please


---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1751/



---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9781/



---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    retest this please


---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    LGTM


---

[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...

Posted by NamanRastogi <gi...@git.apache.org>.
Github user NamanRastogi commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2942#discussion_r235901284
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java ---
    @@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
        * @return {@code true} if the iteration has more elements
        */
       @Override public boolean hasNext() {
    -    if (null != currentChunk) {
    -      if ((currentChunk.hasNext())) {
    -        return true;
    -      } else if (!currentChunk.hasNext()) {
    -        while (iterator.hasNext()) {
    -          currentChunk = iterator.next();
    -          if (currentChunk != null && currentChunk.hasNext()) {
    -            return true;
    -          }
    -        }
    -      }
    +    if (currentChunk != null && currentChunk.hasNext()) {
    --- End diff --
    
    Done.


---

[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...

Posted by NamanRastogi <gi...@git.apache.org>.
Github user NamanRastogi commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2942#discussion_r235892844
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java ---
    @@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
        * @return {@code true} if the iteration has more elements
        */
       @Override public boolean hasNext() {
    -    if (null != currentChunk) {
    -      if ((currentChunk.hasNext())) {
    -        return true;
    -      } else if (!currentChunk.hasNext()) {
    -        while (iterator.hasNext()) {
    -          currentChunk = iterator.next();
    -          if (currentChunk != null && currentChunk.hasNext()) {
    -            return true;
    -          }
    -        }
    -      }
    +    if (currentChunk != null && currentChunk.hasNext()) {
    --- End diff --
    
    The following tables are the build times of CarbonReader. The exact code is:
    ```scala
    import org.apache.carbondata.sdk.file._
    
    val startTime = System.currentTimeMillis
    val reader = CarbonReader
    	.builder("hdfs://hacluster/carbonfiles", "t1")
    	.withBatch(50000)
    	.build
    println(s"Time to build: ${System.currentTimeMillis - startTime}")
    reader.close
    
    ```
    
    **Before**
    
    | DETAIL_QUERY_BATCH_SIZE | 4000 | 50000 |
    | -- | -- | -- |
    | With Vector Reader | 1870 ms | 1867 ms |
    | Without Vector Reader	| 12961 ms | 28539 ms |
    
    
    **After**
    
    | DETAIL_QUERY_BATCH_SIZE | 4000 | 50000 |
    | -- | -- | -- |
    | With Vector Reader	| 1238 ms	| 1279 ms	|
    | Without Vector Reader	| 1878 ms	| 1913 ms	|


---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1733/



---

[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...

Posted by xubo245 <gi...@git.apache.org>.
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2942#discussion_r235835735
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java ---
    @@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
        * @return {@code true} if the iteration has more elements
        */
       @Override public boolean hasNext() {
    -    if (null != currentChunk) {
    -      if ((currentChunk.hasNext())) {
    -        return true;
    -      } else if (!currentChunk.hasNext()) {
    -        while (iterator.hasNext()) {
    -          currentChunk = iterator.next();
    -          if (currentChunk != null && currentChunk.hasNext()) {
    -            return true;
    -          }
    -        }
    -      }
    +    if (currentChunk != null && currentChunk.hasNext()) {
    --- End diff --
    
    Have you tested/compare the performance before/after this change?


---

[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2942


---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1511/



---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1750/



---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1540/



---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9798/



---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9799/



---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1539/



---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9769/



---

[GitHub] carbondata pull request #2942: [CARBONDATA-3121] Improvement of CarbonReader...

Posted by xubo245 <gi...@git.apache.org>.
Github user xubo245 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2942#discussion_r235897415
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/ChunkRowIterator.java ---
    @@ -52,17 +49,11 @@ public ChunkRowIterator(CarbonIterator<RowBatch> iterator) {
        * @return {@code true} if the iteration has more elements
        */
       @Override public boolean hasNext() {
    -    if (null != currentChunk) {
    -      if ((currentChunk.hasNext())) {
    -        return true;
    -      } else if (!currentChunk.hasNext()) {
    -        while (iterator.hasNext()) {
    -          currentChunk = iterator.next();
    -          if (currentChunk != null && currentChunk.hasNext()) {
    -            return true;
    -          }
    -        }
    -      }
    +    if (currentChunk != null && currentChunk.hasNext()) {
    --- End diff --
    
    nice, can you put this in content of this PR?


---

[GitHub] carbondata issue #2942: [CARBONDATA-3121] Improvement of CarbonReader build ...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2942
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/1523/



---