You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@carbondata.apache.org by kumarvishal09 <gi...@git.apache.org> on 2018/04/09 13:35:40 UTC

[GitHub] carbondata pull request #2149: [WIP]Page level uncompress and Improve Unsafe...

GitHub user kumarvishal09 opened a pull request:

    https://github.com/apache/carbondata/pull/2149

    [WIP]Page level uncompress and Improve Unsafe Variable length store performance

    **Changes Done:**
    **Page Level Decoder for query**
    Added page level on demand decoding, in current code, all pages of blocklet is getting uncompressed, because of this memory footprint is too high and cause OOM, Now added code to support page level decoding, one page will be decoding when all the records are processed next page data will be decoded. It will improve query performance for example limit query.
    **Unsafe No Dictionary(Unsafe variable length)**
    Optimized getRow(for Vector processing) And putArray method
    
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [ ] Testing done
           Existing all the testcase are applicable.
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kumarvishal09/incubator-carbondata PageLevelUnCompress

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2149.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2149
    
----
commit 5a3ca1a3b6bc3d799c20ef9dda901367f3ac2d2c
Author: kumarvishal <ku...@...>
Date:   2018-04-09T11:40:13Z

    Page level uncompress

----


---

[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...

Posted by kumarvishal09 <gi...@git.apache.org>.

Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    retest this please


---

[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3802/



---

[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5018/



---

[GitHub] carbondata issue #2149: [WIP]Page level uncompress and Improve query perform...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4891/



---

[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3807/



---

[GitHub] carbondata pull request #2149: [CARBONDATA-2325]Page level uncompress and Im...

Posted by kumarvishal09 <gi...@git.apache.org>.

Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2149#discussion_r181554900
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/unsafe/UnsafeVariableLengthDimensionDataChunkStore.java ---
    @@ -78,70 +88,96 @@ public UnsafeVariableLengthDimensionDataChunkStore(long totalSize, boolean isInv
     
         // start position will be used to store the current data position
         int startOffset = 0;
    -    // position from where offsets will start
    -    long pointerOffsets = this.dataPointersOffsets;
         // as first position will be start from 2 byte as data is stored first in the memory block
         // we need to skip first two bytes this is because first two bytes will be length of the data
         // which we have to skip
    -    CarbonUnsafe.getUnsafe().putInt(dataPageMemoryBlock.getBaseObject(),
    -        dataPageMemoryBlock.getBaseOffset() + pointerOffsets,
    -        CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
    -    // incrementing the pointers as first value is already filled and as we are storing as int
    -    // we need to increment the 4 bytes to set the position of the next value to set
    -    pointerOffsets += CarbonCommonConstants.INT_SIZE_IN_BYTE;
    +    int [] dataOffsets = new int[numberOfRows];
    +    dataOffsets[0] = CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
         // creating a byte buffer which will wrap the length of the row
    -    // using byte buffer as unsafe will return bytes in little-endian encoding
    -    ByteBuffer buffer = ByteBuffer.allocate(CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
    -    // store length of data
    -    byte[] length = new byte[CarbonCommonConstants.SHORT_SIZE_IN_BYTE];
    -    // as first offset is already stored, we need to start from the 2nd row in data array
    +    ByteBuffer buffer = ByteBuffer.wrap(data);
         for (int i = 1; i < numberOfRows; i++) {
    -      // first copy the length of previous row
    -      CarbonUnsafe.getUnsafe().copyMemory(dataPageMemoryBlock.getBaseObject(),
    -          dataPageMemoryBlock.getBaseOffset() + startOffset, length, CarbonUnsafe.BYTE_ARRAY_OFFSET,
    -          CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
    -      buffer.put(length);
    -      buffer.flip();
    +      buffer.position(startOffset);
           // so current row position will be
           // previous row length + 2 bytes used for storing previous row data
    -      startOffset += CarbonCommonConstants.SHORT_SIZE_IN_BYTE + buffer.getShort();
    +      startOffset += buffer.getShort() + CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
           // as same byte buffer is used to avoid creating many byte buffer for each row
           // we need to clear the byte buffer
    -      buffer.clear();
    -      // now put the offset of current row, here we need to add 2 more bytes as current will
    -      // also have length part so we have to skip length
    -      CarbonUnsafe.getUnsafe().putInt(dataPageMemoryBlock.getBaseObject(),
    -          dataPageMemoryBlock.getBaseOffset() + pointerOffsets,
    -          startOffset + CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
    -      // incrementing the pointers as first value is already filled and as we are storing as int
    -      // we need to increment the 4 bytes to set the position of the next value to set
    -      pointerOffsets += CarbonCommonConstants.INT_SIZE_IN_BYTE;
    +      dataOffsets[i] = startOffset + CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
         }
    -
    +    CarbonUnsafe.getUnsafe().copyMemory(dataOffsets, CarbonUnsafe.INT_ARRAY_OFFSET,
    +        dataPageMemoryBlock.getBaseObject(),
    +        dataPageMemoryBlock.getBaseOffset() + this.dataPointersOffsets,
    +        dataOffsets.length * CarbonCommonConstants.INT_SIZE_IN_BYTE);
       }
     
       /**
        * Below method will be used to get the row based on row id passed
    -   *
    +   * Getting the row from unsafe works in below logic
    +   * 1. if inverted index is present then get the row id based on reverse inverted index
    +   * 2. get the current row id data offset
    +   * 3. if it's not a last row- get the next row offset
    +   * Subtract the current row offset + 2 bytes(to skip the data length) with next row offset
    +   * 4. if it's last row
    +   * subtract the current row offset + 2 bytes(to skip the data length) with complete data length
        * @param rowId
        * @return row
        */
       @Override public byte[] getRow(int rowId) {
    +    // get the actual row id
    +    rowId = getRowId(rowId);
    +    // get offset of data in unsafe
    +    int currentDataOffset = getOffSet(rowId);
    +    // get the data length
    +    short length = getLength(rowId, currentDataOffset);
    +    // create data array
    +    byte[] data = new byte[length];
    +    // fill the row data
    +    fillRowInternal(length, data, currentDataOffset);
    +    return data;
    +  }
    +
    +  /**
    +   * Returns the actual row id for data
    +   * if inverted index is present then get the row id based on reverse inverted index
    +   * otherwise return the same row id
    +   * @param rowId
    +   * row id
    +   * @return actual row id
    +   */
    +  private int getRowId(int rowId) {
         // if column was explicitly sorted we need to get the rowid based inverted index reverse
         if (isExplicitSorted) {
           rowId = CarbonUnsafe.getUnsafe().getInt(dataPageMemoryBlock.getBaseObject(),
               dataPageMemoryBlock.getBaseOffset() + this.invertedIndexReverseOffset + ((long)rowId
                   * CarbonCommonConstants.INT_SIZE_IN_BYTE));
         }
    -    // now to get the row from memory block we need to do following thing
    -    // 1. first get the current offset
    -    // 2. if it's not a last row- get the next row offset
    -    // Subtract the current row offset + 2 bytes(to skip the data length) with next row offset
    -    // else subtract the current row offset + 2 bytes(to skip the data length)
    -    // with complete data length
    -    int currentDataOffset = CarbonUnsafe.getUnsafe().getInt(dataPageMemoryBlock.getBaseObject(),
    -        dataPageMemoryBlock.getBaseOffset() + this.dataPointersOffsets + (rowId
    +    return rowId;
    +  }
    +
    +  /**
    +   * get data offset based on current row id
    +   * @param rowId
    +   * row id
    --- End diff --
    
    ok


---

[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5023/



---

[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4383/



---

[GitHub] carbondata pull request #2149: [CARBONDATA-2325]Page level uncompress and Im...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2149#discussion_r181342780
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/unsafe/UnsafeVariableLengthDimensionDataChunkStore.java ---
    @@ -78,70 +88,96 @@ public UnsafeVariableLengthDimensionDataChunkStore(long totalSize, boolean isInv
     
         // start position will be used to store the current data position
         int startOffset = 0;
    -    // position from where offsets will start
    -    long pointerOffsets = this.dataPointersOffsets;
         // as first position will be start from 2 byte as data is stored first in the memory block
         // we need to skip first two bytes this is because first two bytes will be length of the data
         // which we have to skip
    -    CarbonUnsafe.getUnsafe().putInt(dataPageMemoryBlock.getBaseObject(),
    -        dataPageMemoryBlock.getBaseOffset() + pointerOffsets,
    -        CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
    -    // incrementing the pointers as first value is already filled and as we are storing as int
    -    // we need to increment the 4 bytes to set the position of the next value to set
    -    pointerOffsets += CarbonCommonConstants.INT_SIZE_IN_BYTE;
    +    int [] dataOffsets = new int[numberOfRows];
    +    dataOffsets[0] = CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
         // creating a byte buffer which will wrap the length of the row
    -    // using byte buffer as unsafe will return bytes in little-endian encoding
    -    ByteBuffer buffer = ByteBuffer.allocate(CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
    -    // store length of data
    -    byte[] length = new byte[CarbonCommonConstants.SHORT_SIZE_IN_BYTE];
    -    // as first offset is already stored, we need to start from the 2nd row in data array
    +    ByteBuffer buffer = ByteBuffer.wrap(data);
         for (int i = 1; i < numberOfRows; i++) {
    -      // first copy the length of previous row
    -      CarbonUnsafe.getUnsafe().copyMemory(dataPageMemoryBlock.getBaseObject(),
    -          dataPageMemoryBlock.getBaseOffset() + startOffset, length, CarbonUnsafe.BYTE_ARRAY_OFFSET,
    -          CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
    -      buffer.put(length);
    -      buffer.flip();
    +      buffer.position(startOffset);
           // so current row position will be
           // previous row length + 2 bytes used for storing previous row data
    -      startOffset += CarbonCommonConstants.SHORT_SIZE_IN_BYTE + buffer.getShort();
    +      startOffset += buffer.getShort() + CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
           // as same byte buffer is used to avoid creating many byte buffer for each row
           // we need to clear the byte buffer
    -      buffer.clear();
    -      // now put the offset of current row, here we need to add 2 more bytes as current will
    -      // also have length part so we have to skip length
    -      CarbonUnsafe.getUnsafe().putInt(dataPageMemoryBlock.getBaseObject(),
    -          dataPageMemoryBlock.getBaseOffset() + pointerOffsets,
    -          startOffset + CarbonCommonConstants.SHORT_SIZE_IN_BYTE);
    -      // incrementing the pointers as first value is already filled and as we are storing as int
    -      // we need to increment the 4 bytes to set the position of the next value to set
    -      pointerOffsets += CarbonCommonConstants.INT_SIZE_IN_BYTE;
    +      dataOffsets[i] = startOffset + CarbonCommonConstants.SHORT_SIZE_IN_BYTE;
         }
    -
    +    CarbonUnsafe.getUnsafe().copyMemory(dataOffsets, CarbonUnsafe.INT_ARRAY_OFFSET,
    +        dataPageMemoryBlock.getBaseObject(),
    +        dataPageMemoryBlock.getBaseOffset() + this.dataPointersOffsets,
    +        dataOffsets.length * CarbonCommonConstants.INT_SIZE_IN_BYTE);
       }
     
       /**
        * Below method will be used to get the row based on row id passed
    -   *
    +   * Getting the row from unsafe works in below logic
    +   * 1. if inverted index is present then get the row id based on reverse inverted index
    +   * 2. get the current row id data offset
    +   * 3. if it's not a last row- get the next row offset
    +   * Subtract the current row offset + 2 bytes(to skip the data length) with next row offset
    +   * 4. if it's last row
    +   * subtract the current row offset + 2 bytes(to skip the data length) with complete data length
        * @param rowId
        * @return row
        */
       @Override public byte[] getRow(int rowId) {
    +    // get the actual row id
    +    rowId = getRowId(rowId);
    +    // get offset of data in unsafe
    +    int currentDataOffset = getOffSet(rowId);
    +    // get the data length
    +    short length = getLength(rowId, currentDataOffset);
    +    // create data array
    +    byte[] data = new byte[length];
    +    // fill the row data
    +    fillRowInternal(length, data, currentDataOffset);
    +    return data;
    +  }
    +
    +  /**
    +   * Returns the actual row id for data
    +   * if inverted index is present then get the row id based on reverse inverted index
    +   * otherwise return the same row id
    +   * @param rowId
    +   * row id
    +   * @return actual row id
    +   */
    +  private int getRowId(int rowId) {
         // if column was explicitly sorted we need to get the rowid based inverted index reverse
         if (isExplicitSorted) {
           rowId = CarbonUnsafe.getUnsafe().getInt(dataPageMemoryBlock.getBaseObject(),
               dataPageMemoryBlock.getBaseOffset() + this.invertedIndexReverseOffset + ((long)rowId
                   * CarbonCommonConstants.INT_SIZE_IN_BYTE));
         }
    -    // now to get the row from memory block we need to do following thing
    -    // 1. first get the current offset
    -    // 2. if it's not a last row- get the next row offset
    -    // Subtract the current row offset + 2 bytes(to skip the data length) with next row offset
    -    // else subtract the current row offset + 2 bytes(to skip the data length)
    -    // with complete data length
    -    int currentDataOffset = CarbonUnsafe.getUnsafe().getInt(dataPageMemoryBlock.getBaseObject(),
    -        dataPageMemoryBlock.getBaseOffset() + this.dataPointersOffsets + (rowId
    +    return rowId;
    +  }
    +
    +  /**
    +   * get data offset based on current row id
    +   * @param rowId
    +   * row id
    --- End diff --
    
    move it to previous line, please modify all places


---

[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3810/



---

[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3694/



---

[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...

Posted by jackylk <gi...@git.apache.org>.

Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    LGTM


---

[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5026/



---

[GitHub] carbondata pull request #2149: [CARBONDATA-2325]Page level uncompress and Im...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2149


---

[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...

Posted by kumarvishal09 <gi...@git.apache.org>.

Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    retest this please


---

[GitHub] carbondata issue #2149: [WIP]Page level uncompress and Improve query perform...

Posted by ravipesala <gi...@git.apache.org>.

Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4361/



---

[GitHub] carbondata issue #2149: [WIP]Page level uncompress and Improve query perform...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3668/



---

[GitHub] carbondata issue #2149: [CARBONDATA-2325]Page level uncompress and Improve q...

Posted by CarbonDataQA <gi...@git.apache.org>.

Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2149
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4917/



---