Posted to issues@carbondata.apache.org by xuchuanyin <gi...@git.apache.org> on 2018/04/13 09:20:40 UTC

[GitHub] carbondata pull request #2169: [CARBONDATA-2344][DataMap] Fix bugs in mappin...

GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/2169

    [CARBONDATA-2344][DataMap] Fix bugs in mapping blocklet to UnsafeDMStore rows

    In BlockletDataMap, CarbonData stores a DMRow in an array for each
    blocklet. Currently, CarbonData accesses the DMRow only by
    blockletId (0, 1, etc.), which causes problems because different
    blocks can have the same blockletId.
    
    This PR adds a map from blockId#blockletId to the array index, so
    that CarbonData can access the DMRow by blockId and blockletId.
    
    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [x] Any interfaces changed?
     `NO, only internal interfaces have been changed`
     - [x] Any backward compatibility impacted?
     `NO`
     - [x] Document update required?
    `NO`
     - [x] Testing done
            Please provide details on 
            - Whether new unit test cases have been added or why no new tests are required?
    `NO`
            - How it is tested? Please attach test report.
    `Tested in local`
            - Is it a performance related change? Please attach the performance test report.
    `No`
            - Any additional information to help reviewers in testing this change.
     `NO`      
     - [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    `Not related`


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata 0413_bug_blocklet_dm_unsafe_row

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2169.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2169
    
----
commit dd010297c7f7428dc8f42ec1a292b8cdddcc09aa
Author: xuchuanyin <xu...@...>
Date:   2018-04-13T08:18:23Z

    Fix bugs in mapping blocklet to UnsafeDMStore
    
    In BlockletDataMap, CarbonData stores a DMRow in an array for each
    blocklet. Currently, CarbonData accesses the DMRow only by
    blockletId (0, 1, etc.), which causes problems because different
    blocks can have the same blockletId.
    
    This PR adds a map from blockId#blockletId to the array index, so
    that CarbonData can access the DMRow by blockId and blockletId.

----


---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5259/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4082/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    @xuchuanyin What I meant is that, instead of adding the mapping in the datamap, this should be handled while writing the datamap.
    Currently, the blocklet number written to the datamap is relative to each block; instead, generate the blocklet number relative to the complete index file.
    With this approach, we can eliminate the block-to-blocklet mapping completely, even inside datamaps.
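
The suggestion above, numbering blocklets relative to the whole index file rather than to each block, might look roughly like the following sketch. The class name, method name, and block names are hypothetical and are not CarbonData APIs; the sketch also assumes the blocks are visited in index-file order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of CarbonData.
class IndexFileBlockletNumbering {

    /**
     * Assigns every blocklet a number that is unique within one index file.
     * Input: block name -> number of blocklets in that block, in index-file order.
     * Output: block name -> index-file-relative IDs of that block's blocklets.
     */
    public static Map<String, List<Integer>> assign(Map<String, Integer> blockletCounts) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        int next = 0; // running counter shared by all blocks in the index file
        for (Map.Entry<String, Integer> entry : blockletCounts.entrySet()) {
            List<Integer> ids = new ArrayList<>();
            for (int i = 0; i < entry.getValue(); i++) {
                ids.add(next++);
            }
            result.put(entry.getKey(), ids);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("block-0", 3);
        counts.put("block-1", 3);
        // prints {block-0=[0, 1, 2], block-1=[3, 4, 5]}
        System.out.println(assign(counts));
    }
}
```

With IDs assigned this way, a blocklet ID alone identifies a row, so no block name needs to be kept alongside it.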


---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    This issue has been fixed in https://github.com/apache/carbondata/pull/2206


---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5007/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    @ravipesala Thanks for helping me understand the design purpose.
    
    The original problem is that query results contained duplicate records and missed others. In my scenario, a datamap pruned the input down to 2 blocks (each containing 3 blocklets). BlockletDataMap then returned 6 blocklets, but they were the 3 blocklets of the first block, each returned twice; no blocklet from the second block was returned.
    
    I'll work on the relativeBlockletId and fix the problem.
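
The symptom described above can be reproduced in miniature. The sketch below is a simplified model, not CarbonData code: `ROWS` stands in for the DMRow array, and `lookupByBlockletIdOnly` is a hypothetical lookup that uses only the per-block blockletId as the array index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of CarbonData.
class BlockletIdCollisionDemo {

    // Stand-in for the DMRow array: one row per blocklet, in index-file order.
    static final String[] ROWS = {
        "block-0/blocklet-0", "block-0/blocklet-1", "block-0/blocklet-2",
        "block-1/blocklet-0", "block-1/blocklet-1", "block-1/blocklet-2"
    };

    /** Fetches rows using only the per-block blockletId as the array index. */
    public static List<String> lookupByBlockletIdOnly(int blocks, int blockletsPerBlock) {
        List<String> hits = new ArrayList<>();
        for (int block = 0; block < blocks; block++) {
            for (int blockletId = 0; blockletId < blockletsPerBlock; blockletId++) {
                // The block is ignored here, so every block resolves to rows 0..2.
                hits.add(ROWS[blockletId]);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Six hits are returned, but all of them are rows of block-0, each listed twice.
        for (String hit : lookupByBlockletIdOnly(2, 3)) {
            System.out.println(hit);
        }
    }
}
```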


---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3867/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4078/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5091/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    @ravipesala After studying the code, I found that we must keep a map from the unique blockletId to the DMRow pointer index.
    The relative blockletId in the previous code was generated before datamap pruning and is tied to the DMRow pointer index. After pruning, some blocks have been filtered out, so we can no longer derive the real relative blockletId from the blocks that remain.
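
A minimal sketch of such a map, assuming the `blockId#blockletId` key format named in the PR description. The class and method names are hypothetical and do not reflect the actual patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the code from this PR.
class BlockletRowIndex {

    // Key "blockId#blockletId" -> position of the blocklet's row in the row array.
    private final Map<String, Integer> rowIndexByKey = new HashMap<>();
    private int nextRow = 0;

    /** Registers a blocklet at load time (before pruning) and returns its row index. */
    public int register(String blockId, int blockletId) {
        rowIndexByKey.put(key(blockId, blockletId), nextRow);
        return nextRow++;
    }

    /** Returns the row index of a blocklet, or -1 if it was never registered. */
    public int rowIndexOf(String blockId, int blockletId) {
        return rowIndexByKey.getOrDefault(key(blockId, blockletId), -1);
    }

    private static String key(String blockId, int blockletId) {
        return blockId + "#" + blockletId;
    }

    public static void main(String[] args) {
        BlockletRowIndex index = new BlockletRowIndex();
        for (String block : new String[] {"block-0", "block-1", "block-2"}) {
            for (int blockletId = 0; blockletId < 3; blockletId++) {
                index.register(block, blockletId);
            }
        }
        // Suppose pruning keeps only block-1. Counting the survivors from zero
        // would give 0, 1, 2, but the rows were stored at positions 3, 4, 5.
        for (int blockletId = 0; blockletId < 3; blockletId++) {
            System.out.println("block-1#" + blockletId + " -> row "
                + index.rowIndexOf("block-1", blockletId));
        }
    }
}
```

The trade-off raised elsewhere in this thread still applies: keeping block names as part of the key costs memory, which numbering blocklets per index file at write time would avoid.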


---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    retest this please


---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4441/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4440/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3791/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4996/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    @xuchuanyin What issue are you actually facing? Blocklet IDs here are only virtual; they are counted across the blocklets present in the index file. If the issue is with other datamaps such as Lucene, it is better to make the blocklet order follow the index file while writing the datamap. That also saves memory and simplifies datamap writing, because the block name is no longer needed.
    Maintaining block names here is not memory efficient.


---

[GitHub] carbondata pull request #2169: [CARBONDATA-2344][DataMap] Fix bugs in mappin...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin closed the pull request at:

    https://github.com/apache/carbondata/pull/2169


---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    @ravipesala I'm confused...
    Can you check the MinMaxDataMap and run the example with/without this PR?
    
    Note that:
    1. You can first apply #2201 to fix the example error
    2. Change `lineNum` to 1000000
    3. Change the table_blocksize of `minMaxDMSampleTable` to 256
    4. Run the queries in the example
    
    Each query should return exactly 1 record, but you'll find some queries return 0 while some return 2.
    
    If it really is a problem, please show me how to fix it in MinMaxDataMap. Thanks~


---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5226/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3780/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5263/



---

[GitHub] carbondata issue #2169: [CARBONDATA-2344][DataMap] Fix bugs in mapping block...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2169
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3941/



---