Posted to issues@carbondata.apache.org by BJangir <gi...@git.apache.org> on 2018/08/24 09:40:13 UTC

[GitHub] carbondata pull request #2658: [Carbondata 2885]Broadcast Issue and Small fi...

GitHub user BJangir opened a pull request:

    https://github.com/apache/carbondata/pull/2658

    [Carbondata 2885]Broadcast Issue and Small file distribution Issue

    Issue:
    1. For an external table, the CarbonRelation sizeInBytes is wrong (always 0). As a result, join queries are selected for broadcast join even when the actual table size is greater than 10 MB (the default broadcast threshold). This causes some joins to fail: tables that should use sort-merge join are broadcast instead because of the wrong size calculation.
    
    2. If Merge_small_file task distribution is enabled, join queries (TPCH) fail because Carbon opens many carbon files but never closes them.
    
    Root Cause:
    1. The relation size is currently calculated from the tablestatus file. An external table has no tablestatus file, so zero is always returned.
    2. If Merge_small_file task distribution is enabled, Carbon opens many carbon files but does not close them.
    
    Solution:
    1. If the table is an external table, calculate the size from the TablePath.
    2. Close the carbon files once the scan is finished.
    
    
    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed?
    NA
     - [ ] Any backward compatibility impacted?
    NA
     - [ ] Document update required?
    NA
     - [ ] Testing done
         Manually tested on a 3-node cluster
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
    NA
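
As a rough illustration of the two fixes described in the pull request above (this is not the code from the patch), here is a minimal Scala sketch. It assumes a local file system and uses only the JDK's java.nio.file API rather than CarbonData or Hadoop classes. The object name ExternalTableSizeSketch and the helpers sizeInBytes, eligibleForBroadcast and scanAll are hypothetical; the 10 MB constant mirrors the default broadcast threshold mentioned in the description.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path}

object ExternalTableSizeSketch {
  // Assumption: the 10 MB default broadcast threshold mentioned in the description.
  val BroadcastThresholdBytes: Long = 10L * 1024 * 1024

  // Hypothetical helper for fix 1: sum the sizes of all regular files
  // under the table path instead of reading a tablestatus file.
  def sizeInBytes(tablePath: Path): Long = {
    val stream = Files.walk(tablePath)
    try {
      var total = 0L
      val it = stream.iterator()
      while (it.hasNext) {
        val p = it.next()
        if (Files.isRegularFile(p)) total += Files.size(p)
      }
      total
    } finally {
      stream.close() // release the directory handles once the walk is done
    }
  }

  // A table is a broadcast candidate only if its size is within the threshold.
  def eligibleForBroadcast(tablePath: Path): Boolean =
    sizeInBytes(tablePath) <= BroadcastThresholdBytes

  // Hypothetical scan loop for fix 2: every file is closed as soon as
  // its scan ends, even if the scan throws.
  def scanAll(files: Seq[Path])(scan: InputStream => Unit): Unit =
    files.foreach { f =>
      val in = Files.newInputStream(f)
      try scan(in) finally in.close()
    }
}
```

A reported size of 0 always passes the threshold check in this sketch, which is consistent with the description's observation that external tables were always picked for broadcast.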


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BJangir/incubator-carbondata CARBONDATA-2885

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2658.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2658
    
----
commit 69fe7241e0cef5d7b9a6ac9e87018b3d44dd60a0
Author: BJangir <ba...@...>
Date:   2018-08-24T09:17:49Z

    [CARBONDATA-2885] Broadcast Issue and Small file distribution Issue

----


---

[GitHub] carbondata pull request #2658: [Carbondata 2885]Broadcast Issue and Small fi...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2658


---

[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2658
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8038/



---

[GitHub] carbondata pull request #2658: [Carbondata 2885]Broadcast Issue and Small fi...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2658#discussion_r212576753
  
    --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala ---
    @@ -191,6 +191,14 @@ case class CarbonRelation(
             }
           }
         }
    +    else if (carbonTable.isExternalTable) {
    --- End diff --
    
    Add a check in the above code for normal tables. There is no need to check for the tablestatus file, as it will not be present for an external table.


---

[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2658
  
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6385/



---

[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2658
  
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6383/



---

[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2658
  
    LGTM


---

[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2658
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6761/



---