Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/03/26 10:46:52 UTC

[GitHub] [carbondata] QiangCai opened a new pull request #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue

URL: https://github.com/apache/carbondata/pull/3681
 
 
    ### Why is this PR needed?
   Spark's ReuseExchange rule cannot recognize identical Exchange plans on a carbon table.
   As a result, queries on carbon tables do not reuse Exchanges, which leads to poor performance.
   
   For example:
   
   ```
   create table t1(c1 int, c2 string) using carbondata
   
   explain
   select c2, sum(c1) from t1 group by c2
   union all
   select c2, sum(c1) from t1 group by c2
   ```
   The physical plan is as follows:
   ```
   Union
   :- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   :  +- Exchange hashpartitioning(c2#37, 200)
   :     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
   :        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
   +- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
      +- Exchange hashpartitioning(c2#37, 200)
         +- *(3) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
            +- *(3) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
   ```
   
   After the change, the physical plan is as follows:
   
   ```
   Union
   :- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   :  +- Exchange hashpartitioning(c2#37, 200)
   :     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
   :        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
   +- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
      +- ReusedExchange [c2#37, sum#54L], Exchange hashpartitioning(c2#37, 200)
   ```
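   The reuse step can be sketched with a toy model. This is a simplified, hypothetical Scala illustration, not Spark's implementation. The Plan, Scan, Aggregate, Exchange, ReusedExchange and Union classes and the ToyReuseExchange and ReuseDemo objects are invented for this example. Spark's real ReuseExchange rule compares canonicalized physical plans with sameResult, and plain case-class equality stands in for that check here.

   ```scala
   import scala.collection.mutable.ArrayBuffer

   // Hypothetical toy plan nodes: simplified stand-ins, not Spark's SparkPlan classes.
   sealed trait Plan
   final case class Scan(table: String, columns: Seq[String]) extends Plan
   final case class Aggregate(keys: Seq[String], child: Plan) extends Plan
   final case class Exchange(partitionKeys: Seq[String], child: Plan) extends Plan
   final case class ReusedExchange(original: Exchange) extends Plan
   final case class Union(children: Seq[Plan]) extends Plan

   object ToyReuseExchange {
     // Rewrites the plan bottom-up. An Exchange equal to one already kept is
     // replaced by a ReusedExchange that points at the kept one.
     def apply(plan: Plan): Plan = {
       val kept = ArrayBuffer.empty[Exchange]
       def rewrite(p: Plan): Plan = p match {
         case Union(children)    => Union(children.map(rewrite))
         case Aggregate(keys, c) => Aggregate(keys, rewrite(c))
         case e @ Exchange(_, c) =>
           val ex = e.copy(child = rewrite(c))
           // Case-class equality stands in for Spark's sameResult check.
           kept.find(_ == ex) match {
             case Some(prev) => ReusedExchange(prev)
             case None       => kept += ex; ex
           }
         case other => other
       }
       rewrite(plan)
     }
   }

   object ReuseDemo {
     def main(args: Array[String]): Unit = {
       // One branch of: select c2, sum(c1) from t1 group by c2
       val branch = Aggregate(Seq("c2"),
         Exchange(Seq("c2"), Aggregate(Seq("c2"), Scan("t1", Seq("c1", "c2")))))
       println(ToyReuseExchange(Union(Seq(branch, branch))))
     }
   }
   ```

   When the rule is applied to a UNION ALL of two equal branches, the second branch's Exchange becomes a ReusedExchange. This matches the shape of the "after change" plan. If anything inside the scan fails to compare equal, neither Exchange is replaced.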
   
   
    ### What changes were proposed in this PR?
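   Change the CarbonFileIndex class to a case class, so that two instances created with equal constructor arguments compare as equal. This lets Spark's ReuseExchange rule recognize the two scans of the same carbon table as identical.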
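   Plain Scala equality shows why the switch to a case class matters. This is a hedged sketch. PlainFileIndex, CaseFileIndex, Relation, IndexEqualityDemo and the path /warehouse/t1 are hypothetical stand-ins, not CarbonData classes. The sketch assumes that the scanned relation is a case class holding its file index, as Spark's HadoopFsRelation holds its location: FileIndex. Under that assumption, the relation's equality depends on the index's equality.

   ```scala
   // Hypothetical stand-ins: neither class is CarbonData's real CarbonFileIndex.
   class PlainFileIndex(val tablePath: String, val filters: Seq[String])
   final case class CaseFileIndex(tablePath: String, filters: Seq[String])

   // Mirrors how a case-class relation (like Spark's HadoopFsRelation) holds its index.
   final case class Relation(location: AnyRef, columns: Seq[String])

   object IndexEqualityDemo {
     // Two scans of the same table, each building its own index instance.
     val plainA = Relation(new PlainFileIndex("/warehouse/t1", Nil), Seq("c1", "c2"))
     val plainB = Relation(new PlainFileIndex("/warehouse/t1", Nil), Seq("c1", "c2"))
     val caseA  = Relation(CaseFileIndex("/warehouse/t1", Nil), Seq("c1", "c2"))
     val caseB  = Relation(CaseFileIndex("/warehouse/t1", Nil), Seq("c1", "c2"))

     def main(args: Array[String]): Unit = {
       println(plainA == plainB)                 // false: the index uses reference equality
       println(caseA == caseB)                   // true: fields are compared one by one
       println(caseA.hashCode == caseB.hashCode) // true: hash derived from the same fields
     }
   }
   ```

   A regular class inherits reference equality from AnyRef. Two index objects built for the same table therefore never compare as equal, and neither do the relations and plans that contain them. A case class generates equals and hashCode from its constructor fields.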
   
    ### Does this PR introduce any user interface change?
    - No
   
    ### Is any new testcase added?
    - Yes
   
       
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [carbondata] QiangCai commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604764895
 
 
   @ajantha-bhat It also impacts carbondata tables.

[GitHub] [carbondata] asfgit closed pull request #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/carbondata/pull/3681
 
 
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604413810
 
 
   Build success with Spark 2.3.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2566/
   

[GitHub] [carbondata] CarbonDataQA1 commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604422173
 
 
   Build success with Spark 2.4.4. Please check CI: http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/858/
   

[GitHub] [carbondata] ajantha-bhat commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue

Posted by GitBox <gi...@apache.org>.
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604483281
 
 
   LGTM
   
   Good finding. Is this applicable only to fileFormat?
