Posted to issues@carbondata.apache.org by GitBox <gi...@apache.org> on 2020/03/26 10:46:52 UTC
[GitHub] [carbondata] QiangCai opened a new pull request #3681:
[CARBONDATA-3752] Reuse Exchange to fix performance issue
QiangCai opened a new pull request #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681
### Why is this PR needed?
Spark's ReusedExchange rule cannot recognize identical Exchange plans on a carbon table.
As a result, queries on the carbon table do not reuse the Exchange, which leads to poor performance.
For example:
```
create table t1(c1 int, c2 string) using carbondata;
explain
select c2, sum(c1) from t1 group by c2
union all
select c2, sum(c1) from t1 group by c2;
```
Before the change, the physical plan is as follows:
```
Union
:- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
:  +- Exchange hashpartitioning(c2#37, 200)
:     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
:        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
+- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   +- Exchange hashpartitioning(c2#37, 200)
      +- *(3) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
         +- *(3) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
```
After the change, the physical plan is as follows:
```
Union
:- *(2) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
:  +- Exchange hashpartitioning(c2#37, 200)
:     +- *(1) HashAggregate(keys=[c2#37], functions=[partial_sum(cast(c1#36 as bigint))])
:        +- *(1) FileScan carbondata default.t1[c1#36,c2#37] ReadSchema: struct<c1:int,c2:string>
+- *(4) HashAggregate(keys=[c2#37], functions=[sum(cast(c1#36 as bigint))])
   +- ReusedExchange [c2#37, sum#54L], Exchange hashpartitioning(c2#37, 200)
```
### What changes were proposed in this PR?
Change the CarbonFileIndex class to a case class.
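The PR does not spell out why a case class helps. A plausible reading is that a case class compares by field values rather than by object identity, so two scans of the same table can be recognized as the same plan. The sketch below illustrates only that idea, in Python as a language-neutral stand-in. Every name in it (`PlainFileIndex`, `ValueFileIndex`, `Exchange`, `count_shuffles`, the `/warehouse/t1` path) is hypothetical; none of it is CarbonData or Spark code.

```python
from dataclasses import dataclass


class PlainFileIndex:
    """Hypothetical stand-in for a regular class: equality is object identity."""

    def __init__(self, table_path: str) -> None:
        self.table_path = table_path


@dataclass(frozen=True)
class ValueFileIndex:
    """Hypothetical stand-in for a case class: equality and hash use the fields."""

    table_path: str


@dataclass(frozen=True)
class Exchange:
    """Hypothetical shuffle node, described by its partition key and input scan."""

    partition_key: str
    file_index: object


def count_shuffles(exchanges):
    """Count the shuffles that actually run when equal exchanges are reused."""
    distinct = set()
    for exchange in exchanges:
        # Adding an exchange equal to one already present changes nothing,
        # which models reusing the earlier exchange.
        distinct.add(exchange)
    return len(distinct)


# Two branches of a union over the same table, each with its own index object.
plain = [Exchange("c2", PlainFileIndex("/warehouse/t1")) for _ in range(2)]
value = [Exchange("c2", ValueFileIndex("/warehouse/t1")) for _ in range(2)]

print(count_shuffles(plain))  # 2: identity-based equality, no reuse
print(count_shuffles(value))  # 1: value-based equality, second exchange reused
```

With identity-based equality the two branches never compare equal, which matches the first plan above, where both branches carry their own Exchange. With value-based equality the second branch collapses onto the first, as in the plan after the change.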
### Does this PR introduce any user interface change?
- No
### Is any new testcase added?
- Yes
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
[GitHub] [carbondata] QiangCai commented on issue #3681: [CARBONDATA-3752]
Reuse Exchange to fix performance issue
Posted by GitBox <gi...@apache.org>.
QiangCai commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604764895
@ajantha-bhat It also impacts carbondata tables.
[GitHub] [carbondata] asfgit closed pull request #3681: [CARBONDATA-3752]
Reuse Exchange to fix performance issue
Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3681:
[CARBONDATA-3752] Reuse Exchange to fix performance issue
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604413810
Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/2566/
[GitHub] [carbondata] CarbonDataQA1 commented on issue #3681:
[CARBONDATA-3752] Reuse Exchange to fix performance issue
Posted by GitBox <gi...@apache.org>.
CarbonDataQA1 commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604422173
Build Success with Spark 2.4.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/858/
[GitHub] [carbondata] ajantha-bhat commented on issue #3681:
[CARBONDATA-3752] Reuse Exchange to fix performance issue
Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on issue #3681: [CARBONDATA-3752] Reuse Exchange to fix performance issue
URL: https://github.com/apache/carbondata/pull/3681#issuecomment-604483281
LGTM
Good finding. Is this applicable only to fileFormat?