Posted to issues@carbondata.apache.org by BJangir <gi...@git.apache.org> on 2018/08/24 09:40:13 UTC
[GitHub] carbondata pull request #2658: [Carbondata 2885]Broadcast Issue and Small fi...
GitHub user BJangir opened a pull request:
https://github.com/apache/carbondata/pull/2658
[Carbondata 2885]Broadcast Issue and Small file distribution Issue
Issue :-
1. For an external table, the Carbon relation's sizeInBytes is wrong (always 0). Because of this, join queries are selected for broadcast join even when the actual table size is > 10MB (the default broadcast threshold). This makes some join queries fail: tables that should use sort-merge join go for broadcast join because of the wrong size calculation.
2. If Merge_small_file task distribution is enabled, join queries fail (TPCH). Carbon opens many carbon files but does not close them.
Root Cause :-
1. The current relation size calculation is based on the tablestatus file. An external table does not have a tablestatus file, so zero was always returned.
2. If Merge_small_file task distribution is enabled, carbon opens many carbon files but does not close them.
Solution :-
1. If the table is an external table, calculate the size from the table path.
2. Close the carbon files once the scan is finished.
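A minimal sketch of the idea behind solution 1, not the code in this pull request: it sums the file sizes under a table path and compares the result with a 10 MB broadcast threshold. The object name `ExternalTableSizeSketch` and both method names are hypothetical, the sketch uses `java.nio.file` on a local path (the real change would presumably go through the Hadoop file system API), and the `<=` comparison is an assumption about how Spark applies its threshold. The directory stream is closed in a `finally` block, in the spirit of solution 2.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper, for illustration only; not part of CarbonData.
object ExternalTableSizeSketch {

  // 10 MB, the default broadcast threshold mentioned in the description.
  val DefaultBroadcastThresholdBytes: Long = 10L * 1024 * 1024

  // Sums the sizes of all regular files under tablePath, recursively.
  // Returns 0 if the path does not exist.
  def sizeInBytes(tablePath: Path): Long = {
    if (!Files.exists(tablePath)) {
      0L
    } else {
      val stream = Files.walk(tablePath)
      try {
        stream
          .filter((p: Path) => Files.isRegularFile(p))
          .mapToLong((p: Path) => Files.size(p))
          .sum()
      } finally {
        // Release the directory handles as soon as the walk is done.
        stream.close()
      }
    }
  }

  // Assumption: a relation qualifies for broadcast when its reported size
  // does not exceed the threshold, so a size of 0 always qualifies.
  def eligibleForBroadcast(
      size: Long,
      threshold: Long = DefaultBroadcastThresholdBytes): Boolean =
    size <= threshold
}
```

With a reported size of 0, `eligibleForBroadcast` is always true, which matches the issue described above; once the size comes from the table path, a table larger than the threshold is no longer eligible.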
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
- [ ] Any interfaces changed?
NA
- [ ] Any backward compatibility impacted?
NA
- [ ] Document update required?
NA
- [ ] Testing done
Manually tested on a 3-node cluster
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
NA
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/BJangir/incubator-carbondata CARBONDATA-2885
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2658.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2658
----
commit 69fe7241e0cef5d7b9a6ac9e87018b3d44dd60a0
Author: BJangir <ba...@...>
Date: 2018-08-24T09:17:49Z
[CARBONDATA-2885] Broadcast Issue and Small file distribution Issue
----
---
[GitHub] carbondata pull request #2658: [Carbondata 2885]Broadcast Issue and Small fi...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/carbondata/pull/2658
---
[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2658
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/8038/
---
[GitHub] carbondata pull request #2658: [Carbondata 2885]Broadcast Issue and Small fi...
Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2658#discussion_r212576753
--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala ---
@@ -191,6 +191,14 @@ case class CarbonRelation(
}
}
}
+ else if (carbonTable.isExternalTable) {
--- End diff --
Add the check in the above code for the normal table; there is no need to check for the tablestatus file, as an external table will not have one.
---
[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2658
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6385/
---
[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2658
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/6383/
---
[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...
Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:
https://github.com/apache/carbondata/pull/2658
LGTM
---
[GitHub] carbondata issue #2658: [Carbondata 2885]Broadcast Issue and Small file dist...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2658
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/6761/
---