You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by xuchuanyin <gi...@git.apache.org> on 2017/12/04 10:48:10 UTC
[GitHub] carbondata pull request #1606: WIP:[CARBONDATA-1839] Fix bugs in compressing...
GitHub user xuchuanyin opened a pull request:
https://github.com/apache/carbondata/pull/1606
WIP:[CARBONDATA-1839] Fix bugs in compressing sort temp files
Be sure to do all of the following checklist to help us incorporate
your contribution quickly and easily:
- [X] Any interfaces changed?
`YES, ONLY CHANGE INTERNAL INTERFACES`
- [X] Any backward compatibility impacted?
`NO`
- [X] Document update required?
`YES`
- [X] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
`ADDED TESTS`
- How it is tested? Please attach test report.
`TESTED IN LOCAL CLUSTER`
- Is it a performance related change? Please attach the performance test report.
`YES`
- Any additional information to help reviewers in testing this change.
`There are some duplicate code in write temp sort files found during this bug fixing and I plan to optimize it in successive PR not in this one.`
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
`NOT RELATED`
RESOLVE
===
1. fix bugs in compressing sort temp files
2. optimize parameters
3. add tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xuchuanyin/carbondata bug_compress_sort_temp
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/1606.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1606
----
commit 83e109b6d332dafcc10e243e8bc5a21ac3617ac6
Author: xuchuanyin <xu...@hust.edu.cn>
Date: 2017-12-01T11:32:00Z
fix bugs in compressing sort temp files
1. fix bugs in compressing sort temp files
2. optimize parameters
3. add tests
----
---
[GitHub] carbondata issue #1606: WIP:[CARBONDATA-1839] Fix bugs in compressing sort t...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1606
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1674/
---
[GitHub] carbondata issue #1606: [CARBONDATA-1839] Fix bugs in compressing sort temp ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1606
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/429/
---
[GitHub] carbondata pull request #1606: [CARBONDATA-1839] [DataLoad]Fix bugs in compr...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1606#discussion_r155000018
--- Diff: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/TableFieldStat.java ---
@@ -0,0 +1,139 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.processing.sort.sortdata;
+
+import java.util.Objects;
+
+import org.apache.carbondata.core.metadata.datatype.DataType;
+
+public class TableFieldStat {
--- End diff --
Add comment
---
[GitHub] carbondata issue #1606: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1606
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1714/
---
[GitHub] carbondata issue #1606: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1606
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1757/
---
[GitHub] carbondata issue #1606: WIP:[CARBONDATA-1839] Fix bugs in compressing sort t...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1606
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2061/
---
[GitHub] carbondata issue #1606: [CARBONDATA-1839] Fix bugs in compressing sort temp ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1606
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/1700/
---
[GitHub] carbondata issue #1606: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1606
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/495/
---
[GitHub] carbondata issue #1606: [CARBONDATA-1839] Fix bugs in compressing sort temp ...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1606
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2085/
---
[GitHub] carbondata issue #1606: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/1606
Is the rework completed for this PR?
---
[GitHub] carbondata pull request #1606: WIP: [CARBONDATA-1839] [DataLoad]Fix bugs in ...
Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin closed the pull request at:
https://github.com/apache/carbondata/pull/1606
---
[GitHub] carbondata issue #1606: [CARBONDATA-1839] Fix bugs in compressing sort temp ...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1606
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2095/
---
[GitHub] carbondata issue #1606: [CARBONDATA-1839] Fix bugs in compressing sort temp ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1606
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/445/
---
[GitHub] carbondata issue #1606: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...
Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1606
@jackylk Thanks for your review, since another PR #1594 has been merged, I'll optimize this PR to reduce duplicate codes in writing sort temp files.
---
[GitHub] carbondata pull request #1606: [CARBONDATA-1839] [DataLoad]Fix bugs in compr...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1606#discussion_r154999809
--- Diff: processing/src/main/java/org/apache/carbondata/processing/sort/sortdata/SortTempFileChunkHolder.java ---
@@ -381,6 +351,51 @@ private void fillDataForPrefetch() {
return holder;
}
+ private Object[][] getBatchedRowFromStream(int expected) throws CarbonSortKeyAndGroupByException {
--- End diff --
please add comment for this function
---
[GitHub] carbondata issue #1606: WIP: [CARBONDATA-1839] [DataLoad]Fix bugs in compres...
Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1606
@jackylk I close this PR and raise #1632 instead.
---
[GitHub] carbondata issue #1606: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1606
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2141/
---
[GitHub] carbondata pull request #1606: [CARBONDATA-1839] [DataLoad]Fix bugs in compr...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1606#discussion_r154999120
--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/ParallelReadMergeSorterImpl.java ---
@@ -80,11 +81,7 @@ public void initialize(SortParameters sortParameters) {
File.separator, CarbonCommonConstants.SORT_TEMP_FILE_LOCATION);
finalMerger =
new SingleThreadFinalSortFilesMerger(dataFolderLocations, sortParameters.getTableName(),
- sortParameters.getDimColCount(),
- sortParameters.getComplexDimColCount(), sortParameters.getMeasureColCount(),
- sortParameters.getNoDictionaryCount(), sortParameters.getMeasureDataType(),
- sortParameters.getNoDictionaryDimnesionColumn(),
- sortParameters.getNoDictionarySortColumn());
+ new TableFieldStat(sortParameters));
--- End diff --
I think the indentation is not correct
---
[GitHub] carbondata issue #1606: WIP:[CARBONDATA-1839] Fix bugs in compressing sort t...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1606
Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/401/
---
[GitHub] carbondata pull request #1606: [CARBONDATA-1839] [DataLoad]Fix bugs in compr...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1606#discussion_r154999525
--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/sort/unsafe/holder/UnsafeSortTempFileChunkHolder.java ---
@@ -352,6 +330,49 @@ private void fillDataForPrefetch() {
}
}
+ private Object[][] getBatchedRowFromStream(int expected) throws CarbonSortKeyAndGroupByException {
--- End diff --
please add comment for this function
---
[GitHub] carbondata issue #1606: [CARBONDATA-1839] [DataLoad]Fix bugs in compressing ...
Posted by xuchuanyin <gi...@git.apache.org>.
Github user xuchuanyin commented on the issue:
https://github.com/apache/carbondata/pull/1606
No, it is a little complex to reduce the duplicate code. I'll make a full test as possible as I can.
---