Posted to issues@carbondata.apache.org by QiangCai <gi...@git.apache.org> on 2017/12/15 14:12:39 UTC
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
GitHub user QiangCai opened a pull request:
https://github.com/apache/carbondata/pull/1669
[CARBONDATA-1880] Combine input small files for GLOBAL_SORT
Combine small input files for GLOBAL_SORT loads to avoid the carbon small-file issue.
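As a rough illustration of what "combining small input files" can mean, the sketch below greedily packs file descriptors into partitions bounded by a byte limit, so that many small files are read by one task instead of one task per file. It is not the code from this pull request (the diff indicates that the change relies on Spark's FileScanRDD); the class `SmallFileCombiner`, the `InputFile` descriptor and the `maxPartitionBytes` parameter are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SmallFileCombiner {

    /** Hypothetical descriptor for one input file: its path and length in bytes. */
    static final class InputFile {
        final String path;
        final long length;

        InputFile(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /**
     * Greedily packs files into partitions of at most maxPartitionBytes each.
     * A single file larger than the limit still gets a partition of its own.
     */
    static List<List<InputFile>> combine(List<InputFile> files, long maxPartitionBytes) {
        // Sort a copy, largest first, so big files open their own partitions early.
        List<InputFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((InputFile f) -> f.length).reversed());

        List<List<InputFile>> partitions = new ArrayList<>();
        List<InputFile> current = new ArrayList<>();
        long currentBytes = 0L;

        for (InputFile f : sorted) {
            // Close the current partition if adding this file would exceed the limit.
            if (!current.isEmpty() && currentBytes + f.length > maxPartitionBytes) {
                partitions.add(current);
                current = new ArrayList<>();
                currentBytes = 0L;
            }
            current.add(f);
            currentBytes += f.length;
        }
        if (!current.isEmpty()) {
            partitions.add(current);
        }
        return partitions;
    }
}
```

With files of 10, 20, 30 and 100 bytes and a 64-byte limit, this sketch yields two partitions: one holding the 100-byte file and one holding the three small files. Fewer, larger input partitions mean fewer tasks, which is the general idea behind avoiding many small output files.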
- [x] Any interfaces changed?
no
- [x] Any backward compatibility impacted?
yes
- [x] Document update required?
no
- [x] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
added
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
- [x] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/QiangCai/carbondata small_files
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/1669.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1669
----
commit 1d8963f94315242282ca3dfd9de5f52b84720569
Author: QiangCai <qi...@qq.com>
Date: 2017-12-15T14:02:28Z
combine input small files
----
---
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/carbondata/pull/1669
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/795/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1669
retest sdv please
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1669
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2336/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1669
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2421/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2136/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2124/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1669
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2409/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on the issue:
https://github.com/apache/carbondata/pull/1669
retest this please
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/912/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1669
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2438/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2142/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2151/
---
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1669#discussion_r157696911
--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -1277,6 +1277,10 @@
public static final String CARBON_CUSTOM_BLOCK_DISTRIBUTION = "carbon.custom.block.distribution";
public static final String CARBON_CUSTOM_BLOCK_DISTRIBUTION_DEFAULT = "false";
+ @CarbonProperty
+ public static final String CARBON_COMBINE_SMALL_INPUT_FILES = "carbon.combine.small.input.files";
--- End diff --
fixed
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/939/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2129/
---
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1669#discussion_r157356369
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ---
@@ -160,4 +162,112 @@ object DataLoadProcessBuilderOnSpark {
Array((uniqueLoadStatusId, (loadMetadataDetails, executionErrors)))
}
}
+
+ /**
+ * use FileScanRDD to read input csv files
--- End diff --
Change the comment to say that this function creates an RDD that reads multiple CSV files.
---
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1669#discussion_r157356331
--- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java ---
@@ -1277,6 +1277,10 @@
public static final String CARBON_CUSTOM_BLOCK_DISTRIBUTION = "carbon.custom.block.distribution";
public static final String CARBON_CUSTOM_BLOCK_DISTRIBUTION_DEFAULT = "false";
+ @CarbonProperty
+ public static final String CARBON_COMBINE_SMALL_INPUT_FILES = "carbon.combine.small.input.files";
--- End diff --
change to `carbon.mergeSmallFileIO.enable`
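
Whatever key name is finally chosen, the switch is a boolean flag read from configuration. A minimal sketch of how such a flag could be resolved, using plain `java.util.Properties` rather than CarbonData's own property classes: the key string is the one in the diff above, while the `CombineSmallFilesConfig` class, the `isCombineEnabled` helper and the `"false"` default are assumptions made for this example (the default simply mirrors the neighbouring `CARBON_CUSTOM_BLOCK_DISTRIBUTION_DEFAULT`).

```java
import java.util.Properties;

class CombineSmallFilesConfig {

    // Key as it appears in the diff under review; the reviewer proposes another name.
    static final String CARBON_COMBINE_SMALL_INPUT_FILES = "carbon.combine.small.input.files";

    // Assumed default for this sketch only; the diff excerpt does not show the real one.
    static final String CARBON_COMBINE_SMALL_INPUT_FILES_DEFAULT = "false";

    /** Returns true only when the property is present and parses as "true" (case-insensitive). */
    static boolean isCombineEnabled(Properties props) {
        String value = props.getProperty(
            CARBON_COMBINE_SMALL_INPUT_FILES, CARBON_COMBINE_SMALL_INPUT_FILES_DEFAULT);
        return Boolean.parseBoolean(value.trim());
    }
}
```

Keeping the key and its default as adjacent constants means a later rename touches one place, and callers depend only on the helper.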
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/901/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/922/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2154/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2168/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1669
SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2440/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1669
retest sdv please
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Failed with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/909/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1669
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2441/
---
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1669#discussion_r157356347
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ---
@@ -49,29 +59,21 @@ object DataLoadProcessBuilderOnSpark {
private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
def loadDataUsingGlobalSort(
- sc: SparkContext,
+ sqlContext: SQLContext,
--- End diff --
It would be better to use sparkSession instead of sqlContext.
---
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1669#discussion_r157356409
--- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/package.scala ---
@@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.internal.SessionState
+
+package object command {
+ def sessionState(sparkSession: SparkSession): SessionState = sparkSession.sessionState
--- End diff --
Is this required?
---
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1669#discussion_r157697005
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ---
@@ -160,4 +162,112 @@ object DataLoadProcessBuilderOnSpark {
Array((uniqueLoadStatusId, (loadMetadataDetails, executionErrors)))
}
}
+
+ /**
+ * use FileScanRDD to read input csv files
--- End diff --
fixed
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on the issue:
https://github.com/apache/carbondata/pull/1669
retest this please
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/918/
---
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1669#discussion_r157697045
--- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/package.scala ---
@@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.internal.SessionState
+
+package object command {
+ def sessionState(sparkSession: SparkSession): SessionState = sparkSession.sessionState
--- End diff --
change to SparkSQLUtil
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Success with Spark 2.2.0, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/925/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1669
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/2015/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1669
SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/2452/
---
[GitHub] carbondata issue #1669: [CARBONDATA-1880] Combine input small files for GLOB...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on the issue:
https://github.com/apache/carbondata/pull/1669
retest this please
---
[GitHub] carbondata pull request #1669: [CARBONDATA-1880] Combine input small files f...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1669#discussion_r157696968
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala ---
@@ -49,29 +59,21 @@ object DataLoadProcessBuilderOnSpark {
private val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName)
def loadDataUsingGlobalSort(
- sc: SparkContext,
+ sqlContext: SQLContext,
--- End diff --
fixed
---