Posted to issues@carbondata.apache.org by jackylk <gi...@git.apache.org> on 2018/02/11 09:47:21 UTC
[GitHub] carbondata pull request #1970: [CARBONDATA-2159] Remove carbon-spark depende...
GitHub user jackylk opened a pull request:
https://github.com/apache/carbondata/pull/1970
[CARBONDATA-2159] Remove carbon-spark dependency in store-sdk module
The store-sdk module should not depend on the carbon-spark module.
This PR changes:
1. A `Maps` utility is added to provide a `getOrDefault` method and avoid a JDK 8 dependency
2. `CarbonLoadModelBuilder` is added to build `CarbonLoadModel`
3. `DataLoadingUtil.scala` and `ValidateUtil.scala` are converted to Java implementations and moved into `CarbonLoadModelBuilder` in the processing module
After all these changes, the carbon-spark dependency can be removed from the store-sdk module.
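The `Maps` utility mentioned in point 1 is essentially a backport of JDK 8's `Map.getOrDefault` so the code can still target JDK 7. A minimal sketch of what such a utility might look like (the class and method names come from the PR description; the body is an assumption):

```java
import java.util.*;

// Hypothetical sketch of the Maps utility described in the PR: it
// reproduces JDK 8's Map.getOrDefault so modules can still target JDK 7.
final class Maps {

  private Maps() { }  // static utility, no instances

  // Return the value mapped to key, or defaultValue if the key is absent.
  // An explicit null stored under the key is returned as-is, matching the
  // JDK 8 Map.getOrDefault contract.
  static <K, V> V getOrDefault(Map<K, V> map, K key, V defaultValue) {
    if (map.containsKey(key)) {
      return map.get(key);
    }
    return defaultValue;
  }
}
```

This matches how it is used in the quoted diffs below, e.g. `Maps.getOrDefault(options, "delimiter", ",")`.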
- [X] Any interfaces changed?
No
- [X] Any backward compatibility impacted?
No
- [X] Document update required?
No
- [X] Testing done
No functionality is added
- [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
NA
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jackylk/incubator-carbondata sdk-remove-spark-dependency
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/1970.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1970
----
commit 952665a8c1c52f28951463fef989333ae0e6d83e
Author: Jacky Li <ja...@...>
Date: 2018-01-06T12:28:44Z
[CARBONDATA-1992] Remove partitionId in CarbonTablePath
In CarbonTablePath there is a deprecated partition id that is always 0; it should be removed to avoid confusion.
This closes #1765
commit 111c3821557820241d1114d87eae2f7cd017e610
Author: Jacky Li <ja...@...>
Date: 2018-01-02T15:46:14Z
[CARBONDATA-1968] Add external table support
This PR adds support for creating external table with existing carbondata files, using Hive syntax.
CREATE EXTERNAL TABLE tableName STORED BY 'carbondata' LOCATION 'path'
This closes #1749
commit 80b42ac662ebd2bc243ca91c86b035717223daf4
Author: SangeetaGulia <sa...@...>
Date: 2017-09-21T09:26:26Z
[CARBONDATA-1827] S3 Carbon Implementation
1.Provide support for s3 in carbondata.
2.Added S3Example to create carbon table on s3.
3.Added S3CSVExample to load carbon table using csv from s3.
This closes #1805
commit 71c2d8ca4a3212cff1eedbe78ee03e521f57fbbc
Author: Jacky Li <ja...@...>
Date: 2018-01-31T16:25:31Z
[REBASE] Solve conflict after rebasing master
commit 15b4e192ee904a2e7c845ac67e0fcf1ba151a683
Author: Jacky Li <ja...@...>
Date: 2018-01-30T13:24:04Z
[CARBONDATA-2099] Refactor query scan process to improve readability
Unified concepts in scan process flow:
1. QueryModel contains all parameters for the scan; it is created by an API in CarbonTable. (In the future, CarbonTable will be the entry point for various table operations.)
2. Use the term ColumnChunk to represent one column in one blocklet, and use ChunkIndex in the reader to read a specified column chunk.
3. Use the term ColumnPage to represent one page in one ColumnChunk.
4. QueryColumn => ProjectionColumn, indicating it is for projection.
This closes #1874
commit c3e99681bcd397ed33bc90e8d73b1fd33e0e60f7
Author: Jacky Li <ja...@...>
Date: 2018-01-31T08:14:27Z
[CARBONDATA-2025] Unify all path construction through CarbonTablePath static method
Refactor CarbonTablePath:
1. Remove CarbonStorePath and use CarbonTablePath only.
2. Make CarbonTablePath a utility class with static methods only; avoiding object creation before use keeps the code cleaner and reduces GC pressure.
This closes #1768
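The utility style this commit describes can be sketched as follows. This is an illustration, not the real class: the directory and file names below are assumptions, though the quoted diff later in this thread does call `CarbonTablePath.getTableStatusFilePath(identifier.getTablePath())` in exactly this static-method style.

```java
// Illustrative sketch of a path utility in the style the commit describes:
// every path is derived through static methods on one class, so no path
// object is ever allocated. Names below are assumptions for illustration.
final class TablePathUtil {

  private static final String METADATA_DIR = "Metadata";
  private static final String TABLE_STATUS_FILE = "tablestatus";

  private TablePathUtil() { }  // no instances: static utility only

  static String getMetadataPath(String tablePath) {
    return tablePath + "/" + METADATA_DIR;
  }

  static String getTableStatusFilePath(String tablePath) {
    return getMetadataPath(tablePath) + "/" + TABLE_STATUS_FILE;
  }
}
```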
commit e502c59a2d0b95d80db3aff04c749654254eadbe
Author: Jatin <ja...@...>
Date: 2018-01-25T11:23:00Z
[CARBONDATA-2080] [S3-Implementation] Propagated hadoopConf from driver to executor for s3 implementation in cluster mode.
Problem: hadoopConf was not being propagated from the driver to the executors, which is why loads were failing in a distributed environment.
Solution: set the Hadoop conf in the base class CarbonRDD.
How to verify this PR:
Execute a load in cluster mode with an s3 location; it should succeed.
This closes #1860
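The underlying issue is that Hadoop's `Configuration` is not `java.io.Serializable`, so a plain RDD field holding it never reaches the executors; carrying the entries in a serializable wrapper (set once in the RDD base class, as this commit does for CarbonRDD) survives the driver-to-executor hop. A dependency-free sketch of that pattern, with made-up names:

```java
import java.io.*;
import java.util.*;

// Illustrative sketch of the fix's pattern: wrap configuration entries in a
// serializable carrier so they survive shipping to executors. ConfCarrier
// is a hypothetical name, not a CarbonData or Spark class.
class ConfCarrier implements Serializable {

  private final Map<String, String> entries = new HashMap<>();

  void set(String key, String value) { entries.put(key, value); }

  String get(String key) { return entries.get(key); }

  // Simulate the driver -> executor hop by serializing and deserializing.
  static ConfCarrier roundTrip(ConfCarrier conf) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
        out.writeObject(conf);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
        return (ConfCarrier) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new RuntimeException(e);
    }
  }
}
```

Spark itself ships a similar internal helper for exactly this purpose; the sketch only shows why a serializable wrapper fixes the propagation.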
commit cae74a8cecea74e8899a87dcb7d12e0dec1b8069
Author: sounakr <so...@...>
Date: 2017-09-28T10:51:05Z
[CARBONDATA-1480]Min Max Index Example for DataMap
DataMap example: an implementation of a Min Max index through the DataMap interface, using the index while pruning.
This closes #1359
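The idea behind a min/max index can be shown with a small sketch: store the min and max of a column per blocklet, and during pruning skip any blocklet whose range cannot satisfy the filter. The class below is illustrative only and does not use CarbonData's actual DataMap API.

```java
import java.util.*;

// Hedged sketch of min/max pruning: one {min, max} pair per blocklet;
// prune(value) keeps only blocklets whose range could contain the value.
class MinMaxIndex {

  private final List<int[]> ranges = new ArrayList<>();  // {min, max} per blocklet

  void addBlocklet(int min, int max) {
    ranges.add(new int[]{min, max});
  }

  // Return indexes of blocklets that may contain `value` (equality filter).
  List<Integer> prune(int value) {
    List<Integer> hits = new ArrayList<>();
    for (int i = 0; i < ranges.size(); i++) {
      int[] r = ranges.get(i);
      if (value >= r[0] && value <= r[1]) {
        hits.add(i);
      }
    }
    return hits;
  }
}
```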
commit e972fd3d5cc8f392d47ca111b2d8f262edb29ac6
Author: ravipesala <ra...@...>
Date: 2017-11-15T14:18:40Z
[CARBONDATA-1544][Datamap] Datamap FineGrain implementation
Implemented interfaces for the FG datamap and integrated with the filter scanner to use the pruned bitset from the FG datamap.
The FG query flow is as follows:
1. The user can add an FG datamap to any table and implement its interfaces.
2. Any filter query that hits a table with a datamap will call the prune method of the FG datamap.
3. The prune method of the FG datamap returns a list of FineGrainBlocklet; these blocklets contain block, blocklet, page and row id information.
4. The pruned blocklets are internally written to a file, and only the block, blocklet and file path information is returned as part of the splits.
5. Based on the splits, the scan RDD schedules the tasks.
6. In the filter scanner we check the datamap writer path from the split and read the bitset if it exists, then pass this bitset as input to the scan.
This closes #1471
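The flow above can be sketched with a toy prune interface. Names here are illustrative assumptions, not the actual CarbonData FG datamap API; the point is that the datamap returns blocklet hits that also carry row-level positions.

```java
import java.util.*;

// Hedged sketch of fine-grained pruning: the query asks the datamap to
// prune for a filter value, and gets back blocklet hits carrying row ids.
interface FineGrainDataMap {
  List<Blocklet> prune(int filterValue);
}

class Blocklet {
  final String blockPath;
  final int blockletId;
  final List<Integer> rowIds;  // row-level hits inside the blocklet

  Blocklet(String blockPath, int blockletId, List<Integer> rowIds) {
    this.blockPath = blockPath;
    this.blockletId = blockletId;
    this.rowIds = rowIds;
  }
}

// Toy implementation: an inverted index from value -> (blocklet, rows).
class ToyFgDataMap implements FineGrainDataMap {
  private final Map<Integer, Blocklet> index = new HashMap<>();

  void put(int value, Blocklet hit) {
    index.put(value, hit);
  }

  @Override
  public List<Blocklet> prune(int filterValue) {
    Blocklet hit = index.get(filterValue);
    return hit == null ? Collections.<Blocklet>emptyList()
                       : Collections.singletonList(hit);
  }
}
```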
commit cd7eed66bdd7b0044953cb5bf037f6cce38c9e12
Author: xuchuanyin <xu...@...>
Date: 2018-02-08T07:39:45Z
[HotFix][CheckStyle] Fix import related checkstyle
This closes #1952
commit de92ea9a123b17d903f2d1d4662299315c792954
Author: xuchuanyin <xu...@...>
Date: 2018-02-08T06:35:14Z
[CARBONDATA-2018][DataLoad] Optimization in reading/writing for sort temp row
Pick out the no-sort fields in the row, pack them as a byte array, and skip parsing them during merge sort to reduce CPU consumption.
This closes #1792
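The optimization can be sketched as follows: fields that do not participate in the sort are packed into an opaque `byte[]` once, so the merge-sort comparator touches only the sort columns and never re-parses the rest. The field layout below is an assumption for illustration, not CarbonData's actual row format.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Hedged sketch: a sort temp row with one sort key and the remaining
// fields packed as length-prefixed UTF-8 bytes that the sort never parses.
class SortTempRow {
  final int sortKey;        // the only field the comparator reads
  final byte[] noSortPack;  // remaining fields, packed once, parsed never

  SortTempRow(int sortKey, byte[] noSortPack) {
    this.sortKey = sortKey;
    this.noSortPack = noSortPack;
  }

  // Pack the no-sort string fields as [len][utf8 bytes]...
  static byte[] pack(String... fields) {
    int size = 0;
    byte[][] raw = new byte[fields.length][];
    for (int i = 0; i < fields.length; i++) {
      raw[i] = fields[i].getBytes(StandardCharsets.UTF_8);
      size += 4 + raw[i].length;
    }
    ByteBuffer buf = ByteBuffer.allocate(size);
    for (byte[] b : raw) {
      buf.putInt(b.length);
      buf.put(b);
    }
    return buf.array();
  }

  // The sort compares sort keys only; the packed bytes ride along untouched.
  static void sort(List<SortTempRow> rows) {
    Collections.sort(rows, new Comparator<SortTempRow>() {
      @Override public int compare(SortTempRow a, SortTempRow b) {
        return Integer.compare(a.sortKey, b.sortKey);
      }
    });
  }
}
```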
commit 6dd8b038fc898dbf48ad30adfc870c19eb38e3d0
Author: xuchuanyin <xu...@...>
Date: 2018-02-08T06:42:39Z
[CARBONDATA-2023][DataLoad] Add size base block allocation in data loading
Carbondata assigns blocks to nodes at the beginning of data loading.
The previous block allocation strategy was based on block count and
suffers from data skew when the sizes of the input files differ a lot.
We introduce a size-based block allocation strategy to optimize data
loading performance in skewed-data scenarios.
This closes #1808
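A size-based strategy like the one this commit describes can be sketched with a simple greedy heuristic: assign each block, largest first, to the node with the least total bytes so far. This is an illustration of the idea, not the exact CarbonData algorithm.

```java
import java.util.*;

// Hedged sketch of size-based block allocation: balance total bytes per
// node rather than block counts, which evens out skewed input file sizes.
class SizeBasedAllocator {

  // blockSizes in bytes; returns, per node, the list of block indexes.
  static List<List<Integer>> allocate(long[] blockSizes, int numNodes) {
    // Visit blocks in descending size order (classic greedy balancing).
    Integer[] order = new Integer[blockSizes.length];
    for (int i = 0; i < order.length; i++) order[i] = i;
    Arrays.sort(order, (a, b) -> Long.compare(blockSizes[b], blockSizes[a]));

    List<List<Integer>> assignment = new ArrayList<>();
    long[] loads = new long[numNodes];
    for (int n = 0; n < numNodes; n++) assignment.add(new ArrayList<>());

    for (int idx : order) {
      int target = 0;  // pick the currently lightest node
      for (int n = 1; n < numNodes; n++) {
        if (loads[n] < loads[target]) target = n;
      }
      assignment.get(target).add(idx);
      loads[target] += blockSizes[idx];
    }
    return assignment;
  }
}
```

With block sizes {100, 10, 10, 10} and two nodes, count-based allocation would give each node two blocks (120 vs 20 bytes), while the greedy size-based split yields 100 vs 30.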
commit e5c32ac96f4cf85ef7a42f2a14c31c19418a789b
Author: Jacky Li <ja...@...>
Date: 2018-02-10T02:34:59Z
Revert "[CARBONDATA-2023][DataLoad] Add size base block allocation in data loading"
This reverts commit 6dd8b038fc898dbf48ad30adfc870c19eb38e3d0.
commit e1c6448cdbfa8d5eab1a861485f953eea3984f1f
Author: Jacky Li <ja...@...>
Date: 2018-02-10T12:11:25Z
Revert "[CARBONDATA-2018][DataLoad] Optimization in reading/writing for sort temp row"
This reverts commit de92ea9a123b17d903f2d1d4662299315c792954.
commit 7f5751a78c28c8a428fa62a5f82858ac65415c86
Author: Jacky Li <ja...@...>
Date: 2018-02-11T02:12:10Z
[CARBONDATA-2156] Add interface annotation
InterfaceAudience and InterfaceStability annotations should be added for users and developers:
1. InterfaceAudience can be User or Developer
2. InterfaceStability can be Stable, Evolving, or Unstable
This closes #1968
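A minimal sketch of what such annotations might look like, consistent with usages like `@InterfaceAudience.Developer` in the diffs quoted below. The nesting, retention, and packaging here are assumptions; the real definitions live in the carbondata-common module.

```java
import java.lang.annotation.*;

// Hedged sketch of audience/stability marker annotations in the style the
// commit describes; details are assumptions, not CarbonData's actual code.
class InterfaceAnnotationDemo {

  @Documented @Retention(RetentionPolicy.RUNTIME)
  @interface InterfaceAudience {
    // marker for user-facing APIs
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface User { }
    // marker for APIs intended for CarbonData developers only
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Developer { }
  }

  @Documented @Retention(RetentionPolicy.RUNTIME)
  @interface InterfaceStability {
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Stable { }
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Evolving { }
    @Documented @Retention(RetentionPolicy.RUNTIME) @interface Unstable { }
  }

  // Example of marking an API class with both annotations.
  @InterfaceAudience.Developer
  @InterfaceStability.Evolving
  static class SampleApi { }
}
```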
commit a848ccff8aaf3c10970c61b1f85bce56478ca0ac
Author: Jacky Li <ja...@...>
Date: 2018-02-10T11:44:23Z
[CARBONDATA-1997] Add CarbonWriter SDK API
Added a new module called store-sdk, with a CarbonWriter API that can be used to write Carbondata files to a specified folder without Spark or Hadoop dependencies. Users can use this API in any environment.
This closes #1967
commit d898557aec564846e6298748698970de7e4eeca7
Author: Jacky Li <ja...@...>
Date: 2018-02-11T08:27:25Z
fix dependency
----
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2462/
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2463/
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3713/
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2464/
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3700/
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3703/
---
[GitHub] carbondata pull request #1970: [CARBONDATA-2159] Remove carbon-spark depende...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1970#discussion_r167481299
--- Diff: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ---
@@ -708,4 +710,129 @@ public static Boolean checkIfValidLoadInProgress(AbsoluteTableIdentifier absolut
}
}
+ private static boolean isLoadDeletionRequired(String metaDataLocation) {
+ LoadMetadataDetails[] details = SegmentStatusManager.readLoadMetadata(metaDataLocation);
+ if (details != null && details.length > 0) {
+ for (LoadMetadataDetails oneRow : details) {
+ if ((SegmentStatus.MARKED_FOR_DELETE == oneRow.getSegmentStatus()
+ || SegmentStatus.COMPACTED == oneRow.getSegmentStatus()
+ || SegmentStatus.INSERT_IN_PROGRESS == oneRow.getSegmentStatus()
+ || SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneRow.getSegmentStatus())
+ && oneRow.getVisibility().equalsIgnoreCase("true")) {
+ return true;
+ }
+ }
+ }
+ return false;
+ }
+
+ /**
+ * This will update the old table status details before clean files to the latest table status.
+ * @param oldList
+ * @param newList
+ * @return
+ */
+ public static List<LoadMetadataDetails> updateLoadMetadataFromOldToNew(
+ LoadMetadataDetails[] oldList, LoadMetadataDetails[] newList) {
+
+ List<LoadMetadataDetails> newListMetadata =
+ new ArrayList<LoadMetadataDetails>(Arrays.asList(newList));
+ for (LoadMetadataDetails oldSegment : oldList) {
+ if ("false".equalsIgnoreCase(oldSegment.getVisibility())) {
+ newListMetadata.get(newListMetadata.indexOf(oldSegment)).setVisibility("false");
+ }
+ }
+ return newListMetadata;
+ }
+
+ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
+ List<LoadMetadataDetails> listOfLoadFolderDetails) throws IOException {
+ String dataLoadLocation = CarbonTablePath.getTableStatusFilePath(identifier.getTablePath());
+
+ DataOutputStream dataOutputStream;
+ Gson gsonObjectToWrite = new Gson();
+ BufferedWriter brWriter = null;
+
+ AtomicFileOperations writeOperation =
+ new AtomicFileOperationsImpl(dataLoadLocation, FileFactory.getFileType(dataLoadLocation));
+
+ try {
+
+ dataOutputStream = writeOperation.openForWrite(FileWriteOperation.OVERWRITE);
+ brWriter = new BufferedWriter(new OutputStreamWriter(dataOutputStream,
+ Charset.forName(CarbonCommonConstants.DEFAULT_CHARSET)));
+
+ String metadataInstance = gsonObjectToWrite.toJson(listOfLoadFolderDetails.toArray());
+ brWriter.write(metadataInstance);
+ } finally {
+ try {
+ if (null != brWriter) {
+ brWriter.flush();
+ }
+ } catch (Exception e) {
+ LOG.error("error in flushing ");
+
+ }
+ CarbonUtil.closeStreams(brWriter);
+ writeOperation.close();
+ }
+ }
+
+ public static void deleteLoadsAndUpdateMetadata(
+ CarbonTable carbonTable,
+ boolean isForceDeletion) throws IOException {
+ if (isLoadDeletionRequired(carbonTable.getMetadataPath())) {
+ LoadMetadataDetails[] details =
+ SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath());
+ AbsoluteTableIdentifier identifier = carbonTable.getAbsoluteTableIdentifier();
+ ICarbonLock carbonTableStatusLock = CarbonLockFactory.getCarbonLockObj(
+ identifier, LockUsage.TABLE_STATUS_LOCK);
+
+ // Delete marked loads
+ boolean isUpdationRequired = DeleteLoadFolders.deleteLoadFoldersFromFileSystem(
+ identifier,
--- End diff --
please apply java style
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2460/
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2474/
---
[GitHub] carbondata pull request #1970: [CARBONDATA-2159] Remove carbon-spark depende...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1970#discussion_r167465747
--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModelBuilder.java ---
@@ -0,0 +1,322 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.processing.loading.model;
+
+import java.io.IOException;
+import java.text.SimpleDateFormat;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.Maps;
+import org.apache.carbondata.common.Strings;
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.constants.LoggerAction;
+import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.processing.loading.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.loading.csvinput.CSVInputFormat;
+import org.apache.carbondata.processing.loading.sort.SortScopeOptions;
+import org.apache.carbondata.processing.util.TableOptionConstant;
+
+import org.apache.commons.lang.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+
+/**
+ * Builder for {@link CarbonLoadModel}
+ */
+@InterfaceAudience.Developer
+public class CarbonLoadModelBuilder {
+
+ private CarbonTable table;
+
+ public CarbonLoadModelBuilder(CarbonTable table) {
+ this.table = table;
+ }
+
+ /**
+ * build CarbonLoadModel for data loading
+ * @param options Load options from user input
+ * @return a new CarbonLoadModel instance
+ */
+ public CarbonLoadModel build(
+ Map<String, String> options) throws InvalidLoadOptionException, IOException {
+ Map<String, String> optionsFinal = LoadOption.fillOptionWithDefaultValue(options);
+ optionsFinal.put("sort_scope", "no_sort");
+ if (!options.containsKey("fileheader")) {
+ List<CarbonColumn> csvHeader = table.getCreateOrderColumn(table.getTableName());
+ String[] columns = new String[csvHeader.size()];
+ for (int i = 0; i < columns.length; i++) {
+ columns[i] = csvHeader.get(i).getColName();
+ }
+ optionsFinal.put("fileheader", Strings.mkString(columns, ","));
+ }
+ CarbonLoadModel model = new CarbonLoadModel();
+
+ // we have provided 'fileheader', so hadoopConf can be null
+ build(options, optionsFinal, model, null);
+
+ // set default values
+ model.setTimestampformat(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT);
+ model.setDateFormat(CarbonCommonConstants.CARBON_DATE_DEFAULT_FORMAT);
+ model.setUseOnePass(Boolean.parseBoolean(Maps.getOrDefault(options, "onepass", "false")));
+ model.setDictionaryServerHost(Maps.getOrDefault(options, "dicthost", null));
+ try {
+ model.setDictionaryServerPort(Integer.parseInt(Maps.getOrDefault(options, "dictport", "-1")));
+ } catch (NumberFormatException e) {
+ throw new InvalidLoadOptionException(e.getMessage());
+ }
+ return model;
+ }
+
+ /**
+ * build CarbonLoadModel for data loading
+ * @param options Load options from user input
+ * @param optionsFinal Load options that populated with default values for optional options
+ * @param carbonLoadModel The output load model
+ * @param hadoopConf hadoopConf is needed to read CSV header if there 'fileheader' is not set in
+ * user provided load options
+ */
+ public void build(
--- End diff --
This code was moved from DataLoadingUtil.scala in the carbon-spark module
---
[GitHub] carbondata pull request #1970: [CARBONDATA-2159] Remove carbon-spark depende...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1970#discussion_r167465707
--- Diff: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ---
@@ -708,4 +710,129 @@ public static Boolean checkIfValidLoadInProgress(AbsoluteTableIdentifier absolut
}
}
+ private static boolean isLoadDeletionRequired(String metaDataLocation) {
--- End diff --
This code was moved from DataLoadingUtil.scala in the carbon-spark module
---
[GitHub] carbondata pull request #1970: [CARBONDATA-2159] Remove carbon-spark depende...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk closed the pull request at:
https://github.com/apache/carbondata/pull/1970
---
[GitHub] carbondata pull request #1970: [CARBONDATA-2159] Remove carbon-spark depende...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1970#discussion_r167482420
--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java ---
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.processing.loading.model;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.carbondata.common.Maps;
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.constants.CarbonLoadOptionConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.util.CarbonDataProcessorUtil;
+import org.apache.carbondata.processing.util.CarbonLoaderUtil;
+
+import org.apache.commons.lang.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+
+@InterfaceAudience.Developer
+public class LoadOption {
+
+ private static LogService LOG = LogServiceFactory.getLogService(LoadOption.class.getName());
+
+ /**
+ * get data loading options and initialise default value
+ */
+ public static Map<String, String> fillOptionWithDefaultValue(
+ Map<String, String> options) throws InvalidLoadOptionException {
+ Map<String, String> optionsFinal = new HashMap<>();
+ optionsFinal.put("delimiter", Maps.getOrDefault(options, "delimiter", ","));
+ optionsFinal.put("quotechar", Maps.getOrDefault(options, "quotechar", "\""));
+ optionsFinal.put("fileheader", Maps.getOrDefault(options, "fileheader", ""));
+ optionsFinal.put("commentchar", Maps.getOrDefault(options, "commentchar", "#"));
+ optionsFinal.put("columndict", Maps.getOrDefault(options, "columndict", null));
+
+ optionsFinal.put(
+ "escapechar",
+ CarbonLoaderUtil.getEscapeChar(Maps.getOrDefault(options,"escapechar", "\\")));
+
+ optionsFinal.put(
+ "serialization_null_format",
+ Maps.getOrDefault(options, "serialization_null_format", "\\N"));
+
+ optionsFinal.put(
+ "bad_records_logger_enable",
+ Maps.getOrDefault(
+ options,
+ "bad_records_logger_enable",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_BAD_RECORDS_LOGGER_ENABLE,
+ CarbonLoadOptionConstants.CARBON_OPTIONS_BAD_RECORDS_LOGGER_ENABLE_DEFAULT)));
+
+ String badRecordActionValue = CarbonProperties.getInstance().getProperty(
+ CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION,
+ CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION_DEFAULT);
+
+ optionsFinal.put(
+ "bad_records_action",
+ Maps.getOrDefault(
+ options,
+ "bad_records_action",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_BAD_RECORDS_ACTION,
+ badRecordActionValue)));
+
+ optionsFinal.put(
+ "is_empty_data_bad_record",
+ Maps.getOrDefault(
+ options,
+ "is_empty_data_bad_record",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_IS_EMPTY_DATA_BAD_RECORD,
+ CarbonLoadOptionConstants.CARBON_OPTIONS_IS_EMPTY_DATA_BAD_RECORD_DEFAULT)));
+
+ optionsFinal.put(
+ "skip_empty_line",
+ Maps.getOrDefault(
+ options,
+ "skip_empty_line",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_SKIP_EMPTY_LINE)));
+
+ optionsFinal.put(
+ "all_dictionary_path",
+ Maps.getOrDefault(options, "all_dictionary_path", ""));
+
+ optionsFinal.put(
+ "complex_delimiter_level_1",
+ Maps.getOrDefault(options,"complex_delimiter_level_1", "\\$"));
+
+ optionsFinal.put(
+ "complex_delimiter_level_2",
+ Maps.getOrDefault(options, "complex_delimiter_level_2", "\\:"));
+
+ optionsFinal.put(
+ "dateformat",
+ Maps.getOrDefault(
+ options,
+ "dateformat",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_DATEFORMAT,
+ CarbonLoadOptionConstants.CARBON_OPTIONS_DATEFORMAT_DEFAULT)));
+
+ optionsFinal.put(
+ "timestampformat",
+ Maps.getOrDefault(
+ options,
+ "timestampformat",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_TIMESTAMPFORMAT,
+ CarbonLoadOptionConstants.CARBON_OPTIONS_TIMESTAMPFORMAT_DEFAULT)));
+
+ optionsFinal.put(
+ "global_sort_partitions",
+ Maps.getOrDefault(
+ options,
+ "global_sort_partitions",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_GLOBAL_SORT_PARTITIONS,
+ null)));
+
+ optionsFinal.put("maxcolumns", Maps.getOrDefault(options, "maxcolumns", null));
+
+ optionsFinal.put(
+ "batch_sort_size_inmb",
+ Maps.getOrDefault(
+ options,
+ "batch_sort_size_inmb",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_BATCH_SORT_SIZE_INMB,
+ CarbonProperties.getInstance().getProperty(
+ CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB,
+ CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB_DEFAULT))));
+
+ optionsFinal.put(
+ "bad_record_path",
+ Maps.getOrDefault(
+ options,
+ "bad_record_path",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_BAD_RECORD_PATH,
+ CarbonProperties.getInstance().getProperty(
+ CarbonCommonConstants.CARBON_BADRECORDS_LOC,
+ CarbonCommonConstants.CARBON_BADRECORDS_LOC_DEFAULT_VAL))));
+
+ String useOnePass = Maps.getOrDefault(
+ options,
+ "single_pass",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_SINGLE_PASS,
+ CarbonLoadOptionConstants.CARBON_OPTIONS_SINGLE_PASS_DEFAULT)).trim().toLowerCase();
+
+ boolean singlePass;
+
+ if (useOnePass.equalsIgnoreCase("true")) {
+ singlePass = true;
+ } else {
+ // when single_pass = false and if either alldictionarypath
+ // or columnDict is configured then do not allow the load
+ if (StringUtils.isNotEmpty(optionsFinal.get("all_dictionary_path")) || StringUtils
--- End diff --
fixed
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2461/
---
[GitHub] carbondata pull request #1970: [CARBONDATA-2159] Remove carbon-spark depende...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1970#discussion_r167481586
--- Diff: processing/src/main/java/org/apache/carbondata/processing/loading/model/LoadOption.java ---
@@ -0,0 +1,245 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.processing.loading.model;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.carbondata.common.Maps;
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.exceptions.sql.InvalidLoadOptionException;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.constants.CarbonLoadOptionConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.util.CarbonDataProcessorUtil;
+import org.apache.carbondata.processing.util.CarbonLoaderUtil;
+
+import org.apache.commons.lang.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+
+@InterfaceAudience.Developer
+public class LoadOption {
+
+ private static LogService LOG = LogServiceFactory.getLogService(LoadOption.class.getName());
+
+ /**
+ * get data loading options and initialise default value
+ */
+ public static Map<String, String> fillOptionWithDefaultValue(
+ Map<String, String> options) throws InvalidLoadOptionException {
+ Map<String, String> optionsFinal = new HashMap<>();
+ optionsFinal.put("delimiter", Maps.getOrDefault(options, "delimiter", ","));
+ optionsFinal.put("quotechar", Maps.getOrDefault(options, "quotechar", "\""));
+ optionsFinal.put("fileheader", Maps.getOrDefault(options, "fileheader", ""));
+ optionsFinal.put("commentchar", Maps.getOrDefault(options, "commentchar", "#"));
+ optionsFinal.put("columndict", Maps.getOrDefault(options, "columndict", null));
+
+ optionsFinal.put(
+ "escapechar",
+ CarbonLoaderUtil.getEscapeChar(Maps.getOrDefault(options,"escapechar", "\\")));
+
+ optionsFinal.put(
+ "serialization_null_format",
+ Maps.getOrDefault(options, "serialization_null_format", "\\N"));
+
+ optionsFinal.put(
+ "bad_records_logger_enable",
+ Maps.getOrDefault(
+ options,
+ "bad_records_logger_enable",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_BAD_RECORDS_LOGGER_ENABLE,
+ CarbonLoadOptionConstants.CARBON_OPTIONS_BAD_RECORDS_LOGGER_ENABLE_DEFAULT)));
+
+ String badRecordActionValue = CarbonProperties.getInstance().getProperty(
+ CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION,
+ CarbonCommonConstants.CARBON_BAD_RECORDS_ACTION_DEFAULT);
+
+ optionsFinal.put(
+ "bad_records_action",
+ Maps.getOrDefault(
+ options,
+ "bad_records_action",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_BAD_RECORDS_ACTION,
+ badRecordActionValue)));
+
+ optionsFinal.put(
+ "is_empty_data_bad_record",
+ Maps.getOrDefault(
+ options,
+ "is_empty_data_bad_record",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_IS_EMPTY_DATA_BAD_RECORD,
+ CarbonLoadOptionConstants.CARBON_OPTIONS_IS_EMPTY_DATA_BAD_RECORD_DEFAULT)));
+
+ optionsFinal.put(
+ "skip_empty_line",
+ Maps.getOrDefault(
+ options,
+ "skip_empty_line",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_SKIP_EMPTY_LINE)));
+
+ optionsFinal.put(
+ "all_dictionary_path",
+ Maps.getOrDefault(options, "all_dictionary_path", ""));
+
+ optionsFinal.put(
+ "complex_delimiter_level_1",
+ Maps.getOrDefault(options,"complex_delimiter_level_1", "\\$"));
+
+ optionsFinal.put(
+ "complex_delimiter_level_2",
+ Maps.getOrDefault(options, "complex_delimiter_level_2", "\\:"));
+
+ optionsFinal.put(
+ "dateformat",
+ Maps.getOrDefault(
+ options,
+ "dateformat",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_DATEFORMAT,
+ CarbonLoadOptionConstants.CARBON_OPTIONS_DATEFORMAT_DEFAULT)));
+
+ optionsFinal.put(
+ "timestampformat",
+ Maps.getOrDefault(
+ options,
+ "timestampformat",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_TIMESTAMPFORMAT,
+ CarbonLoadOptionConstants.CARBON_OPTIONS_TIMESTAMPFORMAT_DEFAULT)));
+
+ optionsFinal.put(
+ "global_sort_partitions",
+ Maps.getOrDefault(
+ options,
+ "global_sort_partitions",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_GLOBAL_SORT_PARTITIONS,
+ null)));
+
+ optionsFinal.put("maxcolumns", Maps.getOrDefault(options, "maxcolumns", null));
+
+ optionsFinal.put(
+ "batch_sort_size_inmb",
+ Maps.getOrDefault(
+ options,
+ "batch_sort_size_inmb",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_BATCH_SORT_SIZE_INMB,
+ CarbonProperties.getInstance().getProperty(
+ CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB,
+ CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB_DEFAULT))));
+
+ optionsFinal.put(
+ "bad_record_path",
+ Maps.getOrDefault(
+ options,
+ "bad_record_path",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_BAD_RECORD_PATH,
+ CarbonProperties.getInstance().getProperty(
+ CarbonCommonConstants.CARBON_BADRECORDS_LOC,
+ CarbonCommonConstants.CARBON_BADRECORDS_LOC_DEFAULT_VAL))));
+
+ String useOnePass = Maps.getOrDefault(
+ options,
+ "single_pass",
+ CarbonProperties.getInstance().getProperty(
+ CarbonLoadOptionConstants.CARBON_OPTIONS_SINGLE_PASS,
+ CarbonLoadOptionConstants.CARBON_OPTIONS_SINGLE_PASS_DEFAULT)).trim().toLowerCase();
+
+ boolean singlePass;
+
+ if (useOnePass.equalsIgnoreCase("true")) {
+ singlePass = true;
+ } else {
+ // when single_pass = false, if either all_dictionary_path
+ // or columnDict is configured, then do not allow the load
+ if (StringUtils.isNotEmpty(optionsFinal.get("all_dictionary_path")) || StringUtils
--- End diff --
move the last StringUtils to the next line
---
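For context, the `Maps.getOrDefault` helper that the diff above calls repeatedly is the utility this PR adds to avoid a JDK 8 dependency (JDK 8 introduced `Map.getOrDefault`; the SDK module has to build without it). A minimal standalone sketch of such a helper follows; this is illustrative only, not the actual class in the carbondata common module, and it treats a key mapped to `null` the same as an absent key:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of a getOrDefault helper usable on JDK 7,
// i.e. without relying on JDK 8's Map.getOrDefault.
public class Maps {
  public static <K, V> V getOrDefault(Map<K, V> map, K key, V defaultValue) {
    V value = map.get(key);
    // Note: unlike JDK 8's Map.getOrDefault, this also returns the
    // default when the key is present but mapped to null.
    return (value != null) ? value : defaultValue;
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<String, String>();
    options.put("escapechar", "|");
    // Present key returns the stored value.
    System.out.println(Maps.getOrDefault(options, "escapechar", "\\"));
    // Absent key falls back to the default.
    System.out.println(Maps.getOrDefault(options, "dateformat", "yyyy-MM-dd"));
  }
}
```

This is the same lookup-with-fallback shape used for every option key in the diff ("escapechar", "dateformat", and so on), with the load-option constants supplying the defaults.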
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on the issue:
https://github.com/apache/carbondata/pull/1970
LGTM
---
[GitHub] carbondata pull request #1970: [CARBONDATA-2159] Remove carbon-spark depende...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1970#discussion_r167482170
--- Diff: core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentStatusManager.java ---
@@ -708,4 +710,129 @@ public static Boolean checkIfValidLoadInProgress(AbsoluteTableIdentifier absolut
}
}
+ private static boolean isLoadDeletionRequired(String metaDataLocation) {
+ LoadMetadataDetails[] details = SegmentStatusManager.readLoadMetadata(metaDataLocation);
+ if (details != null && details.length > 0) {
+ for (LoadMetadataDetails oneRow : details) {
+ if ((SegmentStatus.MARKED_FOR_DELETE == oneRow.getSegmentStatus()
+ || SegmentStatus.COMPACTED == oneRow.getSegmentStatus()
+ || SegmentStatus.INSERT_IN_PROGRESS == oneRow.getSegmentStatus()
+ || SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS == oneRow.getSegmentStatus())
+ && oneRow.getVisibility().equalsIgnoreCase("true")) {
+ return true;
+ }
+ }
+ }
+ return false;
+ }
+
+ /**
+ * Carries over the visibility details from the old table status to the
+ * latest table status, before clean files runs.
+ * @param oldList
+ * @param newList
+ * @return
+ */
+ public static List<LoadMetadataDetails> updateLoadMetadataFromOldToNew(
+ LoadMetadataDetails[] oldList, LoadMetadataDetails[] newList) {
+
+ List<LoadMetadataDetails> newListMetadata =
+ new ArrayList<LoadMetadataDetails>(Arrays.asList(newList));
+ for (LoadMetadataDetails oldSegment : oldList) {
+ if ("false".equalsIgnoreCase(oldSegment.getVisibility())) {
+ newListMetadata.get(newListMetadata.indexOf(oldSegment)).setVisibility("false");
+ }
+ }
+ return newListMetadata;
+ }
+
+ private static void writeLoadMetadata(AbsoluteTableIdentifier identifier,
+ List<LoadMetadataDetails> listOfLoadFolderDetails) throws IOException {
+ String dataLoadLocation = CarbonTablePath.getTableStatusFilePath(identifier.getTablePath());
+
+ DataOutputStream dataOutputStream;
+ Gson gsonObjectToWrite = new Gson();
+ BufferedWriter brWriter = null;
+
+ AtomicFileOperations writeOperation =
+ new AtomicFileOperationsImpl(dataLoadLocation, FileFactory.getFileType(dataLoadLocation));
+
+ try {
+
+ dataOutputStream = writeOperation.openForWrite(FileWriteOperation.OVERWRITE);
+ brWriter = new BufferedWriter(new OutputStreamWriter(dataOutputStream,
+ Charset.forName(CarbonCommonConstants.DEFAULT_CHARSET)));
+
+ String metadataInstance = gsonObjectToWrite.toJson(listOfLoadFolderDetails.toArray());
+ brWriter.write(metadataInstance);
+ } finally {
+ try {
+ if (null != brWriter) {
+ brWriter.flush();
+ }
+ } catch (Exception e) {
+ LOG.error("error in flushing ");
+
+ }
+ CarbonUtil.closeStreams(brWriter);
+ writeOperation.close();
+ }
+ }
+
+ public static void deleteLoadsAndUpdateMetadata(
+ CarbonTable carbonTable,
+ boolean isForceDeletion) throws IOException {
+ if (isLoadDeletionRequired(carbonTable.getMetadataPath())) {
+ LoadMetadataDetails[] details =
+ SegmentStatusManager.readLoadMetadata(carbonTable.getMetadataPath());
+ AbsoluteTableIdentifier identifier = carbonTable.getAbsoluteTableIdentifier();
+ ICarbonLock carbonTableStatusLock = CarbonLockFactory.getCarbonLockObj(
+ identifier, LockUsage.TABLE_STATUS_LOCK);
+
+ // Delete marked loads
+ boolean isUpdationRequired = DeleteLoadFolders.deleteLoadFoldersFromFileSystem(
+ identifier,
--- End diff --
fixed
---
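One detail of `updateLoadMetadataFromOldToNew` in the diff above is that it matches segments across the two lists via `List.indexOf`, which only works if `LoadMetadataDetails.equals` identifies the same segment in both arrays. A simplified sketch of the carry-over logic, using a hypothetical stand-in class rather than the real CarbonData types (and adding an index guard the original relies on `equals` semantics to avoid needing):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for LoadMetadataDetails, reduced to the
// fields this sketch needs.
class SegmentDetail {
  final String loadName;
  String visibility = "true";

  SegmentDetail(String loadName) { this.loadName = loadName; }

  // indexOf relies on equals matching segments by identity (load name here).
  @Override public boolean equals(Object o) {
    return o instanceof SegmentDetail
        && ((SegmentDetail) o).loadName.equals(loadName);
  }
  @Override public int hashCode() { return loadName.hashCode(); }
}

public class UpdateMetadataSketch {
  // Carry "invisible" markers from the old status list into the new one,
  // mirroring updateLoadMetadataFromOldToNew in the diff above.
  static List<SegmentDetail> updateOldToNew(SegmentDetail[] oldList,
                                            SegmentDetail[] newList) {
    List<SegmentDetail> result = new ArrayList<SegmentDetail>(Arrays.asList(newList));
    for (SegmentDetail old : oldList) {
      if ("false".equalsIgnoreCase(old.visibility)) {
        int idx = result.indexOf(old);
        if (idx >= 0) {  // guard added in this sketch: segment may be gone
          result.get(idx).visibility = "false";
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    SegmentDetail old0 = new SegmentDetail("0");
    old0.visibility = "false";
    SegmentDetail[] oldList = { old0, new SegmentDetail("1") };
    SegmentDetail[] newList = { new SegmentDetail("0"), new SegmentDetail("1") };
    List<SegmentDetail> merged = updateOldToNew(oldList, newList);
    System.out.println(merged.get(0).visibility); // false
    System.out.println(merged.get(1).visibility); // true
  }
}
```

The design point is that visibility flags set by clean files on the old status must survive into the rewritten status file, so segments hidden earlier are not resurrected when the new list is persisted.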
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/1970
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3516/
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3699/
---
[GitHub] carbondata pull request #1970: [CARBONDATA-2159] Remove carbon-spark depende...
Posted by QiangCai <gi...@git.apache.org>.
Github user QiangCai commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1970#discussion_r167466587
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datetype/DateTypeTest.scala ---
@@ -16,10 +16,11 @@
*/
package org.apache.carbondata.spark.testsuite.datetype
-import org.apache.carbondata.spark.exception.MalformedCarbonCommandException
import org.apache.spark.sql.test.util.QueryTest
import org.scalatest.BeforeAndAfterAll
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+
--- End diff --
not required
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3698/
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2458/
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/1970
retest this please
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/1970
CI failed due to environment problem in CI machine.
merged into carbonstore branch.
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3704/
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3701/
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/3702/
---
[GitHub] carbondata pull request #1970: [CARBONDATA-2159] Remove carbon-spark depende...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/1970#discussion_r167482163
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/datetype/DateTypeTest.scala ---
@@ -16,10 +16,11 @@
*/
package org.apache.carbondata.spark.testsuite.datetype
-import org.apache.carbondata.spark.exception.MalformedCarbonCommandException
import org.apache.spark.sql.test.util.QueryTest
import org.scalatest.BeforeAndAfterAll
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+
--- End diff --
fixed
---
[GitHub] carbondata issue #1970: [CARBONDATA-2159] Remove carbon-spark dependency in ...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/1970
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2459/
---