You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by sounakr <gi...@git.apache.org> on 2018/03/12 19:33:12 UTC
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
GitHub user sounakr opened a pull request:
https://github.com/apache/carbondata/pull/2055
[CARBONDATA-2224][File Level Reader Support] External File level reader support
File level reader reads any carbondata file placed in any external file path. The reading can be done through 3 methods.
a) Reading as a datasource from Spark. CarbonFileLevelFormat.scala is used in this case to read the file. To create a spark datasource external table
" CREATE TABLE sdkOutputTable USING CarbonDataFileFormat LOCATION '$writerOutputFilePath1'"
For more details please refer the test file org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingCarbonFileLevelFormat.scala
file.
b) Reading from spark sql as a external table. CarbonFileinputFormat.java is used for reading the files. The create table syntax for this will be
"CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'carbondatafileformat' LOCATION '$writerOutputFilePath6'"
For more details org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala.
c) Reading Through Hadoop Map reduce job. Please refer org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java for more details.
Limitation :: This implementation depend on writer SDK file path as following table_name/Fact/Part0/Segment_null. This reader writer must be independent of static path.
Due to this reader currently won't work with standard partition also. This will be handled in future PRs.
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sounakr/incubator-carbondata file_level_reader_master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2055.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2055
----
commit 5e65f3f97749571a74b6c04a05f5b09aec709787
Author: sounakr <so...@...>
Date: 2018-02-24T02:25:14Z
File Format Reader
commit bcb8f64d61e19787fb3303a00d59cb61a6ebce32
Author: sounakr <so...@...>
Date: 2018-02-26T11:58:47Z
File Format Phase 2
commit 35b09072d7d75677f473e9d54b3a5db0ff1b64dc
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-27T06:06:56Z
* File Format Phase 2 (cleanup code)
commit 466abfad2fdcc50d69dbbf32791466b7fc4836d1
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-27T06:36:28Z
* File Format Phase 2 (cleanup code)
commit 5b2ad29bc9402e223af22124cc6d3d91962e72f4
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-27T09:54:43Z
* File Format Phase 2 (cleanup code and adding testCase)
commit 994372f0d2c7e8c528f9900c7b17ff8c8a857698
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-27T11:58:37Z
* File Format Phase 2 (filter issue fix)
commit e3160888dcac715928f9d18febd33b22177513a0
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-27T12:20:46Z
* File Format Phase 2 (filter issue fix return value)
commit 949e6a97680f46a91808be094505a519340a1a53
Author: sounakr <so...@...>
Date: 2018-02-27T13:55:16Z
Clear DataMap Cache
commit 7fdccc3885ab1c731d7066e36a2237372198ae22
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-27T14:02:35Z
* File Format Phase 2 (test cases)
commit 528e8120527a712308adee4b91d516a9891975ea
Author: sounakr <so...@...>
Date: 2018-02-28T03:18:45Z
Refactor CarbonFileInputFormat
commit 0a2b2249ea8486d2a217ff245b2311bb96936d64
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-28T10:02:08Z
* File Format Phase 2
a. test cases addition
b. Exception handling when the files are not present
c. Setting the filter expression in carbonTableInputFormat
commit fdfe2f405a2bb8ca122a785919290bc82a72c01c
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-28T10:02:08Z
* File Format Phase 2
a. test cases addition
b. Exception handling when the files are not present
c. Setting the filter expression in carbonTableInputFormat
commit 64627d2f2953779a9ee32f23be0b552b6b18f1d9
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-28T10:02:08Z
* File Format Phase 2
a. test cases addition
b. Exception handling when the files are not present
c. Setting the filter expression in carbonTableInputFormat
commit 8871e3140afa008794dfa0e8e2df58f5b29f46bd
Author: sounakr <so...@...>
Date: 2018-03-01T11:23:39Z
Map Reduce Test Case for CarbonInputFileFormat
commit 51403245ce250625de7a0bd20e369d3011f2eeb9
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-01T11:41:03Z
* fixed the issues
Existing external table flow got impacted
Added a new storage(provider) carbondatafileformat for external table creation
commit 1f89d92c947e4b4a1248493552187b70d1f51dba
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-01T15:32:07Z
* Bug fixes
CarbonFileInputFormat flow 3 issue fixes.
a. schema ordinal
b. table path problem in absolute identifier
c. drop of external table fix
d. unwanted code cleanup
commit e1e2ae5019c863d1d43d91d8f5f6852c6d92be29
Author: sounakr <so...@...>
Date: 2018-03-02T05:09:45Z
Review Code
commit 1e374feadd7dd86848b31fed113cf234f0ddb542
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-05T11:07:10Z
merge conflict fix
commit 97d90a1d2bf461dea0259153ab9b28247c2a75ab
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-06T10:08:20Z
* Fixed the test script failure for spark 2.1
commit b3dc89c278b6f89ce9c63ea9f3597124f6916543
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-06T11:58:32Z
* Fixed the test script failure for spark 2.1, 2.2
commit eca6617089702b246dcfb9b039be04d61ede5c6b
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-12T12:46:10Z
* Fix the compilation errors after rebase to master.
commit 761a7ba32b7a4fc990f80e4ed6dc4e0294d7747c
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-12T12:59:00Z
*Fixing the test case of this requirement
commit 16745af45b0683d2121a40272dde92cc07275c93
Author: sounakr <so...@...>
Date: 2018-03-12T18:45:19Z
Review Comments
----
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by sounakr <gi...@git.apache.org>.
Github user sounakr commented on the issue:
https://github.com/apache/carbondata/pull/2055
Retest this please
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3114/
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by sounakr <gi...@git.apache.org>.
Github user sounakr closed the pull request at:
https://github.com/apache/carbondata/pull/2055
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2995/
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174208602
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java ---
@@ -0,0 +1,193 @@
+/*
--- End diff --
There are some binary files in this PR, please remove them
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2055
retest this please... cannot see spark 2.1 report.
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3148/
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174212571
--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/CarbonFileLevelFormat.scala ---
@@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.net.URI
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.mapred.JobConf
+import org.apache.hadoop.mapreduce._
+import org.apache.hadoop.mapreduce.lib.input.FileSplit
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.datasources._
+import org.apache.spark.sql.execution.datasources.text.TextOutputWriter
+import org.apache.spark.sql.optimizer.CarbonFilters
+import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
+import org.apache.spark.sql.types.{AtomicType, StructField, StructType}
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datamap.{DataMapChooser, DataMapStoreManager, Segment}
+import org.apache.carbondata.core.indexstore.PartitionSpec
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore
+import org.apache.carbondata.core.metadata.{AbsoluteTableIdentifier, CarbonMetadata, ColumnarFormatVersion}
+import org.apache.carbondata.core.reader.CarbonHeaderReader
+import org.apache.carbondata.core.scan.expression.Expression
+import org.apache.carbondata.core.scan.expression.logical.AndExpression
+import org.apache.carbondata.core.scan.model.QueryModel
+import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.hadoop.{CarbonInputSplit, CarbonProjection, CarbonRecordReader, InputMetricsStats}
+import org.apache.carbondata.hadoop.api.{CarbonFileInputFormat, DataMapJob}
+import org.apache.carbondata.spark.util.CarbonScalaUtil
+
+class CarbonFileLevelFormat extends FileFormat
--- End diff --
Please rename current `CarbonFileFormat` class to `CarbonTableLevelFormat`
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174667420
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java ---
@@ -0,0 +1,678 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.hadoop.api;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.lang.reflect.Constructor;
+import java.util.ArrayList;
+import java.util.BitSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datamap.DataMapChooser;
+import org.apache.carbondata.core.datamap.DataMapLevel;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapper;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.exception.InvalidConfigurationException;
+import org.apache.carbondata.core.indexstore.ExtendedBlocklet;
+import org.apache.carbondata.core.indexstore.PartitionSpec;
+import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory;
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.ColumnarFormatVersion;
+import org.apache.carbondata.core.metadata.schema.PartitionInfo;
+import org.apache.carbondata.core.metadata.schema.partition.PartitionType;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.TableInfo;
+import org.apache.carbondata.core.mutate.UpdateVO;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.SingleTableProvider;
+import org.apache.carbondata.core.scan.filter.TableProvider;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import org.apache.carbondata.core.stats.QueryStatistic;
+import org.apache.carbondata.core.stats.QueryStatisticsConstants;
+import org.apache.carbondata.core.stats.QueryStatisticsRecorder;
+import org.apache.carbondata.core.statusmanager.SegmentUpdateStatusManager;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.DataTypeConverter;
+import org.apache.carbondata.core.util.DataTypeConverterImpl;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.CarbonMultiBlockSplit;
+import org.apache.carbondata.hadoop.CarbonProjection;
+import org.apache.carbondata.hadoop.CarbonRecordReader;
+import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport;
+import org.apache.carbondata.hadoop.readsupport.impl.DictionaryDecodeReadSupport;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import org.apache.carbondata.hadoop.util.ObjectSerializationUtil;
+import org.apache.carbondata.hadoop.util.SchemaReader;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocalFileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+import org.apache.hadoop.mapreduce.security.TokenCache;
+
+/**
+ * Input format of CarbonData file.
+ *
+ * @param <T>
+ */
+public class CarbonFileInputFormat<T> extends FileInputFormat<Void, T> implements Serializable {
--- End diff --
ok. Fixed
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2976/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4239/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2998/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/2055
retest this please
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3152/
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174694029
--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
@@ -826,6 +826,12 @@ public boolean isExternalTable() {
return external != null && external.equalsIgnoreCase("true");
}
+ public boolean isFileLevelExternalTable() {
--- End diff --
Do not call it external, just change to `isFileLevelFormat` is ok
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4242/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4384/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2055
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3907/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4379/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2055
retest this please. Refactoring comments will be handled in the next PR to this carbonfile branch.
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174211199
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingCarbonFileLevelFormat.scala ---
@@ -0,0 +1,292 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.createTable
+
+import java.io.File
+
+import org.apache.spark.sql.{AnalysisException, CarbonEnv}
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+
+class TestCreateTableUsingCarbonFileLevelFormat extends QueryTest with BeforeAndAfterAll {
--- End diff --
This suite is fine, but can you add one suite using SparkSession instead of CarbonSession?
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2055
retest this, please ...
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174690144
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala ---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.createTable
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonUtil
+import org.apache.carbondata.sdk.file.{CarbonWriter, Schema}
+
+
+class TestCarbonFileInputFormatWithExternalCarbonTable extends QueryTest with BeforeAndAfterAll {
+
+ var writerPath = new File(this.getClass.getResource("/").getPath
+ +
+ "../." +
+ "./src/test/resources/SparkCarbonFileFormat/WriterOutput/")
+ .getCanonicalPath
+ //getCanonicalPath gives path with \, so code expects /. Need to handle in code ?
+ writerPath = writerPath.replace("\\", "/");
+
+
+ def buildTestData(persistSchema:Boolean) = {
+
+ FileUtils.deleteDirectory(new File(writerPath))
+
+ val schema = new StringBuilder()
+ .append("[ \n")
+ .append(" {\"name\":\"string\"},\n")
+ .append(" {\"age\":\"int\"},\n")
+ .append(" {\"height\":\"double\"}\n")
+ .append("]")
+ .toString()
+
+ try {
+ val builder = CarbonWriter.builder()
+ val writer =
+ if (persistSchema) {
+ builder.persistSchemaFile(true)
+ builder.withSchema(Schema.parseJson(schema)).outputPath(writerPath).buildWriterForCSVInput()
+ } else {
+ builder.withSchema(Schema.parseJson(schema)).outputPath(writerPath).buildWriterForCSVInput()
+ }
+
+ var i = 0
+ while (i < 100) {
+ writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2)))
+ i += 1
+ }
+ writer.close()
+ } catch {
+ case ex: Exception => None
+ case _ => None
+ }
+ }
+
+ def cleanTestData() = {
+ FileUtils.deleteDirectory(new File(writerPath))
+ }
+
+ def deleteIndexFile(path: String, extension: String) : Unit = {
+ val file: CarbonFile = FileFactory
+ .getCarbonFile(path, FileFactory.getFileType(path))
+
+ for (eachDir <- file.listFiles) {
+ if (!eachDir.isDirectory) {
+ if (eachDir.getName.endsWith(extension)) {
+ CarbonUtil.deleteFoldersAndFilesSilent(eachDir)
+ }
+ } else {
+ deleteIndexFile(eachDir.getPath, extension)
+ }
+ }
+ }
+
+ override def beforeAll(): Unit = {
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+ // create carbon table and insert data
+ }
+
+ override def afterAll(): Unit = {
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+ }
+
+ //TO DO, need to remove segment dependency and tableIdentifier Dependency
+ test("read carbondata files (sdk Writer Output) using the Carbonfile ") {
+ buildTestData(false)
+ assert(new File(writerPath).exists())
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+
+ //new provider Carbonfile
+ sql(
+ s"""CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'Carbonfile' LOCATION
+ |'$writerPath' """.stripMargin)
+
+ sql("Describe formatted sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable limit 3").show(false)
+
+ sql("select name from sdkOutputTable").show(false)
+
+ sql("select age from sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable where age > 2 and age < 8").show(200, false)
+
+ sql("select * from sdkOutputTable where name = 'robot3'").show(200, false)
+
+ sql("select * from sdkOutputTable where name like 'robo%' limit 5").show(200, false)
+
+ sql("select * from sdkOutputTable where name like '%obot%' limit 2").show(200, false)
+
+ sql("select sum(age) from sdkOutputTable where name like 'robot1%' ").show(200, false)
+
+ sql("select count(*) from sdkOutputTable where name like 'robot%' ").show(200, false)
+
+ sql("select count(*) from sdkOutputTable").show(200, false)
+
+ sql("DROP TABLE sdkOutputTable")
+ // drop table should not delete the files
+ assert(new File(writerPath).exists())
+ cleanTestData()
+ }
+
+ test("should not allow to alter datasource carbontable ") {
+ buildTestData(false)
+ assert(new File(writerPath).exists())
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+
+ //data source file format
+ sql(
+ s"""CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'Carbonfile' LOCATION
+ |'$writerPath' """.stripMargin)
+
+ val exception = intercept[MalformedCarbonCommandException]
+ {
+ sql("Alter table sdkOutputTable change age age BIGINT")
+ }
+ assert(exception.getMessage()
+ .contains("Unsupported alter operation on Carbon external fileformat table"))
--- End diff --
Please use term `'carbonfile'` instead of external table
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174690041
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala ---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.createTable
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonUtil
+import org.apache.carbondata.sdk.file.{CarbonWriter, Schema}
+
+
+class TestCarbonFileInputFormatWithExternalCarbonTable extends QueryTest with BeforeAndAfterAll {
+
+ var writerPath = new File(this.getClass.getResource("/").getPath
+ +
+ "../." +
+ "./src/test/resources/SparkCarbonFileFormat/WriterOutput/")
+ .getCanonicalPath
+ //getCanonicalPath gives path with \, so code expects /. Need to handle in code ?
+ writerPath = writerPath.replace("\\", "/");
+
+
+ def buildTestData(persistSchema:Boolean) = {
+
+ FileUtils.deleteDirectory(new File(writerPath))
+
+ val schema = new StringBuilder()
+ .append("[ \n")
+ .append(" {\"name\":\"string\"},\n")
+ .append(" {\"age\":\"int\"},\n")
+ .append(" {\"height\":\"double\"}\n")
+ .append("]")
+ .toString()
+
+ try {
+ val builder = CarbonWriter.builder()
+ val writer =
+ if (persistSchema) {
+ builder.persistSchemaFile(true)
+ builder.withSchema(Schema.parseJson(schema)).outputPath(writerPath).buildWriterForCSVInput()
+ } else {
+ builder.withSchema(Schema.parseJson(schema)).outputPath(writerPath).buildWriterForCSVInput()
+ }
+
+ var i = 0
+ while (i < 100) {
+ writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2)))
+ i += 1
+ }
+ writer.close()
+ } catch {
+ case ex: Exception => None
+ case _ => None
+ }
+ }
+
+ def cleanTestData() = {
+ FileUtils.deleteDirectory(new File(writerPath))
+ }
+
+ def deleteIndexFile(path: String, extension: String) : Unit = {
+ val file: CarbonFile = FileFactory
+ .getCarbonFile(path, FileFactory.getFileType(path))
+
+ for (eachDir <- file.listFiles) {
+ if (!eachDir.isDirectory) {
+ if (eachDir.getName.endsWith(extension)) {
+ CarbonUtil.deleteFoldersAndFilesSilent(eachDir)
+ }
+ } else {
+ deleteIndexFile(eachDir.getPath, extension)
+ }
+ }
+ }
+
+ override def beforeAll(): Unit = {
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+ // create carbon table and insert data
+ }
+
+ override def afterAll(): Unit = {
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+ }
+
+ //TO DO, need to remove segment dependency and tableIdentifier Dependency
+ test("read carbondata files (sdk Writer Output) using the Carbonfile ") {
+ buildTestData(false)
+ assert(new File(writerPath).exists())
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+
+ //new provider Carbonfile
+ sql(
+ s"""CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'Carbonfile' LOCATION
+ |'$writerPath' """.stripMargin)
+
+ sql("Describe formatted sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable limit 3").show(false)
+
+ sql("select name from sdkOutputTable").show(false)
+
+ sql("select age from sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable where age > 2 and age < 8").show(200, false)
+
+ sql("select * from sdkOutputTable where name = 'robot3'").show(200, false)
+
+ sql("select * from sdkOutputTable where name like 'robo%' limit 5").show(200, false)
+
+ sql("select * from sdkOutputTable where name like '%obot%' limit 2").show(200, false)
+
+ sql("select sum(age) from sdkOutputTable where name like 'robot1%' ").show(200, false)
+
+ sql("select count(*) from sdkOutputTable where name like 'robot%' ").show(200, false)
+
+ sql("select count(*) from sdkOutputTable").show(200, false)
+
+ sql("DROP TABLE sdkOutputTable")
+ // drop table should not delete the files
+ assert(new File(writerPath).exists())
+ cleanTestData()
+ }
+
+ test("should not allow to alter datasource carbontable ") {
+ buildTestData(false)
+ assert(new File(writerPath).exists())
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+
+ //data source file format
+ sql(
+ s"""CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'Carbonfile' LOCATION
+ |'$writerPath' """.stripMargin)
+
+ val exception = intercept[MalformedCarbonCommandException]
+ {
+ sql("Alter table sdkOutputTable change age age BIGINT")
+ }
+ assert(exception.getMessage()
+ .contains("Unsupported alter operation on Carbon external fileformat table"))
--- End diff --
Change to `Unsupported alter operation on 'carbonfile' table`
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174212055
--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/CarbonFileLevelFormat.scala ---
@@ -0,0 +1,266 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.net.URI
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.mapred.JobConf
+import org.apache.hadoop.mapreduce._
+import org.apache.hadoop.mapreduce.lib.input.FileSplit
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.datasources._
+import org.apache.spark.sql.execution.datasources.text.TextOutputWriter
+import org.apache.spark.sql.optimizer.CarbonFilters
+import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
+import org.apache.spark.sql.types.{AtomicType, StructField, StructType}
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datamap.{DataMapChooser, DataMapStoreManager, Segment}
+import org.apache.carbondata.core.indexstore.PartitionSpec
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore
+import org.apache.carbondata.core.metadata.{AbsoluteTableIdentifier, CarbonMetadata, ColumnarFormatVersion}
+import org.apache.carbondata.core.reader.CarbonHeaderReader
+import org.apache.carbondata.core.scan.expression.Expression
+import org.apache.carbondata.core.scan.expression.logical.AndExpression
+import org.apache.carbondata.core.scan.model.QueryModel
+import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.hadoop.{CarbonInputSplit, CarbonProjection, CarbonRecordReader, InputMetricsStats}
+import org.apache.carbondata.hadoop.api.{CarbonFileInputFormat, DataMapJob}
+import org.apache.carbondata.spark.util.CarbonScalaUtil
+
+class CarbonFileLevelFormat extends FileFormat
+ with DataSourceRegister
+ with Logging
+ with Serializable {
+
+ @transient val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+
+ override def inferSchema(sparkSession: SparkSession,
+ options: Map[String, String],
+ files: Seq[FileStatus]): Option[StructType] = {
+ val filePaths = CarbonUtil.getFilePathExternalFilePath(
+ options.get("path").get)
+ if (filePaths.size() == 0) {
+ throw new SparkException("CarbonData file is not present in the location mentioned in DDL")
+ }
+ val carbonHeaderReader: CarbonHeaderReader = new CarbonHeaderReader(filePaths.get(0))
+ val fileHeader = carbonHeaderReader.readHeader
+ val table_columns: java.util.List[org.apache.carbondata.format.ColumnSchema] = fileHeader
+ .getColumn_schema
+ var colArray = ArrayBuffer[StructField]()
+ for (i <- 0 to table_columns.size() - 1) {
+ val col = CarbonUtil.thriftColumnSchmeaToWrapperColumnSchema(table_columns.get(i))
+ colArray += (new StructField(col.getColumnName,
+ CarbonScalaUtil.convertCarbonToSparkDataType(col.getDataType), false))
+ }
+ colArray.+:(Nil)
+
+ Some(StructType(colArray))
+ }
+
+ override def prepareWrite(sparkSession: SparkSession,
+ job: Job,
+ options: Map[String, String],
+ dataSchema: StructType): OutputWriterFactory = {
+
+ new OutputWriterFactory {
+ override def newInstance(
+ path: String,
+ dataSchema: StructType,
+ context: TaskAttemptContext): OutputWriter = {
+ new TextOutputWriter(path, dataSchema, context)
+ }
+
+ override def getFileExtension(context: TaskAttemptContext): String = {
+ CarbonTablePath.CARBON_DATA_EXT
+ }
+ }
+ }
+
+ override def shortName(): String = "CarbonDataFileFormat"
--- End diff --
change the shortName to `carbonfile`. So, `carbondata` is for table level and `carbonfile` is for file level
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174208296
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java ---
@@ -0,0 +1,678 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.hadoop.api;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.lang.reflect.Constructor;
+import java.util.ArrayList;
+import java.util.BitSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datamap.DataMapChooser;
+import org.apache.carbondata.core.datamap.DataMapLevel;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapper;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.exception.InvalidConfigurationException;
+import org.apache.carbondata.core.indexstore.ExtendedBlocklet;
+import org.apache.carbondata.core.indexstore.PartitionSpec;
+import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory;
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.ColumnarFormatVersion;
+import org.apache.carbondata.core.metadata.schema.PartitionInfo;
+import org.apache.carbondata.core.metadata.schema.partition.PartitionType;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.TableInfo;
+import org.apache.carbondata.core.mutate.UpdateVO;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.SingleTableProvider;
+import org.apache.carbondata.core.scan.filter.TableProvider;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import org.apache.carbondata.core.stats.QueryStatistic;
+import org.apache.carbondata.core.stats.QueryStatisticsConstants;
+import org.apache.carbondata.core.stats.QueryStatisticsRecorder;
+import org.apache.carbondata.core.statusmanager.SegmentUpdateStatusManager;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.DataTypeConverter;
+import org.apache.carbondata.core.util.DataTypeConverterImpl;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.CarbonMultiBlockSplit;
+import org.apache.carbondata.hadoop.CarbonProjection;
+import org.apache.carbondata.hadoop.CarbonRecordReader;
+import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport;
+import org.apache.carbondata.hadoop.readsupport.impl.DictionaryDecodeReadSupport;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import org.apache.carbondata.hadoop.util.ObjectSerializationUtil;
+import org.apache.carbondata.hadoop.util.SchemaReader;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocalFileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+import org.apache.hadoop.mapreduce.security.TokenCache;
+
+/**
+ * Input format of CarbonData file.
+ *
+ * @param <T>
+ */
+public class CarbonFileInputFormat<T> extends FileInputFormat<Void, T> implements Serializable {
--- End diff --
Please annotate this class using InterfaceAudience.User and InterfaceStability.Evolving
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2055
retest this, please...
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174693583
--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonFileFormat.scala ---
@@ -0,0 +1,269 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import java.net.URI
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.{FileStatus, Path}
+import org.apache.hadoop.mapred.JobConf
+import org.apache.hadoop.mapreduce._
+import org.apache.hadoop.mapreduce.lib.input.FileSplit
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.apache.spark.{SparkException, TaskContext}
+import org.apache.spark.deploy.SparkHadoopUtil
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.execution.datasources._
+import org.apache.spark.sql.execution.datasources.text.TextOutputWriter
+import org.apache.spark.sql.optimizer.CarbonFilters
+import org.apache.spark.sql.sources.{DataSourceRegister, Filter}
+import org.apache.spark.sql.types.{AtomicType, StructField, StructType}
+
+import org.apache.carbondata.common.annotations.{InterfaceAudience, InterfaceStability}
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datamap.{DataMapChooser, DataMapStoreManager, Segment}
+import org.apache.carbondata.core.indexstore.PartitionSpec
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore
+import org.apache.carbondata.core.metadata.{AbsoluteTableIdentifier, CarbonMetadata, ColumnarFormatVersion}
+import org.apache.carbondata.core.reader.CarbonHeaderReader
+import org.apache.carbondata.core.scan.expression.Expression
+import org.apache.carbondata.core.scan.expression.logical.AndExpression
+import org.apache.carbondata.core.scan.model.QueryModel
+import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil}
+import org.apache.carbondata.core.util.path.CarbonTablePath
+import org.apache.carbondata.hadoop.{CarbonInputSplit, CarbonProjection, CarbonRecordReader, InputMetricsStats}
+import org.apache.carbondata.hadoop.api.{CarbonFileInputFormat, DataMapJob}
+import org.apache.carbondata.spark.util.CarbonScalaUtil
+
+@InterfaceAudience.User
+@InterfaceStability.Evolving
+class SparkCarbonFileFormat extends FileFormat
+ with DataSourceRegister
+ with Logging
+ with Serializable {
+
+ @transient val LOGGER = LogServiceFactory.getLogService(this.getClass.getName)
+
+ override def inferSchema(sparkSession: SparkSession,
+ options: Map[String, String],
+ files: Seq[FileStatus]): Option[StructType] = {
+ val filePaths = CarbonUtil.getFilePathExternalFilePath(
+ options.get("path").get)
+ if (filePaths.size() == 0) {
+ throw new SparkException("CarbonData file is not present in the location mentioned in DDL")
+ }
+ val carbonHeaderReader: CarbonHeaderReader = new CarbonHeaderReader(filePaths.get(0))
+ val fileHeader = carbonHeaderReader.readHeader
+ val table_columns: java.util.List[org.apache.carbondata.format.ColumnSchema] = fileHeader
+ .getColumn_schema
+ var colArray = ArrayBuffer[StructField]()
+ for (i <- 0 to table_columns.size() - 1) {
+ val col = CarbonUtil.thriftColumnSchmeaToWrapperColumnSchema(table_columns.get(i))
+ colArray += (new StructField(col.getColumnName,
+ CarbonScalaUtil.convertCarbonToSparkDataType(col.getDataType), false))
+ }
+ colArray.+:(Nil)
+
+ Some(StructType(colArray))
+ }
+
+ override def prepareWrite(sparkSession: SparkSession,
+ job: Job,
+ options: Map[String, String],
+ dataSchema: StructType): OutputWriterFactory = {
+
+ new OutputWriterFactory {
+ override def newInstance(
+ path: String,
+ dataSchema: StructType,
+ context: TaskAttemptContext): OutputWriter = {
+ new TextOutputWriter(path, dataSchema, context)
+ }
+
+ override def getFileExtension(context: TaskAttemptContext): String = {
+ CarbonTablePath.CARBON_DATA_EXT
+ }
+ }
+ }
+
+ override def shortName(): String = "Carbonfile"
--- End diff --
make it non-capital: `carbonfile`
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174690840
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala ---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.createTable
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonUtil
+import org.apache.carbondata.sdk.file.{CarbonWriter, Schema}
+
+
+class TestCarbonFileInputFormatWithExternalCarbonTable extends QueryTest with BeforeAndAfterAll {
+
+ var writerPath = new File(this.getClass.getResource("/").getPath
+ +
+ "../." +
+ "./src/test/resources/SparkCarbonFileFormat/WriterOutput/")
+ .getCanonicalPath
+ //getCanonicalPath gives path with \, so code expects /. Need to handle in code ?
+ writerPath = writerPath.replace("\\", "/");
+
+
+ def buildTestData(persistSchema:Boolean) = {
+
+ FileUtils.deleteDirectory(new File(writerPath))
+
+ val schema = new StringBuilder()
+ .append("[ \n")
+ .append(" {\"name\":\"string\"},\n")
+ .append(" {\"age\":\"int\"},\n")
+ .append(" {\"height\":\"double\"}\n")
+ .append("]")
+ .toString()
+
+ try {
+ val builder = CarbonWriter.builder()
+ val writer =
+ if (persistSchema) {
+ builder.persistSchemaFile(true)
+ builder.withSchema(Schema.parseJson(schema)).outputPath(writerPath).buildWriterForCSVInput()
+ } else {
+ builder.withSchema(Schema.parseJson(schema)).outputPath(writerPath).buildWriterForCSVInput()
+ }
+
+ var i = 0
+ while (i < 100) {
+ writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2)))
+ i += 1
+ }
+ writer.close()
+ } catch {
+ case ex: Exception => None
+ case _ => None
+ }
+ }
+
+ def cleanTestData() = {
+ FileUtils.deleteDirectory(new File(writerPath))
+ }
+
+ def deleteIndexFile(path: String, extension: String) : Unit = {
+ val file: CarbonFile = FileFactory
+ .getCarbonFile(path, FileFactory.getFileType(path))
+
+ for (eachDir <- file.listFiles) {
+ if (!eachDir.isDirectory) {
+ if (eachDir.getName.endsWith(extension)) {
+ CarbonUtil.deleteFoldersAndFilesSilent(eachDir)
+ }
+ } else {
+ deleteIndexFile(eachDir.getPath, extension)
+ }
+ }
+ }
+
+ override def beforeAll(): Unit = {
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+ // create carbon table and insert data
+ }
+
+ override def afterAll(): Unit = {
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+ }
+
+ //TO DO, need to remove segment dependency and tableIdentifier Dependency
+ test("read carbondata files (sdk Writer Output) using the Carbonfile ") {
+ buildTestData(false)
+ assert(new File(writerPath).exists())
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+
+ //new provider Carbonfile
+ sql(
+ s"""CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'Carbonfile' LOCATION
+ |'$writerPath' """.stripMargin)
+
+ sql("Describe formatted sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable limit 3").show(false)
+
+ sql("select name from sdkOutputTable").show(false)
+
+ sql("select age from sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable where age > 2 and age < 8").show(200, false)
+
+ sql("select * from sdkOutputTable where name = 'robot3'").show(200, false)
+
+ sql("select * from sdkOutputTable where name like 'robo%' limit 5").show(200, false)
+
+ sql("select * from sdkOutputTable where name like '%obot%' limit 2").show(200, false)
+
+ sql("select sum(age) from sdkOutputTable where name like 'robot1%' ").show(200, false)
+
+ sql("select count(*) from sdkOutputTable where name like 'robot%' ").show(200, false)
+
+ sql("select count(*) from sdkOutputTable").show(200, false)
+
+ sql("DROP TABLE sdkOutputTable")
+ // drop table should not delete the files
+ assert(new File(writerPath).exists())
+ cleanTestData()
+ }
+
+ test("should not allow to alter datasource carbontable ") {
+ buildTestData(false)
+ assert(new File(writerPath).exists())
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+
+ //data source file format
+ sql(
+ s"""CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'Carbonfile' LOCATION
+ |'$writerPath' """.stripMargin)
+
+ val exception = intercept[MalformedCarbonCommandException]
+ {
+ sql("Alter table sdkOutputTable change age age BIGINT")
+ }
+ assert(exception.getMessage()
+ .contains("Unsupported alter operation on Carbon external fileformat table"))
+
+ sql("DROP TABLE sdkOutputTable")
+ // drop table should not delete the files
+ assert(new File(writerPath).exists())
+ cleanTestData()
+ }
+
+ test("Read sdk writer output file without index file should fail") {
+ buildTestData(false)
+ deleteIndexFile(writerPath, CarbonCommonConstants.UPDATE_INDEX_FILE_EXT)
+ assert(new File(writerPath).exists())
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+
+ //data source file format
+ sql(
+ s"""CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'Carbonfile' LOCATION
+ |'$writerPath' """.stripMargin)
+
+ //org.apache.spark.SparkException: Index file not present to read the carbondata file
+ val exception = intercept[java.lang.RuntimeException]
+ {
+ sql("select * from sdkOutputTable").show(false)
+ }
+ assert(exception.getMessage().contains("Index file not present to read the carbondata file"))
+
+ sql("DROP TABLE sdkOutputTable")
+ // drop table should not delete the files
+ assert(new File(writerPath).exists())
+ cleanTestData()
+ }
+
+
+ test("Read sdk writer output file without Carbondata file should fail") {
--- End diff --
I think read a table without index file should not fail. Like in case the CarbonWriter is using NO_SORT scope, in future it maybe not writing the index file. In that case, we should still be able to query on that path.
I think better to create another PR for this requirement. Please raise another JIRA for this, and put a TODO here
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2055
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3883/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/2055
LGTM
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2975/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by sounakr <gi...@git.apache.org>.
Github user sounakr commented on the issue:
https://github.com/apache/carbondata/pull/2055
Retest this please
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2055
@jackylk : All the review comments have been addressed and again rebased with master. Please check the below commit.
https://github.com/apache/carbondata/pull/2055/commits/b530adf2071113a75bc8e982ea3bd934e971b650
InferSchema, just renaming done. cannot make it to table path level now as it dependent on identifier.
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174693784
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java ---
@@ -0,0 +1,682 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.hadoop.api;
+
+import java.io.ByteArrayInputStream;
+import java.io.DataInputStream;
+import java.io.IOException;
+import java.io.Serializable;
+import java.lang.reflect.Constructor;
+import java.util.ArrayList;
+import java.util.BitSet;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.annotations.InterfaceAudience;
+import org.apache.carbondata.common.annotations.InterfaceStability;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datamap.DataMapChooser;
+import org.apache.carbondata.core.datamap.DataMapLevel;
+import org.apache.carbondata.core.datamap.Segment;
+import org.apache.carbondata.core.datamap.dev.expr.DataMapExprWrapper;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
+import org.apache.carbondata.core.exception.InvalidConfigurationException;
+import org.apache.carbondata.core.indexstore.ExtendedBlocklet;
+import org.apache.carbondata.core.indexstore.PartitionSpec;
+import org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMapFactory;
+import org.apache.carbondata.core.indexstore.blockletindex.SegmentIndexFileStore;
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.metadata.ColumnarFormatVersion;
+import org.apache.carbondata.core.metadata.schema.PartitionInfo;
+import org.apache.carbondata.core.metadata.schema.partition.PartitionType;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.TableInfo;
+import org.apache.carbondata.core.mutate.UpdateVO;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.SingleTableProvider;
+import org.apache.carbondata.core.scan.filter.TableProvider;
+import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.model.QueryModel;
+import org.apache.carbondata.core.stats.QueryStatistic;
+import org.apache.carbondata.core.stats.QueryStatisticsConstants;
+import org.apache.carbondata.core.stats.QueryStatisticsRecorder;
+import org.apache.carbondata.core.statusmanager.SegmentUpdateStatusManager;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.DataTypeConverter;
+import org.apache.carbondata.core.util.DataTypeConverterImpl;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.CarbonMultiBlockSplit;
+import org.apache.carbondata.hadoop.CarbonProjection;
+import org.apache.carbondata.hadoop.CarbonRecordReader;
+import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport;
+import org.apache.carbondata.hadoop.readsupport.impl.DictionaryDecodeReadSupport;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import org.apache.carbondata.hadoop.util.ObjectSerializationUtil;
+import org.apache.carbondata.hadoop.util.SchemaReader;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.LocalFileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.Reporter;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+import org.apache.hadoop.mapreduce.security.TokenCache;
+
+/**
+ * Input format of CarbonData file.
+ *
+ * @param <T>
+ */
+@InterfaceAudience.User
+@InterfaceStability.Evolving
+public class CarbonFileInputFormat<T> extends FileInputFormat<Void, T> implements Serializable {
--- End diff --
I think there are many duplicate code in `CarbonFileInputFormat` and `CarbonTableInputFormat`, can you make super class called `CarbonInputFormat` and move all duplicate code to it.
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174208978
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/SchemaReader.java ---
@@ -79,4 +79,19 @@ public static TableInfo getTableInfo(AbsoluteTableIdentifier identifier)
carbonTableIdentifier.getTableName(),
identifier.getTablePath());
}
+
+
+ public static TableInfo inferSchemaForExternalTable(AbsoluteTableIdentifier identifier)
--- End diff --
rename to `inferSchema`, and can you pass the tablePath only
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4261/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4221/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2055
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3881/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by sounakr <gi...@git.apache.org>.
Github user sounakr commented on the issue:
https://github.com/apache/carbondata/pull/2055
Retest this please
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4348/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2055
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3868/
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174691411
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingSparkCarbonFileFormat.scala ---
@@ -0,0 +1,327 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.createTable
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonUtil
+import org.apache.carbondata.sdk.file.{CarbonWriter, Schema}
+
+class TestCreateTableUsingSparkCarbonFileFormat extends QueryTest with BeforeAndAfterAll {
+
+
+ override def beforeAll(): Unit = {
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+ }
+
+ override def afterAll(): Unit = {
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+ }
+
+ var writerPath = new File(this.getClass.getResource("/").getPath
+ +
+ "../." +
+ "./src/test/resources/SparkCarbonFileFormat/WriterOutput/")
+ .getCanonicalPath
+ //getCanonicalPath gives path with \, so code expects /. Need to handle in code ?
+ writerPath = writerPath.replace("\\", "/");
+
+ val filePath = writerPath + "/Fact/Part0/Segment_null/"
+
+ def buildTestData(persistSchema:Boolean) = {
+
+ FileUtils.deleteDirectory(new File(writerPath))
+
+ val schema = new StringBuilder()
+ .append("[ \n")
+ .append(" {\"name\":\"string\"},\n")
+ .append(" {\"age\":\"int\"},\n")
+ .append(" {\"height\":\"double\"}\n")
+ .append("]")
+ .toString()
+
+ try {
+ val builder = CarbonWriter.builder()
+ val writer =
+ if (persistSchema) {
+ builder.persistSchemaFile(true)
+ builder.withSchema(Schema.parseJson(schema)).outputPath(writerPath).buildWriterForCSVInput()
+ } else {
+ builder.withSchema(Schema.parseJson(schema)).outputPath(writerPath).buildWriterForCSVInput()
+ }
+
+ var i = 0
+ while (i < 100) {
+ writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2)))
+ i += 1
+ }
+ writer.close()
+ } catch {
+ case ex: Exception => None
+ case _ => None
+ }
+ }
+
+ def cleanTestData() = {
+ FileUtils.deleteDirectory(new File(writerPath))
+ }
+
+ def deleteIndexFile(path: String, extension: String) : Unit = {
+ val file: CarbonFile = FileFactory
+ .getCarbonFile(path, FileFactory.getFileType(path))
+
+ for (eachDir <- file.listFiles) {
+ if (!eachDir.isDirectory) {
+ if (eachDir.getName.endsWith(extension)) {
+ CarbonUtil.deleteFoldersAndFilesSilent(eachDir)
+ }
+ } else {
+ deleteIndexFile(eachDir.getPath, extension)
+ }
+ }
+ }
+
+ //TO DO, need to remove segment dependency and tableIdentifier Dependency
+ test("read carbondata files (sdk Writer Output) using the SparkCarbonFileFormat ") {
+ buildTestData(false)
+ assert(new File(filePath).exists())
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+
+ //data source file format
+ if (sqlContext.sparkContext.version.startsWith("2.1")) {
+ //data source file format
+ sql(s"""CREATE TABLE sdkOutputTable USING Carbonfile OPTIONS (PATH '$filePath') """)
+ } else if (sqlContext.sparkContext.version.startsWith("2.2")) {
+ //data source file format
+ sql(
+ s"""CREATE TABLE sdkOutputTable USING Carbonfile LOCATION
+ |'$filePath' """.stripMargin)
+ } else{
+ // TO DO
+ }
+
+ sql("Describe formatted sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable limit 3").show(false)
+
+ sql("select name from sdkOutputTable").show(false)
+
+ sql("select age from sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable where age > 2 and age < 8").show(200,false)
--- End diff --
please change all show to assert
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2055
Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/3017/
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/2055
merged into carbonfile branch
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174691560
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingSparkCarbonFileFormat.scala ---
@@ -0,0 +1,327 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.createTable
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonUtil
+import org.apache.carbondata.sdk.file.{CarbonWriter, Schema}
+
+class TestCreateTableUsingSparkCarbonFileFormat extends QueryTest with BeforeAndAfterAll {
+
+
+ override def beforeAll(): Unit = {
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+ }
+
+ override def afterAll(): Unit = {
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+ }
+
+ var writerPath = new File(this.getClass.getResource("/").getPath
+ +
+ "../." +
+ "./src/test/resources/SparkCarbonFileFormat/WriterOutput/")
+ .getCanonicalPath
+ //getCanonicalPath gives path with \, so code expects /. Need to handle in code ?
+ writerPath = writerPath.replace("\\", "/");
+
+ val filePath = writerPath + "/Fact/Part0/Segment_null/"
+
+ def buildTestData(persistSchema:Boolean) = {
+
+ FileUtils.deleteDirectory(new File(writerPath))
+
+ val schema = new StringBuilder()
+ .append("[ \n")
+ .append(" {\"name\":\"string\"},\n")
+ .append(" {\"age\":\"int\"},\n")
+ .append(" {\"height\":\"double\"}\n")
+ .append("]")
+ .toString()
+
+ try {
+ val builder = CarbonWriter.builder()
+ val writer =
+ if (persistSchema) {
+ builder.persistSchemaFile(true)
+ builder.withSchema(Schema.parseJson(schema)).outputPath(writerPath).buildWriterForCSVInput()
+ } else {
+ builder.withSchema(Schema.parseJson(schema)).outputPath(writerPath).buildWriterForCSVInput()
+ }
+
+ var i = 0
+ while (i < 100) {
+ writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2)))
+ i += 1
+ }
+ writer.close()
+ } catch {
+ case ex: Exception => None
+ case _ => None
+ }
+ }
+
+ def cleanTestData() = {
+ FileUtils.deleteDirectory(new File(writerPath))
+ }
+
+ def deleteIndexFile(path: String, extension: String) : Unit = {
+ val file: CarbonFile = FileFactory
+ .getCarbonFile(path, FileFactory.getFileType(path))
+
+ for (eachDir <- file.listFiles) {
+ if (!eachDir.isDirectory) {
+ if (eachDir.getName.endsWith(extension)) {
+ CarbonUtil.deleteFoldersAndFilesSilent(eachDir)
+ }
+ } else {
+ deleteIndexFile(eachDir.getPath, extension)
+ }
+ }
+ }
+
+ //TO DO, need to remove segment dependency and tableIdentifier Dependency
+ test("read carbondata files (sdk Writer Output) using the SparkCarbonFileFormat ") {
+ buildTestData(false)
+ assert(new File(filePath).exists())
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+
+ //data source file format
+ if (sqlContext.sparkContext.version.startsWith("2.1")) {
+ //data source file format
+ sql(s"""CREATE TABLE sdkOutputTable USING Carbonfile OPTIONS (PATH '$filePath') """)
+ } else if (sqlContext.sparkContext.version.startsWith("2.2")) {
+ //data source file format
+ sql(
+ s"""CREATE TABLE sdkOutputTable USING Carbonfile LOCATION
+ |'$filePath' """.stripMargin)
+ } else{
+ // TO DO
+ }
+
+ sql("Describe formatted sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable limit 3").show(false)
+
+ sql("select name from sdkOutputTable").show(false)
+
+ sql("select age from sdkOutputTable").show(false)
+
+ sql("select * from sdkOutputTable where age > 2 and age < 8").show(200,false)
+
+ sql("select * from sdkOutputTable where name = 'robot3'").show(200,false)
+
+ sql("select * from sdkOutputTable where name like 'robo%' limit 5").show(200,false)
+
+ sql("select * from sdkOutputTable where name like '%obot%' limit 2").show(200,false)
+
+ sql("select sum(age) from sdkOutputTable where name like 'robot1%' ").show(200,false)
+
+ sql("select count(*) from sdkOutputTable where name like 'robot%' ").show(200,false)
+
+ sql("select count(*) from sdkOutputTable").show(200,false)
+
+ sql("DROP TABLE sdkOutputTable")
+ // drop table should not delete the files
+ assert(new File(filePath).exists())
+ cleanTestData()
+ }
+
+
+ test("should not allow to alter datasource carbontable ") {
--- End diff --
all these testcases are duplicated, can you move them to a common class and use it in 3 test suites
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174689415
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala ---
@@ -0,0 +1,240 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.createTable
+
+import java.io.File
+
+import org.apache.commons.io.FileUtils
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.datastore.filesystem.CarbonFile
+import org.apache.carbondata.core.datastore.impl.FileFactory
+import org.apache.carbondata.core.util.CarbonUtil
+import org.apache.carbondata.sdk.file.{CarbonWriter, Schema}
+
+
+class TestCarbonFileInputFormatWithExternalCarbonTable extends QueryTest with BeforeAndAfterAll {
+
+ var writerPath = new File(this.getClass.getResource("/").getPath
+ +
+ "../." +
+ "./src/test/resources/SparkCarbonFileFormat/WriterOutput/")
+ .getCanonicalPath
+ //getCanonicalPath gives path with \, so code expects /. Need to handle in code ?
+ writerPath = writerPath.replace("\\", "/");
+
+
+ def buildTestData(persistSchema:Boolean) = {
+
+ FileUtils.deleteDirectory(new File(writerPath))
+
+ val schema = new StringBuilder()
+ .append("[ \n")
+ .append(" {\"name\":\"string\"},\n")
+ .append(" {\"age\":\"int\"},\n")
+ .append(" {\"height\":\"double\"}\n")
+ .append("]")
+ .toString()
+
+ try {
+ val builder = CarbonWriter.builder()
+ val writer =
+ if (persistSchema) {
+ builder.persistSchemaFile(true)
+ builder.withSchema(Schema.parseJson(schema)).outputPath(writerPath).buildWriterForCSVInput()
+ } else {
+ builder.withSchema(Schema.parseJson(schema)).outputPath(writerPath).buildWriterForCSVInput()
+ }
+
+ var i = 0
+ while (i < 100) {
+ writer.write(Array[String]("robot" + i, String.valueOf(i), String.valueOf(i.toDouble / 2)))
+ i += 1
+ }
+ writer.close()
+ } catch {
+ case ex: Exception => None
+ case _ => None
+ }
+ }
+
+ def cleanTestData() = {
+ FileUtils.deleteDirectory(new File(writerPath))
+ }
+
+ def deleteIndexFile(path: String, extension: String) : Unit = {
+ val file: CarbonFile = FileFactory
+ .getCarbonFile(path, FileFactory.getFileType(path))
+
+ for (eachDir <- file.listFiles) {
+ if (!eachDir.isDirectory) {
+ if (eachDir.getName.endsWith(extension)) {
+ CarbonUtil.deleteFoldersAndFilesSilent(eachDir)
+ }
+ } else {
+ deleteIndexFile(eachDir.getPath, extension)
+ }
+ }
+ }
+
+ override def beforeAll(): Unit = {
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+ // create carbon table and insert data
+ }
+
+ override def afterAll(): Unit = {
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+ }
+
+ //TO DO, need to remove segment dependency and tableIdentifier Dependency
+ test("read carbondata files (sdk Writer Output) using the Carbonfile ") {
+ buildTestData(false)
+ assert(new File(writerPath).exists())
+ sql("DROP TABLE IF EXISTS sdkOutputTable")
+
+ //new provider Carbonfile
+ sql(
+ s"""CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'Carbonfile' LOCATION
+ |'$writerPath' """.stripMargin)
+
+ sql("Describe formatted sdkOutputTable").show(false)
--- End diff --
can you assert the result instead of show
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174694130
--- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
@@ -2068,6 +2079,202 @@ private static void updateDecimalType(TableInfo tableInfo) {
return tableInfo;
}
+ public static ColumnSchema thriftColumnSchmeaToWrapperColumnSchema(
+ org.apache.carbondata.format.ColumnSchema externalColumnSchema) {
+ ColumnSchema wrapperColumnSchema = new ColumnSchema();
+ wrapperColumnSchema.setColumnUniqueId(externalColumnSchema.getColumn_id());
+ wrapperColumnSchema.setColumnName(externalColumnSchema.getColumn_name());
+ wrapperColumnSchema.setColumnar(externalColumnSchema.isColumnar());
+ DataType dataType = thriftDataTyopeToWrapperDataType(externalColumnSchema.data_type);
+ if (DataTypes.isDecimal(dataType)) {
+ DecimalType decimalType = (DecimalType) dataType;
+ decimalType.setPrecision(externalColumnSchema.getPrecision());
+ decimalType.setScale(externalColumnSchema.getScale());
+ }
+ wrapperColumnSchema.setDataType(dataType);
+ wrapperColumnSchema.setDimensionColumn(externalColumnSchema.isDimension());
+ List<Encoding> encoders = new ArrayList<Encoding>();
+ for (org.apache.carbondata.format.Encoding encoder : externalColumnSchema.getEncoders()) {
+ encoders.add(fromExternalToWrapperEncoding(encoder));
+ }
+ wrapperColumnSchema.setEncodingList(encoders);
+ wrapperColumnSchema.setNumberOfChild(externalColumnSchema.getNum_child());
+ wrapperColumnSchema.setPrecision(externalColumnSchema.getPrecision());
+ wrapperColumnSchema.setColumnGroup(externalColumnSchema.getColumn_group_id());
+ wrapperColumnSchema.setScale(externalColumnSchema.getScale());
+ wrapperColumnSchema.setDefaultValue(externalColumnSchema.getDefault_value());
+ wrapperColumnSchema.setSchemaOrdinal(externalColumnSchema.getSchemaOrdinal());
+ Map<String, String> properties = externalColumnSchema.getColumnProperties();
+ if (properties != null) {
+ if (properties.get(CarbonCommonConstants.SORT_COLUMNS) != null) {
+ wrapperColumnSchema.setSortColumn(true);
+ }
+ }
+ wrapperColumnSchema.setFunction(externalColumnSchema.getAggregate_function());
+ List<org.apache.carbondata.format.ParentColumnTableRelation> parentColumnTableRelation =
+ externalColumnSchema.getParentColumnTableRelations();
+ if (null != parentColumnTableRelation) {
+ wrapperColumnSchema.setParentColumnTableRelations(
+ fromThriftToWrapperParentTableColumnRelations(parentColumnTableRelation));
+ }
+ return wrapperColumnSchema;
+ }
+
+ static List<ParentColumnTableRelation> fromThriftToWrapperParentTableColumnRelations(
+ List<org.apache.carbondata.format.ParentColumnTableRelation> thirftParentColumnRelation) {
+ List<ParentColumnTableRelation> parentColumnTableRelationList = new ArrayList<>();
+ for (org.apache.carbondata.format.ParentColumnTableRelation carbonTableRelation :
+ thirftParentColumnRelation) {
+ RelationIdentifier relationIdentifier =
+ new RelationIdentifier(carbonTableRelation.getRelationIdentifier().getDatabaseName(),
+ carbonTableRelation.getRelationIdentifier().getTableName(),
+ carbonTableRelation.getRelationIdentifier().getTableId());
+ ParentColumnTableRelation parentColumnTableRelation =
+ new ParentColumnTableRelation(relationIdentifier, carbonTableRelation.getColumnId(),
+ carbonTableRelation.getColumnName());
+ parentColumnTableRelationList.add(parentColumnTableRelation);
+ }
+ return parentColumnTableRelationList;
+ }
+
+ static Encoding fromExternalToWrapperEncoding(
+ org.apache.carbondata.format.Encoding encoderThrift) {
+ switch (encoderThrift) {
+ case DICTIONARY:
+ return Encoding.DICTIONARY;
+ case DELTA:
+ return Encoding.DELTA;
+ case RLE:
+ return Encoding.RLE;
+ case INVERTED_INDEX:
+ return Encoding.INVERTED_INDEX;
+ case BIT_PACKED:
+ return Encoding.BIT_PACKED;
+ case DIRECT_DICTIONARY:
+ return Encoding.DIRECT_DICTIONARY;
+ default:
+ throw new IllegalArgumentException(encoderThrift.toString() + " is not supported");
+ }
+ }
+
+ static DataType thriftDataTyopeToWrapperDataType(
+ org.apache.carbondata.format.DataType dataTypeThrift) {
+ switch (dataTypeThrift) {
+ case BOOLEAN:
+ return DataTypes.BOOLEAN;
+ case STRING:
+ return DataTypes.STRING;
+ case SHORT:
+ return DataTypes.SHORT;
+ case INT:
+ return DataTypes.INT;
+ case LONG:
+ return DataTypes.LONG;
+ case DOUBLE:
+ return DataTypes.DOUBLE;
+ case DECIMAL:
+ return DataTypes.createDefaultDecimalType();
+ case DATE:
+ return DataTypes.DATE;
+ case TIMESTAMP:
+ return DataTypes.TIMESTAMP;
+ case ARRAY:
+ return DataTypes.createDefaultArrayType();
+ case STRUCT:
+ return DataTypes.createDefaultStructType();
+ default:
+ return DataTypes.STRING;
+ }
+ }
+
+ public static List<String> getFilePathExternalFilePath(String path) {
+
+ // return the list of carbondata files in the given path.
+ CarbonFile segment = FileFactory.getCarbonFile(path, FileFactory.getFileType(path));
+ CarbonFile[] dataFiles = segment.listFiles(new CarbonFileFilter() {
+ @Override public boolean accept(CarbonFile file) {
+
+ if (file.getName().endsWith(CarbonCommonConstants.FACT_FILE_EXT)) {
+ return true;
+ }
+ return false;
+ }
+ });
+ List<String> filePaths = new ArrayList<>(dataFiles.length);
+ for (CarbonFile dfiles : dataFiles) {
+ filePaths.add(dfiles.getAbsolutePath());
+ }
+ return filePaths;
+ }
+
+ /**
+ * This method will read the schema file from a given path
+ *
+ * @param schemaFilePath
+ * @return
+ */
+ public static org.apache.carbondata.format.TableInfo inferSchemaFileExternalTable(
--- End diff --
rename to `inferSchema`
---
[GitHub] carbondata pull request #2055: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2055#discussion_r174208671
--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java ---
@@ -0,0 +1,193 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.mapred;
+
+
+import java.io.BufferedReader;
+import java.io.BufferedWriter;
+import java.io.File;
+import java.io.FileFilter;
+import java.io.FileReader;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.util.List;
+import java.util.UUID;
+
+import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.hadoop.CarbonProjection;
+import org.apache.carbondata.hadoop.api.CarbonFileInputFormat;
+import org.apache.carbondata.hadoop.api.CarbonTableInputFormat;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.conf.Configured;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IntWritable;
+import org.apache.hadoop.io.Text;
+import org.apache.hadoop.mapred.FileInputFormat;
+import org.apache.hadoop.mapred.JobClient;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapreduce.Job;
+import org.apache.hadoop.mapreduce.Mapper;
+import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
+import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
+import org.apache.hadoop.util.Tool;
+import org.apache.hadoop.util.ToolRunner;
+import org.junit.Assert;
+import org.junit.Test;
+
+public class TestMapReduceCarbonFileInputFormat {
+
+ private static final Log LOG = LogFactory.getLog(TestMapReduceCarbonFileInputFormat.class);
+
+ private int countTheLines(String outPath) throws Exception {
+ File file = new File(outPath);
+ if (file.exists()) {
+ BufferedReader reader = new BufferedReader(new FileReader(file));
+ int i = 0;
+ while (reader.readLine() != null) {
+ i++;
+ }
+ reader.close();
+ return i;
+ }
+ return 0;
+ }
+
+ private int countTheColumns(String outPath) throws Exception {
+ File file = new File(outPath);
+ if (file.exists()) {
+ BufferedReader reader = new BufferedReader(new FileReader(file));
+ String[] split = reader.readLine().split(",");
+ reader.close();
+ return split.length;
+ }
+ return 0;
+ }
+
+
--- End diff --
remove empty lines
---
[GitHub] carbondata issue #2055: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2055
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3902/
---