Posted to issues@carbondata.apache.org by sounakr <gi...@git.apache.org> on 2018/03/12 14:46:17 UTC
[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...
GitHub user sounakr opened a pull request:
https://github.com/apache/carbondata/pull/2054
[CARBONDATA-2224][File Level Reader Support] External File level reader support
The file level reader reads any carbondata file placed in any external file path. Reading can be done in three ways.
a) Reading as a datasource from Spark. CarbonFileLevelFormat.scala is used in this case to read the file. To create a Spark datasource external table:
"CREATE TABLE sdkOutputTable USING CarbonDataFileFormat LOCATION '$writerOutputFilePath1'"
For more details, please refer to the test file org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingCarbonFileLevelFormat.scala.
b) Reading from Spark SQL as an external table. CarbonFileInputFormat.java is used for reading the files. The create table syntax for this is:
"CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'carbondatafileformat' LOCATION '$writerOutputFilePath6'"
For more details, see org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala.
c) Reading through a Hadoop MapReduce job. Please refer to org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java for more details.
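The two CREATE TABLE forms in (a) and (b) differ only in the keyword sequence around the provider name. As a minimal sketch, the helper below assembles both statements from a table name and a location. The class FileLevelReaderDdl, its method names, and the path /tmp/sdk_output are hypothetical and are not part of CarbonData; the statements themselves are copied from the description above.

```java
// Hypothetical helper, not part of CarbonData: it only assembles the two
// CREATE TABLE statements quoted in the PR description.
final class FileLevelReaderDdl {

  // Method (a): Spark datasource table over an external path.
  static String datasourceTable(String tableName, String location) {
    return "CREATE TABLE " + tableName
        + " USING CarbonDataFileFormat LOCATION '" + location + "'";
  }

  // Method (b): Spark SQL external table over the same kind of path.
  static String externalTable(String tableName, String location) {
    return "CREATE EXTERNAL TABLE " + tableName
        + " STORED BY 'carbondatafileformat' LOCATION '" + location + "'";
  }

  public static void main(String[] args) {
    // Placeholder path (assumption); in the tests this is the SDK writer output directory.
    String path = "/tmp/sdk_output";
    System.out.println(datasourceTable("sdkOutputTable", path));
    System.out.println(externalTable("sdkOutputTable", path));
  }
}
```

In a Spark session, each resulting string would be passed to the SQL entry point; that step is left out here so the sketch has no Spark dependency.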
**Limitation**: This implementation depends on the writer SDK file path layout, table_name/Fact/Part0/Segment_null. The reader and writer should be independent of a static path.
Because of this, the reader currently does not work with standard partitions either. This will be handled in future PRs.
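To make the limitation concrete, the sketch below builds the fixed directory the reader is described as expecting and checks whether a given file sits under it. The class SdkOutputLayout, its methods, and the example paths are hypothetical illustrations; only the Fact/Part0/Segment_null layout comes from the description.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of the static layout named in the limitation.
final class SdkOutputLayout {

  // Directory the file level reader is described as depending on:
  // <tablePath>/Fact/Part0/Segment_null
  static Path segmentDir(String tablePath) {
    return Paths.get(tablePath, "Fact", "Part0", "Segment_null");
  }

  // True when filePath lies inside the expected segment directory.
  static boolean isUnderSegmentDir(String tablePath, String filePath) {
    return Paths.get(filePath).normalize()
        .startsWith(segmentDir(tablePath).normalize());
  }

  public static void main(String[] args) {
    // Example paths are placeholders (assumption).
    String table = "/data/table_name";
    System.out.println(segmentDir(table));
    System.out.println(isUnderSegmentDir(
        table, "/data/table_name/Fact/Part0/Segment_null/part-0.carbondata"));
    System.out.println(isUnderSegmentDir(
        table, "/data/table_name/country=IN/part-0.carbondata"));
  }
}
```

A partitioned table stores files in per-partition directories rather than under this single segment directory, which is one way to read why standard partitions are not yet supported.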
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [ ] Testing done
Please provide details on
- Whether new unit test cases have been added or why no new tests are required?
- How it is tested? Please attach test report.
- Is it a performance related change? Please attach the performance test report.
- Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/sounakr/incubator-carbondata file_level_reader
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/carbondata/pull/2054.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2054
----
commit 5765fc4007c8514ccb20c6a98c7f4463483275fc
Author: sounakr <so...@...>
Date: 2018-02-24T02:25:14Z
File Format Reader
commit 685214f2537cdca75ecb58196e4b2a168e6c9cbb
Author: sounakr <so...@...>
Date: 2018-02-26T11:58:47Z
File Format Phase 2
commit b1070c2322e5fdd0fce83fe36b083611b0b60bf6
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-27T06:06:56Z
* File Format Phase 2 (cleanup code)
commit 9bb51e9f6d475f58409815434edf089b60795584
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-27T06:36:28Z
* File Format Phase 2 (cleanup code)
commit 69d85aa1a869e6018cf25728f326328de027085a
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-27T09:54:43Z
* File Format Phase 2 (cleanup code and adding testCase)
commit f092f86a2ff033a5e1e7798cf8ed2658f8cb888d
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-27T11:58:37Z
* File Format Phase 2 (filter issue fix)
commit 13a97acb1562f1a8dfa8830cc0a872c5b6361961
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-27T12:20:46Z
* File Format Phase 2 (filter issue fix return value)
commit d146e1c2e5c67d3251ac99e7853351bd498b4b6a
Author: sounakr <so...@...>
Date: 2018-02-27T13:55:16Z
Clear DataMap Cache
commit eb97736e3cdd46b62f7f7203c10e2ac86fbea375
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-27T14:02:35Z
* File Format Phase 2 (test cases)
commit b192fe886be21b3d137944929cf45dd1c931bd65
Author: sounakr <so...@...>
Date: 2018-02-28T03:18:45Z
Refactor CarbonFileInputFormat
commit 5916a476b215a44e4e580b870093182ef7ca5183
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-28T10:02:08Z
* File Format Phase 2
a. test cases addition
b. Exception handling when the files are not present
c. Setting the filter expression in carbonTableInputFormat
commit db65fcb48158eec6f8e02a528f07f72eae1b3d4a
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-28T10:02:08Z
* File Format Phase 2
a. test cases addition
b. Exception handling when the files are not present
c. Setting the filter expression in carbonTableInputFormat
commit ec1870763e28c48b7796ab090a911c55228cb614
Author: Ajantha-Bhat <aj...@...>
Date: 2018-02-28T10:02:08Z
* File Format Phase 2
a. test cases addition
b. Exception handling when the files are not present
c. Setting the filter expression in carbonTableInputFormat
commit 08508f0a0c5ab0c43b568bda84c2602f38ae3f3c
Author: sounakr <so...@...>
Date: 2018-03-01T11:23:39Z
Map Reduce Test Case for CarbonInputFileFormat
commit fe56389b55227bb287f2b8cffaf1a6da8b567fa8
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-01T11:41:03Z
* fixed the issues
Existing external table flow got impacted
Added a new storage(provider) carbondatafileformat for external table creation
commit 83784c00487cdf76b724d31218fdf57c241e7901
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-01T15:32:07Z
* Bug fixes
CarbonFileInputFormat flow 3 issue fixes.
a. schema ordinal
b. table path problem in absolute identifier
c. drop of external table fix
d. unwanted code cleanup
commit 866807a01eb4c9617f36e141b19ccb6a94de6aca
Author: sounakr <so...@...>
Date: 2018-03-02T05:09:45Z
Review Code
commit 729fb7ea629bcec3afdf5f933309bc2db15663fd
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-05T11:07:10Z
merge conflict fix
commit 5767275d5788ea38b5f75920c84fd0a315932e4d
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-06T10:08:20Z
* Fixed the test script failure for spark 2.1
commit ecf8b339d3402b482e71fc7f970f581bda5c4aff
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-06T11:58:32Z
* Fixed the test script failure for spark 2.1, 2.2
commit da45328f111cd02b07783bfa340015bec64452dc
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-12T12:46:10Z
* Fix the compilation errors after rebase to master.
commit 13d40503e6ed559b80ec3465e85bf7ac3d2cf407
Author: Ajantha-Bhat <aj...@...>
Date: 2018-03-12T12:59:00Z
*Fixing the test case of this requirement
----
---
[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2054
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4210/
---
[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:
https://github.com/apache/carbondata/pull/2054
@jackylk: The binary files [carbondata and index files (SDK writer output)] were added intentionally for the test cases of this requirement. The test cases will fail if we remove them.
---
[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2054#discussion_r173841691
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/SchemaReader.java ---
@@ -28,7 +28,8 @@
import org.apache.carbondata.core.metadata.schema.table.TableInfo;
import org.apache.carbondata.core.util.CarbonUtil;
import org.apache.carbondata.core.util.path.CarbonTablePath;
-import org.apache.carbondata.core.util.path.CarbonTablePath;
+
+
--- End diff --
remove empty line
---
[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2054#discussion_r173842791
--- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CSVCarbonWriterSuite.java ---
@@ -68,13 +68,12 @@ public void testWriteFilesJsonSchema() throws IOException {
private void writeFilesAndVerify(Schema schema, String path) {
try {
- CarbonWriter writer = CarbonWriter.builder()
- .withSchema(schema)
- .outputPath(path)
- .buildWriterForCSVInput();
+ CarbonWriter writer =
+ CarbonWriter.builder().withSchema(schema).outputPath(path).buildWriterForCSVInput();
for (int i = 0; i < 100; i++) {
- writer.write(new String[]{"robot" + i, String.valueOf(i), String.valueOf((double) i / 2)});
+ writer
+ .write(new String[] { "robot" + i, String.valueOf(i), String.valueOf((double) i / 2) });
--- End diff --
do not modify it since no change
---
[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:
https://github.com/apache/carbondata/pull/2054
There are some binary files, please delete them
---
[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2054#discussion_r173842755
--- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CSVCarbonWriterSuite.java ---
@@ -68,13 +68,12 @@ public void testWriteFilesJsonSchema() throws IOException {
private void writeFilesAndVerify(Schema schema, String path) {
try {
- CarbonWriter writer = CarbonWriter.builder()
- .withSchema(schema)
- .outputPath(path)
- .buildWriterForCSVInput();
+ CarbonWriter writer =
+ CarbonWriter.builder().withSchema(schema).outputPath(path).buildWriterForCSVInput();
--- End diff --
do not modify the code style
---
[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...
Posted by sounakr <gi...@git.apache.org>.
Github user sounakr closed the pull request at:
https://github.com/apache/carbondata/pull/2054
---
[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2054
Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4218/
---
[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2054
SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3858/
---
[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2054
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2964/
---
[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...
Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2054#discussion_r173885274
--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
@@ -826,6 +826,12 @@ public boolean isExternalTable() {
return external != null && external.equalsIgnoreCase("true");
}
+ public boolean isFileLevelExternalTable() {
--- End diff --
**stored by 'carbondatafileformat' is mapped to _filelevelexternal.**
So, when a MapReduce or Hadoop service calls carbonScanRDD, the new file level reader [CarbonFileInputFormat] is called based on _filelevelexternal.
An external table can be table level (stored by 'carbondata') or file level (stored by 'carbondatafileformat').
**This is used to identify the file level external table.**
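The dispatch described above can be sketched roughly as follows. This is not the code from the PR: the diff shows only the signature of isFileLevelExternalTable(), so its body here is an assumption modelled on the isExternalTable() check visible in the diff. The class ExternalTableKind, the "_external" key, and the readerName() method are hypothetical; CarbonTableInputFormat is assumed to be the existing table level reader.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the table property checks on CarbonTable.
final class ExternalTableKind {
  private final Map<String, String> tableProperties;

  ExternalTableKind(Map<String, String> tableProperties) {
    this.tableProperties = new HashMap<>(tableProperties);
  }

  // Same null-safe, case-insensitive pattern as isExternalTable() in the diff.
  // The "_external" key is an assumption.
  boolean isExternalTable() {
    String external = tableProperties.get("_external");
    return external != null && external.equalsIgnoreCase("true");
  }

  // Assumed body: the PR shows only the method signature.
  boolean isFileLevelExternalTable() {
    String external = tableProperties.get("_filelevelexternal");
    return external != null && external.equalsIgnoreCase("true");
  }

  // Chooses the reader by name, as the comment describes for carbonScanRDD.
  String readerName() {
    return isFileLevelExternalTable()
        ? "CarbonFileInputFormat"
        : "CarbonTableInputFormat";
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("_filelevelexternal", "true");
    System.out.println(new ExternalTableKind(props).readerName());
  }
}
```

Keeping the two flags separate lets a table level external table (stored by 'carbondata') continue through the existing flow while only file level tables take the new reader.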
---
[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:
https://github.com/apache/carbondata/pull/2054
SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3867/
---
[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2054#discussion_r173841169
--- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
@@ -826,6 +826,12 @@ public boolean isExternalTable() {
return external != null && external.equalsIgnoreCase("true");
}
+ public boolean isFileLevelExternalTable() {
--- End diff --
why is this property required?
---
[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...
Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2054#discussion_r173842122
--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/SchemaReader.java ---
@@ -79,4 +81,19 @@ public static TableInfo getTableInfo(AbsoluteTableIdentifier identifier)
carbonTableIdentifier.getTableName(),
identifier.getTablePath());
}
+
+
+ public static TableInfo inferSchemaForExternalTable(AbsoluteTableIdentifier identifier)
--- End diff --
Can the input param change to `String tablePath`?
---
[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...
Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:
https://github.com/apache/carbondata/pull/2054
Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2973/
---