Posted to issues@carbondata.apache.org by sounakr <gi...@git.apache.org> on 2018/03/12 14:46:17 UTC

[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...

GitHub user sounakr opened a pull request:

    https://github.com/apache/carbondata/pull/2054

    [CARBONDATA-2224][File Level Reader Support] External File level reader support

    The file level reader reads any carbondata file placed in any external file path. Reading can be done in three ways.
    a) Reading as a datasource from Spark. CarbonFileLevelFormat.scala is used in this case to read the file. To create a Spark datasource external table:
    "CREATE TABLE sdkOutputTable USING CarbonDataFileFormat LOCATION '$writerOutputFilePath1'"
    For more details, please refer to the test file org/apache/carbondata/spark/testsuite/createTable/TestCreateTableUsingCarbonFileLevelFormat.scala.
    
    b) Reading from Spark SQL as an external table. CarbonFileInputFormat.java is used for reading the files. The create table syntax for this is:
    "CREATE EXTERNAL TABLE sdkOutputTable STORED BY 'carbondatafileformat' LOCATION '$writerOutputFilePath6'"
    For more details, refer to org/apache/carbondata/spark/testsuite/createTable/TestCarbonFileInputFormatWithExternalCarbonTable.scala.
    
    c) Reading through a Hadoop MapReduce job. Please refer to org/apache/carbondata/mapred/TestMapReduceCarbonFileInputFormat.java for more details.
    
    
    **Limitation** :: This implementation depends on the writer SDK file path having the form table_name/Fact/Part0/Segment_null. The reader and writer should be independent of this static path.
    Because of this, the reader currently does not work with standard partitions either. This will be handled in future PRs.
    
     - [ ] Any interfaces changed?
     
     - [ ] Any backward compatibility impacted?
     
     - [ ] Document update required?
    
     - [ ] Testing done
            Please provide details on 
            - Whether new unit test cases have been added or why no new tests are required?
            - How it is tested? Please attach test report.
            - Is it a performance related change? Please attach the performance test report.
            - Any additional information to help reviewers in testing this change.
           
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
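
For readers skimming the description above, the two CREATE TABLE forms and the path layout named in the limitation can be put side by side in a small sketch. The class and method names below are hypothetical and are not part of CarbonData; the SQL text and the `table_name/Fact/Part0/Segment_null` layout are copied from the description, and the location used in `main` is a placeholder.

```java
/** Illustrative helper only; none of these names exist in CarbonData. */
final class FileLevelReaderDdl {

  /** Method (a): Spark datasource table over an external location. */
  static String datasourceTable(String table, String location) {
    return "CREATE TABLE " + table
        + " USING CarbonDataFileFormat LOCATION '" + location + "'";
  }

  /** Method (b): Spark SQL external table using the file level provider. */
  static String externalTable(String table, String location) {
    return "CREATE EXTERNAL TABLE " + table
        + " STORED BY 'carbondatafileformat' LOCATION '" + location + "'";
  }

  /** Directory layout that, per the limitation note, the reader currently expects. */
  static String expectedSegmentDir(String tableName) {
    return tableName + "/Fact/Part0/Segment_null";
  }

  public static void main(String[] args) {
    String location = "/tmp/sdkOutput"; // placeholder path, an assumption
    System.out.println(datasourceTable("sdkOutputTable", location));
    System.out.println(externalTable("sdkOutputTable", location));
    System.out.println(expectedSegmentDir("sdkOutputTable"));
  }
}
```

In a Spark session the generated strings would be passed to `spark.sql(...)`; the referenced test suites remain the authoritative usage.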
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sounakr/incubator-carbondata file_level_reader

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2054.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2054
    
----
commit 5765fc4007c8514ccb20c6a98c7f4463483275fc
Author: sounakr <so...@...>
Date:   2018-02-24T02:25:14Z

    File Format Reader

commit 685214f2537cdca75ecb58196e4b2a168e6c9cbb
Author: sounakr <so...@...>
Date:   2018-02-26T11:58:47Z

    File Format Phase 2

commit b1070c2322e5fdd0fce83fe36b083611b0b60bf6
Author: Ajantha-Bhat <aj...@...>
Date:   2018-02-27T06:06:56Z

    * File Format Phase 2 (cleanup code)

commit 9bb51e9f6d475f58409815434edf089b60795584
Author: Ajantha-Bhat <aj...@...>
Date:   2018-02-27T06:36:28Z

    * File Format Phase 2 (cleanup code)

commit 69d85aa1a869e6018cf25728f326328de027085a
Author: Ajantha-Bhat <aj...@...>
Date:   2018-02-27T09:54:43Z

    * File Format Phase 2 (cleanup code and adding testCase)

commit f092f86a2ff033a5e1e7798cf8ed2658f8cb888d
Author: Ajantha-Bhat <aj...@...>
Date:   2018-02-27T11:58:37Z

    * File Format Phase 2 (filter issue fix)

commit 13a97acb1562f1a8dfa8830cc0a872c5b6361961
Author: Ajantha-Bhat <aj...@...>
Date:   2018-02-27T12:20:46Z

    * File Format Phase 2 (filter issue fix return value)

commit d146e1c2e5c67d3251ac99e7853351bd498b4b6a
Author: sounakr <so...@...>
Date:   2018-02-27T13:55:16Z

    Clear DataMap Cache

commit eb97736e3cdd46b62f7f7203c10e2ac86fbea375
Author: Ajantha-Bhat <aj...@...>
Date:   2018-02-27T14:02:35Z

    * File Format Phase 2 (test cases)

commit b192fe886be21b3d137944929cf45dd1c931bd65
Author: sounakr <so...@...>
Date:   2018-02-28T03:18:45Z

    Refactor CarbonFileInputFormat

commit 5916a476b215a44e4e580b870093182ef7ca5183
Author: Ajantha-Bhat <aj...@...>
Date:   2018-02-28T10:02:08Z

    * File Format Phase 2
    a. test cases addition
    b. Exception handling when the files are not present
    c. Setting the filter expression in carbonTableInputFormat

commit db65fcb48158eec6f8e02a528f07f72eae1b3d4a
Author: Ajantha-Bhat <aj...@...>
Date:   2018-02-28T10:02:08Z

    * File Format Phase 2
    a. test cases addition
    b. Exception handling when the files are not present
    c. Setting the filter expression in carbonTableInputFormat

commit ec1870763e28c48b7796ab090a911c55228cb614
Author: Ajantha-Bhat <aj...@...>
Date:   2018-02-28T10:02:08Z

    * File Format Phase 2
    a. test cases addition
    b. Exception handling when the files are not present
    c. Setting the filter expression in carbonTableInputFormat

commit 08508f0a0c5ab0c43b568bda84c2602f38ae3f3c
Author: sounakr <so...@...>
Date:   2018-03-01T11:23:39Z

    Map Reduce Test Case for CarbonInputFileFormat

commit fe56389b55227bb287f2b8cffaf1a6da8b567fa8
Author: Ajantha-Bhat <aj...@...>
Date:   2018-03-01T11:41:03Z

    * fixed the issues
    Existing external table flow got impacted
    Added a new storage(provider) carbondatafileformat for external table creation

commit 83784c00487cdf76b724d31218fdf57c241e7901
Author: Ajantha-Bhat <aj...@...>
Date:   2018-03-01T15:32:07Z

    * Bug fixes
    CarbonFileInputFormat flow 3 issue fixes.
    a. schema ordinal
    b. table path problem in absolute identifier
    c. drop of external table fix
    d. unwanted code cleanup

commit 866807a01eb4c9617f36e141b19ccb6a94de6aca
Author: sounakr <so...@...>
Date:   2018-03-02T05:09:45Z

    Review Code

commit 729fb7ea629bcec3afdf5f933309bc2db15663fd
Author: Ajantha-Bhat <aj...@...>
Date:   2018-03-05T11:07:10Z

    merge conflict fix

commit 5767275d5788ea38b5f75920c84fd0a315932e4d
Author: Ajantha-Bhat <aj...@...>
Date:   2018-03-06T10:08:20Z

    * Fixed the test script failure for spark 2.1

commit ecf8b339d3402b482e71fc7f970f581bda5c4aff
Author: Ajantha-Bhat <aj...@...>
Date:   2018-03-06T11:58:32Z

    * Fixed the test script failure for spark 2.1, 2.2

commit da45328f111cd02b07783bfa340015bec64452dc
Author: Ajantha-Bhat <aj...@...>
Date:   2018-03-12T12:46:10Z

    * Fix the compilation errors after rebase to master.

commit 13d40503e6ed559b80ec3465e85bf7ac3d2cf407
Author: Ajantha-Bhat <aj...@...>
Date:   2018-03-12T12:59:00Z

    *Fixing the test case of this requirement

----


---

[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2054
  
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4210/



---

[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2054
  
    @jackylk : Binary files [carbondata and index files (SDK writer output)] are intentionally added for the test cases of this requirement. The test cases will fail if we remove them.


---

[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2054#discussion_r173841691
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/SchemaReader.java ---
    @@ -28,7 +28,8 @@
     import org.apache.carbondata.core.metadata.schema.table.TableInfo;
     import org.apache.carbondata.core.util.CarbonUtil;
     import org.apache.carbondata.core.util.path.CarbonTablePath;
    -import org.apache.carbondata.core.util.path.CarbonTablePath;
    +
    +
    --- End diff --
    
    remove empty line


---

[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2054#discussion_r173842791
  
    --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CSVCarbonWriterSuite.java ---
    @@ -68,13 +68,12 @@ public void testWriteFilesJsonSchema() throws IOException {
     
       private void writeFilesAndVerify(Schema schema, String path) {
         try {
    -      CarbonWriter writer = CarbonWriter.builder()
    -          .withSchema(schema)
    -          .outputPath(path)
    -          .buildWriterForCSVInput();
    +      CarbonWriter writer =
    +          CarbonWriter.builder().withSchema(schema).outputPath(path).buildWriterForCSVInput();
     
           for (int i = 0; i < 100; i++) {
    -        writer.write(new String[]{"robot" + i, String.valueOf(i), String.valueOf((double) i / 2)});
    +        writer
    +            .write(new String[] { "robot" + i, String.valueOf(i), String.valueOf((double) i / 2) });
    --- End diff --
    
    Do not modify this, since there is no functional change.


---

[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on the issue:

    https://github.com/apache/carbondata/pull/2054
  
    There are some binary files, please delete them


---

[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2054#discussion_r173842755
  
    --- Diff: store/sdk/src/test/java/org/apache/carbondata/sdk/file/CSVCarbonWriterSuite.java ---
    @@ -68,13 +68,12 @@ public void testWriteFilesJsonSchema() throws IOException {
     
       private void writeFilesAndVerify(Schema schema, String path) {
         try {
    -      CarbonWriter writer = CarbonWriter.builder()
    -          .withSchema(schema)
    -          .outputPath(path)
    -          .buildWriterForCSVInput();
    +      CarbonWriter writer =
    +          CarbonWriter.builder().withSchema(schema).outputPath(path).buildWriterForCSVInput();
    --- End diff --
    
    do not modify the code style


---

[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...

Posted by sounakr <gi...@git.apache.org>.
Github user sounakr closed the pull request at:

    https://github.com/apache/carbondata/pull/2054


---

[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2054
  
    Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/4218/



---

[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2054
  
    SDV Build Success, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3858/



---

[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2054
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2964/



---

[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2054#discussion_r173885274
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
    @@ -826,6 +826,12 @@ public boolean isExternalTable() {
         return external != null && external.equalsIgnoreCase("true");
       }
     
    +  public boolean isFileLevelExternalTable() {
    --- End diff --
    
    **stored by 'carbondatafileformat' is mapped to _filelevelexternal.**
    So when a MapReduce or Hadoop service calls carbonScanRDD, the new file level reader [CarbonFileInputFormat] is invoked based on _filelevelexternal.
    
    An external table can be table level (stored by 'carbondata') or file level (stored by 'carbondatafileformat').
    **This property is used to identify the file level external table.**
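
The mapping described in this comment can be sketched roughly as follows. This is a simplified stand-in, not the PR's code: the class, the map-based property storage, and the helper names are assumptions. Only the `_filelevelexternal` key, the `carbondatafileformat` provider name, the `isFileLevelExternalTable` method name, and the two input format names come from the thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified sketch of provider-to-reader selection; not the actual CarbonData classes. */
final class ReaderSelectionSketch {

  // Property key named in the review comment.
  static final String FILE_LEVEL_EXTERNAL = "_filelevelexternal";

  /** Hypothetical helper: table properties derived from the STORED BY provider. */
  static Map<String, String> propertiesFor(String provider) {
    Map<String, String> props = new HashMap<>();
    if ("carbondatafileformat".equalsIgnoreCase(provider)) {
      props.put(FILE_LEVEL_EXTERNAL, "true");
    }
    return props;
  }

  /** Mirrors the shape of isExternalTable() shown in the diff context. */
  static boolean isFileLevelExternalTable(Map<String, String> props) {
    String value = props.get(FILE_LEVEL_EXTERNAL);
    return value != null && value.equalsIgnoreCase("true");
  }

  /** Returns the name of the input format a scan would use under this sketch. */
  static String inputFormatFor(Map<String, String> props) {
    return isFileLevelExternalTable(props)
        ? "CarbonFileInputFormat"
        : "CarbonTableInputFormat";
  }

  public static void main(String[] args) {
    System.out.println(inputFormatFor(propertiesFor("carbondatafileformat")));
    System.out.println(inputFormatFor(propertiesFor("carbondata")));
  }
}
```

Keeping the check on a dedicated property, rather than reusing the existing external flag, lets table level and file level external tables take different read paths without changing `isExternalTable()`.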


---

[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2054
  
    SDV Build Fail, Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/3867/



---

[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2054#discussion_r173841169
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java ---
    @@ -826,6 +826,12 @@ public boolean isExternalTable() {
         return external != null && external.equalsIgnoreCase("true");
       }
     
    +  public boolean isFileLevelExternalTable() {
    --- End diff --
    
    why is this property required?


---

[GitHub] carbondata pull request #2054: [CARBONDATA-2224][File Level Reader Support] ...

Posted by jackylk <gi...@git.apache.org>.
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2054#discussion_r173842122
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/SchemaReader.java ---
    @@ -79,4 +81,19 @@ public static TableInfo getTableInfo(AbsoluteTableIdentifier identifier)
             carbonTableIdentifier.getTableName(),
             identifier.getTablePath());
       }
    +
    +
    +  public static TableInfo inferSchemaForExternalTable(AbsoluteTableIdentifier identifier)
    --- End diff --
    
    Can the input param change to `String tablePath`?


---

[GitHub] carbondata issue #2054: [CARBONDATA-2224][File Level Reader Support] Externa...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2054
  
    Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/2973/



---