You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@carbondata.apache.org by ajantha-bhat <gi...@git.apache.org> on 2018/05/05 13:23:52 UTC

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Reading two sdk writer outp...

GitHub user ajantha-bhat opened a pull request:

    https://github.com/apache/carbondata/pull/2273

    [CARBONDATA-2442] Reading two sdk writer output with differnt schema should prompt exception

    [CARBONDATA-2442] Reading two sdk writer output with differnt schema should prompt exception
    
    **problem** : when two sdk writer output with differnt schema is placed in
    same folder for reading, output is not as expected. It has many null
    output.
    
    **root cause:** when multiple carbondata and indexx files is placed in same
    folder. table schema is inferred by first file.
    comparing table schema with all other index file schema validation is
    not present
    
    **solution:** comapre table schema with all other index file schema, if
    there is a mismatch throw exception
    
    Be sure to do all of the following checklist to help us incorporate 
    your contribution quickly and easily:
    
     - [ ] Any interfaces changed? NA
     
     - [ ] Any backward compatibility impacted? NA
     
     - [ ] Document update required? NA
    
     - [ ] Testing done
       added the test case in  TestNonTransactionalCarbonTable.scala
     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.  NA
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ajantha-bhat/carbondata schema_mismatch

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/2273.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2273
    
----
commit 03ab5f30909fe971e6221fdab83bdeb2f81e385c
Author: ajantha-bhat <aj...@...>
Date:   2018-05-05T11:29:44Z

    [CARBONDATA-2442] Reading two sdk writer output with differnt schema should prompt exception
    
    problem : when two sdk writer output with differnt schema is placed in
    same folder for reading, output is not as expected. It has many null
    output.
    
    root cause: when multiple carbondata and indexx files is placed in same
    folder. table schema is inferred by first file.
    comparing table schema with all other index file schema validation is
    not present
    
    solution: comapre table schema with all other index file schema, if
    there is a mismatch throw exception

----


---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

Posted by gvramana <gi...@git.apache.org>.
Github user gvramana commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r187071062
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
         SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier);
         SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager
             .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope);
    +
    +    // For NonTransactional table, compare the schema of all index files with inferred schema.
    +    // If there is a mismatch throw exception. As all files must be of same schema.
    +    if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +      SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +      for (Segment segment : segments.getValidSegments()) {
    +        Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +        for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +          Path indexFile = new Path(indexFileEntry.getKey());
    +          org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
    +              indexFile.toString(), carbonTable.getTableName());
    +          TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
    +              tableInfo, identifier.getDatabaseName(),
    +              identifier.getTableName(),
    +              identifier.getTablePath());
    +          List<ColumnSchema> indexFileColumnList =
    +              wrapperTableInfo.getFactTable().getListOfColumns();
    +          List<ColumnSchema> tableColumnList =
    +              carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +          if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
    +            throw new IOException("All the files schema doesn't match. "
    --- End diff --
    
    @kunal642 , @sounakr. Data files should not be skipped, clear error should be given to user. Otherwise user thinks that result is correct and is computed considering all files. Along with exception, which file has data mismatch also needs to be logged for him to analyse further and fix.
    later carbon print tool will be provided for him to check schema of each carbondata file, which will help user to debug problem.


---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r187136017
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
         SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier);
         SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager
             .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope);
    +
    +    // For NonTransactional table, compare the schema of all index files with inferred schema.
    +    // If there is a mismatch throw exception. As all files must be of same schema.
    +    if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +      SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +      for (Segment segment : segments.getValidSegments()) {
    +        Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +        for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +          Path indexFile = new Path(indexFileEntry.getKey());
    +          org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
    +              indexFile.toString(), carbonTable.getTableName());
    +          TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
    +              tableInfo, identifier.getDatabaseName(),
    +              identifier.getTableName(),
    +              identifier.getTablePath());
    +          List<ColumnSchema> indexFileColumnList =
    +              wrapperTableInfo.getFactTable().getListOfColumns();
    +          List<ColumnSchema> tableColumnList =
    +              carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +          if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
    +            throw new IOException("All the files schema doesn't match. "
    --- End diff --
    
    @kumarvishal09 :Tested with parquet by having 2 files with same column name but different data type. parquet throws java.lang.UnsupportedOperationException during read.
    
    Caused by: java.lang.UnsupportedOperationException: Unimplemented type: StringType
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readIntBatch(VectorizedColumnReader.java:369)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:188)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:230)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
    	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:105)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    	at org.apache.spark.scheduler.Task.run(Task.scala:108)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    	at java.lang.Thread.run(Thread.java:748)
    
    



---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r186725643
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/column/ColumnSchema.java ---
    @@ -342,6 +342,30 @@ public void setParentColumnTableRelations(
         return true;
       }
     
    +  /**
    +   * method to compare columnSchema,
    +   * other parameters along with just column name and column data type
    +   * @param obj
    +   * @return
    +   */
    +  public boolean equalsWithStrictCheck(Object obj) {
    +    if (!this.equals(obj)) {
    +      return false;
    +    }
    +    ColumnSchema other = (ColumnSchema) obj;
    +    if (!columnUniqueId.equals(other.columnUniqueId) ||
    +        (isDimensionColumn != other.isDimensionColumn) ||
    +        (scale != other.scale) ||
    +        (precision != other.precision) ||
    +        (isSortColumn != other.isSortColumn)) {
    +      return false;
    +    }
    +    if (encodingList.size() != other.encodingList.size()) {
    --- End diff --
    
    done. added


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed: multi...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    LGTM


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed: multi...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4657/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed: Readi...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    retest this please


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed: multi...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5811/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed: multi...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Failed  with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5817/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    LGTM



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5782/



---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r187397546
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/readcommitter/LatestFilesReadCommittedScope.java ---
    @@ -122,6 +122,11 @@ private String getSegmentID(String carbonIndexFileName, String indexFilePath) {
       @Override public void takeCarbonIndexFileSnapShot() throws IOException {
         // Read the current file Path get the list of indexes from the path.
         CarbonFile file = FileFactory.getCarbonFile(carbonFilePath);
    +    if (file == null) {
    +      // For nonTransactional table, files can be removed at any point of time.
    +      // So cannot assume files will be present
    +      throw new IOException("No files are present in the table location");
    --- End diff --
    
    append File path in exception message 


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5678/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5745/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5665/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Reading two sdk writer output with...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    SDV Build Fail , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4736/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    retest this please


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed: multi...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    LGTM


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4626/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed: Readi...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    retest this please


---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

Posted by sounakr <gi...@git.apache.org>.
Github user sounakr commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r187030343
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
         SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier);
         SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager
             .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope);
    +
    +    // For NonTransactional table, compare the schema of all index files with inferred schema.
    +    // If there is a mismatch throw exception. As all files must be of same schema.
    +    if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +      SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +      for (Segment segment : segments.getValidSegments()) {
    +        Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +        for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +          Path indexFile = new Path(indexFileEntry.getKey());
    +          org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
    +              indexFile.toString(), carbonTable.getTableName());
    +          TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
    +              tableInfo, identifier.getDatabaseName(),
    +              identifier.getTableName(),
    +              identifier.getTablePath());
    +          List<ColumnSchema> indexFileColumnList =
    +              wrapperTableInfo.getFactTable().getListOfColumns();
    +          List<ColumnSchema> tableColumnList =
    +              carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +          if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
    +            throw new IOException("All the files schema doesn't match. "
    --- End diff --
    
    @kunal642 I agree with you. The purpose of SDK is to read whatever file is present. In case there is a mismatch in the schema we should not block the output of the files are having correct schema. 
    Also, in future we are going to support Merge Schema and show the output in case of different schema.  
    Better to show the output of how much can be read with the correct schema and also throw a warning or print the log for the presence of different schema in the log. 


---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r187007180
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
         SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier);
         SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager
             .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope);
    +
    +    // For NonTransactional table, compare the schema of all index files with inferred schema.
    +    // If there is a mismatch throw exception. As all files must be of same schema.
    +    if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +      SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +      for (Segment segment : segments.getValidSegments()) {
    +        Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +        for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +          Path indexFile = new Path(indexFileEntry.getKey());
    +          org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
    +              indexFile.toString(), carbonTable.getTableName());
    +          TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
    +              tableInfo, identifier.getDatabaseName(),
    +              identifier.getTableName(),
    +              identifier.getTablePath());
    +          List<ColumnSchema> indexFileColumnList =
    +              wrapperTableInfo.getFactTable().getListOfColumns();
    +          List<ColumnSchema> tableColumnList =
    +              carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +          if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
    +            throw new IOException("All the files schema doesn't match. "
    --- End diff --
    
    @kunal642 : For nonTransactional tables, we support many sdk writers output files to be placed and read from same folder. This works when schema is same, If schema is different we have to inform user that these files are not of same type. If we just ignore fiels how user know why it is ignored ? hence the exception


---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r186992068
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
         SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier);
         SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager
             .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope);
    +
    +    // For NonTransactional table, compare the schema of all index files with inferred schema.
    +    // If there is a mismatch throw exception. As all files must be of same schema.
    +    if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +      SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +      for (Segment segment : segments.getValidSegments()) {
    +        Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +        for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +          Path indexFile = new Path(indexFileEntry.getKey());
    +          org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
    +              indexFile.toString(), carbonTable.getTableName());
    +          TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
    +              tableInfo, identifier.getDatabaseName(),
    +              identifier.getTableName(),
    +              identifier.getTablePath());
    +          List<ColumnSchema> indexFileColumnList =
    +              wrapperTableInfo.getFactTable().getListOfColumns();
    +          List<ColumnSchema> tableColumnList =
    +              carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +          if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
    +            throw new IOException("All the files schema doesn't match. "
    --- End diff --
    
    I dont think throwing exception is the correct approach. Either we should not let the user write data with different schema or we should infer the schema from the first index file that was written and skip any index file with different schema while preparing splits.


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed: multi...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4860/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed: multi...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4654/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    retest this please


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4808/



---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r186696437
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/column/ColumnSchema.java ---
    @@ -342,6 +342,30 @@ public void setParentColumnTableRelations(
         return true;
       }
     
    +  /**
    +   * method to compare columnSchema,
    +   * other parameters along with just column name and column data type
    +   * @param obj
    +   * @return
    +   */
    +  public boolean equalsWithStrictCheck(Object obj) {
    +    if (!this.equals(obj)) {
    +      return false;
    +    }
    +    ColumnSchema other = (ColumnSchema) obj;
    +    if (!columnUniqueId.equals(other.columnUniqueId) ||
    +        (isDimensionColumn != other.isDimensionColumn) ||
    +        (scale != other.scale) ||
    +        (precision != other.precision) ||
    +        (isSortColumn != other.isSortColumn)) {
    +      return false;
    +    }
    +    if (encodingList.size() != other.encodingList.size()) {
    --- End diff --
    
    Better to check the encoding values also...this is a generic method and can be useful for other schenario


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    retest this please


---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r187116998
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
         SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier);
         SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager
             .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope);
    +
    +    // For NonTransactional table, compare the schema of all index files with inferred schema.
    +    // If there is a mismatch throw exception. As all files must be of same schema.
    +    if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +      SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +      for (Segment segment : segments.getValidSegments()) {
    +        Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +        for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +          Path indexFile = new Path(indexFileEntry.getKey());
    +          org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
    +              indexFile.toString(), carbonTable.getTableName());
    +          TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
    +              tableInfo, identifier.getDatabaseName(),
    +              identifier.getTableName(),
    +              identifier.getTablePath());
    +          List<ColumnSchema> indexFileColumnList =
    +              wrapperTableInfo.getFactTable().getListOfColumns();
    +          List<ColumnSchema> tableColumnList =
    +              carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +          if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
    +            throw new IOException("All the files schema doesn't match. "
    --- End diff --
    
    @kunal642 @sounakr I agree with @gvramana,  skipping data file is not correct as it will miss some records which will not be acceptable. Blocking user while writing is not possible. I think throwing exception is correct. 
    @ajantha-bhat Can u please check how Parquet works in similar scenario.


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4585/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5741/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4805/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed: multi...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4868/



---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r186995311
  
    --- Diff: core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java ---
    @@ -2311,14 +2312,14 @@ static DataType thriftDataTyopeToWrapperDataType(
         }
       }
     
    -  public static List<String> getFilePathExternalFilePath(String path) {
    +  public static List<String> getFilePathExternalFilePath(String path, final String fileExtension) {
    --- End diff --
    
    This change is not required


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by ajantha-bhat <gi...@git.apache.org>.
Github user ajantha-bhat commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    retest this please


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Reading two sdk writer output with...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/5659/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4518/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed: multi...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4864/



---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] Fixed: Reading two sdk writ...

Posted by kunal642 <gi...@git.apache.org>.
Github user kunal642 commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/2273#discussion_r186992244
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java ---
    @@ -151,6 +154,33 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
         SegmentStatusManager segmentStatusManager = new SegmentStatusManager(identifier);
         SegmentStatusManager.ValidAndInvalidSegmentsInfo segments = segmentStatusManager
             .getValidAndInvalidSegments(loadMetadataDetails, this.readCommittedScope);
    +
    +    // For NonTransactional table, compare the schema of all index files with inferred schema.
    +    // If there is a mismatch throw exception. As all files must be of same schema.
    +    if (!carbonTable.getTableInfo().isTransactionalTable()) {
    +      SchemaConverter schemaConverter = new ThriftWrapperSchemaConverterImpl();
    +      for (Segment segment : segments.getValidSegments()) {
    +        Map<String, String> indexFiles = segment.getCommittedIndexFile();
    +        for (Map.Entry<String, String> indexFileEntry : indexFiles.entrySet()) {
    +          Path indexFile = new Path(indexFileEntry.getKey());
    +          org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil.inferSchemaFromIndexFile(
    +              indexFile.toString(), carbonTable.getTableName());
    +          TableInfo wrapperTableInfo = schemaConverter.fromExternalToWrapperTableInfo(
    +              tableInfo, identifier.getDatabaseName(),
    +              identifier.getTableName(),
    +              identifier.getTablePath());
    +          List<ColumnSchema> indexFileColumnList =
    +              wrapperTableInfo.getFactTable().getListOfColumns();
    +          List<ColumnSchema> tableColumnList =
    +              carbonTable.getTableInfo().getFactTable().getListOfColumns();
    +          if (!compareColumnSchemaList(indexFileColumnList, tableColumnList)) {
    +            throw new IOException("All the files schema doesn't match. "
    --- End diff --
    
    I dont think throwing exception is the correct approach. Either we should not let the user write data with different schema or we should infer the schema from the first index file that was written and skip any index file with different schema while preparing splits.
    
    @sounakr @kumarvishal09 what do you think??


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by ravipesala <gi...@git.apache.org>.
Github user ravipesala commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/4835/



---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by kumarvishal09 <gi...@git.apache.org>.
Github user kumarvishal09 commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    @ajantha-bhat Please rebase


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Reading two sdk writer output with...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4499/



---

[GitHub] carbondata pull request #2273: [CARBONDATA-2442] and [CARBONDATA-2469] Fixed...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/carbondata/pull/2273


---

[GitHub] carbondata issue #2273: [CARBONDATA-2442] Fixed: Reading two sdk writer outp...

Posted by CarbonDataQA <gi...@git.apache.org>.
Github user CarbonDataQA commented on the issue:

    https://github.com/apache/carbondata/pull/2273
  
    Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/4505/



---