Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/07/30 09:22:15 UTC

[GitHub] [hudi] sbernauer opened a new pull request #1888: HUDI-1129: AvroConversionUtils unable to handle avro to row transformation when passing evolved schema

sbernauer opened a new pull request #1888:
URL: https://github.com/apache/hudi/pull/1888


   ## What is the purpose of the pull request
   
   Specifically, this fixes the error described in https://issues.apache.org/jira/browse/HUDI-1129.
   This is needed to partially fix https://github.com/apache/hudi/issues/1845
   
   ## Brief change log
   
   - Use Avro field names instead of field indices when converting from Avro to a Catalyst GenericRow in AvroConversionHelper.
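A minimal sketch of the idea behind this change, using a toy record model in plain Scala rather than the real Avro `GenericRecord` or Catalyst `GenericRow` classes (`ToyRecord`, `ToyConversion`, `toRowByIndex` and `toRowByName` are hypothetical names). It illustrates in general why, when a record written with an older schema is converted against an evolved target struct, a positional lookup can put a value in the wrong column, while a name-based lookup leaves the newly added field null:

```scala
// Toy model only: NOT the real Avro or Spark Catalyst API.
// A record carries its own (writer) field names plus values in that order.
final case class ToyRecord(fieldNames: Vector[String], values: Vector[Any]) {
  // Positional access, analogous to resolving a field by index.
  def get(index: Int): Any = values(index)

  // Name-based access; None when the record's schema has no such field.
  def get(name: String): Option[Any] = {
    val i = fieldNames.indexOf(name)
    if (i >= 0) Some(values(i)) else None
  }
}

object ToyConversion {
  // Index-based: assumes target field i lives at record position i,
  // padding with null once the record runs out of values.
  def toRowByIndex(targetFields: Seq[String], rec: ToyRecord): Seq[Any] =
    targetFields.indices.map(i => if (i < rec.values.length) rec.get(i) else null)

  // Name-based: looks up each target field by name; a field the record
  // does not have (e.g. one added by schema evolution) becomes null.
  def toRowByName(targetFields: Seq[String], rec: ToyRecord): Seq[Any] =
    targetFields.map(name => rec.get(name).orNull)
}
```

For example, with a record written as (`id`, `name`) and a target struct (`id`, `email`, `name`), `toRowByIndex` yields `1, "alice", null` (the name lands in the `email` slot), while `toRowByName` yields `1, null, "alice"`.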
   
   ## Verify this pull request
   
   Run the Maven test suite.
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] bvaradar commented on a change in pull request #1888: HUDI-1129: AvroConversionUtils unable to handle avro to row transformation when passing evolved schema

Posted by GitBox <gi...@apache.org>.
bvaradar commented on a change in pull request #1888:
URL: https://github.com/apache/hudi/pull/1888#discussion_r463504707



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala
##########
@@ -136,7 +136,7 @@ object AvroConversionHelper {
         case (struct: StructType, RECORD) =>
           val length = struct.fields.length
           val converters = new Array[AnyRef => AnyRef](length)
-          val avroFieldIndexes = new Array[Int](length)
+          val avroFieldNames = new Array[String](length)

Review comment:
       Question: does this work for nested schemas where the same name is used at different levels of the hierarchy? For example, "order.rec.rec" (I am just making this up), but I wanted to make sure there is no chance of ambiguity in field resolution. Can you add some test cases to verify that this works?
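One way to reason about the ambiguity question, sketched with hypothetical toy types (`ToyType`, `Struct`, `Leaf`, `NestedRecord`, `NestedConversion`) rather than Avro or Spark classes: if the converter recurses level by level and every name lookup is scoped to the fields of the record at the current level, a repeated name such as `rec` inside `rec` resolves independently at each depth. Whether the real converter behaves this way is what the requested tests would need to confirm.

```scala
// Toy model only (hypothetical types, not Avro's GenericRecord or Spark's StructType).
sealed trait ToyType
case object Leaf extends ToyType
final case class Struct(fields: Seq[(String, ToyType)]) extends ToyType

// A record maps field names to values; nested records appear as values.
final case class NestedRecord(fields: Map[String, Any])

object NestedConversion {
  // Converts a record into nested Seqs following the target type.
  // Each name lookup only searches the record at the current level, so the
  // same name at different depths ("order.rec.rec") cannot be confused.
  def convert(tpe: ToyType, value: Any): Any = (tpe, value) match {
    case (Struct(children), rec: NestedRecord) =>
      children.map { case (name, childType) =>
        rec.fields.get(name) match {
          case Some(v) => convert(childType, v)
          case None    => null // field absent in the (older) record
        }
      }
    case (Leaf, v)  => v
    case (_, null)  => null
    case (t, v)     => throw new IllegalArgumentException(s"cannot convert $v as $t")
  }
}
```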







[GitHub] [hudi] sbernauer commented on a change in pull request #1888: HUDI-1129: AvroConversionUtils unable to handle avro to row transformation when passing evolved schema

Posted by GitBox <gi...@apache.org>.
sbernauer commented on a change in pull request #1888:
URL: https://github.com/apache/hudi/pull/1888#discussion_r464333661



##########
File path: hudi-spark/src/main/scala/org/apache/hudi/AvroConversionHelper.scala
##########
@@ -136,7 +136,7 @@ object AvroConversionHelper {
         case (struct: StructType, RECORD) =>
           val length = struct.fields.length
           val converters = new Array[AnyRef => AnyRef](length)
-          val avroFieldIndexes = new Array[Int](length)
+          val avroFieldNames = new Array[String](length)

Review comment:
       Good point; I will try to write some tests.







[GitHub] [hudi] nsivabalan closed pull request #1888: [HUDI-1129]: AvroConversionUtils unable to handle avro to row transformation when passing evolved schema

Posted by GitBox <gi...@apache.org>.
nsivabalan closed pull request #1888:
URL: https://github.com/apache/hudi/pull/1888


   





[GitHub] [hudi] nsivabalan commented on pull request #1888: [HUDI-1129]: AvroConversionUtils unable to handle avro to row transformation when passing evolved schema

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1888:
URL: https://github.com/apache/hudi/pull/1888#issuecomment-751788889


   I found the issue after some debugging, but I need your thoughts on whether it is a bug and how to go about fixing it.
   @bvaradar @n3nash @vinothchandar 
   
   Per the linked test case that reproduces the issue, here is what we are doing:
   1. Generate records with SCHEMA_1 and ingest them into Hudi with SCHEMA_1.
   2. Generate records with SCHEMA_2 and ingest them into Hudi with SCHEMA_2.
   3. Generate records with SCHEMA_1 and ingest them into Hudi with SCHEMA_2 (as both the source and the target schema). This is where the exception is thrown.
   
   Here is the gist of the issue. Let's say we have an Avro record written with SCHEMA_1:
   byte[] recordBytes = HoodieAvroUtils.avroToBytes(genericRecord);
   
   Converting these bytes back to a GenericRecord with SCHEMA_1 succeeds: HoodieAvroUtils.bytesToAvro(recordBytes, SCHEMA_1)
   But converting them back to a GenericRecord with SCHEMA_2 (which has one additional field compared to SCHEMA_1) fails.
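A toy reproduction of why the second conversion can fail, assuming a simplified encoding (`ToyCodec`, hypothetical, not Avro's real wire format) that, like Avro binary, writes only field values in writer-schema order with no field names. Decoding with SCHEMA_2 alone then expects one more value than the bytes contain and runs off the end of the stream, a failure consistent with the EOFException reported for HUDI-1128. Decoding with the writer schema and then projecting onto the reader schema by name avoids it; real Avro performs this kind of resolution when `GenericDatumReader` is constructed with both a writer and a reader schema, and there the added field needs a default value.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Toy model of the failure, NOT Avro's real encoding: the bytes hold only
// field values in writer-schema order, with no field names and no count.
object ToyCodec {
  type Schema = Seq[String] // hypothetical: ordered field names, all string-valued

  def encode(writer: Schema, rec: Map[String, String]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new DataOutputStream(bytes)
    writer.foreach(f => out.writeUTF(rec(f)))
    out.flush()
    bytes.toByteArray
  }

  // Decodes assuming the bytes were written with `schema`. Reading with a
  // schema that has an extra field runs past the end and throws EOFException.
  def decode(bytes: Array[Byte], schema: Schema): Map[String, String] = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    schema.map(f => f -> in.readUTF()).toMap
  }

  // Resolution: decode with the writer schema, then project onto the reader
  // schema by name, using null for fields the writer did not have.
  def decodeResolved(bytes: Array[Byte], writer: Schema, reader: Schema): Map[String, String] = {
    val written = decode(bytes, writer)
    reader.map(f => f -> written.getOrElse(f, null)).toMap
  }
}
```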
    
   





[GitHub] [hudi] sbernauer commented on pull request #1888: HUDI-1129: AvroConversionUtils unable to handle avro to row transformation when passing evolved schema

Posted by GitBox <gi...@apache.org>.
sbernauer commented on pull request #1888:
URL: https://github.com/apache/hudi/pull/1888#issuecomment-667948739


   Hi @bvaradar, the test currently fails because of the EOFException from https://issues.apache.org/jira/browse/HUDI-1128.





[GitHub] [hudi] nsivabalan commented on pull request #1888: [HUDI-1129]: AvroConversionUtils unable to handle avro to row transformation when passing evolved schema

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on pull request #1888:
URL: https://github.com/apache/hudi/pull/1888#issuecomment-896357116


   This is no longer required; https://github.com/apache/hudi/pull/2927 handles the schema evolution.

