You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@gobblin.apache.org by GitBox <gi...@apache.org> on 2022/10/13 22:29:45 UTC

[GitHub] [gobblin] abhisheknath2011 opened a new pull request, #3581: Avro 1.9 upgrade of Gobblin OSS

abhisheknath2011 opened a new pull request, #3581:
URL: https://github.com/apache/gobblin/pull/3581

   Dear Gobblin maintainers,
   
   Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
   
   
   ### JIRA
   - [ ] My PR addresses the following [Gobblin JIRA](https://issues.apache.org/jira/browse/GOBBLIN/) issues and references them in the PR title. For example, "[GOBBLIN-XXX] My Gobblin PR"
       - https://issues.apache.org/jira/browse/GOBBLIN-XXX
   
   
   ### Description
   - [ ] Here are some details about my PR, including screenshots (if applicable):
   
   
   ### Tests
   - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
   
   
   ### Commits
   - [ ] My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
       1. Subject is separated from body by a blank line
       2. Subject is limited to 50 characters
       3. Subject does not end with a period
       4. Subject uses the imperative mood ("add", not "adding")
       5. Body wraps at 72 characters
       6. Body explains "what" and "why", not "how"
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@gobblin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [gobblin] codecov-commenter commented on pull request #3581: Avro 1.9 upgrade of Gobblin OSS

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #3581:
URL: https://github.com/apache/gobblin/pull/3581#issuecomment-1278277524

   # [Codecov](https://codecov.io/gh/apache/gobblin/pull/3581?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#3581](https://codecov.io/gh/apache/gobblin/pull/3581?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (d46fff0) into [master](https://codecov.io/gh/apache/gobblin/commit/fafd40b25fdd3e8f8299f9e939794ad7a2492314?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (fafd40b) will **increase** coverage by `0.79%`.
   > The diff coverage is `69.65%`.
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #3581      +/-   ##
   ============================================
   + Coverage     46.87%   47.66%   +0.79%     
   + Complexity    10640     8592    -2048     
   ============================================
     Files          2113     1707     -406     
     Lines         82798    65365   -17433     
     Branches       9220     7094    -2126     
   ============================================
   - Hits          38813    31159    -7654     
   + Misses        40425    31492    -8933     
   + Partials       3560     2714     -846     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/gobblin/pull/3581?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [.../org/apache/gobblin/test/SequentialTestSource.java](https://codecov.io/gh/apache/gobblin/pull/3581/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb3JlLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vdGVzdC9TZXF1ZW50aWFsVGVzdFNvdXJjZS5qYXZh) | `0.00% <ø> (ø)` | |
   | [...ement/conversion/hive/utils/AvroHiveTypeUtils.java](https://codecov.io/gh/apache/gobblin/pull/3581/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvbnZlcnNpb24vaGl2ZS91dGlscy9BdnJvSGl2ZVR5cGVVdGlscy5qYXZh) | `33.91% <0.00%> (-33.05%)` | :arrow_down: |
   | [...ent/copy/replication/ConfigBasedMultiDatasets.java](https://codecov.io/gh/apache/gobblin/pull/3581/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvcHkvcmVwbGljYXRpb24vQ29uZmlnQmFzZWRNdWx0aURhdGFzZXRzLmphdmE=) | `13.95% <ø> (ø)` | |
   | [...ion/google/webmaster/GoogleWebmasterExtractor.java](https://codecov.io/gh/apache/gobblin/pull/3581/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1tb2R1bGVzL2dvb2dsZS1pbmdlc3Rpb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vaW5nZXN0aW9uL2dvb2dsZS93ZWJtYXN0ZXIvR29vZ2xlV2VibWFzdGVyRXh0cmFjdG9yLmphdmE=) | `53.53% <ø> (ø)` | |
   | [...le/webmaster/GoogleWebmasterExtractorIterator.java](https://codecov.io/gh/apache/gobblin/pull/3581/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1tb2R1bGVzL2dvb2dsZS1pbmdlc3Rpb24vc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vaW5nZXN0aW9uL2dvb2dsZS93ZWJtYXN0ZXIvR29vZ2xlV2VibWFzdGVyRXh0cmFjdG9ySXRlcmF0b3IuamF2YQ==) | `41.73% <ø> (ø)` | |
   | [...blin/service/monitoring/KafkaJobStatusMonitor.java](https://codecov.io/gh/apache/gobblin/pull/3581/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1zZXJ2aWNlL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL3NlcnZpY2UvbW9uaXRvcmluZy9LYWZrYUpvYlN0YXR1c01vbml0b3IuamF2YQ==) | `44.88% <ø> (ø)` | |
   | [...nversion/hive/query/HiveAvroORCQueryGenerator.java](https://codecov.io/gh/apache/gobblin/pull/3581/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1kYXRhLW1hbmFnZW1lbnQvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vZGF0YS9tYW5hZ2VtZW50L2NvbnZlcnNpb24vaGl2ZS9xdWVyeS9IaXZlQXZyb09SQ1F1ZXJ5R2VuZXJhdG9yLmphdmE=) | `65.07% <62.28%> (-0.82%)` | :arrow_down: |
   | [...e/avro/FieldAttributeBasedDeltaFieldsProvider.java](https://codecov.io/gh/apache/gobblin/pull/3581/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL2F2cm8vRmllbGRBdHRyaWJ1dGVCYXNlZERlbHRhRmllbGRzUHJvdmlkZXIuamF2YQ==) | `79.41% <100.00%> (ø)` | |
   | [...preduce/avro/MRCompactorAvroKeyDedupJobRunner.java](https://codecov.io/gh/apache/gobblin/pull/3581/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb21wYWN0aW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9nb2JibGluL2NvbXBhY3Rpb24vbWFwcmVkdWNlL2F2cm8vTVJDb21wYWN0b3JBdnJvS2V5RGVkdXBKb2JSdW5uZXIuamF2YQ==) | `45.83% <100.00%> (+0.45%)` | :arrow_up: |
   | [...bblin/converter/filter/AvroSchemaFieldRemover.java](https://codecov.io/gh/apache/gobblin/pull/3581/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-Z29iYmxpbi1jb3JlLWJhc2Uvc3JjL21haW4vamF2YS9vcmcvYXBhY2hlL2dvYmJsaW4vY29udmVydGVyL2ZpbHRlci9BdnJvU2NoZW1hRmllbGRSZW1vdmVyLmphdmE=) | `94.82% <100.00%> (+0.18%)` | :arrow_up: |
   | ... and [419 more](https://codecov.io/gh/apache/gobblin/pull/3581/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@gobblin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [gobblin] Will-Lo merged pull request #3581: [GOBBLIN-1726] Avro 1.9 upgrade of Gobblin OSS

Posted by GitBox <gi...@apache.org>.
Will-Lo merged PR #3581:
URL: https://github.com/apache/gobblin/pull/3581


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@gobblin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [gobblin] abhisheknath2011 commented on a diff in pull request #3581: [GOBBLIN-1726] Avro 1.9 upgrade of Gobblin OSS

Posted by GitBox <gi...@apache.org>.
abhisheknath2011 commented on code in PR #3581:
URL: https://github.com/apache/gobblin/pull/3581#discussion_r998601818


##########
gobblin-core-base/src/main/java/org/apache/gobblin/converter/filter/AvroSchemaFieldRemover.java:
##########
@@ -119,14 +121,17 @@ private Schema removeFieldsFromRecords(Schema schema, Map<String, Schema> schema
       if (!this.shouldRemove(field)) {
         Field newField;
         if (this.children.containsKey(field.name())) {
-          newField = new Field(field.name(), this.children.get(field.name()).removeFields(field.schema(), schemaMap),
-              field.doc(), field.defaultValue());
+          newField = AvroCompatibilityHelper.createSchemaField(field.name(),
+              this.children.get(field.name()).removeFields(field.schema(), schemaMap),
+              field.doc(), AvroUtils.getCompatibleDefaultValue(field));
         } else {
-          newField = new Field(field.name(), DO_NOTHING_INSTANCE.removeFields(field.schema(), schemaMap), field.doc(),
-              field.defaultValue());
+          newField = AvroCompatibilityHelper.createSchemaField(field.name(),
+              DO_NOTHING_INSTANCE.removeFields(field.schema(), schemaMap), field.doc(),
+              AvroUtils.getCompatibleDefaultValue(field));
         }
-        for (Map.Entry<String, JsonNode> stringJsonNodeEntry : field.getJsonProps().entrySet()) {
-          newField.addProp(stringJsonNodeEntry.getKey(), stringJsonNodeEntry.getValue());
+        // Avro 1.9 compatible change - replaced deprecated public api getJsonProps with getObjectProps
+        for (Map.Entry<String, Object> objectEntry : field.getObjectProps().entrySet()) {

Review Comment:
   Thanks. I have updated the PR with the changes.



##########
gobblin-core-base/src/main/java/org/apache/gobblin/converter/filter/AvroSchemaFieldRemover.java:
##########
@@ -119,14 +121,17 @@ private Schema removeFieldsFromRecords(Schema schema, Map<String, Schema> schema
       if (!this.shouldRemove(field)) {
         Field newField;
         if (this.children.containsKey(field.name())) {
-          newField = new Field(field.name(), this.children.get(field.name()).removeFields(field.schema(), schemaMap),
-              field.doc(), field.defaultValue());
+          newField = AvroCompatibilityHelper.createSchemaField(field.name(),
+              this.children.get(field.name()).removeFields(field.schema(), schemaMap),
+              field.doc(), AvroUtils.getCompatibleDefaultValue(field));
         } else {
-          newField = new Field(field.name(), DO_NOTHING_INSTANCE.removeFields(field.schema(), schemaMap), field.doc(),
-              field.defaultValue());
+          newField = AvroCompatibilityHelper.createSchemaField(field.name(),
+              DO_NOTHING_INSTANCE.removeFields(field.schema(), schemaMap), field.doc(),
+              AvroUtils.getCompatibleDefaultValue(field));
         }
-        for (Map.Entry<String, JsonNode> stringJsonNodeEntry : field.getJsonProps().entrySet()) {
-          newField.addProp(stringJsonNodeEntry.getKey(), stringJsonNodeEntry.getValue());
+        // Avro 1.9 compatible change - replaced deprecated public api getJsonProps with getObjectProps
+        for (Map.Entry<String, Object> objectEntry : field.getObjectProps().entrySet()) {
+          newField.addProp(objectEntry.getKey(), objectEntry.getValue());

Review Comment:
   Done.



##########
gobblin-core/src/test/java/org/apache/gobblin/recordaccess/AvroGenericRecordAccessorTest.java:
##########
@@ -139,6 +139,11 @@ public void testGetStringArrayUtf8() throws IOException {
 
   @Test
   public void testGetMultiConvertsStrings() throws IOException {
+    // The below error is due to invalid avro data. Type with "null" union must have "null" first and then

Review Comment:
   Thanks for the detailed explanation. I have updated the comment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@gobblin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [gobblin] srramach commented on a diff in pull request #3581: [GOBBLIN-1726] Avro 1.9 upgrade of Gobblin OSS

Posted by GitBox <gi...@apache.org>.
srramach commented on code in PR #3581:
URL: https://github.com/apache/gobblin/pull/3581#discussion_r997451873


##########
gobblin-core-base/src/main/java/org/apache/gobblin/converter/filter/AvroSchemaFieldRemover.java:
##########
@@ -119,14 +121,17 @@ private Schema removeFieldsFromRecords(Schema schema, Map<String, Schema> schema
       if (!this.shouldRemove(field)) {
         Field newField;
         if (this.children.containsKey(field.name())) {
-          newField = new Field(field.name(), this.children.get(field.name()).removeFields(field.schema(), schemaMap),
-              field.doc(), field.defaultValue());
+          newField = AvroCompatibilityHelper.createSchemaField(field.name(),
+              this.children.get(field.name()).removeFields(field.schema(), schemaMap),
+              field.doc(), AvroUtils.getCompatibleDefaultValue(field));
         } else {
-          newField = new Field(field.name(), DO_NOTHING_INSTANCE.removeFields(field.schema(), schemaMap), field.doc(),
-              field.defaultValue());
+          newField = AvroCompatibilityHelper.createSchemaField(field.name(),
+              DO_NOTHING_INSTANCE.removeFields(field.schema(), schemaMap), field.doc(),
+              AvroUtils.getCompatibleDefaultValue(field));
         }
-        for (Map.Entry<String, JsonNode> stringJsonNodeEntry : field.getJsonProps().entrySet()) {
-          newField.addProp(stringJsonNodeEntry.getKey(), stringJsonNodeEntry.getValue());
+        // Avro 1.9 compatible change - replaced deprecated public api getJsonProps with getObjectProps
+        for (Map.Entry<String, Object> objectEntry : field.getObjectProps().entrySet()) {

Review Comment:
   getObjectProps() doesn't exist in Avro 1.7.7, so this is not Avro version agnostic. We did add a new API in AvroCompatibilityHelper to getAllPropNames(), so you should be able to use the AvroCompatibilityHelper here too.



##########
gobblin-core/src/test/java/org/apache/gobblin/recordaccess/AvroGenericRecordAccessorTest.java:
##########
@@ -139,6 +139,11 @@ public void testGetStringArrayUtf8() throws IOException {
 
   @Test
   public void testGetMultiConvertsStrings() throws IOException {
+    // The below error is due to invalid avro data. Type with "null" union must have "null" first and then

Review Comment:
   This comment is not strictly accurate (as far as Avro is concerned). A union with "null" doesn't _have_ to have null as the first type. It's usually intended to signify optional-ness, and usually people do want to put null first, so that they can set "null" as the default value, but it's not required.
   
   What _is_ required (as per Avro) is that the default value must have the same type as the first entry in the union. So, for example, this is correct:
   ```
   "type": ["null", "string"],
   "default": null
   ```
   
   Whereas this is wrong:
   ```
   // default is null, but must be a string, since that's the first type in the union
   "type": ["string", "null"],
   "default": null
   ```
   
   This is also wrong:
   ```
   // default is a string ("null"), not null, which is the first type in the union
   "type": ["null", "string"],
   "default": "null"
   ```
   
   And this is totally correct, though usually rare to see:
   ```
   "type": ["string", "null"]
   "default": "foobar"
   ```
   
   So what you really want to say in this comment is that since the default value is `null`, you want to ensure that that's the first type in the union.



##########
gobblin-core-base/src/main/java/org/apache/gobblin/converter/filter/AvroSchemaFieldRemover.java:
##########
@@ -119,14 +121,17 @@ private Schema removeFieldsFromRecords(Schema schema, Map<String, Schema> schema
       if (!this.shouldRemove(field)) {
         Field newField;
         if (this.children.containsKey(field.name())) {
-          newField = new Field(field.name(), this.children.get(field.name()).removeFields(field.schema(), schemaMap),
-              field.doc(), field.defaultValue());
+          newField = AvroCompatibilityHelper.createSchemaField(field.name(),
+              this.children.get(field.name()).removeFields(field.schema(), schemaMap),
+              field.doc(), AvroUtils.getCompatibleDefaultValue(field));
         } else {
-          newField = new Field(field.name(), DO_NOTHING_INSTANCE.removeFields(field.schema(), schemaMap), field.doc(),
-              field.defaultValue());
+          newField = AvroCompatibilityHelper.createSchemaField(field.name(),
+              DO_NOTHING_INSTANCE.removeFields(field.schema(), schemaMap), field.doc(),
+              AvroUtils.getCompatibleDefaultValue(field));
         }
-        for (Map.Entry<String, JsonNode> stringJsonNodeEntry : field.getJsonProps().entrySet()) {
-          newField.addProp(stringJsonNodeEntry.getKey(), stringJsonNodeEntry.getValue());
+        // Avro 1.9 compatible change - replaced deprecated public api getJsonProps with getObjectProps
+        for (Map.Entry<String, Object> objectEntry : field.getObjectProps().entrySet()) {
+          newField.addProp(objectEntry.getKey(), objectEntry.getValue());

Review Comment:
   Same for addProp(). Use the AvroCompatibilityHelper instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@gobblin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [gobblin] homatthew commented on a diff in pull request #3581: Avro 1.9 upgrade of Gobblin OSS

Posted by GitBox <gi...@apache.org>.
homatthew commented on code in PR #3581:
URL: https://github.com/apache/gobblin/pull/3581#discussion_r995302032


##########
gradle/scripts/repositories.gradle:
##########
@@ -29,7 +29,11 @@ repositories {
   maven {
     url "https://linkedin.jfrog.io/artifactory/open-source/"
   }
-  mavenCentral()
+  maven {
+    url "https://linkedin.jfrog.io/artifactory/gobblin-hive/"
+  }
+  jcenter()

Review Comment:
   Can we replace jcenter with maven central? We are trying to limit dependencies on jcenter because of its flakiness after becoming EoL.  This makes our CI flakey
   
   See https://github.com/apache/gobblin/pull/3554 for similar type changes. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@gobblin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [gobblin] abhisheknath2011 commented on a diff in pull request #3581: [GOBBLIN-1726] Avro 1.9 upgrade of Gobblin OSS

Posted by GitBox <gi...@apache.org>.
abhisheknath2011 commented on code in PR #3581:
URL: https://github.com/apache/gobblin/pull/3581#discussion_r997375643


##########
gradle/scripts/repositories.gradle:
##########
@@ -29,7 +29,11 @@ repositories {
   maven {
     url "https://linkedin.jfrog.io/artifactory/open-source/"
   }
-  mavenCentral()
+  maven {
+    url "https://linkedin.jfrog.io/artifactory/gobblin-hive/"
+  }
+  jcenter()

Review Comment:
   Removed jcenter(). Will check if everything looks good from github CI side.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@gobblin.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org