You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/23 23:05:15 UTC

[GitHub] [beam] ahmedabu98 opened a new pull request, #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

ahmedabu98 opened a new pull request, #24344:
URL: https://github.com/apache/beam/pull/24344

   `TypedSchemaTransformProvider::configurationSchema` returns an inconsistently ordered Schema object. A consistent Schema is needed to use configuration Row objects to instantiate and build SchemaTransforms. The solution in this PR is to sort the fields by name.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on a diff in pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on code in PR #24344:
URL: https://github.com/apache/beam/pull/24344#discussion_r1033894522


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java:
##########
@@ -264,6 +264,19 @@ public static Schema of(Field... fields) {
     return Schema.builder().addFields(fields).build();
   }
 
+  /** Returns an identical Schema with sorted fields. */
+  public Schema sorted() {
+    Schema sortedSchema =

Review Comment:
   Can we add an assertion or a unit test that will break if any new properties are added to the Schema object that are not copied over here ?



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/TypedSchemaTransformProvider.java:
##########
@@ -40,6 +40,9 @@
  */
 @Internal
 @Experimental(Kind.SCHEMAS)
+@SuppressWarnings({
+  "nullness" // TODO(https://github.com/apache/beam/issues/20506)

Review Comment:
   What's the error you got here ? We should try to fix checker-framework warnings in new code instead of surpassing. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj merged pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
chamikaramj merged PR #24344:
URL: https://github.com/apache/beam/pull/24344


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #24344:
URL: https://github.com/apache/beam/pull/24344#issuecomment-1331636503

   Jacoco creates synthetic fields when it runs, which get picked up by `Class::getDeclaredFields`. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #24344:
URL: https://github.com/apache/beam/pull/24344#issuecomment-1328182670

   R: @chamikaramj 
   Thanks for the feedback, PTAL. I thought the sort code belongs in the Schema class, lmk what you think. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on PR #24344:
URL: https://github.com/apache/beam/pull/24344#issuecomment-1325759096

   Let's also create a Github issue to track this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on code in PR #24344:
URL: https://github.com/apache/beam/pull/24344#discussion_r1035113637


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/Schema.java:
##########
@@ -264,6 +264,19 @@ public static Schema of(Field... fields) {
     return Schema.builder().addFields(fields).build();
   }
 
+  /** Returns an identical Schema with sorted fields. */
+  public Schema sorted() {
+    Schema sortedSchema =

Review Comment:
   Done, PTAL



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on PR #24344:
URL: https://github.com/apache/beam/pull/24344#issuecomment-1331482041

   Seems like the new test is failing.
   
   https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/25057/testReport/junit/org.apache.beam.sdk.schemas/SchemaTest/testSortedMethodIncludesAllSchemaFields/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #24344:
URL: https://github.com/apache/beam/pull/24344#issuecomment-1332441362

   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #24344:
URL: https://github.com/apache/beam/pull/24344#issuecomment-1332824553

   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24344:
URL: https://github.com/apache/beam/pull/24344#issuecomment-1325757420

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #24344:
URL: https://github.com/apache/beam/pull/24344#issuecomment-1325756220

   R: @chamikaramj 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on a diff in pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on code in PR #24344:
URL: https://github.com/apache/beam/pull/24344#discussion_r1030932543


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/TypedSchemaTransformProvider.java:
##########
@@ -61,7 +63,13 @@ Optional<List<String>> dependencies(ConfigT configuration, PipelineOptions optio
   @Override
   public final Schema configurationSchema() {
     try {
-      return SchemaRegistry.createDefault().getSchema(configurationClass());
+      Schema schema = SchemaRegistry.createDefault().getSchema(configurationClass());
+
+      // Sort the fields by name to ensure a consistent schema is produced
+      Schema sortedSchema = schema.getFields().stream()
+          .sorted(Comparator.comparing(Field::getName))
+          .collect(Schema.toSchema());

Review Comment:
   Can we also add unit tests  ?



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/TypedSchemaTransformProvider.java:
##########
@@ -61,7 +63,13 @@ Optional<List<String>> dependencies(ConfigT configuration, PipelineOptions optio
   @Override
   public final Schema configurationSchema() {
     try {
-      return SchemaRegistry.createDefault().getSchema(configurationClass());
+      Schema schema = SchemaRegistry.createDefault().getSchema(configurationClass());
+
+      // Sort the fields by name to ensure a consistent schema is produced
+      Schema sortedSchema = schema.getFields().stream()

Review Comment:
   We should also preserve other fields of the Schema, for example, id, options, encoding_positions_set (and any future fields).
   
   https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/org/apache/beam/model/pipeline/v1/schema.proto#L40



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on code in PR #24344:
URL: https://github.com/apache/beam/pull/24344#discussion_r1035042580


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/transforms/TypedSchemaTransformProvider.java:
##########
@@ -40,6 +40,9 @@
  */
 @Internal
 @Experimental(Kind.SCHEMAS)
+@SuppressWarnings({
+  "nullness" // TODO(https://github.com/apache/beam/issues/20506)

Review Comment:
   No longer running into this now that sorting method is in Schema.java, will remove



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
chamikaramj commented on PR #24344:
URL: https://github.com/apache/beam/pull/24344#issuecomment-1332560036

   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #24344: Sort SchemaTransform configuration schema fields by name to establish consistency

Posted by GitBox <gi...@apache.org>.
ahmedabu98 commented on PR #24344:
URL: https://github.com/apache/beam/pull/24344#issuecomment-1328182685

   Fixes #24361 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org