You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2022/07/01 06:11:09 UTC

[GitHub] [avro] radai-rosenblatt opened a new pull request, #1748: AVRO-3560 - throw SchemaParseException on dangling content in avsc beyond end of schema

radai-rosenblatt opened a new pull request, #1748:
URL: https://github.com/apache/avro/pull/1748

   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Avro Jira](https://issues.apache.org/jira/browse/AVRO/) issues and references them in the PR title. For example, "AVRO-1234: My Avro PR"
     - https://issues.apache.org/jira/browse/AVRO-3560
   
   ### Tests
   
   - [ ] My PR adds the following unit test:
     - TestSchema.testContentAfterAvsc()
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](https://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and the classes in the PR contain Javadoc that explain what it does
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on PR #1748:
URL: https://github.com/apache/avro/pull/1748#issuecomment-1172009308

   Perhaps this change should not apply to parse(InputStream in) where the caller might expect to be able to read content after the schema.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] opwvhk commented on a diff in pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by GitBox <gi...@apache.org>.
opwvhk commented on code in PR #1748:
URL: https://github.com/apache/avro/pull/1748#discussion_r1021511363


##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1454,19 +1456,36 @@ public Schema parse(String s, String... more) {
      */
     public Schema parse(String s) {
       try {
-        return parse(FACTORY.createParser(s));
+        return parse(FACTORY.createParser(s), false);
       } catch (IOException e) {
         throw new SchemaParseException(e);
       }
     }
 
-    private Schema parse(JsonParser parser) throws IOException {
+    private Schema parse(JsonParser parser, boolean allowDanglingContent) throws IOException {
       boolean saved = validateNames.get();
       boolean savedValidateDefaults = VALIDATE_DEFAULTS.get();
       try {
         validateNames.set(validate);
         VALIDATE_DEFAULTS.set(validateDefaults);
-        return Schema.parse(MAPPER.readTree(parser), names);
+        JsonNode jsonNode = MAPPER.readTree(parser);
+        Schema schema = Schema.parse(jsonNode, names);
+        if (!allowDanglingContent) {
+          String dangling;
+          StringWriter danglingWriter = new StringWriter();
+          int numCharsReleased = parser.releaseBuffered(danglingWriter);

Review Comment:
   What happens if the parser didn't read all bytes available in a file?
   AFAICT, JsonParser only releases what it read but didn't parse.
   
   (I think we're ok for Strings; I'm not sure about files)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1748:
URL: https://github.com/apache/avro/pull/1748#discussion_r911725171


##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1454,19 +1456,37 @@ public Schema parse(String s, String... more) {
      */
     public Schema parse(String s) {
       try {
-        return parse(FACTORY.createParser(s));
+        return parse(FACTORY.createParser(s), false);
       } catch (IOException e) {
         throw new SchemaParseException(e);
       }
     }
 
-    private Schema parse(JsonParser parser) throws IOException {
+    private Schema parse(JsonParser parser, boolean allowDanglingContent) throws IOException {
       boolean saved = validateNames.get();
       boolean savedValidateDefaults = VALIDATE_DEFAULTS.get();
       try {
         validateNames.set(validate);
         VALIDATE_DEFAULTS.set(validateDefaults);
-        return Schema.parse(MAPPER.readTree(parser), names);
+        JsonNode jsonNode = MAPPER.readTree(parser);
+        Schema schema = Schema.parse(jsonNode, names);
+        if (!allowDanglingContent) {
+          StringWriter danglingWriter = new StringWriter();
+          parser.releaseBuffered(danglingWriter);
+          String dangling = danglingWriter.toString().trim();
+          if (dangling.isEmpty()) {
+            // releaseBuffered(Writer) above works in case the source was character-based
+            // (ex. a String). releaseBuffered(OutputStream) covers the case that the
+            // source was byte-based (ex. a File)
+            ByteArrayOutputStream danglingOutputStream = new ByteArrayOutputStream();
+            parser.releaseBuffered(danglingOutputStream);
+            dangling = new String(danglingOutputStream.toByteArray(), StandardCharsets.UTF_8);

Review Comment:
   Should you trim this too?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba commented on pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by GitBox <gi...@apache.org>.
RyanSkraba commented on PR #1748:
URL: https://github.com/apache/avro/pull/1748#issuecomment-1211680092

   Hello!  We released 1.11.1 the other day. It was kind of in a crunch, and I wasn't looking at any of the non-1.11.1 tagged issues :/  My apologies for not getting to this earlier -- it should be in the next release.
   
   To confirm, there's really little or no security issues here right?  We strictly just never look farther than the JSON beginning of File.
   
   I like the branch name :D 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba commented on a diff in pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by "RyanSkraba (via GitHub)" <gi...@apache.org>.
RyanSkraba commented on code in PR #1748:
URL: https://github.com/apache/avro/pull/1748#discussion_r943212280


##########
lang/java/avro/src/test/java/org/apache/avro/io/TestResolvingIOResolving.java:
##########
@@ -101,7 +101,7 @@ private static Object[][] dataForResolvingTests() {
             "{\"type\":\"record\",\"name\":\"outer\",\"fields\":[" + "{\"name\": \"g1\", "
                 + "\"type\":{\"type\":\"record\",\"name\":\"inner\",\"fields\":["
                 + "{\"name\":\"f1\", \"type\":\"int\", \"default\": 101}," + "{\"name\":\"f2\", \"type\":\"int\"}]}}, "
-                + "{\"name\": \"g2\", \"type\": \"long\"}]}}",

Review Comment:
   These are nice catches!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] radai-rosenblatt commented on pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by GitBox <gi...@apache.org>.
radai-rosenblatt commented on PR #1748:
URL: https://github.com/apache/avro/pull/1748#issuecomment-1172062051

   that was an excellent catch @KalleOlaviNiemitalo. i've added a test for File input and adapted the code to work for all cases of dangling input.
   
   except for the case of a buffer full of whitespace with more content hiding in the underlying input (either a reader or an inputstream). I'm ok with not solving for that case as I expect it to be rare (and the code changes required more invasive)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] radai-rosenblatt commented on pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by GitBox <gi...@apache.org>.
radai-rosenblatt commented on PR #1748:
URL: https://github.com/apache/avro/pull/1748#issuecomment-1209842418

   @RyanSkraba - any chance of this making it in ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] radai-rosenblatt commented on pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by GitBox <gi...@apache.org>.
radai-rosenblatt commented on PR #1748:
URL: https://github.com/apache/avro/pull/1748#issuecomment-1172045228

   i'll add a check for the File method.
   as for the case of a buffer full of whitespace (and presumably content beyond that) - I'm ok with missing that validation (as i expect it to be extremely uncommon). otherwise i'd need to read till EOF?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #1748:
URL: https://github.com/apache/avro/pull/1748#discussion_r911727624


##########
lang/java/avro/src/main/java/org/apache/avro/Schema.java:
##########
@@ -1454,19 +1456,37 @@ public Schema parse(String s, String... more) {
      */
     public Schema parse(String s) {
       try {
-        return parse(FACTORY.createParser(s));
+        return parse(FACTORY.createParser(s), false);
       } catch (IOException e) {
         throw new SchemaParseException(e);
       }
     }
 
-    private Schema parse(JsonParser parser) throws IOException {
+    private Schema parse(JsonParser parser, boolean allowDanglingContent) throws IOException {
       boolean saved = validateNames.get();
       boolean savedValidateDefaults = VALIDATE_DEFAULTS.get();
       try {
         validateNames.set(validate);
         VALIDATE_DEFAULTS.set(validateDefaults);
-        return Schema.parse(MAPPER.readTree(parser), names);
+        JsonNode jsonNode = MAPPER.readTree(parser);
+        Schema schema = Schema.parse(jsonNode, names);
+        if (!allowDanglingContent) {
+          StringWriter danglingWriter = new StringWriter();
+          parser.releaseBuffered(danglingWriter);
+          String dangling = danglingWriter.toString().trim();
+          if (dangling.isEmpty()) {

Review Comment:
   Could perhaps optimise by doing the ByteArrayOutputStream stuff only if the first releaseBuffered call returns exactly -1.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] radai-rosenblatt commented on pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by GitBox <gi...@apache.org>.
radai-rosenblatt commented on PR #1748:
URL: https://github.com/apache/avro/pull/1748#issuecomment-1172025820

   updated the PR to allow dangling content when parsing from InputStream


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba merged pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by "RyanSkraba (via GitHub)" <gi...@apache.org>.
RyanSkraba merged PR #1748:
URL: https://github.com/apache/avro/pull/1748


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] radai-rosenblatt commented on pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by GitBox <gi...@apache.org>.
radai-rosenblatt commented on PR #1748:
URL: https://github.com/apache/avro/pull/1748#issuecomment-1171981926

   ended up flushing a few bad schemas in other unit tests as well ...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by GitBox <gi...@apache.org>.
KalleOlaviNiemitalo commented on PR #1748:
URL: https://github.com/apache/avro/pull/1748#issuecomment-1172042027

   Could you add tests for trailing content in a UTF-8 file parsed using Schema.parse(File file)? [JsonFactory.createParser(File f)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonFactory.java#L1025-L1031) apparently creates an InputStream for that, and [JsonFactory._createParser(InputStream in, IOContext ctxt)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonFactory.java#L1653-L1668) calls [ByteSourceJsonBootstrapper.constructParser](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/json/ByteSourceJsonBootstrapper.java#L251-L271), which creates an UTF8StreamJsonParser in that case. UTF8StreamJsonParser is a "byte-based" parser and inherits [JsonParser.releaseBuffered(Writer w)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91
 e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/JsonParser.java#L836), which just returns -1, but [UTF8StreamJsonParser.releaseBuffered(OutputStream out)](https://github.com/FasterXML/jackson-core/blob/10a9026f4ef91e821798296e7c4e3fe445921f89/src/main/java/com/fasterxml/jackson/core/json/UTF8StreamJsonParser.java#L224-L236) can write something to the OutputStream. I think this means that, to correctly detect trailing content in a UTF-8 file, Schema.parse would have to call not only JsonParser.releaseBuffered(Writer) but also JsonParser.releaseBuffered(OutputStream), or first call JsonParser.getInputSource() and then use the type of the result to guess whether the parser is byte-based or char-based.
   
   If parse(InputStream) parses a stream that has trailing content, and JsonParser buffers that content, it would be best to return that content to the InputStream so that the caller can then read it. However I don't see how to do that.
   
   Is it possible that JsonParser buffers some content from a Reader, and the buffered content is all space characters and thus ignored by Avro.Schema, but the Reader has more content that JsonContent did not even read because its buffer filled up? In which case, Avro.Schema would have to check the Reader as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] RyanSkraba commented on pull request #1748: AVRO-3560: throw SchemaParseException on dangling content in avsc beyond end of schema

Posted by "RyanSkraba (via GitHub)" <gi...@apache.org>.
RyanSkraba commented on PR #1748:
URL: https://github.com/apache/avro/pull/1748#issuecomment-1591681689

   Ooops, it looks like this broke on master due to the unit test migration to JUnit 5.  I'm taking a look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org