You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by GitBox <gi...@apache.org> on 2020/03/31 20:28:03 UTC

[GitHub] [drill] paul-rogers opened a new pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader

paul-rogers opened a new pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045
 
 
   # [DRILL-7683](https://issues.apache.org/jira/browse/DRILL-7683): Add "message parsing" to new JSON loader
   
   ## Description
   
   Worked on a project that uses the new JSON loader to parse a REST response that includes a set of "wrapper" fields around the JSON payload. Example:
   
   ```
   { "status": "ok", "results: [ data here ]}
   ```
   
   To solve this cleanly, added the ability to specify a "message parser" to consume JSON tokens up to the start of the data. This parser can be written as needed for each different data source.
   
   Since this change adds one more parameter to the JSON structure parser, added builders to gather the needed parameters rather than making the constructor even larger.
   
   Note that, aside from the private plugin mentioned above, no other code uses the JSON loader yet.
   
   ## Documentation
   
   N/A
   
   ## Testing
   
   Added unit tests. Reran all existing tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
cgivre commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-608400620
 
 
   @paul-rogers 
   Regarding the Java 1.8 build failing in `TestAggregateFunctions`, this problem seemed to go away after rebaseing.  Not sure what it was, but the `time_bucket` PR and the REST plugin both now pass the tests. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] arina-ielchiieva commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
arina-ielchiieva commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-607108696
 
 
   +1, LGTM.
   
   @paul-rogers let's for the future try to address comments without adding new changes, here is were just a couple of files, in other PRs were much more, a little hard for the code review to review new stuff with each commit :)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers opened a new pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
paul-rogers opened a new pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045
 
 
   # [DRILL-7683](https://issues.apache.org/jira/browse/DRILL-7683): Add "message parsing" to new JSON loader
   
   ## Description
   
   Worked on a project that uses the new JSON loader to parse a REST response that includes a set of "wrapper" fields around the JSON payload. Example:
   
   ```
   { "status": "ok", "results: [ data here ]}
   ```
   
   To solve this cleanly, added the ability to specify a "message parser" to consume JSON tokens up to the start of the data. This parser can be written as needed for each different data source.
   
   When working on the REST data source, it became clear we need a no-code way to handle the same issue. So, extended the message parser to handle the simple case, a path to the data. In the above, the path would be just `results`. The path can contain any number of slash-separated elements: `response/body/rows` for example.
   
   Since this change adds two more parameters to the JSON structure parser, added builders to gather the needed parameters rather than making the constructor even larger.
   
   Note that, aside from the private plugin mentioned above, no other code uses the JSON loader yet.
   
   ## Developer Documentation
   
   This PR is part of the "new" V2 EVF-based JSON parser. An example of usage appears in PR #1892 (REST storage plugin.) To use the simple path-based form of message parsing, add the following option to the JSON parser builder:
   
   ```
       .dataPath("path/to/data")
   ```
   
   The tail element should be the one that holds an array of JSON records.
   
   To add custom message parsing (to check return status, say), use a different option of the builder:
   
   ```
         .messageParser(parser)
   ```
   
   Then implement the `MessageParser` class to do the parsing. The present version works at the level of JSON tokens: you must use the provided "tokenizer" to read each token and do the right thing.
   
   Since working at the token level is tedious, the goal is to provide a read-made parser that takes a path to the data, such as "response.data" and skips all fields except those in the path.
   
   The goal here is to get the mechanism added to the JSON parser so we can then try it in the REST plugin and work out exactly what we need in that higher-level parser level.
   
   ## User Documentation
   
   N/A
   
   ## Testing
   
   Added unit tests. Reran all existing tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-609457513
 
 
   @arina-ielchiieva, thanks for the reminder. Updated the description. Basically, this is the first step toward providing an easier-to-use solution in the context of the REST API. Wanted to get this first step committed so we can try things out and figure out what we need in the next level, which is what developers will actually use. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] asfgit closed pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] arina-ielchiieva commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
arina-ielchiieva commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-610264807
 
 
   @paul-rogers please squash the commits and see if all tests pass. If not, try to return the jobs, ideally all tests should pass.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-610628326
 
 
   @arina-ielchiieva, thanks for the review. Squashed commits. Builds cleanly in my environment. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-606986462
 
 
   The Java 1.8 build failed in `TestAggregateFunctions`, however those work in my own build. Any idea what might be wrong?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
arina-ielchiieva commented on a change in pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#discussion_r401231516
 
 

 ##########
 File path: exec/java-exec/src/test/java/org/apache/drill/exec/store/easy/json/parser/TestJsonParserBasics.java
 ##########
 @@ -324,4 +327,64 @@ public void testJsonModeObjects() {
     assertFalse(fixture.next());
     fixture.close();
   }
+
+  /**
+   * Example message parser. A real parser would provide much better
+   * error messages for badly-formed JSON or error codes.
+   */
+  private static class MessageParserFixture implements MessageParser {
+
+    @Override
+    public boolean parsePrefix(TokenIterator tokenizer) {
+      assertEquals(JsonToken.START_OBJECT, tokenizer.requireNext());
+      assertEquals(JsonToken.FIELD_NAME, tokenizer.requireNext());
+      assertEquals(JsonToken.VALUE_STRING, tokenizer.requireNext());
+      if (!tokenizer.stringValue().equals("ok")) {
 
 Review comment:
   Usually is better to start with constant to avoid NPE (I know it's just a test, just noticed).
   ```suggestion
         if (!"ok".equals(tokenizer.stringValue()) {
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
cgivre commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-606999124
 
 
   @paul-rogers 
   I have an unrelated PR (https://github.com/apache/drill/pull/2040) that was also failing in the same spot.  Not sure why.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-610202010
 
 
   Build apparently failed during JDBC unit tests, perhaps due to timeout? Unlikely to be caused by this PR since nothing calls any of this code yet.
   
   @arina-ielchiieva, anything else I should do for this one? Would be helpful if we can first merge this one, then we can rebase the REST API PR on top of the updated master (since that PR uses code from this PR.) 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] cgivre commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
cgivre commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-607246450
 
 
   @paul-rogers 
   Thank you very much for submitting this.  It's funny, the original REST plugin did this but not very well.  This is a really useful functionality. 
   
   In any event, I don't want to hold up this PR, but would you mind please writing up a paragraph documenting this functionality?  

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] arina-ielchiieva commented on a change in pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
arina-ielchiieva commented on a change in pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#discussion_r401229789
 
 

 ##########
 File path: exec/java-exec/src/main/java/org/apache/drill/exec/store/easy/json/loader/JsonLoaderImpl.java
 ##########
 @@ -146,30 +198,33 @@
    * inference, and not JSON tokens have been seen which would hint
    * at a type. Not needed when a schema is provided.
    */
-
   // Using a simple list. Won't perform well if we have hundreds of
   // null fields; but then we've never seen such a pathologically bad
-  // case... Usually just one or two fields have deferred nulls.
+  // case. Usually just one or two fields have deferred nulls.
   private final List<NullTypeMarker> nullStates = new ArrayList<>();
 
-  public JsonLoaderImpl(ResultSetLoader rsLoader, TupleMetadata providedSchema,
-      JsonLoaderOptions options, CustomErrorContext errorContext,
-      InputStream stream) {
-    this.rsLoader = rsLoader;
-    this.options = options;
-    this.errorContext = errorContext;
-    this.rowListener = new TupleListener(this, rsLoader.writer(), providedSchema);
-    this.parser = new JsonStructureParser(stream, options, rowListener, this);
+  private JsonLoaderImpl(JsonLoaderBuilder builder) {
+    this.rsLoader = builder.rsLoader;
+    this.options = builder.options;
+    this.errorContext =builder. errorContext;
 
 Review comment:
   ```suggestion
       this.errorContext = builder. errorContext;
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-606944185
 
 
   This PR also includes support for the `ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER` added to the "old" JSON parser in #1663.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers closed pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
paul-rogers closed pull request #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] paul-rogers commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
paul-rogers commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-612328748
 
 
   @arina-ielchiieva, @cgivre, since this is taken a while to review; went ahead and added the simplified form of the message parsing which we'll use in the REST storage plugin. If we can commit this one before the REST plugin, I'll go ahead and update that PR to use this feature.  

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] arina-ielchiieva edited a comment on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
arina-ielchiieva edited a comment on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-607108696
 
 
   +1, LGTM. Please squash the commits.
   
   @paul-rogers let's for the future try to address comments without adding new changes, here is were just a couple of files, in other PRs were much more, a little hard for the code review to review new stuff with each commit :)

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] arina-ielchiieva commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
arina-ielchiieva commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-612423451
 
 
   Changes look ok, but I don't think we should do this since PR has been already reviewed.
   I will send follow-up email with the proposal how we can speed up merge process.
   
   +1, LGTM.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [drill] arina-ielchiieva commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader

Posted by GitBox <gi...@apache.org>.
arina-ielchiieva commented on issue #2045: DRILL-7683: Add "message parsing" to new JSON loader
URL: https://github.com/apache/drill/pull/2045#issuecomment-609407888
 
 
   @paul-rogers could you please address @cgivre comment about documentation?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services