You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "gianm (via GitHub)" <gi...@apache.org> on 2024/01/16 07:14:29 UTC

[PR] Avoid conversion to String in JsonReader, JsonNodeReader. (druid)

gianm opened a new pull request, #15693:
URL: https://github.com/apache/druid/pull/15693

   These readers were running UTF-8 decode on the provided entity to convert it to a String, then parsing the String as JSON. The patch changes them to parse the provided entity's input stream directly.
   
   In order to preserve the nice error messages that include parse errors, the readers now need to open the entity again on the error path, to re-read the data. To make this possible, the InputEntity#open contract is tightened to require the ability to re-open entities, and existing InputEntity implementations are updated to allow re-opening.
   
   This patch also renames JsonLineReaderBenchmark to JsonInputFormatBenchmark, updates it to benchmark all three JSON readers, and adds a case that reads fields out of the parsed row (not just creates it).
   
   Benchmarks below. Findings:
   
   - The `reader` and `node_reader` (used if `useJsonNodeReader` is set) readers are ~15% faster on `parseAndRead`. `reader` is the default for stream ingest; `node_reader` is used for stream ingest if `useJsonNodeReader` is set.
   - The `line_reader` wasn't changed in this patch and performance is the same (within margin of error). This one is default for batch ingest, and used for streaming if `assumeNewlineDelimited` is set.
   
   So, the speedups are mainly for streaming ingest. But #15681 has a similar speedup for `line_reader` if that's the one you care about!
   
   ```
   master
   
   Benchmark                              (readerTypeString)  Mode  Cnt     Score     Error  Units
   JsonInputFormatBenchmark.parseAndRead              reader  avgt    5  3148.287 ± 117.748  us/op
   JsonInputFormatBenchmark.parseAndRead         node_reader  avgt    5  3232.287 ±  20.667  us/op
   JsonInputFormatBenchmark.parseAndRead         line_reader  avgt    5  3085.638 ±  45.131  us/op
   
   patch
   
   Benchmark                              (readerTypeString)  Mode  Cnt     Score    Error  Units
   JsonInputFormatBenchmark.parseAndRead              reader  avgt    5  2656.737 ± 65.348  us/op
   JsonInputFormatBenchmark.parseAndRead         node_reader  avgt    5  2659.078 ± 53.231  us/op
   JsonInputFormatBenchmark.parseAndRead         line_reader  avgt    5  3017.010 ± 63.724  us/op
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Avoid conversion to String in JsonReader, JsonNodeReader. (druid)

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #15693:
URL: https://github.com/apache/druid/pull/15693#issuecomment-2005419069

   This pull request has been marked as stale due to 60 days of inactivity.
   It will be closed in 4 weeks if no further activity occurs. If you think
   that's incorrect or this pull request should instead be reviewed, please simply
   write any comment. Even if closed, you can still revive the PR at any time or
   discuss it on the dev@druid.apache.org list.
   Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Avoid conversion to String in JsonReader, JsonNodeReader. (druid)

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm commented on PR #15693:
URL: https://github.com/apache/druid/pull/15693#issuecomment-1893333575

   This patch is running into a problem that is addressed in #15681 by the addition of `IntermediateRowParsingReader#intermediateRowAsString`. Without that, the switch to using `ByteEntity` as the intermediate form makes the error messages show "ByteEntity" rather than the actual intermediate data.
   
   So, when that other one is merged, I'll merge master into this PR and it should be OK.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


Re: [PR] Avoid conversion to String in JsonReader, JsonNodeReader. (druid)

Posted by "gianm (via GitHub)" <gi...@apache.org>.
gianm merged PR #15693:
URL: https://github.com/apache/druid/pull/15693


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org