Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/06/08 04:11:09 UTC

[GitHub] [druid] xhl0726 opened a new pull request #9999: Optimize protobuf parsing

xhl0726 opened a new pull request #9999:
URL: https://github.com/apache/druid/pull/9999

Fixes #9984.

### Description
See #9984.
To improve the performance of protobuf parsing and to solve the `JSONPathSpec` problem mentioned at the end of #9984, we apply different parsing methods to flat data and nested data.

Because Druid only allows the `ParseSpec` for protobuf data to be defined as a `JSONParseSpec`, nested data here means data whose `JSONPathSpec` (defined in the `JSONParseSpec`) is not null. Flat data has a null `JSONPathSpec`, so its parsing can be optimized by avoiding the intermediate conversion to JSON. The modified class `ProtobufInputRowParser` passes all the tests in `ProtobufInputRowParserTest` (if necessary, we can add a unit test for flat data).
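
The dispatch described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the class `ParseDispatchSketch`, its methods, and the use of plain `Map`s in place of decoded protobuf messages are all hypothetical, and the nested path only stands in for the JSON conversion and flattening step without producing real JSON.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a decoded message is modelled as a Map of field name to value,
// and the flatten spec as a nullable list of dotted paths.
class ParseDispatchSketch {

    // Chooses the parsing method: a null spec means flat data.
    static Map<String, Object> parse(Map<String, Object> decoded, List<String> flattenPaths) {
        if (flattenPaths == null) {
            return parseFlat(decoded);
        }
        return parseNested(decoded, flattenPaths);
    }

    // Fast path: copy the top-level fields straight into the row.
    static Map<String, Object> parseFlat(Map<String, Object> decoded) {
        return new LinkedHashMap<>(decoded);
    }

    // Slow path (stand-in for message -> JSON -> flattened row):
    // resolve each requested dotted path against the nested structure.
    static Map<String, Object> parseNested(Map<String, Object> decoded, List<String> flattenPaths) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String path : flattenPaths) {
            row.put(path, resolve(decoded, path));
        }
        return row;
    }

    // Walks a dotted path such as "foo.bar"; returns null if a segment is missing.
    static Object resolve(Map<String, Object> root, String dottedPath) {
        Object current = root;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }
}
```

Calling `ParseDispatchSketch.parse(decoded, null)` takes the fast path, while passing a list such as `["id", "foo.bar"]` takes the path that resolves nested fields.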

Below is the result of `ProtobufParserBenchmark`:
![image](https://user-images.githubusercontent.com/24449727/83990429-32b54200-a97c-11ea-87f3-b9faf205c2d2.png)

It shows that, using the optimized method, parsing flat data can be 4 times faster than parsing nested data.



#### Added test files
We added `prototest.desc` (a copy of the `prototest.desc` in the protobuf-extension test-classes) and `ProtoFile` (generated by a test in `ProtobufInputRowParserTest`) under `benchmarks/src/test/resources` to simplify test data preparation. The two benchmarks in `ProtobufParserBenchmark` use the same input, but their results differ a little because one of them specifies a `JSONPathSpec`.

<hr>

This PR has:
- [x] been self-reviewed.
- [ ] added unit tests or modified existing tests to cover new code paths.

<hr>

##### Key changed/added classes in this PR
* `ProtobufParserBenchmark`
* `ProtobufInputRowParser`
