Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/03/30 21:16:03 UTC

[GitHub] [spark] MaxGekk opened a new pull request #24252: [SPARK-27327][SQL] New benchmarks for JSONBenchmark: functions, Dataset[String]

MaxGekk opened a new pull request #24252: [SPARK-27327][SQL] New benchmarks for JSONBenchmark: functions, Dataset[String]
URL: https://github.com/apache/spark/pull/24252
 
 
   ## What changes were proposed in this pull request?
   
   Added new benchmarks for:
   1. JSON functions: `from_json`, `json_tuple` and `get_json_object`
   2. Parsing `Dataset[String]` with JSON records
   3. Comparing plain splitting of the input text by lines against schema inference and against per-line parsing, with the encoding both set and not set.
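   
   For readers unfamiliar with the operations being measured, here is a minimal sketch of the three JSON functions and of parsing a `Dataset[String]`. It is not the benchmark code from this PR: the object name, sample records, and schema are assumptions made up for illustration.
   
   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.{col, from_json, get_json_object, json_tuple}
   import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
   
   // Hypothetical example object; not part of JSONBenchmark.
   object JsonFunctionsSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[1]")
         .appName("json-functions-sketch")
         .getOrCreate()
       import spark.implicits._
   
       // Dataset[String] with one JSON record per element (made-up sample data).
       // Its single column is named "value".
       val ds = Seq("""{"a": 1, "b": "x"}""", """{"a": 2, "b": "y"}""").toDS()
   
       // Assumed schema matching the sample records above.
       val schema = StructType(Seq(
         StructField("a", IntegerType),
         StructField("b", StringType)))
   
       // 1. JSON functions applied to a string column.
       ds.select(from_json(col("value"), schema).as("parsed")).show()
       ds.select(json_tuple(col("value"), "a", "b")).show()
       ds.select(get_json_object(col("value"), "$.a").as("a")).show()
   
       // 2. Parsing a Dataset[String] of JSON records with an explicit schema.
       spark.read.schema(schema).json(ds).show()
   
       spark.stop()
     }
   }
   ```
   
   Passing an explicit schema to `spark.read` skips schema inference, which is one of the costs the third benchmark group isolates.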
   
   Also, the existing benchmarks were refactored to use the `NoOp` datasource, which eliminates the overhead of triggers such as `.filter((_: Row) => true).count()`.
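   
   The difference between the two ways of forcing evaluation can be sketched as follows. This is an illustration only, not the refactored benchmark code: it assumes a Spark version in which the datasource is registered under the short name `noop` (Spark 3.0 or later), and the object name and sample data are hypothetical.
   
   ```scala
   import org.apache.spark.sql.{Row, SparkSession}
   
   // Hypothetical example object; not part of JSONBenchmark.
   object NoopTriggerSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .master("local[1]")
         .appName("noop-trigger-sketch")
         .getOrCreate()
       import spark.implicits._
   
       // Made-up JSON records, parsed into a DataFrame.
       val df = spark.read.json(Seq("""{"a": 1}""", """{"a": 2}""").toDS())
   
       // Old trigger: the typed filter forces every row to be materialized,
       // but the filter and the count add their own cost to the measurement.
       df.filter((_: Row) => true).count()
   
       // New trigger: write to the no-op sink, which consumes all rows and
       // discards them. Assumes the "noop" short name is available.
       df.write.format("noop").mode("overwrite").save()
   
       spark.stop()
     }
   }
   ```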
   
   ## How was this patch tested?
   
   By running `JSONBenchmark` locally.
