You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/12/13 01:35:57 UTC

[GitHub] [druid] paul-rogers opened a new pull request #12058: Response "trailer" support

paul-rogers opened a new pull request #12058:
URL: https://github.com/apache/druid/pull/12058


   Provides a *response trailer* mechanism to include a map of values in the query response after the usual data results.
   
   ### Motivation
   
   Druid runs queries using a REST request/response protocol in which the response contains the query result set. There are several use cases where it would be useful to return additional performance data along with the result set. A concrete example is the [query profile](https://github.com/apache/drill/issues/2350) which can optionally return the profile when a query runs. Further, the query profile mechanism uses the response trailer to return a partial profile from each data node during execution.
   
   When not returning a full profile, it is still useful to return a small set of overall metrics. This PR includes only those summary metrics.
   
   Since query metrics are available, by definition, only at the completion of a query, it is necessary to return metrics and profile information *after* the data. Thus, the existing before-the-data header mechanism is not a good fit for this use case and we instead introduce a *response trailer* mechanism.
   
   ### Credit
   
   This PR and description are derived from a prototype which @gianm created.
   
   ### Design
   
   When running a query with a JSON response format, the response is just the response set as an array of rows, where the rows can be in various formats. When the response trailer is enabled, the JSON response format changes to an object which includes the result set as one of the fields.
   
   This change is clearly highly visible to clients which may expect the JSON array format. To avoid breaking such clients, the client must specifically request the revised format.
   
   #### `ResponseContext` changes
   
   The `ResponseContext` gains a new field, `metrics` to gather the summary metrics. The field contains the object that generates the metrics shown in the API below. On the Broker, the `ResponseContext` may contain metrics for multiple subqueries.
   
   A previous PR clearly identified those `ResponseContext` fields which are to appear in the response header. Those fields not in the header are considered internal fields. This PR extends the idea to tag keys that appear in the trailer. The result is that fields can appear in the header, trailer, both or neither. ("Neither" is useful for keys that are only used internally by query engines within one server, like `timeoutAt` and `count`.)  
   
   The new `metrics` key is tagged as trailer-only.
   
   #### "Starter" metrics
   
   Until the full query profile is available, this PR motivates the response trailer by providing some basic metrics within the `metrics` key:
   
   * `resultRows`: the number of rows returned
   * `queryStart`: Query start time
   * `queryMs`: Query duration (up until the trailer is written)
   
   In addition, the trailer contains fields which may have been truncated in the header:
   
   * `uncoveredIntervals`
   * `missingSegments`
   
   #### `QueryLifecycle` and `SqlLifecycle` changes
   
   The `QueryResource` and `SqlResource` classes process each query, return the results, log data, return the response trailer (in this PR) and, later, will assemble and return the profile.
   
   To better organize this work, the lifecycle classes each gain a `LifecycleStats` that gathers and distributes the above statistics.
   
   #### `QueryResponse`
   
   The SQL-related code to run a query consists of many layers of functions which take the query as input and returns a sequence. The response context is created several layers down in the stack. To return the trailer, we must return the response context up through those top layers that today return only a sequence. The existing `QueryResponse` is pressed into service to solve this issue by wrapping the output sequence and the query profile.
   
   The profile is made available in the `QueryResponse`, but it is not fully populated until the root `Sequence` returns all its results.
   
   #### `Query` changes
   
   Druid queries return a variety of results. The objects returned by a `Sequence` may not represent rows, but rather query-specific batches of rows. To obtain a row count, we must know how to interpret those results. The new `Query.getRowCountOf()` method does this work.
   
   #### `DirectDruidClient` changes
   
   The internal Druid client uses the new results-with-trailer format. It copies the trailer fields into the client-side response context. The Broker-side merge merges results from may clients into the final response context.
   
   ### Rationale
   
   The design does not use HTTP trailers. In theory the response context could have been sent using HTTP trailers. However, client and proxy support for HTTP trailers is spotty, and it isn't clear how much data is safe to send in a trailer. For these reasons, we chose to include the trailer in the regular response.
   
   ### API
   
   #### Request
   
   In native, include the header `X-Druid-Include-Trailer: Yes`:
   
   ```text
   POST /druid/v2/
   X-Druid-Include-Trailer: Yes
   
   { ... } // regular query
   ```
   
   In SQL, use the result format arrayWithTrailer:
   
   ```text
   POST /druid/v2/sql/
   
   {
     "query": "SELECT COUNT(*) FROM wikipedia",
     "resultFormat": "arrayWithTrailer"
   }
   ```
   
   #### Response
   
   In both native and SQL:
   
   ```json
   {
     "results": [ ... ], // normal results
     "context": { // trailer
       "metrics": [
         {
           "segments": 37,
           "segmentsProcessed": 37,
           "segmentsVectorProcessed": 37,
           "segmentRows": 266472,
           "preFilteredRows": 159739,
           "nodeRows": 1,
           "resultRows": 1,
           "cpuNanos": 5027000,
           "threads": 5,
           "queryStart": 1627010838439,
           "queryMs": 40
         },
         {
           "subQueryId": "ef46510d-668b-45e8-a9ef-9de55949342a_0",
           "segments": 37,
           "segmentsProcessed": 37,
           "segmentsVectorProcessed": 37,
           "segmentRows": 266472,
           "preFilteredRows": 194618,
           "nodeRows": 52,
           "resultRows": 5,
           "cpuNanos": 9541000,
           "threads": 5,
           "queryStart": 1627010838461,
           "queryMs": 11
         }
       ]
     }
   }
   ```
   
   ##### Results
   
   The results key contains the actual query results. In native, it is an array that is identical to what you would have received if you did not set the `X-Druid-Include-Trailer` header. In SQL it is identical to what you would have received if you set the result format to array.
   
   ##### Context
   
   The context key contains the response trailer. Inside the context is a metrics key that contains an array of query metric objects. There is one for the main query (which will be listed first) and one for each subquery (which will be listed in the order that they began execution).
   
   The specific metrics should be considered an Alpha release: they are subject to change and will be re-evaluated once the full query profile is available.
   
   Keys that can be present in a the current "starter" query metric object are:
   
   `subQueryId`: Present for subqueries, absent for the main query. Useful for differentiating subqueries from each other, and for differentiating subqueries from the main query.
   
   `segments`: The number of segments that were considered by data servers. Includes segments that were not processed because the data server ultimately decided to use the per-segment query cache. Does not include segments that were pruned by the Broker. Equivalent to the count of the query/segmentAndCache/time metric.
   
   `segmentsProcessed`: The number of segments that were actually processed by data servers. Does not include segments whose results were served out of the per-segment query cache, or segments that were pruned by the Broker. Equivalent to the count of the query/segment/time metric.
   
   `segmentsVectorProcessed`: The number of segments that were actually processed by data servers, and were processed using a vectorized engine. This is a subset of segmentsProcessed.
   
   `segmentRows`: The total number of rows across all segments processed by data servers. May include rows that were not actually processed because they were filtered out using the index. May include rows that were not actually processed due to pushed-down limits.
   
   `preFilteredRows`: The total number of rows across all segments processed by data servers, after applying index-based filters, but before applying cursor-based filters. Will not be larger than segmentRows. This is the generally the number of rows processed by the query engine, except in cases where the query engine does not process all rows (such as SELECT * FROM tbl LIMIT 5, where limit pushdown means that data servers only read the first few rows).
   
   `nodeRows`: The total number of rows received by the Broker from data servers. May be larger than segmentRows and preFilteredRows due to caching: nodeRows includes rows served from the per-segment query cache, but segmentRows and preFilteredRows do not. May also be larger than preFilteredRows due to queries that generate more than one output row per input row.
   
   `resultRows`: The total number of rows returned by this query after processing has completed on the Broker. This is how many rows a user would receive after issuing the query.
   
   `cpuNanos`: The total number of nanoseconds of CPU time used by this query, summed up across all servers. Equivalent to the sum of the query/cpu/time metric.
   
   `threads`: The total number of threads that participated in this query. Note that not all threads are necessarily active at the same time.
   
   `queryStart`: The time this query started, in milliseconds since the epoch.
   
   `queryMs`: The duration of this query, in milliseconds. This is similar to the query/time metric, but not quite equivalent, because it only measures the time until the response trailer starts being written. The query/time metric includes the time it takes to write the response trailer and fully flush the connection to the client.
   
   ### Backward Compatibility
   
   The trailer appears for client queries only if the client explicitly requests it via the `X-Druid-Include-Trailer` header. Since existing clients do not include this header, existing clients will see no functional difference.
   
   The response trailer is always enabled for internal queries between the broker and data nodes, etc. This change is transparent to clients. If a client directly queries a data node, then the above rule applies: the `X-Druid-Include-Trailer` header must be set.
   
   The addition of the trailer changes the format of internal query responses. The following ensures services can be upgraded in any order:
   
   * An old Broker won't request the trailer, so new data nodes will not return one.
   * A new Broker will request the trailer, but will accept the old array format.
   
   When running test cases, it appears that some tests may mock or otherwise simulate the data node response to the broker. In this case, the new Broker requests a header, but the mocking code returns an array format. The "will accept the old array format" ensures that these tests continue to run.
   
   ### Minor Revisions
   
   The PR includes a number of minor refactoring moves:
   
   #### `QueryResource`
   
   Pull the `StreamingOutput` subclass out as a named class in `QueryResource` so it is easier to see which parts occur within the message handler, and which occur asynchronously to produce the output.
   
   #### Optional `druid.server.http.minThreads` config parameter
   
   Add a `minThreads` to the Jetty server configuration to optionally reduce the initial number of Jetty worker threads to make the IDE experience a bit easier. The default is to retain the current behavior. To change the minimum (and startup) number of threads, add the following to the common config file:
   
   ```text
   druid.server.http.minThreads=2
   ```
   
   The thread pool will expand the thread count on demand up to `maxThreads`, so this change affects only the *initial* thread count.
   
   ### `SqlResource` error handling
   
   Added a distinct error code, 409 CONFLICT, for the query cancellation case to differentiate cancellation from internal error.
   
   ### Tests
   
   Deflaked several tests. Modified others to avoid timeout while debugging. Reduced copy/paste revisions for handling the new header field.
   
   <hr>
   
   This PR has:
   - [X] been self-reviewed.
      - [X] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) 
   - [ ] added documentation for new or modified features or behaviors.
   - [X] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [X] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [X] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers edited a comment on pull request #12058: Response "trailer" support

Posted by GitBox <gi...@apache.org>.

paul-rogers edited a comment on pull request #12058:
URL: https://github.com/apache/druid/pull/12058#issuecomment-995327640


   One of the steps failed in a unit test which appears due to flaky behavior:
   
   ```text
   [ERROR] testReconnectWithCacheAndPrefetch(org.apache.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactoryTest)  Time elapsed: 8.224 s  <<< ERROR!
   java.lang.RuntimeException: java.util.concurrent.TimeoutException
   	at org.apache.druid.data.input.impl.prefetch.Fetcher.openObjectFromLocal(Fetcher.java:226)
   	at org.apache.druid.data.input.impl.prefetch.Fetcher.next(Fetcher.java:177)
   	at org.apache.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactory$2.next(PrefetchableTextFilesFirehoseFactory.java:239)
   	at org.apache.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactory$2.next(PrefetchableTextFilesFirehoseFactory.java:225)
   	at org.apache.druid.data.input.impl.FileIteratingFirehose.getNextLineIterator(FileIteratingFirehose.java:105)
   	at org.apache.druid.data.input.impl.FileIteratingFirehose.hasMore(FileIteratingFirehose.java:66)
   	at org.apache.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactoryTest.testReconnectWithCacheAndPrefetch(PrefetchableTextFilesFirehoseFactoryTest.java:356)
   ```
   
   The same test passed in the "SQL compatibility" test, and on my machine in both Maven and the IDE. So, the test is flaky.
   
   Took a look a the test. The Firehose is completely separate from any of the areas I changed. Perhaps there is a race condition in the code, though brief inspection did not reveal the cause. Anyone seen this one fail previously? Added more detail to the exception in case if fails again. Let's not hold up reviews due to this test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers commented on pull request #12058: Response "trailer" support

Posted by GitBox <gi...@apache.org>.

paul-rogers commented on pull request #12058:
URL: https://github.com/apache/druid/pull/12058#issuecomment-995327640


   One of the steps failed in a unit test which appears due to flaky behavior:
   
   ```text
   [ERROR] testReconnectWithCacheAndPrefetch(org.apache.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactoryTest)  Time elapsed: 8.224 s  <<< ERROR!
   java.lang.RuntimeException: java.util.concurrent.TimeoutException
   	at org.apache.druid.data.input.impl.prefetch.Fetcher.openObjectFromLocal(Fetcher.java:226)
   	at org.apache.druid.data.input.impl.prefetch.Fetcher.next(Fetcher.java:177)
   	at org.apache.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactory$2.next(PrefetchableTextFilesFirehoseFactory.java:239)
   	at org.apache.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactory$2.next(PrefetchableTextFilesFirehoseFactory.java:225)
   	at org.apache.druid.data.input.impl.FileIteratingFirehose.getNextLineIterator(FileIteratingFirehose.java:105)
   	at org.apache.druid.data.input.impl.FileIteratingFirehose.hasMore(FileIteratingFirehose.java:66)
   	at org.apache.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactoryTest.testReconnectWithCacheAndPrefetch(PrefetchableTextFilesFirehoseFactoryTest.java:356)
   ```
   
   Took a look a the test. The Firehose is completely separate from any of the areas I changed. Perhaps there is a race condition in the code, though brief inspection did not reveal the cause. The test works fine in Maven and in the IDE on my machine. Anyone seen this one fail previously? Added more detail to the exception in case if fails again. Let's not hold up reviews due to this test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers commented on pull request #12058: Response "trailer" support

Posted by GitBox <gi...@apache.org>.

paul-rogers commented on pull request #12058:
URL: https://github.com/apache/druid/pull/12058#issuecomment-996287590


   The pesky Firehose `testReconnectWithCacheAndPrefetch` test failed again:
   
   ```text
   java.lang.RuntimeException: java.util.concurrent.TimeoutException: Poll timeout after 8000 milliseconds(s)
   ```
   
   Pretty sure:
   
   * The Firehose code has concurrency issues. The code suggests we have race conditions: the question is if they are benign. The repeated flakey failures suggest that they are not benign.
   * Those issues are unrelated to this PR.
   
   So, should we fix the test as part of this PR? Block this PR pending a different PR to fix the issue? Keep trying a build until we get lucky?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] paul-rogers edited a comment on pull request #12058: Response "trailer" support

Posted by GitBox <gi...@apache.org>.

paul-rogers edited a comment on pull request #12058:
URL: https://github.com/apache/druid/pull/12058#issuecomment-996287590


   The pesky Firehose `testReconnectWithCacheAndPrefetch` test failed again:
   
   ```text
   java.lang.RuntimeException: java.util.concurrent.TimeoutException: Poll timeout after 8000 milliseconds(s)
   ```
   
   The timeout message was added in a previous commit in this PR. It appears the code has gotten itself into a deadlock.
   
   Pretty sure:
   
   * The Firehose code has concurrency issues. The code suggests we have race conditions: the question is if they are benign. The repeated flakey failures suggest that they are not benign.
   * Those issues are unrelated to this PR.
   
   So, should we fix the test as part of this PR? Block this PR pending a different PR to fix the issue? Keep trying a build until we get lucky?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] lgtm-com[bot] commented on pull request #12058: Response "trailer" support

Posted by GitBox <gi...@apache.org>.

lgtm-com[bot] commented on pull request #12058:
URL: https://github.com/apache/druid/pull/12058#issuecomment-992058945


   This pull request **introduces 4 alerts** when merging 80a91876db20352b4d5f52e52d784078a891732a into e53c3e80ca246c0bc9efd18d23858b791eb6e1a2 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-90b306f1beded31c8920a3d57923fd81df48e723)
   
   **new alerts:**
   
   * 2 for Spurious Javadoc @param tags
   * 2 for Dereferenced variable may be null


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org