Posted to commits@tinkerpop.apache.org by sp...@apache.org on 2021/04/05 16:58:49 UTC
[tinkerpop] 07/07: TINKERPOP-2245 Documentation updates around UnifiedChannelizer

This is an automated email from the ASF dual-hosted git repository.

spmallette pushed a commit to branch TINKERPOP-2245
in repository https://gitbox.apache.org/repos/asf/tinkerpop.git

commit 26f45af09c6fef5885180b5b25893fa7689f8f1a
Author: Stephen Mallette <st...@amazon.com>
AuthorDate: Mon Apr 5 12:58:05 2021 -0400

    TINKERPOP-2245 Documentation updates around UnifiedChannelizer
---
 docs/src/reference/gremlin-applications.asciidoc | 40 +++++++++++++++----
 docs/src/upgrade/release-3.5.x.asciidoc          | 51 ++++++++++++++++--------
 2 files changed, 66 insertions(+), 25 deletions(-)

diff --git a/docs/src/reference/gremlin-applications.asciidoc b/docs/src/reference/gremlin-applications.asciidoc
index bac1ba3..12b2770 100644
--- a/docs/src/reference/gremlin-applications.asciidoc
+++ b/docs/src/reference/gremlin-applications.asciidoc
@@ -832,6 +832,10 @@ channelizer: org.apache.tinkerpop.gremlin.server.channel.HttpChannelizer
 [source,yaml]
 channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
 
+NOTE: The `UnifiedChannelizer` introduced in 3.5.0 can also be used to support HTTP requests, as its functionality is
+similar to that of the `WsAndHttpChannelizer`. Please see the Gremlin Server UnifiedChannelizer section of the Upgrade
+Documentation for 3.5.0 for more link:https://tinkerpop.apache.org/docs/current/upgrade/#_tinkerpop_3_5_0[details].
+
 The `HttpChannelizer` is already configured in the `gremlin-server-rest-modern.yaml` file that is packaged with the Gremlin
 Server distribution.  To utilize it, start Gremlin Server as follows:
 
@@ -966,7 +970,7 @@ The following table describes the various YAML configuration options that Gremli
 |authorization.config |A `Map` of configuration settings to be passed to the `Authorizer` when it is constructed.  The settings available are dependent on the implementation. |_none_
 |channelizer |The fully qualified classname of the `Channelizer` implementation to use.  A `Channelizer` is a "channel initializer" which Gremlin Server uses to define the type of processing pipeline to use.  By allowing different `Channelizer` implementations, Gremlin Server can support different communication protocols (e.g. WebSocket). |`WebSocketChannelizer`
 |enableAuditLog |The `AuthenticationHandler`, `AuthorizationHandler` and processors can issue audit logging messages with the authenticated user, remote socket address and requests with a gremlin query. For privacy reasons, the default value of this setting is false. The audit logging messages are logged at the INFO level via the `audit.org.apache.tinkerpop.gremlin.server` logger, which can be configured using the log4j.properties file. |_false_
-|graphManager |The fully qualified classname of the `GraphManager` implementation to use.  A `GraphManager` is a class that adheres to the TinkerPop `GraphManager` interface, allowing custom implementations for storing and managing graph references, as well as defining custom methods to open and close graphs instantiations. It is important to note that the TinkerPop HTTP and WebSocketChannelizers auto-commit and auto-rollback based on the graphs stored in the graphManager upon script exe [...]
+|graphManager |The fully qualified classname of the `GraphManager` implementation to use.  A `GraphManager` is a class that adheres to the TinkerPop `GraphManager` interface, allowing custom implementations for storing and managing graph references, as well as defining custom methods to open and close graph instantiations. To prevent Gremlin Server from starting when all graphs fail to load, the `CheckedGraphManager` can be used. |`DefaultGraphManager`
 |graphs |A `Map` of `Graph` configuration files where the key of the `Map` becomes the name to which the `Graph` will be bound and the value is the file name of a `Graph` configuration file. |_none_
 |gremlinPool |The number of "Gremlin" threads available to execute actual scripts in a `ScriptEngine`. This pool represents the workers available to handle blocking operations in Gremlin Server. When set to `0`, Gremlin Server will use the value provided by `Runtime.availableProcessors()`. |0
 |host |The name of the host to bind the server to. |localhost
@@ -977,6 +981,7 @@ The following table describes the various YAML configuration options that Gremli
 |maxContentLength |The maximum length of the aggregated content for a message.  Works in concert with `maxChunkSize` where chunked requests are accumulated back into a single message.  A request exceeding this size will return a `413 - Request Entity Too Large` status code.  A response exceeding this size will raise an internal exception. |65536
 |maxHeaderSize |The maximum length of all headers. |8192
 |maxInitialLineLength |The maximum length of the initial line (e.g.  "GET / HTTP/1.0") processed in a request, which essentially controls the maximum length of the submitted URI. |4096
+|maxParameters |The maximum number of parameters that can be passed on a request. Larger numbers may impact performance for scripts. This configuration only applies to the `UnifiedChannelizer`. |16
 |metrics.consoleReporter.enabled |Turns on console reporting of metrics. |false
 |metrics.consoleReporter.interval |Time in milliseconds between reports of metrics to console. |180000
 |metrics.csvReporter.enabled |Turns on CSV reporting of metrics. |false
@@ -1009,6 +1014,7 @@ The following table describes the various YAML configuration options that Gremli
 |serializers |A `List` of `Map` settings, where each `Map` represents a `MessageSerializer` implementation to use along with its configuration. If this value is not set, then Gremlin Server will configure with GraphSON and GraphBinary but will not register any `ioRegistries` for configured graphs. |_empty_
 |serializers[X].className |The full class name of the `MessageSerializer` implementation. |_none_
 |serializers[X].config |A `Map` containing `MessageSerializer` specific configurations. |_none_
+|sessionLifetimeTimeout |The maximum time in milliseconds that a session can exist. A session cannot remain open beyond this limit, irrespective of the number of requests it receives or their individual timeouts. This configuration only applies to the `UnifiedChannelizer`. |600000 (10 minutes)
 |ssl.enabled |Determines if SSL is turned on or not. |false
 |ssl.keyStore |The private key in JKS or PKCS#12 format.  |_none_
 |ssl.keyStorePassword |The password of the `keyStore` if it is password-protected. |_none_
@@ -1021,7 +1027,9 @@ The following table describes the various YAML configuration options that Gremli
 |strictTransactionManagement |Set to `true` to require `aliases` to be submitted on every requests, where the `aliases` become the scope of transaction management. |false
 |threadPoolBoss |The number of threads available to Gremlin Server for accepting connections. Should always be set to `1`. |1
 |threadPoolWorker |The number of threads available to Gremlin Server for processing non-blocking reads and writes. |1
-|useEpollEventLoop |try to use epoll event loops (works only on Linux os) instead of netty NIO. |false
+|useCommonEngineForSessions |Ensures that the same `ScriptEngine` is used to support both sessions and sessionless requests, which will lead to better performance. Do not change this setting from the default without a specific use case in mind. This configuration only applies to the `UnifiedChannelizer`. |true
+|useEpollEventLoop |Try to use epoll event loops (works only on Linux) instead of Netty NIO. |false
+|useGlobalFunctionCacheForSessions |Enable the global function cache for sessions when using the `UnifiedChannelizer`. When `true`, functions created in one request to a session remain available on the next request to that session. This setting is only relevant when `useCommonEngineForSessions` is `false`. |true
 |writeBufferHighWaterMark | If the number of bytes in the network send buffer exceeds this value then the channel is no longer writeable, accepting no additional writes until buffer is drained and the `writeBufferLowWaterMark` is met. |65536
 |writeBufferLowWaterMark | Once the number of bytes queued in the network send buffer exceeds the `writeBufferHighWaterMark`, the channel will not become writeable again until the buffer is drained and it drops below this value. |65536
 |=========================================================
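+
+The following fragment sketches how the settings above that apply only to the `UnifiedChannelizer` might appear in a
+Gremlin Server YAML file, using the default values listed in the table. It is illustrative only. None of these keys
+need to be set explicitly, and any non-default value should be chosen after testing against the expected workload.
+
+[source,yaml]
+----
+channelizer: org.apache.tinkerpop.gremlin.server.channel.UnifiedChannelizer
+# the following settings only apply to the UnifiedChannelizer (values shown are the defaults)
+maxParameters: 16
+sessionLifetimeTimeout: 600000
+useCommonEngineForSessions: true
+# only relevant when useCommonEngineForSessions is false
+useGlobalFunctionCacheForSessions: true
+----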
@@ -1031,6 +1039,9 @@ See the <<metrics,Metrics>> section for more information on how to configure Gan
 [[opprocessor-configurations]]
 ==== OpProcessor Configurations
 
+IMPORTANT: The `UnifiedChannelizer` does not rely on `OpProcessor` infrastructure. If using that channelizer, these
+configuration options can be ignored.
+
 An `OpProcessor` provides a way to plug-in handlers to Gremlin Server's processing flow. Gremlin Server uses this
 plug-in system itself to expose the packaged functionality that it exposes. Configurations can be supplied to an
 `OpProcessor` through the `processors` key in the Gremlin Server configuration file. Each `OpProcessor` can take a
@@ -2047,22 +2058,35 @@ expected workload. More discussion on this topic can be found in the <<parameter
 Section below.
 * When configuring the size of `threadPoolWorker` start with the default of `1` and increment by one as needed to a
 maximum of `2*number of cores`.
-* The "right" size of the `gremlinPool` setting is somewhat dependent on the type of scripts that will be processed
+* The "right" size of the `gremlinPool` setting is somewhat dependent on the type of requests that will be processed
 by Gremlin Server.  As requests arrive to Gremlin Server they are decoded and queued to be processed by threads in
 this pool.  When this pool is exhausted of threads, Gremlin Server will continue to accept incoming requests, but
 the queue will continue to grow.  If left to grow too large, the server will begin to slow.  When tuning around
 this setting, consider whether the bulk of the scripts being processed will be "fast" or "slow", where "fast"
 generally means being measured in the low hundreds of milliseconds and "slow" means anything longer than that.
-* Scripts that are "slow" can really hurt Gremlin Server if they are not properly accounted for.  `ScriptEngine`
-evaluations are blocking operations that aren't always easily interrupted, so once a "slow" script is being evaluated in
-the context of a `ScriptEngine` it must finish its work.  Lots of "slow" scripts will eventually consume the
-`gremlinPool` preventing other scripts from getting processed from the queue.
+* Requests that are "slow" can really hurt Gremlin Server if they are not properly accounted for. Since these requests
+block a thread until the job is complete or successfully interrupted, lots of long-running requests will eventually
+consume the `gremlinPool`, preventing other requests from getting processed from the queue.
 ** To limit the impact of this problem, consider properly setting the `evaluationTimeout` to something "sane".
 In other words, test the traversals being sent to Gremlin Server and determine the maximum time they take to evaluate
-and iterate over results, then set the timeout value accordingly.
+and iterate over results, then set the timeout value accordingly. Also, consider setting a shorter global timeout for
+requests and then using longer per-request timeouts only for the specific requests that are expected to take longer.
 ** Note that `evaluationTimeout` can only attempt to interrupt the evaluation on timeout.  It allows Gremlin
 Server to "ignore" the result of that evaluation, which means the thread in the `gremlinPool` that did the evaluation
 may still be consumed after the timeout if interruption does not succeed on the thread.
+* When using sessions, there are different options to consider depending on the `Channelizer` implementation being
+used:
+** `WebSocketChannelizer` and `WsAndHttpChannelizer` - Both of these channelizers use the `gremlinPool` only for
+sessionless requests and construct a single-threaded pool for each session created. In this way, these channelizers
+tend to be optimized for long-lived sessions. For short-lived sessions, which may be typical when using bytecode-based
+remote transactions, quickly creating and destroying these sessions can be expensive, likely leading to longer and more
+frequent garbage collection as well as a general increase in overall server processing.
+** `UnifiedChannelizer` - The threads of the `gremlinPool` are used to service both sessions and sessionless requests.
+Because of this common thread pool, this channelizer is a better choice than the `WebSocketChannelizer` and
+`WsAndHttpChannelizer` when using lots of short-lived sessions, as there is less cost in starting and stopping
+sessions. It is important, though, to understand the expected workload for the server and size the pool accordingly,
+so that requests do not wait in the queue for an extended period of time before a thread becomes available to process
+them.
 * Graph element serialization for `Vertex` and `Edge` can be expensive, as their data structures are complex given the
 possible existence of multi-properties and meta-properties. When returning data from Gremlin Server only return the
 data that is required. For example, if only two properties of a `Vertex` are needed then simply return the two rather
diff --git a/docs/src/upgrade/release-3.5.x.asciidoc b/docs/src/upgrade/release-3.5.x.asciidoc
index 0663891..1537ec9 100644
--- a/docs/src/upgrade/release-3.5.x.asciidoc
+++ b/docs/src/upgrade/release-3.5.x.asciidoc
@@ -410,23 +410,6 @@ these values are not hashable and will result in an error. By introducing a `Has
 See: link:https://issues.apache.org/jira/browse/TINKERPOP-2395[TINKERPOP-2395],
 link:https://issues.apache.org/jira/browse/TINKERPOP-2407[TINKERPOP-2407]
 
-==== Gremlin Server UnifiedChannelizer
-
-Just some notes for later:
-
-* UnifiedChannelizer technically replaces all existing implementations but is not yet the default
-* Some new settings related to it: maxParameters, sessionLifeTimeout, useGlobalFunctionCacheForSessions, useCommonEngineForSessions
-* Session behavior shifts slightly under this channelizer for async calls, where a failure will mean that the session
-will close, remaining requests in the queue will be ignored and rollback will occur.
-* care should be take with strict transaction management and multi-graph transactions (which aren't real - not a new thing)
-* absolute max lifetime of a session is a new thing
-* transaction semantic under unified
-** user manually calls commit() commits transaction
-** user manually calls rollback()
-** user manually calls close() on Cluster
-** user manually calls close() on Tx or GraphTraversalSource spawned from Transaction
-** server error
-
 ==== Gremlin Server Audit Logging
 
 The `authentication.enableAuditlog` configuration property is deprecated, but replaced by the `enableAuditLog` property
@@ -491,6 +474,40 @@ future releases on the 3.5.x line.
 See: link:https://issues.apache.org/jira/browse/TINKERPOP-2537[TINKERPOP-2537],
 link:https://tinkerpop.apache.org/docs/current/reference/#transactions[Reference Documentation - Transactions]
 
+==== Gremlin Server UnifiedChannelizer
+
+Gremlin Server uses a `Channelizer` abstraction to configure different Netty pipelines which can then offer different
+server behaviors. Most commonly, users configure the `WebSocketChannelizer` to enable the websocket protocol to which
+the various language drivers can connect.
+
+TinkerPop 3.5.0 introduces a new `Channelizer` implementation called the `UnifiedChannelizer`. This channelizer is
+somewhat similar to the `WsAndHttpChannelizer` in that it combines websocket and standard HTTP protocols in the server,
+but it provides a new and improved thread management approach as well as a more streamlined execution model. The
+`UnifiedChannelizer` technically replaces all existing implementations, but is not yet configured by default in Gremlin
+Server. To use it, modify the `channelizer` setting in the server yaml file as follows:
+
+[source,yaml]
+----
+channelizer: org.apache.tinkerpop.gremlin.server.channel.UnifiedChannelizer
+----
+
+The `UnifiedChannelizer` is expected to become the default implementation once it has been tested further. Even now,
+it may be the preferred channelizer when using large numbers of short-lived sessions, as its threading model is better
+suited to such workloads. If using this new channelizer, there are a few considerations to keep in mind:
+
+* The `UnifiedChannelizer` does not use the `OpProcessor` infrastructure, therefore those
+link:https://tinkerpop.apache.org/docs/3.5.0/reference/#opprocessor-configurations[configurations] are no longer
+relevant and can be ignored.
+
+* It is important to read about the `gremlinPool` setting in the
+link:https://tinkerpop.apache.org/docs/3.5.0/reference/#_tuning[Tuning Section] of the reference documentation and to
+look into the link:https://tinkerpop.apache.org/docs/3.5.0/reference/#_configuring_2[new configurations] available for
+this channelizer: `maxParameters`, `sessionLifetimeTimeout`, `useGlobalFunctionCacheForSessions`, and
+`useCommonEngineForSessions`.
+
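+As a rough starting point, the fragment below enables the `UnifiedChannelizer` and sets the `gremlinPool` size
+explicitly, since under this channelizer that pool services both sessionless requests and sessions. The `gremlinPool`
+and `evaluationTimeout` values shown are illustrative assumptions rather than recommendations and should be derived
+from testing with the expected workload. `sessionLifetimeTimeout` is shown at its default.
+
+[source,yaml]
+----
+channelizer: org.apache.tinkerpop.gremlin.server.channel.UnifiedChannelizer
+# shared by sessionless requests and sessions under this channelizer (illustrative size; 0 means use available processors)
+gremlinPool: 16
+# global timeout in milliseconds (illustrative); longer per-request timeouts can be used for known long-running requests
+evaluationTimeout: 30000
+# upper bound in milliseconds on the life of any session (the default of 10 minutes)
+sessionLifetimeTimeout: 600000
+----
+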
+See: link:https://issues.apache.org/jira/browse/TINKERPOP-2245[TINKERPOP-2245]
+
 ==== Retry Conditions
 
 Some error conditions are temporary in nature and therefore an operation that ends in such a situation may be tried