Posted to commits@tinkerpop.apache.org by GitBox <gi...@apache.org> on 2022/01/19 09:07:17 UTC

[GitHub] [tinkerpop] jorgebay commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r787504326



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       I think mixing promises and callbacks in the same API can be confusing and prone to misuse.
   
   I think we should expose a different method for "streaming" or grouping into smaller sets of results, for example with async iterables:
   
   ```javascript
   const stream = client.stream(traversal);
   for await (const item of stream) {
     // process each item as it arrives
   }
   ```
   
   or regular callbacks:
   ```javascript
   client.forEach(traversal, item => {
     // called for each item
   }, err => {
     // called at the end or when there's an error
   });
   ```
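
   To make the async-iterable idea concrete, here is a minimal sketch of how batches could be flattened into a per-item stream. It does not use the driver at all: `fakeBatches`, `streamItems` and `countItems` are hypothetical names invented for illustration, and `fakeBatches` merely stands in for the server sending 100 results in batches of 25. A real `client.stream` does not exist yet; it is only the method proposed above.

   ```javascript
   // Hypothetical stand-in for the server: yields `total` results in
   // batches of at most `batchSize` items (not part of the driver).
   async function* fakeBatches(total, batchSize) {
     for (let start = 0; start < total; start += batchSize) {
       const batch = [];
       const end = Math.min(start + batchSize, total);
       for (let i = start; i < end; i++) {
         batch.push({ id: i });
       }
       yield batch;
     }
   }

   // Flattens an async iterable of batches into an async iterable of items,
   // so the consumer never has to deal with batch boundaries.
   async function* streamItems(batches) {
     for await (const batch of batches) {
       yield* batch;
     }
   }

   // Example consumer: counts the items one at a time with `for await`.
   async function countItems() {
     let count = 0;
     for await (const item of streamItems(fakeBatches(100, 25))) {
       count += 1;
     }
     return count;
   }
   ```

   With this shape the caller only ever holds one batch in memory, and errors surface as a rejected promise from the `for await` loop rather than through a second callback.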

##########
File path: gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/connection.js
##########
@@ -290,13 +296,31 @@ class Connection extends EventEmitter {
     }
     switch (response.status.code) {
       case responseStatusCode.noContent:
+        if (this._onDataMessageHandlers[response.requestId]) {
+          this._onDataMessageHandlers[response.requestId](
+            new ResultSet(utils.emptyArray, response.status.attributes)
+          );
+        }
         this._clearHandler(response.requestId);
         return handler.callback(null, new ResultSet(utils.emptyArray, response.status.attributes));
       case responseStatusCode.partialContent:
-        handler.result = handler.result || [];
-        handler.result.push.apply(handler.result, response.result.data);
+        if (this._onDataMessageHandlers[response.requestId]) {
+          this._onDataMessageHandlers[response.requestId](
+            new ResultSet(response.result.data, response.status.attributes)

Review comment:
       Maybe instead of receiving 4 separate `ResultSet` instances (one per batch), the user wants to access each individual item.
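
   As a rough sketch of what per-item delivery could look like on top of batch messages, assuming a hypothetical emitter that fires `data` (with an array of items), `end` and `error` events. `forEachItem` and the event names are invented for illustration and are not the driver's API:

   ```javascript
   const { EventEmitter } = require('events');

   // Unpacks each incoming batch and calls `onItem` once per item.
   // `onEnd` is called exactly once: with null on success, or with the error.
   function forEachItem(source, onItem, onEnd) {
     let finished = false;
     const finish = (err) => {
       if (finished) return;
       finished = true;
       onEnd(err);
     };
     source.on('data', (batch) => {
       if (finished) return;
       try {
         for (const item of batch) {
           onItem(item);
         }
       } catch (err) {
         // An exception thrown by the user's callback ends the iteration.
         finish(err);
       }
     });
     source.once('end', () => finish(null));
     source.once('error', (err) => finish(err));
   }

   // Usage with a plain EventEmitter standing in for the connection.
   const source = new EventEmitter();
   const seen = [];
   let endArg;
   forEachItem(source, (item) => seen.push(item), (err) => { endArg = err; });
   source.emit('data', [1, 2]);
   source.emit('data', [3]);
   source.emit('end');
   ```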

##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 

Review comment:
       Nice way to introduce the change 👍 