You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tinkerpop.apache.org by GitBox <gi...@apache.org> on 2022/01/05 16:28:55 UTC

[GitHub] [tinkerpop] tkolanko opened a new pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

tkolanko opened a new pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539


   This PR implements a change to javascript driver for allowing an optional callback which will be run with the result set of each chunk of data returned from the gremlin server rather than waiting for the entire query to finish.
   
   docker/build.sh -t -i -n passes all tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] tkolanko commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
tkolanko commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r818658387



##########
File path: gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/client.js
##########
@@ -73,41 +72,118 @@ class Client {
     return this._connection.isOpen;
   }
 
+  /**
+   * Configuration specific to the current request.
+   * @typedef {Object} RequestOptions
+   * @property {String} requestId - User specified request identifier which must be a UUID.
+   * @property {Number} batchSize - Indicates whether the Power component is present.
+   * @property {String} userAgent - The size in which the result of a request is to be "batched" back to the client
+   * @property {Number} evaluationTimeout - The timeout for the evaluation of the request.
+   */
+
   /**
    * Send a request to the Gremlin Server, can send a script or bytecode steps.
    * @param {Bytecode|string} message The bytecode or script to send
    * @param {Object} [bindings] The script bindings, if any.
-   * @param {Object} [requestOptions] Configuration specific to the current request.
-   * @param {String} [requestOptions.requestId] User specified request identifier which must be a UUID.
-   * @param {Number} [requestOptions.batchSize] The size in which the result of a request is to be "batched" back to the client
-   * @param {String} [requestOptions.userAgent] A custom string that specifies to the server where the request came from.
-   * @param {Number} [requestOptions.evaluationTimeout] The timeout for the evaluation of the request.
+   * @param {RequestOptions} [requestOptions] Configuration specific to the current request.
    * @returns {Promise}
    */
   submit(message, bindings, requestOptions) {
-    const requestIdOverride = requestOptions && requestOptions.requestId
-    if (requestIdOverride) delete requestOptions['requestId'];
+    const requestIdOverride = requestOptions && requestOptions.requestId;
+    if (requestIdOverride) delete requestOptions["requestId"];
+
+    const args = Object.assign(
+      {
+        gremlin: message,
+        aliases: { g: this._options.traversalSource || "g" },
+      },
+      requestOptions
+    );
+
+    if (this._options.session && this._options.processor === "session") {
+      args["session"] = this._options.session;
+    }
+
+    if (message instanceof Bytecode) {
+      if (this._options.session && this._options.processor === "session") {
+        return this._connection.submit(
+          "session",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
+      } else {
+        return this._connection.submit(
+          "traversal",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
+      }
+    } else if (typeof message === "string") {
+      args["bindings"] = bindings;
+      args["language"] = "gremlin-groovy";
+      args["accept"] = this._connection.mimeType;
+      return this._connection.submit(
+        this._options.processor || "",
+        "eval",
+        args,
+        requestIdOverride
+      );
+    } else {
+      throw new TypeError("message must be of type Bytecode or string");
+    }
+  }
+
+  /**
+   * Send a request to the Gremlin Server and receive a stream for the results, can send a script or bytecode steps.
+   * @param {Bytecode|string} message The bytecode or script to send
+   * @param {Object} [bindings] The script bindings, if any.
+   * @param {RequestOptions} [requestOptions] Configuration specific to the current request.
+   * @returns {ReadableStream}
+   */
+  stream(message, bindings, requestOptions) {
+    const requestIdOverride = requestOptions && requestOptions.requestId;
+    if (requestIdOverride) delete requestOptions["requestId"];
 
-    const args = Object.assign({
-      gremlin: message,
-      aliases: { 'g': this._options.traversalSource || 'g' }
-    }, requestOptions)
+    const args = Object.assign(
+      {
+        gremlin: message,
+        aliases: { g: this._options.traversalSource || "g" },
+      },
+      requestOptions
+    );
 
-    if (this._options.session && this._options.processor === 'session') {
-      args['session'] = this._options.session;
+    if (this._options.session && this._options.processor === "session") {
+      args["session"] = this._options.session;
     }
 
     if (message instanceof Bytecode) {
-      if (this._options.session && this._options.processor === 'session') {
-        return this._connection.submit('session', 'bytecode', args, requestIdOverride);
+      if (this._options.session && this._options.processor === "session") {
+        return this._connection.stream(
+          "session",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
       } else {
-        return this._connection.submit('traversal', 'bytecode', args, requestIdOverride);
+        return this._connection.stream(
+          "traversal",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
       }
-    } else if (typeof message === 'string') {
-      args['bindings'] = bindings;
-      args['language'] = 'gremlin-groovy';
-      args['accept'] = this._connection.mimeType;
-      return this._connection.submit(this._options.processor || '','eval', args, requestIdOverride);
+    } else if (typeof message === "string") {
+      args["bindings"] = bindings;
+      args["language"] = "gremlin-groovy";
+      args["accept"] = this._connection.mimeType;
+      return this._connection.stream(
+        this._options.processor || "",
+        "eval",
+        args,
+        requestIdOverride
+      );

Review comment:
       That's what I originally did in https://github.com/apache/tinkerpop/pull/1539/commits/fe1e468945ea3a36c53c2b452627bfffd20e0239 but it ended up breaking all the client tests that overwrote the internal connection object and I couldn't figure out why. I'll take another look to see if I can get the tests to pass with that




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r787809380



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       I think from the user POV, having a thing (`ResultSet`, `Stream`, ...) that represents the result of the execution is more familiar than having 1 per "batch". The batch is a low level thing.
   
   There are several db client libraries that use this approach (blocking iterators in other languages, async iterators in node.js, streams or callbacks). I [worked on the paging api for Cassandra](https://github.com/datastax/nodejs-driver/tree/master/doc/features/paging) that uses several of these approaches (see [here for example](https://github.com/datastax/nodejs-driver#row-streaming-and-pipes)), but these are popular patterns across node.js db client libraries (see [mysql](https://github.com/mysqljs/mysql#streaming-query-rows))
   
   I think that we should return an object that fires events or a stream of result items, we should clearly signal when all the data has been received or when there was an error. If we use callbacks, I think we shouldn't mix it with promises to get the error.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r787798256



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       > we don't operate on traversals
   
   I used the variable name `traversal` for the string script containing the traversal, I edited the code example for clarity.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r791557537



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       Nice!
   I'll try to review it this week 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] tkolanko commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
tkolanko commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r812367486



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       Some of the existing tests don't like the changes I made to reduce duplicate code from the original submit and the new stream method. I'll back out of those and leave the duplicated code for now; it's not that much. If we want to have this fixed up for 3.5.3 I can take a crack at it this weekend.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r787817459



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       I think an async iterator is harder to implement in this case given that the tinkerpop protocol doesn't allow to fetch the next page (it will continue to serve the next "partial content").
   
   I think an event emitter (emitter of result items) or a stream fit better in this case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#issuecomment-1060337675


   I'll leave it open for a few more hours, to give the opportunity for other committers to chime in.
   
   If there aren't any remarks by then, I'll merge it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] tkolanko commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
tkolanko commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r818659117



##########
File path: gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/client.js
##########
@@ -46,13 +45,13 @@ class Client {
    */
   constructor(url, options = {}) {
     this._options = options;
-    if (this._options.processor === 'session') {
+    if (this._options.processor === "session") {

Review comment:
       Yeah, it's most likely my editor. I'll get these fixed up
   
   Would you be willing to accept a seperate pr that enabled eslint with airbnb config + prettier and just accept it's formatting?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] tkolanko commented on pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
tkolanko commented on pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#issuecomment-1058278224


   @spmallette @jorgebay Here's where we are with this PR:
   
   1. I've fixed up the formatting issues and will create a new JIRA ticket to implement a standard for linting the JS files
   2. The tests are passing, however, there is a duplicated block of code that should be extracted out. The current client tests are overriding an internal object and they did not play nice with having that block of code extracted out. I need more time to figure out why that is. We can either hold off on this PR and fixed that up or we can merge this PR in and I can create a JIRA ticket to figure out a better way to remove the duplicated block of code


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#issuecomment-1006206668


   Thanks for submitting this - it may have to wait for 3.5.3 though as we're getting ready for release. Is that ok for you?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] tkolanko commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
tkolanko commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r812367486



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       Some of the existing tests don't like the changes I made to reduce duplicate code the original submit and the new stream method. I'll back out of those and leave the duplicated code for now; it's not that much. If we want to have this fixed up for 3.5.3 I can take a crack at it this weekend.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] tkolanko commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
tkolanko commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r787801589



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       ah ok, let me take a look at how complex it would be to add an async iterator to websocket messages. Through a quick google search there are [some](https://github.com/alanshaw/it-ws) [libraries](https://github.com/alanshaw/it-ws) but they don't seem to be maintained




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] tkolanko commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
tkolanko commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r787791909



##########
File path: gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/connection.js
##########
@@ -290,13 +296,31 @@ class Connection extends EventEmitter {
     }
     switch (response.status.code) {
       case responseStatusCode.noContent:
+        if (this._onDataMessageHandlers[response.requestId]) {
+          this._onDataMessageHandlers[response.requestId](
+            new ResultSet(utils.emptyArray, response.status.attributes)
+          );
+        }
         this._clearHandler(response.requestId);
         return handler.callback(null, new ResultSet(utils.emptyArray, response.status.attributes));
       case responseStatusCode.partialContent:
-        handler.result = handler.result || [];
-        handler.result.push.apply(handler.result, response.result.data);
+        if (this._onDataMessageHandlers[response.requestId]) {
+          this._onDataMessageHandlers[response.requestId](
+            new ResultSet(response.result.data, response.status.attributes)

Review comment:
       That would mean different parsing would have to take place between partial content and final parsing/if the total results were less than the batch size.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] tkolanko commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
tkolanko commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r790838160



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       @jorgebay I implemented a new `client.stream` which returns a readable stream. Can you take a look and let me know what you think.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r818681184



##########
File path: gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/client.js
##########
@@ -73,41 +72,118 @@ class Client {
     return this._connection.isOpen;
   }
 
+  /**
+   * Configuration specific to the current request.
+   * @typedef {Object} RequestOptions
+   * @property {String} requestId - User specified request identifier which must be a UUID.
+   * @property {Number} batchSize - Indicates whether the Power component is present.
+   * @property {String} userAgent - The size in which the result of a request is to be "batched" back to the client
+   * @property {Number} evaluationTimeout - The timeout for the evaluation of the request.
+   */
+
   /**
    * Send a request to the Gremlin Server, can send a script or bytecode steps.
    * @param {Bytecode|string} message The bytecode or script to send
    * @param {Object} [bindings] The script bindings, if any.
-   * @param {Object} [requestOptions] Configuration specific to the current request.
-   * @param {String} [requestOptions.requestId] User specified request identifier which must be a UUID.
-   * @param {Number} [requestOptions.batchSize] The size in which the result of a request is to be "batched" back to the client
-   * @param {String} [requestOptions.userAgent] A custom string that specifies to the server where the request came from.
-   * @param {Number} [requestOptions.evaluationTimeout] The timeout for the evaluation of the request.
+   * @param {RequestOptions} [requestOptions] Configuration specific to the current request.
    * @returns {Promise}
    */
   submit(message, bindings, requestOptions) {
-    const requestIdOverride = requestOptions && requestOptions.requestId
-    if (requestIdOverride) delete requestOptions['requestId'];
+    const requestIdOverride = requestOptions && requestOptions.requestId;
+    if (requestIdOverride) delete requestOptions["requestId"];
+
+    const args = Object.assign(
+      {
+        gremlin: message,
+        aliases: { g: this._options.traversalSource || "g" },
+      },
+      requestOptions
+    );
+
+    if (this._options.session && this._options.processor === "session") {
+      args["session"] = this._options.session;
+    }
+
+    if (message instanceof Bytecode) {
+      if (this._options.session && this._options.processor === "session") {
+        return this._connection.submit(
+          "session",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
+      } else {
+        return this._connection.submit(
+          "traversal",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
+      }
+    } else if (typeof message === "string") {
+      args["bindings"] = bindings;
+      args["language"] = "gremlin-groovy";
+      args["accept"] = this._connection.mimeType;
+      return this._connection.submit(
+        this._options.processor || "",
+        "eval",
+        args,
+        requestIdOverride
+      );
+    } else {
+      throw new TypeError("message must be of type Bytecode or string");
+    }
+  }
+
+  /**
+   * Send a request to the Gremlin Server and receive a stream for the results, can send a script or bytecode steps.
+   * @param {Bytecode|string} message The bytecode or script to send
+   * @param {Object} [bindings] The script bindings, if any.
+   * @param {RequestOptions} [requestOptions] Configuration specific to the current request.
+   * @returns {ReadableStream}
+   */
+  stream(message, bindings, requestOptions) {
+    const requestIdOverride = requestOptions && requestOptions.requestId;
+    if (requestIdOverride) delete requestOptions["requestId"];
 
-    const args = Object.assign({
-      gremlin: message,
-      aliases: { 'g': this._options.traversalSource || 'g' }
-    }, requestOptions)
+    const args = Object.assign(
+      {
+        gremlin: message,
+        aliases: { g: this._options.traversalSource || "g" },
+      },
+      requestOptions
+    );
 
-    if (this._options.session && this._options.processor === 'session') {
-      args['session'] = this._options.session;
+    if (this._options.session && this._options.processor === "session") {
+      args["session"] = this._options.session;
     }
 
     if (message instanceof Bytecode) {
-      if (this._options.session && this._options.processor === 'session') {
-        return this._connection.submit('session', 'bytecode', args, requestIdOverride);
+      if (this._options.session && this._options.processor === "session") {
+        return this._connection.stream(
+          "session",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
       } else {
-        return this._connection.submit('traversal', 'bytecode', args, requestIdOverride);
+        return this._connection.stream(
+          "traversal",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
       }
-    } else if (typeof message === 'string') {
-      args['bindings'] = bindings;
-      args['language'] = 'gremlin-groovy';
-      args['accept'] = this._connection.mimeType;
-      return this._connection.submit(this._options.processor || '','eval', args, requestIdOverride);
+    } else if (typeof message === "string") {
+      args["bindings"] = bindings;
+      args["language"] = "gremlin-groovy";
+      args["accept"] = this._connection.mimeType;
+      return this._connection.stream(
+        this._options.processor || "",
+        "eval",
+        args,
+        requestIdOverride
+      );

Review comment:
       It's just for maintainability but if you see that's too much work, we can move forward without extracting a common method.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay merged pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay merged pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] tkolanko commented on pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
tkolanko commented on pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#issuecomment-1057472443


   > @tkolanko i just thought i'd follow up on this PR which seems to pass CI builds now. is this done from your perspective?
   
   Yeah, good on my side if @jorgebay is good with it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r818437551



##########
File path: gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/client.js
##########
@@ -46,13 +45,13 @@ class Client {
    */
   constructor(url, options = {}) {
     this._options = options;
-    if (this._options.processor === 'session') {
+    if (this._options.processor === "session") {

Review comment:
       We don't have a linter for the project but we should follow the rest of the other files as conventions.
   
   There are several line changes related to strings and overriding existing formatting, it's likely that it's your IDE but in any case, we should avoid them to avoid history noise.
   
   In general, you can check out airbnb's style guide: https://github.com/airbnb/javascript#strings
   
   Would it be possible to revert the unnecessary style changes?

##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1733,6 +1733,59 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can return a readable stream instead of waiting for the entire result set to be loaded.
+
+[source,javascript]
+----
+
+const readable =  client.stream("g.V()", {}, { batchSize: 25 });
+
+readable.on('data', (data) => {
+  console.log(data.toArray()); // 25 vertices
+})
+
+readable.on('error', (error) => {
+  console.log(error); // errors returned from gremlin server
+})
+
+readable.on('end', () => {
+  console.log('query complete'); // when the end event is received then all the results have been processed
+})
+----
+
+If you are using NodeJS >= 10.0, you can asynchronously iterate readable streams:
+
+
+[source,javascript]
+----
+
+const readable = client.stream("g.V()", {}, { batchSize: 25 });
+
+try {
+  for await (const result of readable) {
+      console.log('data', result.toArray()); // 25 vertices

Review comment:
       NIT: spacing :)

##########
File path: gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/client.js
##########
@@ -73,41 +72,118 @@ class Client {
     return this._connection.isOpen;
   }
 
+  /**
+   * Configuration specific to the current request.
+   * @typedef {Object} RequestOptions
+   * @property {String} requestId - User specified request identifier which must be a UUID.
+   * @property {Number} batchSize - Indicates whether the Power component is present.
+   * @property {String} userAgent - The size in which the result of a request is to be "batched" back to the client
+   * @property {Number} evaluationTimeout - The timeout for the evaluation of the request.
+   */
+
   /**
    * Send a request to the Gremlin Server, can send a script or bytecode steps.
    * @param {Bytecode|string} message The bytecode or script to send
    * @param {Object} [bindings] The script bindings, if any.
-   * @param {Object} [requestOptions] Configuration specific to the current request.
-   * @param {String} [requestOptions.requestId] User specified request identifier which must be a UUID.
-   * @param {Number} [requestOptions.batchSize] The size in which the result of a request is to be "batched" back to the client
-   * @param {String} [requestOptions.userAgent] A custom string that specifies to the server where the request came from.
-   * @param {Number} [requestOptions.evaluationTimeout] The timeout for the evaluation of the request.
+   * @param {RequestOptions} [requestOptions] Configuration specific to the current request.
    * @returns {Promise}
    */
   submit(message, bindings, requestOptions) {
-    const requestIdOverride = requestOptions && requestOptions.requestId
-    if (requestIdOverride) delete requestOptions['requestId'];
+    const requestIdOverride = requestOptions && requestOptions.requestId;
+    if (requestIdOverride) delete requestOptions["requestId"];
+
+    const args = Object.assign(
+      {
+        gremlin: message,
+        aliases: { g: this._options.traversalSource || "g" },
+      },
+      requestOptions
+    );
+
+    if (this._options.session && this._options.processor === "session") {
+      args["session"] = this._options.session;
+    }
+
+    if (message instanceof Bytecode) {
+      if (this._options.session && this._options.processor === "session") {
+        return this._connection.submit(
+          "session",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
+      } else {
+        return this._connection.submit(
+          "traversal",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
+      }
+    } else if (typeof message === "string") {
+      args["bindings"] = bindings;
+      args["language"] = "gremlin-groovy";
+      args["accept"] = this._connection.mimeType;
+      return this._connection.submit(
+        this._options.processor || "",
+        "eval",
+        args,
+        requestIdOverride
+      );
+    } else {
+      throw new TypeError("message must be of type Bytecode or string");
+    }
+  }
+
+  /**
+   * Send a request to the Gremlin Server and receive a stream for the results, can send a script or bytecode steps.
+   * @param {Bytecode|string} message The bytecode or script to send
+   * @param {Object} [bindings] The script bindings, if any.
+   * @param {RequestOptions} [requestOptions] Configuration specific to the current request.
+   * @returns {ReadableStream}
+   */
+  stream(message, bindings, requestOptions) {
+    const requestIdOverride = requestOptions && requestOptions.requestId;
+    if (requestIdOverride) delete requestOptions["requestId"];
 
-    const args = Object.assign({
-      gremlin: message,
-      aliases: { 'g': this._options.traversalSource || 'g' }
-    }, requestOptions)
+    const args = Object.assign(
+      {
+        gremlin: message,
+        aliases: { g: this._options.traversalSource || "g" },
+      },
+      requestOptions
+    );
 
-    if (this._options.session && this._options.processor === 'session') {
-      args['session'] = this._options.session;
+    if (this._options.session && this._options.processor === "session") {
+      args["session"] = this._options.session;
     }
 
     if (message instanceof Bytecode) {
-      if (this._options.session && this._options.processor === 'session') {
-        return this._connection.submit('session', 'bytecode', args, requestIdOverride);
+      if (this._options.session && this._options.processor === "session") {
+        return this._connection.stream(
+          "session",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
       } else {
-        return this._connection.submit('traversal', 'bytecode', args, requestIdOverride);
+        return this._connection.stream(
+          "traversal",
+          "bytecode",
+          args,
+          requestIdOverride
+        );
       }
-    } else if (typeof message === 'string') {
-      args['bindings'] = bindings;
-      args['language'] = 'gremlin-groovy';
-      args['accept'] = this._connection.mimeType;
-      return this._connection.submit(this._options.processor || '','eval', args, requestIdOverride);
+    } else if (typeof message === "string") {
+      args["bindings"] = bindings;
+      args["language"] = "gremlin-groovy";
+      args["accept"] = this._connection.mimeType;
+      return this._connection.stream(
+        this._options.processor || "",
+        "eval",
+        args,
+        requestIdOverride
+      );

Review comment:
       this code block is identical to the previous one, except from the underlying method that it's called.
   
   Would it be possible to extract to something like `executeOnConnection()` or something like that, where the connection method is a parameter as well?
   
   For example, it will have:
   ```javascript
   return this._connection.[methodName](/* ... */)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r818437551



##########
File path: gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/client.js
##########
@@ -46,13 +45,13 @@ class Client {
    */
   constructor(url, options = {}) {
     this._options = options;
-    if (this._options.processor === 'session') {
+    if (this._options.processor === "session") {

Review comment:
       We don't have a linter for the project but we should follow the rest of the other files as conventions.
   
   There are several line changes related to strings and overriding existing formatting, it's likely that it's your IDE but in any case, we should avoid them to reduce history noise.
   
   In general, you can check out airbnb's style guide: https://github.com/airbnb/javascript#strings
   
   Would it be possible to revert the unnecessary style changes?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] tkolanko commented on pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
tkolanko commented on pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#issuecomment-1058046283


   @spmallette I think this will need to wait until after 3.5.3 so I can make the changes @jorgebay requested


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r787504326



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       I think mixing promises and callbacks in the same API can be confusing and prone to misuse.
   
   I think we should expose a different method for "streaming" or grouping into smaller sets of results, for example with async iterables:
   
   ```javascript
   const stream = client.stream(traversal);
   for await (const item of stream) {
     statement
   }
   ```
   
   or regular callbacks
   ```javascript
   client.forEach(traversal, item => {
    // called for each item
   }, err => {
     // called at the end or when there's an error
   });
   ```

##########
File path: gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/connection.js
##########
@@ -290,13 +296,31 @@ class Connection extends EventEmitter {
     }
     switch (response.status.code) {
       case responseStatusCode.noContent:
+        if (this._onDataMessageHandlers[response.requestId]) {
+          this._onDataMessageHandlers[response.requestId](
+            new ResultSet(utils.emptyArray, response.status.attributes)
+          );
+        }
         this._clearHandler(response.requestId);
         return handler.callback(null, new ResultSet(utils.emptyArray, response.status.attributes));
       case responseStatusCode.partialContent:
-        handler.result = handler.result || [];
-        handler.result.push.apply(handler.result, response.result.data);
+        if (this._onDataMessageHandlers[response.requestId]) {
+          this._onDataMessageHandlers[response.requestId](
+            new ResultSet(response.result.data, response.status.attributes)

Review comment:
       Maybe instead of having 4 resultsets, the user wants to access each individual item.

##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 

Review comment:
       Nice way to introduce the change 👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] tkolanko commented on pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
tkolanko commented on pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#issuecomment-1006572785


   @spmallette that works just fine - thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
spmallette commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r812361577



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       we're getting a bit closer to 3.5.3 code freeze i think. i was wondering if there was anything left to do on this PR to have it ready for merge. cc/ @jorgebay 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#issuecomment-1055330487


   @tkolanko i just thought i'd follow up on this PR which seems to pass CI builds now. is this done from your perspective?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r791557537



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       Nice!
   I'll try to review it this week 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#issuecomment-1060815702


   Landed in https://github.com/apache/tinkerpop/commit/14b1fe003f41ef464090506f9d6e89f58a3f53c9 on `3.5-dev`
   
   Thanks @tkolanko !!!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r818682368



##########
File path: gremlin-javascript/src/main/javascript/gremlin-javascript/lib/driver/client.js
##########
@@ -46,13 +45,13 @@ class Client {
    */
   constructor(url, options = {}) {
     this._options = options;
-    if (this._options.processor === 'session') {
+    if (this._options.processor === "session") {

Review comment:
       Sure, we feel free to create a JIRA ticket.
   My main concern is flattening the history too much, so we can start with eslint w/ few presets, it's fine by me.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] tkolanko commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
tkolanko commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r787792222



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       Our use case might be different, we don't operate on traverals but through submitting scripts. We have a front end user interface where users can enter queries with auto complete, intellisense etc etc along with some options. The frontend POSTS the query and options to our backend which submits the script, parses the result set and returns a result to the frontend.
   
   
   I went with this approach so we could so something like
   ```js
   import {parseResultSet} from './parser';
   
   ...code to handle parsing the POST request and extracting what we need...
   const output = {...} // object that holds our parsed data
   try {
     await client.submit(query, bindings, options, (data) => parseResultSet(data, output))
   } catch(e) {
     ...additional error handling/clean up...
    throw new Error(e);
   }
   return res.json(output))
   ```
   
   This seemed like an easier opt in path than trying to handle async iteration on websocket messages
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] jorgebay commented on a change in pull request #1539: TINKERPOP-2679 update javascript driver to support stream processing

Posted by GitBox <gi...@apache.org>.
jorgebay commented on a change in pull request #1539:
URL: https://github.com/apache/tinkerpop/pull/1539#discussion_r787504326



##########
File path: docs/src/reference/gremlin-variants.asciidoc
##########
@@ -1721,6 +1721,32 @@ IMPORTANT: The preferred method for setting a per-request timeout for scripts is
 with bytecode may try `g.with(EVALUATION_TIMEOUT, 500)` within a script. Scripts with multiple traversals and multiple
 timeouts will be interpreted as a sum of all timeouts identified in the script for that request.
 
+
+==== Processing results as they are returned from the Gremlin server
+
+
+The Gremlin JavaScript driver maintains a WebSocket connection to the Gremlin server and receives messages according to the `batchSize` parameter on the per request settings or the `resultIterationBatchSize` value configured for the Gremlin server. When submitting scripts the default behavior is to wait for the entire result set to be returned from a query before allowing any processing on the result set. 
+
+The following examples assume that you have 100 vertices in your graph.
+
+[source,javascript]
+----
+const result = await client.submit("g.V()");
+console.log(result.toArray()); // 100 - all the vertices in your graph
+----
+
+When working with larger result sets it may be beneficial for memory management to process each chunk of data as it is returned from the gremlin server. The Gremlin JavaScript driver can accept an optional callback to run on each chunk of data returned.
+
+[source,javascript]
+----
+
+await client.submit("g.V()", {}, { batchSize: 25 }, (data) => {

Review comment:
       I think mixing promises and callbacks in the same API can be confusing and prone to misuse.
   
   I think we should expose a different method for "streaming" or grouping into smaller sets of results, for example with async iterables:
   
   ```javascript
   const stream = client.stream('g.V()');
   for await (const item of stream) {
     statement
   }
   ```
   
   or regular callbacks
   ```javascript
   client.forEach('g.V()', item => {
    // called for each item
   }, err => {
     // called at the end or when there's an error
   });
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org