Posted to reviews@spark.apache.org by "rangadi (via GitHub)" <gi...@apache.org> on 2023/03/10 21:40:13 UTC
[GitHub] [spark] rangadi opened a new pull request, #40373: [Draft] Streaming Spark Connect POC
rangadi opened a new pull request, #40373:
URL: https://github.com/apache/spark/pull/40373
[This is not meant to be merged; it is a preliminary POC for streaming support in Spark Connect.]
This includes basic functionality to run streaming queries over Spark Connect.
The expectation is 1:1 parity with the standard streaming API.
How to try it in local mode (`./bin/pyspark --remote "local[*]"`):
```
>>> query = (
...     spark
...     .readStream
...     .format("rate")
...     .option("numPartitions", "1")
...     .load()
...     .writeStream
...     .format("memory")
...     .queryName("rate_table")
...     .start()
... )
>>> query.isActive
True
>>> query.status
{'message': 'Waiting for data to arrive', 'isDataAvailable': False, 'isTriggerActive': False}
>>> query.lastProgress
{'id': 'c962c6f4-7fcf-494a-a16d-ec107701fe66', 'runId': '3f1b4c3d-648c-4da3-abe6-69b5b683d7ad', 'name': 'rate_table', 'timestamp': '2023-03-10T21:33:12.424Z', 'batchId': 19, 'numInputRows': 1, 'inputRowsPerSecond': 76.92307692307692, 'processedRowsPerSecond': 6.25, 'durationMs': {'addBatch': 11, 'commitOffsets': 73, 'getBatch': 0, 'latestOffset': 0, 'queryPlanning': 1, 'triggerExecution': 160, 'walCommit': 75}, 'stateOperators': [], 'sources': [{'description': 'RateStreamV2[rowsPerSecond=1, rampUpTimeSeconds=0, numPartitions=1', 'startOffset': 18, 'endOffset': 19, 'latestOffset': 19, 'numInputRows': 1, 'inputRowsPerSecond': 76.92307692307692, 'processedRowsPerSecond': 6.25}], 'sink': {'description': 'MemorySink', 'numOutputRows': 1}}
>>> query.stop()
```
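The `lastProgress` output above is dense, so here is a minimal sketch of pulling a few fields out of it. It assumes the progress value is a plain dict shaped like the one printed above; `summarize_progress` is a hypothetical helper and not part of this PR or of the PySpark API. It also assumes `triggerExecution` is the total for the whole trigger, so it is left out when looking for the slowest phase.

```python
from typing import Any, Dict


def summarize_progress(progress: Dict[str, Any]) -> Dict[str, Any]:
    """Pick a few fields out of a dict shaped like `query.lastProgress`.

    Hypothetical helper for illustration only.
    """
    durations = progress.get("durationMs", {})
    # Assumption: triggerExecution is the whole-trigger total, not a single phase,
    # so it is excluded when searching for the slowest phase.
    phases = {k: v for k, v in durations.items() if k != "triggerExecution"}
    slowest = max(phases, key=phases.get) if phases else None
    return {
        "batchId": progress["batchId"],
        "numInputRows": progress["numInputRows"],
        "triggerMs": durations.get("triggerExecution"),
        "slowestPhase": slowest,
    }


# Values copied from the lastProgress output above, trimmed to the fields used here.
sample = {
    "batchId": 19,
    "numInputRows": 1,
    "durationMs": {
        "addBatch": 11, "commitOffsets": 73, "getBatch": 0, "latestOffset": 0,
        "queryPlanning": 1, "triggerExecution": 160, "walCommit": 75,
    },
}
summary = summarize_progress(sample)
print(summary)
```

In a live session the same helper could be called as `summarize_progress(query.lastProgress)`, provided the query has already reported at least one batch.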
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] rangadi closed pull request #40373: [Draft] Streaming Spark Connect POC
Posted by "rangadi (via GitHub)" <gi...@apache.org>.
rangadi closed pull request #40373: [Draft] Streaming Spark Connect POC
URL: https://github.com/apache/spark/pull/40373
[GitHub] [spark] rangadi commented on pull request #40373: [Draft] Streaming Spark Connect POC
Posted by "rangadi (via GitHub)" <gi...@apache.org>.
rangadi commented on PR #40373:
URL: https://github.com/apache/spark/pull/40373#issuecomment-1464526547
cc: @HeartSaVioR