Posted to reviews@spark.apache.org by "rangadi (via GitHub)" <gi...@apache.org> on 2023/03/10 21:40:13 UTC

[GitHub] [spark] rangadi opened a new pull request, #40373: [Draft] Streaming Spark Connect POC

rangadi opened a new pull request, #40373:
URL: https://github.com/apache/spark/pull/40373

   [This is not meant to be merged; it is a preliminary POC for streaming support in Spark Connect.]
   
   This includes basic functionality to run streaming queries over Spark Connect.
   The expectation is 1:1 parity with the standard streaming API.
   
   How to try it in local mode (`./bin/pyspark --remote "local[*]"`):
   ```
   >>> 
   >>> query = ( 
   ...   spark
   ...     .readStream
   ...     .format("rate")
   ...     .option("numPartitions", "1")
   ...     .load()
   ...     .writeStream
   ...     .format("memory")
   ...     .queryName("rate_table")
   ...     .start()
   ... )
   >>> query.isActive
   True
   >>> query.status
   {'message': 'Waiting for data to arrive', 'isDataAvailable': False, 'isTriggerActive': False}
   >>> query.lastProgress
   {'id': 'c962c6f4-7fcf-494a-a16d-ec107701fe66', 'runId': '3f1b4c3d-648c-4da3-abe6-69b5b683d7ad', 'name': 'rate_table', 'timestamp': '2023-03-10T21:33:12.424Z', 'batchId': 19, 'numInputRows': 1, 'inputRowsPerSecond': 76.92307692307692, 'processedRowsPerSecond': 6.25, 'durationMs': {'addBatch': 11, 'commitOffsets': 73, 'getBatch': 0, 'latestOffset': 0, 'queryPlanning': 1, 'triggerExecution': 160, 'walCommit': 75}, 'stateOperators': [], 'sources': [{'description': 'RateStreamV2[rowsPerSecond=1, rampUpTimeSeconds=0, numPartitions=1', 'startOffset': 18, 'endOffset': 19, 'latestOffset': 19, 'numInputRows': 1, 'inputRowsPerSecond': 76.92307692307692, 'processedRowsPerSecond': 6.25}], 'sink': {'description': 'MemorySink', 'numOutputRows': 1}}
   
   >>> query.stop()
   ```
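
The `lastProgress` value in the session above prints as an ordinary Python dict, so it can presumably be inspected on the client without any Spark calls. A minimal sketch under that assumption: the dict below is an abridged copy of the output shown above, `slowest_phase` is a hypothetical helper name (not part of the PR or of PySpark), and `triggerExecution` is treated as the overall batch time rather than as one of the phases.

```python
# Abridged copy of the `query.lastProgress` output from the session above.
progress = {
    "name": "rate_table",
    "batchId": 19,
    "numInputRows": 1,
    "durationMs": {
        "addBatch": 11,
        "commitOffsets": 73,
        "getBatch": 0,
        "latestOffset": 0,
        "queryPlanning": 1,
        "triggerExecution": 160,
        "walCommit": 75,
    },
}


def slowest_phase(progress):
    """Return (phase, ms) for the longest phase of a micro-batch.

    Hypothetical helper: `triggerExecution` is skipped because it is
    assumed to be the overall trigger time, not an individual phase.
    """
    phases = {
        name: ms
        for name, ms in progress["durationMs"].items()
        if name != "triggerExecution"
    }
    name = max(phases, key=phases.get)
    return name, phases[name]


phase, ms = slowest_phase(progress)
total = progress["durationMs"]["triggerExecution"]
print(f"batch {progress['batchId']}: {phase} took {ms} of {total} ms")
```

For the batch shown, this points at the offset log write (`walCommit`) and `commitOffsets` as the bulk of the trigger time, which is plausible for a rate source emitting one row per second.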


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rangadi closed pull request #40373: [Draft] Streaming Spark Connect POC

Posted by "rangadi (via GitHub)" <gi...@apache.org>.
rangadi closed pull request #40373: [Draft] Streaming Spark Connect POC
URL: https://github.com/apache/spark/pull/40373



[GitHub] [spark] rangadi commented on pull request #40373: [Draft] Streaming Spark Connect POC

Posted by "rangadi (via GitHub)" <gi...@apache.org>.
rangadi commented on PR #40373:
URL: https://github.com/apache/spark/pull/40373#issuecomment-1464526547

   cc: @HeartSaVioR 

