Posted to reviews@spark.apache.org by "anishshri-db (via GitHub)" <gi...@apache.org> on 2023/07/20 20:56:16 UTC

[GitHub] [spark] anishshri-db opened a new pull request, #42098: [SPARK-44504][SS] Unload provider thereby forcing DB instance close and releasing resources on maintenance task error

anishshri-db opened a new pull request, #42098:
URL: https://github.com/apache/spark/pull/42098

   ### What changes were proposed in this pull request?
   Unload the provider on a maintenance task error, thereby forcing the DB instance to close and release its resources.
   
   ### Why are the changes needed?
   Without the close, the DB instance and its associated resources (memory, file descriptors, etc.) are left open, and the references to these objects are lost because `loadedProviders` is cleared.
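A minimal Scala sketch of the idea, assuming a simplified registry. `FakeProvider`, `ProviderRegistry`, `load`, `unload` and `onMaintenanceError` are hypothetical names for illustration (only `loadedProviders` comes from this description), and Spark's actual state store code may be structured differently. The point is that the error path closes each provider before dropping it from the map, rather than clearing the map and losing the only references to open DB handles.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a provider that owns a native DB instance.
class FakeProvider(val id: String) {
  @volatile var closed = false
  // In a real provider this would close the DB and free memory, file descriptors, etc.
  def close(): Unit = { closed = true }
}

object ProviderRegistry {
  // Hypothetical analogue of `loadedProviders`: provider id -> loaded provider.
  private val loadedProviders = mutable.HashMap.empty[String, FakeProvider]

  def load(id: String): FakeProvider = loadedProviders.synchronized {
    loadedProviders.getOrElseUpdate(id, new FakeProvider(id))
  }

  // Remove the provider from the map AND close it, so its resources are released.
  def unload(id: String): Unit = loadedProviders.synchronized {
    loadedProviders.remove(id).foreach(_.close())
  }

  // On a maintenance task error, unload every provider instead of only calling
  // loadedProviders.clear(), which would drop the last references to open DBs.
  // JVM monitors are reentrant, so calling unload() while holding the lock is safe.
  def onMaintenanceError(): Unit = loadedProviders.synchronized {
    loadedProviders.keys.toList.foreach(unload) // copy keys before mutating the map
  }

  def loadedCount: Int = loadedProviders.synchronized(loadedProviders.size)
}
```

With two providers loaded, calling `onMaintenanceError()` leaves the map empty and both providers closed, whereas a bare `clear()` would have emptied the map while leaving both still open.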
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Existing unit tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #42098: [SPARK-44504][SS] Unload provider thereby forcing DB instance close and releasing resources on maintenance task error

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #42098:
URL: https://github.com/apache/spark/pull/42098#issuecomment-1644974672

   Thanks! Merging to master/3.5.


[GitHub] [spark] HeartSaVioR commented on pull request #42098: [SPARK-44504][SS] Unload provider thereby forcing DB instance close and releasing resources on maintenance task error

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #42098:
URL: https://github.com/apache/spark/pull/42098#issuecomment-1644973541

   https://github.com/anishshri-db/spark/actions/runs/5617368113/job/15221357436
   This CI failure doesn't seem to be related. It was in [Run / Build modules: pyspark-pandas-slow-connect](https://github.com/anishshri-db/spark/actions/runs/5617368113/job/15221396305#logs).


[GitHub] [spark] anishshri-db commented on pull request #42098: [SPARK-44504][SS] Unload provider thereby forcing DB instance close and releasing resources on maintenance task error

Posted by "anishshri-db (via GitHub)" <gi...@apache.org>.
anishshri-db commented on PR #42098:
URL: https://github.com/apache/spark/pull/42098#issuecomment-1644845829

   > Looks like CI is failing. Could you please look into this? Feels like it might be related.
   
   Yeah, it looks like this exposes another race condition.
   
   In this case, the maintenance task calls `close` without holding any lock, so it goes ahead and clears `db` while the task for the partition might still be executing. That task then fails with an NPE when it tries to access `db` to read a DB property (`getDBProperty`, called from `metrics` during `commit`):
   
   ```
   org.apache.spark.sql.streaming.StreamingQueryException: [STREAM_FAILED] Query [id = 2fb452cb-0813-4e7f-8212-1ed26d4b9488, runId = 18c3030d-4767-4d6e-b349-7fa74c6867d1] terminated with exception: Job aborted due to stage failure: Task 0 in stage 295.0 failed 1 times, most recent failure: Lost task 0.0 in stage 295.0 (TID 868) (localhost executor driver): java.lang.NullPointerException
   	at org.apache.spark.sql.execution.streaming.state.RocksDB.getDBProperty(RocksDB.scala:575)
   	at org.apache.spark.sql.execution.streaming.state.RocksDB.metrics(RocksDB.scala:491)
   	at org.apache.spark.sql.execution.streaming.state.RocksDB.$anonfun$commit$12(RocksDB.scala:395)
   	at org.apache.spark.internal.Logging.logInfo(Logging.scala:60)
   	at org.apache.spark.internal.Logging.logInfo$(Logging.scala:59)
   ```
   
   I've updated the code so that `close` acquires and releases the DB instance lock.
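A minimal sketch of the fix in Scala, assuming a simplified store class. `GuardedDB`, its `handle` field and the `estimate-num-keys` entry are hypothetical, and a `java.util.concurrent.Semaphore` stands in for whatever lock mechanism Spark's `RocksDB` class actually uses behind `acquire`/`release`; the real implementation may differ. The key property is that `close` takes the same lock as a running task, so it can no longer clear the handle while the task is still reading from it.

```scala
import java.util.concurrent.Semaphore

// Hypothetical, simplified stand-in for a DB-backed state store instance.
// `handle` plays the role of the `db` reference that close() clears.
class GuardedDB {
  // One permit = one exclusive holder (a running task, or close()).
  // A Semaphore permit may be released by a different thread than the acquirer.
  private val instanceLock = new Semaphore(1)

  @volatile private var handle: java.util.HashMap[String, String] = {
    val m = new java.util.HashMap[String, String]()
    m.put("estimate-num-keys", "0") // hypothetical property entry
    m
  }

  def acquire(): Unit = instanceLock.acquire()
  def release(): Unit = instanceLock.release()

  // Read by the task while it holds the lock; would throw an NPE if handle were null.
  def getDBProperty(name: String): String = handle.get(name)

  // close() now takes the same lock, so it waits for an in-flight task to
  // release it before clearing the handle, instead of racing with that task.
  def close(): Unit = {
    acquire()
    try {
      handle = null
    } finally {
      release()
    }
  }

  def isClosed: Boolean = handle == null
}
```

In a quick check, a task thread that has called `acquire()` keeps a concurrent `close()` on another thread blocked, so the property read still succeeds; only after the task calls `release()` does `close()` proceed and clear the handle.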


[GitHub] [spark] HeartSaVioR closed pull request #42098: [SPARK-44504][SS] Unload provider thereby forcing DB instance close and releasing resources on maintenance task error

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR closed pull request #42098: [SPARK-44504][SS] Unload provider thereby forcing DB instance close and releasing resources on maintenance task error
URL: https://github.com/apache/spark/pull/42098


[GitHub] [spark] HeartSaVioR commented on pull request #42098: [SPARK-44504][SS] Unload provider thereby forcing DB instance close and releasing resources on maintenance task error

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #42098:
URL: https://github.com/apache/spark/pull/42098#issuecomment-1644819254

   Looks like CI is failing. Could you please look into this? Feels like it might be related.


[GitHub] [spark] anishshri-db commented on pull request #42098: [SPARK-44504][SS] Unload provider thereby forcing DB instance close and releasing resources on maintenance task error

Posted by "anishshri-db (via GitHub)" <gi...@apache.org>.
anishshri-db commented on PR #42098:
URL: https://github.com/apache/spark/pull/42098#issuecomment-1644594745

   cc @HeartSaVioR - PTAL, thanks!

