Posted to reviews@spark.apache.org by "grundprinzip (via GitHub)" <gi...@apache.org> on 2024/01/24 01:25:34 UTC

[PR] [SPARK-44815][CONNECT]Cache df.schema to avoid extra RPC [spark]

grundprinzip opened a new pull request, #42499:
URL: https://github.com/apache/spark/pull/42499

   ### What changes were proposed in this pull request?
   This patch caches the result of the `df.schema` call in the DataFrame to avoid an extra round trip to the Spark Connect service each time the columns or the schema are retrieved. Since a DataFrame is immutable, its schema does not change, so the cached value stays valid.
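
The caching described above can be illustrated with a minimal sketch. It is not the code from this pull request: `CachedSchemaFrame`, `fake_rpc`, and `rpc_calls` are hypothetical names, the schema is simplified to a list of column names, and a plain callable stands in for the round trip to the Spark Connect service.

```python
from typing import Callable, List, Optional


class CachedSchemaFrame:
    """Hypothetical stand-in for a Spark Connect DataFrame (not the real class)."""

    def __init__(self, fetch_schema: Callable[[], List[str]]) -> None:
        # fetch_schema simulates the RPC to the Spark Connect service (assumption).
        self._fetch_schema = fetch_schema
        self._cached_schema: Optional[List[str]] = None

    @property
    def schema(self) -> List[str]:
        # Only the first access triggers the simulated round trip.
        if self._cached_schema is None:
            self._cached_schema = self._fetch_schema()
        # Return a copy so callers cannot mutate the cached value.
        return list(self._cached_schema)

    @property
    def columns(self) -> List[str]:
        # Column names are derived from the cached schema, so no extra call is made.
        return self.schema


rpc_calls: List[int] = []


def fake_rpc() -> List[str]:
    # Record each simulated round trip so the number of calls can be inspected.
    rpc_calls.append(1)
    return ["id", "name"]


df = CachedSchemaFrame(fake_rpc)
first = df.schema
second = df.columns
print(len(rpc_calls))
```

Returning a copy is one way to keep the cached value safe from caller-side mutation; whether the actual patch does this is not stated in this thread.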
   
   ### Why are the changes needed?
   Performance and stability: repeated `df.schema` calls no longer each require an RPC to the Spark Connect service.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-44815][CONNECT]Cache df.schema to avoid extra RPC [spark]

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell commented on PR #42499:
URL: https://github.com/apache/spark/pull/42499#issuecomment-1923736483

   Merging to master.




[GitHub] [spark] grundprinzip closed pull request #42499: [SPARK-44815][CONNECT]Cache df.schema to avoid extra RPC

Posted by "grundprinzip (via GitHub)" <gi...@apache.org>.
grundprinzip closed pull request #42499: [SPARK-44815][CONNECT]Cache df.schema to avoid extra RPC
URL: https://github.com/apache/spark/pull/42499




Re: [PR] [SPARK-44815][CONNECT]Cache df.schema to avoid extra RPC [spark]

Posted by "hvanhovell (via GitHub)" <gi...@apache.org>.
hvanhovell closed pull request #42499: [SPARK-44815][CONNECT]Cache df.schema to avoid extra RPC
URL: https://github.com/apache/spark/pull/42499

