Posted to reviews@spark.apache.org by "yaooqinn (via GitHub)" <gi...@apache.org> on 2024/01/30 07:24:10 UTC

[PR] [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects [spark]

yaooqinn opened a new pull request, #44948:
URL: https://github.com/apache/spark/pull/44948

### What changes were proposed in this pull request?

[SPARK-46747](https://issues.apache.org/jira/browse/SPARK-46747) reported that Postgres instances suffered from too many shared locks, which were caused by Spark's table-existence query. In this PR, we replace `"SELECT 1 FROM $table LIMIT 1"` with `"SELECT 1 FROM $table WHERE 1=0"` to prevent data from being scanned.
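
As a rough illustration of why the `WHERE 1=0` form still works as an existence probe, here is a minimal Python sketch. It uses an in-memory SQLite database as a stand-in for a JDBC source, so it says nothing about Postgres locking behavior. The helper names `table_exists_query` and `table_exists` are hypothetical and are not Spark's API. It only shows that the query returns no rows for an existing table and fails for a missing one.

```python
import sqlite3


def table_exists_query(table: str) -> str:
    # Same shape as the query proposed in this PR: the predicate can never
    # be true, so the database has no reason to return any row.
    return f"SELECT 1 FROM {table} WHERE 1=0"


def table_exists(conn: sqlite3.Connection, table: str) -> bool:
    # Hypothetical helper: the table is treated as existing if the probe
    # query runs without error. SQLite raises OperationalError when the
    # table is missing; other databases report this differently.
    try:
        conn.execute(table_exists_query(table)).fetchall()
        return True
    except sqlite3.OperationalError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER)")
conn.execute("INSERT INTO people VALUES (1)")

# The probe succeeds on a populated table but yields an empty result.
rows = conn.execute(table_exists_query("people")).fetchall()
found_people = table_exists(conn, "people")
found_missing = table_exists(conn, "missing")
```

The existence check relies only on whether the statement executes, not on what it returns, which is why an always-false predicate is enough.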

### Why are the changes needed?

Reduces overhead for JDBC data sources.

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

Existing JDBC v1/v2 data source tests.

### Was this patch authored or co-authored using generative AI tooling?

no

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Re: [PR] [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects [spark]

Posted by "bala-bellam (via GitHub)" <gi...@apache.org>.

bala-bellam commented on PR #44948:
URL: https://github.com/apache/spark/pull/44948#issuecomment-1921828574

   Thank you, @yaooqinn. I really appreciate the quick revert on this.
   
   We are bound to an older version because we are stuck with EMR. We have been moving to EMR 6.8, which comes with Spark 3.3.0.
   
   These locks are really killing us because a few of our tables have thousands of partitions. We are working on finding some alternatives.



Re: [PR] [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects [spark]

Posted by "bala-bellam (via GitHub)" <gi...@apache.org>.

bala-bellam commented on PR #44948:
URL: https://github.com/apache/spark/pull/44948#issuecomment-1916829355

   Thank you, @yaooqinn and @dongjoon-hyun. Can this patch be backported to older versions as well?



Re: [PR] [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects [spark]

Posted by "yaooqinn (via GitHub)" <gi...@apache.org>.

yaooqinn closed pull request #44948: [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects
URL: https://github.com/apache/spark/pull/44948



Re: [PR] [SPARK-46747][SQL] Avoid scan in getTableExistsQuery for JDBC Dialects [spark]

Posted by "dongjoon-hyun (via GitHub)" <gi...@apache.org>.

dongjoon-hyun commented on PR #44948:
URL: https://github.com/apache/spark/pull/44948#issuecomment-1921876371

   That's too bad. 
   > We are bound to older version because we are stuck with EMR. And we have been moving to EMR 6.8, which comes with 3.3.0.
   
   However, the latest EMR releases also provide Spark 3.5.0 (emr-7.0.0) and Spark 3.4.1 (emr-6.15.0). I believe the AWS support team is capable of delivering the next Apache Spark releases without any issues.
   - https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark.html
   
   Given the situation, it seems that other, non-Spark blockers in your production environment are preventing you from upgrading EMR. Please try to fix those first. You need to catch up to the latest EMR and Spark versions while you are waiting, @bala-bellam.

