You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Kent Yao (Jira)" <ji...@apache.org> on 2024/01/30 03:42:00 UTC
[jira] [Commented] (SPARK-46747) Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1

    [ https://issues.apache.org/jira/browse/SPARK-46747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17812135#comment-17812135 ] 

Kent Yao commented on SPARK-46747:
----------------------------------

It would be better if you could provide the stats of # of shared locks before and after.

> Too Many Shared Locks due to PostgresDialect.getTableExistsQuery - LIMIT 1
> --------------------------------------------------------------------------
>
>                 Key: SPARK-46747
>                 URL: https://issues.apache.org/jira/browse/SPARK-46747
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.2.3, 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 2.4.5, 2.4.6, 2.4.7, 2.4.8, 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.3, 3.4.2, 3.3.2, 3.4.0, 3.4.1, 3.5.0, 3.3.4
>            Reporter: Bala Bellam
>            Priority: Critical
>
> +*Background:*+
> PostgresDialect.getTableExistsQuery is using LIMIT 1 query to check the table existence in the database by overriding the default JdbcDialect.getTableExistsQuery which has WHERE 1 = 0.
> +*Issue:*+
> Due to LIMIT 1 query pattern, we are seeing high number of shared locks in the PostgreSQL installations where there are many partitions under a table that's being written to. Hence resorting to the default JdbcDialect which does WHERE 1 = 0 is proven to be more optimal as it doesn't scan any of the partitions and effectively checks for table existence.
> The SELECT 1 FROM table LIMIT 1 query can indeed be heavier in certain scenarios, especially with partitioned tables or tables with a lot of data, as it may take shared locks on all partitions or involve more planner and execution time to determine the quickest way to get a single row.
> On the other hand, SELECT 1 FROM table WHERE 1=0 doesn't actually try to read any data due to the always false WHERE condition. This makes it a lighter operation, as it typically only involves checking the table's metadata to validate the table's existence without taking locks on the table's data or partitions.
> So, considering performance and minimizing locks, SELECT 1 FROM table WHERE 1=0 would be a better choice if we're strictly looking to check for a table's existence and want to avoid potentially heavier operations like taking shared locks on partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org