Posted to issues@spark.apache.org by "Ivan Sadikov (Jira)" <ji...@apache.org> on 2022/10/14 03:48:00 UTC

[jira] [Commented] (SPARK-40430) Spark session does not update number of files for partition

    [ https://issues.apache.org/jira/browse/SPARK-40430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617441#comment-17617441 ] 

Ivan Sadikov commented on SPARK-40430:
--------------------------------------

Can you try running the MSCK REPAIR TABLE command on your table if you use a metastore, or run REFRESH (https://spark.apache.org/docs/latest/sql-ref-syntax-aux-cache-refresh.html)?

Table metadata, including the file listing for each partition, is cached per session, so you likely need to refresh the table to pick up the new files.
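
For example, a minimal sketch of what could be run in the first (stale) session, assuming the table is registered in the Glue/Hive metastore as my_db.my_table (a placeholder name):

    // Re-sync the partition list with the metastore (picks up partitions added externally).
    spark.sql("MSCK REPAIR TABLE my_db.my_table")

    // Invalidate the cached metadata and file listing so the next query re-scans the partition directories.
    spark.sql("REFRESH TABLE my_db.my_table")

    // Equivalent Catalog API call:
    spark.catalog.refreshTable("my_db.my_table")

If the table is path-based rather than metastore-backed, the REFRESH statement linked above takes a path instead of a table name.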

> Spark session does not update number of files for partition
> -----------------------------------------------------------
>
>                 Key: SPARK-40430
>                 URL: https://issues.apache.org/jira/browse/SPARK-40430
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.2
>            Environment: I'm using Spark 3.1.2 on AWS EMR with AWS Glue as the catalog.
>            Reporter: Filipe Souza
>            Priority: Minor
>         Attachments: session 1.png, session 2.png
>
>
> When a Spark session has already queried data from a table partition and new files are later inserted into that partition externally, the session keeps the outdated file listing for the partition and does not return the new records.
> If the data is inserted into a new partition, the problem does not occur.
> Steps to reproduce the behavior:
> 1. Open a Spark session.
> 2. Run a count query on a table.
> 3. Open another Spark session.
> 4. Insert data into an existing partition.
> 5. Check the count again in the first session.
> I expect to see the inserted records.
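
For reference, the reproduction above could be sketched roughly as follows in spark-shell; the table and partition names are hypothetical:

    // Session 1: run a count against an existing partition; Spark caches the partition's file listing.
    spark.sql("SELECT COUNT(*) FROM my_db.events WHERE dt = '2022-09-01'").show()

    // Session 2 (a separate spark-shell or job): insert more rows into the same partition,
    // e.g. df.write.insertInto("my_db.events"), or write files directly into the partition path.

    // Session 1 again: the count is unchanged because the cached file listing is reused ...
    spark.sql("SELECT COUNT(*) FROM my_db.events WHERE dt = '2022-09-01'").show()

    // ... until the table is refreshed, after which the new files become visible.
    spark.sql("REFRESH TABLE my_db.events")
    spark.sql("SELECT COUNT(*) FROM my_db.events WHERE dt = '2022-09-01'").show()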


