Posted to issues@spark.apache.org by "Lantao Jin (Jira)" <ji...@apache.org> on 2020/03/14 07:23:00 UTC

[jira] [Updated] (SPARK-31154) Expose basic write metrics for InsertIntoDataSourceCommand

     [ https://issues.apache.org/jira/browse/SPARK-31154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lantao Jin updated SPARK-31154:
-------------------------------
    Description: 
Spark provides the `InsertableRelation` interface and the `InsertIntoDataSourceCommand` command to delegate insert processing to a data source. Unlike `DataWritingCommand`, the metrics in `InsertIntoDataSourceCommand` are empty and are never updated, so we cannot get "number of written files" or "number of output rows" from them.

For example, if a table is a Spark Parquet table, we can get the write metrics with:
{code}
val df = sql("INSERT INTO TABLE test_table SELECT 1, 'a'")
// inspect the SQLMetrics on the executed plan (assumed completion of the truncated line)
df.queryExecution.executedPlan.metrics
{code}
But if it is a Delta table, we cannot.
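To make the gap concrete, here is a simplified, self-contained Scala sketch. These are stand-in classes, not Spark's real `DataWritingCommand` or `InsertIntoDataSourceCommand`: the writing command declares its own metric map and updates it while running, whereas the delegating command declares nothing, so its metric map stays empty no matter what the data source writes.

```scala
// Stand-in for Spark's SQLMetric: a named mutable counter.
final case class SQLMetric(name: String, var value: Long = 0L)

trait RunnableCommandLike {
  // Default: a command exposes no metrics at all.
  def metrics: Map[String, SQLMetric] = Map.empty
}

// Analogue of DataWritingCommand: declares metrics and updates them itself.
class DataWritingCommandLike extends RunnableCommandLike {
  override val metrics: Map[String, SQLMetric] = Map(
    "numFiles"      -> SQLMetric("number of written files"),
    "numOutputRows" -> SQLMetric("number of output rows"))

  def run(rows: Seq[(Int, String)]): Unit = {
    metrics("numFiles").value += 1
    metrics("numOutputRows").value += rows.size
  }
}

// Analogue of InsertIntoDataSourceCommand: delegates the whole insert to
// the data source, so nothing ever touches `metrics`.
class InsertIntoDataSourceCommandLike extends RunnableCommandLike {
  def run(rows: Seq[(Int, String)]): Unit = {
    // work happens inside the data source; this command records nothing
  }
}

object MetricsDemo extends App {
  val writing = new DataWritingCommandLike
  writing.run(Seq(1 -> "a"))
  println(writing.metrics("numOutputRows").value)   // 1

  val delegating = new InsertIntoDataSourceCommandLike
  delegating.run(Seq(1 -> "a"))
  println(delegating.metrics.size)                  // 0
}
```

The proposed improvement amounts to giving the delegating command a populated metric map (and a way for the data source to update it), as the writing command already has.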

  was:Spark provides interface `InsertableRelation` and the `InsertIntoDataSourceCommand` to delegate the inserting processing to a data source. Unlike `DataWritingCommand`, the metrics in InsertIntoDataSourceCommand is empty and has no chance to update. So we cannot get "number of written files" or "number of output rows" from its metrics.


> Expose basic write metrics for InsertIntoDataSourceCommand
> ----------------------------------------------------------
>
>                 Key: SPARK-31154
>                 URL: https://issues.apache.org/jira/browse/SPARK-31154
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Lantao Jin
>            Priority: Major
>
> Spark provides the `InsertableRelation` interface and the `InsertIntoDataSourceCommand` command to delegate insert processing to a data source. Unlike `DataWritingCommand`, the metrics in `InsertIntoDataSourceCommand` are empty and are never updated, so we cannot get "number of written files" or "number of output rows" from them.
> For example, if a table is a Spark Parquet table, we can get the write metrics with:
> {code}
> val df = sql("INSERT INTO TABLE test_table SELECT 1, 'a'")
> // inspect the SQLMetrics on the executed plan (assumed completion of the truncated line)
> df.queryExecution.executedPlan.metrics
> {code}
> But if it is a Delta table, we cannot.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org