Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2019/01/07 15:26:00 UTC

[jira] [Commented] (SPARK-26225) Scan: track decoding time for row-based data sources

    [ https://issues.apache.org/jira/browse/SPARK-26225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735954#comment-16735954 ] 

Wenchen Fan commented on SPARK-26225:
-------------------------------------

I think it's hard to define the decoding time, as every data source may have its own definition.

For data source v1, I think we just need to update `RowDataSourceScanExec` and track the time spent in the unsafe projection that converts Row to InternalRow.
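A minimal sketch of that idea, in plain Scala with stand-in names (`LongMetric` and `timed` are illustrative, not Spark's actual SQLMetric or scan-exec API): wrap the Row-to-InternalRow conversion iterator so each application of the projection is timed and the elapsed nanoseconds accumulate into a metric.

```scala
object DecodingTimeSketch {
  // Stand-in for an SQL metric accumulator (hypothetical, not Spark's SQLMetric).
  final class LongMetric {
    var value: Long = 0L
    def add(v: Long): Unit = value += v
  }

  // Wrap an iterator so that each call to `project` (standing in for the
  // unsafe projection in RowDataSourceScanExec) is timed, with the elapsed
  // time added to `metric`.
  def timed[A, B](iter: Iterator[A], project: A => B, metric: LongMetric): Iterator[B] =
    iter.map { row =>
      val start = System.nanoTime()
      val converted = project(row)
      metric.add(System.nanoTime() - start)
      converted
    }

  def main(args: Array[String]): Unit = {
    val metric = new LongMetric
    // Toy "projection": Int => String, in place of Row => InternalRow.
    val out = timed(Iterator(1, 2, 3), (i: Int) => i.toString, metric).toList
    println(out)
    println(s"decoding time (ns): ${metric.value}")
  }
}
```

Note the per-record System.nanoTime() calls are themselves a cost, which is exactly the "if it is not too much overhead" caveat in the issue description.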

For data source v2, it's totally different: Spark needs to ask the data source to report the decoding time (or any other metrics). I'd like to defer this until after the data source v2 metrics API is introduced.

> Scan: track decoding time for row-based data sources
> ----------------------------------------------------
>
>                 Key: SPARK-26225
>                 URL: https://issues.apache.org/jira/browse/SPARK-26225
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Reynold Xin
>            Priority: Major
>
> Scan node should report decoding time for each record, if it does not add too much overhead.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org