Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/07/16 16:41:01 UTC

[jira] [Updated] (SPARK-26061) Reduce the number of unused UnsafeRowWriters created in whole-stage codegen

     [ https://issues.apache.org/jira/browse/SPARK-26061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-26061:
----------------------------------
    Affects Version/s:     (was: 2.3.2)
                           (was: 2.3.1)
                           (was: 2.4.0)
                           (was: 2.3.0)
                       3.0.0

> Reduce the number of unused UnsafeRowWriters created in whole-stage codegen
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-26061
>                 URL: https://issues.apache.org/jira/browse/SPARK-26061
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Kris Mok
>            Priority: Trivial
>
> Reduce the number of unused UnsafeRowWriters created in whole-stage generated code.
> They come from CodegenSupport.consume() calling prepareRowVar(), which uses GenerateUnsafeProjection.createCode() and registers an UnsafeRowWriter as mutable state, whether or not the downstream (parent) operator uses the rowVar.
> Even when the downstream doConsume function doesn't use the rowVar (i.e. doesn't include row.code in its codegen template), the registered UnsafeRowWriter stays there, which makes the init function of the generated code a bit bloated.
> This ticket doesn't track the root issue, but makes it slightly less painful: when the doConsume function is split out, the prepareRowVar() function is called twice, so it's double the pain of unused UnsafeRowWriters. This fix simply moves the original call to prepareRowVar() down into the doConsume split/no-split branch so that we're back to just 1x the pain.
> To fix the root issue, the CodegenSupport operators would need a way to indicate whether or not they're going to use the rowVar. That's a much more elaborate change, so I'd like to just make a minor fix first.
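
The 2x versus 1x effect described in the ticket can be sketched with a toy model. This is not Spark's code: CodegenContext, prepare_row_var, consume_before, consume_after and writers_registered are hypothetical Python stand-ins, assumed only from the description above, that count how many UnsafeRowWriter states get registered on each path.

```python
class CodegenContext:
    """Hypothetical stand-in for a codegen context that records mutable states."""

    def __init__(self):
        self.mutable_states = []

    def add_mutable_state(self, name):
        self.mutable_states.append(name)


def prepare_row_var(ctx):
    # Models prepareRowVar(): building the row code registers an
    # UnsafeRowWriter, whether or not the parent operator ever uses the row.
    ctx.add_mutable_state("UnsafeRowWriter")
    return "row"


def consume_before(ctx, split):
    # Assumed pre-fix shape: the row variable is prepared up front,
    # and the split-out path prepares it a second time.
    row_var = prepare_row_var(ctx)
    if split:
        row_var = prepare_row_var(ctx)
        return "doConsume_split(%s)" % row_var
    return "doConsume_inline(%s)" % row_var


def consume_after(ctx, split):
    # Assumed post-fix shape: the call lives inside each branch,
    # so either path prepares the row variable exactly once.
    if split:
        row_var = prepare_row_var(ctx)
        return "doConsume_split(%s)" % row_var
    row_var = prepare_row_var(ctx)
    return "doConsume_inline(%s)" % row_var


def writers_registered(consume, split):
    # Run one consume() variant on a fresh context and count the writers.
    ctx = CodegenContext()
    consume(ctx, split)
    return ctx.mutable_states.count("UnsafeRowWriter")


if __name__ == "__main__":
    for split in (False, True):
        print(split,
              writers_registered(consume_before, split),
              writers_registered(consume_after, split))
```

In this model the writer is still registered once even if the parent never uses the row, which matches the ticket's point that the root issue is left for a later change.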



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
