Posted to issues@spark.apache.org by "L. C. Hsieh (Jira)" <ji...@apache.org> on 2021/11/18 04:49:00 UTC

[jira] [Created] (SPARK-37369) Avoid redundant ColumnarToRow transition on InMemoryTableScan

L. C. Hsieh created SPARK-37369:
-----------------------------------

Summary: Avoid redundant ColumnarToRow transition on InMemoryTableScan
Key: SPARK-37369
URL: https://issues.apache.org/jira/browse/SPARK-37369
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.3.0
Reporter: L. C. Hsieh

We have a rule that inserts columnar transitions between row-based and columnar query plans. InMemoryTableScanExec can produce columnar output, so if its parent plan isn't columnar, the rule adds a ColumnarToRow between them.

But InMemoryTableScanExec is a special query plan, because it can convert cached batches to either columnar batches or rows.

In such a case, we ask InMemoryTableScanExec to convert cached batches to columnar batches, and the added ColumnarToRow then converts those to rows before they reach the parent plan.

So in such a case, we can simply ask InMemoryTableScanExec to produce row output directly and skip the redundant conversion.

```
+- Union
   :- ColumnarToRow
   :  +- InMemoryTableScan [i#8, j#9]
   :        +- InMemoryRelation [i#8, j#9], StorageLevel(disk, memory, deserialized, 1 replicas)
```
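
The plan above and the proposed change can be illustrated with a small toy model. This is only a sketch in plain Python, not Spark's code: the `Plan` class, the `can_emit_rows` flag, and both functions are hypothetical names made up for illustration, and the real rule and the eventual fix may look quite different.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical stand-in for a physical plan node (not a Spark class)."""
    name: str
    children: Tuple["Plan", ...] = ()
    supports_columnar: bool = False  # node currently emits columnar batches
    can_emit_rows: bool = False      # node could emit rows itself (assumed flag)


def insert_transitions(plan: Plan, parent_wants_columnar: bool = False) -> Plan:
    """Toy transition rule: wrap a columnar node whose parent is row-based."""
    children = tuple(
        insert_transitions(c, plan.supports_columnar) for c in plan.children
    )
    plan = replace(plan, children=children)
    if plan.supports_columnar and not parent_wants_columnar:
        return Plan("ColumnarToRow", (plan,))
    return plan


def remove_redundant_transitions(plan: Plan) -> Plan:
    """Toy version of the proposal: drop ColumnarToRow when the child
    can produce rows directly, and switch that child to row output."""
    children = tuple(remove_redundant_transitions(c) for c in plan.children)
    plan = replace(plan, children=children)
    if plan.name == "ColumnarToRow" and plan.children[0].can_emit_rows:
        return replace(plan.children[0], supports_columnar=False)
    return plan


def render(plan: Plan, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines for display."""
    lines = ["  " * depth + plan.name]
    for child in plan.children:
        lines.extend(render(child, depth + 1))
    return lines


# Only the Union branch shown in the plan above is modeled here.
scan = Plan("InMemoryTableScan", supports_columnar=True, can_emit_rows=True)
with_transition = insert_transitions(Plan("Union", (scan,)))
optimized = remove_redundant_transitions(with_transition)

print("\n".join(render(with_transition)))
print("\n".join(render(optimized)))
```

In this model the first tree has a ColumnarToRow node between Union and InMemoryTableScan, and the second does not, because the scan is asked for rows instead.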
