Posted to issues@spark.apache.org by "Bruce Robbins (Jira)" <ji...@apache.org> on 2022/12/05 19:56:00 UTC
[jira] [Created] (SPARK-41395) InterpretedMutableProjection can corrupt unsafe buffer when used with decimal data
Bruce Robbins created SPARK-41395:
-------------------------------------
Summary: InterpretedMutableProjection can corrupt unsafe buffer when used with decimal data
Key: SPARK-41395
URL: https://issues.apache.org/jira/browse/SPARK-41395
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.4.0
Reporter: Bruce Robbins
The following returns the wrong answer:
{noformat}
set spark.sql.codegen.wholeStage=false;
set spark.sql.codegen.factoryMode=NO_CODEGEN;
select max(col1), max(col2) from values
(cast(null as decimal(27,2)), cast(null as decimal(27,2))),
(cast(77.77 as decimal(27,2)), cast(245.00 as decimal(27,2)))
as data(col1, col2);
+---------+---------+
|max(col1)|max(col2)|
+---------+---------+
|null     |239.88   |
+---------+---------+
{noformat}
This is because {{InterpretedMutableProjection}} inappropriately uses {{InternalRow#setNullAt}} to set null for decimal types with precision > {{Decimal.MAX_LONG_DIGITS}}.
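For context, {{Decimal.MAX_LONG_DIGITS}} is 18 in Spark: an unscaled value of at most 18 decimal digits always fits in a signed 64-bit long, so Spark stores it inline in the row's fixed-length region (where {{setNullAt}} is harmless). Wider decimals are stored as a byte array in the variable-length region, with only an offset/length word in the fixed slot. A quick sanity check of the arithmetic behind that threshold (plain Python, not Spark code):

```python
# Long.MAX_VALUE (2^63 - 1) has 19 decimal digits, so every 18-digit
# unscaled value is guaranteed to fit in a long, while some 19-digit
# values are not.
LONG_MAX = 2**63 - 1
MAX_LONG_DIGITS = 18    # Spark's Decimal.MAX_LONG_DIGITS

assert len(str(LONG_MAX)) == 19
assert 10**MAX_LONG_DIGITS - 1 <= LONG_MAX        # all 18-digit values fit
assert 10**(MAX_LONG_DIGITS + 1) - 1 > LONG_MAX   # some 19-digit values don't
```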
The path to corruption goes like this:
Unsafe buffer at start:
{noformat}
                         offset/len for   offset/len for
                         1st decimal      2nd decimal
offset: 0                8                16 (0x10)        24 (0x18)        32 (0x20)
data:   0300000000000000 0000000018000000 0000000028000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
{noformat}
When processing the first incoming row ([null, null]), {{InterpretedMutableProjection}} calls {{setNullAt}} for the decimal fields. As a result, the pointers to the storage areas for the two decimals in the variable-length region get zeroed out.
Buffer after projecting first row (null, null):
{noformat}
                         offset/len for   offset/len for
                         1st decimal      2nd decimal
offset: 0                8                16 (0x10)        24 (0x18)        32 (0x20)
data:   0300000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
{noformat}
When the second row is projected into the buffer, {{UnsafeRow#setDecimal}} picks up the zeroed offsets and overwrites the null-tracking bit set with decimal data:
{noformat}
        null-tracking
        bit area
offset: 0                8                16 (0x10)        24 (0x18)        32 (0x20)
data:   5db4000000000000 0000000000000000 0200000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
{noformat}
The null-tracking bit set is overwritten with 239.88 (0x5db4) rather than 245.00 (0x5fb4) because {{setDecimal}} indirectly calls {{setNotNullAt(1)}}, which turns off the null-tracking bit associated with the field at index 1.
In addition, the decimal at field index 0 is now null because of the corruption of the null-tracking bit set.
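The sequence above can be reproduced with a small byte-level simulation. This is plain Python, not Spark code: the 56-byte layout, the slot encoding, and the helper names are simplified stand-ins for what {{UnsafeRow}} actually does.

```python
import struct

def new_buffer():
    # 56 bytes: null-tracking word, two 8-byte fixed slots, and a 32-byte
    # variable-length region; each slot holds (offset << 32) | length
    buf = bytearray(56)
    struct.pack_into("<q", buf, 8, 24 << 32)   # field 0 -> byte offset 24
    struct.pack_into("<q", buf, 16, 40 << 32)  # field 1 -> byte offset 40
    return buf

def set_null_at(buf, i):
    # the buggy call: sets null bit i but also zeroes the whole slot,
    # destroying the pointer into the variable-length region
    bits = struct.unpack_from("<q", buf, 0)[0]
    struct.pack_into("<q", buf, 0, bits | (1 << i))
    struct.pack_into("<q", buf, 8 * (1 + i), 0)

def set_decimal(buf, i, unscaled):
    # mimics UnsafeRow#setDecimal for wide decimals: writes the unscaled
    # value's big-endian bytes at whatever offset the slot currently
    # stores, then clears null bit i (the setNotNullAt call)
    offset = struct.unpack_from("<q", buf, 8 * (1 + i))[0] >> 32
    data = unscaled.to_bytes((unscaled.bit_length() + 8) // 8, "big")
    buf[offset:offset + len(data)] = data      # offset 0 -> the null word!
    struct.pack_into("<q", buf, 8 * (1 + i), (offset << 32) | len(data))
    bits = struct.unpack_from("<q", buf, 0)[0]
    struct.pack_into("<q", buf, 0, bits & ~(1 << i))

buf = new_buffer()
set_null_at(buf, 0); set_null_at(buf, 1)   # project first row: (null, null)
set_decimal(buf, 0, 7777)                  # project 77.77  (unscaled, scale 2)
set_decimal(buf, 1, 24500)                 # project 245.00
print(hex(int.from_bytes(buf[0:2], "big")))     # 0x5db4 -> reads as 239.88
print(struct.unpack_from("<q", buf, 0)[0] & 1)  # 1 -> field 0 wrongly null
```

With the slots zeroed, both decimal writes land at offset 0; the second one plants 0x5fb4 on top of the null word, and the subsequent null-bit clear for field 1 mutates it to 0x5db4, reproducing both symptoms from the dumps above.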
When a decimal type with precision > {{Decimal.MAX_LONG_DIGITS}} is null, {{InterpretedMutableProjection}} should write a null {{Decimal}} value rather than call {{setNullAt}}.
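A minimal sketch of that proposed behavior (again plain Python with a hypothetical row API, not the actual Spark patch): nulls for wide decimals are routed through the decimal setter, which can preserve the offset/length slot, while narrow types keep using {{setNullAt}}.

```python
MAX_LONG_DIGITS = 18  # Spark's Decimal.MAX_LONG_DIGITS

class FakeRow:
    # stand-in for a mutable row; only records which setter was invoked
    def __init__(self):
        self.calls = []
    def set_null_at(self, i):
        self.calls.append(("setNullAt", i))
    def set_decimal(self, i, value, precision):
        self.calls.append(("setDecimal", i, value, precision))

def project_null(row, i, precision):
    # sketch of the fix: a wide decimal's null goes through setDecimal,
    # leaving the pointer into the variable-length region intact
    if precision > MAX_LONG_DIGITS:
        row.set_decimal(i, None, precision)
    else:
        row.set_null_at(i)

row = FakeRow()
project_null(row, 0, precision=27)  # decimal(27,2): wide, keep the slot
project_null(row, 1, precision=10)  # decimal(10,2): fits in a long
assert row.calls[0][0] == "setDecimal"
assert row.calls[1][0] == "setNullAt"
```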
This bug could get exercised during codegen fallback. Take for example this case where I forcibly made codegen fail for the {{Greatest}} expression:
{noformat}
spark-sql> select max(col1), max(col2) from values
(cast(null as decimal(27,2)), cast(null as decimal(27,2))),
(cast(77.77 as decimal(27,2)), cast(245.00 as decimal(27,2)))
as data(col1, col2);
22/12/05 08:18:54 ERROR CodeGenerator: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 58, Column 1: ';' expected instead of 'if'
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 58, Column 1: ';' expected instead of 'if'
at org.codehaus.janino.TokenStreamImpl.compileException(TokenStreamImpl.java:362)
at org.codehaus.janino.TokenStreamImpl.read(TokenStreamImpl.java:149)
at org.codehaus.janino.Parser.read(Parser.java:3787)
...
22/12/05 08:18:56 WARN MutableProjection: Expr codegen error and falling back to interpreter mode
java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 43, Column 1: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 43, Column 1: ';' expected instead of 'boolean'
at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1583)
at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1580)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
... 36 more
...
NULL 239.88 <== incorrect result, should be (77.77, 245.00)
Time taken: 6.132 seconds, Fetched 1 row(s)
spark-sql>
{noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)