Posted to dev@orc.apache.org by "hgs (Jira)" <ji...@apache.org> on 2021/09/10 01:43:00 UTC
[jira] [Created] (ORC-991) Encrypted data throws an exception with SQL filter pushdown

hgs created ORC-991:
-----------------------

             Summary: Encrypted data throws an exception with SQL filter pushdown
                 Key: ORC-991
                 URL: https://issues.apache.org/jira/browse/ORC-991
             Project: ORC
          Issue Type: Bug
          Components: Java
    Affects Versions: 1.6.10, 1.6.9, 1.6.8
         Environment: 1. ORC 1.6.8+
2. SparkSQL 2.4.7
3. JDK 1.8
            Reporter: hgs


1. Create a table

CREATE TABLE `itmp8888`(`id` INT, `name` STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
 'serialization.format' = '1'
)
STORED AS
 INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES (
 'transient_lastDdlTime' = '1631174384',
 'orc.encrypt' = 'AES_CTR_128:id,name',
 'orc.mask' = 'sha256:id,name',
 'orc.encrypt.ezk' = 'jNCeDBtNfT8wPaTpR34JHA=='
)

2. Insert data
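
The report does not show the statement used for this step. A minimal sketch, assuming any small set of rows is enough to reproduce the problem (the values below are hypothetical and not taken from the original report):

```sql
-- Hypothetical sample rows; the original report does not list the inserted values.
INSERT INTO itmp8888 VALUES (1, 'a'), (2, 'b');
```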

3. A select statement with no filter works fine

   select * from itmp8888

4. A select statement with a filter on an encrypted column throws an exception

  select * from itmp8888 where id = 1

 

5. The stack trace

Caused by: java.lang.AssertionError: Index is not populated for 1
    at org.apache.orc.impl.RecordReaderImpl$SargApplier.pickRowGroups(RecordReaderImpl.java:995)
    at org.apache.orc.impl.RecordReaderImpl.pickRowGroups(RecordReaderImpl.java:1083)
    at org.apache.orc.impl.RecordReaderImpl.readStripe(RecordReaderImpl.java:1101)
    at org.apache.orc.impl.RecordReaderImpl.advanceStripe(RecordReaderImpl.java:1151)
    at org.apache.orc.impl.RecordReaderImpl.advanceToNextRow(RecordReaderImpl.java:1186)
    at org.apache.orc.impl.RecordReaderImpl.<init>(RecordReaderImpl.java:248)
    at org.apache.orc.impl.ReaderImpl.rows(ReaderImpl.java:864)
    at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initialize(OrcColumnarBatchReader.java:142)
    at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(OrcFileFormat.scala:211)
    at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat$$anonfun$buildReaderWithPartitionValues$2.apply(OrcFileFormat.scala:175)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:124)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:177)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)

6. I debugged the code and found that the RowIndex is null for all the encrypted columns.
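
Since the assertion is raised in SargApplier.pickRowGroups, one way to check whether the failure depends on predicate pushdown is to turn off Spark's ORC filter pushdown and rerun the failing query. This is only a diagnostic sketch: it assumes the query runs in a Spark SQL session where spark.sql.orc.filterPushdown can be changed, and the report does not say whether this was tried or what the outcome was.

```sql
-- Assumption: executed in the same Spark SQL session before retrying step 4.
SET spark.sql.orc.filterPushdown=false;

-- Same query as step 4; with pushdown off, no search argument should reach the ORC reader.
select * from itmp8888 where id = 1;
```

If the query succeeds with this setting, that would be consistent with the problem lying in row-group selection for encrypted columns rather than in decryption itself.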

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)