You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Sergey Shelukhin <se...@hortonworks.com> on 2016/08/04 21:29:42 UTC

Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49766/#review144816
-----------------------------------------------------------




metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java (line 40)
<https://reviews.apache.org/r/49766/#comment210935>

    why not move it here and depend on it there?



metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java (line 254)
<https://reviews.apache.org/r/49766/#comment210944>

    nit since it's probably only a few values: if we iterate anyway, why wrap in a hashset? Also via keySet iterator one could probably remove without another lookup



ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java (line 361)
<https://reviews.apache.org/r/49766/#comment210949>

    nit: log unsupported property?



ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java (line 578)
<https://reviews.apache.org/r/49766/#comment210950>

    this check is not done in the above "if" before adding the statementId to "last". What does this check mean i.e. what do negative numbers mean?



ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java (line 864)
<https://reviews.apache.org/r/49766/#comment210951>

    needs to log error. Also, useFileIds could be passed from the main call above if I'm not mistakes, so this can be skipped if we know we cannot get file IDs



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (line 1790)
<https://reviews.apache.org/r/49766/#comment210952>

    is split.isOriginal no longer valid for other paths too? It could be used elsewhere



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java (line 1987)
<https://reviews.apache.org/r/49766/#comment210953>

    what does this change in behavior do? if base and original are both empty, was the behavior incorrect before? Can it happen?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java (line 380)
<https://reviews.apache.org/r/49766/#comment210956>

    what would row be in case of delete operation? I am assuming this works, it just looks surprising that there would be a row



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java (line 392)
<https://reviews.apache.org/r/49766/#comment210958>

    rowId is always -1 here. Intended?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java (line 484)
<https://reviews.apache.org/r/49766/#comment210961>

    error handling? we don't need to fail if we cannot remove, right? same just below


- Sergey Shelukhin


On July 27, 2016, 9:54 p.m., Saket Saurabh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49766/
> -----------------------------------------------------------
> 
> (Updated July 27, 2016, 9:54 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://issues.apache.org/jira/browse/HIVE-14035
> 
> In current Hive version, delta files created by ACID transactions do not allow predicate pushdown if they contain any update/delete events. This is done to preserve correctness when following a multi-version approach during event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete event followed by a new insert event, that can enable predicate push down to all delta files without breaking correctness. To support backward compatibility for this feature, this JIRA also proposes to add some sort of versioning to ACID that can allow different versions of ACID transactions to co-exist together.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java e92466f 
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316 
>   hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 974c6b8 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java ca2a912 
>   metastore/if/hive_metastore.thrift 4d92b73 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2 
>   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2 
>   metastore/src/gen/thrift/gen-php/metastore/Types.php f505208 
>   metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5 
>   metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84 
>   metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675 
>   orc/src/java/org/apache/orc/impl/TreeReaderFactory.java c4a2093 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 23a13d6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java dd90a95 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java c150ec5 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 63d02fb 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 1a1af28 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 9d927bd 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java d48e441 
>   ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java b83cea4 
> 
> Diff: https://reviews.apache.org/r/49766/diff/
> 
> 
> Testing
> -------
> 
> Tests for the feature are in ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly integration tests that test end-to-end insert/update/delete scenarios followed by compaction and cleaning.
> 
> 
> Thanks,
> 
> Saket Saurabh
> 
>


Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions

Posted by Saket Saurabh <ss...@apache.org>.

> On Aug. 4, 2016, 2:29 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java, line 588
> > <https://reviews.apache.org/r/49766/diff/4/?file=1455573#file1455573line588>
> >
> >     this check is not done in the above "if" before adding the statementId to "last". What does this check mean i.e. what do negative numbers mean?

@Sergey, this change is now no longer part of this patch. It got refactored away when I merged serializeDelta() function with serializeDeleteDelta(). Although in fact, it exists in current master as well. The negative numbers mean that we do not have a statement id for this delta. For example, when we run a minor compaction we produce delta of the form delta_x_y which has no statement id attached to it. In these cases, statement id would be -1.


- Saket


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49766/#review144816
-----------------------------------------------------------


On Aug. 8, 2016, 5:53 p.m., Saket Saurabh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49766/
> -----------------------------------------------------------
> 
> (Updated Aug. 8, 2016, 5:53 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://issues.apache.org/jira/browse/HIVE-14035
> 
> In current Hive version, delta files created by ACID transactions do not allow predicate pushdown if they contain any update/delete events. This is done to preserve correctness when following a multi-version approach during event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete event followed by a new insert event, that can enable predicate push down to all delta files without breaking correctness. To support backward compatibility for this feature, this JIRA also proposes to add some sort of versioning to ACID that can allow different versions of ACID transactions to co-exist together.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 70816bd 
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316 
>   hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 974c6b8 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 731caa8 
>   metastore/if/hive_metastore.thrift a2e35b8 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2 
>   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2 
>   metastore/src/gen/thrift/gen-php/metastore/Types.php d6f7f49 
>   metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5 
>   metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84 
>   metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 26e6443 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java dd90a95 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 449d889 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 334cb31 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java efde2db 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 1a1af28 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 8cb5e8a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java af192fb 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java 556df18 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 6648829 
> 
> Diff: https://reviews.apache.org/r/49766/diff/
> 
> 
> Testing
> -------
> 
> Tests for the feature are in ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly integration tests that test end-to-end insert/update/delete scenarios followed by compaction and cleaning.
> 
> 
> Thanks,
> 
> Saket Saurabh
> 
>


Re: Review Request 49766: HIVE-14035 Enable predicate pushdown to delta files created by ACID Transactions

Posted by Saket Saurabh <ss...@apache.org>.

> On Aug. 4, 2016, 2:29 p.m., Sergey Shelukhin wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java, line 399
> > <https://reviews.apache.org/r/49766/diff/4/?file=1455576#file1455576line399>
> >
> >     rowId is always -1 here. Intended?

@Sergey: Thanks for pointing this out. It was a bug and not intended.


- Saket


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/49766/#review144816
-----------------------------------------------------------


On Aug. 8, 2016, 5:53 p.m., Saket Saurabh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/49766/
> -----------------------------------------------------------
> 
> (Updated Aug. 8, 2016, 5:53 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://issues.apache.org/jira/browse/HIVE-14035
> 
> In current Hive version, delta files created by ACID transactions do not allow predicate pushdown if they contain any update/delete events. This is done to preserve correctness when following a multi-version approach during event collapsing, where an update event overwrites an existing insert event. 
> This JIRA proposes to split an update event into a combination of a delete event followed by a new insert event, that can enable predicate push down to all delta files without breaking correctness. To support backward compatibility for this feature, this JIRA also proposes to add some sort of versioning to ACID that can allow different versions of ACID transactions to co-exist together.
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 70816bd 
>   hcatalog/core/src/main/java/org/apache/hive/hcatalog/mapreduce/FosterStorageHandler.java 14f7316 
>   hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/AbstractRecordWriter.java 974c6b8 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactor.java 731caa8 
>   metastore/if/hive_metastore.thrift a2e35b8 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.h ae14bd1 
>   metastore/src/gen/thrift/gen-cpp/hive_metastore_constants.cpp f982bf2 
>   metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/hive_metastoreConstants.java 5a666f2 
>   metastore/src/gen/thrift/gen-php/metastore/Types.php d6f7f49 
>   metastore/src/gen/thrift/gen-py/hive_metastore/constants.py d1c07a5 
>   metastore/src/gen/thrift/gen-rb/hive_metastore_constants.rb eeccc84 
>   metastore/src/java/org/apache/hadoop/hive/metastore/TransactionalValidationListener.java 3e74675 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java db6848a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java 57b6c67 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java 26e6443 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidOutputFormat.java dd90a95 
>   ql/src/java/org/apache/hadoop/hive/ql/io/AcidUtils.java 449d889 
>   ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 945b828 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java 334cb31 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java efde2db 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 1a1af28 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java 8cb5e8a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 8cf261d 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 6caca98 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java af192fb 
>   ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands3.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/io/TestAcidUtils.java 556df18 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 6648829 
> 
> Diff: https://reviews.apache.org/r/49766/diff/
> 
> 
> Testing
> -------
> 
> Tests for the feature are in ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java. These are mostly integration tests that test end-to-end insert/update/delete scenarios followed by compaction and cleaning.
> 
> 
> Thanks,
> 
> Saket Saurabh
> 
>