You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Vaibhav Gumashta <vg...@hortonworks.com> on 2018/11/16 00:59:58 UTC

Review Request 69367: Query based compactor for full CRUD Acid tables

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/
-----------------------------------------------------------

Review request for hive and Eugene Koifman.


Bugs: HIVE-20699
    https://issues.apache.org/jira/browse/HIVE-20699


Repository: hive-git


Description
-------

https://jira.apache.org/jira/browse/HIVE-20699


Diffs
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 40dd992455 
  pom.xml 26b662e4c3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 92c74e1d06 


Diff: https://reviews.apache.org/r/69367/diff/1/


Testing
-------


Thanks,

Vaibhav Gumashta


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Eugene Koifman <ek...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review210740
-----------------------------------------------------------




itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java
Line 165 (original), 165 (patched)
<https://reviews.apache.org/r/69367/#comment295490>

    ?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java
Line 170 (original), 170 (patched)
<https://reviews.apache.org/r/69367/#comment295491>

    why are all these test made non-tests?
    or does this do somethign else?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 533 (patched)
<https://reviews.apache.org/r/69367/#comment295492>

    were you going to do "0+validate_acid_sort_order(...)" instead?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 54 (patched)
<https://reviews.apache.org/r/69367/#comment295494>

    I'm guessing if compareTo returns 0 that's bad - we should have unique row ids



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 61 (patched)
<https://reviews.apache.org/r/69367/#comment295493>

    should this return 0?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 80 (patched)
<https://reviews.apache.org/r/69367/#comment295489>

    I think comparison should include 'bucketProperty' since we sort on 'bucketProperty' not just bucketId.
    In particular, if you have > 1 statement per txn, we expect that rows from 2nd stmt follow those from 1st.


- Eugene Koifman


On Nov. 19, 2018, 3:49 a.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> -----------------------------------------------------------
> 
> (Updated Nov. 19, 2018, 3:49 a.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
>     https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 40dd992455 
>   pom.xml 26b662e4c3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 578b16cc7c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 8cabf960db 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 6e7c78bd17 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 92c74e1d06 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.

> On Jan. 29, 2019, 2:04 a.m., Eugene Koifman wrote:
> > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
> > Lines 299 (patched)
> > <https://reviews.apache.org/r/69367/diff/7-9/?file=2121174#file2121174line328>
> >
> >     testMoreBucketsThanReducers/testMoreBucketsThanReducers2 in TestTxnCommands force a specific number of reducers

I used conf.setIntVar(HiveConf.ConfVars.HADOOPNUMREDUCERS, 2) on an unbucketed table. However, the inserts created 2 different buckets. For example:

vgumashta:hive vgumashta$ ../orc-git/build/_CPack_Packages/Darwin/TGZ/ORC-1.5.4-Darwin/bin/orc-contents  /Users/vgumashta/Documents/workspace/hive/itests/hive-unit/target/tmp/org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez-1549010487358_1258013156/warehouse/testcompactionwithschemaevolutionnobucketsmultiplereducers/ds\=today/delta_0000003_0000003_0000/bucket_00000 
{"operation": 0, "originalTransaction": 3, "bucket": 536870912, "rowId": 0, "currentTransaction": 3, "row": {"a": 3, "b": 3, "c": 1001}}
{"operation": 0, "originalTransaction": 3, "bucket": 536870912, "rowId": 1, "currentTransaction": 3, "row": {"a": 4, "b": 2, "c": 1003}}
{"operation": 0, "originalTransaction": 3, "bucket": 536870912, "rowId": 2, "currentTransaction": 3, "row": {"a": 4, "b": 4, "c": 1005}}

vgumashta:hive vgumashta$ ../orc-git/build/_CPack_Packages/Darwin/TGZ/ORC-1.5.4-Darwin/bin/orc-contents  /Users/vgumashta/Documents/workspace/hive/itests/hive-unit/target/tmp/org.apache.hadoop.hive.ql.txn.compactor.TestCrudCompactorOnTez-1549010487358_1258013156/warehouse/testcompactionwithschemaevolutionnobucketsmultiplereducers/ds\=yesterday/delta_0000003_0000003_0000/bucket_00001 
{"operation": 0, "originalTransaction": 3, "bucket": 536936448, "rowId": 0, "currentTransaction": 3, "row": {"a": 3, "b": 2, "c": 1000}}
{"operation": 0, "originalTransaction": 3, "bucket": 536936448, "rowId": 1, "currentTransaction": 3, "row": {"a": 3, "b": 4, "c": 1002}}
{"operation": 0, "originalTransaction": 3, "bucket": 536936448, "rowId": 2, "currentTransaction": 3, "row": {"a": 4, "b": 3, "c": 1004}}


- Vaibhav


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review212399
-----------------------------------------------------------


On Jan. 28, 2019, 7:49 p.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2019, 7:49 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
>     https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b3a475478d 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java e7aa041c25 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java db3b427adc 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out c9716e904c 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/9/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Eugene Koifman <ek...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review212399
-----------------------------------------------------------




itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 299 (patched)
<https://reviews.apache.org/r/69367/#comment298161>

    testMoreBucketsThanReducers/testMoreBucketsThanReducers2 in TestTxnCommands force a specific number of reducers



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 185 (patched)
<https://reviews.apache.org/r/69367/#comment298162>

    nit: since this is filtering for 'base' it's not checking if it 'only' contains base...



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 195 (patched)
<https://reviews.apache.org/r/69367/#comment298163>

    I still don't understand what this comment is conveying.  This is just a normal read, so I would assume TezSplitGrouper is not running in compactor mode


- Eugene Koifman


On Jan. 28, 2019, 11:49 a.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2019, 11:49 a.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
>     https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b3a475478d 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java e7aa041c25 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java db3b427adc 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out c9716e904c 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/9/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/
-----------------------------------------------------------

(Updated Jan. 28, 2019, 7:49 p.m.)


Review request for hive and Eugene Koifman.


Bugs: HIVE-20699
    https://issues.apache.org/jira/browse/HIVE-20699


Repository: hive-git


Description
-------

https://jira.apache.org/jira/browse/HIVE-20699


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b3a475478d 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java e7aa041c25 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java db3b427adc 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 
  ql/src/test/results/clientpositive/show_functions.q.out c9716e904c 


Diff: https://reviews.apache.org/r/69367/diff/9/

Changes: https://reviews.apache.org/r/69367/diff/8-9/


Testing
-------


Thanks,

Vaibhav Gumashta


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/
-----------------------------------------------------------

(Updated Jan. 26, 2019, 12:32 a.m.)


Review request for hive and Eugene Koifman.


Bugs: HIVE-20699
    https://issues.apache.org/jira/browse/HIVE-20699


Repository: hive-git


Description
-------

https://jira.apache.org/jira/browse/HIVE-20699


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b3a475478d 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java e7aa041c25 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java db3b427adc 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 
  ql/src/test/results/clientpositive/show_functions.q.out c9716e904c 


Diff: https://reviews.apache.org/r/69367/diff/8/

Changes: https://reviews.apache.org/r/69367/diff/7-8/


Testing
-------


Thanks,

Vaibhav Gumashta


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.

> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
> > Lines 196 (patched)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121174#file2121174line196>
> >
> >     I don't understand the logic here.  Since major compaction was done above, there should only be base/bucket0 and base/bucket1 so there is nothing for this query to group.  Also, I would think SPLIT_GROUPING_MODE should be "query" here...  if it's not, where is it set?

Sorry the comment is misleading (what I wanted to convey was what you wrote above) - removed. We set SPLIT_GROUPING_MODE inside CompactorMR, if COMPACTOR_CRUD_QUERY_BASED is set to true. I've modified runCompaction, runInitiator, runWorker methods to create a new HiveConf object and set COMPACTOR_CRUD_QUERY_BASED to true in that one.


> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
> > Lines 255 (patched)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121174#file2121174line255>
> >
> >     nit: this could just do ShowCompactions to see if anything got queued up

I think you are right, this is redundant as I'm already checking for compaction queue above.


> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
> > Lines 309 (patched)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121174#file2121174line309>
> >
> >     How does (3,3,x) end up in bucket0?  with bucketing_version=1 it should be (val mod num_buckets)=bucketId.

Filed https://issues.apache.org/jira/browse/HIVE-21167. For the test cases in this jira, will use bucketing_version=2.


> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
> > Lines 312 (patched)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121174#file2121174line312>
> >
> >     And smimilarly, (4,4) is in bucket1...

Filed https://issues.apache.org/jira/browse/HIVE-21167. For the test cases in this jira, will use bucketing_version=2.


> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
> > Lines 315 (patched)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121174#file2121174line315>
> >
> >     since you just ran a major compaction, there is only 1 file per bucket so would split grouper do anything?  would there be > 1 split?

Sorry, my comment was misleading - removed.


> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java
> > Lines 189 (patched)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121176#file2121176line189>
> >
> >     What is this for?  It seems fragile since it forces some behavior on all tests.  Do any newly added tests rely on this?

This is a bug that I'm masking for purposes of test (similar to the bug reported here: https://issues.apache.org/jira/browse/HIVE-12957). Basically on my laptop Tez was not estimating taskResource (taskResource = getContext().getVertexTaskResource().getMemory();) correctly - coming up with a -ve value, which would cause it to throw Illegal Capacity exception. I think taskResource should never return -ve value.


> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java
> > Line 1233 (original)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121178#file2121178line1233>
> >
> >     Seems that now the class level JavaDoc is out of sync

You mean the overall class doc for OrcRawRecordMerger needs an update?


> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java
> > Lines 637 (patched)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121179#file2121179line637>
> >
> >     What throws the IAE?  Above I see
> >     if (!reader.hasMetadataValue(OrcRecordUpdater.ACID_KEY_INDEX_NAME)) {
> >     
> >     shouldn't it bail out there if there is no index?

I don't know why I was seeing this earlier - don't see it in my runs now. Removed.


> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java
> > Lines 638 (patched)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121179#file2121179line638>
> >
> >     is there a followup Jira for this?

https://jira.apache.org/jira/browse/HIVE-21165


> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 248 (patched)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121182#file2121182line248>
> >
> >     What does this do for MM table?

Hmmmm, bug - fixed it.


> On Jan. 23, 2019, 1:23 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
> > Lines 91 (patched)
> > <https://reviews.apache.org/r/69367/diff/7/?file=2121184#file2121184line91>
> >
> >     when is it ok for 2 consecutive ROW_IDs to be equal?

Throwing an exception now if comparison returs 0.


- Vaibhav


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review212198
-----------------------------------------------------------


On Jan. 22, 2019, 7:04 a.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> -----------------------------------------------------------
> 
> (Updated Jan. 22, 2019, 7:04 a.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
>     https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0e5b3e5473 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/7/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Eugene Koifman <ek...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review212198
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2702 (patched)
<https://reviews.apache.org/r/69367/#comment297881>

    I think this needs a better description



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java
Lines 848 (patched)
<https://reviews.apache.org/r/69367/#comment297879>

    Why is this needed?  Shouldn't the compiler set this?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 132 (patched)
<https://reviews.apache.org/r/69367/#comment297882>

    what is orc.rows.between.memory.checks'='1 for?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 196 (patched)
<https://reviews.apache.org/r/69367/#comment297884>

    I don't understand the logic here.  Since major compaction was done above, there should only be base/bucket0 and base/bucket1 so there is nothing for this query to group.  Also, I would think SPLIT_GROUPING_MODE should be "query" here...  if it's not, where is it set?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 255 (patched)
<https://reviews.apache.org/r/69367/#comment297885>

    nit: this could just do ShowCompactions to see if anything got queued up



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 309 (patched)
<https://reviews.apache.org/r/69367/#comment297886>

    How does (3,3,x) end up in bucket0?  with bucketing_version=1 it should be (val mod num_buckets)=bucketId.



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 312 (patched)
<https://reviews.apache.org/r/69367/#comment297887>

    And smimilarly, (4,4) is in bucket1...



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 315 (patched)
<https://reviews.apache.org/r/69367/#comment297888>

    since you just ran a major compaction, there is only 1 file per bucket so would split grouper do anything?  would there be > 1 split?



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 325 (patched)
<https://reviews.apache.org/r/69367/#comment297889>

    It would be useful to add a test of a table w/o buckets.  Ideally one that has > 1 reducer during Insert so that there is > 1 output file.  I think there is some propety to specify number of reducers...  not sure if Tez respects it.



itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java
Lines 328 (patched)
<https://reviews.apache.org/r/69367/#comment297883>

    should this be set in setUp()?
    Alternatively, should conf be cloned?  seems error prone as it modifies state outside the method



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java
Lines 189 (patched)
<https://reviews.apache.org/r/69367/#comment297890>

    What is this for?  It seems fragile since it forces some behavior on all tests.  Do any newly added tests rely on this?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Line 168 (original), 172 (patched)
<https://reviews.apache.org/r/69367/#comment297900>

    nit: are empty param decls needed?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 202 (patched)
<https://reviews.apache.org/r/69367/#comment297905>

    It should be rowidoffset or splitstart.  For 'original' splits (w/o acid meta cols in the file) SyntheticBucketProperties should always be there and so rowIdOffset is there.  For 'native' acid files, OrcSplit doesn't have the 1st rowid in the split so splitStart is used to sort.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 203 (patched)
<https://reviews.apache.org/r/69367/#comment297906>

    Would be useful to describe what that invariant is.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 241 (patched)
<https://reviews.apache.org/r/69367/#comment297907>

    This is important to add



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 281 (patched)
<https://reviews.apache.org/r/69367/#comment297908>

    what is this TODO for?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java
Line 1233 (original)
<https://reviews.apache.org/r/69367/#comment297896>

    Seems that now the class level JavaDoc is out of sync



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java
Lines 637 (patched)
<https://reviews.apache.org/r/69367/#comment297891>

    What throws the IAE?  Above I see
    if (!reader.hasMetadataValue(OrcRecordUpdater.ACID_KEY_INDEX_NAME)) {
    
    shouldn't it bail out there if there is no index?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java
Lines 638 (patched)
<https://reviews.apache.org/r/69367/#comment297892>

    is there a followup Jira for this?



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
Lines 2201 (patched)
<https://reviews.apache.org/r/69367/#comment297893>

    it would be helpful to add COMPACTOR_CRUD_QUERY_BASED property name to the error msg



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 248 (patched)
<https://reviews.apache.org/r/69367/#comment297919>

    What does this do for MM table?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 371 (patched)
<https://reviews.apache.org/r/69367/#comment297922>

    should 'conf' be cloned?  will this affect 'conf' for something else?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 537 (patched)
<https://reviews.apache.org/r/69367/#comment297923>

    why does it need "0+ ..."



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java
Lines 101 (patched)
<https://reviews.apache.org/r/69367/#comment297895>

    Useful to include table/part name in the msg



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 17 (patched)
<https://reviews.apache.org/r/69367/#comment297912>

    ROW__ID.bucket_column - you mean bucketId?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 57 (patched)
<https://reviews.apache.org/r/69367/#comment297915>

    This doesn't compare statemetId anywhere but it should.
    
    I think the easiest is to compare bucketProperty or you could extract statemetId from it and do it explicitly



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 60 (patched)
<https://reviews.apache.org/r/69367/#comment297917>

    I don't think equals makes sense



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 61 (patched)
<https://reviews.apache.org/r/69367/#comment297918>

    it maybe useful to include both ROW__IDs in the message.



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 74 (patched)
<https://reviews.apache.org/r/69367/#comment297916>

    nit: make class and fields final to make sure compareTo is inlined?



ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java
Lines 91 (patched)
<https://reviews.apache.org/r/69367/#comment297914>

    when is it ok for 2 consecutive ROW_IDs to be equal?


- Eugene Koifman


On Jan. 21, 2019, 11:04 p.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> -----------------------------------------------------------
> 
> (Updated Jan. 21, 2019, 11:04 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
>     https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0e5b3e5473 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 
>   ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/7/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/
-----------------------------------------------------------

(Updated Jan. 22, 2019, 7:04 a.m.)


Review request for hive and Eugene Koifman.


Bugs: HIVE-20699
    https://issues.apache.org/jira/browse/HIVE-20699


Repository: hive-git


Description
-------

https://jira.apache.org/jira/browse/HIVE-20699


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0e5b3e5473 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 
  ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f 


Diff: https://reviews.apache.org/r/69367/diff/7/

Changes: https://reviews.apache.org/r/69367/diff/6-7/


Testing
-------


Thanks,

Vaibhav Gumashta


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/
-----------------------------------------------------------

(Updated Jan. 18, 2019, 10:41 p.m.)


Review request for hive and Eugene Koifman.


Changes
-------

Rebased on master


Bugs: HIVE-20699
    https://issues.apache.org/jira/browse/HIVE-20699


Repository: hive-git


Description
-------

https://jira.apache.org/jira/browse/HIVE-20699


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0e5b3e5473 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 
  ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f 


Diff: https://reviews.apache.org/r/69367/diff/6/

Changes: https://reviews.apache.org/r/69367/diff/5-6/


Testing
-------


Thanks,

Vaibhav Gumashta


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/
-----------------------------------------------------------

(Updated Jan. 18, 2019, 9:40 p.m.)


Review request for hive and Eugene Koifman.


Changes
-------

Rebased on master


Bugs: HIVE-20699
    https://issues.apache.org/jira/browse/HIVE-20699


Repository: hive-git


Description
-------

https://jira.apache.org/jira/browse/HIVE-20699


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java bbe7fb0697 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 0e5b3e5473 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java dc05e1990e 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 
  ql/src/test/results/clientpositive/show_functions.q.out 0fdcbda66f 


Diff: https://reviews.apache.org/r/69367/diff/5/

Changes: https://reviews.apache.org/r/69367/diff/4-5/


Testing
-------


Thanks,

Vaibhav Gumashta


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/
-----------------------------------------------------------

(Updated Jan. 10, 2019, 7:23 p.m.)


Review request for hive and Eugene Koifman.


Bugs: HIVE-20699
    https://issues.apache.org/jira/browse/HIVE-20699


Repository: hive-git


Description
-------

https://jira.apache.org/jira/browse/HIVE-20699


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java b213609f39 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java d6a41919bf 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCrudCompactorOnTez.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java d7f069eaa7 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java fbb931cbcd 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRecordUpdater.java 6d4578e7a0 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java b477480e0a 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 7d5ee4a59f 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java a0df82cb20 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 


Diff: https://reviews.apache.org/r/69367/diff/4/

Changes: https://reviews.apache.org/r/69367/diff/3-4/


Testing
-------


Thanks,

Vaibhav Gumashta


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/
-----------------------------------------------------------

(Updated Jan. 8, 2019, 7:18 p.m.)


Review request for hive and Eugene Koifman.


Bugs: HIVE-20699
    https://issues.apache.org/jira/browse/HIVE-20699


Repository: hive-git


Description
-------

https://jira.apache.org/jira/browse/HIVE-20699


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 40dd992455 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/txn/compactor/TestCompactorOnTez.java PRE-CREATION 
  pom.xml 26b662e4c3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 578b16cc7c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/HiveSplitGenerator.java 15c14c9be5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 8cabf960db 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 6e7c78bd17 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 92c74e1d06 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/Initiator.java beb6902d7d 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 


Diff: https://reviews.apache.org/r/69367/diff/3/

Changes: https://reviews.apache.org/r/69367/diff/2-3/


Testing
-------


Thanks,

Vaibhav Gumashta


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/
-----------------------------------------------------------

(Updated Nov. 19, 2018, 11:49 a.m.)


Review request for hive and Eugene Koifman.


Bugs: HIVE-20699
    https://issues.apache.org/jira/browse/HIVE-20699


Repository: hive-git


Description
-------

https://jira.apache.org/jira/browse/HIVE-20699


Diffs (updated)
-----

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f 
  itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 40dd992455 
  pom.xml 26b662e4c3 
  ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java 578b16cc7c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcRawRecordMerger.java 8cabf960db 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 6e7c78bd17 
  ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 92c74e1d06 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFValidateAcidSortOrder.java PRE-CREATION 


Diff: https://reviews.apache.org/r/69367/diff/2/

Changes: https://reviews.apache.org/r/69367/diff/1-2/


Testing
-------


Thanks,

Vaibhav Gumashta


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.

> On Nov. 16, 2018, 3:02 a.m., Eugene Koifman wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 2685 (patched)
> > <https://reviews.apache.org/r/69367/diff/1/?file=2108425#file2108425line2685>
> >
> >     "And minor compaction will be disabled." - should make sure Initiator doesn't start minor and that Alter Table commands requesting Minor are no-op or throw so that these don't get into the compactor queue.  We should also, perhaps think about how Initiator triggers Major compactions - are current config params adequate?  Should do at least the 2nd part in a follow up jira, maybe both.
> 
> Vaibhav Gumashta wrote:
>     Created https://jira.apache.org/jira/browse/HIVE-20933

NM, addressing this as part of this jira itself as it's a small change and leaving this out might result in garbage data getting accumulated in metastore table


- Vaibhav


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review210589
-----------------------------------------------------------


On Nov. 16, 2018, 12:59 a.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> -----------------------------------------------------------
> 
> (Updated Nov. 16, 2018, 12:59 a.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
>     https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 40dd992455 
>   pom.xml 26b662e4c3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 92c74e1d06 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Vaibhav Gumashta <vg...@hortonworks.com>.

> On Nov. 16, 2018, 3:02 a.m., Eugene Koifman wrote:
> >

Addressed the comments; will upload changes in the patch.


> On Nov. 16, 2018, 3:02 a.m., Eugene Koifman wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
> > Lines 2685 (patched)
> > <https://reviews.apache.org/r/69367/diff/1/?file=2108425#file2108425line2685>
> >
> >     "And minor compaction will be disabled." - should make sure Initiator doesn't start minor and that Alter Table commands requesting Minor are no-op or throw so that these don't get into the compactor queue.  We should also, perhaps think about how Initiator triggers Major compactions - are current config params adequate?  Should do at least the 2nd part in a follow up jira, maybe both.

Created https://jira.apache.org/jira/browse/HIVE-20933


> On Nov. 16, 2018, 3:02 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 245 (patched)
> > <https://reviews.apache.org/r/69367/diff/1/?file=2108430#file2108430line252>
> >
> >     Ideally this should be prevented before it gets into the compction_queue. throwing here will cause failed compactions to accumulate in SHOW COMPACTIONS and prevent auto-scheduling of new ones.

Created https://jira.apache.org/jira/browse/HIVE-20933 for this.


> On Nov. 16, 2018, 3:02 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 399 (patched)
> > <https://reviews.apache.org/r/69367/diff/1/?file=2108430#file2108430line406>
> >
> >     should this be in a finally{}?  SessionState is threadLocal so it may get reused... or do we shutdown the session each time?

Made the change. We do remove the threadlocal via SessionState.detachSession() in the finally block in DriverUtils.runOnDriver


> On Nov. 16, 2018, 3:02 a.m., Eugene Koifman wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
> > Lines 513 (patched)
> > <https://reviews.apache.org/r/69367/diff/1/?file=2108430#file2108430line521>
> >
> >     why do you need partition key/values in the query? we are always reading a single partition.  This is achieved by getAcidState() which takes partition dir as input (i.e. all the files it returns are within a given partition)

As discussed, I'll throw an exeption from SplitGrouper if we somehow end up getting splits across partitions in there (unlikely, since the sql query we run adds a partition filter).


- Vaibhav


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review210589
-----------------------------------------------------------


On Nov. 16, 2018, 12:59 a.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> -----------------------------------------------------------
> 
> (Updated Nov. 16, 2018, 12:59 a.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
>     https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 40dd992455 
>   pom.xml 26b662e4c3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 92c74e1d06 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Re: Review Request 69367: Query based compactor for full CRUD Acid tables

Posted by Eugene Koifman <ek...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/69367/#review210589
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
Lines 2685 (patched)
<https://reviews.apache.org/r/69367/#comment295290>

    "And minor compaction will be disabled." - should make sure Initiator doesn't start minor and that Alter Table commands requesting Minor are no-op or throw so that these don't get into the compactor queue.  We should also, perhaps think about how Initiator triggers Major compactions - are current config params adequate?  Should do at least the 2nd part in a follow up jira, maybe both.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Line 180 (original), 183 (patched)
<https://reviews.apache.org/r/69367/#comment295291>

    I guess all this should be no-op for compactor since it only looks at 1 partition at a time and for acid serde and IF/OF don't change.



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 197 (patched)
<https://reviews.apache.org/r/69367/#comment295292>

    bucketSplitMultiMap?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 206 (patched)
<https://reviews.apache.org/r/69367/#comment295293>

    the error should include table name if easily available here or if not maybe a file path from any of the splits...



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 214 (patched)
<https://reviews.apache.org/r/69367/#comment295294>

    should we assert that schemaSplitMultiMap has size=1 since that is what we expect for compactor?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java
Lines 276 (patched)
<https://reviews.apache.org/r/69367/#comment295295>

    Add a comment that this is trully a bucketId (rather than bucket property - BucketCodec.java since 3.0) that is derived from file name
    
    WriteId is also from containing file name and for files that have min/max wrieid, it's the starting one.  Now that I look at the code in TransactionMetadata.findWriteIDForSynthetcRowIDs() - the assert there will throw.  It should be removed since where we have to handle files that come from compacted dirs so min <> max for all deltas.
    
    maybe these comments should be on OrcSplit where getter methods are defined.



ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java
Lines 68 (patched)
<https://reviews.apache.org/r/69367/#comment295296>

    mark these transient for clarity since we don't serialize them



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 245 (patched)
<https://reviews.apache.org/r/69367/#comment295297>

    Ideally this should be prevented before it gets into the compction_queue. throwing here will cause failed compactions to accumulate in SHOW COMPACTIONS and prevent auto-scheduling of new ones.



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 399 (patched)
<https://reviews.apache.org/r/69367/#comment295298>

    should this be in a finally{}?  SessionState is threadLocal so it may get reused... or do we shutdown the session each time?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 481 (patched)
<https://reviews.apache.org/r/69367/#comment295299>

    current write id should always be the same as original.  Only delete event can have these be different but major compaction absorbs delete events.



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 503 (patched)
<https://reviews.apache.org/r/69367/#comment295300>

    what's the value of specifying location for tmp table?  I'm surprised it's even legal.  Would this be a security hole potentially?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 510 (patched)
<https://reviews.apache.org/r/69367/#comment295302>

    why overwrite?



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 513 (patched)
<https://reviews.apache.org/r/69367/#comment295301>

    why do you need partition key/values in the query? we are always reading a single partition.  This is achieved by getAcidState() which takes partition dir as input (i.e. all the files it returns are within a given partition)



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 542 (patched)
<https://reviews.apache.org/r/69367/#comment295303>

    need to think about this.  maybe it's ok...



ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java
Lines 565 (patched)
<https://reviews.apache.org/r/69367/#comment295304>

    there should be something in AcidUtils to parse original bucket file name


- Eugene Koifman


On Nov. 15, 2018, 4:59 p.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/69367/
> -----------------------------------------------------------
> 
> (Updated Nov. 15, 2018, 4:59 p.m.)
> 
> 
> Review request for hive and Eugene Koifman.
> 
> 
> Bugs: HIVE-20699
>     https://issues.apache.org/jira/browse/HIVE-20699
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://jira.apache.org/jira/browse/HIVE-20699
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 65264f323f 
>   itests/hive-unit/src/test/java/org/apache/hadoop/hive/ql/TestAcidOnTez.java 40dd992455 
>   pom.xml 26b662e4c3 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/SplitGrouper.java 7f8bd229a6 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcSplit.java 4d55592b63 
>   ql/src/java/org/apache/hadoop/hive/ql/txn/compactor/CompactorMR.java 92c74e1d06 
> 
> 
> Diff: https://reviews.apache.org/r/69367/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>