You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Sergio Pena <se...@cloudera.com> on 2016/12/05 21:56:39 UTC

Review Request 54393: HIVE-15361: INSERT dynamic partition on S3 fails with a MoveTask failure

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54393/
-----------------------------------------------------------

Review request for hive.


Bugs: HIVE-15361
    https://issues.apache.org/jira/browse/HIVE-15361


Repository: hive-git


Description
-------

Problem:
- DynamicPartitionCtx and ListBucketingCtx objects weren't set on the new MoveWork created when merging the two MoveWork objects from the ConditionalTask.

Solution
- Set the DynamicPartitionCtx and ListBucketingCtx objects to the new MoveWork created for the S3 optimization.

Other changes
- Merge the MoveWork objects inside the createCondTask() method for better error handling.
- Only merge the MoveWork related to the moveOnlyMoveTask. The MoveWork from the mergeAndMoveMoveTask may cause other issues that are not correctly tested.
- Two new private methods are added to check and merge the conditional input/output tasks to the linked MoveWork.


Diffs
-----

  itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q PRE-CREATION 
  itests/hive-blobstore/src/test/queries/clientpositive/insert_into_table.q 25e2e7007ff539223d9244ca9822aa65d1441eb0 
  itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q PRE-CREATION 
  itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_table.q 846b2b113f09a74a3f05c13ffb56163e81dc1e8e 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out PRE-CREATION 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out fbb52c132a331aefe870264e035c397078f3c82e 
  itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out 9f575a66ecefc3933b16dff554bdcc1c1f6420ee 
  itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out PRE-CREATION 
  itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out c725c96cbb6b0374e67308a54204c7c25e827567 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java adc1188f09c8019a8aa60403d5813d6fa4509ceb 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadDesc.java bcd3125ab4ad20c00fec565e5004ee200c0187d5 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 9a868a04ce93d5c2ee75b5c6e96a1401cea93133 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 771a919ccd0bd75fe6197299ae057647ece89a7e 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 9f498c7fb88a7a9f77b8c6739c097a2b26b0c617 
  ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestGenMapRedUtilsCreateConditionalTask.java e6ec44504685bd9e53f158cc359b8a7b79fd0166 

Diff: https://reviews.apache.org/r/54393/diff/


Testing
-------

All itests/hive-blobstore tests run.

Added new blobstore tests:
- insert_into_dynamic_partitions.q
- insert_overwrite_dynamic_partitions.q

Waiting for HiveQA to run the rest of the q-tests.


Thanks,

Sergio Pena


Re: Review Request 54393: HIVE-15361: INSERT dynamic partition on S3 fails with a MoveTask failure

Posted by Sergio Pena <se...@cloudera.com>.

> On Dec. 5, 2016, 11:11 p.m., Illya Yalovyy wrote:
> > itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q, lines 15-16
> > <https://reviews.apache.org/r/54393/diff/1/?file=1576929#file1576929line15>
> >
> >     Why explicit ADD PARTITION statement is required? I think insert into will create missing partitions.

That was part of the test I did when I found the issue. But it is not needed. I will remove it just to keep clarity on the type of testing.


> On Dec. 5, 2016, 11:11 p.m., Illya Yalovyy wrote:
> > itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q, line 25
> > <https://reviews.apache.org/r/54393/diff/1/?file=1576931#file1576931line25>
> >
> >     Will "SHOW PARTITIONS" be a good validation in this case?

That's a good one. I'll add it.


> On Dec. 5, 2016, 11:11 p.m., Illya Yalovyy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, line 1763
> > <https://reviews.apache.org/r/54393/diff/1/?file=1576938#file1576938line1763>
> >
> >     Should we take into account HIVE_BLOBSTORE_USE_BLOBSTORE_AS_SCRATCHDIR?

It's not necessary. This is just an optimization when two MoveTask are on BlobStore. The condition in the method verifies that.

return condOutputPath.equals(linkedSourcePath)
        && BlobStorageUtils.isBlobStoragePath(conf, condInputPath)
        && BlobStorageUtils.isBlobStoragePath(conf, linkedTargetPath);
        
If the user has HIVE_BLOBSTORE_USE_BLOBSTORE_AS_SCRATCHDIR=false, then the condInputPath might be on HDFS, and the merge won't happen.


> On Dec. 5, 2016, 11:11 p.m., Illya Yalovyy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, line 1784
> > <https://reviews.apache.org/r/54393/diff/1/?file=1576938#file1576938line1784>
> >
> >     Just note: s3 to s3 copy is *more* efficient than hdfs to s3 copy.

This is only merging two MoveWorks into one, so the hdfs to s3 copy does not apply here I think. I changed the comment thought:
* This is an optimization for BlobStore systems to avoid doing two renames or copies that are not necessary.


> On Dec. 5, 2016, 11:11 p.m., Illya Yalovyy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, lines 1885-1894
> > <https://reviews.apache.org/r/54393/diff/1/?file=1576938#file1576938line1885>
> >
> >     I feel like this logic should be inside "addDependentMoveTasks" method.

It is confusing. The "addDependentMoveTasks" is used to link a MoveTask to the desired task. 
For instance: addDependentMoveTasks(moveTaskToLink, conf, moveOnlyMoveTask, dependencyTask);
The above method will link: 
  moveOnlyMoveTask -> moveTaskToLink -> otherChildTasks
  
  
But I don't want to do that with the optimization, I want to copy the moveTaskToLink child tasks to moveOnlyMoveTask instead.
Like:
  moveOnlyMoveTask -> otherChildTasks


On Dec. 5, 2016, 11:11 p.m., Sergio Pena wrote:
> > Could you also add a test with a strict dynamic partitioning, like:
> > INSERT OVERWRITE TABLE t2 PARTITION(p1="1", p2) SELECT \*, c1 AS p2 FROM t1;

Thanks. I will add it.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54393/#review158069
-----------------------------------------------------------


On Dec. 5, 2016, 9:56 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54393/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2016, 9:56 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-15361
>     https://issues.apache.org/jira/browse/HIVE-15361
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> - DynamicPartitionCtx and ListBucketingCtx objects weren't set on the new MoveWork created when merging the two MoveWork objects from the ConditionalTask.
> 
> Solution
> - Set the DynamicPartitionCtx and ListBucketingCtx objects to the new MoveWork created for the S3 optimization.
> 
> Other changes
> - Merge the MoveWork objects inside the createCondTask() method for better error handling.
> - Only merge the MoveWork related to the moveOnlyMoveTask. The MoveWork from the mergeAndMoveMoveTask may cause other issues that are not correctly tested.
> - Two new private methods are added to check and merge the conditional input/output tasks to the linked MoveWork.
> 
> 
> Diffs
> -----
> 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_table.q 25e2e7007ff539223d9244ca9822aa65d1441eb0 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_table.q 846b2b113f09a74a3f05c13ffb56163e81dc1e8e 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out fbb52c132a331aefe870264e035c397078f3c82e 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out 9f575a66ecefc3933b16dff554bdcc1c1f6420ee 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out c725c96cbb6b0374e67308a54204c7c25e827567 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java adc1188f09c8019a8aa60403d5813d6fa4509ceb 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadDesc.java bcd3125ab4ad20c00fec565e5004ee200c0187d5 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 9a868a04ce93d5c2ee75b5c6e96a1401cea93133 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 771a919ccd0bd75fe6197299ae057647ece89a7e 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 9f498c7fb88a7a9f77b8c6739c097a2b26b0c617 
>   ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestGenMapRedUtilsCreateConditionalTask.java e6ec44504685bd9e53f158cc359b8a7b79fd0166 
> 
> Diff: https://reviews.apache.org/r/54393/diff/
> 
> 
> Testing
> -------
> 
> All itests/hive-blobstore tests run.
> 
> Added new blobstore tests:
> - insert_into_dynamic_partitions.q
> - insert_overwrite_dynamic_partitions.q
> 
> Waiting for HiveQA to run the rest of the q-tests.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 54393: HIVE-15361: INSERT dynamic partition on S3 fails with a MoveTask failure

Posted by Sergio Pena <se...@cloudera.com>.

> On Dec. 5, 2016, 11:11 p.m., Illya Yalovyy wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, line 1760
> > <https://reviews.apache.org/r/54393/diff/1/?file=1576938#file1576938line1760>
> >
> >     Could this method be made protected and covered with unit tests?

I added some unit-tests to this method. 
I did not add the ones that have a LoadTableWork on the MoveWork because a static method is used in order to get the table path. Nevertheless, this path is covered by the q-tests where real tables are created on S3.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54393/#review158069
-----------------------------------------------------------


On Dec. 6, 2016, 8:13 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54393/
> -----------------------------------------------------------
> 
> (Updated Dec. 6, 2016, 8:13 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-15361
>     https://issues.apache.org/jira/browse/HIVE-15361
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> - DynamicPartitionCtx and ListBucketingCtx objects weren't set on the new MoveWork created when merging the two MoveWork objects from the ConditionalTask.
> 
> Solution
> - Set the DynamicPartitionCtx and ListBucketingCtx objects to the new MoveWork created for the S3 optimization.
> 
> Other changes
> - Merge the MoveWork objects inside the createCondTask() method for better error handling.
> - Only merge the MoveWork related to the moveOnlyMoveTask. The MoveWork from the mergeAndMoveMoveTask may cause other issues that are not correctly tested.
> - Two new private methods are added to check and merge the conditional input/output tasks to the linked MoveWork.
> 
> 
> Diffs
> -----
> 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_table.q 25e2e7007ff539223d9244ca9822aa65d1441eb0 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_table.q 846b2b113f09a74a3f05c13ffb56163e81dc1e8e 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out fbb52c132a331aefe870264e035c397078f3c82e 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out 9f575a66ecefc3933b16dff554bdcc1c1f6420ee 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out c725c96cbb6b0374e67308a54204c7c25e827567 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java adc1188f09c8019a8aa60403d5813d6fa4509ceb 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadDesc.java bcd3125ab4ad20c00fec565e5004ee200c0187d5 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 9a868a04ce93d5c2ee75b5c6e96a1401cea93133 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 771a919ccd0bd75fe6197299ae057647ece89a7e 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 9f498c7fb88a7a9f77b8c6739c097a2b26b0c617 
>   ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestGenMapRedUtilsCreateConditionalTask.java e6ec44504685bd9e53f158cc359b8a7b79fd0166 
> 
> Diff: https://reviews.apache.org/r/54393/diff/
> 
> 
> Testing
> -------
> 
> All itests/hive-blobstore tests run.
> 
> Added new blobstore tests:
> - insert_into_dynamic_partitions.q
> - insert_overwrite_dynamic_partitions.q
> 
> Waiting for HiveQA to run the rest of the q-tests.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 54393: HIVE-15361: INSERT dynamic partition on S3 fails with a MoveTask failure

Posted by Illya Yalovyy <ya...@amazon.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54393/#review158069
-----------------------------------------------------------




itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q (lines 15 - 16)
<https://reviews.apache.org/r/54393/#comment228742>

    Why explicit ADD PARTITION statement is required? I think insert into will create missing partitions.



itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q (line 18)
<https://reviews.apache.org/r/54393/#comment228741>

    Style: indentation is not required.



itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q (line 24)
<https://reviews.apache.org/r/54393/#comment228744>

    Style: table -> TABLE. It is better to keep code style consistent across all tests.



itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q (line 25)
<https://reviews.apache.org/r/54393/#comment228748>

    Will "SHOW PARTITIONS" be a good validation in this case?



itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q (line 27)
<https://reviews.apache.org/r/54393/#comment228745>

    Style: table -> TABLE



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (line 1636)
<https://reviews.apache.org/r/54393/#comment228753>

    Terminology: BlobStorage -> BlobStore



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (line 1640)
<https://reviews.apache.org/r/54393/#comment228750>

    Syntax: tha -> that



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (line 1644)
<https://reviews.apache.org/r/54393/#comment228751>

    Could this method be made protected and covered with unit tests?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (line 1647)
<https://reviews.apache.org/r/54393/#comment228754>

    Should we take into account HIVE_BLOBSTORE_USE_BLOBSTORE_AS_SCRATCHDIR?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (line 1668)
<https://reviews.apache.org/r/54393/#comment228756>

    Just note: s3 to s3 copy is *more* efficient than hdfs to s3 copy.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (line 1686)
<https://reviews.apache.org/r/54393/#comment228761>

    It seems to be an impossible outcome. Can we throw an exception instead of returning null?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (lines 1769 - 1778)
<https://reviews.apache.org/r/54393/#comment228766>

    I feel like this logic should be inside "addDependentMoveTasks" method.



ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java (line 45)
<https://reviews.apache.org/r/54393/#comment228767>

    you don't have to call getters here.



ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java (line 52)
<https://reviews.apache.org/r/54393/#comment228768>

    Don't have to call getters here.


Could you also add a test with a strict dynamic partitioning, like:
INSERT OVERWRITE TABLE t2 PARTITION(p1="1", p2) SELECT \*, c1 AS p2 FROM t1;

- Illya Yalovyy


On Dec. 5, 2016, 9:56 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54393/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2016, 9:56 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-15361
>     https://issues.apache.org/jira/browse/HIVE-15361
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> - DynamicPartitionCtx and ListBucketingCtx objects weren't set on the new MoveWork created when merging the two MoveWork objects from the ConditionalTask.
> 
> Solution
> - Set the DynamicPartitionCtx and ListBucketingCtx objects to the new MoveWork created for the S3 optimization.
> 
> Other changes
> - Merge the MoveWork objects inside the createCondTask() method for better error handling.
> - Only merge the MoveWork related to the moveOnlyMoveTask. The MoveWork from the mergeAndMoveMoveTask may cause other issues that are not correctly tested.
> - Two new private methods are added to check and merge the conditional input/output tasks to the linked MoveWork.
> 
> 
> Diffs
> -----
> 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_table.q 25e2e7007ff539223d9244ca9822aa65d1441eb0 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_table.q 846b2b113f09a74a3f05c13ffb56163e81dc1e8e 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out fbb52c132a331aefe870264e035c397078f3c82e 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out 9f575a66ecefc3933b16dff554bdcc1c1f6420ee 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out c725c96cbb6b0374e67308a54204c7c25e827567 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java adc1188f09c8019a8aa60403d5813d6fa4509ceb 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadDesc.java bcd3125ab4ad20c00fec565e5004ee200c0187d5 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 9a868a04ce93d5c2ee75b5c6e96a1401cea93133 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 771a919ccd0bd75fe6197299ae057647ece89a7e 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 9f498c7fb88a7a9f77b8c6739c097a2b26b0c617 
>   ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestGenMapRedUtilsCreateConditionalTask.java e6ec44504685bd9e53f158cc359b8a7b79fd0166 
> 
> Diff: https://reviews.apache.org/r/54393/diff/
> 
> 
> Testing
> -------
> 
> All itests/hive-blobstore tests run.
> 
> Added new blobstore tests:
> - insert_into_dynamic_partitions.q
> - insert_overwrite_dynamic_partitions.q
> 
> Waiting for HiveQA to run the rest of the q-tests.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 54393: HIVE-15361: INSERT dynamic partition on S3 fails with a MoveTask failure

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54393/
-----------------------------------------------------------

(Updated Dec. 7, 2016, 5:02 p.m.)


Review request for hive.


Changes
-------

Updated file that removes the following code:

    try {
      Hive hive = Hive.get(conf);
    } catch (HiveException e) {
    }
    
It is unnecessary.


Bugs: HIVE-15361
    https://issues.apache.org/jira/browse/HIVE-15361


Repository: hive-git


Description
-------

Problem:
- DynamicPartitionCtx and ListBucketingCtx objects weren't set on the new MoveWork created when merging the two MoveWork objects from the ConditionalTask.

Solution
- Set the DynamicPartitionCtx and ListBucketingCtx objects to the new MoveWork created for the S3 optimization.

Other changes
- Merge the MoveWork objects inside the createCondTask() method for better error handling.
- Only merge the MoveWork related to the moveOnlyMoveTask. The MoveWork from the mergeAndMoveMoveTask may cause other issues that are not correctly tested.
- Two new private methods are added to check and merge the conditional input/output tasks to the linked MoveWork.


Diffs (updated)
-----

  itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q PRE-CREATION 
  itests/hive-blobstore/src/test/queries/clientpositive/insert_into_table.q 25e2e7007ff539223d9244ca9822aa65d1441eb0 
  itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q PRE-CREATION 
  itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_table.q 846b2b113f09a74a3f05c13ffb56163e81dc1e8e 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out PRE-CREATION 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 223cdf4d17e7fe9959a68f22a0aee6ea3872e2a9 
  itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out 9b993a6568b63aa21bf3e644832c6347954f6f59 
  itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out PRE-CREATION 
  itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out 81bcc7674727d68e881413dc083d4969013bd0fb 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java adc1188f09c8019a8aa60403d5813d6fa4509ceb 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadDesc.java bcd3125ab4ad20c00fec565e5004ee200c0187d5 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 9a868a04ce93d5c2ee75b5c6e96a1401cea93133 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 771a919ccd0bd75fe6197299ae057647ece89a7e 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 9f498c7fb88a7a9f77b8c6739c097a2b26b0c617 
  ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestGenMapRedUtilsCreateConditionalTask.java e6ec44504685bd9e53f158cc359b8a7b79fd0166 

Diff: https://reviews.apache.org/r/54393/diff/


Testing
-------

All itests/hive-blobstore tests run.

Added new blobstore tests:
- insert_into_dynamic_partitions.q
- insert_overwrite_dynamic_partitions.q

Waiting for HiveQA to run the rest of the q-tests.


Thanks,

Sergio Pena


Re: Review Request 54393: HIVE-15361: INSERT dynamic partition on S3 fails with a MoveTask failure

Posted by Sergio Pena <se...@cloudera.com>.

> On Dec. 7, 2016, 4:43 p.m., Mohit Sabharwal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, line 1765
> > <https://reviews.apache.org/r/54393/diff/2/?file=1578074#file1578074line1765>
> >
> >     can BlobStorageUtils.areOptimizationsEnabled(conf) return true for non blobstores  ?

Yes. This only checks if the flag is enabled or not. Useful for users that do not want Bobstore optimizations even if they use Blobstore paths.

Another check happens at the end of the method that verifies that the paths are Blobstore.


> On Dec. 7, 2016, 4:43 p.m., Mohit Sabharwal wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java, line 1842
> > <https://reviews.apache.org/r/54393/diff/2/?file=1578074#file1578074line1842>
> >
> >     throw RuntimeException

I don't think this is necessary. It is a non-check exception. The mergeMovePaths() method should never be called with an invalid MoveWork (that's why shouldMergePaths() is called), but I left it in case a user attempts to call it in a different way in a different method.


- Sergio


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54393/#review158345
-----------------------------------------------------------


On Dec. 7, 2016, 5:02 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54393/
> -----------------------------------------------------------
> 
> (Updated Dec. 7, 2016, 5:02 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-15361
>     https://issues.apache.org/jira/browse/HIVE-15361
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> - DynamicPartitionCtx and ListBucketingCtx objects weren't set on the new MoveWork created when merging the two MoveWork objects from the ConditionalTask.
> 
> Solution
> - Set the DynamicPartitionCtx and ListBucketingCtx objects to the new MoveWork created for the S3 optimization.
> 
> Other changes
> - Merge the MoveWork objects inside the createCondTask() method for better error handling.
> - Only merge the MoveWork related to the moveOnlyMoveTask. The MoveWork from the mergeAndMoveMoveTask may cause other issues that are not correctly tested.
> - Two new private methods are added to check and merge the conditional input/output tasks to the linked MoveWork.
> 
> 
> Diffs
> -----
> 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_table.q 25e2e7007ff539223d9244ca9822aa65d1441eb0 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_table.q 846b2b113f09a74a3f05c13ffb56163e81dc1e8e 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out 223cdf4d17e7fe9959a68f22a0aee6ea3872e2a9 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out 9b993a6568b63aa21bf3e644832c6347954f6f59 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out 81bcc7674727d68e881413dc083d4969013bd0fb 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java adc1188f09c8019a8aa60403d5813d6fa4509ceb 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadDesc.java bcd3125ab4ad20c00fec565e5004ee200c0187d5 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 9a868a04ce93d5c2ee75b5c6e96a1401cea93133 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 771a919ccd0bd75fe6197299ae057647ece89a7e 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 9f498c7fb88a7a9f77b8c6739c097a2b26b0c617 
>   ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestGenMapRedUtilsCreateConditionalTask.java e6ec44504685bd9e53f158cc359b8a7b79fd0166 
> 
> Diff: https://reviews.apache.org/r/54393/diff/
> 
> 
> Testing
> -------
> 
> All itests/hive-blobstore tests run.
> 
> Added new blobstore tests:
> - insert_into_dynamic_partitions.q
> - insert_overwrite_dynamic_partitions.q
> 
> Waiting for HiveQA to run the rest of the q-tests.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 54393: HIVE-15361: INSERT dynamic partition on S3 fails with a MoveTask failure

Posted by Mohit Sabharwal <mo...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54393/#review158345
-----------------------------------------------------------



lgtm overall, one question.


ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (line 1649)
<https://reviews.apache.org/r/54393/#comment229109>

    can BlobStorageUtils.areOptimizationsEnabled(conf) return true for non blobstores  ?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java (line 1726)
<https://reviews.apache.org/r/54393/#comment229110>

    throw RuntimeException


- Mohit Sabharwal


On Dec. 6, 2016, 8:13 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54393/
> -----------------------------------------------------------
> 
> (Updated Dec. 6, 2016, 8:13 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-15361
>     https://issues.apache.org/jira/browse/HIVE-15361
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> - DynamicPartitionCtx and ListBucketingCtx objects weren't set on the new MoveWork created when merging the two MoveWork objects from the ConditionalTask.
> 
> Solution
> - Set the DynamicPartitionCtx and ListBucketingCtx objects to the new MoveWork created for the S3 optimization.
> 
> Other changes
> - Merge the MoveWork objects inside the createCondTask() method for better error handling.
> - Only merge the MoveWork related to the moveOnlyMoveTask. The MoveWork from the mergeAndMoveMoveTask may cause other issues that are not correctly tested.
> - Two new private methods are added to check and merge the conditional input/output tasks to the linked MoveWork.
> 
> 
> Diffs
> -----
> 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_table.q 25e2e7007ff539223d9244ca9822aa65d1441eb0 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_table.q 846b2b113f09a74a3f05c13ffb56163e81dc1e8e 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out fbb52c132a331aefe870264e035c397078f3c82e 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out 9f575a66ecefc3933b16dff554bdcc1c1f6420ee 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out c725c96cbb6b0374e67308a54204c7c25e827567 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java adc1188f09c8019a8aa60403d5813d6fa4509ceb 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadDesc.java bcd3125ab4ad20c00fec565e5004ee200c0187d5 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 9a868a04ce93d5c2ee75b5c6e96a1401cea93133 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 771a919ccd0bd75fe6197299ae057647ece89a7e 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 9f498c7fb88a7a9f77b8c6739c097a2b26b0c617 
>   ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestGenMapRedUtilsCreateConditionalTask.java e6ec44504685bd9e53f158cc359b8a7b79fd0166 
> 
> Diff: https://reviews.apache.org/r/54393/diff/
> 
> 
> Testing
> -------
> 
> All itests/hive-blobstore tests run.
> 
> Added new blobstore tests:
> - insert_into_dynamic_partitions.q
> - insert_overwrite_dynamic_partitions.q
> 
> Waiting for HiveQA to run the rest of the q-tests.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 54393: HIVE-15361: INSERT dynamic partition on S3 fails with a MoveTask failure

Posted by Illya Yalovyy <ya...@amazon.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54393/#review158221
-----------------------------------------------------------


Ship it!




Ship It!

- Illya Yalovyy


On Dec. 6, 2016, 8:13 p.m., Sergio Pena wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54393/
> -----------------------------------------------------------
> 
> (Updated Dec. 6, 2016, 8:13 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Bugs: HIVE-15361
>     https://issues.apache.org/jira/browse/HIVE-15361
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Problem:
> - DynamicPartitionCtx and ListBucketingCtx objects weren't set on the new MoveWork created when merging the two MoveWork objects from the ConditionalTask.
> 
> Solution
> - Set the DynamicPartitionCtx and ListBucketingCtx objects to the new MoveWork created for the S3 optimization.
> 
> Other changes
> - Merge the MoveWork objects inside the createCondTask() method for better error handling.
> - Only merge the MoveWork related to the moveOnlyMoveTask. The MoveWork from the mergeAndMoveMoveTask may cause other issues that are not correctly tested.
> - Two new private methods are added to check and merge the conditional input/output tasks to the linked MoveWork.
> 
> 
> Diffs
> -----
> 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_into_table.q 25e2e7007ff539223d9244ca9822aa65d1441eb0 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q PRE-CREATION 
>   itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_table.q 846b2b113f09a74a3f05c13ffb56163e81dc1e8e 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out fbb52c132a331aefe870264e035c397078f3c82e 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out 9f575a66ecefc3933b16dff554bdcc1c1f6420ee 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out PRE-CREATION 
>   itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out c725c96cbb6b0374e67308a54204c7c25e827567 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java adc1188f09c8019a8aa60403d5813d6fa4509ceb 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadDesc.java bcd3125ab4ad20c00fec565e5004ee200c0187d5 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 9a868a04ce93d5c2ee75b5c6e96a1401cea93133 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 771a919ccd0bd75fe6197299ae057647ece89a7e 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 9f498c7fb88a7a9f77b8c6739c097a2b26b0c617 
>   ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestGenMapRedUtilsCreateConditionalTask.java e6ec44504685bd9e53f158cc359b8a7b79fd0166 
> 
> Diff: https://reviews.apache.org/r/54393/diff/
> 
> 
> Testing
> -------
> 
> All itests/hive-blobstore tests run.
> 
> Added new blobstore tests:
> - insert_into_dynamic_partitions.q
> - insert_overwrite_dynamic_partitions.q
> 
> Waiting for HiveQA to run the rest of the q-tests.
> 
> 
> Thanks,
> 
> Sergio Pena
> 
>


Re: Review Request 54393: HIVE-15361: INSERT dynamic partition on S3 fails with a MoveTask failure

Posted by Sergio Pena <se...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54393/
-----------------------------------------------------------

(Updated Dec. 6, 2016, 8:13 p.m.)


Review request for hive.


Changes
-------

Addressed all Illya comments.


Bugs: HIVE-15361
    https://issues.apache.org/jira/browse/HIVE-15361


Repository: hive-git


Description
-------

Problem:
- DynamicPartitionCtx and ListBucketingCtx objects weren't set on the new MoveWork created when merging the two MoveWork objects from the ConditionalTask.

Solution
- Set the DynamicPartitionCtx and ListBucketingCtx objects to the new MoveWork created for the S3 optimization.

Other changes
- Merge the MoveWork objects inside the createCondTask() method for better error handling.
- Only merge the MoveWork related to the moveOnlyMoveTask. The MoveWork from the mergeAndMoveMoveTask may cause other issues that are not correctly tested.
- Two new private methods are added to check and merge the conditional input/output tasks to the linked MoveWork.


Diffs (updated)
-----

  itests/hive-blobstore/src/test/queries/clientpositive/insert_into_dynamic_partitions.q PRE-CREATION 
  itests/hive-blobstore/src/test/queries/clientpositive/insert_into_table.q 25e2e7007ff539223d9244ca9822aa65d1441eb0 
  itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_dynamic_partitions.q PRE-CREATION 
  itests/hive-blobstore/src/test/queries/clientpositive/insert_overwrite_table.q 846b2b113f09a74a3f05c13ffb56163e81dc1e8e 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_dynamic_partitions.q.out PRE-CREATION 
  itests/hive-blobstore/src/test/results/clientpositive/insert_into_table.q.out fbb52c132a331aefe870264e035c397078f3c82e 
  itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_directory.q.out 9f575a66ecefc3933b16dff554bdcc1c1f6420ee 
  itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_dynamic_partitions.q.out PRE-CREATION 
  itests/hive-blobstore/src/test/results/clientpositive/insert_overwrite_table.q.out c725c96cbb6b0374e67308a54204c7c25e827567 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java adc1188f09c8019a8aa60403d5813d6fa4509ceb 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadDesc.java bcd3125ab4ad20c00fec565e5004ee200c0187d5 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 9a868a04ce93d5c2ee75b5c6e96a1401cea93133 
  ql/src/java/org/apache/hadoop/hive/ql/plan/LoadTableDesc.java 771a919ccd0bd75fe6197299ae057647ece89a7e 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MoveWork.java 9f498c7fb88a7a9f77b8c6739c097a2b26b0c617 
  ql/src/test/org/apache/hadoop/hive/ql/optimizer/TestGenMapRedUtilsCreateConditionalTask.java e6ec44504685bd9e53f158cc359b8a7b79fd0166 

Diff: https://reviews.apache.org/r/54393/diff/


Testing
-------

All itests/hive-blobstore tests run.

Added new blobstore tests:
- insert_into_dynamic_partitions.q
- insert_overwrite_dynamic_partitions.q

Waiting for HiveQA to run the rest of the q-tests.


Thanks,

Sergio Pena