You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Krisztian Kasa <kk...@hortonworks.com> on 2019/07/10 14:16:25 UTC

Review Request 71038: HIVE-21965

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71038/
-----------------------------------------------------------

Review request for hive, Jesús Camacho Rodríguez, Jason Dere, Zoltan Haindrich, and Vineet Garg.


Bugs: HIVE-21965
    https://issues.apache.org/jira/browse/HIVE-21965


Repository: hive-git


Description
-------

Implement parallel processing in HiveStrictManagedMigration
===========================================================

1. Process databases and tables paralelly using a thread pool for each. Thread pools size can be defined by command line options. If no options are given the default pool size is the number of cpu cores / 2
2. Add option for filtering table type.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/util/CloseableThreadLocal.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigration.java 80025b7046 
  ql/src/java/org/apache/hadoop/hive/ql/util/NamedForkJoinWorkerThreadFactory.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java 5f39fdccb5 
  ql/src/test/org/apache/hadoop/hive/ql/util/CloseableThreadLocalTest.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/util/TestHiveStrictManagedMigration.java PRE-CREATION 


Diff: https://reviews.apache.org/r/71038/diff/1/


Testing
-------

New UT for testing migration.


Thanks,

Krisztian Kasa


Re: Review Request 71038: HIVE-21965

Posted by Krisztian Kasa <kk...@hortonworks.com>.

> On July 11, 2019, 12:56 p.m., Zoltan Haindrich wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigration.java
> > Lines 600 (patched)
> > <https://reviews.apache.org/r/71038/diff/2/?file=2154469#file2154469line630>
> >
> >     I think we should stop processing

The original code did not stopped processing I didn't want to change this behavior.


> On July 11, 2019, 12:56 p.m., Zoltan Haindrich wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigration.java
> > Lines 619-620 (patched)
> > <https://reviews.apache.org/r/71038/diff/2/?file=2154469#file2154469line649>
> >
> >     I think the old behaviour was to stop processing after the first error; and start exiting...right now it will continue as much as it can...and at the end it will print out something depending on this boolean...

I did not changed the original or at least I was not intend to.
In the original version both databases and tables were processed in for loops. Each loop body contains a try block with a catch clause for Exception. So all exceptions were caugth logged and the variable failuresEncountered was set to true. The loop continued to the next iteration the program did not stop processing.

https://github.com/apache/hive/blob/4e8b940aa2fd4ea4574d0a3ff74bfbc2f336781b/ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigration.java#L296
https://github.com/apache/hive/blob/4e8b940aa2fd4ea4574d0a3ff74bfbc2f336781b/ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigration.java#L416


- Krisztian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71038/#review216522
-----------------------------------------------------------


On July 11, 2019, noon, Krisztian Kasa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71038/
> -----------------------------------------------------------
> 
> (Updated July 11, 2019, noon)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez, Jason Dere, Zoltan Haindrich, and Vineet Garg.
> 
> 
> Bugs: HIVE-21965
>     https://issues.apache.org/jira/browse/HIVE-21965
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Implement parallel processing in HiveStrictManagedMigration
> ===========================================================
> 
> 1. Process tables paralelly using a thread pool for each. Thread pools size can be defined by command line options. If no options are given the default pool size is the number of cpu cores.
> 2. Add option for filtering table type.
> 
> I had to remove database parallel processing because the ForkJoinPool implementation does not support nested parallel streams.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/util/CloseableThreadLocal.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigration.java 80025b7046 
>   ql/src/java/org/apache/hadoop/hive/ql/util/NamedForkJoinWorkerThreadFactory.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java 5f39fdccb5 
>   ql/src/test/org/apache/hadoop/hive/ql/util/CloseableThreadLocalTest.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/util/TestHiveStrictManagedMigration.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/71038/diff/2/
> 
> 
> Testing
> -------
> 
> New UT for testing migration.
> 
> 
> Thanks,
> 
> Krisztian Kasa
> 
>


Re: Review Request 71038: HIVE-21965

Posted by Zoltan Haindrich <ki...@rxd.hu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71038/#review216522
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigration.java
Lines 600 (patched)
<https://reviews.apache.org/r/71038/#comment303743>

    I think we should stop processing



ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigration.java
Lines 619-620 (patched)
<https://reviews.apache.org/r/71038/#comment303741>

    I think the old behaviour was to stop processing after the first error; and start exiting...right now it will continue as much as it can...and at the end it will print out something depending on this boolean...



ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigration.java
Lines 695-697 (patched)
<https://reviews.apache.org/r/71038/#comment303742>

    earlier this exception have bubbled up


- Zoltan Haindrich


On July 11, 2019, 2 p.m., Krisztian Kasa wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/71038/
> -----------------------------------------------------------
> 
> (Updated July 11, 2019, 2 p.m.)
> 
> 
> Review request for hive, Jesús Camacho Rodríguez, Jason Dere, Zoltan Haindrich, and Vineet Garg.
> 
> 
> Bugs: HIVE-21965
>     https://issues.apache.org/jira/browse/HIVE-21965
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Implement parallel processing in HiveStrictManagedMigration
> ===========================================================
> 
> 1. Process tables paralelly using a thread pool for each. Thread pools size can be defined by command line options. If no options are given the default pool size is the number of cpu cores.
> 2. Add option for filtering table type.
> 
> I had to remove database parallel processing because the ForkJoinPool implementation does not support nested parallel streams.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/util/CloseableThreadLocal.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigration.java 80025b7046 
>   ql/src/java/org/apache/hadoop/hive/ql/util/NamedForkJoinWorkerThreadFactory.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java 5f39fdccb5 
>   ql/src/test/org/apache/hadoop/hive/ql/util/CloseableThreadLocalTest.java PRE-CREATION 
>   ql/src/test/org/apache/hadoop/hive/ql/util/TestHiveStrictManagedMigration.java PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/71038/diff/2/
> 
> 
> Testing
> -------
> 
> New UT for testing migration.
> 
> 
> Thanks,
> 
> Krisztian Kasa
> 
>


Re: Review Request 71038: HIVE-21965

Posted by Krisztian Kasa <kk...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71038/
-----------------------------------------------------------

(Updated July 11, 2019, noon)


Review request for hive, Jesús Camacho Rodríguez, Jason Dere, Zoltan Haindrich, and Vineet Garg.


Bugs: HIVE-21965
    https://issues.apache.org/jira/browse/HIVE-21965


Repository: hive-git


Description (updated)
-------

Implement parallel processing in HiveStrictManagedMigration
===========================================================

1. Process tables paralelly using a thread pool for each. Thread pools size can be defined by command line options. If no options are given the default pool size is the number of cpu cores.
2. Add option for filtering table type.

I had to remove database parallel processing because the ForkJoinPool implementation does not support nested parallel streams.


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/util/CloseableThreadLocal.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/util/HiveStrictManagedMigration.java 80025b7046 
  ql/src/java/org/apache/hadoop/hive/ql/util/NamedForkJoinWorkerThreadFactory.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/TxnCommandsBaseForTests.java 5f39fdccb5 
  ql/src/test/org/apache/hadoop/hive/ql/util/CloseableThreadLocalTest.java PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/util/TestHiveStrictManagedMigration.java PRE-CREATION 


Diff: https://reviews.apache.org/r/71038/diff/2/

Changes: https://reviews.apache.org/r/71038/diff/1-2/


Testing
-------

New UT for testing migration.


Thanks,

Krisztian Kasa