You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Sergey Shelukhin <se...@hortonworks.com> on 2017/12/05 03:29:23 UTC

Review Request 64324: HIVE-18153 refactor reopen and file management in TezTask

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64324/
-----------------------------------------------------------

Review request for hive, Prasanth_J and Siddharth Seth.


Repository: hive-git


Description
-------

see jira


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 88a75edd35 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 5c338b89c9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 3bcf657ac4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 8417ebb7d5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java b3ccd24fd6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 6fa37244a5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java af77f300c2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java ecdcf12510 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 3dd4b31186 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java 4148a8aa3a 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 52484540ff 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 829ea8cecc 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 47aa936845 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java 78df962a3a 


Diff: https://reviews.apache.org/r/64324/diff/1/


Testing
-------


Thanks,

Sergey Shelukhin


Re: Review Request 64324: HIVE-18153 refactor reopen and file management in TezTask

Posted by Sergey Shelukhin <se...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64324/
-----------------------------------------------------------

(Updated Dec. 14, 2017, 2:30 a.m.)


Review request for hive, Prasanth_J and Siddharth Seth.


Repository: hive-git


Description
-------

see jira


Diffs (updated)
-----

  ql/pom.xml f35a4c87a0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 88a75edd35 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 6c1afa6555 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 3bcf657ac4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 8417ebb7d5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java b3ccd24fd6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java dd879fc5e8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 8795cfcee1 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java dbdbbf25db 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 9726af1506 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java 4148a8aa3a 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 52484540ff 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 829ea8cecc 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 47aa936845 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java c58e4507f2 


Diff: https://reviews.apache.org/r/64324/diff/5/

Changes: https://reviews.apache.org/r/64324/diff/4-5/


Testing
-------


Thanks,

Sergey Shelukhin


Re: Review Request 64324: HIVE-18153 refactor reopen and file management in TezTask

Posted by Sergey Shelukhin <se...@hortonworks.com>.

> On Dec. 13, 2017, 10:40 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
> > Lines 790 (patched)
> > <https://reviews.apache.org/r/64324/diff/4/?file=1912930#file1912930line806>
> >
> >     This exclusion of LLAP jars happen only for task localization or even for AM.
> >     If for AM as well, then AM would need hive-llap-tez and hadoop-yarn-registry jars right?

It already has them as part of the epic jar I think. The logic w.r.t. these jars is not changed, I just added a comment.


> On Dec. 13, 2017, 10:40 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
> > Lines 1525 (patched)
> > <https://reviews.apache.org/r/64324/diff/4/?file=1912930#file1912930line1548>
> >
> >     even in uber mode?

This is moved code, old comment


> On Dec. 13, 2017, 10:40 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java
> > Line 471 (original), 459 (patched)
> > <https://reviews.apache.org/r/64324/diff/4/?file=1912932#file1912932line473>
> >
> >     Why is the boolean flipped? If its reopen do we need to recreate tez scratch dirs?

Not anymore; the Hive files directory is "taken away" from the session before close, via HiveResources. The Tez scratchdir doesn't actually need to be preserved, it was only preserved for Hive jars. open will re-create the scratchdir if needed


> On Dec. 13, 2017, 10:40 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java
> > Line 241 (original), 236 (patched)
> > <https://reviews.apache.org/r/64324/diff/4/?file=1912934#file1912934line251>
> >
> >     Handle duplicates in additionalFilesNotFromConf since Set<String> is now moved to String[]?

Not sure what you mean?


> On Dec. 13, 2017, 10:40 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java
> > Line 287 (original), 282 (patched)
> > <https://reviews.apache.org/r/64324/diff/4/?file=1912934#file1912934line299>
> >
> >     nit: RegistryOperations.class?

Will check if that is easy to change. This is existing code that used string for some reason.


> On Dec. 13, 2017, 10:40 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java
> > Line 578 (original), 575 (patched)
> > <https://reviews.apache.org/r/64324/diff/4/?file=1912934#file1912934line594>
> >
> >     Question: 
> >     For jars, does it track the file name or file name + sha? If this is just file names how does it refresh when the sha changes?

File name only for non-exec jar. The files are specified as local files and not shared, so I think there's no expectation of consistency if local files change during process lifetime, behavior is undefined (reopen is also not a defined away). Exec jar uses sha because it's shared between multiple users.


- Sergey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64324/#review193743
-----------------------------------------------------------


On Dec. 12, 2017, 2:10 a.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64324/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2017, 2:10 a.m.)
> 
> 
> Review request for hive, Prasanth_J and Siddharth Seth.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 88a75edd35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 6c1afa6555 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 3bcf657ac4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 8417ebb7d5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java b3ccd24fd6 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java dd879fc5e8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 8795cfcee1 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java dbdbbf25db 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 9726af1506 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java 4148a8aa3a 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 52484540ff 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 829ea8cecc 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 47aa936845 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java c58e4507f2 
> 
> 
> Diff: https://reviews.apache.org/r/64324/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Re: Review Request 64324: HIVE-18153 refactor reopen and file management in TezTask

Posted by j....@gmail.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64324/#review193743
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
Lines 790 (patched)
<https://reviews.apache.org/r/64324/#comment272368>

    This exclusion of LLAP jars happen only for task localization or even for AM.
    If for AM as well, then AM would need hive-llap-tez and hadoop-yarn-registry jars right?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java
Lines 1525 (patched)
<https://reviews.apache.org/r/64324/#comment272369>

    even in uber mode?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java
Line 471 (original), 459 (patched)
<https://reviews.apache.org/r/64324/#comment272370>

    Why is the boolean flipped? If its reopen do we need to recreate tez scratch dirs?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java
Line 241 (original), 236 (patched)
<https://reviews.apache.org/r/64324/#comment272371>

    Handle duplicates in additionalFilesNotFromConf since Set<String> is now moved to String[]?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java
Line 287 (original), 282 (patched)
<https://reviews.apache.org/r/64324/#comment272372>

    nit: RegistryOperations.class?



ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java
Line 578 (original), 575 (patched)
<https://reviews.apache.org/r/64324/#comment272373>

    Question: 
    For jars, does it track the file name or file name + sha? If this is just file names how does it refresh when the sha changes?


- Prasanth_J


On Dec. 12, 2017, 2:10 a.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/64324/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2017, 2:10 a.m.)
> 
> 
> Review request for hive, Prasanth_J and Siddharth Seth.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 88a75edd35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 6c1afa6555 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 3bcf657ac4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 8417ebb7d5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java b3ccd24fd6 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java dd879fc5e8 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 8795cfcee1 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java dbdbbf25db 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 9726af1506 
>   ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java 4148a8aa3a 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 52484540ff 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 829ea8cecc 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 47aa936845 
>   ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java c58e4507f2 
> 
> 
> Diff: https://reviews.apache.org/r/64324/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Re: Review Request 64324: HIVE-18153 refactor reopen and file management in TezTask

Posted by Sergey Shelukhin <se...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64324/
-----------------------------------------------------------

(Updated Dec. 12, 2017, 2:10 a.m.)


Review request for hive, Prasanth_J and Siddharth Seth.


Repository: hive-git


Description
-------

see jira


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 88a75edd35 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 6c1afa6555 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 3bcf657ac4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 8417ebb7d5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java b3ccd24fd6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java dd879fc5e8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 8795cfcee1 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java dbdbbf25db 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 9726af1506 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java 4148a8aa3a 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 52484540ff 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 829ea8cecc 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 47aa936845 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java c58e4507f2 


Diff: https://reviews.apache.org/r/64324/diff/4/

Changes: https://reviews.apache.org/r/64324/diff/3-4/


Testing
-------


Thanks,

Sergey Shelukhin


Re: Review Request 64324: HIVE-18153 refactor reopen and file management in TezTask

Posted by Sergey Shelukhin <se...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64324/
-----------------------------------------------------------

(Updated Dec. 6, 2017, 8:50 p.m.)


Review request for hive, Prasanth_J and Siddharth Seth.


Repository: hive-git


Description
-------

see jira


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 88a75edd35 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 5c338b89c9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 3bcf657ac4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 8417ebb7d5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java b3ccd24fd6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java dd879fc5e8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java 8795cfcee1 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java dbdbbf25db 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 9726af1506 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java 4148a8aa3a 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 52484540ff 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 829ea8cecc 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 47aa936845 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java c58e4507f2 


Diff: https://reviews.apache.org/r/64324/diff/3/

Changes: https://reviews.apache.org/r/64324/diff/2-3/


Testing
-------


Thanks,

Sergey Shelukhin


Re: Review Request 64324: HIVE-18153 refactor reopen and file management in TezTask

Posted by Sergey Shelukhin <se...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/64324/
-----------------------------------------------------------

(Updated Dec. 5, 2017, 8:58 p.m.)


Review request for hive, Prasanth_J and Siddharth Seth.


Repository: hive-git


Description
-------

see jira


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 88a75edd35 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/DagUtils.java 5c338b89c9 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPool.java 3bcf657ac4 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolManager.java 8417ebb7d5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionPoolSession.java b3ccd24fd6 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezSessionState.java 6fa37244a5 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezTask.java af77f300c2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/WorkloadManager.java ecdcf12510 
  ql/src/java/org/apache/hadoop/hive/ql/exec/tez/monitoring/TezJobMonitor.java 3dd4b31186 
  ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDTFGetSplits.java 4148a8aa3a 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/SampleTezSessionState.java 52484540ff 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezSessionPool.java 829ea8cecc 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestTezTask.java 47aa936845 
  ql/src/test/org/apache/hadoop/hive/ql/exec/tez/TestWorkloadManager.java 78df962a3a 


Diff: https://reviews.apache.org/r/64324/diff/2/

Changes: https://reviews.apache.org/r/64324/diff/1-2/


Testing
-------


Thanks,

Sergey Shelukhin