Posted to dev@hive.apache.org by Brock Noland <br...@cloudera.com> on 2014/02/18 01:25:32 UTC

Review Request 18200: HIVE-860 - Persistent distributed cache

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/
-----------------------------------------------------------

Review request for hive.


Bugs: HIVE-860
    https://issues.apache.org/jira/browse/HIVE-860


Repository: hive-git


Description
-------

Caches auxiliary jars and remote runtime jars in /user/$user/.hiveJars by their sha1 hash. This results in:

1) faster queries
2) less distributed cache churn
3) a smaller/cleaner hive-exec jar
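
To illustrate the idea of caching by hash, here is a minimal sketch. It is not the code from the patch. The class name JarPathSketch, both method names, and the "<sha1>.jar" file layout are assumptions made for illustration; only the /user/$user/.hiveJars directory and the use of SHA-1 come from the description above.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Hive.
final class JarPathSketch {

  /** Returns the lowercase hex SHA-1 digest of the given jar contents. */
  static String sha1Hex(byte[] content) {
    MessageDigest digest;
    try {
      digest = MessageDigest.getInstance("SHA-1");
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide SHA-1.
      throw new IllegalStateException(e);
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : digest.digest(content)) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  /** Assumed layout: /user/<user>/.hiveJars/<sha1>.jar */
  static String cachePath(String user, String sha1Hex) {
    return "/user/" + user + "/.hiveJars/" + sha1Hex + ".jar";
  }
}
```

Because the path is derived from the contents, two jars with identical bytes map to the same cached file, and a rebuilt jar with different bytes maps to a new one.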


Diffs
-----

  bin/hive 3bd949f 
  packaging/src/main/assembly/bin.xml a97ef7d 
  ql/pom.xml 53d0b9e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HiveAuxClasspathBuilder.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 288da8e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 326654f 

Diff: https://reviews.apache.org/r/18200/diff/


Testing
-------

Tested manually on a cluster.


Thanks,

Brock Noland


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.

> On Feb. 18, 2014, 7:02 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java, line 91
> > <https://reviews.apache.org/r/18200/diff/5/?file=494786#file494786line91>
> >
> >     Cool. But what about the old cached files? Does HDFS clean them up automatically?

The files will have to be cleaned up periodically by the user or an admin. Admins often have policies that delete files that have not been accessed for a long time, such as a year. We update the access time on cached files once per day for this purpose. In practice, most users won't be using thousands of jars, so they won't have to clean them up.
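
As a rough sketch of the once-per-day access time refresh (the class, constant, and method names are hypothetical, and the patch may decide this differently), the check could look like this:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hive.
final class AccessTimeSketch {

  /** Refresh interval: once per day, as described above. */
  static final long REFRESH_INTERVAL_MS = TimeUnit.DAYS.toMillis(1);

  /**
   * Returns true when the recorded access time is at least one day old,
   * so the caller should touch the cached file again.
   */
  static boolean needsRefresh(long lastAccessMs, long nowMs) {
    return nowMs - lastAccessMs >= REFRESH_INTERVAL_MS;
  }
}
```

On HDFS the touch itself would presumably go through FileSystem.setTimes(path, -1, now), where an mtime of -1 leaves the modification time unchanged. That call is left out here so the sketch runs without Hadoop on the classpath.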


- Brock


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/#review34740
-----------------------------------------------------------


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/#review34740
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java
<https://reviews.apache.org/r/18200/#comment64983>

    Cool. But what about the old cached files? Does HDFS clean them up automatically?


- Xuefu Zhang


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.

> On Feb. 18, 2014, 6:47 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java, line 91
> > <https://reviews.apache.org/r/18200/diff/5/?file=494786#file494786line91>
> >
> >     What happens if file exists in cache but is outdated?
> 
> Brock Noland wrote:
>     The file name contains the sha1 hash so files will be unique.

There is one scenario where that is not enough: we tried to create the file in HDFS but could not finish. I will cover that case.
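
One common way to guard against a half-written file is to write to a temporary name and rename it into place only once the write has completed. The sketch below shows that pattern on the local file system with java.nio as a stand-in for HDFS. It is an assumption about how the case could be covered, not a description of what the patch does, and AtomicPublishSketch and publish are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Hive.
final class AtomicPublishSketch {

  /** Writes content under a temporary name, then renames it to finalName. */
  static Path publish(Path cacheDir, String finalName, byte[] content) {
    try {
      Files.createDirectories(cacheDir);
      Path target = cacheDir.resolve(finalName);
      if (Files.exists(target)) {
        return target; // only complete files ever carry the final name
      }
      Path tmp = Files.createTempFile(cacheDir, finalName, ".tmp");
      try {
        Files.write(tmp, content);
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
      } finally {
        // No-op after a successful move; removes the partial file on failure.
        Files.deleteIfExists(tmp);
      }
      return target;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

With this pattern an interrupted upload leaves at most a stray temporary file, never a truncated file under the hashed name.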


- Brock


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/#review34716
-----------------------------------------------------------


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.

> On Feb. 18, 2014, 6:47 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/HiveAuxClasspathBuilder.java, line 57
> > <https://reviews.apache.org/r/18200/diff/5/?file=494784#file494784line57>
> >
> >     protected static member for a final class?

Copy and paste error. I will make it private and remove the final from the class.


> On Feb. 18, 2014, 6:47 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java, line 91
> > <https://reviews.apache.org/r/18200/diff/5/?file=494786#file494786line91>
> >
> >     What happens if file exists in cache but is outdated?

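The file name contains the sha1 hash of the jar, so files will be unique: a jar whose contents change gets a different name, and an existing cached file never becomes outdated.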


- Brock


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/#review34716
-----------------------------------------------------------


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/#review34716
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/HiveAuxClasspathBuilder.java
<https://reviews.apache.org/r/18200/#comment64938>

    protected static member for a final class?



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java
<https://reviews.apache.org/r/18200/#comment64973>

    What happens if file exists in cache but is outdated?


- Xuefu Zhang


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/#review35067
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/HiveAuxClasspathBuilder.java
<https://reviews.apache.org/r/18200/#comment65457>

    This also seems to be a static class. Again, a minor issue.



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java
<https://reviews.apache.org/r/18200/#comment65458>

    I agree this is a minor issue, but I don't see why there would ever be a need to mock it.


- Xuefu Zhang


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.

> On Feb. 20, 2014, 8:06 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java, line 80
> > <https://reviews.apache.org/r/18200/diff/8/?file=498177#file498177line80>
> >
> >     Don't we need to close the stream from the open() call?

I thought that DigestUtils did, since it reads to EOF. It doesn't: https://issues.apache.org/jira/browse/PIG-2672?focusedCommentId=13907447&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13907447
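
A minimal sketch of closing the stream explicitly after hashing follows. It uses java.security.MessageDigest instead of DigestUtils so that it runs with the JDK alone, and try-with-resources, which needs Java 7 or later. CloseAfterHashSketch, TrackingStream, and sha1AndClose are hypothetical names, and this is not the code from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Hive.
final class CloseAfterHashSketch {

  /** Demo-only stream that records whether close() was called. */
  static final class TrackingStream extends ByteArrayInputStream {
    boolean closed;

    TrackingStream(byte[] data) {
      super(data);
    }

    @Override
    public void close() throws IOException {
      closed = true;
      super.close();
    }
  }

  /** Hashes the stream to EOF and always closes it, even on failure. */
  static byte[] sha1AndClose(InputStream in) {
    try (InputStream stream = in) {
      MessageDigest digest = MessageDigest.getInstance("SHA-1");
      byte[] buffer = new byte[8192];
      int read;
      while ((read = stream.read(buffer)) != -1) {
        digest.update(buffer, 0, read);
      }
      return digest.digest();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

Reading to EOF and closing are separate steps: the loop above consumes the whole stream, but only leaving the try block closes it.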


- Brock


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/#review35042
-----------------------------------------------------------


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.

> On Feb. 20, 2014, 8:06 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java, line 67
> > <https://reviews.apache.org/r/18200/diff/8/?file=498177#file498177line67>
> >
> >     For my info, does Hadoop know whether a file is already in the distributed cache so that it can skip it? Otherwise, it will cache the file every time a job is launched. I couldn't find documentation about this.

The big win from our perspective is that we avoid putting the data in HDFS each time. YARN has future work planned to share the cache among jobs: https://issues.apache.org/jira/browse/YARN-1492


> On Feb. 20, 2014, 8:06 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java, line 81
> > <https://reviews.apache.org/r/18200/diff/8/?file=498177#file498177line81>
> >
> >     I'm not sure if we need to put this in a synchronized block.

Probably better to, since HS2 typically runs on a single host and calculating hashes is CPU-intensive.
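
To show what serializing the hash calculation means, here is a small sketch in which a single lock ensures only one thread hashes at a time. The lock, the counters, and the class name SerializedHashSketch are hypothetical and exist only for this illustration; they are not taken from the patch.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Hive.
final class SerializedHashSketch {

  private static final Object HASH_LOCK = new Object();

  // Instrumentation for the sketch only: how many threads are hashing now,
  // and the highest number ever observed at once.
  static final AtomicInteger active = new AtomicInteger();
  static final AtomicInteger maxActive = new AtomicInteger();

  /** Computes SHA-1 while holding the lock, so hashes run one at a time. */
  static byte[] sha1(byte[] content) {
    synchronized (HASH_LOCK) {
      int now = active.incrementAndGet();
      maxActive.accumulateAndGet(now, Math::max);
      try {
        return MessageDigest.getInstance("SHA-1").digest(content);
      } catch (NoSuchAlgorithmException e) {
        throw new IllegalStateException(e);
      } finally {
        active.decrementAndGet();
      }
    }
  }
}
```

The trade-off is that concurrent queries wait for each other while hashing, in exchange for not saturating the CPUs of the single HS2 host.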


> On Feb. 20, 2014, 8:06 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java, line 92
> > <https://reviews.apache.org/r/18200/diff/8/?file=498177#file498177line92>
> >
> >     2. So the cached file is named without including its original name? This might make it hard to figure out what went wrong if a problem arises.

Added the name for debugging purposes.
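
For illustration only, a cached file name that keeps the original jar name next to the hash might be built as below. The "<name>-<sha1>.jar" format and the names CachedNameSketch and cachedName are assumptions; the thread does not state the actual format.

```java
// Hypothetical helper, not part of Hive.
final class CachedNameSketch {

  /** Builds "<base>-<sha1>.jar", where base is the original name without ".jar". */
  static String cachedName(String originalName, String sha1Hex) {
    String base = originalName.endsWith(".jar")
        ? originalName.substring(0, originalName.length() - ".jar".length())
        : originalName;
    return base + "-" + sha1Hex + ".jar";
  }
}
```

The hash still makes the name unique; the original name is there only so a person listing the cache directory can tell which jar is which.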


- Brock


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/#review35042
-----------------------------------------------------------


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.

> On Feb. 20, 2014, 8:06 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java, line 44
> > <https://reviews.apache.org/r/18200/diff/8/?file=498177#file498177line44>
> >
> >     It appears that this class has no state, so there is no need to instantiate it to gain any functionality from it. Maybe we can just keep everything static.

There is very little cost to creating a new object, and once an item is static, mocking it is very difficult. I'd prefer we keep it non-static.
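
A small sketch of why an instance method is easier to replace in tests than a static one. JarCacheSketch and JobSubmitterSketch are hypothetical stand-ins, not the classes in the patch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a jar cache with an instance method.
class JarCacheSketch {
  /** A real implementation would upload the jar; this placeholder just fails. */
  String cache(String localJar) {
    throw new UnsupportedOperationException("would upload " + localJar);
  }
}

// Hypothetical caller that receives the cache as a constructor argument.
final class JobSubmitterSketch {
  private final JarCacheSketch jarCache;

  JobSubmitterSketch(JarCacheSketch jarCache) {
    this.jarCache = jarCache;
  }

  /** Maps each local jar to whatever location the cache reports for it. */
  List<String> classpathFor(List<String> localJars) {
    List<String> cached = new ArrayList<>();
    for (String jar : localJars) {
      cached.add(jarCache.cache(jar));
    }
    return cached;
  }
}
```

A test can pass in a subclass or a mock whose cache() returns a canned value, so no cluster is needed. If cache() were static, the caller would be bound to the real implementation at compile time.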


- Brock


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/#review35042
-----------------------------------------------------------


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/#review35042
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java
<https://reviews.apache.org/r/18200/#comment65422>

    It appears that this class has no state, so there is no need to instantiate it to gain any functionality from it. Maybe we can just keep everything static.



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java
<https://reviews.apache.org/r/18200/#comment65438>

    For my info, does Hadoop know whether a file is already in the distributed cache so that it can skip it? Otherwise, it will cache the file every time a job is launched. I couldn't find documentation about this.



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java
<https://reviews.apache.org/r/18200/#comment65408>

    Don't we need to close the stream from the open() call?



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java
<https://reviews.apache.org/r/18200/#comment65426>

    I'm not sure if we need to put this in a synchronized block.



ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java
<https://reviews.apache.org/r/18200/#comment65415>

    2. So the cached file is named without including its original name? This might make it hard to figure out what went wrong if a problem arises.


- Xuefu Zhang


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/
-----------------------------------------------------------

(Updated Feb. 20, 2014, 10:05 p.m.)


Review request for hive.


Changes
-------

Updated based on review.


Bugs: HIVE-860
    https://issues.apache.org/jira/browse/HIVE-860


Repository: hive-git


Description
-------

Caches auxiliary jars and remote runtime jars in /user/$user/.hiveJars by their sha1 hash. This results in:

1) faster queries
2) less distributed cache churn
3) a smaller/cleaner hive-exec jar


Diffs (updated)
-----

  bin/hive 3bd949f 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a182cd7 
  conf/hive-default.xml.template 0d08aa2 
  packaging/src/main/assembly/bin.xml a97ef7d 
  ql/pom.xml 53d0b9e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HiveAuxClasspathBuilder.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 288da8e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 326654f 
  shims/aggregator/pom.xml 7aa8c4c 

Diff: https://reviews.apache.org/r/18200/diff/


Testing
-------

Tested manually on a cluster.


Thanks,

Brock Noland


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/
-----------------------------------------------------------

(Updated Feb. 19, 2014, 8:35 p.m.)


Review request for hive.


Changes
-------

Latest update.


Bugs: HIVE-860
    https://issues.apache.org/jira/browse/HIVE-860


Repository: hive-git


Description
-------

Caches auxiliary jars and remote runtime jars in /user/$user/.hiveJars by their sha1 hash. This results in:

1) faster queries
2) less distributed cache churn
3) a smaller/cleaner hive-exec jar


Diffs (updated)
-----

  bin/hive 3bd949f 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a182cd7 
  conf/hive-default.xml.template 0d08aa2 
  packaging/src/main/assembly/bin.xml a97ef7d 
  ql/pom.xml 53d0b9e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HiveAuxClasspathBuilder.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 288da8e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 326654f 
  shims/aggregator/pom.xml 7aa8c4c 

Diff: https://reviews.apache.org/r/18200/diff/


Testing
-------

Tested manually on a cluster.


Thanks,

Brock Noland


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/
-----------------------------------------------------------

(Updated Feb. 18, 2014, 9:53 p.m.)


Review request for hive.


Changes
-------

Makes caching configurable.


Bugs: HIVE-860
    https://issues.apache.org/jira/browse/HIVE-860


Repository: hive-git


Description
-------

Caches auxiliary jars and remote runtime jars in /user/$user/.hiveJars by their sha1 hash. This results in:

1) faster queries
2) less distributed cache churn
3) a smaller/cleaner hive-exec jar


Diffs (updated)
-----

  bin/hive 3bd949f 
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java a182cd7 
  conf/hive-default.xml.template 0d08aa2 
  packaging/src/main/assembly/bin.xml a97ef7d 
  ql/pom.xml 53d0b9e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HiveAuxClasspathBuilder.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 288da8e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 326654f 
  shims/aggregator/pom.xml 7aa8c4c 

Diff: https://reviews.apache.org/r/18200/diff/


Testing
-------

Tested manually on a cluster.


Thanks,

Brock Noland


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/
-----------------------------------------------------------

(Updated Feb. 18, 2014, 7:07 p.m.)


Review request for hive.


Changes
-------

Fixes a bug that occurred when the jar file could not be fully written to HDFS.


Bugs: HIVE-860
    https://issues.apache.org/jira/browse/HIVE-860


Repository: hive-git


Description
-------

Caches auxiliary jars and remote runtime jars in /user/$user/.hiveJars by their sha1 hash. This results in:

1) faster queries
2) less distributed cache churn
3) a smaller/cleaner hive-exec jar


Diffs (updated)
-----

  bin/hive 3bd949f 
  packaging/src/main/assembly/bin.xml a97ef7d 
  ql/pom.xml 53d0b9e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HiveAuxClasspathBuilder.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 288da8e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 326654f 
  shims/aggregator/pom.xml 7aa8c4c 

Diff: https://reviews.apache.org/r/18200/diff/


Testing
-------

Tested manually on a cluster.


Thanks,

Brock Noland


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/
-----------------------------------------------------------

(Updated Feb. 18, 2014, 3:36 a.m.)


Review request for hive.


Changes
-------

Minor update to jar cache.


Bugs: HIVE-860
    https://issues.apache.org/jira/browse/HIVE-860


Repository: hive-git


Description
-------

Caches auxiliary jars and remote runtime jars in /user/$user/.hiveJars by their sha1 hash. This results in:

1) faster queries
2) less distributed cache churn
3) a smaller/cleaner hive-exec jar


Diffs (updated)
-----

  bin/hive 3bd949f 
  packaging/src/main/assembly/bin.xml a97ef7d 
  ql/pom.xml 53d0b9e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HiveAuxClasspathBuilder.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 288da8e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 326654f 
  shims/aggregator/pom.xml 7aa8c4c 

Diff: https://reviews.apache.org/r/18200/diff/


Testing
-------

Tested manually on a cluster.


Thanks,

Brock Noland


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/
-----------------------------------------------------------

(Updated Feb. 18, 2014, 3:17 a.m.)


Review request for hive.


Changes
-------

Updates since the "conf" variable of ExecDriver is sometimes null.


Bugs: HIVE-860
    https://issues.apache.org/jira/browse/HIVE-860


Repository: hive-git


Description
-------

Caches auxiliary jars and remote runtime jars in /user/$user/.hiveJars by their sha1 hash. This results in:

1) faster queries
2) less distributed cache churn
3) a smaller/cleaner hive-exec jar


Diffs (updated)
-----

  bin/hive 3bd949f 
  packaging/src/main/assembly/bin.xml a97ef7d 
  ql/pom.xml 53d0b9e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HiveAuxClasspathBuilder.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 288da8e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 326654f 
  shims/aggregator/pom.xml 7aa8c4c 

Diff: https://reviews.apache.org/r/18200/diff/


Testing
-------

Tested manually on a cluster.


Thanks,

Brock Noland


Re: Review Request 18200: HIVE-860 - Persistent distributed cache

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18200/
-----------------------------------------------------------

(Updated Feb. 18, 2014, 12:45 a.m.)


Review request for hive.


Changes
-------

Minor update to the pom so the build can bootstrap.


Bugs: HIVE-860
    https://issues.apache.org/jira/browse/HIVE-860


Repository: hive-git


Description
-------

Caches auxiliary jars and remote runtime jars in /user/$user/.hiveJars by their sha1 hash. This results in:

1) faster queries
2) less distributed cache churn
3) a smaller/cleaner hive-exec jar


Diffs (updated)
-----

  bin/hive 3bd949f 
  packaging/src/main/assembly/bin.xml a97ef7d 
  ql/pom.xml 53d0b9e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/HiveAuxClasspathBuilder.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/ExecDriver.java 288da8e 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/JarCache.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapRedTask.java 326654f 
  shims/aggregator/pom.xml 7aa8c4c 

Diff: https://reviews.apache.org/r/18200/diff/


Testing
-------

Tested manually on a cluster.


Thanks,

Brock Noland