Posted to user@pig.apache.org by Andrew Oliver <ac...@gmail.com> on 2014/09/06 00:50:28 UTC

Run pig scripts from HDFS with local jars

Question: is it possible to do:

pig -useHCatalog hdfs://myserver:8020/load/scripts/mydir/myscript.pig

and have my pig script loaded from HDFS, but have the hive/hcatalog/pig jars
load LOCALLY? Rationale: keeping all my scripts on one local machine is a
single point of failure I'd like to avoid, but since I need most of those jars
on the nodes I'd run the scripts from anyhow, there's no reason to go to DFS
for them and centralize those libs quite that much.

Right now, after fixing an NPE (
https://issues.apache.org/jira/browse/PIG-4156), it gets into FileLocalizer,
sees that the pig.jars.relative.to.dfs property is true, and then tries to
resolve EVERY JAR (PiggyBank, the HCatalog jars, the Hive jars) from HDFS.
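
To make that concrete, here is a minimal, self-contained sketch of what
"resolving jars relative to DFS" amounts to. This is not Pig's actual code;
the resolveJar helper and the dfsRoot parameter are made up for illustration:

"
import java.io.File;
import java.net.URI;

public class JarResolutionSketch {
    // An explicit scheme wins; otherwise the flag decides DFS vs. local.
    static URI resolveJar(String jarPath, boolean relativeToDfs, URI dfsRoot) {
        URI uri = URI.create(jarPath);
        if (uri.getScheme() != null) {
            return uri;                      // hdfs:// or file:// given explicitly
        }
        if (relativeToDfs) {
            return dfsRoot.resolve(jarPath); // jar fetched from HDFS
        }
        return new File(jarPath).toURI();    // jar loaded from the local filesystem
    }

    public static void main(String[] args) {
        URI dfs = URI.create("hdfs://myserver:8020/");
        // With pig.jars.relative.to.dfs=true, every scheme-less jar goes to HDFS:
        System.out.println(resolveJar("piggybank.jar", true, dfs));
        // -> hdfs://myserver:8020/piggybank.jar
        System.out.println(resolveJar("piggybank.jar", false, dfs));
        // -> file:/<cwd>/piggybank.jar
    }
}
"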

pig.jars.relative.to.dfs is hard-coded to true in Main.java when the script
was remote (apparently):
"
                FileLocalizer.FetchFileRet localFileRet =
FileLocalizer.fetchFile(properties, remainders[0]);
                if (localFileRet.didFetch) {
                    properties.setProperty("pig.jars.relative.to.dfs",
"true");
                }
"

The code isn't super clear here, but "didFetch" is set to true after a remote
file is fetched (despite the variable name). So by my read, if your script is
remote, Pig expects the libraries to all be remote as well.
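
For reference, the return type looks roughly like this as far as I can tell
(a sketch reconstructed from the usage above, not copied from the Pig source):

"
public static class FetchFileRet {
    public final File file;        // local path of the (possibly fetched) script
    public final boolean didFetch; // true when the source was remote

    public FetchFileRet(File file, boolean didFetch) {
        this.file = file;
        this.didFetch = didFetch;
    }
}
"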

Am I missing something? Is this a bug or just a missing feature? Would
anyone object to a property like pig.jars.forcelocal=true or something?
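
For the sake of discussion, a sketch of what I have in mind, against the
Main.java snippet above (pig.jars.forcelocal is the hypothetical new property,
defaulting to false):

"
    FileLocalizer.FetchFileRet localFileRet =
            FileLocalizer.fetchFile(properties, remainders[0]);
    // Hypothetical: let users opt out of DFS-relative jar resolution
    // even when the script itself was fetched from a remote filesystem.
    boolean forceLocal = Boolean.parseBoolean(
            properties.getProperty("pig.jars.forcelocal", "false"));
    if (localFileRet.didFetch && !forceLocal) {
        properties.setProperty("pig.jars.relative.to.dfs", "true");
    }
"

That would leave the current behavior as the default and only change anything
for users who explicitly opt in.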