
Posted to dev@pig.apache.org by "Zhang, Liyun" <li...@intel.com> on 2014/12/29 09:51:37 UTC

about how to implement ship in other mode like "spark"

Hi all,
  I have a question about "ship" in Pig.
    When ship is used with streaming, it sends the streaming binary and supporting files, if any, from the client node to the compute nodes.
  I found that ship is implemented in MapReduce mode as follows:

/home/zly/prj/oss/pig/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
line 721:
setupDistributedCache(pigContext, conf, pigContext.getProperties(),
                    "pig.streaming.ship.files", true);

This function reads all of the "pig.streaming.ship.files" entries from the properties and copies each ship file to Hadoop using fs.copyFromLocalFile. At the same time, the symlink feature is turned on with DistributedCache.createSymlink(conf).

For example, if the ship file "/tmp/teststreaming.pl" is copied from the local file system to Hadoop, the Hadoop file will be hdfs://xxxx:8020/tmp/tempxxxx/tmp-xxx#teststreaming.pl. /tmp/hadoop-root/mapred/local/1419842279890/tmp-1268857767 is the local cache of hdfs://xxxx:8020/tmp/tempxxxx/tmp-xxx#teststreaming.pl, and teststreaming.pl is created in the current execution path as a link to /tmp/hadoop-root/mapred/local/1419842279890/tmp-1268857767.

If I want to implement ship in another mode such as Spark, is the only thing I need to do to copy the shipped files from the shipped path to the current execution path?
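
To make the question concrete, here is a rough sketch of the copy step I have in mind. It is only an illustration, not existing Pig code: the class ShipFilesSketch and the method localizeShipFiles are hypothetical names, and I am assuming that "pig.streaming.ship.files" holds a comma-separated list of paths that are already readable on the node where this runs. How the files reach that node in Spark mode is not covered here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, not part of Pig.
public class ShipFilesSketch {
    // Property key taken from the setupDistributedCache call quoted above.
    static final String SHIP_FILES_KEY = "pig.streaming.ship.files";

    /**
     * Copies each shipped file into workDir under its base name, so that a
     * streaming command can refer to it by a relative name such as
     * "teststreaming.pl". Assumes a comma-separated list of readable paths.
     */
    static List<Path> localizeShipFiles(Properties props, Path workDir) throws IOException {
        List<Path> localized = new ArrayList<>();
        String files = props.getProperty(SHIP_FILES_KEY);
        if (files == null || files.trim().isEmpty()) {
            return localized; // nothing was shipped
        }
        Files.createDirectories(workDir);
        for (String entry : files.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            Path source = Paths.get(trimmed);
            Path target = workDir.resolve(source.getFileName());
            Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
            // Keep the executable bit so the streaming binary can still be run.
            if (Files.isExecutable(source)) {
                target.toFile().setExecutable(true);
            }
            localized.add(target);
        }
        return localized;
    }
}
```

With the working directory passed as Paths.get("."), this would leave a copy named teststreaming.pl in the current execution path, instead of the symlink that the distributed cache creates in MapReduce mode.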



Best regards
Zhang,Liyun