Posted to dev@pig.apache.org by "Richard Ding (JIRA)" <ji...@apache.org> on 2010/07/23 01:02:51 UTC

[jira] Commented: (PIG-787) Allow UDFs and their dependencies to be distributed via Hadoop's distributed cache

    [ https://issues.apache.org/jira/browse/PIG-787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891383#action_12891383 ] 

Richard Ding commented on PIG-787:
----------------------------------

Currently, Pig bundles UDFs and their dependencies (including pig.jar) into job.jar and sends it to the JobTracker via the JobConf. Hadoop then copies the jar to HDFS and pushes it to all the nodes. This is essentially the same as using the distributed cache (except that Pig doesn't need to copy the jar to HDFS itself).

One use case for the distributed cache is when some UDF jars are already on HDFS. In this case, instead of adding them to job.jar, Pig can add them directly to Hadoop's distributed cache. This reduces the size of job.jar and avoids copying those jars to HDFS again.
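
To make the use case above concrete, here is a minimal sketch of the decision it implies: jars that already live on HDFS go to the distributed cache, and everything else is still bundled into job.jar. The class UdfJarPlanner, its Plan type, and the rule "a URI with the hdfs scheme is already on HDFS" are all hypothetical names and assumptions made for illustration. They are not Pig's actual code, and the sketch deliberately calls no Hadoop API.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Pig: splits UDF jars by where they already live.
public class UdfJarPlanner {

    /** The two destinations a jar can be assigned to. */
    public static final class Plan {
        public final List<String> bundleIntoJobJar = new ArrayList<>();
        public final List<String> addToDistributedCache = new ArrayList<>();
    }

    /** Assumption: a jar is "already on HDFS" if its URI uses the hdfs scheme. */
    public static boolean isOnHdfs(String jar) {
        String scheme = URI.create(jar).getScheme(); // null for plain local paths
        return "hdfs".equalsIgnoreCase(scheme);
    }

    /** Assigns each jar to the distributed cache or to job.jar, preserving order. */
    public static Plan plan(List<String> jars) {
        Plan plan = new Plan();
        for (String jar : jars) {
            if (isOnHdfs(jar)) {
                // Already on HDFS: no need to ship it inside job.jar again.
                plan.addToDistributedCache.add(jar);
            } else {
                // Local jar: keep the current behavior and bundle it.
                plan.bundleIntoJobJar.add(jar);
            }
        }
        return plan;
    }
}
```

Given the inputs hdfs://nn:8020/udfs/shared.jar and /local/lib/myudfs.jar (both made-up paths), the first lands in addToDistributedCache and the second in bundleIntoJobJar. A real implementation would then presumably register the first list with Hadoop's distributed cache on the job's configuration. That step is omitted here.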

Are there any other use cases where the distributed cache would help distribute UDFs and their dependencies?

> Allow UDFs and their dependencies to be distributed via Hadoop's distributed cache
> ----------------------------------------------------------------------------------
>
>                 Key: PIG-787
>                 URL: https://issues.apache.org/jira/browse/PIG-787
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Olga Natkovich
>            Assignee: Richard Ding
>             Fix For: 0.8.0
>
>

