Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2008/07/01 20:44:45 UTC

[jira] Updated: (PIG-252) Allow multiple paths in the load statement

     [ https://issues.apache.org/jira/browse/PIG-252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-252:
---------------------------

    Attachment: localglobbing.patch

Pig will use Hadoop's default mode as its local-mode execution engine, so there should be no difference in globbing support between local mode and MapReduce mode. Pig will pass the unfiltered globbing string to Hadoop ("org.apache.hadoop.fs.FileSystem.globPaths"), so once [HADOOP-3498|https://issues.apache.org/jira/browse/HADOOP-3498] is fixed, Pig should benefit from it automatically. The only remaining issue is that there is still some file-existence-checking code specific to local mode, which we need to clean out. I attached a patch for reference (target: branches/types).
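Passing the string through unfiltered covers a single glob. The multi-path form in the request quoted below needs one more step: a string such as '2008/05/{26,27},2008/06/{1,2}' cannot be split on every comma, because the brace alternations contain commas themselves. The following is a minimal Java sketch of brace-aware splitting. It is not part of the attached patch, and it assumes that top-level commas separate paths while commas inside {...} belong to a glob alternation. The class PathListSplitter and method splitPaths are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not Pig or Hadoop API): splits a LOAD path string
// into individual glob paths without breaking up {a,b,...} alternations.
public class PathListSplitter {

    public static List<String> splitPaths(String spec) {
        List<String> paths = new ArrayList<>();
        int depth = 0;  // current nesting level of curly braces
        int start = 0;  // index where the current path begins
        for (int i = 0; i < spec.length(); i++) {
            char c = spec.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                if (depth > 0) {
                    depth--;
                }
            } else if (c == ',' && depth == 0) {
                // A comma outside any braces separates two paths.
                paths.add(spec.substring(start, i));
                start = i + 1;
            }
        }
        paths.add(spec.substring(start));  // last (or only) path
        return paths;
    }

    public static void main(String[] args) {
        String spec = "2008/05/{26,27,28,29,30,31},2008/06/{1,2}";
        // Each piece could then be globbed separately by Hadoop.
        for (String p : splitPaths(spec)) {
            System.out.println(p);
        }
    }
}
```

Run on the path string from the original request, it prints 2008/05/{26,27,28,29,30,31} and 2008/06/{1,2} on separate lines. The sketch does not trim whitespace around commas and does not report unbalanced braces as errors.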

> Allow multiple paths in the load statement
> ------------------------------------------
>
>                 Key: PIG-252
>                 URL: https://issues.apache.org/jira/browse/PIG-252
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>         Attachments: localglobbing.patch
>
>
> From Tom White:
> I'm having a problem loading data from multiple paths in Pig. What I'm trying to do is load data from a range of dates, so I would like to specify an input of two globbed paths:
> x = LOAD '2008/05/{26,27,28,29,30,31},2008/06/{1,2}';
> Pig doesn't seem to like this, though, as it's trying to interpret it as a single path. The best I can do is to use UNION:
> x1 = LOAD '2008/05/{26,27,28,29,30,31}';
> x2 = LOAD '2008/06/{1,2}';
> x = UNION x1, x2;
> The downside to this is that I want to parameterize my paths, and having a separate script for each number of input paths is cumbersome.
> Is there a better way of doing this? Are there any plans to support multiple paths, and/or PathFilters?
