Posted to dev@pig.apache.org by "Stefan Groschupf (JIRA)" <ji...@apache.org> on 2008/02/01 19:45:07 UTC

[jira] Created: (PIG-86) load and store function should be file independent

load and store function should be file independent
--------------------------------------------------

                 Key: PIG-86
                 URL: https://issues.apache.org/jira/browse/PIG-86
             Project: Pig
          Issue Type: Improvement
            Reporter: Stefan Groschupf
            Priority: Critical


Currently Pig is very file system centric for input (LoadFunc) and output (StoreFunc).
However, in many use cases the data is not in the DFS directly but comes from a remote service. It is also common to store the results in a database for a web-based UI.
Therefore I suggest making the file path elements in the grammar optional.
For example:
A = load 'mydata' using PigStorage(); -- 'mydata' would be optional
STORE A into 'output.bz2' USING myStore(); -- into 'output.bz2' would be optional

The current workaround for implementing load and store functions that work without file system access is to create temporary dummy files, since Pig checks for these files even if the load and store functions do not need them.
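
For illustration, the dummy-file workaround could look roughly like the sketch below. It is only a sketch under assumptions: it runs in local mode (on a cluster the placeholder would presumably have to exist in the DFS instead), and DummyInput, createPlaceholder, loadStatement and the loader name myRemoteLoader are hypothetical names, not part of Pig.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Pig: creates a placeholder file so that
// the existence check on the load path passes even though the load function
// never reads the file.
final class DummyInput {

    // Creates an empty temporary file and marks it for deletion on JVM exit.
    static Path createPlaceholder(String prefix) {
        try {
            Path placeholder = Files.createTempFile(prefix, ".dummy");
            placeholder.toFile().deleteOnExit();
            return placeholder;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds the load statement that points at the placeholder.
    // loadFunc is the name of a user-defined load function (assumed here).
    static String loadStatement(String alias, Path placeholder, String loadFunc) {
        return alias + " = load '" + placeholder.toAbsolutePath()
                + "' using " + loadFunc + "();";
    }

    public static void main(String[] args) {
        Path placeholder = createPlaceholder("pig-input");
        System.out.println(loadStatement("A", placeholder, "myRemoteLoader"));
    }
}
```

The placeholder carries no data; it exists only to satisfy the path check, which is the overhead this issue proposes to remove.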

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-86) load and store function should be file independent

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587184#action_12587184 ] 

Pi Song commented on PIG-86:
----------------------------

There is demand for being able to read data from databases or external services.
I think we need to modify the load syntax a bit or introduce a new syntax (whichever is more suitable) instead of passing a dummy file.



[jira] Commented: (PIG-86) load and store function should be file independent

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587209#action_12587209 ] 

Benjamin Reed commented on PIG-86:
----------------------------------

I believe PIG-55 takes care of this. For a database, you would use a LoadFunc that implements SlicerFactory. Your LoadFunc would then be able to validate the "filename", which in this case would probably be a spec for a table.
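
As a rough sketch of what validating such a "filename" might involve, the class below parses a location string as a table spec. It deliberately does not use the LoadFunc or SlicerFactory interfaces, whose signatures are not shown in this thread; the TableSpec class and the "database.table" format are assumptions made purely for illustration.

```java
// Hypothetical value class, not part of Pig: treats the load location as a
// table spec of the assumed form "database.table" instead of a file path.
final class TableSpec {
    final String database;
    final String table;

    private TableSpec(String database, String table) {
        this.database = database;
        this.table = table;
    }

    // Parses and validates the location string; rejects anything that is
    // not exactly two non-empty parts separated by a single dot.
    static TableSpec parse(String location) {
        if (location == null) {
            throw new IllegalArgumentException("location is null");
        }
        String[] parts = location.trim().split("\\.");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException(
                    "expected 'database.table' but got '" + location + "'");
        }
        return new TableSpec(parts[0], parts[1]);
    }

    public static void main(String[] args) {
        TableSpec spec = TableSpec.parse("sales.orders");
        System.out.println(spec.database + " / " + spec.table);
    }
}
```

With a check like this inside the load function, the string after "load" no longer has to name an existing file; it only has to be something the function itself accepts.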
