Posted to dev@pig.apache.org by "Ashutosh Chauhan (Created) (JIRA)" <ji...@apache.org> on 2011/11/02 21:29:33 UTC

[jira] [Created] (PIG-2344) UDF / LoadFunc / StoreFunc should be serializable

UDF / LoadFunc / StoreFunc should be serializable
-------------------------------------------------

                 Key: PIG-2344
                 URL: https://issues.apache.org/jira/browse/PIG-2344
             Project: Pig
          Issue Type: Improvement
            Reporter: Ashutosh Chauhan


If there is a redesign, this should be a requirement. We could then do away with saving state that was created in the frontend and recreating the same state in the backend.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2344) UDF / LoadFunc / StoreFunc should be serializable

Posted by "Dmitriy V. Ryaboy (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226322#comment-13226322 ] 

Dmitriy V. Ryaboy commented on PIG-2344:
----------------------------------------

I'm a fan of the general idea, but let's rethink those method names and provide cleaner (complete) lifecycle methods.

How are you going to make sure this is backwards compatible? Some UDFs might not even have no-arg constructors.
                


[jira] [Commented] (PIG-2344) UDF / LoadFunc / StoreFunc should be serializable

Posted by "Ashutosh Chauhan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228588#comment-13228588 ] 

Ashutosh Chauhan commented on PIG-2344:
---------------------------------------

A few related problems that could possibly be fixed without the redesign are the following:

First, Pig instantiates a LoadFunc/StoreFunc (LF/SF) three times in the frontend and calls different methods of the interface on different objects, making it impossible to communicate state between the constructor and the other methods within the frontend. An illustration of this can be found in HCatLoader, which saves the schema into UDFContext in its constructor and retrieves it again in the frontend itself. This is a *nasty* *nasty* hack. The problem manifests itself in other places in HCatalog as well, making its code brittle.
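
The effect can be reproduced in a small, self-contained sketch. Nothing here is Pig code: FakeContext is a hypothetical stand-in for a shared side channel such as UDFContext, and SketchLoader, its signature argument, and its methods are invented for illustration. It only shows why a field set on one instance is invisible to the next instance, so that state ends up parked in a shared store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a process-wide side channel (not Pig's UDFContext API).
class FakeContext {
    static final Map<String, String> PROPS = new HashMap<>();
}

// Hypothetical loader; the constructor argument mimics arguments given in a script.
class SketchLoader {
    private String schemaField;          // lives only on this instance
    private final String signature;      // assumed key that identifies this loader

    SketchLoader(String signature) {
        this.signature = signature;
    }

    // Called on one instance: stores the schema in the field and in the side channel.
    void resolveSchema() {
        schemaField = "a:int,b:chararray";
        FakeContext.PROPS.put(signature + ".schema", schemaField);
    }

    // On a different instance the field is unset; only the side channel has the value.
    String schemaFromField() { return schemaField; }
    String schemaFromContext() { return FakeContext.PROPS.get(signature + ".schema"); }
}

class MultiInstanceDemo {
    public static void main(String[] args) {
        SketchLoader first = new SketchLoader("loader-1");
        first.resolveSchema();

        // A second instance, as if the framework had instantiated the loader again.
        SketchLoader second = new SketchLoader("loader-1");
        System.out.println("field:   " + second.schemaFromField());
        System.out.println("context: " + second.schemaFromContext());
    }
}
```

If a single instance were used throughout the frontend, the plain field would be enough and the side channel would not be needed.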

Second, if these LF/SF functions perform actions that are not idempotent, they have to work around the repeated instantiation as well.

The third problem is that some of these methods are passed a JobConf, but writing anything into it is useless, since that JobConf is thrown away. After making a call on the interface, Pig should save this JobConf and, when it later instantiates the real JobConf, initialize it from the saved one.
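
One way to picture this third point: keep whatever the function wrote into the scratch configuration and seed the real one from it. The sketch below uses java.util.Properties as a stand-in for JobConf; ConfWritingLoader, its configure method, and the property name are hypothetical and do not correspond to any Pig interface.

```java
import java.util.Properties;

// Hypothetical function that writes a setting into the configuration it is handed.
class ConfWritingLoader {
    void configure(Properties conf) {
        conf.setProperty("sketch.input.table", "events");
    }
}

class ConfCarryOverDemo {
    // Copies every entry the function wrote into a fresh "real" configuration.
    static Properties buildRealConf(Properties saved) {
        Properties real = new Properties();
        for (String name : saved.stringPropertyNames()) {
            real.setProperty(name, saved.getProperty(name));
        }
        return real;
    }

    public static void main(String[] args) {
        Properties scratch = new Properties();
        new ConfWritingLoader().configure(scratch);

        // Instead of discarding the scratch object, seed the later configuration from it.
        Properties real = buildRealConf(scratch);
        System.out.println(real.getProperty("sketch.input.table"));
    }
}
```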


                


[jira] [Commented] (PIG-2344) UDF / LoadFunc / StoreFunc should be serializable

Posted by "Thomas Weise (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226299#comment-13226299 ] 

Thomas Weise commented on PIG-2344:
-----------------------------------

Serialization alone would not help in situations where a UDF's exec(..) depends on state that needs to be initialized where exec(..) runs. One current workaround is to initialize that state lazily from exec(..), which guarantees that it happens where the action is.
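
The lazy-initialization workaround can be sketched in plain Java. LazyLookupUdf below is a hypothetical class that does not extend Pig's EvalFunc, and its exec takes a String rather than a Tuple to keep the example self-contained. The transient field is not carried by Java serialization, so it is built on first use wherever exec actually runs.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical UDF-like class; it does not extend any Pig base class.
class LazyLookupUdf implements Serializable {
    private static final long serialVersionUID = 1L;

    // Not serialized: must be built on the machine where exec runs.
    private transient Map<String, String> lookup;
    private int initCount = 0;

    private void ensureInitialized() {
        if (lookup == null) {
            // A real UDF might read a local file here; this table is hard-coded.
            lookup = new HashMap<>();
            lookup.put("us", "United States");
            lookup.put("de", "Germany");
            initCount++;
        }
    }

    String exec(String code) {
        ensureInitialized();   // does real work on the first call only
        return lookup.getOrDefault(code, "unknown");
    }

    int getInitCount() { return initCount; }
}

class LazyInitDemo {
    public static void main(String[] args) {
        LazyLookupUdf udf = new LazyLookupUdf();
        System.out.println(udf.exec("us"));
        System.out.println(udf.exec("de"));
        System.out.println("initialized " + udf.getInitCount() + " time(s)");
    }
}
```

The cost of this pattern is a null check on every call and no natural place for cleanup, which is part of what dedicated lifecycle hooks would address.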

After discussing solutions with Ashutosh, we identified the following needs:

Pig should construct the UDF through its default/no-arg ctor. There should be no more repeated instantiation through a UDF ctor with arguments.

Pig should call initialize(...) in the frontend, with the arguments provided for the UDF.

Pig should call preExec() once in the backend. This would be the place where things like local file system access can take place.

Probably there should also be a postExec() hook for any cleanup to be done.

And finally, we need to address backward compatibility, so that existing UDFs don't suddenly stop working.
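
The needs listed above could be pictured roughly as follows. The interface name LifecycleUdf, the method signatures, the PrefixUdf class, and the driver are assumptions made for illustration; this thread names initialize(...), preExec() and postExec() but does not define their parameters, and the names themselves are still under discussion. Java serialization stands in for shipping the object from the frontend to the backend.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical lifecycle contract; the signatures are assumptions, not a Pig API.
interface LifecycleUdf extends Serializable {
    void initialize(String... args);   // frontend, with the arguments from the script
    void preExec();                    // backend, once before the first exec
    String exec(String input);         // backend, per record
    void postExec();                   // backend, cleanup
}

class PrefixUdf implements LifecycleUdf {
    private static final long serialVersionUID = 1L;
    private String prefix;                    // set in the frontend, serialized
    private transient StringBuilder buffer;   // backend-only resource, not serialized

    public PrefixUdf() { }                    // no-arg constructor

    @Override public void initialize(String... args) { prefix = args[0]; }
    @Override public void preExec() { buffer = new StringBuilder(); }
    @Override public String exec(String input) {
        buffer.setLength(0);
        return buffer.append(prefix).append(input).toString();
    }
    @Override public void postExec() { buffer = null; }
}

class LifecycleDemo {
    // Frontend: construct once, initialize, serialize.
    static byte[] frontend(String... args) throws IOException {
        LifecycleUdf udf = new PrefixUdf();
        udf.initialize(args);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(udf);
        }
        return bytes.toByteArray();
    }

    // Backend: deserialize, then run preExec, exec and postExec.
    static String backend(byte[] shipped, String record)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(shipped))) {
            LifecycleUdf udf = (LifecycleUdf) in.readObject();
            udf.preExec();
            try {
                return udf.exec(record);
            } finally {
                udf.postExec();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] shipped = frontend("id-");
        System.out.println(backend(shipped, "42"));
    }
}
```

In this sketch the state set by initialize travels with the serialized object, so nothing has to be saved in the frontend and recreated in the backend. The sketch does not address backward compatibility for existing UDFs that only have constructors with arguments.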

                
