Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2008/05/28 00:16:59 UTC

[jira] Reopened: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

     [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reopened PIG-85:
-------------------------------


Pi, I think the patch still has an issue. With these changes, it can use much more memory than needed. The cause is the change to the tuple parsing code. It looks like letting the ArrayList grow dynamically, instead of giving it a fixed size, causes a large memory overhead. (The Java documentation does not say what the reallocation algorithm is, but if it is like STL, doubling every time, this can get expensive.)
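The ArrayList javadoc only promises that add() runs in amortized constant time. It does not specify the growth policy. To make the overhead concrete, the sketch below models one concrete policy, assumed here to match OpenJDK 8: a default-constructed list first grows to 10 slots, and each later reallocation adds half the current capacity. Other JDK versions, including those in use when this issue was filed, differ in detail. GrowthSketch and its methods are hypothetical names.

```java
// Simulates the backing-array capacity of a default-constructed
// java.util.ArrayList after n single-element add() calls.
// ASSUMPTION: models OpenJDK 8's policy (first growth to 10 slots,
// then newCapacity = old + old / 2). The ArrayList javadoc does not
// specify this, so other JDKs may behave differently.
class GrowthSketch {
    static final int DEFAULT_CAPACITY = 10;

    static int capacityAfterAdds(int n) {
        int capacity = 0; // default constructor starts with an empty array
        for (int size = 1; size <= n; size++) {
            if (size > capacity) {
                if (capacity == 0) {
                    // First growth jumps straight to the default capacity.
                    capacity = Math.max(DEFAULT_CAPACITY, size);
                } else {
                    // Later growth adds half the current capacity.
                    int grown = capacity + (capacity >> 1);
                    capacity = Math.max(grown, size);
                }
            }
        }
        return capacity;
    }

    // Unused slots compared with a list presized via new ArrayList<>(n).
    static int wastedSlots(int n) {
        return capacityAfterAdds(n) - n;
    }
}
```

Under this model a 3-field tuple carries 7 unused slots and a 40-field tuple carries 9, whereas a list presized with new ArrayList<>(n) allocates exactly n. Across every tuple held in memory at once, that slack can add up.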

After I applied this patch, a GROUP ALL query that used to run now fails.

I made a quick fix, just for testing, that reuses the size of the previous tuple as the initial size, since most of the time tuples have the same number of fields. That solved the issue for this particular case.

That might be a reasonable approach, but I am open to other suggestions as well.
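The following is a minimal sketch of the quick fix described above. It remembers how many fields the previous record had and uses that count as the initial capacity of the next record's ArrayList. FieldSplitter and its methods are hypothetical names for illustration. This is not Pig's actual tuple-parsing code, and it splits on a single delimiter character without handling any escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical splitter illustrating the "reuse the previous tuple size"
// idea; not Pig's actual Tuple or PigStorage code.
class FieldSplitter {
    private final char delimiter;
    // Field count of the most recently parsed line. Consecutive records
    // usually have the same width, so it is a good capacity hint.
    private int lastFieldCount = 1;

    FieldSplitter(char delimiter) {
        this.delimiter = delimiter;
    }

    List<String> split(String line) {
        // Presize using the previous record's width instead of the default.
        List<String> fields = new ArrayList<>(lastFieldCount);
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(line.substring(start)); // last field, possibly empty
        lastFieldCount = fields.size();
        return fields;
    }

    int lastFieldCount() {
        return lastFieldCount;
    }
}
```

If a record has more fields than the previous one, the list still grows as usual. The hint only avoids over-allocating in the common case where consecutive records have the same width.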

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A Pig command like
> store abc into 'abc' using PigStorage('\x01');
> does not recognize that the user is requesting ^A-separated data. Instead, the stored data is literally separated by the string '\x01'.
> Typing ^A directly in the editor does not help, and neither do other strings like \u0001.
> Entering ^A directly in the editor produces an error about an invalid XML character and bails out.
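One way to address the original report is to interpret escape sequences in the delimiter argument before using it, so that the four characters \x01 become the single character 0x01. The sketch assumes that the argument reaches the storage function with the backslash intact, as the report suggests. It handles only a single literal character, \t, \xHH and \uHHHH. DelimiterSpec and parseDelimiter are hypothetical names, not the API introduced by the attached patches.

```java
// Hypothetical helper: converts a delimiter argument as written in a
// script into the single character it denotes. Not Pig's actual API.
class DelimiterSpec {
    static char parseDelimiter(String spec) {
        // A plain one-character delimiter such as "," or "|".
        if (spec.length() == 1) {
            return spec.charAt(0);
        }
        // Backslash, 'x', then exactly two hex digits.
        if (spec.length() == 4 && spec.startsWith("\\x")) {
            return fromHex(spec.substring(2), spec);
        }
        // Backslash, 'u', then exactly four hex digits.
        if (spec.length() == 6 && spec.startsWith("\\u")) {
            return fromHex(spec.substring(2), spec);
        }
        // Backslash followed by 't' denotes a tab.
        if (spec.equals("\\t")) {
            return '\t';
        }
        throw new IllegalArgumentException("Unsupported delimiter: " + spec);
    }

    // Rejects anything that is not a hex digit (including signs), then decodes.
    private static char fromHex(String digits, String spec) {
        for (int i = 0; i < digits.length(); i++) {
            if (Character.digit(digits.charAt(i), 16) < 0) {
                throw new IllegalArgumentException("Bad hex escape: " + spec);
            }
        }
        return (char) Integer.parseInt(digits, 16);
    }
}
```

A storage function could then split on the returned char instead of on the raw argument string. A fuller fix would also have to decide how to treat multi-character delimiters, which this sketch simply rejects.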

-- 
This message is automatically generated by JIRA.