You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Anand Murugappan (JIRA)" <ji...@apache.org> on 2008/01/31 21:03:09 UTC

[jira] Created: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Unable to specify CTRL-A as a delimiter for the PigStorage function
-------------------------------------------------------------------

                 Key: PIG-85
                 URL: https://issues.apache.org/jira/browse/PIG-85
             Project: Pig
          Issue Type: Bug
            Reporter: Anand Murugappan


A PIG command like - 

store abc into 'abc' using PigStorage('\x01');

 does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 

Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598017#action_12598017 ] 

Olga Natkovich commented on PIG-85:
-----------------------------------

Pi,

Thanks for reworking the patch. 

Looking at the unit test, the store does not require escaping special character while load does. This will be confusing for the users. Why is this necessary?

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Amir Youssefi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593103#action_12593103 ] 

Amir Youssefi commented on PIG-85:
----------------------------------

Yes, store has a problem. Users should be able to have: 

STORE ... USING PigStorage('\u0001') 

for Ctrl+A


> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597176#action_12597176 ] 

Benjamin Reed commented on PIG-85:
----------------------------------

I also vote to remove the Regex. To be honest, I only used it in our initial implementation because it was easy not because I wanted regular expressions. Apart from causing the problems listed here it also has a performance impact.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597636#action_12597636 ] 

Olga Natkovich commented on PIG-85:
-----------------------------------

I added the following unit test to TestStore:

public void testDelimiter() throws IOException{
        System.out.println("Temp files: " + tmpFile1 + ", " + tmpFile2);
        pigServer.registerQuery("A = load " + fileName + ";");
        pigServer.store("A", tmpFile1, "PigStorage('\u0001')");
        pigServer.registerQuery("B = load " + tmpFile1 + "using PigStorage('\u0001');");
        pigServer.registerQuery("C = foreach B generate $0, $1;");
        pigServer.store("C", tmpFile2);
        pigServer.registerQuery("E = load " + tmpFile2 + ";");
        Iterator<Tuple> iter  = pigServer.openIterator("E");
        int i =0;
        while (iter.hasNext()){
            Tuple t = iter.next();
            assertEquals(t.getAtomField(0).numval().intValue(),i);
            assertEquals(t.getAtomField(1).numval().intValue(),i);
            i++;
        }
    }

With the latest patch, it works in local mode but fails in M-R mode. Will attach test outut in a second.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597528#action_12597528 ] 

Pi Song commented on PIG-85:
----------------------------

Olga,

1) In this case we have to use StringUtils.unescapeInputString() because LoadFunc/StoreFunc expects quoted strings as input. This is different from string literals where we want to get rid of quotes.

2) I was thinking about that too but I was not quite sure whether you have already got the solution yet so just did a quick one.

will change (2) quickly.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-85:
------------------------------

    Attachment: PIG-85_v4.patch

This patch reuses the size from the previous tuple.

The next thing I want to try is to walking the data counting number of delimiters upfront and then using String[] rather than ArrayList.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG-85_v4.patch, PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600324#action_12600324 ] 

Pi Song commented on PIG-85:
----------------------------

I think that's a good solution. 

I will do performance tests on other different ways e.g. use trimToSize()  or count the exact number before instantiate the ArrayList to be able to tolerate variable tuple size.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Sam Pullara (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564525#action_12564525 ] 

Sam Pullara commented on PIG-85:
--------------------------------

Ah, this may be a bug in store vs. load.  I know that load works with that format.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593325#action_12593325 ] 

Pi Song commented on PIG-85:
----------------------------

At the moment, string escaping only works with standalone literals. To add support for LOAD/STORE function parameter is not difficult but it will cause trouble in the parser change (which is close to finish) in type branch. Please wait a bit more and I'll fix this.

In the mean time one way to do is inheriting the PigStorage and set the default delimitor to anything you want in Java. Then use it as a UDF LOAD/STORE.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598145#action_12598145 ] 

Olga Natkovich commented on PIG-85:
-----------------------------------

In the unit test that you provided, the delimiter is escaped in load but not in store. Is that intentional?

Also, what would be the work involved in not requiring escaping at all?

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pi Song updated PIG-85:
-----------------------

    Attachment: PIG_85_v3.patch

This patch includes all above plus fixes the backend job submission issue.
The test provided by Olga incorporated.
All tests passed in Local & MapReduce.

@Olga
This line:-
{noformat}
 pigServer.registerQuery("B = load " + tmpFile1 + "using PigStorage('\\u0001') ;");
{noformat}
needs double \ because it has to get through query parser as "\u0001" whereas 
{noformat}
pigServer.store("A", tmpFile1, "PigStorage('\u0001')");
{noformat}
doesn't need because this one doesn't go through query parser.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597138#action_12597138 ] 

Pi Song commented on PIG-85:
----------------------------

In Load we use regex to split:-
{noformat}
String[] splitString = textLine.split(delimiter, -1);

JavaDoc: String[] 	split(String regex)      Splits this string around matches of the given regular expression
{noformat}
But in Store we use normal string concatenation:-
{noformat}
if (it.hasNext()) buf.append(delim);
{noformat} 
  
This can be very confusing for users.
For example, if I do
{noformat}
a = LOAD '/tmp/datatest1.txt' USING PigStorage('\\d') ;
{noformat}
 "\ \d" will get unescaped as in Ben's solution to '\d' which is seperating by digits.

Then I  do
{noformat}
STORE a INTO '/tmp/dataout' USING PigStorage('\\d')
{noformat}
the output will have the actual string "\d" as separators 

Are there many users relying on this Regex stuff? If not +1 for removing it. We can introduce a new built-in storage that supports Regex.


> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598839#action_12598839 ] 

Olga Natkovich commented on PIG-85:
-----------------------------------

I was able to successfully run unit as well as end-to-end tests. Will commit as soon is Hadoop 17 patch is on. Thanks, Pi!

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-85.
-------------------------------

    Resolution: Fixed

Patch committed. Thanks Pi for fixing this

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598147#action_12598147 ] 

Pi Song commented on PIG-85:
----------------------------

Yes, that's intentional. We have introduced escaping in Query Parser to allow user to do escape like in Java. Basically users can use escaping in string literals. This bug is that our escaping doesn't work after USING.  We have fixed it to be consistent in the patch.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600544#action_12600544 ] 

Olga Natkovich commented on PIG-85:
-----------------------------------

Pi, 

Thanks for the numbers. This is very useful!

I agree that we are making huge improvement which is great to see but I also think that allowing extra reallocation and copy of the references is not desirable and seem like it can be avoided with remembering the last size.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-85:
------------------------------

    Attachment: TEST-org.apache.pig.test.TestStore.txt

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597669#action_12597669 ] 

Pi Song commented on PIG-85:
----------------------------

Seems like hadoop jobconf doesn't like special char. I may have to work around this.



08/05/16 22:19:29 FATAL conf.Configuration: error parsing conf file: org.xml.sax.SAXParseException: Character reference "&#1" is an invalid XML character.

08/05/16 22:19:29 INFO ipc.Server: IPC Server handler 5 on 39413, call submitJob(job_200805162217_0010) from 127.0.0.1:39792: error: java.io.IOException: java.lang.RuntimeException: org.xml.sax.SAXParseException: Character reference "&#1" is an invalid XML character.

java.io.IOException: java.lang.RuntimeException: org.xml.sax.SAXParseException: Character reference "&#1" is an invalid XML character.



> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598758#action_12598758 ] 

Olga Natkovich commented on PIG-85:
-----------------------------------

Pi, sounds good. I will be testing your patch today and will commit it after hadoop 0.17 patch sometimes tomorrow or Friday.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596226#action_12596226 ] 

Benjamin Reed commented on PIG-85:
----------------------------------

StringList() in QueryParser.jjt should be using unquote() on QUOTEDSTRING. It looks like it is the only place that was missed when we did the unquote patch. It's simple enough to fix (a two line patch). Is that going to mess up your parser change too much Pi? Here are the lines in StringList that need to be changed:

	t = <QUOTEDSTRING> {sb.append(t.image);}
        t = <QUOTEDSTRING> {sb.append(t.image);}

they should both become
	t = <QUOTEDSTRING> {sb.append(unquote(t.image));}

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596944#action_12596944 ] 

Olga Natkovich commented on PIG-85:
-----------------------------------

I am guessing this is due to the fact that we are using regular expression for load. I will take a stab at integrating code I got from Alan to get rid of regular expression and then retest the changes

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pi Song updated PIG-85:
-----------------------

    Attachment: PIG_85_v2.patch

The patch optimizes string splitting.

I've run all tests in local mode. No problem. 

Gotta go outside now. Enjoy your weekend!!!

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600389#action_12600389 ] 

Pi Song commented on PIG-85:
----------------------------

Test result: Splitting strings of 100 tokens 1 million times

||Approach||time in ms||
|String.Split() |87407|
|StringTokenizer|25969|
|Count  #tokens before create ArrayList|22094|
|Manual Split with trimToSize() |15031|
|Manual Split without trimToSize() |14860|

I think we have already gained a lot from switching to manual splitting. It shouldn't be too bad to also call ArrayList.trimToSize() before returning the result. Memory is a hard constrain so very important to save.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600938#action_12600938 ] 

Olga Natkovich commented on PIG-85:
-----------------------------------

Ran tests: the current code is about 30% faster than double-walking line and using String[]. So I belive that the current patch should be a reasonable solution.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG-85_v4.patch, PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596233#action_12596233 ] 

Pi Song commented on PIG-85:
----------------------------

Thanks Ben.

At the beginning, I thought I wouldn't want to touch the grammar file for a while but so many people (including myself) already touched it. I'll fix this.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-85:
------------------------------

    Status: Patch Available  (was: Reopened)

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG-85_v4.patch, PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-85:
------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch has been committed. issue resolved. Thanks Pi for contributing.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG-85_v4.patch, PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Reopened: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reopened PIG-85:
-------------------------------


Pi, I think the patch still has an issue. With this changes, there is potential of using much more memory then needed. This is caused by the changes to the parsing code in the tuple. Looks like when weallow array list to grow dinamically instead of specifying fixed size, it causes large memory overhead. (Reading Java documentation, I did not see what is the reallocation algorithm is but if it is like STL - doubling every time - this can get expensive.)

After I applied this patch, I have a group all query that used to run but now is failing.

I made quick fix - just for testing - of reusing the size of the previous tuple since most of the time tuples have the same number of fields, and that solved the issue for this particular case. 

That might be a resonable approach but I am open for other suggestions as well.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596936#action_12596936 ] 

Olga Natkovich commented on PIG-85:
-----------------------------------

I integrated and tested the solution proposed by Ben. It does solve the store issue but it breaks the load :).

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pi Song updated PIG-85:
-----------------------

    Attachment: PIG_85_escaping_parameters.patch

Just in case nobody has got the solution yet.
- Regex thing removed
- Tested

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12598142#action_12598142 ] 

Pi Song commented on PIG-85:
----------------------------

The current semantic is "do escape when you do Pig query" and "don't do escape when you work directly with API"

We may have to address this but I would suggest we open a new Jira for discussion.


> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597151#action_12597151 ] 

Alan Gates commented on PIG-85:
-------------------------------

AFAIK people are not relying heavily on the regex in the load.  And using a regex for parsing lines in our default load function is very heavy.  I was planning on removing that from PigStorage as part of the rewrite of that function in the pipeline work.  So I vote for removing it too.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596404#action_12596404 ] 

Olga Natkovich commented on PIG-85:
-----------------------------------

Pi, when do you think this might be resolved. We have several users seeing this problem. Thanks.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Anand Murugappan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564523#action_12564523 ] 

Anand Murugappan commented on PIG-85:
-------------------------------------

I am not sure if that helps. 

Here is the query:
entries = LOAD '/user/anandm/testData/testTableA.txt' AS (key,value);
STORE entries INTO '/user/anandm/tmp/testTableACtrl01.tsv' USING PigStorage('\01');

The output contains this:
a\011
b\012
c\013
d\014
e\015

We do not want to see \01 but instead see '^A' character as the separator.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597521#action_12597521 ] 

Olga Natkovich commented on PIG-85:
-----------------------------------

Hi Pi,

Thanks for submitting the patch. Couple of comments:

(1) To keep QueryParser.jjt code consistent I think it would make sense to use unquote rather than StringUtils.unescapeInputString since that's what the rest of the code does
(2) I think we should keep track of where we are at in the initial textLine rather than reallocating it every time we find the delimiter to keep the code more efficient.

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Sam Pullara (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12564501#action_12564501 ] 

Sam Pullara commented on PIG-85:
--------------------------------

This is the right format for this: PigStorage('\01')


> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597622#action_12597622 ] 

Olga Natkovich commented on PIG-85:
-----------------------------------

Patch looks good. Running tests now

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG_85_escaping_parameters.patch, PIG_85_v2.patch
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-85) Unable to specify CTRL-A as a delimiter for the PigStorage function

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600958#action_12600958 ] 

Pi Song commented on PIG-85:
----------------------------

+1

> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>         Attachments: PIG-85_v4.patch, PIG_85_escaping_parameters.patch, PIG_85_v2.patch, PIG_85_v3.patch, TEST-org.apache.pig.test.TestStore.txt
>
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize hat the user is requesting the data to by ^A separated. Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid XML character and bails out. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.