Posted to dev@pig.apache.org by "Justin Sanders (JIRA)" <ji...@apache.org> on 2010/06/14 19:39:13 UTC

[jira] Created: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

RegExLoader hangs on lines that don't match the regular expression
------------------------------------------------------------------

                 Key: PIG-1449
                 URL: https://issues.apache.org/jira/browse/PIG-1449
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Justin Sanders
            Priority: Minor


The 0.7.0 changes to RegExLoader introduced a bug: if a line does not match the regular expression, the code never leaves its while loop.  Before 0.7.0, non-matching lines were simply skipped.  As a result, the mapper stops responding and is eventually killed with "Task attempt_X failed to report status for 600 seconds. Killing!".

Here are the steps to recreate the bug:

Create a text file in HDFS with the following lines:

test1
testA
test2

Run the following pig script:

REGISTER /usr/local/pig/contrib/piggybank/java/piggybank.jar;
test = LOAD '/path/to/test.txt' using org.apache.pig.piggybank.storage.MyRegExLoader('(test\\d)') AS (line);
dump test;

Expected result:

(test1)
(test2)

Actual result:

The job fails once the mapper hits the 600-second timeout.  The mapper hangs at 33% because it processes the first line but gets stuck in the while loop on the second line.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878663#action_12878663 ] 

Ashutosh Chauhan commented on PIG-1449:
---------------------------------------

Justin,

Good catch. Can you add your test case as a JUnit test in either piggybank/test/storage/TestRegExLoader or TestMyRegExLoader? That way we'll have a regression test for the issue.
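
Because the failure mode here is a hang rather than a wrong result, a regression test needs some kind of time limit, or it will block the whole test run. The following is a minimal standalone sketch of such a guard in plain Java. The class and method names (HangGuardSketch, finishesWithin) are hypothetical and are not part of Pig or piggybank; a real test would live in one of the JUnit classes named above, where JUnit 4's `@Test(timeout = ...)` can serve a similar purpose.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HangGuardSketch {
    // Hypothetical helper: runs the task on a worker thread and returns true
    // only if it finishes within the time limit. A hung task is reported as
    // false instead of blocking the caller.
    static <T> boolean finishesWithin(Callable<T> task, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r);
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker if it is still running
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A task that returns immediately, then one that never returns.
        System.out.println(finishesWithin(() -> 42, 2000));
        System.out.println(finishesWithin(() -> {
            while (true) {
                Thread.sleep(10);
            }
        }, 200));
    }
}
```

In a regression test, the task passed to the guard would read the three-line input from the bug report through the loader and collect the resulting tuples.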


[jira] Commented: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884552#action_12884552 ] 

Ashutosh Chauhan commented on PIG-1449:
---------------------------------------

@Christian,

It would definitely be useful to reduce the time it takes to run the tests. Running all the Pig tests currently takes a while.


[jira] Commented: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12878812#action_12878812 ] 

Hadoop QA commented on PIG-1449:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12447045/RegExLoader.patch
  against trunk revision 953798.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/328/console

This message is automatically generated.


[jira] Commented: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Christian Hargraves (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884389#action_12884389 ] 

Christian Hargraves commented on PIG-1449:
------------------------------------------

I ran into this issue last night and fixed it before seeing this bug. My fix is similar to the previous one, but it includes a unit test. Hopefully, the test will help get this in more quickly. I noticed that it takes over 4 minutes to run the unit tests. Would there be any added value in trying to reduce the execution time of these tests? If there's any interest, I might be able to help.


[jira] Commented: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884539#action_12884539 ] 

Hadoop QA commented on PIG-1449:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448516/PIG-1449-RegExLoaderInfiniteLoopFix.patch
  against trunk revision 958666.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/357/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/357/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/357/console

This message is automatically generated.


[jira] Closed: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy closed PIG-1449.
----------------------------------



[jira] Updated: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Justin Sanders (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Sanders updated PIG-1449:
--------------------------------

    Attachment: RegExLoader.patch


[jira] Updated: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1449:
----------------------------------

    Status: Open  (was: Patch Available)


[jira] Updated: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1449:
----------------------------------

    Status: Patch Available  (was: Open)

Running through Hudson.


[jira] Updated: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Justin Sanders (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Justin Sanders updated PIG-1449:
--------------------------------

          Status: Patch Available  (was: Open)
    Release Note: Fixed hanging in RegExLoader if line didn't match regular expression.


[jira] Updated: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1449:
----------------------------------

           Status: Resolved  (was: Patch Available)
    Fix Version/s: 0.8.0
       Resolution: Fixed


[jira] Updated: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Christian Hargraves (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christian Hargraves updated PIG-1449:
-------------------------------------

    Attachment: PIG-1449-RegExLoaderInfiniteLoopFix.patch

This should fix the problem by adding a call to nextKeyValue on each iteration. 
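
To illustrate the idea, here is a minimal standalone sketch of a read loop that advances the reader on every iteration, so a non-matching line is skipped instead of being re-examined forever. It is not the attached patch: LineReader is a hypothetical stand-in for the Hadoop record reader, only the method name nextKeyValue comes from the comment above, and the use of Matcher.find() is an assumption about how the loader applies the pattern.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SkipNonMatchingSketch {
    // Hypothetical stand-in for a record reader over text lines.
    static final class LineReader {
        private final Iterator<String> lines;
        private String current;

        LineReader(List<String> lines) {
            this.lines = lines.iterator();
        }

        // Advances to the next line; returns false at end of input.
        boolean nextKeyValue() {
            if (!lines.hasNext()) {
                current = null;
                return false;
            }
            current = lines.next();
            return true;
        }

        String getCurrentValue() {
            return current;
        }
    }

    // Returns the first capture group of the next matching line,
    // or null once the input is exhausted.
    static String getNext(LineReader reader, Pattern pattern) {
        while (reader.nextKeyValue()) { // advance on every iteration
            Matcher m = pattern.matcher(reader.getCurrentValue());
            if (m.find()) {
                return m.group(1);
            }
            // No match: loop again, which reads the following line.
        }
        return null;
    }

    // Drains the reader, collecting every matching capture in order.
    static List<String> loadAll(List<String> lines, String regex) {
        LineReader reader = new LineReader(lines);
        Pattern pattern = Pattern.compile(regex);
        List<String> out = new ArrayList<>();
        String value;
        while ((value = getNext(reader, pattern)) != null) {
            out.add(value);
        }
        return out;
    }

    public static void main(String[] args) {
        // Input from the bug report; prints [test1, test2]
        System.out.println(loadAll(Arrays.asList("test1", "testA", "test2"), "(test\\d)"));
    }
}
```

If the call to nextKeyValue were made only once before the loop, or only after a successful match, the second line (testA) would be tested against the pattern repeatedly, which is the hang described in this issue.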


[jira] Commented: (PIG-1449) RegExLoader hangs on lines that don't match the regular expression

Posted by "Ashutosh Chauhan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884551#action_12884551 ] 

Ashutosh Chauhan commented on PIG-1449:
---------------------------------------

Reran the contrib tests. All passed. Patch committed. Thanks, Christian and Justin, for working on this!
