Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2010/01/20 00:06:54 UTC

[jira] Created: (PIG-1197) TextLoader should be updated to match changes to PigStorage

TextLoader should be updated to match changes to PigStorage
-----------------------------------------------------------

                 Key: PIG-1197
                 URL: https://issues.apache.org/jira/browse/PIG-1197
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.6.0
            Reporter: Alan Gates
            Assignee: Alan Gates
            Priority: Minor


In 0.6, PigStorage was changed to use LineRecordReader to parse lines out of its stream instead of doing the parsing itself.  This resulted in about a 30% speedup in parsing time.  TextLoader should be changed to use LineRecordReader in the same way to benefit from the same speedup.
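
For readers unfamiliar with the change, here is a minimal sketch of the idea, not the code in PIG-1197.patch.  It contrasts a loader that scans its stream for newlines itself with one that hands line splitting to a buffered reader.  The class and method names are hypothetical, and java.io.BufferedReader is only a standard-library stand-in for Hadoop's LineRecordReader, assumed here so the example runs without Hadoop on the classpath.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not code from Pig or from PIG-1197.patch.
class LineParsingSketch {

    // "Doing the parsing itself": the loader pulls one byte at a time from
    // the stream and looks for '\n' on its own.
    static List<String> parseByHand(InputStream in) {
        List<String> lines = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
                    current.reset();
                } else {
                    current.write(b);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Keep a final line that has no trailing newline.
        if (current.size() > 0) {
            lines.add(new String(current.toByteArray(), StandardCharsets.UTF_8));
        }
        return lines;
    }

    // Delegated parsing: a buffered line reader (standing in for
    // LineRecordReader) finds the line boundaries; the loader only
    // consumes whole lines.  Note that readLine() also accepts '\r' and
    // "\r\n" as terminators, which the hand-rolled version does not.
    static List<String> parseWithLineReader(InputStream in) {
        List<String> lines = new ArrayList<>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "a\tb\nc\td\n".getBytes(StandardCharsets.UTF_8);
        List<String> byHand = parseByHand(new ByteArrayInputStream(data));
        List<String> delegated = parseWithLineReader(new ByteArrayInputStream(data));
        // Both approaches should yield the same lines for '\n'-terminated input.
        System.out.println(byHand.equals(delegated));
    }
}
```

The sketch says nothing about where the reported speedup comes from in Pig; it only shows the structural difference between the two approaches.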

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802920#action_12802920 ] 

Alan Gates commented on PIG-1197:
---------------------------------

It's already been rewritten for that branch.  I'll check with Pradeep on whether he wants to check this patch in (which will make his merges harder) or just leave it here as a patch for anyone who wants to use it, since hopefully by 0.7 we'll have PIG-966 checked in and this isn't going into 0.6.

> TextLoader should be updated to match changes to PigStorage
> -----------------------------------------------------------
>
>                 Key: PIG-1197
>                 URL: https://issues.apache.org/jira/browse/PIG-1197
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: PIG-1197.patch
>
>
> In 0.6, PigStorage was changed to use LineRecordReader to parse lines out of its stream instead of doing the parsing itself.  This resulted in about a 30% speedup in parsing time.  TextLoader should be changed to use LineRecordReader in the same way to benefit from the same speedup.


[jira] Updated: (PIG-1197) TextLoader should be updated to match changes to PigStorage

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1197:
----------------------------

    Attachment: PIG-1197.patch

Patch that changes TextLoader to use LineRecordReader.  No unit tests are included because there are already unit tests for TextLoader in TestBuiltin.


[jira] Updated: (PIG-1197) TextLoader should be updated to match changes to PigStorage

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1197:
----------------------------

    Fix Version/s: 0.7.0
           Status: Patch Available  (was: Open)


[jira] Updated: (PIG-1197) TextLoader should be updated to match changes to PigStorage

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1197:
----------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.7.0)
                   0.6.0
           Status: Resolved  (was: Patch Available)

Patch checked into 0.6 branch.

> TextLoader should be updated to match changes to PigStorage
> -----------------------------------------------------------
>
>                 Key: PIG-1197
>                 URL: https://issues.apache.org/jira/browse/PIG-1197
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.6.0
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: PIG-1197.patch
>
>
> In 0.6, PigStorage was changed to use LineRecordReader to parse lines out of its stream instead of doing the parsing itself.  This resulted in about a 30% speedup in parsing time.  TextLoader should be changed to use LineRecordReader in the same way to benefit from the same speedup.


[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802756#action_12802756 ] 

Hadoop QA commented on PIG-1197:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12430812/PIG-1197.patch
  against trunk revision 901021.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/185/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/185/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/185/console

This message is automatically generated.


[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802562#action_12802562 ] 

Dmitriy V. Ryaboy commented on PIG-1197:
----------------------------------------

+1 looks good.

Does it need to be changed separately for the load-store redesign branch?


[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802529#action_12802529 ] 

Alan Gates commented on PIG-1197:
---------------------------------

Initial quick performance tests showed a 25% improvement in performance from the patch.


[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802924#action_12802924 ] 

Pradeep Kamath commented on PIG-1197:
-------------------------------------

Alan is right: TextLoader on the load-store-redesign branch already uses TextInputFormat (and hence LineReader).  Do committers feel this patch is important enough to be committed to trunk?  Otherwise I would vote for just keeping it as a patch for people to use, as Alan suggested, since TextLoader is probably not a frequently used loader (I am guessing).


[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803083#action_12803083 ] 

Alan Gates commented on PIG-1197:
---------------------------------

I'm ok with putting it in 0.6, as it is very localized and it is a significant performance boost.  If I don't hear any complaints over the next couple of days I'll check it in.


[jira] Commented: (PIG-1197) TextLoader should be updated to match changes to PigStorage

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802939#action_12802939 ] 

Dmitriy V. Ryaboy commented on PIG-1197:
----------------------------------------

I know you guys feel strongly about not adding anything but bug fixes to 0.6 at this point, but I would love for this to make it in. It's a huge performance boost, and people use TextLoader a lot.

Agreed that it doesn't really need to go into 0.7 if we are hoping to get 966 completed for that release. 
