Posted to dev@pig.apache.org by "Sriranjan Manjunath (JIRA)" <ji...@apache.org> on 2009/10/12 20:59:31 UTC

[jira] Created: (PIG-1017) Converts strings to text in Pig

Converts strings to text in Pig
-------------------------------

                 Key: PIG-1017
                 URL: https://issues.apache.org/jira/browse/PIG-1017
             Project: Pig
          Issue Type: Improvement
            Reporter: Sriranjan Manjunath


Strings in Java are UTF-16 and take at least 2 bytes per character. Text (org.apache.hadoop.io.Text) stores its data in UTF-8, so converting strings to Text could significantly reduce memory use.
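
To make the size argument concrete, here is a minimal sketch that compares the payload bytes of a few sample strings in UTF-16 and UTF-8. It uses only the JDK, not Hadoop; the class name, method names, and sample strings are made up for illustration, and it assumes a char[]-backed String and ignores per-object overhead on both sides.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Pig or Hadoop.
final class EncodingSizeSketch {
    // Payload size when every char is held as a 2-byte UTF-16 code unit
    // (assumes a char[]-backed java.lang.String; object headers ignored).
    static int utf16Bytes(String s) {
        return s.length() * 2;
    }

    // Payload size of the same text encoded as UTF-8, the encoding that
    // org.apache.hadoop.io.Text stores (object headers ignored).
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // ASCII, Latin-1 with one accented letter, and two CJK characters.
        String[] samples = {"pig", "na\u00efve", "\u65e5\u672c"};
        for (String s : samples) {
            System.out.println(s.length() + " chars: UTF-16 " + utf16Bytes(s)
                    + " bytes, UTF-8 " + utf8Bytes(s) + " bytes");
        }
    }
}
```

For ASCII-heavy data the UTF-8 payload is about half the size, but for text dominated by characters that need 3 bytes in UTF-8 (such as the CJK sample) it can be larger, so the saving depends on the data.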

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-1017) Converts strings to text in Pig

Posted by "Sriranjan Manjunath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765381#action_12765381 ] 

Sriranjan Manjunath commented on PIG-1017:
------------------------------------------

Something fishy is going on. I ran Pigmix query L6 a couple more times with the modified code, and it completed in 1:08.



[jira] Updated: (PIG-1017) Converts strings to text in Pig

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1017:
--------------------------------

    Assignee: Thejas M Nair  (was: Sriranjan Manjunath)

We need to decide whether this is something we should do for 0.9.



[jira] Updated: (PIG-1017) Converts strings to text in Pig

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1017:
--------------------------------

    Fix Version/s: 0.9.0



[jira] Updated: (PIG-1017) Converts strings to text in Pig

Posted by "Sriranjan Manjunath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-1017:
-------------------------------------

    Attachment: stotext.patch



[jira] Commented: (PIG-1017) Converts strings to text in Pig

Posted by "Sriranjan Manjunath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767952#action_12767952 ] 

Sriranjan Manjunath commented on PIG-1017:
------------------------------------------

The release audit warnings are related to HTML files.



[jira] Commented: (PIG-1017) Converts strings to text in Pig

Posted by "Sriranjan Manjunath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765380#action_12765380 ] 

Sriranjan Manjunath commented on PIG-1017:
------------------------------------------

Pigmix results before and after converting strings to text:

||Pigmix query||Trunk||Modified code||
|L1|3:02|2:24|
|L2|2:06|1:23|
|L3|3:36|3:49|
|L4|1:42|1:49|
|L5|1:49|1:49|
|L6|1:47|3:03|
|L7|1:44|1:49|
|L8|1:19|1:18|
|L9|4:06|5:35|
|L10|8:52|7:56|
|L11|2:26|1:34|
|L12|1:57|1:54|
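
The per-query differences can be worked out with a small sketch like the one below. It assumes the times are minutes:seconds with unpadded seconds (so "3:2" means 3 min 2 s); the comment does not state the unit, and the class and method names are hypothetical.

```java
// Hypothetical helper for reading the table above; not part of Pig or Pigmix.
final class PigmixDeltaSketch {
    // Assumption: "m:s" is minutes:seconds; seconds may be unpadded.
    static int toSeconds(String time) {
        String[] parts = time.trim().split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    // Positive result: the modified code was slower than trunk.
    static int deltaSeconds(String trunk, String modified) {
        return toSeconds(modified) - toSeconds(trunk);
    }

    public static void main(String[] args) {
        // Rows copied from the table: query, trunk, modified code.
        String[][] rows = {
            {"L1", "3:2", "2:24"}, {"L2", "2:6", "1:23"}, {"L3", "3:36", "3:49"},
            {"L4", "1:42", "1:49"}, {"L5", "1:49", "1:49"}, {"L6", "1:47", "3:3"},
            {"L7", "1:44", "1:49"}, {"L8", "1:19", "1:18"}, {"L9", "4:6", "5:35"},
            {"L10", "8:52", "7:56"}, {"L11", "2:26", "1:34"}, {"L12", "1:57", "1:54"},
        };
        for (String[] row : rows) {
            System.out.printf("%-4s %+d s%n", row[0], deltaSeconds(row[1], row[2]));
        }
    }
}
```

Under that reading, the largest slowdowns are L6 (+76 s) and L9 (+89 s), while L1, L2, L10, and L11 each get faster by between 38 and 56 seconds.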




[jira] Updated: (PIG-1017) Converts strings to text in Pig

Posted by "Sriranjan Manjunath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-1017:
-------------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (PIG-1017) Converts strings to text in Pig

Posted by "Sriranjan Manjunath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-1017:
-------------------------------------

    Attachment: stotext.patch

The patch will fail the MRCompiler and LogToPhyTranslator unit tests because the golden files need to be replaced. The rest should pass.



[jira] Commented: (PIG-1017) Converts strings to text in Pig

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12769599#action_12769599 ] 

Hadoop QA commented on PIG-1017:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12423086/stotext.patch
  against trunk revision 829246.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 130 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 330 release audit warnings (more than the trunk's current 310 warnings).

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/113/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/113/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/113/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/113/console

This message is automatically generated.



[jira] Commented: (PIG-1017) Converts strings to text in Pig

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909251#action_12909251 ] 

Alan Gates commented on PIG-1017:
---------------------------------

Are we really going to do this?  I doubt it now, as the backward incompatibility cost would be so high.  At the very least I don't think we'll do it for 0.9.



[jira] Updated: (PIG-1017) Converts strings to text in Pig

Posted by "Sriranjan Manjunath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-1017:
-------------------------------------

    Status: Patch Available  (was: Open)



[jira] Commented: (PIG-1017) Converts strings to text in Pig

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765598#action_12765598 ] 

Olga Natkovich commented on PIG-1017:
-------------------------------------

L9 is still slower - any idea why?

Also, before we can commit these changes, we need to make sure that all piggybank functions that use String are converted.

We also need to update the UDF manual to reflect these changes.



[jira] Commented: (PIG-1017) Converts strings to text in Pig

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767154#action_12767154 ] 

Hadoop QA commented on PIG-1017:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12422406/stotext.patch
  against trunk revision 826110.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 114 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    -1 release audit.  The applied patch generated 326 release audit warnings (more than the trunk's current 306 warnings).

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/98/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/98/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/98/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/98/console

This message is automatically generated.



[jira] Updated: (PIG-1017) Converts strings to text in Pig

Posted by "Sriranjan Manjunath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-1017:
-------------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (PIG-1017) Converts strings to text in Pig

Posted by "Sriranjan Manjunath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-1017:
-------------------------------------

    Attachment:     (was: stotext.patch)



[jira] Assigned: (PIG-1017) Converts strings to text in Pig

Posted by "Sriranjan Manjunath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath reassigned PIG-1017:
----------------------------------------

    Assignee: Sriranjan Manjunath
