Posted to dev@pig.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/09/21 09:04:08 UTC

[jira] [Created] (PIG-2296) case where optimizer causes incorrect filtering

case where optimizer causes incorrect filtering
-----------------------------------------------

                 Key: PIG-2296
                 URL: https://issues.apache.org/jira/browse/PIG-2296
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.8.1
            Reporter: Todd Lipcon
            Priority: Critical


I have a script that reproducibly generates incorrect filter results on Pig 0.8.1. I haven't tried to reproduce it on 0.9 or trunk.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (PIG-2296) case where optimizer causes incorrect filtering

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109318#comment-13109318 ] 

Todd Lipcon commented on PIG-2296:
----------------------------------

It looks like the faulty optimization is {{FilterLogicExpressionSimplifier}}: running with that rule disabled generates the correct result.


[jira] [Commented] (PIG-2296) case where optimizer causes incorrect filtering

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109309#comment-13109309 ] 

Todd Lipcon commented on PIG-2296:
----------------------------------

The script in question is:
{code}
er = LOAD 'er' AS (en : chararray, es : chararray);
tokenized = FOREACH er GENERATE TOKENIZE(en) AS en, TOKENIZE(es) AS es;
pairs = FOREACH tokenized GENERATE FLATTEN(en) AS en_word, FLATTEN(es) AS es_word;
pairs_long = FILTER pairs BY (SIZE(en_word) > 4) AND (SIZE(es_word) > 4);
{code}
After this, pairs_long contains pairs where es_word has length <= 4, which the filter should have excluded. Running with "-t All" to disable the optimizer produces correct results. An example line of data is:
{code}
it was a bright cold day in April , and the clocks were striking thirteen . Winston Smith , his chin nuzzled into his breast in an effort to escape the vile wind , slipped quickly through the glass doors of Victory Mansions , though not quickly enough to prevent a swirl of gritty dust from entering along with him .    intr - o zi senina si friguroasa de aprilie , pe cind ceasurile bateau ora treisprezece , Winston Smith , cu barbia infundata in piept pentru a scapa de vintul care - l lua pe sus , se strecura iute prin usile de sticla ale Blocului Victoria , desi nu destul de repede pentru a impiedica un virtej de praf si nisip sa patrunda o data cu el . 
{code}
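
For reference, the result the filter is expected to produce can be modeled outside Pig. The following is a minimal Python sketch, not Pig code and not part of the original report. It assumes that TOKENIZE splits on Pig's documented default separators (space, double quote, comma, parentheses, star), that two FLATTENs in one GENERATE yield the cross product of the two bags, and that SIZE of a chararray is its character count. The helper names (tokenize, expected_pairs_long) are hypothetical, and the input is a shortened fragment of the sample line above.

```python
import re
from itertools import product

# Assumed TOKENIZE separators: space, double quote, comma, parentheses, star.
_DELIMS = re.compile(r'[ ",()*]+')


def tokenize(text):
    """Split text into words, dropping the empty strings left by separators."""
    return [tok for tok in _DELIMS.split(text) if tok]


def expected_pairs_long(en, es, min_len=4):
    """Model FLATTEN(en), FLATTEN(es) as a cross product, then apply the filter."""
    pairs = product(tokenize(en), tokenize(es))
    return [(e, s) for e, s in pairs if len(e) > min_len and len(s) > min_len]


# Shortened fragment of the sample line (hypothetical test input).
en = "it was a bright cold day in April"
es = "intr - o zi senina si friguroasa de aprilie"

result = expected_pairs_long(en, es)
for pair in result:
    print(pair)
```

For this fragment, only "bright" and "April" pass on the English side and only "senina", "friguroasa", and "aprilie" on the other side, so six pairs are expected. Any pair whose second word has four or fewer characters (such as "intr" or "de") indicates the filter was applied incorrectly.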


[jira] [Commented] (PIG-2296) case where optimizer causes incorrect filtering

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109319#comment-13109319 ] 

Dmitriy V. Ryaboy commented on PIG-2296:
----------------------------------------

I am still unable to reproduce this.
Can you try the same SVN revision as me, without any Cloudera patches?


[jira] [Commented] (PIG-2296) case where optimizer causes incorrect filtering

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109320#comment-13109320 ] 

Dmitriy V. Ryaboy commented on PIG-2296:
----------------------------------------

Looking at the JIRAs, this seems to be exactly the same issue as PIG-2067.


[jira] [Commented] (PIG-2296) case where optimizer causes incorrect filtering

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109316#comment-13109316 ] 

Dmitriy V. Ryaboy commented on PIG-2296:
----------------------------------------

I am unable to reproduce this with the given input so far, using revision 1166476 of branch-0.8.

Todd, please provide examples of bad output and attach your input and script as files, in case there's some character encoding issue being masked by Jira.


[jira] [Updated] (PIG-2296) case where optimizer causes incorrect filtering

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated PIG-2296:
-----------------------------

    Attachment: er.head

Attaching sample data. The following script:
{code}
set mapred.max.split.size 20000;
set pig.maxCombinedSplitSize 200000;

er = LOAD '/tmp/er.head' AS (en : chararray, er : chararray);
tokenized = FOREACH er GENERATE TOKENIZE(en) AS en, TOKENIZE(er) AS er;
pairs = FOREACH tokenized GENERATE FLATTEN(en) AS en_word, FLATTEN(er) AS er_word;
pairs_long = FILTER pairs BY (SIZE(en_word) > 4) AND (SIZE(er_word) > 4);

pairs_l = LIMIT pairs_long 10;
DUMP pairs_l;
{code}
generates output like:
{code}
(bright,-)
(bright,o)
(bright,de)
(bright,pe)
...
{code}

whereas with {{-t All}} it generates:
{code}
(bright,Smith)
(bright,barbia)
(bright,bateau)
(bright,senina)
(bright,Winston)
(bright,aprilie)
...
{code}
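
The two outputs above can be checked mechanically against the filter predicate. This is a minimal Python sketch rather than Pig code. The function name violates_filter is hypothetical, the tuples are copied from the outputs above (the elided "..." rows are not filled in), and string length is assumed to match Pig's SIZE for these ASCII chararrays.

```python
MIN_LEN = 4  # threshold from the FILTER expression: SIZE(word) > 4


def violates_filter(pair):
    """Return True if an (en_word, er_word) pair should have been filtered out."""
    en_word, er_word = pair
    return not (len(en_word) > MIN_LEN and len(er_word) > MIN_LEN)


# Rows reported with the optimizer enabled.
with_optimizer = [("bright", "-"), ("bright", "o"), ("bright", "de"), ("bright", "pe")]

# Rows reported with -t All (optimizer disabled).
without_optimizer = [
    ("bright", "Smith"),
    ("bright", "barbia"),
    ("bright", "bateau"),
    ("bright", "senina"),
    ("bright", "Winston"),
    ("bright", "aprilie"),
]

bad_with = [p for p in with_optimizer if violates_filter(p)]
bad_without = [p for p in without_optimizer if violates_filter(p)]
print(len(bad_with), len(bad_without))
```

Every row shown from the optimized run fails the predicate on the second field, while none of the rows from the unoptimized run do, which is consistent with the second half of the AND condition being dropped.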


[jira] [Resolved] (PIG-2296) case where optimizer causes incorrect filtering

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved PIG-2296.
------------------------------

    Resolution: Duplicate

Yep, I can verify that this problem is fixed in 0.9.0. The JIRA that Dmitriy referenced (PIG-2067) was incorrectly marked as resolved in 0.8.1; the fix was actually committed after the 0.8.1 release.
