You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pig.apache.org by "Mathieu Poumeyrol (JIRA)" <ji...@apache.org> on 2008/04/10 12:06:05 UTC

[jira] Created: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

ComparatorFunc provided to ORDER clause is not always honoured
--------------------------------------------------------------

                 Key: PIG-202
                 URL: https://issues.apache.org/jira/browse/PIG-202
             Project: Pig
          Issue Type: Bug
            Reporter: Mathieu Poumeyrol


Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.

Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Mathieu Poumeyrol (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Poumeyrol updated PIG-202:
----------------------------------

    Attachment: Sort.patch

As requested, a all-in-one patch (Sort.patch) that:
 - call instantiateFunc on PO before the actual execution (fix using clause in local context)
 - discard the only one "late" comparator instantiation I could found (made redundant, dead code)
 - correct a marginal biais in the findQuantile builtin function (one of the two extremum quantile was bigger or smaller depending on truncation)
 - fix quantile job.

The quantile job issue is tricky. It is not easy to show how it misbehaves with a pig unit test, as the result is correct... FindQuantiles is responsible for defining a partition of the intermediary keyspace. Hadoop uses this partition through a SortPartitioner instance to split the reduce half of the Sort job among several reduce tasks. Now the FindQuartiles were using a StarSpec as a comparator, whereas SortPartitioner were using the UDF comparator to perform a Arrays.binarySearch. The binary search can not work correctly in these conditions, and this leads to widely unbalanced reduce tasks as most of the keys fall in the same partition. 

"Prooving" this point actualy required counting how many items go to which partition in SortPartitioner (some printf-like debugging). But honestly, I think the patch just makes a lot of sense.

The fix just provides the UDF compartor to the sort used internaly by the findQuartile job.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, Sort.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594039#action_12594039 ] 

Pi Song commented on PIG-202:
-----------------------------

+1 on Sort.v2.patch. This fixes an obvious bug in quantile bucketing. I would be good to see the benchmark but not necessary.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, quantiles.in, quantiles.pig, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Mathieu Poumeyrol (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Poumeyrol updated PIG-202:
----------------------------------

    Attachment: Sort.v2.patch

Well, well, well...

New delivery of the all-in-one sort patch (Sort.v2.patch). There was still a glitch in my findQuantileJob version, but I think I got it right this time (at least as much as I did before...).

By the way, I'm starting to worry about the absence of attention or interest from the commiters on this issue. Am I struggling with some piece of code about to be superseded by physical and/or logical layers refactoring ? Or is it the fact that efficient sort operations are critical in what i'm trying to do with pig that make me perceive this as more critical than it is ?

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Mathieu Poumeyrol (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592555#action_12592555 ] 

Mathieu Poumeyrol commented on PIG-202:
---------------------------------------

Thanks for having confirmed I was not wasting my time.

1. This is not what I'm trying to do. The current implementation, when asked for 9 quantiles among 100 elements (0..99) returns this:
(all, {(0), (12), (24), (36), (48), (60), (72), (84), (96)})
This does not lead to a good partition. The first part is empty or so, the last part is smaller than the "central" parts.

Sort.patch and Sort.v2.patch change FindQuantiles to make it return:
(all, {(11), (21), (31), (41), (51), (61), (71), (81), (91)})

It looks better. Actualy, it looks even nicer with a <= instead of < in the big if in the loop...
(all, {(10), (20), (30), (40), (50), (60), (70), (80), (90)})

The impact on sort performance of this fix in FindQuantiles is probably marginal. it just avoids some empty or smaller reduce jobs.  But it gives better quantiles to a end user trying to use the function.

2. I will try to run and time my test job over the weekend. The performance killer was not the small glitch in FindQuantiles, but the fact that the SortPartitioner's and the quantiles' comparator were not consistent. I'll try to give you some figures.

3. I will also generate a Sort.v3.patch (with the <= in FindQuantiles) using svn diff as eclipse tends to generates ugly patches with absolute paths.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Mathieu Poumeyrol (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Poumeyrol updated PIG-202:
----------------------------------

    Status: Patch Available  (was: Open)

Please review InstantiateFunc patch. TestOrderBy may be worth a look too. The rest is junk.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Mathieu Poumeyrol (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Poumeyrol updated PIG-202:
----------------------------------

    Attachment: InstantiateFunc.patch

Patch 4 introduces a POVisitor that call instantiateFunc on all EvalSpec of PO nodes. It is called by the LogicalExecutionEngine just before execution.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Issue Comment Edited: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12594039#action_12594039 ] 

pi_song edited comment on PIG-202 at 5/3/08 4:56 PM:
-----------------------------------------------------

+1 on Sort.v2.patch. This fixes an obvious bug in quantile bucketing. It would be good to see the benchmark but not necessary.

      was (Author: pi_song):
    +1 on Sort.v2.patch. This fixes an obvious bug in quantile bucketing. I would be good to see the benchmark but not necessary.
  
> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, quantiles.in, quantiles.pig, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-202:
---------------------------

       Resolution: Fixed
    Fix Version/s: 0.1.0
           Status: Resolved  (was: Patch Available)

Fixed checked in at revision 654172.  I made one small change, which was to change the name of InsantiateFuncCallerPOVisitor to InstantiateFuncCallerPOVisitor.  Thanks Mathieu for your work on this.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>             Fix For: 0.1.0
>
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, quantiles.in, quantiles.pig, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Mathieu Poumeyrol (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Poumeyrol updated PIG-202:
----------------------------------

    Attachment: quantiles.pig
                quantiles.in

The test files I used to generates the quantiles in my previous comment.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, quantiles.in, quantiles.pig, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Mathieu Poumeyrol (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Poumeyrol updated PIG-202:
----------------------------------

    Attachment: MapreducePlanCompiler.patch

MapreduceCompiler patch puts the same compartor in the quantile job than in the sort job. I just feel like it makes more sense, but I might be totally wrong here.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, MapreducePlanCompiler.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Mathieu Poumeyrol (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Poumeyrol updated PIG-202:
----------------------------------

    Attachment: TestOderBy.patch

This patch makes testOrderBy test both LOCAL and MAPREDUCE mode.

Local attempt on testTopLevelOrderBy_*_Using are failing.



> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12590767#action_12590767 ] 

Pi Song commented on PIG-202:
-----------------------------

Mathieu,

Ensuring all the specs instantiating what needed before use using a visitor pattern is a good idea but just to make the code clean we should remove code where those stuffs were previously instantiated as well.

For the MapreducePlanCompiler thing, you've got very good eyes. Now I also tempted to add that comparator instantiation. Can you prove that at the moment the quantilejob doesn't really sort based on the given function? Possibly a small unit test would be enough.

In regard to the test, currently we only test either MapReduce or Local at a time. That's why a bug such like this still occurs. In the mean time, I think you should send an email to the mailing list so we can find a global solution for all the tests together.

PS. Could you please combine all the needed patches? Too many fragments will make it difficult for our already busy committers. 

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592796#action_12592796 ] 

Pi Song commented on PIG-202:
-----------------------------

1) The first quantile always being zero is something really has to be fixed.
2) The <= to < thing also helps a (very tiny) bit more to make the first bucket the same size as others. 

Theoretically, there should be some small performance gain due to (1)

I'm looking forward to seeing the result!!!

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, quantiles.in, quantiles.pig, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Mathieu Poumeyrol (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mathieu Poumeyrol updated PIG-202:
----------------------------------

    Attachment: EvalSpec.patch

EvalSpec.patch is fix a first issue where comparatorFunction is not created and default implementation is used where it should not.

But I'm struggling with the fact that in local mode, the comparator is given the whole tuple instead of the relevant datum ($2 for instance) for the tuple.

I guess I won't be able to do much more by myself on the localmode issue.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12592518#action_12592518 ] 

Alan Gates commented on PIG-202:
--------------------------------

A couple of comments/questions:

# I don't understand your changes to FindQuantiles.  If I read the code correctly, its now taking the first x values out of the bag, instead of sampling at regular intervals from the whole bag.  This has the advantage of not needing to read the whole bag (though the code isn't taking advantage of that), but it will give a much worse sample, at least if the bag is ordered.  Am I missing something?  Should it be that we take the first n values and then break if the bag is unordered, and every 1/n values if the bag is ordered?
# Have you done any performance testing to get an idea of the speed up this gives?  Obviously that will depend on the data set, but it would be interesting to see.

FWIW, I haven't been ignoring your work on this.  It seemed you were making good progress and getting feedback from Pi, so I hadn't jumped in yet.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Pi Song (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12589519#action_12589519 ] 

Pi Song commented on PIG-202:
-----------------------------

Mathieu,

Welcome in Pig community. I appreciate your attempt to help us. My real work is killing me and  seems other people are busy too. Please don't give up. We will have a look at your patches as soon as possible (Latest will be on the weekend).

Cheers,
Pi 

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Mathieu Poumeyrol (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588647#action_12588647 ] 

Mathieu Poumeyrol commented on PIG-202:
---------------------------------------

After struggling a bit with the local mode evaluator, I found the read issue. The code I was looking for was actualy in instantiateFunc() method, and my previous patch (1) a useless and incomplete duplication of this code.

Instead I tried to make sure instantiateFunc() is called on all EvalSpec nodes prior to execution attempts. I now run a POVisitor in LocalExecutionEngine.execute() before pp.open(). It looks much better as the 16 tests from testOrderBy are now working in both local and mapreduce mode.

The patch is not ready yet (the visitor only visit what my tests needs it to visit) and there is still come cleanup and commenting to do.

Any comment welcome.

By the way, the MapReducePlanCompiler patch is a piece of junk, but I'm still convinced there is a severe performance issue here.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, MapreducePlanCompiler.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (PIG-202) ComparatorFunc provided to ORDER clause is not always honoured

Posted by "Mathieu Poumeyrol (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12593042#action_12593042 ] 

Mathieu Poumeyrol commented on PIG-202:
---------------------------------------

Just a short update to say I'm alive, but still crunching numbers on my cluster. And trying to make sense of what is hapenning : it looks like I never get balanced partitions, whether or not I specify a UDF comparator, with or without my patches...

Anyway, still digging.

> ComparatorFunc provided to ORDER clause is not always honoured
> --------------------------------------------------------------
>
>                 Key: PIG-202
>                 URL: https://issues.apache.org/jira/browse/PIG-202
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Mathieu Poumeyrol
>         Attachments: EvalSpec.patch, InstantiateFunc.patch, MapreducePlanCompiler.patch, quantiles.in, quantiles.pig, Sort.patch, Sort.v2.patch, TestOderBy.patch
>
>
> Specifying a comparator function is acknowledge neither by local implementation, nor by quartile lookup job.
> Patch coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.