Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2008/08/28 03:00:46 UTC

[jira] Created: (PIG-402) order by on single field with user defined comparator fails

order by on single field with user defined comparator fails
-----------------------------------------------------------

                 Key: PIG-402
                 URL: https://issues.apache.org/jira/browse/PIG-402
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Olga Natkovich
             Fix For: types_branch


register udf.jar;
a = load 'data';
c = order a by $0 using MyOrderUDF();
store c into 'out';

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-402) order by on single field with user defined comparator fails

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-402:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed. Shravan, thanks!




[jira] Commented: (PIG-402) order by on single field with user defined comparator fails

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629578#action_12629578 ] 

Alan Gates commented on PIG-402:
--------------------------------

I think that's perfectly reasonable.  There is a performance penalty for wrapping in a tuple.  But user defined comparators are expected to be the exception, especially now that we provide numeric and descending order sort (the only two reasons we added the user defined comparators to begin with).




[jira] Commented: (PIG-402) order by on single field with user defined comparator fails

Posted by "Shravan Matthur Narayanamurthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629574#action_12629574 ] 

Shravan Matthur Narayanamurthy commented on PIG-402:
----------------------------------------------------

This is what I have understood: comparators in Hadoop assume that you know the type of the key (keyClass) beforehand and do not let you configure the type dynamically. So I feel the only way out for us is to wrap the key inside a tuple whenever a user defined comparison func is used.

If any of you have other suggestions, please comment.
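
To make the idea concrete, here is a minimal sketch in plain Java of why wrapping helps. It does not use the Hadoop or Pig APIs: the tuple is modelled as a List<Object>, and WrapKeySketch, USER_COMPARATOR, wrap and sortKeys are hypothetical names invented for illustration. The comparator is declared against one fixed key class, so a bare single-field key is wrapped in a one-element tuple before it is compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class WrapKeySketch {
    // Hypothetical stand-in for a user defined comparator. It is written
    // against one fixed key class (a tuple, modelled here as List<Object>).
    static final Comparator<List<Object>> USER_COMPARATOR = (t1, t2) -> {
        // Compare on the first field, in descending string order.
        String a = String.valueOf(t1.get(0));
        String b = String.valueOf(t2.get(0));
        return b.compareTo(a);
    };

    // Wrap a bare single-field key in a one-element tuple so the comparator
    // always receives the key class it was declared for.
    public static List<Object> wrap(Object key) {
        return Collections.singletonList(key);
    }

    // Wrap every key, sort with the comparator, then unwrap for display.
    public static List<Object> sortKeys(List<Object> bareKeys) {
        List<List<Object>> wrapped = new ArrayList<>();
        for (Object k : bareKeys) {
            wrapped.add(wrap(k));
        }
        wrapped.sort(USER_COMPARATOR);
        List<Object> out = new ArrayList<>();
        for (List<Object> t : wrapped) {
            out.add(t.get(0));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortKeys(Arrays.asList("b", "c", "a"))); // prints [c, b, a]
    }
}
```

The extra allocation per key in wrap() is the kind of overhead that wrapping introduces; the sketch makes no claim about how large that cost is in a real job.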




[jira] Updated: (PIG-402) order by on single field with user defined comparator fails

Posted by "Shravan Matthur Narayanamurthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shravan Matthur Narayanamurthy updated PIG-402:
-----------------------------------------------

    Attachment: 402.patch




[jira] Updated: (PIG-402) order by on single field with user defined comparator fails

Posted by "Shravan Matthur Narayanamurthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shravan Matthur Narayanamurthy updated PIG-402:
-----------------------------------------------

    Status: Patch Available  (was: Open)

The solution I am providing is to wrap the key inside a tuple whenever a user defined comparison func is used. For that I have the following in the patch.

1) Created a new Mapper class, MapWithComparator, in PigMapReduce, which will be used whenever a user defined comparator is used. The assumption is that keyType and keyClass will be appropriately set to Tuple, and the collect here wraps the key in a Tuple. This was done to avoid an if branch in the earlier Mapper class.

2) JobControlCompiler: To meet the assumptions in 1 above, the changes to the job control compiler ensure consistency.

3) PigMapBase: Incidental. Introduced a tuple factory instance into the base class.

4) TestEvalPipeline: Added a new unit test to test Sort with UDF.
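
The design choice in point 1 can be sketched roughly as follows. This is not the code from 402.patch: MapperChoiceSketch, PlainMap, chooseMapper and emittedKeys are hypothetical stand-ins, the tuple is modelled as a List<Object>, and only the name MapWithComparator is borrowed from the description above. The point it illustrates is that the mapper class is picked once when the job is set up, so the per-record collect path needs no if branch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class MapperChoiceSketch {
    // Hypothetical stand-in for the ordinary mapper: emits the key unchanged.
    public static class PlainMap {
        public final List<Object> emittedKeys = new ArrayList<>();

        public void collect(Object key) {
            emittedKeys.add(key);
        }
    }

    // Hypothetical stand-in for the comparator-aware mapper: collect always
    // wraps the key in a one-element tuple (modelled as a List<Object>).
    public static class MapWithComparator extends PlainMap {
        @Override
        public void collect(Object key) {
            emittedKeys.add(Collections.singletonList(key));
        }
    }

    // Stand-in for job setup: the decision is made once per job, so neither
    // collect method has to test for a user defined comparator per record.
    public static PlainMap chooseMapper(boolean hasUserComparator) {
        return hasUserComparator ? new MapWithComparator() : new PlainMap();
    }

    public static void main(String[] args) {
        PlainMap plain = chooseMapper(false);
        plain.collect(7);
        PlainMap wrapping = chooseMapper(true);
        wrapping.collect(7);
        System.out.println(plain.emittedKeys);    // prints [7]
        System.out.println(wrapping.emittedKeys); // prints [[7]]
    }
}
```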




[jira] Assigned: (PIG-402) order by on single field with user defined comparator fails

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich reassigned PIG-402:
----------------------------------

    Assignee: Shravan Matthur Narayanamurthy

