Posted to dev@pig.apache.org by "Caleb Holtzinger (JIRA)" <ji...@apache.org> on 2016/05/09 23:31:13 UTC

[jira] [Commented] (PIG-3268) Case statement support

    [ https://issues.apache.org/jira/browse/PIG-3268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15277318#comment-15277318 ] 

Caleb Holtzinger commented on PIG-3268:
---------------------------------------

Hi Cheolsoo,

It doesn't look like MATCHES is supported when CASE is followed by an operand (the {{CASE e1 WHEN value}} form). For instance, the following does not work:
{code}CASE e1
    WHEN 'a'            THEN 'alpha'
    WHEN MATCHES 'b.*'  THEN 'alpha'
    WHEN 'c'            THEN 'alpha'
    WHEN 'd'            THEN 'alpha'
    ELSE 'numeric'
END
{code}

whereas this works:
{code}CASE 
    WHEN e1 == 'a'         THEN 'alpha'
    WHEN e1 MATCHES 'b.*'  THEN 'alpha'
    WHEN e1 == 'c'         THEN 'alpha'
    WHEN e1 == 'd'         THEN 'alpha'
    ELSE 'numeric'
END
{code}
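For illustration, here is a minimal Java sketch of the two forms. It assumes SQL-style CASE semantics and is not Pig's actual implementation. In the simple form, each WHEN arm is a value compared to the operand for equality, so {{MATCHES 'b.*'}} has no place to go. In the searched form, each arm is its own predicate, so a regex test fits naturally. {{CaseForms}}, {{simpleCase}}, {{searchedCase}} and {{classify}} are hypothetical names. Java's {{String.matches}}, which must match the whole string, stands in for MATCHES.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

final class CaseForms {
    // Simple form: CASE operand WHEN v1 THEN r1 ... ELSE d END.
    // Every WHEN arm is a plain value compared to the operand with equality.
    static String simpleCase(String operand, List<Map.Entry<String, String>> arms, String elseValue) {
        for (Map.Entry<String, String> arm : arms) {
            if (Objects.equals(operand, arm.getKey())) {
                return arm.getValue(); // first matching arm wins
            }
        }
        return elseValue;
    }

    // Searched form: CASE WHEN p1 THEN r1 ... ELSE d END.
    // Every WHEN arm is an arbitrary predicate, so a regex test fits.
    static String searchedCase(String value, List<Map.Entry<Predicate<String>, String>> arms, String elseValue) {
        for (Map.Entry<Predicate<String>, String> arm : arms) {
            if (arm.getKey().test(value)) {
                return arm.getValue(); // first true predicate wins
            }
        }
        return elseValue;
    }

    // Mirrors the working query above (assumes non-null input).
    // String.matches stands in for Pig's MATCHES.
    static String classify(String e1) {
        Predicate<String> isA = "a"::equals;
        Predicate<String> startsWithB = s -> s.matches("b.*");
        Predicate<String> isC = "c"::equals;
        Predicate<String> isD = "d"::equals;
        return searchedCase(e1, List.of(
                Map.entry(isA, "alpha"),
                Map.entry(startsWithB, "alpha"),
                Map.entry(isC, "alpha"),
                Map.entry(isD, "alpha")), "numeric");
    }

    public static void main(String[] args) {
        System.out.println(classify("banana")); // alpha
        // In the simple form 'b.*' is only a literal value, so "banana" falls through to ELSE.
        System.out.println(simpleCase("banana", List.of(Map.entry("b.*", "alpha")), "numeric")); // numeric
    }
}
```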

> Case statement support
> ----------------------
>
>                 Key: PIG-3268
>                 URL: https://issues.apache.org/jira/browse/PIG-3268
>             Project: Pig
>          Issue Type: New Feature
>          Components: internal-udfs, parser
>    Affects Versions: 0.11
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.12.0
>
>         Attachments: PIG-3268-2.patch, PIG-3268-3.patch, PIG-3268-4.patch, PIG-3268-5.patch, PIG-3268-6.patch, PIG-3268-7.patch, PIG-3268.patch
>
>
> Currently, Pig has no support for a case statement. To mimic one, users often nest bincond operators, which quickly becomes unreadable when there are multiple levels of nesting.
> For example,
> {code}
> a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
> b = FOREACH a GENERATE (
>     i % 3 == 0 ? '3n' : (i % 3 == 1 ? '3n + 1' : '3n + 2')
> );
> {code}
> This can be rewritten much more readably using a case statement as follows:
> {code}
> a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
> b = FOREACH a GENERATE (
>     CASE i % 3
>         WHEN 0 THEN '3n'
>         WHEN 1 THEN '3n + 1'
>         ELSE        '3n + 2'
>     END
> );
> {code}
> I propose that we implement the case statement as follows:
> * Add built-in UDFs that take expressions as args. For example, the case statement above could be expressed as a UDF call such as {{builtInUdf(i % 3, 0, '3n', 1, '3n + 1', '3n + 2')}}.
> * Add syntactical sugar for these built-in UDFs.
> In fact, I borrowed this idea from HIVE-164.
> One downside of this approach is that all the possible argument schemas of these UDFs must be pre-computed. Specifically, we need to populate the full list of possible argument schemas in {{EvalFunc.getArgToFuncMapping}}.
> Since we cannot support an unbounded number of arguments, we must impose a limit on the number of WHEN branches. For now, I arbitrarily chose 50, but it can easily be changed.
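The proposal can be sketched in plain Java, assuming the argument layout of the example call {{builtInUdf(i % 3, 0, '3n', 1, '3n + 1', '3n + 2')}}. In that layout, the CASE operand comes first, then alternating WHEN/THEN values, then an optional trailing ELSE value. {{CaseUdfSketch}}, {{caseUdf}} and {{arities}} are hypothetical names. A real Pig UDF would extend {{EvalFunc}} and read its arguments from a {{Tuple}}, and this sketch does not build the schemas that {{EvalFunc.getArgToFuncMapping}} returns. {{arities}} only illustrates why that list has to be bounded: under this layout, each WHEN branch count contributes two argument counts, one with ELSE and one without. Returning null when there is no ELSE and no match is also an assumption borrowed from SQL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

final class CaseUdfSketch {
    // Hypothetical stand-in for builtInUdf(operand, when1, then1, ..., [else]).
    // args[0] is the operand, followed by WHEN/THEN pairs; an unpaired
    // argument left over at the end is the ELSE value.
    static Object caseUdf(Object... args) {
        if (args.length < 3) {
            throw new IllegalArgumentException("need an operand and at least one WHEN/THEN pair");
        }
        boolean hasElse = (args.length - 1) % 2 == 1;
        int pairsEnd = hasElse ? args.length - 1 : args.length;
        for (int i = 1; i < pairsEnd; i += 2) {
            if (Objects.equals(args[0], args[i])) {
                return args[i + 1]; // first matching WHEN wins
            }
        }
        // Assumption: no ELSE and no match yields null, as in SQL.
        return hasElse ? args[args.length - 1] : null;
    }

    // Every argument count the UDF would need to accept for 1..maxBranches
    // WHEN branches under this layout: 1 + 2n without ELSE, 2 + 2n with ELSE.
    static List<Integer> arities(int maxBranches) {
        List<Integer> counts = new ArrayList<>();
        for (int n = 1; n <= maxBranches; n++) {
            counts.add(1 + 2 * n); // without ELSE
            counts.add(2 + 2 * n); // with ELSE
        }
        return counts;
    }

    public static void main(String[] unused) {
        for (int i = 0; i < 6; i++) {
            // Same call shape as builtInUdf(i % 3, 0, '3n', 1, '3n + 1', '3n + 2').
            System.out.println(i + " -> " + caseUdf(i % 3, 0, "3n", 1, "3n + 1", "3n + 2"));
        }
        List<Integer> counts = arities(50); // 50 = the proposed branch limit
        System.out.println(counts.size() + " argument counts, from "
                + counts.get(0) + " to " + counts.get(counts.size() - 1));
    }
}
```

With the proposed limit of 50, {{arities(50)}} returns 100 argument counts, covering every integer from 3 to 102.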



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)