Posted to dev@pig.apache.org by "Earl Cahill (JIRA)" <ji...@apache.org> on 2008/10/08 09:53:44 UTC

[jira] Created: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year

given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year
---------------------------------------------------------------------------------------------------------------------------

                 Key: PIG-476
                 URL: https://issues.apache.org/jira/browse/PIG-476
             Project: Pig
          Issue Type: New Feature
            Reporter: Earl Cahill


Want to be able to do something like

A = FOREACH raw GENERATE org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor(dayTime, 'yyyy', 'dd/MMM/yyyy:HH:mm:ss');

to extract the year (the second argument is the output format, the third is the format of the incoming date), or, if your date is formatted as

dd/MMM/yyyy:HH:mm:ss Z

(the Apache log timestamp format, so the incoming format can be left out), you could do something like

A = FOREACH raw GENERATE org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor(dayTime, 'MM-dd-yyyy');

to grab out the day.
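
For illustration, the proposed behavior (parse the incoming value with one SimpleDateFormat pattern, then re-format it with another) can be sketched in plain Java. This is not the piggybank code and does not use the Pig UDF interface; the class name DateExtractorSketch, returning null on bad input, formatting in UTC, and treating dd/MMM/yyyy:HH:mm:ss Z as the default incoming format are all assumptions.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper illustrating the proposed argument order:
// (value, outgoing format, optional incoming format).
public class DateExtractorSketch {
    // Assumed default: the Apache log timestamp layout.
    static final String DEFAULT_INCOMING = "dd/MMM/yyyy:HH:mm:ss Z";

    static String extract(String value, String outgoing, String incoming) {
        if (value == null) {
            return null;
        }
        try {
            // Both patterns are compiled on every call in this form.
            SimpleDateFormat in = new SimpleDateFormat(incoming, Locale.US);
            SimpleDateFormat out = new SimpleDateFormat(outgoing, Locale.US);
            // UTC on both sides so the result does not depend on the JVM default zone.
            TimeZone utc = TimeZone.getTimeZone("UTC");
            in.setTimeZone(utc);
            out.setTimeZone(utc);
            Date parsed = in.parse(value);
            return out.format(parsed);
        } catch (ParseException e) {
            return null; // value does not match the incoming pattern
        }
    }

    static String extract(String value, String outgoing) {
        return extract(value, outgoing, DEFAULT_INCOMING);
    }

    public static void main(String[] args) {
        // Year from a timestamp without a zone, incoming format given explicitly.
        System.out.println(extract("08/Oct/2008:09:53:44", "yyyy", "dd/MMM/yyyy:HH:mm:ss"));
        // Day from an Apache-style timestamp, using the assumed default incoming format.
        System.out.println(extract("10/Oct/2000:13:55:36 -0700", "MM-dd-yyyy"));
    }
}
```

Run as is, main prints 2008 and then 10-10-2000. Because the output is formatted in UTC, the -0700 timestamp is reported as its UTC day.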

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year

Posted by "Earl Cahill (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Earl Cahill updated PIG-476:
----------------------------

    Attachment:     (was: DateExtractor-PIG-476)


[jira] Updated: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year

Posted by "Earl Cahill (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Earl Cahill updated PIG-476:
----------------------------

    Attachment: DateExtractor-PIG-476

I'm betting that a given log generally uses a single date format, so I am switching to the faster version Alan described.


[jira] Updated: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year

Posted by "Earl Cahill (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Earl Cahill updated PIG-476:
----------------------------

    Attachment: DateExtractor-PIG-476

patch contains

org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor
org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestDateExtractor


[jira] Updated: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year

Posted by "Earl Cahill (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Earl Cahill updated PIG-476:
----------------------------

    Attachment:     (was: DateExtractor-PIG-476)


[jira] Commented: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638594#action_12638594 ] 

Alan Gates commented on PIG-476:
--------------------------------

Pig Latin has a way that you can define constructors for a UDF:

{code}
define MyDateExtractor org.apache.pig.piggybank.evaluation.util.apachelogparser.DateExtractor('MM-dd-yyyy');
...
A = FOREACH raw GENERATE MyDateExtractor(dayTime);
{code}

Whatever you pass as an argument in the define statement is passed to the constructor of the UDF. In your case, this would allow you to pass the date format up front, parse it once, and avoid parsing it on every tuple passed to the UDF. This should give you a significant performance boost. The downside is that if you want to use the same UDF with different date formats in the same query, you'd have to define a separate alias for each format.

It's up to you whether to choose flexibility or performance here.
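
A minimal sketch of the constructor-based approach, in plain Java rather than against the actual Pig UDF base class: the patterns are compiled into SimpleDateFormat objects once, when the instance is built (the step DEFINE would trigger through the constructor), and each call only parses the date value. The class name CachedDateExtractor, the argument order (outgoing pattern first, incoming pattern optional), the UTC zone, and returning null on bad input are assumptions for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch, not the piggybank DateExtractor: patterns are
// compiled once per instance instead of once per tuple.
public class CachedDateExtractor {
    // Assumed default incoming format: the Apache log timestamp.
    private static final String DEFAULT_INCOMING = "dd/MMM/yyyy:HH:mm:ss Z";

    private final SimpleDateFormat incoming;
    private final SimpleDateFormat outgoing;

    public CachedDateExtractor(String outgoingPattern) {
        this(outgoingPattern, DEFAULT_INCOMING);
    }

    public CachedDateExtractor(String outgoingPattern, String incomingPattern) {
        // Built once, when the define arguments reach the constructor.
        TimeZone utc = TimeZone.getTimeZone("UTC");
        incoming = new SimpleDateFormat(incomingPattern, Locale.US);
        outgoing = new SimpleDateFormat(outgoingPattern, Locale.US);
        incoming.setTimeZone(utc);
        outgoing.setTimeZone(utc);
    }

    // Per-tuple work: parse the value and re-format it; no pattern compilation.
    public String extract(String value) {
        if (value == null) {
            return null;
        }
        try {
            return outgoing.format(incoming.parse(value));
        } catch (ParseException e) {
            return null; // value does not match the incoming pattern
        }
    }

    public static void main(String[] args) {
        CachedDateExtractor day = new CachedDateExtractor("MM-dd-yyyy");
        String[] dayTimes = {"10/Oct/2000:13:55:36 -0700", "08/Oct/2008:09:53:44 +0000"};
        for (String dayTime : dayTimes) {
            // The same two formatter objects are reused for every value.
            System.out.println(day.extract(dayTime));
        }
    }
}
```

main prints 10-10-2000 and then 10-08-2008. One caveat on this design: SimpleDateFormat is not thread-safe, so an instance holding cached formats should not be shared across threads.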


[jira] Commented: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639086#action_12639086 ] 

Alan Gates commented on PIG-476:
--------------------------------

A couple of comments:

You documented the constructor arguments in the class-level comments, but not at the function level.

In exec, if the date doesn't parse with incomingDateFormat, you call printStackTrace and keep going. I'm not sure what you want there. In 2.0 you'll want to return null. But in 1.x you need to choose either to return an empty DataAtom (probably what you want) or to throw an error (probably not what you want, because it will stop all processing on the job). Either way, you definitely don't want to spew the stack trace every time this happens.
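
One way to avoid both the exception and the stack trace is DateFormat.parse(String, ParsePosition), which returns null on failure instead of throwing ParseException. The sketch below is plain Java, not Pig's UDF API; the class name LenientDateExtractor and the choice to return null and print a single line to System.err are assumptions (a 1.x UDF would presumably leave its output empty rather than return null).

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical sketch: a bad date yields null plus a one-line message,
// never a stack trace, and processing continues with the next value.
public class LenientDateExtractor {
    private final SimpleDateFormat incoming;
    private final SimpleDateFormat outgoing;

    public LenientDateExtractor(String outgoingPattern, String incomingPattern) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        incoming = new SimpleDateFormat(incomingPattern, Locale.US);
        outgoing = new SimpleDateFormat(outgoingPattern, Locale.US);
        incoming.setTimeZone(utc);
        outgoing.setTimeZone(utc);
    }

    public String extract(String value) {
        if (value == null) {
            return null;
        }
        // parse(String, ParsePosition) signals failure by returning null
        // and recording the error index, instead of throwing.
        ParsePosition pos = new ParsePosition(0);
        Date parsed = incoming.parse(value, pos);
        if (parsed == null) {
            System.err.println("LenientDateExtractor: could not parse '" + value
                    + "' at index " + pos.getErrorIndex());
            return null;
        }
        return outgoing.format(parsed);
    }

    public static void main(String[] args) {
        LenientDateExtractor year = new LenientDateExtractor("yyyy", "dd/MMM/yyyy:HH:mm:ss");
        String[] lines = {"08/Oct/2008:09:53:44", "garbage", "09/Oct/2009:01:02:03"};
        for (String line : lines) {
            // The malformed record does not stop the loop.
            System.out.println(line + " -> " + year.extract(line));
        }
    }
}
```

main feeds it one malformed record between two good ones: the bad line produces null and a single message on standard error, and the loop carries on to the next record.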


[jira] Updated: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year

Posted by "Earl Cahill (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Earl Cahill updated PIG-476:
----------------------------

    Attachment: DateExtractor-PIG-476

I moved some javadoc down to the function level, and now on parse failures exec just returns, which leaves an empty string in the output. I also added a test case that checks that processing continues past a failed parse.


[jira] Updated: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year

Posted by "Earl Cahill (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Earl Cahill updated PIG-476:
----------------------------

    Status: Patch Available  (was: Open)

patch attached


[jira] Updated: (PIG-476) given a date that can match a SimpleDateFormat want to be able to extract arbitrary SimpleDateFormat data, like day or year

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-476:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch checked in. I took the liberty of adding a System.err.println at line 88 of DateExtractor.java so that an error message is emitted whenever a date fails to parse with incomingDateFormat. Swallowing exceptions is bad because users never know what went wrong. Thanks Earl.
