Posted to dev@pig.apache.org by "Jonathan Coveney (JIRA)" <ji...@apache.org> on 2012/10/29 19:08:13 UTC

[jira] [Created] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Jonathan Coveney created PIG-3014:
-------------------------------------

             Summary: CurrentTime() UDF has undesirable characteristics
                 Key: PIG-3014
                 URL: https://issues.apache.org/jira/browse/PIG-3014
             Project: Pig
          Issue Type: Bug
            Reporter: Jonathan Coveney


As part of the explanation of the new DateTime datatype I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed a way such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3014:
-------------------------------

    Attachment: PIG-3014-3.patch

Attached a patch that fixes {{TestBuiltin}}.

The CurrentTime() must get called only in the back-end because it reads the value of "pig.job.submitted.timestamp" out of JobConf. But the unit test case was calling it in the front-end, resulting in a NullPointerException.

Since the test case is not valid, I simply removed it.

Thanks!
                
> CurrentTime() UDF has undesirable characteristics
> -------------------------------------------------
>
>                 Key: PIG-3014
>                 URL: https://issues.apache.org/jira/browse/PIG-3014
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.12
>
>         Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch, PIG-3014-3.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed a way such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.


[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494789#comment-13494789 ] 

Cheolsoo Park commented on PIG-3014:
------------------------------------

Hi Jonathan,

I agree with using the job start time. That sounds reasonable to me.

But I have two comments regarding your patch:
- The Apache license header shouldn't be removed in {{src/org/apache/pig/builtin/CurrentTime.java}}.
- After applying your patch, {{src/org/apache/pig/builtin/CurrentTime.java}} looks like this. Can you please fix indentations?
{code}
/**
     * This is a default constructor for Pig reflection purposes. It should
     * never actually be used.
 */
    public CurrentTime() {}

    @Override
    public DateTime exec(Tuple input) throws IOException {
        if (!isInitialized) {
            String dateTimeValue = UDFContext.getUDFContext().getJobConf().get("pig.job.submitted.timestamp");
            if (dateTimeValue == null) {
                throw new ExecException("pig.job.submitted.timestamp was not set!");
    }   
            dateTime = new DateTime(Long.parseLong(dateTimeValue));
            isInitialized  = true;
    }   
        return dateTime;
    }
{code}

Thanks!
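
For readers following along, here is a minimal, self-contained sketch of the memoizing pattern in the snippet above. It is not the actual Pig class: {{java.util.Properties}} stands in for the JobConf, the class name {{MemoizedJobTimestamp}} is hypothetical, and a plain {{long}} replaces the Joda {{DateTime}}. Only the property key is taken from the patch.

```java
import java.util.Properties;

// Hypothetical stand-in for the UDF; Properties plays the role of the JobConf.
class MemoizedJobTimestamp {
    static final String KEY = "pig.job.submitted.timestamp";

    private final Properties conf;
    private boolean isInitialized = false;
    private long timestampMillis;

    MemoizedJobTimestamp(Properties conf) {
        this.conf = conf;
    }

    // Reads the property on the first call only; later calls return the cached value,
    // so every invocation in the same task sees the same "now".
    long exec() {
        if (!isInitialized) {
            String value = conf.getProperty(KEY);
            if (value == null) {
                // Fail with a descriptive message instead of a bare NullPointerException.
                throw new IllegalStateException(KEY + " was not set!");
            }
            timestampMillis = Long.parseLong(value);
            isInitialized = true;
        }
        return timestampMillis;
    }
}
```

Because the value comes from the job configuration rather than from the local clock, every task that reads the same configuration should agree on it, which is the property this issue is after.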
                

[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499933#comment-13499933 ] 

Cheolsoo Park commented on PIG-3014:
------------------------------------

Here is how I ended up fixing the test with hadoop-2.0.x:

from
{code:title=MapReduceLauncher.java}
for (Job job : jc.getWaitingJobs()) {
    job.getJobConf().set("pig.script.submitted.timestamp", Long.toString(scriptSubmittedTimestamp));
    job.getJobConf().set("pig.job.submitted.timestamp", Long.toString(System.currentTimeMillis()));
}
{code}
to
{code:title=MapReduceLauncher.java}
for (Job job : jc.getWaitingJobs()) {
    JobConf jobConfCopy = job.getJobConf();
    jobConfCopy.set("pig.script.submitted.timestamp", Long.toString(scriptSubmittedTimestamp));
    jobConfCopy.set("pig.job.submitted.timestamp", Long.toString(System.currentTimeMillis()));
    job.setJobConf(jobConfCopy);
}
{code}
Apparently, {{job.getJobConf()}} returns a different JobConf object each time, so properties that are set by {{job.getJobConf().set()}} do not last at all.

This is quite surprising to me because this means that there are many other properties that are not properly set with hadoop-2.0.x now. I will open another jira to get this issue fixed.
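
To illustrate why the first form loses the property, here is a small self-contained sketch. It does not use Hadoop: {{CopyingJob}} is a hypothetical stand-in whose getter returns a fresh copy on every call, and {{java.util.Properties}} replaces JobConf. The assumption is only that the real getter behaves like this one.

```java
import java.util.Properties;

class JobConfCopyDemo {
    static final String KEY = "pig.job.submitted.timestamp";

    // Hypothetical stand-in for a job whose getter hands out a copy of its configuration.
    static class CopyingJob {
        private Properties conf = new Properties();

        Properties getJobConf() {
            Properties copy = new Properties();
            copy.putAll(conf);
            return copy;
        }

        void setJobConf(Properties newConf) {
            Properties copy = new Properties();
            copy.putAll(newConf);
            conf = copy;
        }
    }

    public static void main(String[] args) {
        CopyingJob job = new CopyingJob();

        // Lost update: the property is set on a temporary copy that is then discarded.
        job.getJobConf().setProperty(KEY, "1");
        System.out.println(job.getJobConf().getProperty(KEY)); // prints "null"

        // Get once, modify, and store back so the change persists.
        Properties conf = job.getJobConf();
        conf.setProperty(KEY, "1");
        job.setJobConf(conf);
        System.out.println(job.getJobConf().getProperty(KEY)); // prints "1"
    }
}
```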
                

[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Zhijie Shen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491270#comment-13491270 ] 

Zhijie Shen commented on PIG-3014:
----------------------------------

Hi Jonathan,

Thanks for correcting my error. Regarding your patch, I also have some comments.

As far as I know, getArgToFuncMapping() is called when generating the logical plan, while exec() is called at runtime. Hence the DateTime object generated in getArgToFuncMapping() reflects the timestamp when the Pig Latin statements are parsed. If there are several statements containing CurrentTime(), the timestamps will be similar. Please correct me if I'm wrong.

For example,

"
A = load 'justSomeRows' using mock.Storage();
B = foreach A generate *, CurrentTime();
......
C = foreach B generate *, CurrentTime();
"

In this case, there are a bunch of statements between B and C. In B, we want to get the timestamp before executing the statements, while in C, we want to get the timestamp after executing them. The difference between the two timestamps should reflect the runtime interval instead of the interval between parsing two CurrentTime() UDFs.

I think the more accurate behavior of CurrentTime() should be generating a unique timestamp for a statement when it is executed.
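
The distinction described above can be sketched outside Pig as follows. Both classes are hypothetical, and the clock is injected as a {{LongSupplier}} so the behavior is easy to observe. This only illustrates "captured when the object is built" versus "captured on first exec()", not how Pig actually wires its UDFs.

```java
import java.util.function.LongSupplier;

class TimestampCaptureDemo {
    // Captures the clock when the object is built (analogous to plan-building time).
    static class CapturedAtConstruction {
        private final long timestamp;

        CapturedAtConstruction(LongSupplier clock) {
            this.timestamp = clock.getAsLong();
        }

        long exec() {
            return timestamp;
        }
    }

    // Captures the clock on the first exec() call (analogous to run time),
    // then keeps returning that same value.
    static class CapturedAtFirstExec {
        private final LongSupplier clock;
        private Long timestamp; // null until the first exec()

        CapturedAtFirstExec(LongSupplier clock) {
            this.clock = clock;
        }

        long exec() {
            if (timestamp == null) {
                timestamp = clock.getAsLong();
            }
            return timestamp;
        }
    }
}
```

With a fake clock that reads 100 at construction and 500 at first execution, the first class reports 100 and the second reports 500. The gap between the two is the interval this comment is concerned about.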
                

[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507725#comment-13507725 ] 

Rohini Palaniswamy commented on PIG-3014:
-----------------------------------------

bq. Since the test case is not valid, I simply removed it.

+1. TestCurrentTime covers CurrentTime udf adequately.

Just an observation though. All builtin udf tests are in TestBuiltin, but CurrentTime alone has a separate test class with just one test. Should we move that to TestBuiltin?
                

[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3014:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed to trunk.
                

[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507825#comment-13507825 ] 

Rohini Palaniswamy commented on PIG-3014:
-----------------------------------------

Ah. I had forgotten about that question. Agree with Julien.
                

[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503931#comment-13503931 ] 

Rohini Palaniswamy commented on PIG-3014:
-----------------------------------------

bq. I don't see a clone happening in mapreduce code. Were you able to get to the root cause of the behaviour?
   I was looking at the wrong version of hadoop code. H23 indeed returns a copy of the JobConf for jobcontrol.Job. Checked that we don't do a set in other places. Even the submitted.timestamp that was set was not being used in code elsewhere before this case. Maybe it was just set for debugging purposes. So we should be good with 0.9 and 0.10 for h23.

{code}
public synchronized JobConf getJobConf() {
    return new JobConf(super.getJob().getConfiguration());
}
{code} 

 My +1 as well. 
                

[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491288#comment-13491288 ] 

Jonathan Coveney commented on PIG-3014:
---------------------------------------

Zhijie,

I think that the semantics in my patch are sufficient. You are correct that in some cases, they might be "closer together" than we might want, but what does that even mean? The semantics are not well specified. What if the optimizer in fact put C before B? What if the optimizer had them run at the same time? What if my cluster happens to be tuned to a certain workload...and so on and so on. I think as long as "now" is defined as "after the script runs," and as long as it is the same for every value in a given relation that uses it, that's the only guarantee that we can make. We can document this limitation (i.e. that "now" is a more or less arbitrary value in between the beginning of your script and when it is finished being parsed).

I suppose there would be some utility in a CurrentTime() where the time is with respect to the beginning of execution, but it could easily suffer from the same issue if it was in a foreach with a really time consuming value, where the "now" value quickly becomes stale. I think the incremental gain is minimal, and the incremental complexity is quite high. If you deeply disagree, though, we can discuss how to do it. I think the following would work: per each instantiation of the UDF, we create two unique files and put them in HDFS (I do not think the distributed cache will work in this specific case, but it may). Those files will be the constructor argument. On first execution, each mapper tries to delete the file. Since delete is atomic, only one should succeed. This is the leader. It will record the current time and serialize it to the second file. We would have to coordinate atomicity...perhaps it could write a magic value at the end of the serialized date time, so all of the mappers would read the file until they read the magic number, and then they'd know it was done.

This would be pretty complicated for what I see as a minimal gain, but it would probably be a "more correct" now() implementation. I do not know if Hadoop has a more convenient coordination mechanism between mappers (this sort of goes against the whole point).

I welcome more thoughts
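
For concreteness, the delete-based coordination described above might look roughly like the sketch below. It uses the local filesystem through {{java.nio.file}} as a stand-in for HDFS, so it says nothing about whether HDFS gives the same atomicity guarantees. The class name, the {{#DONE}} marker and the polling interval are all made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class DeleteLeaderElection {
    // Hypothetical "magic value" that marks a completely written result file.
    static final String MAGIC = "#DONE";

    // Only the caller whose delete actually removed the marker file becomes the leader.
    static boolean tryBecomeLeader(Path marker) throws IOException {
        return Files.deleteIfExists(marker);
    }

    // The leader serializes its timestamp followed by the magic value.
    static void publish(Path result, long millis) throws IOException {
        Files.write(result, (millis + MAGIC).getBytes(StandardCharsets.UTF_8));
    }

    // Returns the published timestamp, or -1 if the file is missing or incomplete.
    static long tryRead(Path result) throws IOException {
        if (!Files.exists(result)) {
            return -1;
        }
        String content = new String(Files.readAllBytes(result), StandardCharsets.UTF_8);
        if (!content.endsWith(MAGIC)) {
            return -1;
        }
        return Long.parseLong(content.substring(0, content.length() - MAGIC.length()));
    }

    // Every participant calls this; all of them end up with the leader's timestamp.
    // Note: this polls forever if the leader dies before publishing.
    static long resolveNow(Path marker, Path result, long localMillis)
            throws IOException, InterruptedException {
        if (tryBecomeLeader(marker)) {
            publish(result, localMillis);
        }
        long value;
        while ((value = tryRead(result)) < 0) {
            Thread.sleep(10);
        }
        return value;
    }
}
```

Even in this toy form there are failure modes to handle, such as a leader that dies before publishing or stale files left behind, which supports the point that the added complexity is high relative to the gain.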
                

[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3014:
----------------------------------

    Attachment: PIG-3014-0.patch

I've attached a fix and a couple of light tests. Note that I uncovered PIG-3032 while developing this, though this isn't affected by that bug.
                

[jira] [Reopened] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Julien Le Dem (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem reopened PIG-3014:
--------------------------------


I see a failing test:
org.apache.pig.test.TestBuiltin.testConversionBetweenDateTimeAndString

java.lang.NullPointerException
	at org.apache.pig.builtin.CurrentTime.exec(CurrentTime.java:41)
	at org.apache.pig.test.TestBuiltin.testConversionBetweenDateTimeAndString(TestBuiltin.java:450)

                

[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507731#comment-13507731 ] 

Cheolsoo Park commented on PIG-3014:
------------------------------------

Thanks Rohini.

In fact, I asked that question on the dev mailing list a while ago:
http://search-hadoop.com/m/OVyoR1Ktpcy/Adding+new+test+cases+to+TestBuiltin.java&subj=Adding+new+test+cases+to+TestBuiltin+java

Julien said that each built-in UDF should have its own test suite, so I followed it in PIG-2881. I guess that the same applies to CurrentTime().

Please anyone correct me if I am wrong.
                

[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500155#comment-13500155 ] 

Rohini Palaniswamy commented on PIG-3014:
-----------------------------------------

bq. Apparently, job.getJobConf() returns a different JobConf object each time, so properties that are set by job.getJobConf().set() do not last at all.

  This has got me worried. I don't see a clone happening in mapreduce code. Were you able to get to the root cause of the behaviour?
                

[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3014:
-------------------------------

    Attachment: PIG-3014-2.patch

Attaching a patch that makes {{TestCurrentTime}} pass in both hadoop 20 and 23. I also fixed whitespace and Apache header issues that I mentioned in a previous comment.

Thanks!
                

[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Julien Le Dem (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507750#comment-13507750 ] 

Julien Le Dem commented on PIG-3014:
------------------------------------

I think it's better to have one test class per UDF.
Usually tests are grouped per class or functional group of classes.
The builtin UDFs do not form a functional group, as they serve different purposes. Grouping them all together just makes a huge test class, which is undesirable.
                

[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500450#comment-13500450 ] 

Jonathan Coveney commented on PIG-3014:
---------------------------------------

Thanks for fixing my patch over the weekend, Cheolsoo! Feel free to commit, or I can later. The reviewer has become the reviewed :)

I agree with Rohini that we need to be vigilant about this getJobConf situation.
                
> CurrentTime() UDF has undesirable characteristics
> -------------------------------------------------
>
>                 Key: PIG-3014
>                 URL: https://issues.apache.org/jira/browse/PIG-3014
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.12
>
>         Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed a way such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.


[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499503#comment-13499503 ] 

Cheolsoo Park commented on PIG-3014:
------------------------------------

I was going to commit the patch after fixing the whitespace.

But I realized that the new test case {{TestCurrentTime}} fails with hadoop-2.0.x.
{code}
ERROR 0: pig.job.submitted.timestamp was not set!
{code}
                
> CurrentTime() UDF has undesirable characteristics
> -------------------------------------------------
>
>                 Key: PIG-3014
>                 URL: https://issues.apache.org/jira/browse/PIG-3014
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.12
>
>         Attachments: PIG-3014-0.patch, PIG-3014-1.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed a way such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.


[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13491301#comment-13491301 ] 

Jonathan Coveney commented on PIG-3014:
---------------------------------------

(As a side note, I was thinking about this, and IF (big if) Hadoop can guarantee an atomic write action (I don't think it can?), then we only need one file. Each mapper can attempt to read it; if it is empty, the mapper appends the current time and then reads the first datetime in that file. That would avoid a race condition because of the atomic write action. If writing isn't atomic, though, you'd have to abuse some other atomic action for coordination, a la delete above. In fact, we could even make this a generic API that lets you coordinate some runtime value for UDF invocations, but once again, it's not really a pattern we want to encourage.)

Now I sort of want to do this just for the challenge of it...
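
For illustration only, the single-file idea above could be sketched roughly as follows. This is a minimal sketch, not anything from the patches: the class {{SharedTimestampSketch}} and its methods are hypothetical, a local temp file stands in for the shared file, and it assumes the "big if" above holds, i.e. that an append lands atomically so the first line is the same for every reader.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: a local file stands in for the shared file on the cluster.
class SharedTimestampSketch {

    // Each worker offers its own clock reading, but every worker returns
    // whatever timestamp ended up on the first line of the shared file.
    static long agreeOnTimestamp(Path shared, long candidateMillis) {
        try {
            // If nobody has written yet, append our candidate. Two workers may
            // both get here; ASSUMING appends are atomic, one line still lands first.
            if (!Files.exists(shared) || Files.size(shared) == 0) {
                byte[] line = (candidateMillis + "\n").getBytes(StandardCharsets.UTF_8);
                Files.write(shared, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
            // Everyone reads the first line, so everyone agrees on one value.
            return Long.parseLong(Files.readAllLines(shared, StandardCharsets.UTF_8).get(0));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical helper: creates an empty temp file to play the shared file.
    static Path newSharedFile() {
        try {
            Path p = Files.createTempFile("pig-now", ".txt");
            p.toFile().deleteOnExit();
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path shared = newSharedFile();
        long first = agreeOnTimestamp(shared, System.currentTimeMillis());
        long second = agreeOnTimestamp(shared, System.currentTimeMillis() + 5000);
        System.out.println(first == second);
    }
}
```

The second caller's later clock reading is ignored because the file is no longer empty, which is the whole point: the value is chosen once and then only read.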
                
> CurrentTime() UDF has undesirable characteristics
> -------------------------------------------------
>
>                 Key: PIG-3014
>                 URL: https://issues.apache.org/jira/browse/PIG-3014
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.12
>
>         Attachments: PIG-3014-0.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed a way such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.


[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490820#comment-13490820 ] 

Alan Gates commented on PIG-3014:
---------------------------------

+1.  It's hard to envision why you'd want the current behavior.
                
> CurrentTime() UDF has undesirable characteristics
> -------------------------------------------------
>
>                 Key: PIG-3014
>                 URL: https://issues.apache.org/jira/browse/PIG-3014
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jonathan Coveney
>
> As part of the explanation of the new DateTime datatype I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed a way such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.


[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3014:
-------------------------------

    Status: Patch Available  (was: Reopened)
    
> CurrentTime() UDF has undesirable characteristics
> -------------------------------------------------
>
>                 Key: PIG-3014
>                 URL: https://issues.apache.org/jira/browse/PIG-3014
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.12
>
>         Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch, PIG-3014-3.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed a way such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.


[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-3014:
-------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Jonathan and Rohini!
                
> CurrentTime() UDF has undesirable characteristics
> -------------------------------------------------
>
>                 Key: PIG-3014
>                 URL: https://issues.apache.org/jira/browse/PIG-3014
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.12
>
>         Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed a way such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.


[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3014:
----------------------------------

    Attachment: PIG-3014-1.patch

I think this patch is a good compromise. I was talking to Bill earlier today and he mentioned that the job conf already holds a timestamp for roughly when a given job starts. IMHO this is close enough, and it will be the same for every invocation. Easy peasy.
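
As a rough illustration of the approach (not the code in the patch), the idea could look something like the sketch below. The class {{FixedCurrentTimeSketch}} is hypothetical, {{java.util.Properties}} stands in for the job conf, and storing the value as epoch milliseconds in a string is an assumption. Only the property name {{pig.job.submitted.timestamp}} comes from this issue.

```java
import java.util.Properties;

// Hypothetical stand-in for the UDF; Properties plays the role of the job conf.
class FixedCurrentTimeSketch {
    // Property name taken from this issue; the value format is an assumption.
    static final String KEY = "pig.job.submitted.timestamp";

    private final Properties jobConf;
    private Long cachedMillis; // parsed on first use, then reused

    FixedCurrentTimeSketch(Properties jobConf) {
        this.jobConf = jobConf;
    }

    // Returns the job-level timestamp instead of reading the clock per call.
    long exec() {
        if (cachedMillis == null) {
            String raw = jobConf.getProperty(KEY);
            if (raw == null) {
                // Fail loudly when the property is missing, e.g. when called
                // somewhere the job conf was never populated.
                throw new IllegalStateException(KEY + " was not set!");
            }
            cachedMillis = Long.parseLong(raw);
        }
        return cachedMillis;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(KEY, Long.toString(System.currentTimeMillis()));
        FixedCurrentTimeSketch udf = new FixedCurrentTimeSketch(conf);
        // Repeated invocations agree because the clock is never consulted again.
        System.out.println(udf.exec() == udf.exec());
    }
}
```

Because the value is fixed before any task runs, every invocation in every task sees the same time, at the cost of it being the submission time rather than the exact moment of the call.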
                
> CurrentTime() UDF has undesirable characteristics
> -------------------------------------------------
>
>                 Key: PIG-3014
>                 URL: https://issues.apache.org/jira/browse/PIG-3014
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.12
>
>         Attachments: PIG-3014-0.patch, PIG-3014-1.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed a way such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.


[jira] [Commented] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507654#comment-13507654 ] 

Cheolsoo Park commented on PIG-3014:
------------------------------------

Hi Julien,

Sorry about that. It is failing because {{TestBuiltin}} does not set {{pig.job.submitted.timestamp}}. I will get it fixed now.
                
> CurrentTime() UDF has undesirable characteristics
> -------------------------------------------------
>
>                 Key: PIG-3014
>                 URL: https://issues.apache.org/jira/browse/PIG-3014
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.12
>
>         Attachments: PIG-3014-0.patch, PIG-3014-1.patch, PIG-3014-2.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed a way such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.


[jira] [Updated] (PIG-3014) CurrentTime() UDF has undesirable characteristics

Posted by "Jonathan Coveney (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-3014:
----------------------------------

    Fix Version/s: 0.12
         Assignee: Jonathan Coveney
           Status: Patch Available  (was: Open)
    
> CurrentTime() UDF has undesirable characteristics
> -------------------------------------------------
>
>                 Key: PIG-3014
>                 URL: https://issues.apache.org/jira/browse/PIG-3014
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>             Fix For: 0.12
>
>         Attachments: PIG-3014-0.patch
>
>
> As part of the explanation of the new DateTime datatype I noticed that we had added a CurrentTime() UDF. The issue with this UDF is that it returns the current time _of every exec invocation_, which can lead to confusing results. In PIG-1431 I proposed a way such that every instance of the same NOW() will return the same time, which I think is better. Would enjoy thoughts.
