You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Ayazur Rehman <re...@gmail.com> on 2015/06/02 16:19:29 UTC

current_date function in hive

Hi everyone,

I am trying to schedule a hive query using Oozie, to perform aggregation on
a table on data of a particular day and save the results in another table
whenever every 24 hours.

the schema of my table is something like (tablename - currenttable)
id                      string
cdatetime               timestamp
average                 int
locations               array<string>
color                   string

And currently the query that I perform manually everyday is something like

insert into table lotable select id, lv, cdatetime, color, count(color)
from currenttable lateral view explode(locations) lvtable as lv where
to_date(cdatetime)='2015-06-01' group by cdatetime, color, lv, id;

So, in order to automate the process I want to use a date function that
would let hive aggregate on the data of the previous day.
I tried using current_date function but I can't get the syntax right. I get
the following error
              FAILED: SemanticException [Error 10011]: Line 1:47 Invalid
function 'current_date'

Could you please help me with the syntax.


-- 
Thanking You,
Ayaz

Re: current_date function in hive

Posted by "Bhagwan S. Soni" <bh...@gmail.com>.
Use "from_unixtime(unix_timestamp())". This might help you to get what you
want.
You may have to split date and time because this function will returns
TIMESTAMP.

On Tue, Jun 2, 2015 at 7:49 PM, Ayazur Rehman <re...@gmail.com>
wrote:

> Hi everyone,
>
> I am trying to schedule a hive query using Oozie, to perform aggregation
> on a table on data of a particular day and save the results in another
> table whenever every 24 hours.
>
> the schema of my table is something like (tablename - currenttable)
> id                      string
> cdatetime               timestamp
> average                 int
> locations               array<string>
> color                   string
>
> And currently the query that I perform manually everyday is something like
>
> insert into table lotable select id, lv, cdatetime, color, count(color)
> from currenttable lateral view explode(locations) lvtable as lv where
> to_date(cdatetime)='2015-06-01' group by cdatetime, color, lv, id;
>
> So, in order to automate the process I want to use a date function that
> would let hive aggregate on the data of the previous day.
> I tried using current_date function but I can't get the syntax right. I
> get the following error
>               FAILED: SemanticException [Error 10011]: Line 1:47 Invalid
> function 'current_date'
>
> Could you please help me with the syntax.
>
>
> --
> Thanking You,
> Ayaz
>
>

Re: current_date function in hive

Posted by Ayazur Rehman <re...@gmail.com>.
Hi everyone,

Thanks for the responses. My problem is solved now.
I was not using the proper version of hive to be able to use the
current_date function.
Thanks Gopal for the UDF link.
I beleive I will have to add the udf to the hiveserver2 in order to be able
to run the query using Oozie.

On Tue, Jun 2, 2015 at 1:30 PM, Gopal Vijayaraghavan <go...@apache.org>
wrote:

> Hi,
>
> > You may try to_date(FROM_UNIXTIME(UNIX_TIMESTAMP()))
>
> That would be a very bad idea for query correctness.
>
> The current_date UDF was introduced because of that particular
> anti-pattern.
>
> The unix_timestamp() is evaluated when a row is read, so each row gets a
> slightly different timestamp when doing ETLs & I have seen it give
> completely incorrect results when used near midnight (i.e query kicked off
> at 11:59, executed at 00:01 next day).
>
> Plus map-reduce falure handling mandates (logically) identical outputs on
> map task retries or speculation.
>
> I¹ve recently pulled out the current_timestamp() impl out into a reusable
> UDF
> recently so that I can use that in hive-1.0.x installs as well.
>
> https://github.com/t3rmin4t0r/current-timestamp
>
> You can use that UDF JAR with hive-1.0+, the constant folding will replace
> the value in-place during compilation.
>
> Cheers,
> Gopal
>
>
>


-- 
Thanking You,
Ayazur Rehman
+1 720 917 8540

Re: current_date function in hive

Posted by Gopal Vijayaraghavan <go...@apache.org>.
Hi,

> You may try to_date(FROM_UNIXTIME(UNIX_TIMESTAMP()))

That would be a very bad idea for query correctness.

The current_date UDF was introduced because of that particular
anti-pattern.

The unix_timestamp() is evaluated when a row is read, so each row gets a
slightly different timestamp when doing ETLs & I have seen it give
completely incorrect results when used near midnight (i.e query kicked off
at 11:59, executed at 00:01 next day).

Plus map-reduce falure handling mandates (logically) identical outputs on
map task retries or speculation.

I¹ve recently pulled out the current_timestamp() impl out into a reusable
UDF 
recently so that I can use that in hive-1.0.x installs as well.

https://github.com/t3rmin4t0r/current-timestamp

You can use that UDF JAR with hive-1.0+, the constant folding will replace
the value in-place during compilation.

Cheers,
Gopal



Re: current_date function in hive

Posted by DU DU <wi...@gmail.com>.
You may try to_date(FROM_UNIXTIME(UNIX_TIMESTAMP()))

On Tue, Jun 2, 2015 at 10:19 AM, Ayazur Rehman <re...@gmail.com>
wrote:

> Hi everyone,
>
> I am trying to schedule a hive query using Oozie, to perform aggregation
> on a table on data of a particular day and save the results in another
> table whenever every 24 hours.
>
> the schema of my table is something like (tablename - currenttable)
> id                      string
> cdatetime               timestamp
> average                 int
> locations               array<string>
> color                   string
>
> And currently the query that I perform manually everyday is something like
>
> insert into table lotable select id, lv, cdatetime, color, count(color)
> from currenttable lateral view explode(locations) lvtable as lv where
> to_date(cdatetime)='2015-06-01' group by cdatetime, color, lv, id;
>
> So, in order to automate the process I want to use a date function that
> would let hive aggregate on the data of the previous day.
> I tried using current_date function but I can't get the syntax right. I
> get the following error
>               FAILED: SemanticException [Error 10011]: Line 1:47 Invalid
> function 'current_date'
>
> Could you please help me with the syntax.
>
>
> --
> Thanking You,
> Ayaz
>
>


-- 
Thanks,
Will

RE: current_date function in hive

Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
How about using individual data functions on crdatetime (in my case op_time)

 

Something like

 

INSERT OVERWRITE TABLE tmp

SELECT rs.op_type, rs.year, rs.Month, rs.Day, rs.Total_Rows

FROM

(

SELECT

          op_type

        , YEAR(op_time) As Year

        , MONTH(op_time) AS Month

        , DAY(op_time)   AS Day

        , count(*) AS Total_Rows

FROM t

GROUP BY

          op_type

        , YEAR(op_time)

        , MONTH(op_time)

        , DAY(op_time)

) rs

 

 

HTH

 

Mich Talebzadeh

 

http://talebzadehmich.wordpress.com

 

Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7. 

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and Coherence Cache

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly

 

NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.

 

From: Ayazur Rehman [mailto:rehman.ayazur90@gmail.com] 
Sent: 02 June 2015 15:19
To: user@hive.apache.org
Subject: current_date function in hive

 

Hi everyone, 

 

I am trying to schedule a hive query using Oozie, to perform aggregation on a table on data of a particular day and save the results in another table whenever every 24 hours.

 

the schema of my table is something like (tablename - currenttable)

id                      string

cdatetime               timestamp

average                 int

locations               array<string>

color                   string

 

And currently the query that I perform manually everyday is something like

 

insert into table lotable select id, lv, cdatetime, color, count(color) from currenttable lateral view explode(locations) lvtable as lv where to_date(cdatetime)='2015-06-01' group by cdatetime, color, lv, id;

 

So, in order to automate the process I want to use a date function that would let hive aggregate on the data of the previous day.

I tried using current_date function but I can't get the syntax right. I get the following error

              FAILED: SemanticException [Error 10011]: Line 1:47 Invalid function 'current_date'

 

Could you please help me with the syntax. 

 

 

-- 

Thanking You,

Ayaz