You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Ayazur Rehman <re...@gmail.com> on 2015/06/02 16:19:29 UTC
current_date function in hive
Hi everyone,
I am trying to schedule a hive query using Oozie, to perform aggregation on
a table on data of a particular day and save the results in another table
whenever every 24 hours.
the schema of my table is something like (tablename - currenttable)
id string
cdatetime timestamp
average int
locations array<string>
color string
And currently the query that I perform manually everyday is something like
insert into table lotable select id, lv, cdatetime, color, count(color)
from currenttable lateral view explode(locations) lvtable as lv where
to_date(cdatetime)='2015-06-01' group by cdatetime, color, lv, id;
So, in order to automate the process I want to use a date function that
would let hive aggregate on the data of the previous day.
I tried using current_date function but I can't get the syntax right. I get
the following error
FAILED: SemanticException [Error 10011]: Line 1:47 Invalid
function 'current_date'
Could you please help me with the syntax.
--
Thanking You,
Ayaz
Re: current_date function in hive
Posted by "Bhagwan S. Soni" <bh...@gmail.com>.
Use "from_unixtime(unix_timestamp())". This might help you to get what you
want.
You may have to split date and time because this function will returns
TIMESTAMP.
On Tue, Jun 2, 2015 at 7:49 PM, Ayazur Rehman <re...@gmail.com>
wrote:
> Hi everyone,
>
> I am trying to schedule a hive query using Oozie, to perform aggregation
> on a table on data of a particular day and save the results in another
> table whenever every 24 hours.
>
> the schema of my table is something like (tablename - currenttable)
> id string
> cdatetime timestamp
> average int
> locations array<string>
> color string
>
> And currently the query that I perform manually everyday is something like
>
> insert into table lotable select id, lv, cdatetime, color, count(color)
> from currenttable lateral view explode(locations) lvtable as lv where
> to_date(cdatetime)='2015-06-01' group by cdatetime, color, lv, id;
>
> So, in order to automate the process I want to use a date function that
> would let hive aggregate on the data of the previous day.
> I tried using current_date function but I can't get the syntax right. I
> get the following error
> FAILED: SemanticException [Error 10011]: Line 1:47 Invalid
> function 'current_date'
>
> Could you please help me with the syntax.
>
>
> --
> Thanking You,
> Ayaz
>
>
Re: current_date function in hive
Posted by Ayazur Rehman <re...@gmail.com>.
Hi everyone,
Thanks for the responses. My problem is solved now.
I was not using the proper version of hive to be able to use the
current_date function.
Thanks Gopal for the UDF link.
I beleive I will have to add the udf to the hiveserver2 in order to be able
to run the query using Oozie.
On Tue, Jun 2, 2015 at 1:30 PM, Gopal Vijayaraghavan <go...@apache.org>
wrote:
> Hi,
>
> > You may try to_date(FROM_UNIXTIME(UNIX_TIMESTAMP()))
>
> That would be a very bad idea for query correctness.
>
> The current_date UDF was introduced because of that particular
> anti-pattern.
>
> The unix_timestamp() is evaluated when a row is read, so each row gets a
> slightly different timestamp when doing ETLs & I have seen it give
> completely incorrect results when used near midnight (i.e query kicked off
> at 11:59, executed at 00:01 next day).
>
> Plus map-reduce falure handling mandates (logically) identical outputs on
> map task retries or speculation.
>
> I¹ve recently pulled out the current_timestamp() impl out into a reusable
> UDF
> recently so that I can use that in hive-1.0.x installs as well.
>
> https://github.com/t3rmin4t0r/current-timestamp
>
> You can use that UDF JAR with hive-1.0+, the constant folding will replace
> the value in-place during compilation.
>
> Cheers,
> Gopal
>
>
>
--
Thanking You,
Ayazur Rehman
+1 720 917 8540
Re: current_date function in hive
Posted by Gopal Vijayaraghavan <go...@apache.org>.
Hi,
> You may try to_date(FROM_UNIXTIME(UNIX_TIMESTAMP()))
That would be a very bad idea for query correctness.
The current_date UDF was introduced because of that particular
anti-pattern.
The unix_timestamp() is evaluated when a row is read, so each row gets a
slightly different timestamp when doing ETLs & I have seen it give
completely incorrect results when used near midnight (i.e query kicked off
at 11:59, executed at 00:01 next day).
Plus map-reduce falure handling mandates (logically) identical outputs on
map task retries or speculation.
I¹ve recently pulled out the current_timestamp() impl out into a reusable
UDF
recently so that I can use that in hive-1.0.x installs as well.
https://github.com/t3rmin4t0r/current-timestamp
You can use that UDF JAR with hive-1.0+, the constant folding will replace
the value in-place during compilation.
Cheers,
Gopal
Re: current_date function in hive
Posted by DU DU <wi...@gmail.com>.
You may try to_date(FROM_UNIXTIME(UNIX_TIMESTAMP()))
On Tue, Jun 2, 2015 at 10:19 AM, Ayazur Rehman <re...@gmail.com>
wrote:
> Hi everyone,
>
> I am trying to schedule a hive query using Oozie, to perform aggregation
> on a table on data of a particular day and save the results in another
> table whenever every 24 hours.
>
> the schema of my table is something like (tablename - currenttable)
> id string
> cdatetime timestamp
> average int
> locations array<string>
> color string
>
> And currently the query that I perform manually everyday is something like
>
> insert into table lotable select id, lv, cdatetime, color, count(color)
> from currenttable lateral view explode(locations) lvtable as lv where
> to_date(cdatetime)='2015-06-01' group by cdatetime, color, lv, id;
>
> So, in order to automate the process I want to use a date function that
> would let hive aggregate on the data of the previous day.
> I tried using current_date function but I can't get the syntax right. I
> get the following error
> FAILED: SemanticException [Error 10011]: Line 1:47 Invalid
> function 'current_date'
>
> Could you please help me with the syntax.
>
>
> --
> Thanking You,
> Ayaz
>
>
--
Thanks,
Will
RE: current_date function in hive
Posted by Mich Talebzadeh <mi...@peridale.co.uk>.
How about using individual data functions on crdatetime (in my case op_time)
Something like
INSERT OVERWRITE TABLE tmp
SELECT rs.op_type, rs.year, rs.Month, rs.Day, rs.Total_Rows
FROM
(
SELECT
op_type
, YEAR(op_time) As Year
, MONTH(op_time) AS Month
, DAY(op_time) AS Day
, count(*) AS Total_Rows
FROM t
GROUP BY
op_type
, YEAR(op_time)
, MONTH(op_time)
, DAY(op_time)
) rs
HTH
Mich Talebzadeh
http://talebzadehmich.wordpress.com
Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.
co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4
Publications due shortly:
Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and Coherence Cache
Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly
NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.
From: Ayazur Rehman [mailto:rehman.ayazur90@gmail.com]
Sent: 02 June 2015 15:19
To: user@hive.apache.org
Subject: current_date function in hive
Hi everyone,
I am trying to schedule a hive query using Oozie, to perform aggregation on a table on data of a particular day and save the results in another table whenever every 24 hours.
the schema of my table is something like (tablename - currenttable)
id string
cdatetime timestamp
average int
locations array<string>
color string
And currently the query that I perform manually everyday is something like
insert into table lotable select id, lv, cdatetime, color, count(color) from currenttable lateral view explode(locations) lvtable as lv where to_date(cdatetime)='2015-06-01' group by cdatetime, color, lv, id;
So, in order to automate the process I want to use a date function that would let hive aggregate on the data of the previous day.
I tried using current_date function but I can't get the syntax right. I get the following error
FAILED: SemanticException [Error 10011]: Line 1:47 Invalid function 'current_date'
Could you please help me with the syntax.
--
Thanking You,
Ayaz