You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@oozie.apache.org by Micah Whitacre <mk...@gmail.com> on 2015/07/13 15:54:02 UTC

Parsing Calculated Time Into Individual Fields

I have a situation where I have a Hive table that is time partitioned by
individual fields (e.g. time_year, time_month, time_day, time_hour).  I
want to regularly run Hive queries over this table for data in the last 24
hours (or X days or X months).  To improve performance and make use of
these partitions to cut out data I'm trying to figure out how to calculate
and parse the components of a time value based on the nominal time.

To calculate the time I want I know I can do the following:
${coord:dateOffset(coord:nominalTime(), -1, 'DAY')}

So now what I'm trying to figure out is how to parse the individual year,
month, day, hour components out of that calculate time similar to how you
can have the ${YEAR}, ${MONTH}, etc values when composing a dataset. Then
I'll be able to substitute those values into a Hive query like

>> select * from table foo with time_year >= ${CALCULATED_YEAR}...;

Can this be done using existing EL Functions? I'd prefer not to have to
write my own.

Micah

Re: Parsing Calculated Time Into Individual Fields

Posted by Micah Whitacre <mk...@gmail.com>.
Nam,
thanks for the suggestion that does exactly what I wanted.

Micah

On Mon, Jul 13, 2015 at 9:02 AM, Nam Pham <ph...@gmail.com> wrote:

> Micah,
>
> You can use the *formatTime* EL function described here:
>
>
> https://oozie.apache.org/docs/3.1.3-incubating/CoordinatorFunctionalSpec.html#a6.8.2._coord:formatTimeString_ts_String_format_EL_Function_since_Oozie_2.3.2
>
> Cheers,
> Nam
>
> On Mon, Jul 13, 2015 at 8:54 PM, Micah Whitacre <mk...@gmail.com> wrote:
>
> > I have a situation where I have a Hive table that is time partitioned by
> > individual fields (e.g. time_year, time_month, time_day, time_hour).  I
> > want to regularly run Hive queries over this table for data in the last
> 24
> > hours (or X days or X months).  To improve performance and make use of
> > these partitions to cut out data I'm trying to figure out how to
> calculate
> > and parse the components of a time value based on the nominal time.
> >
> > To calculate the time I want I know I can do the following:
> > ${coord:dateOffset(coord:nominalTime(), -1, 'DAY')}
> >
> > So now what I'm trying to figure out is how to parse the individual year,
> > month, day, hour components out of that calculate time similar to how you
> > can have the ${YEAR}, ${MONTH}, etc values when composing a dataset. Then
> > I'll be able to substitute those values into a Hive query like
> >
> > >> select * from table foo with time_year >= ${CALCULATED_YEAR}...;
> >
> > Can this be done using existing EL Functions? I'd prefer not to have to
> > write my own.
> >
> > Micah
> >
>

Re: Parsing Calculated Time Into Individual Fields

Posted by Nam Pham <ph...@gmail.com>.
Micah,

You can use the *formatTime* EL function described here:

https://oozie.apache.org/docs/3.1.3-incubating/CoordinatorFunctionalSpec.html#a6.8.2._coord:formatTimeString_ts_String_format_EL_Function_since_Oozie_2.3.2

Cheers,
Nam

On Mon, Jul 13, 2015 at 8:54 PM, Micah Whitacre <mk...@gmail.com> wrote:

> I have a situation where I have a Hive table that is time partitioned by
> individual fields (e.g. time_year, time_month, time_day, time_hour).  I
> want to regularly run Hive queries over this table for data in the last 24
> hours (or X days or X months).  To improve performance and make use of
> these partitions to cut out data I'm trying to figure out how to calculate
> and parse the components of a time value based on the nominal time.
>
> To calculate the time I want I know I can do the following:
> ${coord:dateOffset(coord:nominalTime(), -1, 'DAY')}
>
> So now what I'm trying to figure out is how to parse the individual year,
> month, day, hour components out of that calculate time similar to how you
> can have the ${YEAR}, ${MONTH}, etc values when composing a dataset. Then
> I'll be able to substitute those values into a Hive query like
>
> >> select * from table foo with time_year >= ${CALCULATED_YEAR}...;
>
> Can this be done using existing EL Functions? I'd prefer not to have to
> write my own.
>
> Micah
>