Posted to user@hive.apache.org by Vijay <te...@gmail.com> on 2009/12/20 03:29:30 UTC

Variable support in Amazon's Elastic MapReduce version of Hive

The Amazon Elastic MapReduce version of Hive seems to have a nice feature
called "Variables." Basically, you can define a variable on the command line
when invoking hive with -d DT=2009-12-09 and then refer to it as ${DT}
within Hive queries. This could be extremely useful. I can't seem to
find this feature even on trunk. Is this feature currently anywhere on the
roadmap?
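
For illustration only, here is a minimal Python sketch of the behavior
described above: collect NAME=VALUE pairs passed with -d and replace each
${NAME} in the query text. The function names (parse_definitions,
substitute) and the logs table are hypothetical, and failing on an
undefined variable is an assumption; this is not the Elastic MapReduce
implementation.

```python
import re

# Hypothetical helper names; this is not Hive or Elastic MapReduce code.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def parse_definitions(args):
    """Collect the NAME=VALUE pair that follows each -d flag."""
    variables = {}
    i = 0
    while i < len(args):
        if args[i] == "-d" and i + 1 < len(args):
            name, sep, value = args[i + 1].partition("=")
            if not sep or not name:
                raise ValueError("expected NAME=VALUE after -d, got %r" % args[i + 1])
            variables[name] = value
            i += 2
        else:
            i += 1
    return variables


def substitute(query, variables):
    """Replace each ${NAME} with its value; fail on undefined names (an assumption)."""
    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError("undefined variable: " + name)
        return variables[name]
    return _VAR_PATTERN.sub(replace, query)


variables = parse_definitions(["-d", "DT=2009-12-09"])
query = substitute("SELECT * FROM logs WHERE dt = '${DT}'", variables)
print(query)  # SELECT * FROM logs WHERE dt = '2009-12-09'
```

Whether an undefined variable should raise an error or be left untouched
is a design choice the thread does not settle.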

Re: Variable support in Amazon's Elastic MapReduce version of Hive

Posted by Edward Capriolo <ed...@gmail.com>.
I do not think this would overload the QueryParser, if the feature is useful.
Users can already specify options with SET and -hiveconf that affect
how a query runs, like -hiveconf mapred.map.tasks=5, so this is not
really much different. Also, I think we want to keep as much logic out
of the CLI as possible. I added something to do escapes a while back
and we really wanted it in the QueryProcessor.

I agree that programmatic systems probably have less need for this,
but theoretically another component COULD use it, so why not move it
upstream? It has T-SQL-like implications.


On Wed, Dec 30, 2009 at 3:03 PM, Vijay <te...@gmail.com> wrote:
> Thanks guys!
>
> When I think about it, it may be good enough to do this at the CLI level as
> that is probably the most common use case for this (in most of the other
> "API" style modes the apps can dynamically generate queries as they need).
> That way the parser does not have to be overloaded with too many assumptions
> around this concept.
>
> We should take a look at current "Parameter Substitution" feature in Pig. It
> seems pretty comprehensive. I'm not familiar enough with the code to even
> venture a guess as to how much of that code would be reusable.
>
> On Wed, Dec 30, 2009 at 7:31 AM, Edward Capriolo <ed...@gmail.com>
> wrote:
>>
>> I see two ways to do this. we can do the variable substitution at the
>> CLI level. Or we can do this at the query processor level.
>>
>> In each case the variables would be set into the SessionState and the
>> respective component could do the substitution.
>>
>> I think having the query processor handle this would be better.
>>
>> If we don't hear back in a few I will gladly do this as I can leverage
>> this as well.
>>
>> Edward
>> On Tue, Dec 29, 2009 at 8:57 PM, Zheng Shao <zs...@gmail.com> wrote:
>> > Hi Vijay,
>> >
>> > I sent out an inquiry to the guys at aws on 12/21. There is no reply
>> > yet. It might be that people are on vacation.
>> > Let's wait a bit to see if they can contribute that back to open-source.
>> >
>> > Zheng
>> >
>> > On Tue, Dec 29, 2009 at 5:10 PM, Vijay <te...@gmail.com> wrote:
>> >> Sorry to bump the thread again. I thought this was lost during the
>> >> holidays.
>> >> Anybody have any ideas about this?
>> >>
>> >> On Sat, Dec 19, 2009 at 6:29 PM, Vijay <te...@gmail.com> wrote:
>> >>>
>> >>> Amazon Elastic MapReduce version of Hive seems to have a nice feature
>> >>> called "Variables." Basically you can define a variable via
>> >>> command-line
>> >>> while invoking hive with -d DT=2009-12-09 and then refer to the
>> >>> variable via
>> >>> ${DT} within the hive queries. This could be extremely useful. I can't
>> >>> seem
>> >>> to find this feature even on trunk. Is this feature currently anywhere
>> >>> in
>> >>> the roadmap?
>> >>
>> >>
>> >
>> >
>> >
>> > --
>> > Yours,
>> > Zheng
>> >
>
>

Re: Variable support in Amazon's Elastic MapReduce version of Hive

Posted by Thejas Nair <te...@yahoo-inc.com>.
The parameter substitution in Pig is done using a query pre-processor. This
code is mostly independent of the rest of the Pig code, so it can be understood
in isolation. It uses JavaCC.
The code is in the package org.apache.pig.tools.parameters
(http://svn.apache.org/viewvc/hadoop/pig/trunk/src/org/apache/pig/tools/parameters/)
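
To give a rough idea of what a standalone pre-processing pass looks like,
here is a small Python sketch. It is only loosely modeled on Pig's
documented $name parameters and %default directive; the preprocess
function and its behavior are assumptions for illustration, not a port of
the JavaCC-based code in org.apache.pig.tools.parameters.

```python
import re

# Matches $name style parameters (a simplification chosen for this sketch).
_PARAM = re.compile(r"\$([A-Za-z_]\w*)")


def preprocess(script, cli_params):
    """Return the script text with $name parameters expanded.

    Lines of the form '%default name value' supply a fallback value that a
    command-line parameter of the same name overrides. Directive lines are
    dropped from the output. An undefined parameter raises KeyError.
    """
    params = dict(cli_params)
    output = []
    for line in script.splitlines():
        stripped = line.strip()
        if stripped.startswith("%default "):
            _, name, value = stripped.split(None, 2)
            params.setdefault(name, value)
            continue
        output.append(_PARAM.sub(lambda m: params[m.group(1)], line))
    return "\n".join(output)


script = "%default dt 2009-12-01\nA = LOAD 'logs/$dt';"
print(preprocess(script, {}))
print(preprocess(script, {"dt": "2009-12-09"}))
```

Because the pass only maps text to text, it can run before the real parser
ever sees the script, which is what makes it easy to read in isolation.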

-Thejas

On 12/30/09 12:03 PM, "Vijay" <te...@gmail.com> wrote:

> Thanks guys!
> 
> When I think about it, it may be good enough to do this at the CLI level as
> that is probably the most common use case for this (in most of the other
> "API" style modes the apps can dynamically generate queries as they need).
> That way the parser does not have to be overloaded with too many assumptions
> around this concept.
> 
> We should take a look at current "Parameter
> Substitution<http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#Parameter+Substitution>"
> feature in Pig. It seems pretty comprehensive. I'm not familiar enough with
> the code to even venture a guess as to how much of that code would be
> reusable.
> 
> On Wed, Dec 30, 2009 at 7:31 AM, Edward Capriolo <ed...@gmail.com> wrote:
> 
>> I see two ways to do this. we can do the variable substitution at the
>> CLI level. Or we can do this at the query processor level.
>> 
>> In each case the variables would be set into the SessionState and the
>> respective component could do the substitution.
>> 
>> I think having the query processor handle this would be better.
>> 
>> If we don't hear back in a few I will gladly do this as I can leverage
>> this as well.
>> 
>> Edward
>> On Tue, Dec 29, 2009 at 8:57 PM, Zheng Shao <zs...@gmail.com> wrote:
>>> Hi Vijay,
>>> 
>>> I sent out an inquiry to the guys at aws on 12/21. There is no reply
>>> yet. It might be that people are on vacation.
>>> Let's wait a bit to see if they can contribute that back to open-source.
>>> 
>>> Zheng
>>> 
>>> On Tue, Dec 29, 2009 at 5:10 PM, Vijay <te...@gmail.com> wrote:
>>>> Sorry to bump the thread again. I thought this was lost during the
>> holidays.
>>>> Anybody have any ideas about this?
>>>> 
>>>> On Sat, Dec 19, 2009 at 6:29 PM, Vijay <te...@gmail.com> wrote:
>>>>> 
>>>>> Amazon Elastic MapReduce version of Hive seems to have a nice feature
>>>>> called "Variables." Basically you can define a variable via
>> command-line
>>>>> while invoking hive with -d DT=2009-12-09 and then refer to the
>> variable via
>>>>> ${DT} within the hive queries. This could be extremely useful. I can't
>> seem
>>>>> to find this feature even on trunk. Is this feature currently anywhere
>> in
>>>>> the roadmap?
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Yours,
>>> Zheng
>>> 
>> 


Re: Variable support in Amazon's Elastic MapReduce version of Hive

Posted by Vijay <te...@gmail.com>.
Thanks guys!

When I think about it, it may be good enough to do this at the CLI level as
that is probably the most common use case for this (in most of the other
"API" style modes the apps can dynamically generate queries as they need).
That way the parser does not have to be overloaded with too many assumptions
around this concept.

We should take a look at the current "Parameter
Substitution<http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#Parameter+Substitution>"
feature in Pig. It seems pretty comprehensive. I'm not familiar enough with
the code to even venture a guess as to how much of that code would be
reusable.

On Wed, Dec 30, 2009 at 7:31 AM, Edward Capriolo <ed...@gmail.com> wrote:

> I see two ways to do this. we can do the variable substitution at the
> CLI level. Or we can do this at the query processor level.
>
> In each case the variables would be set into the SessionState and the
> respective component could do the substitution.
>
> I think having the query processor handle this would be better.
>
> If we don't hear back in a few I will gladly do this as I can leverage
> this as well.
>
> Edward
> On Tue, Dec 29, 2009 at 8:57 PM, Zheng Shao <zs...@gmail.com> wrote:
> > Hi Vijay,
> >
> > I sent out an inquiry to the guys at aws on 12/21. There is no reply
> > yet. It might be that people are on vacation.
> > Let's wait a bit to see if they can contribute that back to open-source.
> >
> > Zheng
> >
> > On Tue, Dec 29, 2009 at 5:10 PM, Vijay <te...@gmail.com> wrote:
> >> Sorry to bump the thread again. I thought this was lost during the
> holidays.
> >> Anybody have any ideas about this?
> >>
> >> On Sat, Dec 19, 2009 at 6:29 PM, Vijay <te...@gmail.com> wrote:
> >>>
> >>> Amazon Elastic MapReduce version of Hive seems to have a nice feature
> >>> called "Variables." Basically you can define a variable via
> command-line
> >>> while invoking hive with -d DT=2009-12-09 and then refer to the
> variable via
> >>> ${DT} within the hive queries. This could be extremely useful. I can't
> seem
> >>> to find this feature even on trunk. Is this feature currently anywhere
> in
> >>> the roadmap?
> >>
> >>
> >
> >
> >
> > --
> > Yours,
> > Zheng
> >
>

Re: Variable support in Amazon's Elastic MapReduce version of Hive

Posted by Edward Capriolo <ed...@gmail.com>.
I see two ways to do this. We can do the variable substitution at the
CLI level, or we can do it at the query processor level.

In each case the variables would be set into the SessionState and the
respective component could do the substitution.

I think having the query processor handle this would be better.
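
A rough Python sketch of the two options, to make the difference
concrete. Everything here is hypothetical: SessionState and QueryProcessor
are simplified stand-ins written for this example, not the Hive classes,
and leaving unknown ${NAME} references untouched is an assumption.

```python
import re

_VAR = re.compile(r"\$\{(\w+)\}")


class SessionState:
    """Hypothetical stand-in for a per-session variable store."""

    def __init__(self):
        self.variables = {}


def expand(text, session):
    """Replace ${NAME} from the session's variables; leave unknown names as-is."""
    return _VAR.sub(lambda m: session.variables.get(m.group(1), m.group(0)), text)


class QueryProcessor:
    """Hypothetical processor that just records the text it was asked to run."""

    def __init__(self, session, substitute=False):
        self.session = session
        self.substitute = substitute
        self.executed = []

    def run(self, query):
        if self.substitute:
            # Processor-level option: every caller gets substitution.
            query = expand(query, self.session)
        self.executed.append(query)
        return query


def cli_run(line, session, processor):
    """CLI-level option: expand before handing the text to the processor."""
    return processor.run(expand(line, session))


session = SessionState()
session.variables["DT"] = "2009-12-09"
q = "SELECT count(1) FROM logs WHERE dt = '${DT}'"

plain = QueryProcessor(session)
via_cli = cli_run(q, session, plain)   # expanded by the CLI
direct = plain.run(q)                  # a non-CLI caller gets no expansion

aware = QueryProcessor(session, substitute=True)
via_processor = aware.run(q)           # expanded for any caller
```

In the sketch, the only behavioral difference is who benefits: with the
CLI-level option, a client that calls the processor directly still sends
the literal ${DT}.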

If we don't hear back in a few I will gladly do this as I can leverage
this as well.

Edward
On Tue, Dec 29, 2009 at 8:57 PM, Zheng Shao <zs...@gmail.com> wrote:
> Hi Vijay,
>
> I sent out an inquiry to the guys at aws on 12/21. There is no reply
> yet. It might be that people are on vacation.
> Let's wait a bit to see if they can contribute that back to open-source.
>
> Zheng
>
> On Tue, Dec 29, 2009 at 5:10 PM, Vijay <te...@gmail.com> wrote:
>> Sorry to bump the thread again. I thought this was lost during the holidays.
>> Anybody have any ideas about this?
>>
>> On Sat, Dec 19, 2009 at 6:29 PM, Vijay <te...@gmail.com> wrote:
>>>
>>> Amazon Elastic MapReduce version of Hive seems to have a nice feature
>>> called "Variables." Basically you can define a variable via command-line
>>> while invoking hive with -d DT=2009-12-09 and then refer to the variable via
>>> ${DT} within the hive queries. This could be extremely useful. I can't seem
>>> to find this feature even on trunk. Is this feature currently anywhere in
>>> the roadmap?
>>
>>
>
>
>
> --
> Yours,
> Zheng
>

Re: Variable support in Amazon's Elastic MapReduce version of Hive

Posted by Zheng Shao <zs...@gmail.com>.
Hi Vijay,

I sent out an inquiry to the guys at AWS on 12/21. There has been no reply
yet. It might be that people are on vacation.
Let's wait a bit to see if they can contribute that back to open-source.

Zheng

On Tue, Dec 29, 2009 at 5:10 PM, Vijay <te...@gmail.com> wrote:
> Sorry to bump the thread again. I thought this was lost during the holidays.
> Anybody have any ideas about this?
>
> On Sat, Dec 19, 2009 at 6:29 PM, Vijay <te...@gmail.com> wrote:
>>
>> Amazon Elastic MapReduce version of Hive seems to have a nice feature
>> called "Variables." Basically you can define a variable via command-line
>> while invoking hive with -d DT=2009-12-09 and then refer to the variable via
>> ${DT} within the hive queries. This could be extremely useful. I can't seem
>> to find this feature even on trunk. Is this feature currently anywhere in
>> the roadmap?
>
>



-- 
Yours,
Zheng

Re: Variable support in Amazon's Elastic MapReduce version of Hive

Posted by Vijay <te...@gmail.com>.
Sorry to bump the thread again. I thought this was lost during the holidays.
Anybody have any ideas about this?

On Sat, Dec 19, 2009 at 6:29 PM, Vijay <te...@gmail.com> wrote:

> Amazon Elastic MapReduce version of Hive seems to have a nice feature
> called "Variables." Basically you can define a variable via command-line
> while invoking hive with -d DT=2009-12-09 and then refer to the variable via
> ${DT} within the hive queries. This could be extremely useful. I can't seem
> to find this feature even on trunk. Is this feature currently anywhere in
> the roadmap?
>