You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Matt Tucker <ma...@gmail.com> on 2013/03/06 22:13:11 UTC

Variable Substitution

Using CDH3u3 (Hive 0.7.1), it appears that variable substitution becomes
disabled when I use a variable in a query that hasn't been defined.

For instance, using the following script:

set var2=2013-02-01;
set var3=2013-02-10;

SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "${hiveconf:var1}" OR
(clndr_dt >= "${hiveconf:var2}" AND clndr_dt <= "${hiveconf:var3}") LIMIT 1;


I would expect the results job configuration to list the hive.query.string
as:

SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "${hiveconf:var1}" OR
(clndr_dt >= "2013-02-01" AND clndr_dt <= "2013-02-10") LIMIT 1;


(or with ${hiveconf:var1} removed, leaving an empty string).

Instead, it prints the query as-is (with the variable placeholders).  If I
set var1=2012-01-01, it properly substitutes the variables, but it only
returns the '2012-01-01' record (not the 2013 records).

SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "2012-01-01" OR (
clndr_dt >= "2013-02-01" AND clndr_dt <= "2013-02-10" ) LIMIT 1

I was originally planning to use this for partition pruning, but it doesn't
appear to be the cause as the calendar table is not partitioned.

Is there something that I've overlooked?

Thanks!

Re: Avro Backed Hive tables

Posted by David Morel <dm...@gmail.com>.
On 7 Mar 2013, at 2:43, Murtaza Doctor wrote:

> Folks,
>
> Wanted to get some help or feedback from the community on this one:

Hello,

in that case it is advisable to start a new thread, and not 'reply-to' 
when you compose your email :-)

Have a nice day

David

Avro Backed Hive tables

Posted by Murtaza Doctor <mu...@richrelevance.com>.
Folks,

Wanted to get some help or feedback from the community on this one:

We are trying to create a table backed by avro schema:
Hive Version: 0.10

This is the syntax we are using to create the table. As you can notice the avro schema is defined inline.
Question:-  If you look at the avro schema we have a union of two records and when we issue the command it successfully creates the table with only the first record PageView and the columns searchTerms & filterCategories. It completely ignored the second union record.
It is probably where hive is not supporting avro union records as part of the CREATE table statement. Any clues on this one would help.

CREATE TABLE foo
ROW FORMAT
SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{
      "type" : "record",
      "name" : "PageView",
      "fields" : [ {
        "name" : "searchTerms",
        "type" : {
          "type" : "array",
          "items" : "string"
        }
      }, {
        "name" : "filterCategories",
        "type" : {
          "type" : "array",
          "items" : "string"
        }
      } ]
    }, {
      "type" : "record",
      "name" : "CategoryView",
      "fields" : [ {
        "name" : "categoryId",
        "type" : "string"
      }, {
        "name" : "refinements",
        "type" : {
          "type" : "array",
          "items" : {
            "type" : "record",
            "name" : "Refinement",
            "fields" : [ {
              "name" : "attribute",
              "type" : "string"
            }, {
              "name" : "value",
              "type" : "string"
            } ]
          }
        },
        "default" : [ ]
      } ]
}');

Thanks,
murtaza


Re: Variable Substitution

Posted by Matt Tucker <ma...@gmail.com>.
I'm fine with the variable placeholder not being removed in cases where the variable is not defined (until I change my mind). When I define var2 and var3, though, their placeholders aren't swapped for their values.

My reasoning for this was that I'm moving from one execution script that defines var1, to a newer script that defines var2 and var3.  The goal for this was that I could use the same hive script with either execution script until we've fully migrated.

I don't disagree with the placeholders remaining if they're not defined, I'm just unsure that having 1 variable reference that's not defined should prevent the others from working.


On Mar 6, 2013, at 4:41 PM, Edward Capriolo <ed...@gmail.com> wrote:

> It was done like this in hive because that is what hadoops variable substitution does, namely if it does not understand the variable it does not replace it.
> 
> On Wed, Mar 6, 2013 at 4:30 PM, Dean Wampler <de...@thinkbiganalytics.com> wrote:
>> Even newer versions of Hive do this. Any reason you don't want to provide a definition for all of them? You could argue that an undefined variable is a bug and leaving the literal text in place makes it easier to notice. Although, Unix shells would insert an empty string, so never mind ;)
>> 
>> On Wed, Mar 6, 2013 at 3:13 PM, Matt Tucker <ma...@gmail.com> wrote:
>>> Using CDH3u3 (Hive 0.7.1), it appears that variable substitution becomes disabled when I use a variable in a query that hasn't been defined.
>>> 
>>> For instance, using the following script:
>>> 
>>> set var2=2013-02-01;
>>> set var3=2013-02-10;
>>> 
>>> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "${hiveconf:var1}" OR (clndr_dt >= "${hiveconf:var2}" AND clndr_dt <= "${hiveconf:var3}") LIMIT 1;
>>> 
>>> I would expect the results job configuration to list the hive.query.string as:
>>> 
>>> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "${hiveconf:var1}" OR (clndr_dt >= "2013-02-01" AND clndr_dt <= "2013-02-10") LIMIT 1;
>>> 
>>> (or with ${hiveconf:var1} removed, leaving an empty string).
>>> 
>>> Instead, it prints the query as-is (with the variable placeholders).  If I set var1=2012-01-01, it properly substitutes the variables, but it only returns the '2012-01-01' record (not the 2013 records).
>>> 
>>> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "2012-01-01" OR ( clndr_dt >= "2013-02-01" AND clndr_dt <= "2013-02-10" ) LIMIT 1
>>> 
>>> I was originally planning to use this for partition pruning, but it doesn't appear to be the cause as the calendar table is not partitioned.
>>> 
>>> Is there something that I've overlooked?
>>> 
>>> Thanks!
>> 
>> 
>> 
>> -- 
>> Dean Wampler, Ph.D.
>> thinkbiganalytics.com
>> +1-312-339-1330
> 

Re: Variable Substitution

Posted by Edward Capriolo <ed...@gmail.com>.
It was done like this in hive because that is what hadoops variable
substitution does, namely if it does not understand the variable it does
not replace it.

On Wed, Mar 6, 2013 at 4:30 PM, Dean Wampler <
dean.wampler@thinkbiganalytics.com> wrote:

> Even newer versions of Hive do this. Any reason you don't want to provide
> a definition for all of them? You could argue that an undefined variable is
> a bug and leaving the literal text in place makes it easier to notice.
> Although, Unix shells would insert an empty string, so never mind ;)
>
> On Wed, Mar 6, 2013 at 3:13 PM, Matt Tucker <ma...@gmail.com> wrote:
>
>> Using CDH3u3 (Hive 0.7.1), it appears that variable substitution becomes
>> disabled when I use a variable in a query that hasn't been defined.
>>
>> For instance, using the following script:
>>
>> set var2=2013-02-01;
>> set var3=2013-02-10;
>>
>> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "${hiveconf:var1}" OR
>> (clndr_dt >= "${hiveconf:var2}" AND clndr_dt <= "${hiveconf:var3}") LIMIT 1;
>>
>>
>> I would expect the results job configuration to list the
>> hive.query.string as:
>>
>> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "${hiveconf:var1}" OR
>> (clndr_dt >= "2013-02-01" AND clndr_dt <= "2013-02-10") LIMIT 1;
>>
>>
>> (or with ${hiveconf:var1} removed, leaving an empty string).
>>
>> Instead, it prints the query as-is (with the variable placeholders).  If
>> I set var1=2012-01-01, it properly substitutes the variables, but it only
>> returns the '2012-01-01' record (not the 2013 records).
>>
>> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "2012-01-01" OR (
>> clndr_dt >= "2013-02-01" AND clndr_dt <= "2013-02-10" ) LIMIT 1
>>
>> I was originally planning to use this for partition pruning, but it
>> doesn't appear to be the cause as the calendar table is not partitioned.
>>
>> Is there something that I've overlooked?
>>
>> Thanks!
>>
>>
>
>
> --
> *Dean Wampler, Ph.D.*
> thinkbiganalytics.com
> +1-312-339-1330
>
>

Re: Variable Substitution

Posted by Dean Wampler <de...@thinkbiganalytics.com>.
Even newer versions of Hive do this. Any reason you don't want to provide a
definition for all of them? You could argue that an undefined variable is a
bug and leaving the literal text in place makes it easier to notice.
Although, Unix shells would insert an empty string, so never mind ;)

On Wed, Mar 6, 2013 at 3:13 PM, Matt Tucker <ma...@gmail.com> wrote:

> Using CDH3u3 (Hive 0.7.1), it appears that variable substitution becomes
> disabled when I use a variable in a query that hasn't been defined.
>
> For instance, using the following script:
>
> set var2=2013-02-01;
> set var3=2013-02-10;
>
> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "${hiveconf:var1}" OR
> (clndr_dt >= "${hiveconf:var2}" AND clndr_dt <= "${hiveconf:var3}") LIMIT 1;
>
>
> I would expect the results job configuration to list the hive.query.string
> as:
>
> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "${hiveconf:var1}" OR
> (clndr_dt >= "2013-02-01" AND clndr_dt <= "2013-02-10") LIMIT 1;
>
>
> (or with ${hiveconf:var1} removed, leaving an empty string).
>
> Instead, it prints the query as-is (with the variable placeholders).  If I
> set var1=2012-01-01, it properly substitutes the variables, but it only
> returns the '2012-01-01' record (not the 2013 records).
>
> SELECT clndr_dt FROM calendar WHERE clndr_dt LIKE "2012-01-01" OR (
> clndr_dt >= "2013-02-01" AND clndr_dt <= "2013-02-10" ) LIMIT 1
>
> I was originally planning to use this for partition pruning, but it
> doesn't appear to be the cause as the calendar table is not partitioned.
>
> Is there something that I've overlooked?
>
> Thanks!
>
>


-- 
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330