You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by prakash sejwani <pr...@gmail.com> on 2010/03/09 09:13:21 UTC

problem with IS NOT NULL operator in hive

Hi all,
      I have a query below

FROM (
  SELECT h.*
  FROM (
      -- Pull from the access_log
      SELECT ip,
        -- Reformat the time from the access log
        time, dt,
        --method, resource, protocol, status, length, referer, agent,
        -- Extract the product_id for the hit from the URL
        cast( regexp_extract(resource,'\&q=([^\&]+)', 1) AS STRING) AS
product_name
      FROM a_log
    ) h
)hit
    -- Insert the hit data into a seperate search table
INSERT OVERWRITE TABLE search
  SELECT ip, time, dt,
    product_name
WHERE product_name IS NOT NULL;


it suppose to populate the search table with only if product_name is present
but i get all of it..

any help would be appreciated

thanks
prakash sejwani
econify infotech
mumbai

Re: problem with IS NOT NULL operator in hive

Posted by Zheng Shao <zs...@gmail.com>.
WHERE product_name IS NOT NULL AND product_name <> ''

On Tue, Mar 9, 2010 at 12:45 AM, prakash sejwani
<pr...@gmail.com> wrote:
> yes right can you give me a tip how to exclude blank values
>
> On Tue, Mar 9, 2010 at 2:13 PM, Zheng Shao <zs...@gmail.com> wrote:
>>
>> So I guess you didn't exclude the Blank ones?
>>
>> On Tue, Mar 9, 2010 at 12:41 AM, prakash sejwani
>> <pr...@gmail.com> wrote:
>> > yes, regexp_extract return NULL or Blank
>> >
>> > On Tue, Mar 9, 2010 at 2:05 PM, Zheng Shao <zs...@gmail.com> wrote:
>> >>
>> >> What do you mean by "product_name" is "present"?
>> >> If it is not present, does the regexp_extract return NULL?
>> >>
>> >> Zheng
>> >>
>> >> On Tue, Mar 9, 2010 at 12:13 AM, prakash sejwani
>> >> <pr...@gmail.com> wrote:
>> >> > Hi all,
>> >> >       I have a query below
>> >> >
>> >> > FROM (
>> >> >   SELECT h.*
>> >> >   FROM (
>> >> >       -- Pull from the access_log
>> >> >       SELECT ip,
>> >> >         -- Reformat the time from the access log
>> >> >         time, dt,
>> >> >         --method, resource, protocol, status, length, referer, agent,
>> >> >         -- Extract the product_id for the hit from the URL
>> >> >         cast( regexp_extract(resource,'\&q=([^\&]+)', 1) AS STRING)
>> >> > AS
>> >> > product_name
>> >> >       FROM a_log
>> >> >     ) h
>> >> > )hit
>> >> >     -- Insert the hit data into a seperate search table
>> >> > INSERT OVERWRITE TABLE search
>> >> >   SELECT ip, time, dt,
>> >> >     product_name
>> >> > WHERE product_name IS NOT NULL;
>> >> >
>> >> >
>> >> > it suppose to populate the search table with only if product_name is
>> >> > present
>> >> > but i get all of it..
>> >> >
>> >> > any help would be appreciated
>> >> >
>> >> > thanks
>> >> > prakash sejwani
>> >> > econify infotech
>> >> > mumbai
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Yours,
>> >> Zheng
>> >
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>
>



-- 
Yours,
Zheng

Re: problem with IS NOT NULL operator in hive

Posted by prakash sejwani <pr...@gmail.com>.
Thanks it worked for me

On Tue, Mar 9, 2010 at 2:23 PM, prakash sejwani <pr...@gmail.com>wrote:

> I am sending sample file to populate the a_log in hive and check if its
> needed by you
>
> On Tue, Mar 9, 2010 at 2:15 PM, prakash sejwani <pr...@gmail.com>wrote:
>
>> yes right can you give me a tip how to exclude blank values
>>
>>
>> On Tue, Mar 9, 2010 at 2:13 PM, Zheng Shao <zs...@gmail.com> wrote:
>>
>>> So I guess you didn't exclude the Blank ones?
>>>
>>> On Tue, Mar 9, 2010 at 12:41 AM, prakash sejwani
>>> <pr...@gmail.com> wrote:
>>> > yes, regexp_extract return NULL or Blank
>>> >
>>> > On Tue, Mar 9, 2010 at 2:05 PM, Zheng Shao <zs...@gmail.com> wrote:
>>> >>
>>> >> What do you mean by "product_name" is "present"?
>>> >> If it is not present, does the regexp_extract return NULL?
>>> >>
>>> >> Zheng
>>> >>
>>> >> On Tue, Mar 9, 2010 at 12:13 AM, prakash sejwani
>>> >> <pr...@gmail.com> wrote:
>>> >> > Hi all,
>>> >> >       I have a query below
>>> >> >
>>> >> > FROM (
>>> >> >   SELECT h.*
>>> >> >   FROM (
>>> >> >       -- Pull from the access_log
>>> >> >       SELECT ip,
>>> >> >         -- Reformat the time from the access log
>>> >> >         time, dt,
>>> >> >         --method, resource, protocol, status, length, referer,
>>> agent,
>>> >> >         -- Extract the product_id for the hit from the URL
>>> >> >         cast( regexp_extract(resource,'\&q=([^\&]+)', 1) AS STRING)
>>> AS
>>> >> > product_name
>>> >> >       FROM a_log
>>> >> >     ) h
>>> >> > )hit
>>> >> >     -- Insert the hit data into a seperate search table
>>> >> > INSERT OVERWRITE TABLE search
>>> >> >   SELECT ip, time, dt,
>>> >> >     product_name
>>> >> > WHERE product_name IS NOT NULL;
>>> >> >
>>> >> >
>>> >> > it suppose to populate the search table with only if product_name is
>>> >> > present
>>> >> > but i get all of it..
>>> >> >
>>> >> > any help would be appreciated
>>> >> >
>>> >> > thanks
>>> >> > prakash sejwani
>>> >> > econify infotech
>>> >> > mumbai
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Yours,
>>> >> Zheng
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Yours,
>>> Zheng
>>>
>>
>>
>

Re: problem with IS NOT NULL operator in hive

Posted by prakash sejwani <pr...@gmail.com>.
I am sending sample file to populate the a_log in hive and check if its
needed by you

On Tue, Mar 9, 2010 at 2:15 PM, prakash sejwani <pr...@gmail.com>wrote:

> yes right can you give me a tip how to exclude blank values
>
>
> On Tue, Mar 9, 2010 at 2:13 PM, Zheng Shao <zs...@gmail.com> wrote:
>
>> So I guess you didn't exclude the Blank ones?
>>
>> On Tue, Mar 9, 2010 at 12:41 AM, prakash sejwani
>> <pr...@gmail.com> wrote:
>> > yes, regexp_extract return NULL or Blank
>> >
>> > On Tue, Mar 9, 2010 at 2:05 PM, Zheng Shao <zs...@gmail.com> wrote:
>> >>
>> >> What do you mean by "product_name" is "present"?
>> >> If it is not present, does the regexp_extract return NULL?
>> >>
>> >> Zheng
>> >>
>> >> On Tue, Mar 9, 2010 at 12:13 AM, prakash sejwani
>> >> <pr...@gmail.com> wrote:
>> >> > Hi all,
>> >> >       I have a query below
>> >> >
>> >> > FROM (
>> >> >   SELECT h.*
>> >> >   FROM (
>> >> >       -- Pull from the access_log
>> >> >       SELECT ip,
>> >> >         -- Reformat the time from the access log
>> >> >         time, dt,
>> >> >         --method, resource, protocol, status, length, referer, agent,
>> >> >         -- Extract the product_id for the hit from the URL
>> >> >         cast( regexp_extract(resource,'\&q=([^\&]+)', 1) AS STRING)
>> AS
>> >> > product_name
>> >> >       FROM a_log
>> >> >     ) h
>> >> > )hit
>> >> >     -- Insert the hit data into a seperate search table
>> >> > INSERT OVERWRITE TABLE search
>> >> >   SELECT ip, time, dt,
>> >> >     product_name
>> >> > WHERE product_name IS NOT NULL;
>> >> >
>> >> >
>> >> > it suppose to populate the search table with only if product_name is
>> >> > present
>> >> > but i get all of it..
>> >> >
>> >> > any help would be appreciated
>> >> >
>> >> > thanks
>> >> > prakash sejwani
>> >> > econify infotech
>> >> > mumbai
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Yours,
>> >> Zheng
>> >
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>>
>
>

Re: problem with IS NOT NULL operator in hive

Posted by prakash sejwani <pr...@gmail.com>.
yes right can you give me a tip how to exclude blank values

On Tue, Mar 9, 2010 at 2:13 PM, Zheng Shao <zs...@gmail.com> wrote:

> So I guess you didn't exclude the Blank ones?
>
> On Tue, Mar 9, 2010 at 12:41 AM, prakash sejwani
> <pr...@gmail.com> wrote:
> > yes, regexp_extract return NULL or Blank
> >
> > On Tue, Mar 9, 2010 at 2:05 PM, Zheng Shao <zs...@gmail.com> wrote:
> >>
> >> What do you mean by "product_name" is "present"?
> >> If it is not present, does the regexp_extract return NULL?
> >>
> >> Zheng
> >>
> >> On Tue, Mar 9, 2010 at 12:13 AM, prakash sejwani
> >> <pr...@gmail.com> wrote:
> >> > Hi all,
> >> >       I have a query below
> >> >
> >> > FROM (
> >> >   SELECT h.*
> >> >   FROM (
> >> >       -- Pull from the access_log
> >> >       SELECT ip,
> >> >         -- Reformat the time from the access log
> >> >         time, dt,
> >> >         --method, resource, protocol, status, length, referer, agent,
> >> >         -- Extract the product_id for the hit from the URL
> >> >         cast( regexp_extract(resource,'\&q=([^\&]+)', 1) AS STRING) AS
> >> > product_name
> >> >       FROM a_log
> >> >     ) h
> >> > )hit
> >> >     -- Insert the hit data into a seperate search table
> >> > INSERT OVERWRITE TABLE search
> >> >   SELECT ip, time, dt,
> >> >     product_name
> >> > WHERE product_name IS NOT NULL;
> >> >
> >> >
> >> > it suppose to populate the search table with only if product_name is
> >> > present
> >> > but i get all of it..
> >> >
> >> > any help would be appreciated
> >> >
> >> > thanks
> >> > prakash sejwani
> >> > econify infotech
> >> > mumbai
> >> >
> >>
> >>
> >>
> >> --
> >> Yours,
> >> Zheng
> >
> >
>
>
>
> --
> Yours,
> Zheng
>

Re: problem with IS NOT NULL operator in hive

Posted by Zheng Shao <zs...@gmail.com>.
So I guess you didn't exclude the Blank ones?

On Tue, Mar 9, 2010 at 12:41 AM, prakash sejwani
<pr...@gmail.com> wrote:
> yes, regexp_extract return NULL or Blank
>
> On Tue, Mar 9, 2010 at 2:05 PM, Zheng Shao <zs...@gmail.com> wrote:
>>
>> What do you mean by "product_name" is "present"?
>> If it is not present, does the regexp_extract return NULL?
>>
>> Zheng
>>
>> On Tue, Mar 9, 2010 at 12:13 AM, prakash sejwani
>> <pr...@gmail.com> wrote:
>> > Hi all,
>> >       I have a query below
>> >
>> > FROM (
>> >   SELECT h.*
>> >   FROM (
>> >       -- Pull from the access_log
>> >       SELECT ip,
>> >         -- Reformat the time from the access log
>> >         time, dt,
>> >         --method, resource, protocol, status, length, referer, agent,
>> >         -- Extract the product_id for the hit from the URL
>> >         cast( regexp_extract(resource,'\&q=([^\&]+)', 1) AS STRING) AS
>> > product_name
>> >       FROM a_log
>> >     ) h
>> > )hit
>> >     -- Insert the hit data into a seperate search table
>> > INSERT OVERWRITE TABLE search
>> >   SELECT ip, time, dt,
>> >     product_name
>> > WHERE product_name IS NOT NULL;
>> >
>> >
>> > it suppose to populate the search table with only if product_name is
>> > present
>> > but i get all of it..
>> >
>> > any help would be appreciated
>> >
>> > thanks
>> > prakash sejwani
>> > econify infotech
>> > mumbai
>> >
>>
>>
>>
>> --
>> Yours,
>> Zheng
>
>



-- 
Yours,
Zheng

Re: problem with IS NOT NULL operator in hive

Posted by prakash sejwani <pr...@gmail.com>.
yes, regexp_extract return NULL or Blank

On Tue, Mar 9, 2010 at 2:05 PM, Zheng Shao <zs...@gmail.com> wrote:

> What do you mean by "product_name" is "present"?
> If it is not present, does the regexp_extract return NULL?
>
> Zheng
>
> On Tue, Mar 9, 2010 at 12:13 AM, prakash sejwani
> <pr...@gmail.com> wrote:
> > Hi all,
> >       I have a query below
> >
> > FROM (
> >   SELECT h.*
> >   FROM (
> >       -- Pull from the access_log
> >       SELECT ip,
> >         -- Reformat the time from the access log
> >         time, dt,
> >         --method, resource, protocol, status, length, referer, agent,
> >         -- Extract the product_id for the hit from the URL
> >         cast( regexp_extract(resource,'\&q=([^\&]+)', 1) AS STRING) AS
> > product_name
> >       FROM a_log
> >     ) h
> > )hit
> >     -- Insert the hit data into a seperate search table
> > INSERT OVERWRITE TABLE search
> >   SELECT ip, time, dt,
> >     product_name
> > WHERE product_name IS NOT NULL;
> >
> >
> > it suppose to populate the search table with only if product_name is
> present
> > but i get all of it..
> >
> > any help would be appreciated
> >
> > thanks
> > prakash sejwani
> > econify infotech
> > mumbai
> >
>
>
>
> --
> Yours,
> Zheng
>

Re: problem with IS NOT NULL operator in hive

Posted by Zheng Shao <zs...@gmail.com>.
What do you mean by "product_name" is "present"?
If it is not present, does the regexp_extract return NULL?

Zheng

On Tue, Mar 9, 2010 at 12:13 AM, prakash sejwani
<pr...@gmail.com> wrote:
> Hi all,
>       I have a query below
>
> FROM (
>   SELECT h.*
>   FROM (
>       -- Pull from the access_log
>       SELECT ip,
>         -- Reformat the time from the access log
>         time, dt,
>         --method, resource, protocol, status, length, referer, agent,
>         -- Extract the product_id for the hit from the URL
>         cast( regexp_extract(resource,'\&q=([^\&]+)', 1) AS STRING) AS
> product_name
>       FROM a_log
>     ) h
> )hit
>     -- Insert the hit data into a seperate search table
> INSERT OVERWRITE TABLE search
>   SELECT ip, time, dt,
>     product_name
> WHERE product_name IS NOT NULL;
>
>
> it suppose to populate the search table with only if product_name is present
> but i get all of it..
>
> any help would be appreciated
>
> thanks
> prakash sejwani
> econify infotech
> mumbai
>



-- 
Yours,
Zheng