Posted to user@hive.apache.org by Cam Bazz <ca...@gmail.com> on 2011/02/10 07:03:55 UTC

Query returns some text instead of none

Hello,

I am running a query like this:

    insert overwrite table selection_hourly_clicks partition (date_hour = PARTNAME)
    select sel_sid, count(*) cc
    from (
      select split(parse_url(iv.referrer_url, 'PATH'), '_')[1] sel_sid
      from item_raw iv
      where iv.date_hour = 'PARTNAME'
        and iv.referrer_url is not null
        and substring(parse_url(iv.referrer_url, 'PATH'), 0, 8) == '/mypath/'
    ) s
    group by sel_sid

If the referrer URL looks like /mypath/blabla_10, I get 10, which is the
sel_sid, and then I count the rows for each sel_sid per hour.
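For readers following along, the per-partition computation can be approximated in Python. This is a hedged sketch, not Hive's implementation: urlparse stands in for parse_url(..., 'PATH'), the 8-character prefix comparison stands in for substring(..., 0, 8) (assuming Hive treats a start position of 0 like 1), and the helper names sel_sid_of and hourly_clicks as well as the example URLs are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional
from urllib.parse import urlparse

PREFIX = "/mypath/"  # the path prefix tested in the WHERE clause


def sel_sid_of(path: str) -> Optional[str]:
    """split(path, '_')[1]; None plays the role of Hive's NULL for a missing element."""
    parts = path.split("_")
    return parts[1] if len(parts) > 1 else None


def hourly_clicks(referrer_urls: Iterable[Optional[str]]) -> Counter:
    """Approximate one date_hour partition: sel_sid -> count(*)."""
    counts: Counter = Counter()
    for url in referrer_urls:
        if url is None:  # iv.referrer_url is not null
            continue
        path = urlparse(url).path  # stand-in for parse_url(url, 'PATH')
        if path[:8] != PREFIX:  # stand-in for substring(path, 0, 8) == '/mypath/'
            continue
        counts[sel_sid_of(path)] += 1  # group by sel_sid, count(*)
    return counts


urls = [
    "http://example.com/mypath/blabla_10",
    "http://example.com/mypath/other_10",
    "http://example.com/mypath/x_7",
    "http://example.com/elsewhere/y_3",  # dropped by the prefix check
    None,  # dropped by the null check
]
print(hourly_clicks(urls))  # Counter({'10': 2, '7': 1})
print(hourly_clicks([]))    # Counter(), the "finds nothing" case
```

A matching path with no underscore yields None, mirroring Hive's NULL for an out-of-range array index; such rows would presumably still be counted, under a NULL sel_sid.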

All is fine, and the query runs. But for some partitions it finds
nothing, which is also fine.

But when I look in HDFS, I see files like:

SEQ"org.apache.hadoop.io.BytesWritableorg.apache.hadoop.io.Text<unprintable binary bytes>

SEQ"org.apache.hadoop.io.BytesWritableorg.apache.hadoop.io.Text<unprintable binary bytes>

Those files are for partitions that have no counts, i.e. where the query
returns nothing.
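The readable part of these dumps looks like a Hadoop SequenceFile header. Assuming the layout written by Hadoop's SequenceFile.Writer (the magic bytes SEQ, a version byte, the key and value class names each prefixed by a VInt length, two compression flags, a 4-byte metadata count, then a 16-byte sync marker), the `"` after SEQ would be the length byte 0x22 = 34, which is exactly the length of org.apache.hadoop.io.BytesWritable. The sketch below parses such a header; SeqHeader and parse_header are hypothetical names, and version 6 with no compression or metadata is an assumption about these particular files.

```python
import io
import struct
from typing import NamedTuple


class SeqHeader(NamedTuple):
    version: int
    key_class: str
    value_class: str
    compressed: bool
    block_compressed: bool
    metadata_entries: int
    sync: bytes


def _read_short_string(buf: io.BytesIO) -> str:
    # Strings are a VInt length plus UTF-8 bytes; lengths 0..127 fit in
    # a single byte, which is the only case this sketch supports.
    n = buf.read(1)[0]
    if n > 127:
        raise ValueError("multi-byte VInt lengths are not handled here")
    return buf.read(n).decode("utf-8")


def parse_header(data: bytes) -> SeqHeader:
    buf = io.BytesIO(data)
    if buf.read(3) != b"SEQ":
        raise ValueError("not a SequenceFile")
    version = buf.read(1)[0]
    key_class = _read_short_string(buf)
    value_class = _read_short_string(buf)
    compressed, block_compressed = buf.read(2)  # two boolean bytes
    if compressed:
        _read_short_string(buf)  # codec class name, skipped in this sketch
    (entries,) = struct.unpack(">i", buf.read(4))  # big-endian metadata count
    if entries:
        raise ValueError("metadata entries are not handled here")
    sync = buf.read(16)
    if len(sync) != 16:
        raise ValueError("truncated header")
    return SeqHeader(version, key_class, value_class, bool(compressed),
                     bool(block_compressed), entries, sync)


# Example: the layout the empty-partition dumps appear to follow
# (placeholder sync bytes instead of real ones).
sample = (b"SEQ\x06" + bytes([34]) + b"org.apache.hadoop.io.BytesWritable"
          + bytes([25]) + b"org.apache.hadoop.io.Text"
          + b"\x00\x00" + b"\x00\x00\x00\x00" + bytes(range(16)))
print(parse_header(sample).key_class)  # org.apache.hadoop.io.BytesWritable
```

If this reading is right, the unprintable tail of each dump is mostly the per-file sync marker, which would explain why two empty files differ.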

When it returns something, it writes a file like:

SEQ"org.apache.hadoop.io.BytesWritableorg.apache.hadoop.io.Text<unprintable binary bytes>1515
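The trailing 1515 in the non-empty file is most likely a single row, sel_sid 151 and cc 5, with the two fields joined by Hive's default field delimiter Ctrl-A (\x01), which does not print. That reading is an assumption; the sketch below only shows how such a stored value would split. FIELD_DELIM, encode_row and decode_row are hypothetical names.

```python
from typing import Tuple

# Hive's default field delimiter for rows it serializes (Ctrl-A).
FIELD_DELIM = "\x01"


def encode_row(sel_sid: str, cc: int) -> str:
    """Join the two output columns the way the default SerDe would."""
    return f"{sel_sid}{FIELD_DELIM}{cc}"


def decode_row(value: str) -> Tuple[str, int]:
    """Split one stored row back into (sel_sid, cc)."""
    sel_sid, cc = value.split(FIELD_DELIM)
    return sel_sid, int(cc)


row = encode_row("151", 5)
print(repr(row))        # '151\x015'
print(decode_row(row))  # ('151', 5)
```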

Everything works, but this behavior is inconsistent with my other group
by queries, which don't write any file when the group by produces no
result.

Is there something wrong with my query?

Best regards,
-c.b.

Re: Query returns some text instead of none

Posted by Ajo Fod <aj...@gmail.com>.
Have you tried constructing the table as a text file?

Use the following at the end of the CREATE TABLE statement:

ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

It might just be that SequenceFile writes some header information even
when there is no data.
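To make that concrete, here is a hedged Python sketch contrasting an empty SequenceFile with an empty text file. It assumes the uncompressed, metadata-free SequenceFile header layout (SEQ, version byte 6, length-prefixed key and value class names, two flag bytes, a zero metadata count, a 16-byte sync marker). Hadoop derives its sync marker from an MD5 digest; os.urandom is a stand-in here. The function names are hypothetical.

```python
import os
import struct
from typing import Iterable, Sequence


def write_short_string(s: str) -> bytes:
    # One-byte VInt length + UTF-8 bytes (only lengths up to 127 handled).
    data = s.encode("utf-8")
    if len(data) > 127:
        raise ValueError("multi-byte VInt lengths are not handled here")
    return bytes([len(data)]) + data


def empty_seqfile(key_class: str, value_class: str) -> bytes:
    header = b"SEQ" + bytes([6])      # magic + version byte
    header += write_short_string(key_class)
    header += write_short_string(value_class)
    header += b"\x00\x00"             # not compressed, not block-compressed
    header += struct.pack(">i", 0)    # zero metadata entries
    header += os.urandom(16)          # per-file sync marker (random stand-in)
    return header                     # no records follow for an empty result


def text_file(rows: Iterable[Sequence[object]], delim: str = "\t") -> bytes:
    # A TEXTFILE has no header: zero rows means zero bytes.
    return "".join(delim.join(map(str, r)) + "\n" for r in rows).encode("utf-8")


first = empty_seqfile("org.apache.hadoop.io.BytesWritable", "org.apache.hadoop.io.Text")
second = empty_seqfile("org.apache.hadoop.io.BytesWritable", "org.apache.hadoop.io.Text")
print(len(first))                   # 87
print(first[:-16] == second[:-16])  # True: only the sync markers can differ
print(len(text_file([])))           # 0
```

Under these assumptions every empty SequenceFile still carries about 87 bytes of header, while an empty text file carries none.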

Cheers,
Ajo.
