You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Rakesh Setty <se...@yahoo-inc.com> on 2009/07/02 21:37:00 UTC
distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Rakesh Setty <se...@yahoo-inc.com>.
Hi Namit,
This is the plan generated.
hive> explain extended
>
> select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
OK
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_SUBQUERY (TOK_UNION (TOK_QUERY (TOK_FROM (TOK_TABREF page_views pv)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF user) username)))) (TOK_QUERY (TOK_FROM (TOK_TABREF users u)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF name) username))))) ur)) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECTDI (TOK_SELEXPR (TOK_COLREF username)))))
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 is a root stage
STAGE PLANS:
Stage: Stage-1
Map Reduce
Alias -> Map Operator Tree:
null-subquery2:ur-subquery2:u
Select Operator
expressions:
expr: name
type: string
Select Operator
expressions:
expr: 0
type: string
Union
Group By Operator
keys:
expr: 0
type: string
mode: hash
Reduce Output Operator
key expressions:
expr: 0
type: string
sort order: +
Map-reduce partition columns:
expr: 0
type: string
tag: -1
null-subquery1:ur-subquery1:pv
Select Operator
expressions:
expr: user
type: string
Select Operator
expressions:
expr: 0
type: string
Union
Group By Operator
keys:
expr: 0
type: string
mode: hash
Reduce Output Operator
key expressions:
expr: 0
type: string
sort order: +
Map-reduce partition columns:
expr: 0
type: string
tag: -1
Needs Tagging: false
Path -> Alias:
/user/serakesh/users_hive
/user/serakesh/page_views_hive
Path -> Partition:
/user/serakesh/users_hive
Partition
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
properties:
line.delim
name users
field.delim
columns.types string:string:string:string:string:int
serialization.ddl struct users { string name, string phone, string address, string city, string state, i32 zip}
columns name,phone,address,city,state,zip
serialization.format
bucket_count -1
serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
location /user/serakesh/users_hive
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: users
/user/serakesh/page_views_hive
Partition
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
properties:
line.delim
name page_views
field.delim
columns.types string:int:int:string:bigint:bigint:double
serialization.ddl struct page_views { string user, i32 action, i32 timespent, string query_term, i64 ip_addr, i64 time_stamp, double estimated_revenue}
columns user,action,timespent,query_term,ip_addr,time_stamp,estimated_revenue
serialization.format
bucket_count -1
serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
file.inputformat org.apache.hadoop.mapred.TextInputFormat
file.outputformat org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
location /user/serakesh/page_views_hive
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: page_views
Reduce Operator Tree:
Group By Operator
keys:
expr: KEY.0
type: string
mode: mergepartial
Select Operator
expressions:
expr: 0
type: string
File Output Operator
compressed: false
GlobalTableId: 0
directory: /tmp/hive-serakesh/135690254.10001.insclause-0
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
properties:
columns username
serialization.format 1
Stage: Stage-0
Fetch Operator
limit: -1
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 2:24 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Can you do :
explain extended
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
and send the plan ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 1:54 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Namit Jain <nj...@facebook.com>.
Can you do :
explain extended
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
and send the plan ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 1:54 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Namit Jain <nj...@facebook.com>.
Your plan looks OK.
As Amr said, try to find the offending users.
From: Amr Awadallah [mailto:aaa@cloudera.com]
Sent: Thursday, July 02, 2009 5:53 PM
To: hive-user@hadoop.apache.org
Subject: Re: distinct with union all
make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted.
also to debug, try to do a select username, count(1) then group by username
-- amr
Rakesh Setty wrote:
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Namit Jain <nj...@facebook.com>.
Can you tell the exact command to generate the data ?
Detailed instructions.
Do I need to install pig, or is the standalone perl file good enough ?
Thanks,
-namit
[njain@dev029 ~/pigmix]$ perl generate_data.pl
Usage: generate_data.pl filetype numrows tablename [nosql]
Valid filetypes [studenttab, studentcolon, studentusrdef,
votertab, reg1459894, textdoc, unicode]
Died at generate_data.pl line 145.
[njain@dev029 ~/pigmix]$
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Tuesday, July 07, 2009 1:23 PM
To: Namit Jain
Subject: RE: distinct with union all
Hi Namit,
The query that I am working on is test L11 in PigMix converted to Hive. To generate the data, you can use the patch attached to https://issues.apache.org/jira/browse/PIG-200.
The exact Hive query I am using is
insert overwrite table L11out
select distinct username from
(select field1 as username from widerow w
union all select user as username from page_views pv) ur;
The output says 1642777 rows are loaded. If I do
select count(distinct user) from L11out
I get 652384 as the output.
Please let me know if you have any other questions.
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Tuesday, July 07, 2009 12:01 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Rakesh,
Did the queries work for you ?
Thanks,
-namit
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 2:22 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Similar queries seem to be working for me.
1. Which version of hive are using ?
2. Would it be possible for you to ship the data to us ?
Thanks,
-namit
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 1:36 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Following are the counts
select count(distinct user) from Tb1;
976272
select count(1) from Tb1;
1642777
select count(distinct user) from Tb1Debug;
653824
select count(1) from Tb1Debug;
1095933
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 1:29 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Can you send the following counts:
select count(distinct user) from Tb1;
select count(1) from Tb1;
select count(distinct user) from Tb1Debug;
select count(1) from Tb1Debug;
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 1:20 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi,
>From the counts, I could see that grouping was happening. The output is too big, so I am attaching just partial output here
AAAGd`RH_HaA`AZNgA_WDeq 2
AAAZTMIxhpSjx[Oyb 1
AABIxQR_Un]DLGXNwla\`o 3
AAB[_pudLZeuOe 7
AAClxPCotyLn\qC 2
AAEdZeHVNj\[XECl^Amin 3
AAEkewjwSCDfCgmPbxkQEG` 2
AAFQxFnlnImcPKpvQMM 1
AAGbQojol_EXcPk 1
AAHPZBe^Q`UeMBmlhhcAVV 3
AAHs]A_gvZxfKKCI^\ 2
AAIAGwnFPsrGFxaHYgOAiCHe 2
AAIjiO]r`rkrobp_xRn 1
AAIwuSGOmu_L`YGCoHrmvub` 1
AAJG[RPx\RvEYljsBgDdR 3
AAJ^BFCNpTXt]wOEir[ 1
AAJvlnBfYXNnXulGOHlAeW 2
AAKwheCincrxm_jkbm 2
The number of rows written is varying between the queries
insert overwrite table Tbl1 select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
and
insert overwrite table Tbl1Debug select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
The query
select count(distinct user) from Tb1;
returns different value compared to the number of rows created by the first query. It is also different from the query
select count(distinct user) from Tb1Debug;
@Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine.
Please let me know what could be the problem.
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 10:03 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Nothing special for these characters.
select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
should return the counts.
When you see the duplicate usernames, can you send them -
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 9:45 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi Amr,
The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables?
I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this?
Thanks,
Rakesh
________________________________
From: Amr Awadallah [mailto:aaa@cloudera.com]
Sent: Thursday, July 02, 2009 5:53 PM
To: hive-user@hadoop.apache.org
Subject: Re: distinct with union all
make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted.
also to debug, try to do a select username, count(1) then group by username
-- amr
Rakesh Setty wrote:
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Namit Jain <nj...@facebook.com>.
Rakesh,
Did the queries work for you ?
Thanks,
-namit
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 2:22 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Similar queries seem to be working for me.
1. Which version of hive are using ?
2. Would it be possible for you to ship the data to us ?
Thanks,
-namit
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 1:36 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Following are the counts
select count(distinct user) from Tb1;
976272
select count(1) from Tb1;
1642777
select count(distinct user) from Tb1Debug;
653824
select count(1) from Tb1Debug;
1095933
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 1:29 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Can you send the following counts:
select count(distinct user) from Tb1;
select count(1) from Tb1;
select count(distinct user) from Tb1Debug;
select count(1) from Tb1Debug;
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 1:20 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi,
>From the counts, I could see that grouping was happening. The output is too big, so I am attaching just partial output here
AAAGd`RH_HaA`AZNgA_WDeq 2
AAAZTMIxhpSjx[Oyb 1
AABIxQR_Un]DLGXNwla\`o 3
AAB[_pudLZeuOe 7
AAClxPCotyLn\qC 2
AAEdZeHVNj\[XECl^Amin 3
AAEkewjwSCDfCgmPbxkQEG` 2
AAFQxFnlnImcPKpvQMM 1
AAGbQojol_EXcPk 1
AAHPZBe^Q`UeMBmlhhcAVV 3
AAHs]A_gvZxfKKCI^\ 2
AAIAGwnFPsrGFxaHYgOAiCHe 2
AAIjiO]r`rkrobp_xRn 1
AAIwuSGOmu_L`YGCoHrmvub` 1
AAJG[RPx\RvEYljsBgDdR 3
AAJ^BFCNpTXt]wOEir[ 1
AAJvlnBfYXNnXulGOHlAeW 2
AAKwheCincrxm_jkbm 2
The number of rows written is varying between the queries
insert overwrite table Tbl1 select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
and
insert overwrite table Tbl1Debug select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
The query
select count(distinct user) from Tb1;
returns different value compared to the number of rows created by the first query. It is also different from the query
select count(distinct user) from Tb1Debug;
@Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine.
Please let me know what could be the problem.
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 10:03 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Nothing special for these characters.
select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
should return the counts.
When you see the duplicate usernames, can you send them -
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 9:45 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi Amr,
The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables?
I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this?
Thanks,
Rakesh
________________________________
From: Amr Awadallah [mailto:aaa@cloudera.com]
Sent: Thursday, July 02, 2009 5:53 PM
To: hive-user@hadoop.apache.org
Subject: Re: distinct with union all
make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted.
also to debug, try to do a select username, count(1) then group by username
-- amr
Rakesh Setty wrote:
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Namit Jain <nj...@facebook.com>.
Similar queries seem to be working for me.
1. Which version of hive are using ?
2. Would it be possible for you to ship the data to us ?
Thanks,
-namit
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 1:36 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Following are the counts
select count(distinct user) from Tb1;
976272
select count(1) from Tb1;
1642777
select count(distinct user) from Tb1Debug;
653824
select count(1) from Tb1Debug;
1095933
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 1:29 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Can you send the following counts:
select count(distinct user) from Tb1;
select count(1) from Tb1;
select count(distinct user) from Tb1Debug;
select count(1) from Tb1Debug;
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 1:20 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi,
>From the counts, I could see that grouping was happening. The output is too big, so I am attaching just partial output here
AAAGd`RH_HaA`AZNgA_WDeq 2
AAAZTMIxhpSjx[Oyb 1
AABIxQR_Un]DLGXNwla\`o 3
AAB[_pudLZeuOe 7
AAClxPCotyLn\qC 2
AAEdZeHVNj\[XECl^Amin 3
AAEkewjwSCDfCgmPbxkQEG` 2
AAFQxFnlnImcPKpvQMM 1
AAGbQojol_EXcPk 1
AAHPZBe^Q`UeMBmlhhcAVV 3
AAHs]A_gvZxfKKCI^\ 2
AAIAGwnFPsrGFxaHYgOAiCHe 2
AAIjiO]r`rkrobp_xRn 1
AAIwuSGOmu_L`YGCoHrmvub` 1
AAJG[RPx\RvEYljsBgDdR 3
AAJ^BFCNpTXt]wOEir[ 1
AAJvlnBfYXNnXulGOHlAeW 2
AAKwheCincrxm_jkbm 2
The number of rows written is varying between the queries
insert overwrite table Tbl1 select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
and
insert overwrite table Tbl1Debug select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
The query
select count(distinct user) from Tb1;
returns different value compared to the number of rows created by the first query. It is also different from the query
select count(distinct user) from Tb1Debug;
@Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine.
Please let me know what could be the problem.
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 10:03 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Nothing special for these characters.
select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
should return the counts.
When you see the duplicate usernames, can you send them -
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 9:45 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi Amr,
The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables?
I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this?
Thanks,
Rakesh
________________________________
From: Amr Awadallah [mailto:aaa@cloudera.com]
Sent: Thursday, July 02, 2009 5:53 PM
To: hive-user@hadoop.apache.org
Subject: Re: distinct with union all
make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted.
also to debug, try to do a select username, count(1) then group by username
-- amr
Rakesh Setty wrote:
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Rakesh Setty <se...@yahoo-inc.com>.
Following are the counts
select count(distinct user) from Tb1;
976272
select count(1) from Tb1;
1642777
select count(distinct user) from Tb1Debug;
653824
select count(1) from Tb1Debug;
1095933
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 1:29 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Can you send the following counts:
select count(distinct user) from Tb1;
select count(1) from Tb1;
select count(distinct user) from Tb1Debug;
select count(1) from Tb1Debug;
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 1:20 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi,
>From the counts, I could see that grouping was happening. The output is too big, so I am attaching just partial output here
AAAGd`RH_HaA`AZNgA_WDeq 2
AAAZTMIxhpSjx[Oyb 1
AABIxQR_Un]DLGXNwla\`o 3
AAB[_pudLZeuOe 7
AAClxPCotyLn\qC 2
AAEdZeHVNj\[XECl^Amin 3
AAEkewjwSCDfCgmPbxkQEG` 2
AAFQxFnlnImcPKpvQMM 1
AAGbQojol_EXcPk 1
AAHPZBe^Q`UeMBmlhhcAVV 3
AAHs]A_gvZxfKKCI^\ 2
AAIAGwnFPsrGFxaHYgOAiCHe 2
AAIjiO]r`rkrobp_xRn 1
AAIwuSGOmu_L`YGCoHrmvub` 1
AAJG[RPx\RvEYljsBgDdR 3
AAJ^BFCNpTXt]wOEir[ 1
AAJvlnBfYXNnXulGOHlAeW 2
AAKwheCincrxm_jkbm 2
The number of rows written is varying between the queries
insert overwrite table Tbl1 select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
and
insert overwrite table Tbl1Debug select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
The query
select count(distinct user) from Tb1;
returns different value compared to the number of rows created by the first query. It is also different from the query
select count(distinct user) from Tb1Debug;
@Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine.
Please let me know what could be the problem.
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 10:03 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Nothing special for these characters.
select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
should return the counts.
When you see the duplicate usernames, can you send them -
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 9:45 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi Amr,
The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables?
I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this?
Thanks,
Rakesh
________________________________
From: Amr Awadallah [mailto:aaa@cloudera.com]
Sent: Thursday, July 02, 2009 5:53 PM
To: hive-user@hadoop.apache.org
Subject: Re: distinct with union all
make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted.
also to debug, try to do a select username, count(1) then group by username
-- amr
Rakesh Setty wrote:
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Namit Jain <nj...@facebook.com>.
Can you send the following counts:
select count(distinct user) from Tb1;
select count(1) from Tb1;
select count(distinct user) from Tb1Debug;
select count(1) from Tb1Debug;
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 1:20 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi,
>From the counts, I could see that grouping was happening. The output is too big, so I am attaching just partial output here
AAAGd`RH_HaA`AZNgA_WDeq 2
AAAZTMIxhpSjx[Oyb 1
AABIxQR_Un]DLGXNwla\`o 3
AAB[_pudLZeuOe 7
AAClxPCotyLn\qC 2
AAEdZeHVNj\[XECl^Amin 3
AAEkewjwSCDfCgmPbxkQEG` 2
AAFQxFnlnImcPKpvQMM 1
AAGbQojol_EXcPk 1
AAHPZBe^Q`UeMBmlhhcAVV 3
AAHs]A_gvZxfKKCI^\ 2
AAIAGwnFPsrGFxaHYgOAiCHe 2
AAIjiO]r`rkrobp_xRn 1
AAIwuSGOmu_L`YGCoHrmvub` 1
AAJG[RPx\RvEYljsBgDdR 3
AAJ^BFCNpTXt]wOEir[ 1
AAJvlnBfYXNnXulGOHlAeW 2
AAKwheCincrxm_jkbm 2
The number of rows written is varying between the queries
insert overwrite table Tbl1 select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
and
insert overwrite table Tbl1Debug select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
The query
select count(distinct user) from Tb1;
returns different value compared to the number of rows created by the first query. It is also different from the query
select count(distinct user) from Tb1Debug;
@Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine.
Please let me know what could be the problem.
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 10:03 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Nothing special for these characters.
select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
should return the counts.
When you see the duplicate usernames, can you send them -
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 9:45 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi Amr,
The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables?
I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this?
Thanks,
Rakesh
________________________________
From: Amr Awadallah [mailto:aaa@cloudera.com]
Sent: Thursday, July 02, 2009 5:53 PM
To: hive-user@hadoop.apache.org
Subject: Re: distinct with union all
make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted.
also to debug, try to do a select username, count(1) then group by username
-- amr
Rakesh Setty wrote:
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Rakesh Setty <se...@yahoo-inc.com>.
Hi,
>From the counts, I could see that grouping was happening. The output is too big, so I am attaching just partial output here
AAAGd`RH_HaA`AZNgA_WDeq 2
AAAZTMIxhpSjx[Oyb 1
AABIxQR_Un]DLGXNwla\`o 3
AAB[_pudLZeuOe 7
AAClxPCotyLn\qC 2
AAEdZeHVNj\[XECl^Amin 3
AAEkewjwSCDfCgmPbxkQEG` 2
AAFQxFnlnImcPKpvQMM 1
AAGbQojol_EXcPk 1
AAHPZBe^Q`UeMBmlhhcAVV 3
AAHs]A_gvZxfKKCI^\ 2
AAIAGwnFPsrGFxaHYgOAiCHe 2
AAIjiO]r`rkrobp_xRn 1
AAIwuSGOmu_L`YGCoHrmvub` 1
AAJG[RPx\RvEYljsBgDdR 3
AAJ^BFCNpTXt]wOEir[ 1
AAJvlnBfYXNnXulGOHlAeW 2
AAKwheCincrxm_jkbm 2
The number of rows written is varying between the queries
insert overwrite table Tbl1 select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
and
insert overwrite table Tbl1Debug select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
The query
select count(distinct user) from Tb1;
returns different value compared to the number of rows created by the first query. It is also different from the query
select count(distinct user) from Tb1Debug;
@Ashish - I had attached the explain plan output in an earlier reply which Namit said looks fine.
Please let me know what could be the problem.
Thanks,
Rakesh
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 10:03 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Nothing special for these characters.
select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
should return the counts.
When you see the duplicate usernames, can you send them -
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 9:45 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi Amr,
The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables?
I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this?
Thanks,
Rakesh
________________________________
From: Amr Awadallah [mailto:aaa@cloudera.com]
Sent: Thursday, July 02, 2009 5:53 PM
To: hive-user@hadoop.apache.org
Subject: Re: distinct with union all
make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted.
also to debug, try to do a select username, count(1) then group by username
-- amr
Rakesh Setty wrote:
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Ashish Thusoo <at...@facebook.com>.
Can you also send the explain plan outputs for the distinct query?
Ashish
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Monday, July 06, 2009 10:03 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Nothing special for these characters.
select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
should return the counts.
When you see the duplicate usernames, can you send them -
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 9:45 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi Amr,
The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables?
I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this?
Thanks,
Rakesh
________________________________
From: Amr Awadallah [mailto:aaa@cloudera.com]
Sent: Thursday, July 02, 2009 5:53 PM
To: hive-user@hadoop.apache.org
Subject: Re: distinct with union all
make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted.
also to debug, try to do a select username, count(1) then group by username
-- amr
Rakesh Setty wrote:
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Namit Jain <nj...@facebook.com>.
Nothing special for these characters.
select username, count(1) from (select user as username from page_views pv union all select name as username from users u) ur group by username;
should return the counts.
When you see the duplicate usernames, can you send them -
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Monday, July 06, 2009 9:45 AM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Hi Amr,
The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables?
I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this?
Thanks,
Rakesh
________________________________
From: Amr Awadallah [mailto:aaa@cloudera.com]
Sent: Thursday, July 02, 2009 5:53 PM
To: hive-user@hadoop.apache.org
Subject: Re: distinct with union all
make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted.
also to debug, try to do a select username, count(1) then group by username
-- amr
Rakesh Setty wrote:
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Rakesh Setty <se...@yahoo-inc.com>.
Hi Amr,
The group by clause that you suggest shows that grouping is happening as the count column suggests. Perhaps it is not happening across the tables?
I see characters like \, [, _, ^, etc in username. Are these considered as special characters? If so, is there any workaround to deal with this?
Thanks,
Rakesh
________________________________
From: Amr Awadallah [mailto:aaa@cloudera.com]
Sent: Thursday, July 02, 2009 5:53 PM
To: hive-user@hadoop.apache.org
Subject: Re: distinct with union all
make sure you don't have any leading or trailing spaces (or special characters) for the usernames being extracted.
also to debug, try to do a select username, count(1) then group by username
-- amr
Rakesh Setty wrote:
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
Re: distinct with union all
Posted by Amr Awadallah <aa...@cloudera.com>.
make sure you don't have any leading or trailing spaces (or special
characters) for the usernames being extracted.
also to debug, try to do a select username, count(1) then group by username
-- amr
Rakesh Setty wrote:
>
> Yes, I am getting duplicate usernames.
>
>
>
>
>
> ------------------------------------------------------------------------
>
> *From:* Namit Jain [mailto:njain@facebook.com]
> *Sent:* Thursday, July 02, 2009 1:25 PM
> *To:* hive-user@hadoop.apache.org
> *Subject:* RE: distinct with union all
>
>
>
> Are you getting duplicate usernames ?
>
>
>
>
>
> *From:* Rakesh Setty [mailto:serakesh@yahoo-inc.com]
> *Sent:* Thursday, July 02, 2009 12:37 PM
> *To:* hive-user@hadoop.apache.org
> *Subject:* distinct with union all
>
>
>
> Hi,
>
>
>
> I have a query like
>
>
>
> select distinct username from (select user as username from page_views
> pv union all select name as username from users u) ur;
>
>
>
> But I see that result is not actually distinct. Am I missing something
> here?
>
>
>
> Thanks,
>
> Rakesh
>
RE: distinct with union all
Posted by Rakesh Setty <se...@yahoo-inc.com>.
Yes, I am getting duplicate usernames.
________________________________
From: Namit Jain [mailto:njain@facebook.com]
Sent: Thursday, July 02, 2009 1:25 PM
To: hive-user@hadoop.apache.org
Subject: RE: distinct with union all
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh
RE: distinct with union all
Posted by Namit Jain <nj...@facebook.com>.
Are you getting duplicate usernames ?
From: Rakesh Setty [mailto:serakesh@yahoo-inc.com]
Sent: Thursday, July 02, 2009 12:37 PM
To: hive-user@hadoop.apache.org
Subject: distinct with union all
Hi,
I have a query like
select distinct username from (select user as username from page_views pv union all select name as username from users u) ur;
But I see that result is not actually distinct. Am I missing something here?
Thanks,
Rakesh