You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by "Sun Fulin (JIRA)" <ji...@apache.org> on 2014/08/26 10:51:58 UTC
[jira] [Issue Comment Deleted] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes

     [ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sun Fulin updated PHOENIX-1203:
-------------------------------

    Comment: was deleted

(was: James Taylor - I rebuilt the latest phoenix version 4.1 RC from here: https://dist.apache.org/repos/dist/dev/phoenix/phoenix-4.1.0-rc1/src/ 
And confirm queries with another unit test:
DDL: create table example (my_pk integer not null, first_name varchar(20), last_name varchar(20) constraint pk PRIMARY KEY (my_pk));
create index my_idx on example (first_name);
DML: Upsert into example values (9 records with several same first_name groups)
The following queries: select /+NO_INDEX/ count (distinct first_name) from example; got finally 5 rows. 
The following queries: select /+INDEX(example my_idx)/ count (distinct first_name) from example; got all 9 records.
So James, the count (distinct col) query over index table still not get expecting results even in the latest release.
Hoping Anoop Sam John can explain this and get some available report.


 
From: James Taylor (JIRA)
Date: 2014-08-26 02:52
To: sunfl
Subject: [jira] [Commented] (PHOENIX-1203) Uable to work for count (distinct col) queries via phoenix table with secondary indexes
 
    [ https://issues.apache.org/jira/browse/PHOENIX-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14109498#comment-14109498 ] 
 
James Taylor commented on PHOENIX-1203:
---------------------------------------
 
[~sunfl] - can you confirm that when you tried with with the latest 4.1 RC that you replaced *both* the client and the *server* jar? If the problem is what I suspect it is, this should have been fixed. 
 
 
 
 
--
This message was sent by Atlassian JIRA
(v6.2#6252)
 
 
)

> Uable to work for count (distinct col) queries via phoenix table with secondary indexes
> ---------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-1203
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1203
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.0.0, 4.1
>         Environment: hadoop-2.2.0   hbase: Version 0.98.3-hadoop2
>            Reporter: Sun Fulin
>            Assignee: Anoop Sam John
>              Labels: distinct, secondaryIndex, test
>
> I build the latest 4.1 rc0 from here:  https://github.com/apache/phoenix/releases
> And examine the count (distinct col) query within the new environment. However, the problem still exists with index queries as the following while the correct distinct query result is expected to be 536 for my project:
> 0: jdbc:phoenix:zookeeper1> select count (distinct t.imsi) from ranapsignal t where t.pkttime>=1404964800000 and t.pkttime<=1404965699999 and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and t.ranapsubmsgtype=0 ;
> +-------------+
> | COUNT(IMSI) |
> +-------------+
> | 2322        |
> +-------------+
> 1 row selected (70.572 seconds)
> As James suggests, I conduct the query adding group by t.imsi with /without secondary indexes. And the result seems to be fine as they both got the correct 536 distinct groups. 
> Here are some considerations:
> 1. count (distinct col) query over index table did not work as expectation.
> 2. only distinct query over index table works fine.
> 3. If the phoenix version got some wrong configuration, correct me.
> Thanks and Best Regards,
> Sun
> -------------------------------
> Hi Sun,
> Thanks for the detailed description. Yes, your syntax is correct, and
> it's definitely true that the count distinct query should return the
> same result with and without the index. Would you mind trying this on
> our latest 3.1 RC2 and/or 4.1 RC0 and if the problem still occurs to
> file a JIRA?
>  
> One thing that make make it easier for your testing: do you know about
> our NO_INDEX hint which forces the query *not* to use an index, like
> this:
>  
> select /*+ NO_INDEX */ ...
>  
> Another question too. What about this query with and with/out the index:
>  
> select count(*) from ranapsignal t
> where t.pkttime>=1404964800000 and t.pkttime<=1404965699999
> and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0
> and t.ranapsubmsgtype=0
> group by t.imsi;
>  
> Thanks,
> James
>  
> On Thu, Aug 21, 2014 at 10:38 PM, sunfl@certusnet.com.cn
> <su...@certusnet.com.cn> wrote:
> >
> >
> > Hi James,
> >
> >   Recently I got trouble while trying to conduct some query performance test
> > in my phoenix tables with secondary indexes.
> >   I created a table called RANAPSIGNAL for my projects in phoenix via
> > sqlline client and load data into the table. Then I create
> >   an index on the specific column PKTTIME for the table RANAPSIGNAL while
> > including other more columns for adjusting my
> >   index query, like the following DDL:
> >          create index if not exists pkt_idx on RANAPSIGNAL (PKTTIME) include
> > (SAC,NASMSGTYPE, RANAPMSGTYPE, RANAPSUBMSGTYPE );
> >    The index creation worked successfully without any errors. So, when I am
> > trying to conduct such query as:
> >            select count (distinct t.imsi) from ranapsignal t where
> > t.pkttime>=1404964800000 and t.pkttime<=1404965699999
> >            and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
> > t.ranapsubmsgtype=0 ;
> >    Without secondary indexes, the final result got 536 distinct imsi, wihch
> > is the right distinct count results. However, after I  create the above
> >   secondary index PKT_IDX and reconducting the above count (distinct imsi)
> > query, I got 2322 imsi rows which obviously are not the expected
> >   distinct counts results. I used the explain grammar to observe the scan of
> > the above select query and found that it definitely scaned over
> >   the index table PKT_IDX. I then tried to conduct the following query with
> > no count function:
> >             select distinct t.imsi from ranapsignal t where
> > t.pkttime>=1404964800000 and t.pkttime<=1404965699999
> >            and t.sac=32351 and t.nasmsgtype=0 and t.ranapmsgtype=0 and
> > t.ranapsubmsgtype=0 ;
> >    And the result is right 536 distinct imsi over scanning the index table.
> > By the way, imsi is one of the primary key when creating the table
> > RANAPSIGNAL.
> >    Here are several considerations for my trouble and practice:
> >    1. Did you guys ever practice such count (distinct) queries over phoenix
> > table via secondary index?
> >    2. I am not sure whether this problem was due to the index table, but my
> > practice may assume that conclusion.
> >    3. Corrects my if I am wrong with my previous operations (index creation,
> > DDL grammar, etc..)
> >    4. Any available hints or reply are best appreciated.
> >
> >   Thanks and Best Regards
> >   Sun
>  



--
This message was sent by Atlassian JIRA
(v6.2#6252)