You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kylin.apache.org by lxw <lx...@qq.com> on 2016/11/18 06:00:36 UTC

SQL "NOT IN" returns incorrect result

Hi,


   When I use "NOT IN" in where clause, it returns incorrect result, and instead use "<>", then result is OK.
   
Raw data;


SELECT  
c.ad_place_type,
COUNT(1) as cnt 
FROM fact_table a 
LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_) 
WHERE pt = '2016-11-17' 
GROUP by c.ad_place_type;


--results
wap 64578476
app 70764413
pc  3398137
unknown 419942





SQL1(correct) :


SELECT  
c.ad_place_type,
COUNT(1) as cnt 
FROM fact_table a 
LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_) 
WHERE pt = '2016-11-17' 
AND c.ad_place_type <> 'pc' 
GROUP by c.ad_place_type;



--
wap 64578476
app 70764413
unknown 419942



SQL2(incorrect):


SELECT  
c.ad_place_type,
COUNT(1) as cnt 
FROM fact_table a 
LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_) 
WHERE pt = '2016-11-17' 
AND c.ad_place_type NOT IN ('pc')
GROUP by c.ad_place_type;



--
wap 4718980
app 33253424
unknown 90533



SQL3(incorrect):


SELECT  
c.ad_place_type,
COUNT(1) as cnt 
FROM fact_table a 
LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_) 
WHERE pt = '2016-11-17' 
AND c.ad_place_type NOT IN ('app','wap') 
GROUP by c.ad_place_type;



--
result(0)  ,


The correct result should be "pc  3398137" and "unknown 419942", 
when instead use "(c.ad_place_type  <> 'app' AND c.ad_place_type <> 'wap')", the result is OK.




Who can help me to explain this, thanks!

Re: Some questions about Kylin2.0

Posted by Alberto Ramón <a....@gmail.com>.
Q1:Kylin 2633 <https://issues.apache.org/jira/browse/KYLIN-2633>  The
actual version of spark is 1.6.3 (in Kylin 2.0.0)

On 13 June 2017 at 04:41, lxw <lx...@qq.com> wrote:

> Hi,All :
>
>    I have some questions about Kylin2.0, and my environment:
>     hadoop-2.6.0-cdh5.8.3
>     hbase-1.2.0-cdh5.8.3
>     apache-kylin-2.0.0-bin-cdh57
>     spark-2.1.0-bin-hadoop2.6
>
> *Q1: Kylin2.0 not support Spark2.0?*
>
>      find-spark-dependency.sh:
>      spark_dependency=`find -L $spark_home -name
> 'spark-assembly-[a-z0-9A-Z\.-]*.jar' ....
>
> *Q2: I want to use Kylin2.0 without Spark Cubing, but failed.*
>
>      kylin.sh:
>      function retrieveDependency() {
>          #retrive $hive_dependency and $hbase_dependency
>          source ${dir}/find-hive-dependency.sh
>          source ${dir}/find-hbase-dependency.sh
>          source ${dir}/find-hadoop-conf-dir.sh
>          source ${dir}/find-kafka-dependency.sh
>          source ${dir}/find-spark-dependency.sh
>
>      If not found spark dependencies, Kylin can not start :
>
>      [hadoop@hadoop10 bin]$ ./kylin.sh start
>      Retrieving hadoop conf dir...
>      KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
>      Retrieving hive dependency...
>      Retrieving hbase dependency...
>      Retrieving hadoop conf dir...
>      Retrieving kafka dependency...
>      Retrieving Spark dependency...
>      *spark assembly lib not found.*
>
>      after modify kylin.sh “*####*source ${dir}/find-spark-dependency.sh”,
> Kylin start success ..
>
> *Q3: Abount kylin_hadoop_conf_dir ?*
>
>      I make some soft link under $KYLIN_HOME/hadoop-conf
> (core-site.xml、yarn-site.xml、hbase-site.xml、hive-site.xml),
>      and set "kylin.env.hadoop-conf-dir=/home/bigdata/kylin/current/hadoop-conf",
> when I execute ./check-env.sh,
>
>      *[hadoop@hadoop10 bin]$ ./check-env.sh *
> *     Retrieving hadoop conf dir...*
> *    /home/bigdata/kylin/current/hadoop-conf is override as the
> kylin_hadoop_conf_dir*
> *    KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current*
> *    -mkdir: java.net.UnknownHostException: cdh5*
> *    Usage: hadoop fs [generic options] -mkdir [-p] <path> ...*
> *    Failed to create /kylin20. Please make sure the user has right to
> access /kylin20*
>
>     My HDFS with HA, fs.defaultFS is "cdh5",when I don't set
> "kylin.env.hadoop-conf-dir", and use HADOOP_CONF_DIR, HIVE_CONF, HBASE_CONF_DIR
> from envionment variables (/etc/profile), it was correct.
>
>
> Best Regards!
> lxw
>

Re: Some questions about Kylin2.0

Posted by ShaoFeng Shi <sh...@apache.org>.
For Q1, yes Kylin is running with Spark 1.6.3 now; Will upgrade to Spark
2.x in future releases.

For Q2, if you don't use Spark cubing, you can modify the start bash
scripts to disable the checking for Spark.

For Q3, can the host name "cdh5" be resolved on this machine?

2017-06-16 20:55 GMT+08:00 Alberto Ramón <a....@gmail.com>:

> About Q2:
> I'm agree with you, I think is a *issue*
> To start Kylin you must check exists one Source, one Engine and one Storage
> system
> (for example, is not necesary have Hive and Kafka)
>
> Example Spark
> <https://github.com/apache/kylin/blob/c38def7b53dae81f9fde0520b1fb27
> 0804dde728/build/bin/find-spark-dependency.sh#L38>
>
> On 16 June 2017 at 13:17, skyyws <sk...@163.com> wrote:
>
> > For Q3, you can try to make soft links for both hdfs-site.xml and
> > mapred-site.xml.
> >
> > 2017-06-16
> >
> > skyyws
> >
> >
> >
> > 发件人:"lxw" <lx...@qq.com>
> > 发送时间:2017-06-13 11:41
> > 主题:Some questions about Kylin2.0
> > 收件人:"dev"<de...@kylin.apache.org>
> > 抄送:
> >
> > Hi,All :
> >
> >    I have some questions about Kylin2.0, and my environment:
> >     hadoop-2.6.0-cdh5.8.3
> >     hbase-1.2.0-cdh5.8.3
> >     apache-kylin-2.0.0-bin-cdh57
> >     spark-2.1.0-bin-hadoop2.6
> >
> >
> > Q1: Kylin2.0 not support Spark2.0?
> >
> >      find-spark-dependency.sh:
> >      spark_dependency=`find -L $spark_home -name
> > 'spark-assembly-[a-z0-9A-Z\.-]*.jar' ....
> >
> >
> > Q2: I want to use Kylin2.0 without Spark Cubing, but failed.
> >
> >
> >      kylin.sh:
> >      function retrieveDependency() {
> >          #retrive $hive_dependency and $hbase_dependency
> >          source ${dir}/find-hive-dependency.sh
> >          source ${dir}/find-hbase-dependency.sh
> >          source ${dir}/find-hadoop-conf-dir.sh
> >          source ${dir}/find-kafka-dependency.sh
> >          source ${dir}/find-spark-dependency.sh
> >
> >
> >      If not found spark dependencies, Kylin can not start :
> >
> >      [hadoop@hadoop10 bin]$ ./kylin.sh start
> >      Retrieving hadoop conf dir...
> >      KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
> >      Retrieving hive dependency...
> >      Retrieving hbase dependency...
> >      Retrieving hadoop conf dir...
> >      Retrieving kafka dependency...
> >      Retrieving Spark dependency...
> >      spark assembly lib not found.
> >
> >
> >      after modify kylin.sh “####source ${dir}/find-spark-dependency.sh”,
> > Kylin start success ..
> >
> >
> > Q3: Abount kylin_hadoop_conf_dir ?
> >
> >      I make some soft link under $KYLIN_HOME/hadoop-conf
> > (core-site.xml、yarn-site.xml、hbase-site.xml、hive-site.xml),
> >      and set "kylin.env.hadoop-conf-dir=/home/bigdata/kylin/current/
> hadoop-conf",
> > when I execute ./check-env.sh,
> >
> >
> >      [hadoop@hadoop10 bin]$ ./check-env.sh
> >      Retrieving hadoop conf dir...
> >     /home/bigdata/kylin/current/hadoop-conf is override as the
> > kylin_hadoop_conf_dir
> >     KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
> >     -mkdir: java.net.UnknownHostException: cdh5
> >     Usage: hadoop fs [generic options] -mkdir [-p] <path> ...
> >     Failed to create /kylin20. Please make sure the user has right to
> > access /kylin20
> >
> >
> >     My HDFS with HA, fs.defaultFS is "cdh5",when I don't set
> > "kylin.env.hadoop-conf-dir", and use HADOOP_CONF_DIR, HIVE_CONF,
> > HBASE_CONF_DIR from envionment variables (/etc/profile), it was correct.
> >
> >
> > Best Regards!
> > lxw
> >
>



-- 
Best regards,

Shaofeng Shi 史少锋

Re: Some questions about Kylin2.0

Posted by Alberto Ramón <a....@gmail.com>.
About Q2:
I'm agree with you, I think is a *issue*
To start Kylin you must check exists one Source, one Engine and one Storage
system
(for example, is not necesary have Hive and Kafka)

Example Spark
<https://github.com/apache/kylin/blob/c38def7b53dae81f9fde0520b1fb270804dde728/build/bin/find-spark-dependency.sh#L38>

On 16 June 2017 at 13:17, skyyws <sk...@163.com> wrote:

> For Q3, you can try to make soft links for both hdfs-site.xml and
> mapred-site.xml.
>
> 2017-06-16
>
> skyyws
>
>
>
> 发件人:"lxw" <lx...@qq.com>
> 发送时间:2017-06-13 11:41
> 主题:Some questions about Kylin2.0
> 收件人:"dev"<de...@kylin.apache.org>
> 抄送:
>
> Hi,All :
>
>    I have some questions about Kylin2.0, and my environment:
>     hadoop-2.6.0-cdh5.8.3
>     hbase-1.2.0-cdh5.8.3
>     apache-kylin-2.0.0-bin-cdh57
>     spark-2.1.0-bin-hadoop2.6
>
>
> Q1: Kylin2.0 not support Spark2.0?
>
>      find-spark-dependency.sh:
>      spark_dependency=`find -L $spark_home -name
> 'spark-assembly-[a-z0-9A-Z\.-]*.jar' ....
>
>
> Q2: I want to use Kylin2.0 without Spark Cubing, but failed.
>
>
>      kylin.sh:
>      function retrieveDependency() {
>          #retrive $hive_dependency and $hbase_dependency
>          source ${dir}/find-hive-dependency.sh
>          source ${dir}/find-hbase-dependency.sh
>          source ${dir}/find-hadoop-conf-dir.sh
>          source ${dir}/find-kafka-dependency.sh
>          source ${dir}/find-spark-dependency.sh
>
>
>      If not found spark dependencies, Kylin can not start :
>
>      [hadoop@hadoop10 bin]$ ./kylin.sh start
>      Retrieving hadoop conf dir...
>      KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
>      Retrieving hive dependency...
>      Retrieving hbase dependency...
>      Retrieving hadoop conf dir...
>      Retrieving kafka dependency...
>      Retrieving Spark dependency...
>      spark assembly lib not found.
>
>
>      after modify kylin.sh “####source ${dir}/find-spark-dependency.sh”,
> Kylin start success ..
>
>
> Q3: Abount kylin_hadoop_conf_dir ?
>
>      I make some soft link under $KYLIN_HOME/hadoop-conf
> (core-site.xml、yarn-site.xml、hbase-site.xml、hive-site.xml),
>      and set "kylin.env.hadoop-conf-dir=/home/bigdata/kylin/current/hadoop-conf",
> when I execute ./check-env.sh,
>
>
>      [hadoop@hadoop10 bin]$ ./check-env.sh
>      Retrieving hadoop conf dir...
>     /home/bigdata/kylin/current/hadoop-conf is override as the
> kylin_hadoop_conf_dir
>     KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
>     -mkdir: java.net.UnknownHostException: cdh5
>     Usage: hadoop fs [generic options] -mkdir [-p] <path> ...
>     Failed to create /kylin20. Please make sure the user has right to
> access /kylin20
>
>
>     My HDFS with HA, fs.defaultFS is "cdh5",when I don't set
> "kylin.env.hadoop-conf-dir", and use HADOOP_CONF_DIR, HIVE_CONF,
> HBASE_CONF_DIR from envionment variables (/etc/profile), it was correct.
>
>
> Best Regards!
> lxw
>

Re: Some questions about Kylin2.0

Posted by skyyws <sk...@163.com>.
For Q3, you can try to make soft links for both hdfs-site.xml and mapred-site.xml. 

2017-06-16 

skyyws 



发件人:"lxw" <lx...@qq.com>
发送时间:2017-06-13 11:41
主题:Some questions about Kylin2.0 
收件人:"dev"<de...@kylin.apache.org>
抄送:

Hi,All : 

   I have some questions about Kylin2.0, and my environment: 
    hadoop-2.6.0-cdh5.8.3 
    hbase-1.2.0-cdh5.8.3 
    apache-kylin-2.0.0-bin-cdh57 
    spark-2.1.0-bin-hadoop2.6 


Q1: Kylin2.0 not support Spark2.0? 
      
     find-spark-dependency.sh:  
     spark_dependency=`find -L $spark_home -name 'spark-assembly-[a-z0-9A-Z\.-]*.jar' .... 


Q2: I want to use Kylin2.0 without Spark Cubing, but failed. 


     kylin.sh: 
     function retrieveDependency() { 
         #retrive $hive_dependency and $hbase_dependency 
         source ${dir}/find-hive-dependency.sh 
         source ${dir}/find-hbase-dependency.sh 
         source ${dir}/find-hadoop-conf-dir.sh 
         source ${dir}/find-kafka-dependency.sh 
         source ${dir}/find-spark-dependency.sh 


     If not found spark dependencies, Kylin can not start : 
     
     [hadoop@hadoop10 bin]$ ./kylin.sh start 
     Retrieving hadoop conf dir... 
     KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current 
     Retrieving hive dependency... 
     Retrieving hbase dependency... 
     Retrieving hadoop conf dir... 
     Retrieving kafka dependency... 
     Retrieving Spark dependency... 
     spark assembly lib not found. 


     after modify kylin.sh “####source ${dir}/find-spark-dependency.sh”, Kylin start success .. 


Q3: Abount kylin_hadoop_conf_dir ? 
  
     I make some soft link under $KYLIN_HOME/hadoop-conf (core-site.xml、yarn-site.xml、hbase-site.xml、hive-site.xml),  
     and set "kylin.env.hadoop-conf-dir=/home/bigdata/kylin/current/hadoop-conf", when I execute ./check-env.sh,  


     [hadoop@hadoop10 bin]$ ./check-env.sh  
     Retrieving hadoop conf dir... 
    /home/bigdata/kylin/current/hadoop-conf is override as the kylin_hadoop_conf_dir 
    KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current 
    -mkdir: java.net.UnknownHostException: cdh5 
    Usage: hadoop fs [generic options] -mkdir [-p] <path> ... 
    Failed to create /kylin20. Please make sure the user has right to access /kylin20 


    My HDFS with HA, fs.defaultFS is "cdh5",when I don't set "kylin.env.hadoop-conf-dir", and use HADOOP_CONF_DIR, HIVE_CONF, HBASE_CONF_DIR from envionment variables (/etc/profile), it was correct.  
      
         
Best Regards! 
lxw 

Some questions about Kylin2.0

Posted by lxw <lx...@qq.com>.
Hi,All :

   I have some questions about Kylin2.0, and my environment:
    hadoop-2.6.0-cdh5.8.3
    hbase-1.2.0-cdh5.8.3
    apache-kylin-2.0.0-bin-cdh57
    spark-2.1.0-bin-hadoop2.6


Q1: Kylin2.0 not support Spark2.0?
     
     find-spark-dependency.sh: 
     spark_dependency=`find -L $spark_home -name 'spark-assembly-[a-z0-9A-Z\.-]*.jar' ....


Q2: I want to use Kylin2.0 without Spark Cubing, but failed.


     kylin.sh:
     function retrieveDependency() {
         #retrive $hive_dependency and $hbase_dependency
         source ${dir}/find-hive-dependency.sh
         source ${dir}/find-hbase-dependency.sh
         source ${dir}/find-hadoop-conf-dir.sh
         source ${dir}/find-kafka-dependency.sh
         source ${dir}/find-spark-dependency.sh


     If not found spark dependencies, Kylin can not start :
    
     [hadoop@hadoop10 bin]$ ./kylin.sh start
     Retrieving hadoop conf dir...
     KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
     Retrieving hive dependency...
     Retrieving hbase dependency...
     Retrieving hadoop conf dir...
     Retrieving kafka dependency...
     Retrieving Spark dependency...
     spark assembly lib not found.


     after modify kylin.sh “####source ${dir}/find-spark-dependency.sh”, Kylin start success ..


Q3: Abount kylin_hadoop_conf_dir ?
 
     I make some soft link under $KYLIN_HOME/hadoop-conf (core-site.xml、yarn-site.xml、hbase-site.xml、hive-site.xml), 
     and set "kylin.env.hadoop-conf-dir=/home/bigdata/kylin/current/hadoop-conf", when I execute ./check-env.sh, 


     [hadoop@hadoop10 bin]$ ./check-env.sh 
     Retrieving hadoop conf dir...
    /home/bigdata/kylin/current/hadoop-conf is override as the kylin_hadoop_conf_dir
    KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
    -mkdir: java.net.UnknownHostException: cdh5
    Usage: hadoop fs [generic options] -mkdir [-p] <path> ...
    Failed to create /kylin20. Please make sure the user has right to access /kylin20


    My HDFS with HA, fs.defaultFS is "cdh5",when I don't set "kylin.env.hadoop-conf-dir", and use HADOOP_CONF_DIR, HIVE_CONF, HBASE_CONF_DIR from envionment variables (/etc/profile), it was correct. 
     
        
Best Regards!
lxw

Re: 答复: SQL "NOT IN" returns incorrect result

Posted by Li Yang <li...@apache.org>.
Looking at KYLIN-2212, it is still different from what lwx reports.

I created https://issues.apache.org/jira/browse/KYLIN-2214 for lwx's issue.



On Fri, Nov 18, 2016 at 5:28 PM, Li Yang <li...@apache.org> wrote:

> Nice catch! Thanks!
>
> On Fri, Nov 18, 2016 at 3:17 PM, Donald,Zheng(vip.com) <
> donald.zheng@vipshop.com> wrote:
>
>> We meet the same problem. I have logged a JIRA issue
>> https://issues.apache.org/jira/browse/KYLIN-2212
>>
>> 发件人: lxw [mailto:lxw1234@qq.com]
>> 发送时间: 2016年11月18日 14:01
>> 收件人: dev; user
>> 主题: SQL "NOT IN" returns incorrect result
>>
>> Hi,
>>
>>    When I use "NOT IN" in where clause, it returns incorrect result, and
>> instead use "<>", then result is OK.
>>
>> Raw data;
>>
>> SELECT
>> c.ad_place_type,
>> COUNT(1) as cnt
>> FROM fact_table a
>> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
>> WHERE pt = '2016-11-17'
>> GROUP by c.ad_place_type;
>>
>> --results
>> wap 64578476
>> app 70764413
>> pc  3398137
>> unknown 419942
>>
>>
>> SQL1(correct) :
>>
>> SELECT
>> c.ad_place_type,
>> COUNT(1) as cnt
>> FROM fact_table a
>> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
>> WHERE pt = '2016-11-17'
>> AND c.ad_place_type <> 'pc'
>> GROUP by c.ad_place_type;
>>
>> --
>> wap 64578476
>> app 70764413
>> unknown 419942
>>
>> SQL2(incorrect):
>>
>> SELECT
>> c.ad_place_type,
>> COUNT(1) as cnt
>> FROM fact_table a
>> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
>> WHERE pt = '2016-11-17'
>> AND c.ad_place_type NOT IN ('pc')
>> GROUP by c.ad_place_type;
>>
>> --
>> wap 4718980
>> app 33253424
>> unknown 90533
>>
>> SQL3(incorrect):
>>
>> SELECT
>> c.ad_place_type,
>> COUNT(1) as cnt
>> FROM fact_table a
>> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
>> WHERE pt = '2016-11-17'
>> AND c.ad_place_type NOT IN ('app','wap')
>> GROUP by c.ad_place_type;
>>
>> --
>> result(0)  ,
>>
>> The correct result should be "pc  3398137" and "unknown 419942",
>> when instead use "(c.ad_place_type  <> 'app' AND c.ad_place_type <>
>> 'wap')", the result is OK.
>>
>>
>> Who can help me to explain this, thanks!
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> 本电子邮件可能为保密文件。如果阁下非电子邮件所指定之收件人,谨请立即通知本人。敬请阁下不要使用、保存、复印、打印、散布本
>> 电子邮件及其内容,或将其用于其他任何目的或向任何人披露。谢谢您的合作! This communication is intended only
>> for the addressee(s) and may contain information that is privileged and
>> confidential. You are hereby notified that, if you are not an intended
>> recipient listed above, or an authorized employee or agent of an addressee
>> of this communication responsible for delivering e-mail messages to an
>> intended recipient, any dissemination, distribution or reproduction of this
>> communication (including any attachments hereto) is strictly prohibited. If
>> you have received this communication in error, please notify us immediately
>> by a reply e-mail addressed to the sender and permanently delete the
>> original e-mail communication and any attachments from all storage devices
>> without making or otherwise retaining a copy.
>>
>
>

Re: 答复: SQL "NOT IN" returns incorrect result

Posted by Li Yang <li...@apache.org>.
Looking at KYLIN-2212, it is still different from what lwx reports.

I created https://issues.apache.org/jira/browse/KYLIN-2214 for lwx's issue.



On Fri, Nov 18, 2016 at 5:28 PM, Li Yang <li...@apache.org> wrote:

> Nice catch! Thanks!
>
> On Fri, Nov 18, 2016 at 3:17 PM, Donald,Zheng(vip.com) <
> donald.zheng@vipshop.com> wrote:
>
>> We meet the same problem. I have logged a JIRA issue
>> https://issues.apache.org/jira/browse/KYLIN-2212
>>
>> 发件人: lxw [mailto:lxw1234@qq.com]
>> 发送时间: 2016年11月18日 14:01
>> 收件人: dev; user
>> 主题: SQL "NOT IN" returns incorrect result
>>
>> Hi,
>>
>>    When I use "NOT IN" in where clause, it returns incorrect result, and
>> instead use "<>", then result is OK.
>>
>> Raw data;
>>
>> SELECT
>> c.ad_place_type,
>> COUNT(1) as cnt
>> FROM fact_table a
>> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
>> WHERE pt = '2016-11-17'
>> GROUP by c.ad_place_type;
>>
>> --results
>> wap 64578476
>> app 70764413
>> pc  3398137
>> unknown 419942
>>
>>
>> SQL1(correct) :
>>
>> SELECT
>> c.ad_place_type,
>> COUNT(1) as cnt
>> FROM fact_table a
>> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
>> WHERE pt = '2016-11-17'
>> AND c.ad_place_type <> 'pc'
>> GROUP by c.ad_place_type;
>>
>> --
>> wap 64578476
>> app 70764413
>> unknown 419942
>>
>> SQL2(incorrect):
>>
>> SELECT
>> c.ad_place_type,
>> COUNT(1) as cnt
>> FROM fact_table a
>> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
>> WHERE pt = '2016-11-17'
>> AND c.ad_place_type NOT IN ('pc')
>> GROUP by c.ad_place_type;
>>
>> --
>> wap 4718980
>> app 33253424
>> unknown 90533
>>
>> SQL3(incorrect):
>>
>> SELECT
>> c.ad_place_type,
>> COUNT(1) as cnt
>> FROM fact_table a
>> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
>> WHERE pt = '2016-11-17'
>> AND c.ad_place_type NOT IN ('app','wap')
>> GROUP by c.ad_place_type;
>>
>> --
>> result(0)  ,
>>
>> The correct result should be "pc  3398137" and "unknown 419942",
>> when instead use "(c.ad_place_type  <> 'app' AND c.ad_place_type <>
>> 'wap')", the result is OK.
>>
>>
>> Who can help me to explain this, thanks!
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> 本电子邮件可能为保密文件。如果阁下非电子邮件所指定之收件人,谨请立即通知本人。敬请阁下不要使用、保存、复印、打印、散布本
>> 电子邮件及其内容,或将其用于其他任何目的或向任何人披露。谢谢您的合作! This communication is intended only
>> for the addressee(s) and may contain information that is privileged and
>> confidential. You are hereby notified that, if you are not an intended
>> recipient listed above, or an authorized employee or agent of an addressee
>> of this communication responsible for delivering e-mail messages to an
>> intended recipient, any dissemination, distribution or reproduction of this
>> communication (including any attachments hereto) is strictly prohibited. If
>> you have received this communication in error, please notify us immediately
>> by a reply e-mail addressed to the sender and permanently delete the
>> original e-mail communication and any attachments from all storage devices
>> without making or otherwise retaining a copy.
>>
>
>

Re: 答复: SQL "NOT IN" returns incorrect result

Posted by Li Yang <li...@apache.org>.
Nice catch! Thanks!

On Fri, Nov 18, 2016 at 3:17 PM, Donald,Zheng(vip.com) <
donald.zheng@vipshop.com> wrote:

> We meet the same problem. I have logged a JIRA issue
> https://issues.apache.org/jira/browse/KYLIN-2212
>
> 发件人: lxw [mailto:lxw1234@qq.com]
> 发送时间: 2016年11月18日 14:01
> 收件人: dev; user
> 主题: SQL "NOT IN" returns incorrect result
>
> Hi,
>
>    When I use "NOT IN" in where clause, it returns incorrect result, and
> instead use "<>", then result is OK.
>
> Raw data;
>
> SELECT
> c.ad_place_type,
> COUNT(1) as cnt
> FROM fact_table a
> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
> WHERE pt = '2016-11-17'
> GROUP by c.ad_place_type;
>
> --results
> wap 64578476
> app 70764413
> pc  3398137
> unknown 419942
>
>
> SQL1(correct) :
>
> SELECT
> c.ad_place_type,
> COUNT(1) as cnt
> FROM fact_table a
> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
> WHERE pt = '2016-11-17'
> AND c.ad_place_type <> 'pc'
> GROUP by c.ad_place_type;
>
> --
> wap 64578476
> app 70764413
> unknown 419942
>
> SQL2(incorrect):
>
> SELECT
> c.ad_place_type,
> COUNT(1) as cnt
> FROM fact_table a
> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
> WHERE pt = '2016-11-17'
> AND c.ad_place_type NOT IN ('pc')
> GROUP by c.ad_place_type;
>
> --
> wap 4718980
> app 33253424
> unknown 90533
>
> SQL3(incorrect):
>
> SELECT
> c.ad_place_type,
> COUNT(1) as cnt
> FROM fact_table a
> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
> WHERE pt = '2016-11-17'
> AND c.ad_place_type NOT IN ('app','wap')
> GROUP by c.ad_place_type;
>
> --
> result(0)  ,
>
> The correct result should be "pc  3398137" and "unknown 419942",
> when instead use "(c.ad_place_type  <> 'app' AND c.ad_place_type <>
> 'wap')", the result is OK.
>
>
> Who can help me to explain this, thanks!
>
>
>
>
>
>
>
>
>
>
> 本电子邮件可能为保密文件。如果阁下非电子邮件所指定之收件人,谨请立即通知本人。敬请阁下不要使用、保存、复印、打印、散布本电子邮件及其内容,或将其用于其他任何目的或向任何人披露。谢谢您的合作!
> This communication is intended only for the addressee(s) and may contain
> information that is privileged and confidential. You are hereby notified
> that, if you are not an intended recipient listed above, or an authorized
> employee or agent of an addressee of this communication responsible for
> delivering e-mail messages to an intended recipient, any dissemination,
> distribution or reproduction of this communication (including any
> attachments hereto) is strictly prohibited. If you have received this
> communication in error, please notify us immediately by a reply e-mail
> addressed to the sender and permanently delete the original e-mail
> communication and any attachments from all storage devices without making
> or otherwise retaining a copy.
>

Re: 答复: SQL "NOT IN" returns incorrect result

Posted by Li Yang <li...@apache.org>.
Nice catch! Thanks!

On Fri, Nov 18, 2016 at 3:17 PM, Donald,Zheng(vip.com) <
donald.zheng@vipshop.com> wrote:

> We meet the same problem. I have logged a JIRA issue
> https://issues.apache.org/jira/browse/KYLIN-2212
>
> 发件人: lxw [mailto:lxw1234@qq.com]
> 发送时间: 2016年11月18日 14:01
> 收件人: dev; user
> 主题: SQL "NOT IN" returns incorrect result
>
> Hi,
>
>    When I use "NOT IN" in where clause, it returns incorrect result, and
> instead use "<>", then result is OK.
>
> Raw data;
>
> SELECT
> c.ad_place_type,
> COUNT(1) as cnt
> FROM fact_table a
> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
> WHERE pt = '2016-11-17'
> GROUP by c.ad_place_type;
>
> --results
> wap 64578476
> app 70764413
> pc  3398137
> unknown 419942
>
>
> SQL1(correct) :
>
> SELECT
> c.ad_place_type,
> COUNT(1) as cnt
> FROM fact_table a
> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
> WHERE pt = '2016-11-17'
> AND c.ad_place_type <> 'pc'
> GROUP by c.ad_place_type;
>
> --
> wap 64578476
> app 70764413
> unknown 419942
>
> SQL2(incorrect):
>
> SELECT
> c.ad_place_type,
> COUNT(1) as cnt
> FROM fact_table a
> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
> WHERE pt = '2016-11-17'
> AND c.ad_place_type NOT IN ('pc')
> GROUP by c.ad_place_type;
>
> --
> wap 4718980
> app 33253424
> unknown 90533
>
> SQL3(incorrect):
>
> SELECT
> c.ad_place_type,
> COUNT(1) as cnt
> FROM fact_table a
> LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
> WHERE pt = '2016-11-17'
> AND c.ad_place_type NOT IN ('app','wap')
> GROUP by c.ad_place_type;
>
> --
> result(0)  ,
>
> The correct result should be "pc  3398137" and "unknown 419942",
> when instead use "(c.ad_place_type  <> 'app' AND c.ad_place_type <>
> 'wap')", the result is OK.
>
>
> Who can help me to explain this, thanks!
>
>
>
>
>
>
>
>
>
>
> 本电子邮件可能为保密文件。如果阁下非电子邮件所指定之收件人,谨请立即通知本人。敬请阁下不要使用、保存、复印、打印、散布本电子邮件及其内容,或将其用于其他任何目的或向任何人披露。谢谢您的合作!
> This communication is intended only for the addressee(s) and may contain
> information that is privileged and confidential. You are hereby notified
> that, if you are not an intended recipient listed above, or an authorized
> employee or agent of an addressee of this communication responsible for
> delivering e-mail messages to an intended recipient, any dissemination,
> distribution or reproduction of this communication (including any
> attachments hereto) is strictly prohibited. If you have received this
> communication in error, please notify us immediately by a reply e-mail
> addressed to the sender and permanently delete the original e-mail
> communication and any attachments from all storage devices without making
> or otherwise retaining a copy.
>

答复: SQL "NOT IN" returns incorrect result

Posted by "Donald,Zheng(vip.com)" <do...@vipshop.com>.
We meet the same problem. I have logged a JIRA issue  https://issues.apache.org/jira/browse/KYLIN-2212

发件人: lxw [mailto:lxw1234@qq.com]
发送时间: 2016年11月18日 14:01
收件人: dev; user
主题: SQL "NOT IN" returns incorrect result

Hi,

   When I use "NOT IN" in where clause, it returns incorrect result, and instead use "<>", then result is OK.

Raw data;

SELECT
c.ad_place_type,
COUNT(1) as cnt
FROM fact_table a
LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
WHERE pt = '2016-11-17'
GROUP by c.ad_place_type;

--results
wap 64578476
app 70764413
pc  3398137
unknown 419942


SQL1(correct) :

SELECT
c.ad_place_type,
COUNT(1) as cnt
FROM fact_table a
LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
WHERE pt = '2016-11-17'
AND c.ad_place_type <> 'pc'
GROUP by c.ad_place_type;

--
wap 64578476
app 70764413
unknown 419942

SQL2(incorrect):

SELECT
c.ad_place_type,
COUNT(1) as cnt
FROM fact_table a
LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
WHERE pt = '2016-11-17'
AND c.ad_place_type NOT IN ('pc')
GROUP by c.ad_place_type;

--
wap 4718980
app 33253424
unknown 90533

SQL3(incorrect):

SELECT
c.ad_place_type,
COUNT(1) as cnt
FROM fact_table a
LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
WHERE pt = '2016-11-17'
AND c.ad_place_type NOT IN ('app','wap')
GROUP by c.ad_place_type;

--
result(0)  ,

The correct result should be "pc  3398137" and "unknown 419942",
when instead use "(c.ad_place_type  <> 'app' AND c.ad_place_type <> 'wap')", the result is OK.


Who can help me to explain this, thanks!










本电子邮件可能为保密文件。如果阁下非电子邮件所指定之收件人,谨请立即通知本人。敬请阁下不要使用、保存、复印、打印、散布本电子邮件及其内容,或将其用于其他任何目的或向任何人披露。谢谢您的合作! This communication is intended only for the addressee(s) and may contain information that is privileged and confidential. You are hereby notified that, if you are not an intended recipient listed above, or an authorized employee or agent of an addressee of this communication responsible for delivering e-mail messages to an intended recipient, any dissemination, distribution or reproduction of this communication (including any attachments hereto) is strictly prohibited. If you have received this communication in error, please notify us immediately by a reply e-mail addressed to the sender and permanently delete the original e-mail communication and any attachments from all storage devices without making or otherwise retaining a copy.

答复: SQL "NOT IN" returns incorrect result

Posted by "Donald,Zheng(vip.com)" <do...@vipshop.com>.
We meet the same problem. I have logged a JIRA issue  https://issues.apache.org/jira/browse/KYLIN-2212

发件人: lxw [mailto:lxw1234@qq.com]
发送时间: 2016年11月18日 14:01
收件人: dev; user
主题: SQL "NOT IN" returns incorrect result

Hi,

   When I use "NOT IN" in where clause, it returns incorrect result, and instead use "<>", then result is OK.

Raw data;

SELECT
c.ad_place_type,
COUNT(1) as cnt
FROM fact_table a
LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
WHERE pt = '2016-11-17'
GROUP by c.ad_place_type;

--results
wap 64578476
app 70764413
pc  3398137
unknown 419942


SQL1(correct) :

SELECT
c.ad_place_type,
COUNT(1) as cnt
FROM fact_table a
LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
WHERE pt = '2016-11-17'
AND c.ad_place_type <> 'pc'
GROUP by c.ad_place_type;

--
wap 64578476
app 70764413
unknown 419942

SQL2(incorrect):

SELECT
c.ad_place_type,
COUNT(1) as cnt
FROM fact_table a
LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
WHERE pt = '2016-11-17'
AND c.ad_place_type NOT IN ('pc')
GROUP by c.ad_place_type;

--
wap 4718980
app 33253424
unknown 90533

SQL3(incorrect):

SELECT
c.ad_place_type,
COUNT(1) as cnt
FROM fact_table a
LEFT JOIN dim_table as c ON (ad_id = crt_id AND mtid = mtid_)
WHERE pt = '2016-11-17'
AND c.ad_place_type NOT IN ('app','wap')
GROUP by c.ad_place_type;

--
result(0)  ,

The correct result should be "pc  3398137" and "unknown 419942",
when instead use "(c.ad_place_type  <> 'app' AND c.ad_place_type <> 'wap')", the result is OK.


Who can help me to explain this, thanks!










本电子邮件可能为保密文件。如果阁下非电子邮件所指定之收件人,谨请立即通知本人。敬请阁下不要使用、保存、复印、打印、散布本电子邮件及其内容,或将其用于其他任何目的或向任何人披露。谢谢您的合作! This communication is intended only for the addressee(s) and may contain information that is privileged and confidential. You are hereby notified that, if you are not an intended recipient listed above, or an authorized employee or agent of an addressee of this communication responsible for delivering e-mail messages to an intended recipient, any dissemination, distribution or reproduction of this communication (including any attachments hereto) is strictly prohibited. If you have received this communication in error, please notify us immediately by a reply e-mail addressed to the sender and permanently delete the original e-mail communication and any attachments from all storage devices without making or otherwise retaining a copy.

Some questions about Kylin2.0

Posted by lxw <lx...@qq.com>.
Hi,All :

   I have some questions about Kylin2.0, and my environment:
    hadoop-2.6.0-cdh5.8.3
    hbase-1.2.0-cdh5.8.3
    apache-kylin-2.0.0-bin-cdh57
    spark-2.1.0-bin-hadoop2.6


Q1: Kylin2.0 not support Spark2.0?
     
     find-spark-dependency.sh: 
     spark_dependency=`find -L $spark_home -name 'spark-assembly-[a-z0-9A-Z\.-]*.jar' ....


Q2: I want to use Kylin2.0 without Spark Cubing, but failed.


     kylin.sh:
     function retrieveDependency() {
         #retrive $hive_dependency and $hbase_dependency
         source ${dir}/find-hive-dependency.sh
         source ${dir}/find-hbase-dependency.sh
         source ${dir}/find-hadoop-conf-dir.sh
         source ${dir}/find-kafka-dependency.sh
         source ${dir}/find-spark-dependency.sh


     If not found spark dependencies, Kylin can not start :
    
     [hadoop@hadoop10 bin]$ ./kylin.sh start
     Retrieving hadoop conf dir...
     KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
     Retrieving hive dependency...
     Retrieving hbase dependency...
     Retrieving hadoop conf dir...
     Retrieving kafka dependency...
     Retrieving Spark dependency...
     spark assembly lib not found.


     after modify kylin.sh “####source ${dir}/find-spark-dependency.sh”, Kylin start success ..


Q3: Abount kylin_hadoop_conf_dir ?
 
     I make some soft link under $KYLIN_HOME/hadoop-conf (core-site.xml、yarn-site.xml、hbase-site.xml、hive-site.xml), 
     and set "kylin.env.hadoop-conf-dir=/home/bigdata/kylin/current/hadoop-conf", when I execute ./check-env.sh, 


     [hadoop@hadoop10 bin]$ ./check-env.sh 
     Retrieving hadoop conf dir...
    /home/bigdata/kylin/current/hadoop-conf is override as the kylin_hadoop_conf_dir
    KYLIN_HOME is set to /home/hadoop/bigdata/kylin/current
    -mkdir: java.net.UnknownHostException: cdh5
    Usage: hadoop fs [generic options] -mkdir [-p] <path> ...
    Failed to create /kylin20. Please make sure the user has right to access /kylin20


    My HDFS with HA, fs.defaultFS is "cdh5",when I don't set "kylin.env.hadoop-conf-dir", and use HADOOP_CONF_DIR, HIVE_CONF, HBASE_CONF_DIR from envionment variables (/etc/profile), it was correct. 
     
        
Best Regards!
lxw