You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@carbondata.apache.org by geda <be...@gmail.com> on 2016/12/15 02:23:44 UTC

Re: carbondata test join question

hi,thanks
like table a has 200+filelds , this sql use id ,v_id are is positin
 3,4,should i put it in the first or second
id  or v_id cardinality id 3w~7w ,

2016-12-15 9:29 GMT+08:00 杰 [via Apache CarbonData Mailing List archive] <
ml-node+s1130556n4442h26@n5.nabble.com>:

> hi, geda
>  can u share ur create ddl?
>  some suggestion: for that filter field (like id), u can try to put in
> left column and use dictionary_include or exclude to make it dimension. if
>  cardinality more than 100 thousand, u can try to make it no dictionary.
>  as for ur question, if all the dimensions make no dictionary, there will
> be no decode part.
>
>
> thanks
> Jay
>
>
>
>
>
> ------------------ Original ------------------
> From:  "geda";<[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=4442&i=0>>;
> Date:  Wed, Dec 14, 2016 11:10 PM
> To:  "dev"<[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=4442&i=1>>;
>
> Subject:  carbondata test join  question
>
>
>
> hello
> i  want to test orc ,carbon , which is faster
> test on yarn with 8 executors, each executor 4G,each 2 core
> spark1.6 ,carbon2.0
> sql : join  3 table
> A :300W row,5GB
> B:13W row,30MB
> C:7W row,10MB
> like this:
> select b.id ,b.d_name ,a.v_no,count(*) o_count  from    a left join  b on
> a.d_id=b.id left join    c on  c.v_no=a.v_no  where
>  date(a.create_time)>=
> '2016-07-01' and  date(a.create_time)<= '2016-09-02' group by  b.id ,
> b.d_name, a.v_no  having  o_count> 30 order by b.id desc
> use context
> cc.sql($SQL).show() : carbondata :run 5times  avg time :7.3s
> hiveContext.sql($SQL).show() : ORC : run 5times  avg time:5.3s
> i find from DAG ,carbon has a more job ,do carbon decode ,finish this job
> cause 2-3s spend
> if strip this job ,carbon and orc  use time more or less the same
> i want to know how to strip  the last stage or how to tune sql like this
> .Thanks
> <http://apache-carbondata-mailing-list-archive.1130556.
> n5.nabble.com/file/n4440/jobtrace.png>
> <http://apache-carbondata-mailing-list-archive.1130556.
> n5.nabble.com/file/n4440/laststage-carbon.png>
> <http://apache-carbondata-mailing-list-archive.1130556.
> n5.nabble.com/file/n4440/laststage-orc.png>
>
>
>
>
>
>
> --
> View this message in context: http://apache-carbondata-
> mailing-list-archive.1130556.n5.nabble.com/carbondata-test-
> join-question-tp4440.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://apache-carbondata-mailing-list-archive.1130556.
> n5.nabble.com/carbondata-test-join-question-tp4440p4442.html
> To unsubscribe from carbondata test join question, click here
> <http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4440&code=YmVpZG91NzdAZ21haWwuY29tfDQ0NDB8MTU5NTU5NDExOQ==>
> .
> NAML
> <http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>




--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Re-carbondata-test-join-question-tp4447.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.

回复: carbondata test join question

Posted by Jay <25...@qq.com>.
hi, genda


 yes. suggest to move to the first or second. 
the basic principle is,   should from left to right with cardinality increasing,
 at the same time, if it is a commonly used filter field, u should make it dimension and put left .


thanks
Jay 




------------------ 原始邮件 ------------------
发件人: "geda";<be...@gmail.com>;
发送时间: 2016年12月15日(星期四) 上午10:23
收件人: "dev"<de...@carbondata.incubator.apache.org>; 

主题: Re: carbondata test join question



hi,thanks
like table a has 200+filelds , this sql use id ,v_id are is positin
 3,4,should i put it in the first or second
id  or v_id cardinality id 3w~7w ,

2016-12-15 9:29 GMT+08:00 杰 [via Apache CarbonData Mailing List archive] <
ml-node+s1130556n4442h26@n5.nabble.com>:

> hi, geda
>  can u share ur create ddl?
>  some suggestion: for that filter field (like id), u can try to put in
> left column and use dictionary_include or exclude to make it dimension. if
>  cardinality more than 100 thousand, u can try to make it no dictionary.
>  as for ur question, if all the dimensions make no dictionary, there will
> be no decode part.
>
>
> thanks
> Jay
>
>
>
>
>
> ------------------ Original ------------------
> From:  "geda";<[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=4442&i=0>>;
> Date:  Wed, Dec 14, 2016 11:10 PM
> To:  "dev"<[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=4442&i=1>>;
>
> Subject:  carbondata test join  question
>
>
>
> hello
> i  want to test orc ,carbon , which is faster
> test on yarn with 8 executors, each executor 4G,each 2 core
> spark1.6 ,carbon2.0
> sql : join  3 table
> A :300W row,5GB
> B:13W row,30MB
> C:7W row,10MB
> like this:
> select b.id ,b.d_name ,a.v_no,count(*) o_count  from    a left join  b on
> a.d_id=b.id left join    c on  c.v_no=a.v_no  where
>  date(a.create_time)>=
> '2016-07-01' and  date(a.create_time)<= '2016-09-02' group by  b.id ,
> b.d_name, a.v_no  having  o_count> 30 order by b.id desc
> use context
> cc.sql($SQL).show() : carbondata :run 5times  avg time :7.3s
> hiveContext.sql($SQL).show() : ORC : run 5times  avg time:5.3s
> i find from DAG ,carbon has a more job ,do carbon decode ,finish this job
> cause 2-3s spend
> if strip this job ,carbon and orc  use time more or less the same
> i want to know how to strip  the last stage or how to tune sql like this
> .Thanks
> <http://apache-carbondata-mailing-list-archive.1130556.
> n5.nabble.com/file/n4440/jobtrace.png>
> <http://apache-carbondata-mailing-list-archive.1130556.
> n5.nabble.com/file/n4440/laststage-carbon.png>
> <http://apache-carbondata-mailing-list-archive.1130556.
> n5.nabble.com/file/n4440/laststage-orc.png>
>
>
>
>
>
>
> --
> View this message in context: http://apache-carbondata-
> mailing-list-archive.1130556.n5.nabble.com/carbondata-test-
> join-question-tp4440.html
> Sent from the Apache CarbonData Mailing List archive mailing list archive
> at Nabble.com.
>
> ------------------------------
> If you reply to this email, your message will be added to the discussion
> below:
> http://apache-carbondata-mailing-list-archive.1130556.
> n5.nabble.com/carbondata-test-join-question-tp4440p4442.html
> To unsubscribe from carbondata test join question, click here
> <http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4440&code=YmVpZG91NzdAZ21haWwuY29tfDQ0NDB8MTU5NTU5NDExOQ==>
> .
> NAML
> <http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>




--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Re-carbondata-test-join-question-tp4447.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.