Posted to dev@carbondata.apache.org by Yinwei Li <25...@qq.com> on 2017/02/07 03:07:23 UTC
Discussion about getting the execution duration of a query when using spark-shell + carbondata
Hi all,
When we use spark-shell + carbondata to run a query, how can we get the execution duration? A few points come to mind:
1. One query can produce one or more jobs, and some of those jobs may have DAG dependencies, so we cannot get the execution duration simply by summing all the jobs' durations, nor by roughly taking the longest job's duration.
2. In the spark-shell console or the Spark application web UI we can see each job's duration, but we cannot see the duration of the CarbonData query itself directly; perhaps CarbonData could improve this in the near future.
3. Maybe we can use the following command to get an approximate result:
scala> import java.util.Date; val begin = new Date(); cc.sql("$SQL_COMMAND").show; val end = new Date()
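A slightly fuller sketch of that idea: wrap the work in a small helper that takes two wall-clock timestamps and returns the difference. The helper itself is plain Scala; `cc` and `$SQL_COMMAND` below are the same placeholders as in the snippet above, not real values.

```scala
import java.util.Date

// Sketch: measure any block by taking wall-clock timestamps around it.
// Returns the block's result together with the elapsed milliseconds.
def timed[T](block: => T): (T, Long) = {
  val begin = new Date()
  val result = block
  val end = new Date()
  (result, end.getTime - begin.getTime)
}

// In spark-shell this would be: timed { cc.sql("$SQL_COMMAND").show }
// (assuming `cc` is the CarbonContext created earlier in the session).
// Here we time a plain computation so the sketch is self-contained:
val (result, ms) = timed { (1 to 1000000).sum }
println(s"approx. execution time: $ms ms")
```

Note this still measures the whole round trip (planning plus execution plus printing), so it is an approximation, as point 3 says.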
Any other opinions?
Re: Discussion about getting the execution duration of a query when using spark-shell + carbondata
Posted by Ravindra Pesala <ra...@gmail.com>.
Hi Libis,
The spark-sql CLI is not supported by CarbonData.
Why don't you use the Carbon thrift server and beeline? It works much the
same as the spark-sql CLI and reports the execution time for each query.
Start the CarbonData thrift server:
bin/spark-submit --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer <carbondata jar file> <store-location>
Connect with beeline:
bin/beeline -u jdbc:hive2://localhost:10000
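Put together as one sketch (the jar and store placeholders are kept as-is; adjust to your deployment), beeline then prints the elapsed time after each statement, e.g. "N rows selected (x.xxx seconds)":

```shell
# Start the CarbonData thrift server; '&' leaves it running in the
# background so the same terminal can run beeline afterwards.
bin/spark-submit --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer \
  <carbondata jar file> <store-location> &

# Connect with beeline; each query's execution time is printed after
# its result set.
bin/beeline -u jdbc:hive2://localhost:10000
```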
Regards,
Ravindra
On 9 February 2017 at 07:55, 范范欣欣 <li...@gmail.com> wrote:
> it seems that the carbondata jar is not loaded successfully. How can i
> use ./bin/spark-sql?
Re: Discussion about getting the execution duration of a query when using spark-shell + carbondata
Posted by 范范欣欣 <li...@gmail.com>.
Hi,
Now I can use carbondata 1.0.0 with spark-shell (Spark 2.1) as:
./bin/spark-shell --jars <carbondata assembly jar path>
but it is inconvenient to get the query time, so I tried
./bin/spark-sql --jars <carbondata assembly jar path>, but I got an error
when creating a table:
spark-sql> create table if not exists test_table(id string, name string,
city string, age int) stored by 'carbondata';
Error in query:
Operation not allowed:STORED BY(line 1, pos 87)
It seems that the carbondata jar is not loaded successfully. How can I use
./bin/spark-sql?
Regards
Libis
2017-02-07 13:16 GMT+08:00 Liang Chen <ch...@gmail.com>:
> I used the below method in spark shell for DEMO, for your reference:
>
> import org.apache.spark.sql.catalyst.util._
>
> benchmark { carbondf.filter($"name" === "Allen" and $"gender" === "Male"
> and $"province" === "NB" and $"singler" === "false").count }
Re: Discussion about getting the execution duration of a query when using spark-shell + carbondata
Posted by Liang Chen <ch...@gmail.com>.
Hi,
I used the method below in spark-shell for a demo, for your reference:
import org.apache.spark.sql.catalyst.util._
benchmark { carbondf.filter($"name" === "Allen" and $"gender" === "Male"
and $"province" === "NB" and $"singler" === "false").count }
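For readers without a Spark session at hand, here is a self-contained sketch of roughly what such a `benchmark { ... }` helper does (the real one comes in with the `org.apache.spark.sql.catalyst.util._` import above): run the block and print the elapsed wall-clock time.

```scala
// Rough stand-in for the imported `benchmark` helper: time a block with
// System.nanoTime and print the elapsed milliseconds before returning
// the block's result.
def benchmark[A](f: => A): A = {
  val start = System.nanoTime()
  val result = f
  val elapsedMs = (System.nanoTime() - start) / 1e6
  println(f"time: $elapsedMs%.1f ms")
  result
}

// Same shape as the spark-shell demo, applied to a plain collection
// instead of a carbondata DataFrame:
val evens = benchmark { (1 to 100000).count(_ % 2 == 0) }
```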
Regards
Liang
2017-02-06 22:07 GMT-05:00 Yinwei Li <25...@qq.com>:
> When we use spark-shell + carbondata to run a query, how can we get
> the execution duration?