Posted to dev@carbondata.apache.org by Yinwei Li <25...@qq.com> on 2017/02/07 03:07:23 UTC

Discussion about getting execution duration of a query when using spark-shell + carbondata

Hi all,


  When we use spark-shell + carbondata to run a query, how can we get the execution duration? A few points for discussion:


  1. One query can produce one or more jobs, and some of the jobs may have DAG dependencies, so we can't get the execution duration by simply summing up all the jobs' durations, and taking the longest job's duration is only a rough estimate.


  2. In the spark-shell console or the Spark application web UI we can get each job's duration, but we can't get the duration of the whole carbondata query directly; hopefully carbondata can improve this in the near future (see the listener sketch after this list).


  3. Maybe we can use the following commands to get an approximate result:


    scala> import java.util.Date
    scala> val begin = new Date(); cc.sql("$SQL_COMMAND").show; val end = new Date(); println(s"elapsed: ${end.getTime - begin.getTime} ms")
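
  A sketch of a more direct approach (not something carbondata provides out
  of the box today): Spark 2.x can call a QueryExecutionListener back with
  the duration of each successful action in nanoseconds, so per-query times
  are printed automatically instead of by hand. Assuming cc is the
  CarbonContext/SparkSession (both expose listenerManager):

    import org.apache.spark.sql.execution.QueryExecution
    import org.apache.spark.sql.util.QueryExecutionListener

    cc.listenerManager.register(new QueryExecutionListener {
      // invoked after an action such as count/collect succeeds
      override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
        println(s"[$funcName] took ${durationNs / 1000000} ms")
      // invoked when an action fails
      override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
        println(s"[$funcName] failed: $exception")
    })

  This reports the time Spark spends executing the whole plan, though it
  still doesn't break the time down per carbondata scan phase.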


  Any other opinions?

Re: Discussion about getting execution duration of a query when using spark-shell + carbondata

Posted by Ravindra Pesala <ra...@gmail.com>.
Hi Libis,

The spark-sql CLI is not supported by carbondata.
Why don't you use the Carbon thrift server with beeline? It works much the
same as the spark-sql CLI, and it prints the execution time for each query.

Script to start the carbondata thrift server:
bin/spark-submit --class org.apache.carbondata.spark.thriftserver.CarbonThriftServer <carbondata jar file> <store-location>

Script to connect with beeline:
bin/beeline -u jdbc:hive2://localhost:10000
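
For a quick programmatic check against the thrift server you can also time
a query over plain JDBC. A minimal sketch, assuming the Hive JDBC driver is
on the classpath and test_table is a placeholder table name:

import java.sql.DriverManager

// register the Hive JDBC driver and connect to the thrift server
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "", "")
val stmt = conn.createStatement()
val start = System.nanoTime()
val rs = stmt.executeQuery("SELECT COUNT(*) FROM test_table") // placeholder query
while (rs.next()) println(rs.getLong(1))
println(s"elapsed: ${(System.nanoTime() - start) / 1000000} ms")
rs.close(); stmt.close(); conn.close()

Beeline itself prints the elapsed time after every statement, so the JDBC
route is only needed if you want to record the numbers yourself.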

Regards,
Ravindra





Re: Discussion about getting execution duration of a query when using spark-shell + carbondata

Posted by 范范欣欣 <li...@gmail.com>.
Hi

Now I can use carbondata 1.0.0 with spark-shell (Spark 2.1) as:

./bin/spark-shell --jars <carbondata assembly jar path>

but it's inconvenient to get the query time, so I tried to use
./bin/spark-sql --jars <carbondata assembly jar path>, but I got an error
when creating a table:

spark-sql> create table if not exists test_table(id string, name string,
city string, age int) stored by 'carbondata';
Error in query:
Operation not allowed:STORED BY(line 1, pos 87)

It seems that the carbondata jar is not loaded successfully. How can I use
./bin/spark-sql?

Regards

Libis




Re: Discussion about getting execution duration of a query when using spark-shell + carbondata

Posted by Liang Chen <ch...@gmail.com>.
Hi

I used the method below in spark-shell as a demo, for your reference:

import org.apache.spark.sql.catalyst.util._

benchmark { carbondf.filter($"name" === "Allen" and $"gender" === "Male"
  and $"province" === "NB" and $"singler" === "false").count }


Regards

Liang




