Posted to user@spark.apache.org by Louis Hust <lo...@gmail.com> on 2015/07/22 12:14:21 UTC

Is spark suitable for real time query

Hi, all

I am using a Spark jar in standalone mode to fetch data from different MySQL
instances and run some actions, but I found the time taken is on the order of
seconds.

So I want to know whether a Spark job is suitable for real-time queries at
the microsecond level?

Re: Is spark suitable for real time query

Posted by Igor Berman <ig...@gmail.com>.
You can use the Spark REST job server (or any other solution that provides a
long-running Spark context) so that you won't pay this bootstrap time on each
query.
In addition, if you have some RDD that you want your queries to be executed
on, you can cache this RDD in memory (depending on your cluster's memory
size) so that you won't pay the cost of reading from disk.
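A minimal sketch of that idea (Spark 1.x Java API; the class name, JDBC URL
and table name are illustrative placeholders, not from Igor's mail): start
one long-lived context, load and cache the table once, then answer repeated
queries from memory.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class LongRunningQueries {
    public static void main(String[] args) {
        // Pay the context bootstrap cost once, not once per query.
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("long-running-queries").setMaster("local[*]"));
        SQLContext sqlContext = new SQLContext(sc);

        // Placeholder JDBC URL and table name; adjust for your MySQL instance.
        Map<String, String> opts = new HashMap<String, String>();
        opts.put("url", "jdbc:mysql://localhost:3306/test?user=root");
        opts.put("dbtable", "t11");
        DataFrame t11 = sqlContext.load("jdbc", opts);
        t11.registerTempTable("t11");
        sqlContext.cacheTable("t11");   // keep the rows in executor memory

        // Every query after the first reads the cached copy, not MySQL.
        sqlContext.sql("select count(*) from t11").show();
        sc.stop();
    }
}
```

In practice a REST job server keeps such a context alive between requests,
so clients never pay the startup cost at all.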


On 22 July 2015 at 14:46, Louis Hust <lo...@gmail.com> wrote:

> I do a simple test using spark in standalone mode(not cluster),
>  and found a simple action take a few seconds, the data size is small,
> just few rows.
> So each spark job will cost some time for init or prepare work no matter
> what the job is?
> I mean if the basic framework of spark job will cost seconds?
>
> 2015-07-22 19:17 GMT+08:00 Robin East <ro...@xense.co.uk>:
>
>> Real-time is, of course, relative but you’ve mentioned microsecond level.
>> Spark is designed to process large amounts of data in a distributed
>> fashion. No distributed system I know of could give any kind of guarantees
>> at the microsecond level.
>>
>> Robin
>>
>> > On 22 Jul 2015, at 11:14, Louis Hust <lo...@gmail.com> wrote:
>> >
>> > Hi, all
>> >
>> > I am using spark jar in standalone mode, fetch data from different
>> mysql instance and do some action, but i found the time is at second level.
>> >
>> > So i want to know if spark job is suitable for real time query which at
>> microseconds?
>>
>>
>

Re: Is spark suitable for real time query

Posted by Paolo Platter <pa...@agilelab.it>.
Are you using jdbc server?

Paolo

Sent from my Windows Phone
________________________________
From: Louis Hust<ma...@gmail.com>
Sent: 22/07/2015 13:47
To: Robin East<ma...@xense.co.uk>
Cc: user@spark.apache.org<ma...@spark.apache.org>
Subject: Re: Is spark suitable for real time query

I do a simple test using spark in standalone mode(not cluster),
 and found a simple action take a few seconds, the data size is small, just few rows.
So each spark job will cost some time for init or prepare work no matter what the job is?
I mean if the basic framework of spark job will cost seconds?

2015-07-22 19:17 GMT+08:00 Robin East <ro...@xense.co.uk>>:
Real-time is, of course, relative but you’ve mentioned microsecond level. Spark is designed to process large amounts of data in a distributed fashion. No distributed system I know of could give any kind of guarantees at the microsecond level.

Robin

> On 22 Jul 2015, at 11:14, Louis Hust <lo...@gmail.com>> wrote:
>
> Hi, all
>
> I am using spark jar in standalone mode, fetch data from different mysql instance and do some action, but i found the time is at second level.
>
> So i want to know if spark job is suitable for real time query which at microseconds?



Re: Is spark suitable for real time query

Posted by Louis Hust <lo...@gmail.com>.
I did a simple test using Spark in standalone mode (not a cluster),
and found that a simple action takes a few seconds even though the data size
is small, just a few rows.
So does each Spark job cost some time for init or prepare work, no matter
what the job is?
I mean, does the basic framework of a Spark job cost seconds?

2015-07-22 19:17 GMT+08:00 Robin East <ro...@xense.co.uk>:

> Real-time is, of course, relative but you’ve mentioned microsecond level.
> Spark is designed to process large amounts of data in a distributed
> fashion. No distributed system I know of could give any kind of guarantees
> at the microsecond level.
>
> Robin
>
> > On 22 Jul 2015, at 11:14, Louis Hust <lo...@gmail.com> wrote:
> >
> > Hi, all
> >
> > I am using spark jar in standalone mode, fetch data from different mysql
> instance and do some action, but i found the time is at second level.
> >
> > So i want to know if spark job is suitable for real time query which at
> microseconds?
>
>

Re: Is spark suitable for real time query

Posted by Petar Zecevic <pe...@gmail.com>.
You can try out a few tricks employed by the folks at Lynx Analytics;
Daniel Darabos gave some details at Spark Summit:
https://www.youtube.com/watch?v=zt1LdVj76LU&index=13&list=PL-x35fyliRwhP52fwDqULJLOnqnrN5nDs


On 22.7.2015. 17:00, Louis Hust wrote:
> My code like below:
>             Map<String, String> t11opt = new HashMap<String, String>();
>             t11opt.put("url", DB_URL);
>             t11opt.put("dbtable", "t11");
>             DataFrame t11 = sqlContext.load("jdbc", t11opt);
>             t11.registerTempTable("t11");
>
>             .......the same for t12, t21, t22
>
>
>             DataFrame t1 = t11.unionAll(t12);
>             t1.registerTempTable("t1");
>             DataFrame t2 = t21.unionAll(t22);
>             t2.registerTempTable("t2");
>             for (int i = 0; i < 10; i ++) {
>                 System.out.println(new Date(System.currentTimeMillis()));
>                 DataFrame crossjoin = sqlContext.sql("select txt from 
> t1 join t2 on t1.id = t2.id");
>                 crossjoin.show();
>                 System.out.println(new Date(System.currentTimeMillis()));
>             }
>
> Where t11,t12, t21,t22 are all table dataframe load from jdbc  of 
> mysql database which is at local with the spark job.
>
> But each loop execute about 3 seconds. i do not know why cost so many 
> time?
>
>
>
>
> 2015-07-22 19:52 GMT+08:00 Robin East <robin.east@xense.co.uk 
> <ma...@xense.co.uk>>:
>
>     Here’s an example using spark-shell on my laptop:
>
>     sc.textFile("LICENSE").filter(_ contains "Spark").count
>
>     This takes less than a second the first time I run it and is
>     instantaneous on every subsequent run.
>
>     What code are you running?
>
>
>>     On 22 Jul 2015, at 12:34, Louis Hust <louis.hust@gmail.com
>>     <ma...@gmail.com>> wrote:
>>
>>     I do a simple test using spark in standalone mode(not cluster),
>>      and found a simple action take a few seconds, the data size is
>>     small, just few rows.
>>     So each spark job will cost some time for init or prepare work no
>>     matter what the job is?
>>     I mean if the basic framework of spark job will cost seconds?
>>
>>     2015-07-22 19:17 GMT+08:00 Robin East <robin.east@xense.co.uk
>>     <ma...@xense.co.uk>>:
>>
>>         Real-time is, of course, relative but you’ve mentioned
>>         microsecond level. Spark is designed to process large amounts
>>         of data in a distributed fashion. No distributed system I
>>         know of could give any kind of guarantees at the microsecond
>>         level.
>>
>>         Robin
>>
>>         > On 22 Jul 2015, at 11:14, Louis Hust <louis.hust@gmail.com
>>         <ma...@gmail.com>> wrote:
>>         >
>>         > Hi, all
>>         >
>>         > I am using spark jar in standalone mode, fetch data from
>>         different mysql instance and do some action, but i found the
>>         time is at second level.
>>         >
>>         > So i want to know if spark job is suitable for real time
>>         query which at microseconds?
>>
>>
>
>


Re: Is spark suitable for real time query

Posted by Louis Hust <lo...@gmail.com>.
My code looks like the following (Spark 1.x Java API):

            Map<String, String> t11opt = new HashMap<String, String>();
            t11opt.put("url", DB_URL);
            t11opt.put("dbtable", "t11");
            DataFrame t11 = sqlContext.load("jdbc", t11opt);
            t11.registerTempTable("t11");

            // ... the same for t12, t21, t22

            DataFrame t1 = t11.unionAll(t12);
            t1.registerTempTable("t1");
            DataFrame t2 = t21.unionAll(t22);
            t2.registerTempTable("t2");
            for (int i = 0; i < 10; i++) {
                System.out.println(new Date(System.currentTimeMillis()));
                DataFrame crossjoin = sqlContext.sql(
                        "select txt from t1 join t2 on t1.id = t2.id");
                crossjoin.show();
                System.out.println(new Date(System.currentTimeMillis()));
            }

where t11, t12, t21, t22 are all DataFrames loaded over JDBC from a MySQL
database that is local to the Spark job.

But each loop iteration takes about 3 seconds, and I do not know why it
costs so much time.
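One possible tweak (my sketch, not from the original mail; it assumes the
variable names `t1`, `t2`, and `sqlContext` from the snippet above): cache
the unioned temp tables once before the loop, so each iteration reads from
executor memory instead of re-fetching the four tables from MySQL over JDBC.
`cacheTable` is the Spark 1.x API for caching a registered temp table.

```java
// Sketch: cache the registered temp tables once, before the timed loop.
sqlContext.cacheTable("t1");
sqlContext.cacheTable("t2");
sqlContext.sql("select count(*) from t1").show();  // materialize the caches
sqlContext.sql("select count(*) from t2").show();  // before timing starts

for (int i = 0; i < 10; i++) {
    long start = System.currentTimeMillis();
    sqlContext.sql("select txt from t1 join t2 on t1.id = t2.id").show();
    System.out.println("iteration: " + (System.currentTimeMillis() - start) + " ms");
}
```

Note that even with the tables cached, each call still pays Spark's
job-scheduling overhead (typically tens to hundreds of milliseconds), so
this can shrink the 3 seconds considerably but cannot reach microsecond
latency.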




2015-07-22 19:52 GMT+08:00 Robin East <ro...@xense.co.uk>:

> Here’s an example using spark-shell on my laptop:
>
> sc.textFile("LICENSE").filter(_ contains "Spark").count
>
> This takes less than a second the first time I run it and is instantaneous
> on every subsequent run.
>
> What code are you running?
>
>
> On 22 Jul 2015, at 12:34, Louis Hust <lo...@gmail.com> wrote:
>
> I do a simple test using spark in standalone mode(not cluster),
>  and found a simple action take a few seconds, the data size is small,
> just few rows.
> So each spark job will cost some time for init or prepare work no matter
> what the job is?
> I mean if the basic framework of spark job will cost seconds?
>
> 2015-07-22 19:17 GMT+08:00 Robin East <ro...@xense.co.uk>:
>
>> Real-time is, of course, relative but you’ve mentioned microsecond level.
>> Spark is designed to process large amounts of data in a distributed
>> fashion. No distributed system I know of could give any kind of guarantees
>> at the microsecond level.
>>
>> Robin
>>
>> > On 22 Jul 2015, at 11:14, Louis Hust <lo...@gmail.com> wrote:
>> >
>> > Hi, all
>> >
>> > I am using spark jar in standalone mode, fetch data from different
>> mysql instance and do some action, but i found the time is at second level.
>> >
>> > So i want to know if spark job is suitable for real time query which at
>> microseconds?
>>
>>
>
>

Re: Is spark suitable for real time query

Posted by Robin East <ro...@xense.co.uk>.
Real-time is, of course, relative but you’ve mentioned microsecond level. Spark is designed to process large amounts of data in a distributed fashion. No distributed system I know of could give any kind of guarantees at the microsecond level.

Robin

> On 22 Jul 2015, at 11:14, Louis Hust <lo...@gmail.com> wrote:
> 
> Hi, all
> 
> I am using spark jar in standalone mode, fetch data from different mysql instance and do some action, but i found the time is at second level.
> 
> So i want to know if spark job is suitable for real time query which at microseconds?


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org