Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2016/11/23 20:45:40 UTC
Spark Shell doesn't seem to use Spark workers, but Spark Submit does.
Hi All,
Spark Shell doesn't seem to use the Spark workers, but spark-submit does. I
have the workers' IPs listed in the conf/slaves file.
I am trying to count the number of rows in a Cassandra table using
spark-shell, so I run the following on the Spark master:
val df = spark.sql("SELECT test FROM hello") // the table has about a billion rows
scala> df.count
[Stage 0:=> (686 + 2) / 24686] // What do these numbers mean, precisely?
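For reference, the numbers in the shell's progress bar are (completed tasks + running tasks) / total tasks for that stage, so (686 + 2) / 24686 means 686 of 24,686 tasks have finished and 2 are running. Also worth checking: a common reason spark-shell runs only on the master while spark-submit uses the cluster is that the shell was launched without a --master URL and fell back to local mode. A quick check, with spark://master-host:7077 as a placeholder for the actual standalone master URL:

```shell
# Inside the shell, see which master the session is attached to;
# "local[*]" means it is NOT using the standalone cluster:
#   scala> spark.sparkContext.master

# If so, relaunch the shell against the standalone master (placeholder URL):
./bin/spark-shell --master spark://master-host:7077
```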
This is taking forever, so I checked I/O, CPU, and network usage with dstat,
iostat, and so on. It looks like nothing is going on on the worker machines,
but I can see activity on the master.
I am using Spark 2.0.2.
Any ideas on what is going on, and how to fix it?
Thanks,
kant
Re: Spark Shell doesn't seem to use Spark workers, but Spark Submit does.
Posted by kant kodali <ka...@gmail.com>.
Sorry, please ignore this if you like. It looks like the network throughput
is very low, but every worker/executor machine is indeed doing work.
The current incoming network throughput on each worker machine is about
2.5 KB/s (kilobytes per second), when it needs to be somewhere around
5-6 MB/s. That suggests the table scan for counting the billion rows in
Cassandra is not being done in parallel.
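To put those rates in perspective, here is a back-of-envelope sketch (the row size and worker count are made-up assumptions for illustration, not measured values) of how long a full scan of a billion rows would take at the observed 2.5 KB/s per node versus a healthier ~6 MB/s per node:

```python
# Back-of-envelope: full-scan time at the observed per-node rate
# vs. a healthier one. Row size and node count are assumptions.

rows = 1_000_000_000
bytes_per_row = 100          # assumed average row size in bytes
nodes = 3                    # assumed number of worker machines

total_bytes = rows * bytes_per_row

observed = 2.5 * 1024        # 2.5 KB/s per node, as observed
healthy = 6 * 1024 * 1024    # ~6 MB/s per node

def scan_hours(rate_per_node):
    # hours to pull the whole table, assuming all nodes read in parallel
    return total_bytes / (rate_per_node * nodes) / 3600

print(f"at 2.5 KB/s per node: {scan_hours(observed):,.0f} hours")
print(f"at ~6 MB/s per node:  {scan_hours(healthy):,.1f} hours")
```

Under these assumptions the scan takes thousands of hours (months) at the observed rate, but well under two hours at ~6 MB/s per node, which is why the low throughput matters so much.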