Posted to user@spark.apache.org by kant kodali <ka...@gmail.com> on 2016/11/23 20:45:40 UTC

Spark Shell doesn't seem to use Spark workers, but Spark Submit does.

Hi All,


spark-shell doesn't seem to use the Spark workers, but spark-submit does. I
have the worker IPs listed in the conf/slaves file.

I am trying to count the number of rows in a Cassandra table using
spark-shell, so I run the following on the Spark master:

val df = spark.sql("SELECT test from hello") // This has about a billion rows

scala> df.count

[Stage 0:=>  (686 + 2) / 24686] // What are these numbers precisely?
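
On the progress line above: Spark's console progress bar prints the stage ID, then (completed tasks + currently running tasks) / total tasks for that stage. So here stage 0 has 686 tasks finished, 2 running, and 24686 in total. A minimal sketch that pulls those numbers out of such a line, assuming only the format shown; StageProgress, ProgressLine, and parseProgress are hypothetical names, not part of Spark:

```scala
// Hypothetical helper, not a Spark API: holds the numbers shown in the
// console progress bar for one stage.
final case class StageProgress(stageId: Int, completed: Int, active: Int, total: Int) {
  // Share of the stage's tasks that have finished, in percent.
  def percentDone: Double = 100.0 * completed / total
}

// Matches lines such as "[Stage 0:=>  (686 + 2) / 24686]".
val ProgressLine = """\[Stage (\d+):.*\((\d+) \+ (\d+)\) / (\d+)\]""".r.unanchored

def parseProgress(line: String): Option[StageProgress] = line match {
  case ProgressLine(id, done, running, total) =>
    Some(StageProgress(id.toInt, done.toInt, running.toInt, total.toInt))
  case _ => None
}

val progress = parseProgress("[Stage 0:=>  (686 + 2) / 24686]")
progress.foreach { p =>
  println(s"stage ${p.stageId}: ${p.completed} done, ${p.active} running, ${p.total} total")
  println(f"${p.percentDone}%.1f%% complete")
}
```

With only 2 tasks running at a time out of 24686, the job is less than 3 percent done, which is consistent with the count appearing to take forever.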

This is taking forever, so I checked I/O, CPU, and network usage with
dstat, iostat, and so on. It looks like nothing is happening on the worker
machines, but I can see activity on the master.

I am using Spark 2.0.2.

Any ideas on what is going on, and how to fix it?

Thanks,

kant

Re: Spark Shell doesn't seem to use Spark workers, but Spark Submit does.

Posted by kant kodali <ka...@gmail.com>.
Sorry, please ignore this if you like. It looks like the network throughput
is very low, but every worker/executor machine is indeed working.

My current incoming network throughput on each worker machine is about
2.5 KB/s (kilobytes per second), whereas it should be somewhere around
5-6 MB/s. That suggests the table scan for counting a billion rows in
Cassandra is somehow not being done in parallel.
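
One way to narrow this down from inside the same spark-shell session is to look at which master the shell is attached to and how much parallelism it has. The lines below are a sketch, not a diagnosis: they assume the predefined `sc` and the `df` from the original message, and they only print what the session reports. If `sc.master` shows `local[*]` rather than a `spark://...` URL, the shell was started without a cluster master (for example without `--master` or a `spark.master` entry in conf/spark-defaults.conf) and is running everything on the local machine.

```scala
// Run inside spark-shell, where `sc` is predefined and `df` is the
// DataFrame from the original message.

// Which master this shell is attached to, e.g. local[*] or spark://<host>:7077.
println(s"master:              ${sc.master}")

// Default number of tasks Spark will try to run in parallel for this session.
println(s"default parallelism: ${sc.defaultParallelism}")

// Block managers known to the driver (the driver itself is included),
// which shows whether any executors on the workers have registered.
sc.getExecutorMemoryStatus.keys.foreach(addr => println(s"block manager:       $addr"))

// Number of partitions, and therefore tasks, the scan is split into.
println(s"partitions:          ${df.rdd.getNumPartitions}")
```

If the shell is attached to the cluster and executors are registered, yet only a couple of tasks run at once, the number of cores granted to the application would be the next thing to check.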
