Posted to user@spark.apache.org by Madabhattula Rajesh Kumar <mr...@gmail.com> on 2016/02/12 05:45:39 UTC

SparkSQL parallelism

Hi,

I have a Spark cluster with one master and 3 worker nodes. I have written the
code below to fetch records from Oracle using Spark SQL:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val employees = sqlContext.read.format("jdbc").options(
    Map("url" -> "jdbc:oracle:thin:@xxxx:1525:SID",
    "dbtable" -> "(select * from employee where name like '%18%')",
    "user" -> "username",
    "password" -> "password")).load

I submitted this job to the Spark cluster using the spark-submit command.



*It looks like all 3 workers are executing the same query and fetching the same
data, which means 3 JDBC calls are being made to Oracle.*
*How can I make this code issue a single JDBC call to Oracle when there is more
than one worker?*

Please help me resolve this.

Regards,
Rajesh

Re: SparkSQL parallelism

Posted by Rishi Mishra <rm...@snappydata.io>.
I am not sure why all 3 nodes would run the query. If you have not specified
any partitioning options, the JDBCRDD should have only one partition, and the
whole dataset should reside in it.
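
One way to check this is to print the partition count of the DataFrame from the
original post. The sketch below is only an illustration: the object name
JdbcPartitionCheck and the app name are hypothetical, and the connection
details are the placeholders from the question. The cache() step rests on an
assumption not established in this thread, namely that the extra queries seen
on Oracle come from running more than one action on the same DataFrame (by
default each action re-reads the source).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical driver object; submit it with spark-submit as in the question.
object JdbcPartitionCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JdbcPartitionCheck"))
    val sqlContext = new SQLContext(sc)

    // Same read as in the original post (placeholder URL and credentials).
    val employees = sqlContext.read.format("jdbc").options(
      Map("url" -> "jdbc:oracle:thin:@xxxx:1525:SID",
        "dbtable" -> "(select * from employee where name like '%18%')",
        "user" -> "username",
        "password" -> "password")).load()

    // With no partitionColumn/lowerBound/upperBound/numPartitions options,
    // the JDBC source is expected to report a single partition here.
    println(s"JDBC partitions: ${employees.rdd.partitions.length}")

    // Assumption: several actions are run on this DataFrame. Caching keeps the
    // fetched rows in Spark so later actions do not go back to Oracle.
    employees.cache()
    println(s"Row count: ${employees.count()}") // first action fills the cache
    employees.show(10)                          // read from the cached data

    sc.stop()
  }
}
```

If the printed partition count is 1, only one task reads the rows from Oracle,
whichever worker that task happens to be scheduled on.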


-- 
Regards,
Rishitesh Mishra,
SnappyData . (http://www.snappydata.io/)

https://in.linkedin.com/in/rishiteshmishra