Posted to dev@spark.apache.org by Andy Huang <an...@servian.com.au> on 2015/09/23 07:03:39 UTC
Fwd: Parallel collection in driver programs
Hi Devs,
Hopefully one of you knows more about this?
Thanks
Andy
---------- Forwarded message ----------
From: Andy Huang <an...@servian.com.au>
Date: Wed, Sep 23, 2015 at 12:39 PM
Subject: Parallel collection in driver programs
To: user@spark.apache.org
Hi All,
Would like to know if anyone has experience with parallel collections in the
driver program, and whether there is an actual advantage/disadvantage to doing so.
E.g. with a collection of JDBC connections and tables.
We have adapted our non-Spark code, which uses parallel collections, to the
Spark code and it seems to work fine.
import java.util.Properties

val conf = List(
  ("tbl1", "dbo.tbl1::tb1_id::0::127::128"),
  ("tbl2", "dbo.tbl2::tb2_id::0::31::32"),
  ("tbl3", "dbo.tbl3::tb3_id::0::63::64")
)

val _JDBC_DEFAULT  = "jdbc:sqlserver://192.168.52.1;database=TestSource"
val _STORE_DEFAULT = "hdfs://192.168.52.132:9000/"

val prop = new Properties()
prop.setProperty("user", "sa")
prop.setProperty("password", "password")

conf.par.map { pair =>
  // unpack "table::partitionColumn::lowerBound::upperBound::numPartitions"
  val Array(qry, pCol, lo, hi, part) = pair._2.split("::")

  // create a DataFrame from the JDBC table
  val jdbcDF = sqlContext.read.jdbc(
    _JDBC_DEFAULT,
    "(" + qry + ") a",
    pCol,       // partition column
    lo.toInt,   // lower bound
    hi.toInt,   // upper bound
    part.toInt, // number of partitions
    prop        // java.util.Properties - key/value pairs
  )

  // save to parquet
  jdbcDF.write.mode("overwrite").parquet(_STORE_DEFAULT + pair._1 + ".parquet")
}
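For what it's worth, the same driver-side fan-out can also be expressed with
scala.concurrent.Future instead of a parallel collection, which gives explicit
control over the thread pool and the wait. This is just a minimal, Spark-free
sketch of that pattern; the table names are placeholders and the string mapping
stands in for the read/write work done per table above:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Placeholder table list mirroring the conf list above.
val tables = List("tbl1", "tbl2", "tbl3")

// Future.traverse starts one Future per table on the global pool and
// collects the results in the original order -- the same concurrency
// shape as conf.par.map, but with an explicit ExecutionContext.
val results: Future[List[String]] =
  Future.traverse(tables)(t => Future(s"$t.parquet"))

// In the driver you would block (or register a callback) until all
// per-table jobs have been submitted and finished.
val written: List[String] = Await.result(results, 10.seconds)
println(written)
```

One practical difference: with a dedicated ExecutionContext you can cap how
many JDBC extracts run at once, whereas `.par` uses the default ForkJoin pool
sized to the number of cores.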
Thanks.
--
Andy Huang | Managing Consultant | Servian Pty Ltd | t: 02 9376 0700 |
f: 02 9376 0730| m: 0433221979