Posted to user@spark.apache.org by Ashish Soni <as...@gmail.com> on 2015/07/02 14:40:48 UTC
Spark SQL and Streaming - How to execute JDBC Query only once
Hi All,

I have a stream of events coming in and I want to fetch some additional
data from the database based on the values in the incoming data. For
example, below is the data coming in:

loginName
Email
address
city

Now for each login name I need to go to the Oracle database and get the
userId, *but I do not want to hit the database again and again; instead I
want to load the complete table in memory and then find the user id based
on the incoming data.*
JavaRDD<String> rdd = sc.textFile("/home/spark/workspace/data.csv")
    .map(new Function<String, String>() {
        @Override
        public String call(String s) {
            String[] str = s.split(",");
            // How do I load the complete table in memory and use it here?
            // When I create the DataFrame outside the map I get a stage
            // failure error.
            DataFrame dbRdd = sqlContext.read().format("jdbc").options(options).load();
            System.out.println(dbRdd.filter("login_nm='" + str[0] + "'").count());
            return str[0];
        }
    });
How can I achieve this? Please suggest.

Thanks
Re: Spark SQL and Streaming - How to execute JDBC Query only once
Posted by Raghavendra Pandey <ra...@gmail.com>.
This will not work, i.e., using a DataFrame inside a map function.
Although you can try to create the DataFrame separately and cache it...
Then you can join your event stream with this DataFrame; a sketch follows.
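
A minimal sketch of that idea, assuming the same sqlContext and options
map from the question, a user table with login_nm and user_id columns
(the column names are illustrative), and the 1.4-era Java DataFrame API:

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Load the user table once on the driver and cache it, instead of
// issuing one JDBC query per incoming record.
DataFrame users = sqlContext.read().format("jdbc").options(options).load();
users.cache();

// Turn the incoming lines into a DataFrame with a login_nm column.
JavaRDD<Row> rows = sc.textFile("/home/spark/workspace/data.csv")
    .map(new Function<String, Row>() {
        @Override
        public Row call(String s) {
            return RowFactory.create(s.split(",")[0]);
        }
    });
StructType schema = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("login_nm", DataTypes.StringType, false)
});
DataFrame events = sqlContext.createDataFrame(rows, schema);

// One join against the cached table instead of a query per event;
// user_id comes along from the user table.
DataFrame withUserId = events.join(users, "login_nm");
withUserId.show();

Because users is cached, the JDBC table is read only once and later
actions reuse the in-memory copy. For a streaming job you could apply
the same join inside a transform() on the DStream.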