Posted to user@spark.apache.org by rkishore999 <rk...@yahoo.com> on 2014/09/14 07:29:26 UTC

Spark SQL

val file = sc.textFile("hdfs://ec2-54-164-243-97.compute-1.amazonaws.com:9010/user/fin/events.txt")

1. val xyz = file.map(line => extractCurRate(sqlContext.sql(
     "select rate from CurrencyCodeRates" +
     " where txCurCode = '" + line.substring(202,205) + "'" +
     " and fxCurCode = '" + fxCurCodesMap(line.substring(77,82)) + "'" +
     " and effectiveDate >= '" + line.substring(221,229) + "'" +
     " order by effectiveDate desc")))

2. val xyz = file.map(line => sqlContext.sql(
     "select rate, txCurCode, fxCurCode, effectiveDate from CurrencyCodeRates" +
     " where txCurCode = 'USD' and fxCurCode = 'CSD'" +
     " and effectiveDate >= '20140901' order by effectiveDate desc"))

3. val xyz = sqlContext.sql(
     "select rate, txCurCode, fxCurCode, effectiveDate from CurrencyCodeRates" +
     " where txCurCode = 'USD' and fxCurCode = 'CSD'" +
     " and effectiveDate >= '20140901' order by effectiveDate desc")

xyz.saveAsTextFile("/user/output")

In statements 1 and 2 I'm getting a NullPointerException, but statement 3
works fine. I'm guessing the Spark context and the SQL context are not
going together well.

Any suggestions regarding how I can achieve this?


--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-tp14183.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.



Re: Spark SQL

Posted by Burak Yavuz <by...@stanford.edu>.
Hi,

I'm not a master of Spark SQL, but from what I understand, the problem is that you're trying to access an RDD
inside another RDD: in both statement 1 and statement 2 you call sqlContext.sql(...) inside file.map(line => ...).
RDDs can't be serialized into the tasks of other RDDs, which is why you're getting the NullPointerException.

More specifically, you are trying to generate a SchemaRDD inside an RDD, which you can't do.

If the file isn't huge, you can call .collect() to bring the RDD back to the driver as an Array, and then use .map() on that Array.
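
For example, here's a minimal sketch of the collect() approach, reusing extractCurRate and fxCurCodesMap
from your post (untested, since their definitions aren't shown):

// Runs entirely on the driver, so calling sqlContext here is safe.
val lines: Array[String] = file.collect()

val results = lines.map { line =>
  extractCurRate(sqlContext.sql(
    "select rate from CurrencyCodeRates" +
    " where txCurCode = '" + line.substring(202,205) + "'" +
    " and fxCurCode = '" + fxCurCodesMap(line.substring(77,82)) + "'" +
    " and effectiveDate >= '" + line.substring(221,229) + "'" +
    " order by effectiveDate desc"))
}

// Re-distribute the local results if you still want saveAsTextFile.
sc.parallelize(results).saveAsTextFile("/user/output")

Keep in mind this fires one SQL query per input line from the driver, so it's only reasonable for small inputs.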

If the file is huge, then you may do number 3 first, join the two RDDs using 'txCurCode' as the key, and then
do the filtering operations, etc., as in the sketch below.
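
Something along these lines (a rough sketch, assuming the schema from statement 3 with rate as a Double
and the other columns as Strings -- adjust the Row getters to your actual schema):

// In a standalone app you'd also need this for join() on pair RDDs:
import org.apache.spark.SparkContext._

// Run the SQL once instead of once per line.
val rates = sqlContext.sql(
  "select rate, txCurCode, fxCurCode, effectiveDate from CurrencyCodeRates")

// Key the rates by txCurCode.
val ratesByCode = rates.map(r =>
  (r.getString(1), (r.getString(2), r.getString(3), r.getDouble(0))))

// Key each input line by its txCurCode field (positions from your post).
val linesByCode = file.map(line => (line.substring(202,205), line))

// Join on txCurCode, then filter on fxCurCode and effectiveDate.
val joined = linesByCode.join(ratesByCode).filter {
  case (_, (line, (fxCode, effDate, rate))) =>
    fxCode == fxCurCodesMap(line.substring(77,82)) &&
    effDate >= line.substring(221,229)
}

You'd still need to pick the most recent effectiveDate within each group (the job the
order by effectiveDate desc did in your query) before saving.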

Best,
Burak


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org