Posted to user@spark.apache.org by Sameer Tilak <ss...@live.com> on 2014/06/23 19:38:04 UTC

Basic Scala and Spark questions
Hi All,
I am new to Scala and Spark. I have a basic question. I have the following import statements in my Scala program. I want to pass my function (printScore) to Spark. It will compare a source string against each string in an RDD.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
/* import thirdparty jars */

I have the following method in my Scala class:

class DistanceClass
{
val ta = new textAnalytics();

def printScore(sourceStr: String, rdd: RDD[String])
{
// Third party jars have StringWrapper
val str1 = new StringWrapper(sourceStr)
val ta_ = this.ta;

rdd.map(str1, x => ta_.score(str1, StringWrapper(x))
}

I am using Eclipse for development. I have the following questions:
1. I get a "not found: type RDD" error. Can someone please tell me which jars I need to add as external jars and what I should add under my import statements so that this error goes away?
2. Also, is including StringWrapper(x) inside map OK? rdd.map(str1, x => ta_.score(str1, StringWrapper(x))
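For readers following along, the second question comes down to RDD.map's signature: map takes exactly one argument, a function over the element type, so a two-argument call like rdd.map(str1, x => ...) will not compile. A minimal well-typed sketch (the object and method names here are illustrative):

import org.apache.spark.rdd.RDD

object MapSignatureExample {
  // RDD.map takes a single one-argument function; each element is passed
  // to it in turn, and the results form the new RDD.
  def upperCased(rdd: RDD[String]): RDD[String] = rdd.map(x => x.toUpperCase)
}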

RE: Basic Scala and Spark questions

Posted by Sameer Tilak <ss...@live.com>.
Hi All,
I was able to solve both these issues. Thanks!
Just FYI:

For 1:

import org.apache.spark.rdd.RDD
For 2:

rdd.map(x => ta_.score(str1, new StringWrapper(x)))
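Putting both fixes together, a minimal sketch of the corrected class might look like this. textAnalytics and StringWrapper are the third-party API from the original post, and the collect-and-print at the end is an illustrative action, since map by itself is lazy:

import org.apache.spark.rdd.RDD

class DistanceClass {
  val ta = new textAnalytics()

  def printScore(sourceStr: String, rdd: RDD[String]): Unit = {
    val str1 = new StringWrapper(sourceStr)
    // Capture the field in a local val so the closure serializes just ta_,
    // not the whole enclosing DistanceClass instance.
    val ta_ = this.ta
    // map takes a single one-argument function.
    val scores = rdd.map(x => ta_.score(str1, new StringWrapper(x)))
    // map is lazy; run an action to actually compute and print the scores.
    scores.collect().foreach(println)
  }
}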

RE: Basic Scala and Spark questions

Posted by Sameer Tilak <ss...@live.com>.
Hi there,
Here is how I specify it during compilation.
scalac -classpath /apps/software/abc.jar:/apps/software/spark-1.0.0-bin-hadoop1/lib/datanucleus-api-jdo-3.2.1.jar:/apps/software/spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop1.0.4.jar:/apps/software/spark-1.0.0-bin-hadoop1/lib/datanucleus-core-3.2.2.jar Score.scala
Then I generate a jar file out of it, say myapp.jar.
Finally, to run this I do the following:
 ./spark-shell --jars /apps/software/abc.jar,/apps/software/myapp/myapp.jar

Hope this helps.
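For completeness, here is a minimal sketch of what the Score.scala driver being compiled above might contain. The object name follows the file name from the scalac command; the master URL and input path are illustrative assumptions:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Score {
  def main(args: Array[String]): Unit = {
    // local[2] is for testing only; pass the real cluster URL in production.
    val conf = new SparkConf().setAppName("Score").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val lines = sc.textFile("/path/to/input.txt") // hypothetical input path
    println(lines.count())
    sc.stop()
  }
}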

RE: Basic Scala and Spark questions

Posted by "Muttineni, Vinay" <vm...@ebay.com>.
Hello Tilak,
1. I get a "not found: type RDD" error. Can someone please tell me which jars I need to add as external jars and what I should add under my import statements so that this error goes away?
Do you not see any issues with the import statements?
Add the spark-assembly-1.0.0-hadoop2.2.0.jar file as a dependency.
You can download Spark from here (http://spark.apache.org/downloads.html). You'll find the above-mentioned jar in the lib folder.
Import statement: import org.apache.spark.rdd.RDD
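A quick way to verify the dependency and import are in place is a small stub like this (ImportCheck and lengths are hypothetical names; any method with RDD in its signature will do):

import org.apache.spark.rdd.RDD

object ImportCheck {
  // Compiles only when spark-assembly is on the classpath and RDD is
  // imported; otherwise scalac reports "not found: type RDD".
  def lengths(rdd: RDD[String]): RDD[Int] = rdd.map(_.length)
}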