Posted to user@spark.apache.org by jaykatukuri <jk...@apple.com> on 2015/03/16 17:08:33 UTC
RDD to DataFrame for using ALS under org.apache.spark.ml.recommendation.ALS
Hi all,
I am trying to use the new ALS implementation under
org.apache.spark.ml.recommendation.ALS.
The new method to invoke for training seems to be:
override def fit(dataset: DataFrame, paramMap: ParamMap): ALSModel
How do I create a DataFrame object from a ratings data set that is on HDFS?
By contrast, the method in the old ALS implementation under
org.apache.spark.mllib.recommendation.ALS was:
def train(
    ratings: RDD[Rating],
    rank: Int,
    iterations: Int,
    lambda: Double,
    blocks: Int,
    seed: Long
  ): MatrixFactorizationModel
My code to run the old ALS train method is as below:

val sc = new SparkContext(conf)
val pfile = args(0)
val purchase = sc.textFile(pfile)
val ratings = purchase.map(_.split(',') match {
  case Array(user, item, rate) =>
    Rating(user.toInt, item.toInt, rate.toInt)
})
val model = ALS.train(ratings, rank, numIterations, 0.01)
Now, for the new ALS fit method, I am trying to use the code below, but I am
getting a compilation error:

val als = new ALS()
  .setRank(rank)
  .setRegParam(regParam)
  .setImplicitPrefs(implicitPrefs)
  .setNumUserBlocks(numUserBlocks)
  .setNumItemBlocks(numItemBlocks)
val sc = new SparkContext(conf)
val pfile = args(0)
val purchase = sc.textFile(pfile)
val ratings = purchase.map(_.split(',') match {
  case Array(user, item, rate) =>
    Rating(user.toInt, item.toInt, rate.toInt)
})
val model = als.fit(ratings.toDF())
I get an error that the method toDF() is not a member of
org.apache.spark.rdd.RDD[org.apache.spark.ml.recommendation.ALS.Rating[Int]].
Appreciate the help!
Thanks,
Jay
Re: RDD to DataFrame for using ALS under org.apache.spark.ml.recommendation.ALS
Posted by Chang Lim <ch...@gmail.com>.
After this line:
val sc = new SparkContext(conf)
You need to add this line:
import sc.implicits._ // used to implicitly convert an RDD to a DataFrame
Hope this helps.
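A note on the line above: in Spark 1.3 the toDF() implicits actually live on
SQLContext rather than on SparkContext (Xiangrui confirms the
`import sqlContext.implicits._` form further down in the thread), so a minimal
sketch of the full conversion, with the HDFS path and column names assumed,
looks like:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val sc = new SparkContext(new SparkConf().setAppName("RatingsToDF"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._ // brings toDF() into scope for RDDs of tuples

// Parse "user,item,rate" lines into a typed RDD, then name the columns.
val ratings = sc.textFile("hdfs:///path/to/ratings")
  .map(_.split(',') match { case Array(user, item, rate) =>
    (user.toInt, item.toInt, rate.toFloat)
  })
  .toDF("user", "item", "rating")

The names "user", "item", "rating" match what the new ml ALS expects by
default; they can be overridden with setUserCol, setItemCol, and setRatingCol.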
Re: org.apache.spark.ml.recommendation.ALS
Posted by Xiangrui Meng <me...@gmail.com>.
Yes, I think the default Spark builds are on Scala 2.10. You need to
follow the instructions at
http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211
to build 2.11 packages. -Xiangrui
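For reference, the linked page's Scala 2.11 instructions boil down to two
commands (as of the 1.3-era docs; the Hadoop profile here is an assumption
matching the prebuilt package used later in this thread):

dev/change-version-to-2.11.sh
mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package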
> On Mon, Apr 6, 2015 at 12:27 PM, Jay Katukuri <jk...@apple.com> wrote:
>
> Hi,
>
> Here is the stack trace:
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
> at ALSNew$.main(ALSNew.scala:35)
> at ALSNew.main(ALSNew.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> Thanks,
> Jay
>
> On Apr 6, 2015, at 12:24 PM, Xiangrui Meng <me...@gmail.com> wrote:
>
> Please attach the full stack trace. -Xiangrui
>
> On Mon, Apr 6, 2015 at 12:06 PM, Jay Katukuri <jk...@apple.com> wrote:
>
> Hi all,
>
> I got a runtime error while running the ALS.
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
>
> The error occurs at the following code:
>
> val ratings = purchase.map(line =>
>   line.split(',') match { case Array(user, item, rate) =>
>     (user.toInt, item.toInt, rate.toFloat)
>   }).toDF()
>
> Any help is appreciated!
>
> I have tried passing the spark-sql jar using -jar spark-sql_2.11-1.3.0.jar.
>
> Thanks,
> Jay
>
> On Mar 17, 2015, at 12:50 PM, Xiangrui Meng <me...@gmail.com> wrote:
>
> Please remember to copy the user list next time. I might not be able
> to respond quickly. There are many others who can help or who can
> benefit from the discussion. Thanks! -Xiangrui
>
> On Tue, Mar 17, 2015 at 12:04 PM, Jay Katukuri <jk...@apple.com> wrote:
>
> Great Xiangrui. It works now.
>
> Sorry that I needed to bug you :)
>
> Jay
>
> On Mar 17, 2015, at 11:48 AM, Xiangrui Meng <me...@gmail.com> wrote:
>
> Please check this section in the user guide:
> http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection
>
> You need `import sqlContext.implicits._` to use `toDF()`.
>
> -Xiangrui
>
> On Mon, Mar 16, 2015 at 2:34 PM, Jay Katukuri <jk...@apple.com> wrote:
>
> Hi Xiangrui,
> Thanks a lot for the quick reply.
>
> I am still facing an issue.
>
> I have tried the code snippet that you suggested:
>
> val ratings = purchase.map { line =>
> line.split(',') match { case Array(user, item, rate) =>
> (user.toInt, item.toInt, rate.toFloat)
> }.toDF("user", "item", "rate”)}
>
> For this, I got the error below:
>
> error: ';' expected but '.' found.
> [INFO] }.toDF("user", "item", "rate”)}
> [INFO] ^
>
> When I tried the code below:
>
> val ratings = purchase.map(line =>
>   line.split(',') match { case Array(user, item, rate) =>
>     (user.toInt, item.toInt, rate.toFloat)
>   }).toDF("user", "item", "rate")
>
> I got:
>
> error: value toDF is not a member of org.apache.spark.rdd.RDD[(Int, Int, Float)]
> [INFO] possible cause: maybe a semicolon is missing before `value toDF'?
> [INFO] }).toDF("user", "item", "rate")
>
> I have looked at the document that you shared and tried the following code:
>
> case class Record(user: Int, item: Int, rate: Double)
> val ratings = purchase.map(_.split(',')).map(r =>
>   Record(r(0).toInt, r(1).toInt, r(2).toDouble)).toDF("user", "item", "rate")
>
> For this, I got the error below:
>
> error: value toDF is not a member of org.apache.spark.rdd.RDD[Record]
>
> Appreciate your help!
>
> Thanks,
> Jay
>
> On Mar 16, 2015, at 11:35 AM, Xiangrui Meng <me...@gmail.com> wrote:
>
> Try this:
>
> val ratings = purchase.map { line =>
> line.split(',') match { case Array(user, item, rate) =>
> (user.toInt, item.toInt, rate.toFloat)
> }.toDF("user", "item", "rate")
>
> Doc for DataFrames:
> http://spark.apache.org/docs/latest/sql-programming-guide.html
>
> -Xiangrui
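A note on the snippet quoted just above: as posted, the outer map { ... }
block is never closed, which is what produced the "';' expected but '.' found"
error reported earlier in this message chain. A version that parses, assuming
`import sqlContext.implicits._` is in scope, is:

val ratings = purchase.map { line =>
  line.split(',') match { case Array(user, item, rate) =>
    (user.toInt, item.toInt, rate.toFloat)
  }
}.toDF("user", "item", "rate")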
Re: org.apache.spark.ml.recommendation.ALS
Posted by Jay Katukuri <jk...@apple.com>.
Hi Xiangrui,

Here is the class:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.{Row, SQLContext}

object ALSNew {

  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("TrainingDataPurchase")
      .set("spark.executor.memory", "4g")
    conf.set("spark.shuffle.memoryFraction", "0.65") // default is 0.2
    conf.set("spark.storage.memoryFraction", "0.3")  // default is 0.6

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val pfile = args(0)
    val purchase = sc.textFile(pfile)

    // Parse "user,item,rate" lines; name the columns so that they match the
    // defaults ALS expects (userCol/itemCol/ratingCol) and the select() below.
    val ratings = purchase.map(line =>
      line.split(',') match { case Array(user, item, rate) =>
        (user.toInt, item.toInt, rate.toFloat)
      }).toDF("user", "item", "rating")

    val rank = args(1).toInt
    val numIterations = args(2).toInt // parsed but not wired to ALS (setMaxIter would do that)
    val regParam: Double = 0.01
    val implicitPrefs: Boolean = true
    val numUserBlocks: Int = 100
    val numItemBlocks: Int = 100
    val nonnegative: Boolean = true

    //val paramMap = ParamMap(regParam = 0.01)
    //paramMap.put(numUserBlocks = 100, numItemBlocks = 100)
    val als = new ALS()
      .setRank(rank)
      .setRegParam(regParam)
      .setImplicitPrefs(implicitPrefs)
      .setNumUserBlocks(numUserBlocks)
      .setNumItemBlocks(numItemBlocks)

    val alpha = als.getAlpha

    val model = als.fit(ratings)

    val predictions = model.transform(ratings)
      .select("rating", "prediction")
      .map { case Row(rating: Float, prediction: Float) =>
        (rating.toDouble, prediction.toDouble)
      }

    val rmse =
      if (implicitPrefs) {
        // TODO: Use a better (rank-based?) evaluation metric for implicit feedback.
        // We limit the ratings and the predictions to the interval [0, 1] and
        // compute the weighted RMSE with the confidence scores as weights.
        val (totalWeight, weightedSumSq) = predictions.map { case (rating, prediction) =>
          val confidence = 1.0 + alpha * math.abs(rating)
          val rating01 = math.max(math.min(rating, 1.0), 0.0)
          val prediction01 = math.max(math.min(prediction, 1.0), 0.0)
          val err = prediction01 - rating01
          (confidence, confidence * err * err)
        }.reduce { case ((c0, e0), (c1, e1)) =>
          (c0 + c1, e0 + e1)
        }
        math.sqrt(weightedSumSq / totalWeight)
      } else {
        val mse = predictions.map { case (rating, prediction) =>
          val err = rating - prediction
          err * err
        }.mean()
        math.sqrt(mse)
      }

    println("Root Mean Squared Error = " + rmse)
  }
}
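For reference, the implicit-feedback branch above computes a
confidence-weighted RMSE: with confidence c_i = 1 + alpha * |r_i| and both
ratings and predictions clipped to [0, 1], it evaluates

  rmse = sqrt( (sum_i c_i * (p_i - r_i)^2) / (sum_i c_i) )

so observations with larger |r_i|, i.e. higher confidence, contribute more to
the error.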
I am using the following in my maven build (pom.xml):

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.11.2</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>1.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>1.3.0</version>
  </dependency>
</dependencies>

I am using Scala version 2.11.2.

Could it be that "spark-1.3.0-bin-hadoop2.4.tgz" requires a different version of Scala?
Thanks,
Jay
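As Xiangrui notes earlier in the thread, the prebuilt spark-1.3.0-bin-hadoop2.4
binaries are built against Scala 2.10, so if rebuilding Spark for 2.11 is not
an option, the other way to resolve the mismatch is to compile the application
against 2.10. A sketch of the matching dependency block (the 2.10.4 patch
version is an assumption):

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.4</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.3.0</version>
  </dependency>
</dependencies>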
Re: org.apache.spark.ml.recommendation.ALS
Posted by Xiangrui Meng <me...@gmail.com>.
Could you share ALSNew.scala? Which Scala version did you use? -Xiangrui
Re: org.apache.spark.ml.recommendation.ALS
Posted by Jay Katukuri <jk...@apple.com>.
Hi Xiangrui,

I tried running this on my local machine (laptop) and got the same error.

Here is what I did:
1. Downloaded the Spark 1.3.0 release version (prebuilt for Hadoop 2.4 and later), "spark-1.3.0-bin-hadoop2.4.tgz".
2. Ran the following command:

spark-submit --class ALSNew --master local[8] ALSNew.jar /input_path

The stack trace is exactly the same.
Thanks,
Jay
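Since the same NoSuchMethodError reproduces locally against the prebuilt
binaries, it points at a compile-time Scala mismatch rather than at the
cluster. A quick diagnostic, nothing Spark-specific, is to print the runtime
Scala version from the prebuilt spark-shell and compare it with the 2.11.2
the application was compiled against:

// Prints the Scala version actually on the classpath at run time,
// e.g. "version 2.10.4" in the prebuilt 1.3.0 shell.
println(scala.util.Properties.versionString)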
Re: org.apache.spark.ml.recommendation.ALS
Posted by Jay Katukuri <jk...@apple.com>.
Some additional context:

Since I am using features of Spark 1.3.0, I downloaded Spark 1.3.0 and used spark-submit from there.
The cluster is still on Spark 1.2.0.

So it looks to me like, at runtime, the executors could not find some Spark 1.3.0 libraries, even though I ran spark-submit from my downloaded Spark 1.3.0.
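If the 1.2.0 cluster were the cause, one workaround on YARN is to point the
executors at the downloaded 1.3.0 assembly explicitly via the spark.yarn.jar
property; a sketch, with the assembly path assumed:

spark-submit --class packagename.ALSNew --master yarn \
  --conf spark.yarn.jar=hdfs:///path/to/spark-assembly-1.3.0-hadoop2.4.0.jar \
  ALSNew.jar hdfs://input_path

That said, a NoSuchMethodError on a scala.reflect signature points more toward
the Scala 2.10/2.11 mismatch discussed above; missing Spark 1.3.0 classes
would usually surface as ClassNotFoundException instead.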
Re: org.apache.spark.ml.recommendation.ALS
Posted by Jay Katukuri <jk...@apple.com>.
Here is the command that I have used:

spark-submit --class packagename.ALSNew --num-executors 100 --master yarn ALSNew.jar -jar spark-sql_2.11-1.3.0.jar hdfs://input_path

Btw, I could run the old ALS in the mllib package.
Re: org.apache.spark.ml.recommendation.ALS
Posted by Xiangrui Meng <me...@gmail.com>.
So ALSNew.scala is your own application. Did you run it with
spark-submit or spark-shell? The correct command should look like
spark-submit --class your.package.name.ALSNew ALSNew.jar [options]
Please check the documentation:
http://spark.apache.org/docs/latest/submitting-applications.html
-Xiangrui
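For example, a submission that also ships the SQL dependency could look like
this (jar names, master, and input path are hypothetical; the _2.x suffix of
every jar must match the Scala version of the cluster's Spark build):

spark-submit \
  --class your.package.name.ALSNew \
  --master yarn-client \
  --jars spark-sql_2.10-1.3.0.jar \
  ALSNew.jar hdfs:///path/to/ratings.csv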
Re: org.apache.spark.ml.recommendation.ALS
Posted by Jay Katukuri <jk...@apple.com>.
Hi,
Here is the stack trace:
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
at ALSNew$.main(ALSNew.scala:35)
at ALSNew.main(ALSNew.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Thanks,
Jay
On Apr 6, 2015, at 12:24 PM, Xiangrui Meng <me...@gmail.com> wrote:
> Please attach the full stack trace. -Xiangrui
Re: org.apache.spark.ml.recommendation.ALS
Posted by Xiangrui Meng <me...@gmail.com>.
Please attach the full stack trace. -Xiangrui
org.apache.spark.ml.recommendation.ALS
Posted by Jay Katukuri <jk...@apple.com>.
Hi all,
I got a runtime error while running ALS.
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaUniverse$JavaMirror;
The error that I am getting is at the following code:
val ratings = purchase.map ( line =>
line.split(',') match { case Array(user, item, rate) =>
(user.toInt, item.toInt, rate.toFloat)
}).toDF()
Any help is appreciated!
I have tried passing the spark-sql jar using --jars spark-sql_2.11-1.3.0.jar
Thanks,
Jay
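A NoSuchMethodError on scala.reflect.api.JavaUniverse.runtimeMirror is
typically a Scala version mismatch: code compiled for Scala 2.11 (such as a
spark-sql_2.11 jar) running against a Spark build compiled for Scala 2.10, or
vice versa. A minimal build.sbt sketch of a consistent setup (all version
numbers are illustrative):

// build.sbt -- keep scalaVersion and the _2.x suffix that %% appends to
// every spark-* artifact aligned with the Scala version of the cluster's
// Spark build; otherwise reflection-based calls such as toDF() can fail
// at runtime with exactly this NoSuchMethodError.
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.3.0" % "provided",
  "org.apache.spark" %% "spark-sql"   % "1.3.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.3.0" % "provided"
)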
Re: RDD to DataFrame for using ALS under org.apache.spark.ml.recommendation.ALS
Posted by Xiangrui Meng <me...@gmail.com>.
Please remember to copy the user list next time. I might not be able
to respond quickly. There are many others who can help or who can
benefit from the discussion. Thanks! -Xiangrui
On Tue, Mar 17, 2015 at 12:04 PM, Jay Katukuri <jk...@apple.com> wrote:
> Great Xiangrui. It works now.
>
> Sorry that I needed to bug you :)
>
> Jay
>
Re: RDD to DataFrame for using ALS under org.apache.spark.ml.recommendation.ALS
Posted by Xiangrui Meng <me...@gmail.com>.
Please check this section in the user guide:
http://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection
You need `import sqlContext.implicits._` to use `toDF()`.
-Xiangrui
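For reference, a minimal end-to-end sketch under Spark 1.3 that combines the
import with the new ALS API (column names, parameter values, and the input
path are illustrative):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.ml.recommendation.ALS

object ALSNewExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ALSNewExample"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._ // brings toDF() into scope for RDDs of tuples

    // Parse "user,item,rate" lines; a malformed line throws a MatchError.
    val ratings = sc.textFile(args(0)).map { line =>
      line.split(',') match {
        case Array(user, item, rate) => (user.toInt, item.toInt, rate.toFloat)
      }
    }.toDF("user", "item", "rating")

    val als = new ALS()
      .setRank(10)
      .setRegParam(0.01)
      .setUserCol("user")
      .setItemCol("item")
      .setRatingCol("rating")

    val model = als.fit(ratings)
  }
}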
On Mon, Mar 16, 2015 at 2:34 PM, Jay Katukuri <jk...@apple.com> wrote:
> Hi Xiangrui,
> Thanks a lot for the quick reply.
>
> I am still facing an issue.
>
> I have tried the code snippet that you have suggested:
>
> val ratings = purchase.map { line =>
> line.split(',') match { case Array(user, item, rate) =>
> (user.toInt, item.toInt, rate.toFloat)
> }.toDF("user", "item", "rate”)}
>
> for this, I got the below error:
>
> error: ';' expected but '.' found.
> [INFO] }.toDF("user", "item", "rate”)}
> [INFO] ^
>
> when I tried below code
>
> val ratings = purchase.map ( line =>
> line.split(',') match { case Array(user, item, rate) =>
> (user.toInt, item.toInt, rate.toFloat)
> }).toDF("user", "item", "rate")
>
>
> error: value toDF is not a member of org.apache.spark.rdd.RDD[(Int, Int,
> Float)]
> [INFO] possible cause: maybe a semicolon is missing before `value toDF'?
> [INFO] }).toDF("user", "item", "rate")
>
>
>
> I have looked at the document that you have shared and tried the following
> code:
>
> case class Record(user: Int, item: Int, rate:Double)
> val ratings = purchase.map(_.split(',')).map(r =>Record(r(0).toInt,
> r(1).toInt, r(2).toDouble)) .toDF("user", "item", "rate")
>
> for this, I got the below error:
>
> error: value toDF is not a member of org.apache.spark.rdd.RDD[Record]
>
>
> Appreciate your help !
>
> Thanks,
> Jay
Re: RDD to DataFrame for using ALS under org.apache.spark.ml.recommendation.ALS
Posted by Xiangrui Meng <me...@gmail.com>.
Try this:
val ratings = purchase.map { line =>
line.split(',') match { case Array(user, item, rate) =>
(user.toInt, item.toInt, rate.toFloat)
}.toDF("user", "item", "rate")
Doc for DataFrames:
http://spark.apache.org/docs/latest/sql-programming-guide.html
-Xiangrui
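Note that, as quoted, this snippet never closes the map block, so it does not
parse; a version that compiles, assuming `import sqlContext.implicits._` is in
scope, would be:

val ratings = purchase.map { line =>
  line.split(',') match {
    case Array(user, item, rate) => (user.toInt, item.toInt, rate.toFloat)
  }
}.toDF("user", "item", "rate")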
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org