Posted to user@spark.apache.org by Kristina Rogale Plazonic <kp...@gmail.com> on 2015/10/05 19:05:19 UTC

Where to put import sqlContext.implicits._ to be able to work on DataFrames in another file?

Hi all,

I have a Scala project with multiple files: a main file and a file with
utility functions on DataFrames. However, using $"colname" to refer to a
column of the DataFrame in the utils file (see code below) produces a
compile-time error as follows:

"value $ is not a member of StringContext"

My utils code works fine if either:
-  I work in the spark-shell, or
-  I pass sqlContext as a parameter to each util function and do
   import sqlContext.implicits._ inside each util function (as below)
(but that solution seems ugly and onerous to me)
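A partial workaround I've tried is to avoid $ altogether and build column
references with col(...) from org.apache.spark.sql.functions, which needs
no implicits at all (a sketch, using the same toy DataFrame as below):

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

object HelloUtilsNoImplicits {
  // col("name") builds a Column directly, so no sqlContext.implicits._
  // import is needed anywhere in this file
  def test(adf: DataFrame) =
    adf.filter(col("name") === "a").show()
}
```

But I'd still prefer to keep the $"colname" syntax if possible.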

My questions:

1. Shouldn't the implicits be part of a companion object (e.g. of the
object SQLContext) rather than of a particular instance like sqlContext?
If they were part of the companion object, they could be imported at the
top of each file.

2. Where can I put import sqlContext.implicits._ in order not to invoke it
in every function?

3. Googling, I saw that Scala 2.11 might solve this problem. But wouldn't
that cause compatibility problems with jars built for 2.10? (I'd rather
stick with 2.10.)

Many thanks for any suggestions and insights!
Kristina

My toy code (my use case is data munging in preparation for ml/mllib, and
I wanted to separate the preprocessing of the data into another file):

Hello.scala:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

import HelloUtils._

object HelloWorld {
    // defined at object level so toDF can find a TypeTag for it
    case class RecordTest(name: String, category: String, age: Int)

    def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("HelloDataFrames")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)

        import sqlContext.implicits._

        val adf = sc.parallelize(Seq(
                 RecordTest("a", "cat1", 1),
                 RecordTest("b", "cat2", 5))).toDF

        test(adf, sqlContext)  // calling the function in HelloUtils
    }
}

HelloUtils.scala:

import org.apache.spark.sql.{DataFrame, SQLContext}

object HelloUtils {

  def test(adf: DataFrame, sqlContext: SQLContext) = {
    import sqlContext.implicits._   // I want to get rid of this line
    adf.filter($"name" === "a").show()
  }

  // desired way of writing the test() function (does not compile as-is)
  def testDesired(adf: DataFrame) =
    adf.filter($"name" === "a").show()
}
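For what it's worth, the closest I've come to the desired form is keeping
the SQLContext in a singleton object, so the implicits import has a stable
path and can sit at the top of any file (a sketch; the name AppContext is
mine, and I'm not sure this is idiomatic):

```scala
// AppContext.scala -- singleton holding the one SQLContext for the app
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object AppContext {
  // lazy so the SparkContext isn't created until first use
  lazy val sc = new SparkContext(new SparkConf().setAppName("HelloDataFrames"))
  lazy val sqlContext = new SQLContext(sc)
}

// HelloUtilsStable.scala -- AppContext.sqlContext is a stable identifier,
// so this import is legal at file level and $ works in the whole file
import AppContext.sqlContext.implicits._
import org.apache.spark.sql.DataFrame

object HelloUtilsStable {
  def testDesired(adf: DataFrame) =
    adf.filter($"name" === "a").show()
}
```

But I'd like to know whether there is a cleaner, recommended way.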