Posted to user@spark.apache.org by V0lleyBallJunki3 <ve...@gmail.com> on 2018/09/04 03:47:03 UTC

Set can be passed in as an input argument but not as output

I find that if an input argument to a function used in a Dataset
transformation is a Set, Spark doesn't try to find an encoder for it, but
if the function's return type is a Set, Spark does look for an encoder and
errors out when none is found. My understanding is that the input Set also
has to be serialized and shipped to the executors (it is captured in the
closure), so why is an encoder required only for the output?
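
For reference, testRows is a Dataset[Row] whose first column is an
array<string>. Here is a minimal sketch of the setup (the column name and
sample data are placeholders, but the schema matches the
info.getAs[Seq[String]](0) calls below):

import org.apache.spark.sql.{Row, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// One row with a single array<string> column named "assignments".
val testRows = Seq(Seq("aaaa", "bbbb", "cccc")).toDF("assignments")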

The first method, testMethodSet, takes a Row and an Array[Set[String]] and
returns a Set[String]:

def testMethodSet(info: Row, arrSetString: Array[Set[String]]): Set[String] = {
  // The row's first column is an array<string>; treat it as a set.
  val assignments = info.getAs[Seq[String]](0).toSet
  // Return the first candidate set fully contained in assignments.
  for (setString <- arrSetString) {
    if (setString.subsetOf(assignments)) {
      return setString
    }
  }
  // No candidate matched.
  Set[String]()
}

The second method takes the same arguments as the first but returns a
Seq[String]:

def testMethodSeq(info: Row, arrSetString: Array[Set[String]]): Seq[String] = {
  val assignments = info.getAs[Seq[String]](0).toSet
  // Same logic, but the matching set is converted to a Seq before returning.
  for (setString <- arrSetString) {
    if (setString.subsetOf(assignments)) {
      return setString.toSeq
    }
  }
  Seq[String]()
}

Calling testMethodSet in a map throws an error:

testRows.map(s => testMethodSet(s, Array(Set("aaaa", "bbbb"))))
<console>:20: error: Unable to find encoder for type stored in a Dataset.
Primitive types (Int, String, etc) and Product types (case classes) are
supported by importing spark.implicits._  Support for serializing other
types will be added in future releases.
       testRows.map(s => testMethodSet(s, Array(Set("aaaa", "bbbb"))))

The testMethodSeq call works fine:

testRows.map(s => testMethodSeq(s, Array(Set("aaaa", "bbbb"))))
res12: org.apache.spark.sql.Dataset[Seq[String]] = [value: array<string>]
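
For what it's worth, a workaround that I believe works (a sketch, assuming
Spark 2.x) is to put an explicit Kryo encoder for Set[String] in scope, at
the cost of storing the sets as opaque binary rather than array<string>:

import org.apache.spark.sql.{Encoder, Encoders}

// Fall back to Kryo serialization for Set[String]; the resulting Dataset
// has a single binary column instead of array<string>.
implicit val setEncoder: Encoder[Set[String]] = Encoders.kryo[Set[String]]

testRows.map(s => testMethodSet(s, Array(Set("aaaa", "bbbb"))))

But this just sidesteps my question of why the input Set doesn't need an
encoder in the first place.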

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org