Posted to user@spark.apache.org by Koert Kuipers <ko...@tresata.com> on 2016/10/09 20:50:31 UTC
when does a Row object have a schema
the Spark-SQL Row trait has a schema that by default is null. when the
schema is null, operations that rely on fieldIndex, such as
getAs[T](fieldName: String): T, do not work.
i noticed that when i convert a DataFrame to RDD[Row], the Row objects
do have schemas. can i rely on this?
when can i be sure that the schema is not null? what is the expectation
here?
thanks! koert
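The behaviour described in the question can be reproduced with a small sketch. It assumes Spark 2.x on the classpath and a local SparkSession; `RowSchemaSketch` and `getByNameOrOrdinal` are hypothetical names introduced only for illustration. It shows that a Row built with `Row(...)` has a null schema and that name-based `getAs` fails on it, then prints whether the rows coming out of `df.rdd` carry a schema. Whether that last property holds in every version is exactly what the question asks, so the sketch checks it rather than assuming it.

```scala
import scala.util.Try
import org.apache.spark.sql.{Row, SparkSession}

object RowSchemaSketch {
  // Hypothetical helper: look a field up by name when the Row carries a schema,
  // and fall back to a caller-supplied ordinal when it does not.
  def getByNameOrOrdinal[T](row: Row, name: String, ordinal: Int): T =
    if (row.schema != null) row.getAs[T](name) else row.getAs[T](ordinal)

  def main(args: Array[String]): Unit = {
    // A Row created directly has no schema attached.
    val bare = Row(1, "a")
    println(bare.schema == null)                          // true
    // Name-based access needs fieldIndex, which is undefined without a schema.
    println(Try(bare.getAs[Int]("key")).isFailure)        // true
    println(getByNameOrOrdinal[String](bare, "value", 1)) // a

    // Assumption: a local session is acceptable for this experiment.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("row-schema-sketch")
      .getOrCreate()
    try {
      import spark.implicits._
      val df = Seq((1, "a"), (2, "b")).toDF("key", "value")
      // Inspect the rows produced by DataFrame.rdd instead of assuming a schema.
      df.rdd.take(2).foreach { r =>
        println(s"schema present: ${r.schema != null}")
      }
    } finally {
      spark.stop()
    }
  }
}
```

Code that receives Rows from an unknown source can use a guard like `getByNameOrOrdinal` so that it does not depend on the schema being present.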
Re: when does a Row object have a schema
Posted by Divya Gehlot <di...@gmail.com>.
A value of a row can be accessed through both generic access by ordinal,
which will incur boxing overhead for primitives, as well as native
primitive access. An example of generic access by ordinal:
import org.apache.spark.sql._
val row = Row(1, true, "a string", null)
// row: Row = [1,true,a string,null]
val firstValue = row(0)
// firstValue: Any = 1
val fourthValue = row(3)
// fourthValue: Any = null
For native primitive access, it is invalid to use the native primitive
interface to retrieve a value that is null. Instead, a user must check
isNullAt before attempting to retrieve a value that might be null. An
example of native primitive access:
// using the row from the previous example.
val firstValue = row.getInt(0)
// firstValue: Int = 1
val isNull = row.isNullAt(3)
// isNull: Boolean = true
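The isNullAt check can be wrapped once so that callers cannot forget it. This is a minimal sketch, not part of the Row API: `NullSafeAccess` and `optInt` are hypothetical names, and the only Spark calls used are `isNullAt` and `getInt`.

```scala
import org.apache.spark.sql.Row

object NullSafeAccess {
  // Hypothetical helper: None when the field at ordinal i is null,
  // otherwise the unboxed Int wrapped in Some.
  def optInt(row: Row, i: Int): Option[Int] =
    if (row.isNullAt(i)) None else Some(row.getInt(i))

  def main(args: Array[String]): Unit = {
    val row = Row(1, true, "a string", null)
    println(optInt(row, 0)) // Some(1)
    println(optInt(row, 3)) // None
  }
}
```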
In Scala, fields in a Row object
<https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/Row.html>
can be extracted in a pattern match. Example:
import org.apache.spark.sql._
// assumes `sql` is in scope (e.g. via sqlContext.sql) and that a table `src` exists
val pairs = sql("SELECT key, value FROM src").rdd.map {
  case Row(key: Int, value: String) =>
    key -> value
}
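Note that a pattern match like the one above throws a MatchError when a row does not fit the pattern, for example when key is null. A hedged variant that runs without a SQL context and adds a fallback case could look like this (`RowPatternMatch` and `toPair` are hypothetical names):

```scala
import org.apache.spark.sql.Row

object RowPatternMatch {
  // Extract (key, value) when the row has exactly an Int followed by a String;
  // any other shape, including a null key, yields None instead of a MatchError.
  def toPair(row: Row): Option[(Int, String)] = row match {
    case Row(key: Int, value: String) => Some(key -> value)
    case _                            => None
  }

  def main(args: Array[String]): Unit = {
    println(toPair(Row(1, "a")))    // Some((1,a))
    println(toPair(Row(null, "a"))) // None
  }
}
```

This style works on positions and types only, so it does not depend on the Row having a schema.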
Hope this helps
for more info please refer to
https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/Row.html