Posted to user@spark.apache.org by Koert Kuipers <ko...@tresata.com> on 2016/10/09 20:50:31 UTC

when does a Row object have a schema

The Spark SQL Row trait has a schema that is null by default. When the
schema is null, operations that rely on fieldIndex, such as
getAs[T](fieldName: String): T, do not work.

I noticed that when I convert a DataFrame to RDD[Row], the Row objects
do have schemas. Can I rely on this?

When can I be sure that the schema is not null? What is the expectation
here?

thanks! koert

Re: when does a Row object have a schema

Posted by Divya Gehlot <di...@gmail.com>.

A value of a row can be accessed through both generic access by ordinal,
which will incur boxing overhead for primitives, as well as native
primitive access. An example of generic access by ordinal:

 import org.apache.spark.sql._

 val row = Row(1, true, "a string", null)
 // row: Row = [1,true,a string,null]
 val firstValue = row(0)
 // firstValue: Any = 1
 val fourthValue = row(3)
 // fourthValue: Any = null

For native primitive access, it is invalid to use the native primitive
interface to retrieve a value that is null. Instead, a user must check
isNullAt before attempting to retrieve a value that might be null. An
example of native primitive access:

 // using the row from the previous example.
 val firstValue = row.getInt(0)
 // firstValue: Int = 1
 val isNull = row.isNullAt(3)
 // isNull: Boolean = true
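
The isNullAt check above can be wrapped so that a null column becomes an
Option instead of a failure. This is only a minimal sketch: the helper name
intAt is hypothetical and is not part of the Row API.

```scala
import org.apache.spark.sql.Row

val row = Row(1, true, "a string", null)

// Hypothetical helper: check isNullAt first, and call the primitive
// getter only when the column actually holds a value.
def intAt(r: Row, i: Int): Option[Int] =
  if (r.isNullAt(i)) None else Some(r.getInt(i))

val first = intAt(row, 0)   // column 0 holds the Int 1
val fourth = intAt(row, 3)  // column 3 is null, so getInt is never called
```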

In Scala, fields in a Row object
<https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/Row.html>
can be extracted in a pattern match. Example:

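 import org.apache.spark.sql._

 // Assumes `sql` is in scope (e.g. via `import sqlContext.sql`)
 // and that a table named `src` exists.
 val pairs = sql("SELECT key, value FROM src").rdd.map {
   case Row(key: Int, value: String) =>
     key -> value
 }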
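
One thing to watch with this kind of pattern match: a type pattern such as
value: String does not match null, so a row with a null in that column
would fall through the single case. A small sketch of one way to skip such
rows, using a plain Scala collection of hand-built rows (an assumption made
here so that no table is needed):

```scala
import org.apache.spark.sql.Row

// Hand-built rows for illustration; the second has a null value column.
val rows = Seq(Row(1, "a"), Row(2, null))

// collect takes a partial function, so rows that do not match the
// pattern are dropped rather than raising a MatchError.
val pairs = rows.collect {
  case Row(key: Int, value: String) => key -> value
}
```

Whether to drop such rows or handle them in an explicit extra case depends
on the data; the same partial function can be passed to collect on an RDD.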

Hope this helps.

For more information, please refer to
https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/Row.html
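
Relating this back to the original question, here is a rough sketch of the
two cases it describes: a Row built directly from values, and a Row
obtained from a DataFrame. It assumes Spark 2.x with a local SparkSession
(on 1.x an SQLContext would play that role), and the column names key and
value are made up for the example. It shows the behavior observed in the
question and is not a statement of what the API guarantees.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Assumption: a local session; in spark-shell the existing one is reused.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("row-schema-sketch")
  .getOrCreate()
import spark.implicits._

// A Row built directly from values carries no schema.
val bare = Row(1, "a")
val bareHasSchema = bare.schema != null

// Guard name-based access, since fieldIndex needs a schema.
val bareValue: Option[String] =
  if (bareHasSchema) Some(bare.getAs[String]("value")) else None

// A Row taken from a DataFrame's RDD, as in the original post.
val df = Seq((1, "a")).toDF("key", "value")
val fromDf = df.rdd.first()
val fromDfValue: Option[String] =
  if (fromDf.schema != null) Some(fromDf.getAs[String]("value")) else None
```

Checking schema for null before calling getAs by field name keeps the code
safe in either case, at the cost of one extra branch.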
