Posted to user@spark.apache.org by Aleksander Eskilson <al...@gmail.com> on 2016/10/04 15:36:28 UTC

Extracting Row Value for Deserializer Expression

Hi there,

I'm currently working on a custom Encoder for a kind of schema-based Java
object. The field positions and types in the object's schema are isomorphic
to SQL column ordinals and types. The implementation should be quite
similar to the JavaBean Encoder, but since we have a schema, class-based
reflection should be unnecessary. As the JavaBean deserializer does, I'm
working on placing column values of the serialized row into a newly created
object with analogous fields. I collect a list of setter arguments, then
use an expression similar to InitializeJavaBean to call the setter
expressions one-by-one on the new struct-like object. I've tried two
methods for extracting column values as arguments to NewInstance expressions,
which are then arguments to the setters:

First, since I will always know the ordinal and type, I've attempted to use
GetColumnByOrdinal(ordinal, type) as the input expression argument. Then
calling something like

    val objectFromRow = expressionEncoder.resolveAndBind(attrs).fromRow(row)

yields

    org.apache.spark.sql.AnalysisException: unresolved operator
    'DeserializeToObject ...

Second, I've tried to reference the value using a bound symbol, with the
attrs DslSymbol sequence naming the symbol and its type, and then using
UnresolvedAttribute(fieldName) to let the analyzer replace the symbol with
the correct accessor, as is done for JavaBeans. For a row with a single
integer value, and an object with a field of type java.lang.Integer named
val, I instead receive

    org.apache.spark.sql.AnalysisException: resolved attribute(s) 'val missing
    from val#1 in operator 'DeserializeToObject ...

with the quoted symbol appearing as the argument to my object's setter
method.

What is the current proper way to extract a value from a row column in the
Expressions API for use as an argument expression in a deserializer?

Thanks,
Alek