Posted to user@spark.apache.org by Jean Georges Perrin <jg...@jgp.net> on 2017/02/02 21:05:39 UTC
Spark 2 + Java + UDF + unknown return type...
Hi fellow Sparkans,
I am building a UDF (in Java) that can return various data types, basically the signature of the function itself is:
public Object call(String a, Object b, String c, Object d, String e) throws Exception
When I register my function, I need to provide a type, e.g.:
spark.udf().register("f2", new Udf5(), DataTypes.LongType);
In my test it is a long now, but can become a string or a float. Of course, I do not know the expected return type before I call the function, which I call like:
df = df.selectExpr("*", "f2('x1', x, 'c2', y, 'op') as op");
Is there a way to return an Object from a UDF and store an Object in a Dataset/DataFrame, so that I don't need to know the datatype at that point and can leave it open for now? Or should I play it safe, always return DataTypes.StringType, and then try to convert it later if needed?
I hope I am clear enough :).
Thanks for any tip/idea/comment...
jg
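To make the situation concrete, here is a minimal plain-Java sketch of a call method whose runtime return type varies with one of its arguments. The op codes and branch logic are hypothetical, not from the post; in Spark this method would implement org.apache.spark.sql.api.java.UDF5, but the dispatch itself is plain Java:

```java
// Hypothetical sketch: a call() body whose runtime return type depends on an
// "op" argument. The op names ("len", "echo") are made up for illustration.
public class MultiTypeCall {
    public static Object call(String a, Object b, String c, Object d, String e) {
        switch (e) {                      // the "op" argument picks the result type
            case "len":  return (long) a.length();  // returns a Long
            case "echo": return a;                  // returns a String
            default:     return 0.5d;               // returns a Double
        }
    }
}
```

Because the Catalyst type is fixed at registration time (DataTypes.LongType in the example above), a String or Double coming back at runtime would not match the declared schema.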
Re: Spark 2 + Java + UDF + unknown return type...
Posted by Jörn Franke <jo...@gmail.com>.
Not sure what your UDF is doing exactly, but why not one UDF per type? You avoid conversion issues, and it is more obvious for the user of your UDF, etc.
You could of course return a complex type with one long, one string and one double, and fill them in the UDF as needed, but this would probably not be a clean solution...
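The one-UDF-per-type suggestion can be sketched in plain Java as follows. The function names and registration lines are hypothetical; in Spark each function would implement the appropriate UDFn interface and be registered with its own exact DataType:

```java
// One function per return type, so each Spark registration can declare an
// exact DataType, e.g. (shown as comments, since this sketch has no Spark dep):
//   spark.udf().register("f2_long",   new F2Long(),   DataTypes.LongType);
//   spark.udf().register("f2_string", new F2String(), DataTypes.StringType);
public class TypedUdfs {
    public static Long f2Long(String a) {     // always returns a Long
        return (long) a.length();
    }
    public static String f2String(String a) { // always returns a String
        return a.toUpperCase();
    }
}
```

The caller then picks the right function in selectExpr, and each column has a well-defined type from the start.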
> On 2 Feb 2017, at 22:05, Jean Georges Perrin <jg...@jgp.net> wrote:
> [...]
Re: Spark 2 + Java + UDF + unknown return type...
Posted by Koert Kuipers <ko...@tresata.com>.
A UDF that does not return a single type is not supported, and Spark has no concept of union types.
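Given that constraint, the "play it safe" route from the original post (declare DataTypes.StringType and encode everything) can be sketched in plain Java. The tagging scheme below is purely illustrative, not a Spark convention:

```java
// Encode any result as a tagged String so the UDF can declare StringType,
// then parse it back downstream. The "L:/S:/D:" prefixes are made up here.
public class StringEncodedResult {
    public static String encode(Object value) {
        if (value instanceof Long)   return "L:" + value;
        if (value instanceof Double) return "D:" + value;
        return "S:" + value;
    }
    public static Object decode(String encoded) {
        String payload = encoded.substring(2);
        switch (encoded.charAt(0)) {
            case 'L': return Long.parseLong(payload);
            case 'D': return Double.parseDouble(payload);
            default:  return payload;   // 'S' or anything else: plain String
        }
    }
}
```

This keeps the declared schema honest (the column really is a String), at the cost of pushing type knowledge to every consumer of the column.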
On Feb 2, 2017 16:05, "Jean Georges Perrin" <jg...@jgp.net> wrote:
[...]