Posted to user@spark.apache.org by Jean Georges Perrin <jg...@jgp.net> on 2017/02/02 21:05:39 UTC

Spark 2 + Java + UDF + unknown return type...

Hi fellow Sparkans,

I am building a UDF (in Java) that can return various data types. The signature of the function is:

	public Object call(String a, Object b, String c, Object d, String e) throws Exception

When I register my function, I need to provide a type, e.g.:

	spark.udf().register("f2", new Udf5(), DataTypes.LongType);

In my test it returns a long for now, but it could become a string or a float. Of course, I do not know the expected return type before I call the function, which I call like this:

	df = df.selectExpr("*", "f2('x1', x, 'c2', y, 'op') as op");

Is there a way to return an Object from a UDF and store it in a Dataset/DataFrame? I don't need to know the data type at that point; can I leave it undetermined for now? Or should I play it safe and always register the UDF with DataTypes.StringType (and then try to convert the value if needed)?

I hope I am clear enough :).

Thanks for any tip/idea/comment...

jg

Re: Spark 2 + Java + UDF + unknown return type...

Posted by Jörn Franke <jo...@gmail.com>.
Not sure what exactly your UDF is doing, but why not one UDF per type? You avoid conversion issues, and it is more obvious for the users of your UDF.
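
A minimal sketch of the one-function-per-type idea, written in plain Java so it runs without Spark. The class name, the method names, and the "sum" logic are all hypothetical, since the thread does not say what the UDF computes. Each method has a single, fixed return type, so each could presumably be wrapped in its own UDF5 and registered with the matching type (DataTypes.LongType, DataTypes.DoubleType, DataTypes.StringType); that wiring is assumed here, not shown.

```java
// Hypothetical sketch: the arithmetic and all names are illustrative only.
final class PerTypeOps {
    private PerTypeOps() {}

    // Candidate body for a UDF registered with DataTypes.LongType.
    static long sumAsLong(Object b, Object d) {
        return toNumber(b).longValue() + toNumber(d).longValue();
    }

    // Candidate body for a UDF registered with DataTypes.DoubleType.
    static double sumAsDouble(Object b, Object d) {
        return toNumber(b).doubleValue() + toNumber(d).doubleValue();
    }

    // Candidate body for a UDF registered with DataTypes.StringType.
    static String sumAsString(Object b, Object d) {
        return String.valueOf(b) + String.valueOf(d);
    }

    // Fails fast instead of silently returning a value of the wrong type.
    private static Number toNumber(Object value) {
        if (value instanceof Number) {
            return (Number) value;
        }
        throw new IllegalArgumentException("Expected a number but got: " + value);
    }
}
```

The caller then picks the function whose return type it expects, which keeps the column type known when the query is written.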
You could of course return a complex type with one long, one string, and one double, and fill them in the UDF as needed, but this would probably not be a clean solution...
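
The complex-type idea can be sketched in plain Java as three nullable slots in a fixed order (long, string, double), with only the slot matching the actual result filled in. The class and method names are hypothetical. In Spark, the array would presumably be turned into a Row (for example with RowFactory.create) and the UDF registered with a struct type of three nullable fields; that part is an assumption and is left out so the sketch stays runnable on its own.

```java
// Hypothetical sketch: models a struct of (long, string, double) as an Object[].
final class TriSlot {
    private TriSlot() {}

    // Returns {longValue, stringValue, doubleValue}; at most one slot is non-null.
    static Object[] toSlots(Object result) {
        Long asLong = null;
        String asString = null;
        Double asDouble = null;

        if (result instanceof Long || result instanceof Integer) {
            asLong = ((Number) result).longValue();
        } else if (result instanceof Double || result instanceof Float) {
            asDouble = ((Number) result).doubleValue();
        } else if (result != null) {
            // Anything else falls back to its string form.
            asString = result.toString();
        }
        return new Object[] {asLong, asString, asDouble};
    }
}
```

Every consumer of such a column has to check which slot is populated, which is one reason this may not be a clean solution.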


Re: Spark 2 + Java + UDF + unknown return type...

Posted by Koert Kuipers <ko...@tresata.com>.
A UDF that does not return a single type is not supported, and Spark has no
concept of union types.
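
Given the single-type constraint, the string fallback raised in the original question can be sketched as follows. This is plain Java with hypothetical names: the UDF body would return encode(result) and be registered with DataTypes.StringType, and the value would be converted later, once the expected type is known. In Spark itself that later step would more likely be a cast on the column; the parse helpers below only illustrate the idea and are not Spark's cast.

```java
// Hypothetical sketch: always hand back a String, convert on demand later.
final class StringResult {
    private StringResult() {}

    // What the UDF would return for any result type; null stays null.
    static String encode(Object result) {
        return result == null ? null : String.valueOf(result);
    }

    // Returns null when the text is missing or is not a valid long.
    static Long asLong(String encoded) {
        if (encoded == null) {
            return null;
        }
        try {
            return Long.parseLong(encoded.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Returns null when the text is missing or is not a valid double.
    static Double asDouble(String encoded) {
        if (encoded == null) {
            return null;
        }
        try {
            return Double.parseDouble(encoded.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

The trade-off is that the column loses its numeric type until it is converted, so a value that does not parse only shows up at conversion time.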


