Posted to user@spark.apache.org by lk_spark <lk...@163.com> on 2017/06/16 03:51:05 UTC
how to call udf with parameters
hi, all
I defined a UDF with multiple parameters, but I don't know how to call it on a DataFrame.
UDF:
def ssplit2 = udf { (sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int) =>
val terms = HanLP.segment(sentence).asScala
.....
Call :
scala> val output = input.select(ssplit2($"text",true,true,2).as('words))
<console>:40: error: type mismatch;
found : Boolean(true)
required: org.apache.spark.sql.Column
val output = input.select(ssplit2($"text",true,true,2).as('words))
^
<console>:40: error: type mismatch;
found : Boolean(true)
required: org.apache.spark.sql.Column
val output = input.select(ssplit2($"text",true,true,2).as('words))
^
<console>:40: error: type mismatch;
found : Int(2)
required: org.apache.spark.sql.Column
val output = input.select(ssplit2($"text",true,true,2).as('words))
^
scala> val output = input.select(ssplit2($"text",$"true",$"true",$"2").as('words))
org.apache.spark.sql.AnalysisException: cannot resolve '`true`' given input columns: [id, text];;
'Project [UDF(text#6, 'true, 'true, '2) AS words#16]
+- Project [_1#2 AS id#5, _2#3 AS text#6]
+- LocalRelation [_1#2, _2#3]
I need help!!
2017-06-16
lk_spark
Re: Re: Re: how to call udf with parameters
Posted by lk_spark <lk...@163.com>.
Thanks Kumar, that's really helpful!
2017-06-16
lk_spark
From: Pralabh Kumar <pr...@gmail.com>
Sent: 2017-06-16 18:30
Subject: Re: Re: how to call udf with parameters
To: "lk_spark" <lk...@163.com>
Cc: "user.spark" <us...@spark.apache.org>
val getlength = udf((idx1: Int, idx2: Int, data: String) => data.substring(idx1, idx2))
data.select(getlength(lit(1), lit(2), data("col1"))).collect
Re: Re: how to call udf with parameters
Posted by Pralabh Kumar <pr...@gmail.com>.
val getlength = udf((idx1: Int, idx2: Int, data: String) => data.substring(idx1, idx2))
data.select(getlength(lit(1), lit(2), data("col1"))).collect
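Applied to the `ssplit2` UDF from the original post, the same `lit` trick would make the failing call compile. A sketch, assuming the `input` DataFrame and `ssplit2` definition from the question:

```scala
import org.apache.spark.sql.functions.lit

// Wrap each scalar argument in lit(...) so that every UDF argument is a Column
val output = input.select(
  ssplit2($"text", lit(true), lit(true), lit(2)).as("words")
)
```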
Re: Re: how to call udf with parameters
Posted by Pralabh Kumar <pr...@gmail.com>.
Use lit. Give me some time and I'll provide an example.
Re: Re: how to call udf with parameters
Posted by lk_spark <lk...@163.com>.
Thanks Kumar. I want to know how to call a UDF with multiple parameters, for example a UDF that works like a substr function: how can I pass the begin and end indexes as parameters? I tried and got errors. Can UDF parameters only be of Column type?
2017-06-16
lk_spark
From: Pralabh Kumar <pr...@gmail.com>
Sent: 2017-06-16 17:49
Subject: Re: how to call udf with parameters
To: "lk_spark" <lk...@163.com>
Cc: "user.spark" <us...@spark.apache.org>
Sample UDF:
val getlength = udf((data: String) => data.length)
data.select(getlength(data("col1")))
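On the "only a Column type?" question: a UDF's arguments must indeed be Columns at call time, but constants do not have to be passed that way — they can instead be captured in a closure when the UDF is built. A hedged sketch, not from the thread; `substrUdf` and its names are illustrative:

```scala
import org.apache.spark.sql.functions.udf

// Build the UDF from plain Scala values; only the data column
// is passed as a UDF argument at call time.
def substrUdf(begin: Int, end: Int) =
  udf((s: String) => s.substring(begin, end))

// usage: data.select(substrUdf(1, 3)(data("col1")))
```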
Re: how to call udf with parameters
Posted by Yong Zhang <ja...@hotmail.com>.
What version of Spark are you using? I cannot reproduce your error:
scala> spark.version
res9: String = 2.1.1
scala> val dataset = Seq((0, "hello"), (1, "world")).toDF("id", "text")
dataset: org.apache.spark.sql.DataFrame = [id: int, text: string]
scala> import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.functions.udf
// define a method in a similar way to what you did
scala> def len = udf { (data: String) => data.length > 0 }
len: org.apache.spark.sql.expressions.UserDefinedFunction
// use it
scala> dataset.select(len($"text").as('length)).show
+------+
|length|
+------+
| true|
| true|
+------+
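For completeness, literals are also unproblematic if the function is registered for SQL use, since constants in a SQL expression need no lit() wrapper. A sketch in the same session; `ssplit2_sql` and its split-on-space body are stand-ins (HanLP is not available here), not the original implementation:

```scala
// Register a 4-argument function for use from SQL; the body is illustrative only.
spark.udf.register("ssplit2_sql",
  (sentence: String, delNum: Boolean, delEn: Boolean, minTermLen: Int) =>
    sentence.split(" ").filter(_.length >= minTermLen).toSeq)

dataset.createOrReplaceTempView("t")
spark.sql("SELECT ssplit2_sql(text, true, true, 2) AS words FROM t").show()
```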
Yong
Re: how to call udf with parameters
Posted by Pralabh Kumar <pr...@gmail.com>.
Sample UDF:
val getlength = udf((data: String) => data.length)
data.select(getlength(data("col1")))