Posted to dev@spark.apache.org by zml张明磊 <mi...@Ctrip.com> on 2015/12/25 08:35:11 UTC
How can I get the column data based on a specific column name and
then store the data in an array or list?
Hi,
I am new to Scala and Spark and am trying to find the relevant DataFrame API to solve the problem described in the title. So far I have only found DataFrame.col(colName: String): Column, which returns a Column object, not its contents. If DataFrame supported an API like Column.toArray: Type, that would be enough for me, but it does not. How can I achieve this?
Thanks,
Minglei.
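One common way to do what is asked here is to select the column, collect the rows to the driver, and read the single field out of each Row. The following is only a minimal sketch, assuming the SQLContext-style API used elsewhere in this thread; the helper name columnValues and the sample data are hypothetical, and collect() is only reasonable when the column fits in driver memory.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("column-to-array")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._ // enables .toDF on local Seqs

// Hypothetical helper: bring every value of one column back to the driver.
// The values are untyped (Any) because the column type is only known at runtime.
def columnValues(df: DataFrame, colName: String): Array[Any] =
  df.select(colName).collect().map(_.get(0))

// Made-up sample data for illustration.
val df = Seq((1, "jeff", 12), (2, "andy", 34), (3, "pony", 23)).toDF("id", "name", "age")

// When the column type is known, the typed getters on Row avoid casting later.
val names: Array[String] = df.select("name").collect().map(_.getString(0))
val ages: List[Int] = df.select("age").collect().map(_.getInt(0)).toList

// Untyped variant through the helper.
val ids: Array[Any] = columnValues(df, "id")

sc.stop()
```

For a large column it is probably better to keep the data distributed, for example with df.select("name").rdd.map(_.getString(0)), and to call collect() only on a result that is known to be small.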
Re: Re: How can I get the column data based on a specific column name and then store the data in an array or list?
Posted by zml张明磊 <mi...@Ctrip.com>.
Yes, it's a good method. But UDF? What is a UDF? U...D...F? OK, I can learn about it.
Thanks,
Minglei.
Re: Re: How can I get the column data based on a specific column name and then store the data in an array or list?
Posted by zml张明磊 <mi...@Ctrip.com>.
Oh? I will give it a try.
Thanks,
Minglei.
Re: Re: How can I get the column data based on a specific column name and then store the data in an array or list?
Posted by Yanbo Liang <yb...@gmail.com>.
Actually, you can use the collect_list function, e.g. df.agg(collect_list("a")).
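For reference, collect_list is a function in org.apache.spark.sql.functions rather than a method on DataFrame, so it is applied through agg or select. Below is a minimal sketch, assuming Spark 2.0 or later, where the function is built in as far as I know (on 1.6 it appears to have needed a HiveContext). The column name a and the sample data are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.collect_list

val conf = new SparkConf().setMaster("local[2]").setAppName("collect-list")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._ // enables .toDF on local Seqs

// Made-up sample data for illustration.
val df = Seq((1, "jeff"), (2, "andy"), (3, "pony")).toDF("id", "a")

// Aggregate the whole column "a" into a single array-typed cell.
val aggregated = df.agg(collect_list("a"))

// Bring that one row back to the driver and unpack the array.
// The order of the collected elements is not guaranteed.
val values: Seq[String] = aggregated.first().getSeq[String](0)

sc.stop()
```

Compared with select(...).collect(), this does the gathering as an aggregation inside Spark, which mostly matters when it is combined with groupBy to get one list per group.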
Re: Re: How can I get the column data based on a specific column name and then store the data in an array or list?
Posted by Jeff Zhang <zj...@gmail.com>.
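You can use a UDF to convert a column to an array type. Here is a sample:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.expr

val conf = new SparkConf().setMaster("local[4]").setAppName("test")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._ // enables .toDF on local Seqs

// Register a UDF named "f" that turns a string into a two-element array.
sqlContext.udf.register("f", (a: String) => Array(a, a))
val df1 = Seq(
  (1, "jeff", 12),
  (2, "andy", 34),
  (3, "pony", 23),
  (4, "jeff", 14)
).toDF("id", "name", "age")
// Replace the "name" column with the array returned by f(name).
val df2 = df1.withColumn("name", expr("f(name)"))
df2.printSchema()
df2.show()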
--
Best Regards
Jeff Zhang
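Note that the UDF sample above yields one array per row (the name column becomes array<string>) rather than one array holding the whole column. The sketch below tries to make that visible. It uses functions.udf instead of a registered name, returns a Seq instead of an Array, and the names twice, nameType and perRow are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.{ArrayType, StringType}

val conf = new SparkConf().setMaster("local[2]").setAppName("udf-per-row")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._ // enables .toDF on local Seqs

// Same idea as "f" in the sample: each input string becomes a two-element sequence.
val twice = udf((a: String) => Seq(a, a))

// Made-up sample data for illustration.
val df1 = Seq((1, "jeff", 12), (2, "andy", 34)).toDF("id", "name", "age")
val df2 = df1.withColumn("name", twice(df1("name")))

// The column type is now array<string>, but there is still one value per row.
val nameType = df2.schema("name").dataType
val perRow: Array[Seq[String]] = df2.select("name").collect().map(_.getSeq[String](0))

sc.stop()
```

So a UDF changes the type of each cell, while getting all values of a column into a single local array still needs a collect() or an aggregation.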
Re: How can I get the column data based on a specific column name and then store the data in an array or list?
Posted by zml张明磊 <mi...@Ctrip.com>.
Thanks, Jeff. I don't mean choosing some columns of a Row. I mean selecting all the data in one column and converting it to an Array. Do you see what I mean?
In other words:
I want to select all of the data in a column, based on the column name, and then put it into an array.
Re: How can I get the column data based on a specific column name and
then store the data in an array or list?
Posted by Jeff Zhang <zj...@gmail.com>.
Not sure what you mean. Do you want to choose some columns of a Row and
convert them to an Array?
--
Best Regards
Jeff Zhang