Posted to user@spark.apache.org by James Starks <su...@protonmail.com.INVALID> on 2018/08/07 15:09:11 UTC
Newbie question on how to extract column value
I am very new to Spark. I have just successfully set up Spark SQL to connect to a PostgreSQL database, and I am able to display a table with the code
sparkSession.sql("SELECT id, url from table_a where col_b <> '' ").show()
Now I want to apply filter and map functions to the col_b value. In plain Scala it would be something like
Seq((1, "http://a.com/a"), (2, "http://b.com/b"), (3, "unknown")).filter { case (_, url) => isValid(url) }.map { case (id, url) => (id, pathOf(url)) }
where filter removes invalid URLs, and map turns (id, url) into (id, path of url).
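For concreteness, here is a minimal runnable sketch of that plain-Scala pipeline; the isValid and pathOf helpers are hypothetical stand-ins built on java.net.URI, since the thread does not define them:

```scala
import java.net.URI
import scala.util.Try

// Hypothetical helper: accept only absolute http(s) URLs.
def isValid(url: String): Boolean =
  Try(new URI(url)).toOption.exists(u => u.getScheme == "http" || u.getScheme == "https")

// Hypothetical helper: extract the path component of a URL.
def pathOf(url: String): String = new URI(url).getPath

val result = Seq((1, "http://a.com/a"), (2, "http://b.com/b"), (3, "unknown"))
  .filter { case (_, url) => isValid(url) }
  .map { case (id, url) => (id, pathOf(url)) }

println(result)  // List((1,/a), (2,/b))
```

Note that "unknown" parses as a relative URI with a null scheme, so isValid rejects it without throwing.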
However, when I apply this concept to Spark SQL with the code snippet
sparkSession.sql("...").filter(isValid($"url"))
the compiler complains about a type mismatch, because $"url" is of type ColumnName. How can I extract the column value (i.e. http://...) for the url column in order to apply the filter function?
Thanks
Java 1.8.0
Scala 2.11.8
Spark 2.1.0
Re: Newbie question on how to extract column value
Posted by James Starks <su...@protonmail.com.INVALID>.
Because of some legacy issues I can't immediately upgrade the Spark version. But I tried filtering the data before loading it into Spark, based on the suggestion:
val df = sparkSession.read.format("jdbc").option(...).option("dbtable", "(select .. from ... where url <> '') table_name")....load()
df.createOrReplaceTempView("new_table")
Then performing the custom operation does the trick:
sparkSession.sql("select id, url from new_table").as[(String, String)].map { case (id, url) =>
val derived_data = ... // operation on url
(id, derived_data)
}.show()
Thanks for the advice, it's really helpful!
Re: Newbie question on how to extract column value
Posted by Gourav Sengupta <go...@gmail.com>.
Hi James,
It is always advisable to use the latest Spark version. That said, can you
please give DataFrames and UDFs a try if possible? I think that would be a
much more scalable way to address the issue.
Also, where possible, it is always advisable to filter the data before
fetching it into Spark.
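A rough sketch of what the UDF route could look like, assuming Spark 2.x in local mode; the isValid and pathOf helpers are hypothetical placeholders for whatever validation and path-extraction logic is actually needed:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().master("local[1]").appName("udf-demo").getOrCreate()
import spark.implicits._

// Hypothetical helpers -- substitute real validation/extraction logic.
def isValid(url: String): Boolean = url.startsWith("http://") || url.startsWith("https://")
def pathOf(url: String): String = new java.net.URI(url).getPath

// udf() lifts the plain functions so they can be applied to Columns,
// which is exactly what the earlier filter(isValid($"url")) attempt was missing.
val isValidUdf = udf(isValid _)
val pathOfUdf  = udf(pathOf _)

// Stand-in for the real table_a query.
val df = Seq((1, "http://a.com/a"), (2, "http://b.com/b"), (3, "unknown")).toDF("id", "url")

val result = df.filter(isValidUdf($"url"))
  .select($"id", pathOfUdf($"url").as("path"))
  .as[(Int, String)]
  .collect()
  .toList

println(result)  // List((1,/a), (2,/b))
spark.stop()
```

Here filter receives a Column produced by the UDF rather than a ColumnName passed to a plain String function, which resolves the type mismatch.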
Thanks and Regards,
Gourav