Posted to issues@spark.apache.org by "Nicholas Chammas (JIRA)" <ji...@apache.org> on 2016/11/04 20:19:58 UTC

[jira] [Updated] (SPARK-18277) na.fill() and friends should work on struct fields

     [ https://issues.apache.org/jira/browse/SPARK-18277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicholas Chammas updated SPARK-18277:
-------------------------------------
    Description: 
It appears that {{na.fill()}} and friends do not apply to fields nested inside a struct.

For example:

{code}
>>> from pyspark.sql import Row
>>> df = spark.createDataFrame([Row(a=Row(b='yeah yeah'), c='alright'), Row(a=Row(b=None), c=None)])
>>> df.printSchema()
root
 |-- a: struct (nullable = true)
 |    |-- b: string (nullable = true)
 |-- c: string (nullable = true)

>>> df.show()
+-----------+-------+
|          a|      c|
+-----------+-------+
|[yeah yeah]|alright|
|     [null]|   null|
+-----------+-------+

>>> df.na.fill('').show()
+-----------+-------+
|          a|      c|
+-----------+-------+
|[yeah yeah]|alright|
|     [null]|       |
+-----------+-------+
{code}

{{c}} got filled in, but {{a.b}} didn't.

I don't know if it's "appropriate", but it would be nice if {{fill()}} and friends worked automatically on struct fields.

As things are today, there doesn't appear to be a way to fill in null values inside structs.

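{{when()}} does not obviously help either. You cannot write {{when(col('a.b') is None, '')}}, because Python's {{is}} tests object identity and cannot be overloaded by {{Column}}, so the condition evaluates to a plain Python {{False}} instead of a {{Column}} expression. And {{when(col('a.b') == None, '')}} doesn't catch the null values, presumably because an equality comparison against null evaluates to null rather than true.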
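
To make the two failure modes concrete, here is a minimal sketch in plain Python that needs no Spark installation. {{FakeColumn}} and {{FakeComparison}} are hypothetical stand-ins, not Spark classes. They only mimic how a {{Column}} builds a deferred comparison, and they assume SQL-style null semantics, where comparing anything with null yields null.

```python
# Hypothetical stand-ins for illustration only; these are NOT Spark classes.


class FakeComparison:
    """Deferred `column = literal` test, evaluated later against each row."""

    def __init__(self, column, literal):
        self.column = column
        self.literal = literal

    def evaluate(self, row):
        value = row[self.column.name]
        # Assumed SQL semantics: comparing anything with NULL yields NULL
        # (modelled here as None), which is neither true nor false.
        if value is None or self.literal is None:
            return None
        return value == self.literal


class FakeColumn:
    """Mimics a column reference that overloads `==` to build an expression."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        # Return an expression object instead of a bool.
        return FakeComparison(self, other)


col_b = FakeColumn("a.b")

# `is` compares object identity and cannot be overloaded, so this is computed
# immediately as a plain Python bool, before any row is inspected.
identity_check = col_b is None

# `==` can be overloaded, so this produces a deferred expression.
equality_check = col_b == None  # noqa: E711

rows = [{"a.b": "yeah yeah"}, {"a.b": None}]
matches = [equality_check.evaluate(row) for row in rows]

print(identity_check)  # False
print(matches)         # [None, None]
```

Neither row evaluates to true, so a {{when()}} branch guarded by this condition would never fire. {{Column}} does have an {{isNull()}} method, which may be the null test to try inside {{when()}} instead; this report does not confirm whether that covers the struct-field case.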

  was:
It appears that you cannot use {{fill()}} and friends to quickly modify struct fields.

For example:

{code}
>>> df = spark.createDataFrame([Row(a=Row(b='yeah yeah'), c='alright'), Row(a=Row(b=None), c=None)])
>>> df.printSchema()
root
 |-- a: struct (nullable = true)
 |    |-- b: string (nullable = true)
 |-- c: string (nullable = true)

>>> df.show()
+-----------+-------+
|          a|      c|
+-----------+-------+
|[yeah yeah]|alright|
|     [null]|   null|
+-----------+-------+

>>> df.na.fill('').show()
+-----------+-------+
|          a|      c|
+-----------+-------+
|[yeah yeah]|alright|
|     [null]|       |
+-----------+-------+
{code}

{{c}} got filled in, but {{a.b}} didn't.

I don't know if it's "appropriate", but it would be nice if {{fill()}} and friends worked automatically on struct fields.

As things are today, it appears that you have to manually unpack and rebuild structs to do things like fill in missing field values.


> na.fill() and friends should work on struct fields
> --------------------------------------------------
>
>                 Key: SPARK-18277
>                 URL: https://issues.apache.org/jira/browse/SPARK-18277
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Nicholas Chammas
>            Priority: Minor
>


