Posted to dev@spark.apache.org by Daniel Davies <da...@gmail.com> on 2022/01/02 19:59:46 UTC
[Spark Core]: Support for un-pivoting data ('melt')
Level: Intermediate (I think?)
Scenario: Feature Request
Hello dev@,
(First time posting on this mailing list; apologies in advance if this
should have been routed elsewhere or is missing any information).
Un-pivoting data is supported on numerous SQL engines & in Pandas (with the
'melt' function), but it isn't directly available in Spark. It's easy
enough to derive this functionality using the 'stack' function or a
combination of struct, array, and explode (such as the reproduction of
the melt function in pandas-on-pyspark here
<https://github.com/apache/spark/blob/c92bd5cafe62ca5226176446735171cc877e805a/python/pyspark/pandas/frame.py#L9651>),
but I was wondering whether a more native solution had been considered? It
would make end-user code more lightweight at the very least, and I wonder
whether it could be made more efficient than the stack or
struct-array-explode methods.
I'm happy to try and make a PR if this is something that might be useful
within Spark. No worries if this is not something that you think should be
supported; the methods above work and are well documented on StackOverflow.
I was personally just caught out by this, and thought it would be useful to
raise.
I did see a thread in the Pony Mail archive about this issue, but it looks
like it didn't go anywhere. Does anyone else have context on this
<https://lists.apache.org/list?dev@spark.apache.org:lte=60M:unpivot>?
Kind Regards,
--
*Daniel Davies*
Re: [Spark Core]: Support for un-pivoting data ('melt')
Posted by Enrico Minack <in...@enrico.minack.dev>.
The melt function has recently been implemented in the PySpark Pandas
API (because melt is part of the Pandas API). I think the Scala/Java
Dataset and Python DataFrame APIs deserve this method equally,
ideally all based on one implementation.
I'd like to fuel the conversation with some code:
https://github.com/apache/spark/pull/36150
Cheers,
Enrico
On 02.01.22 at 20:59, Daniel Davies wrote:
> [...]