Posted to user@spark.apache.org by Don Drake <do...@gmail.com> on 2016/07/24 19:18:36 UTC

Outer Explode needed

I have a nested data structure (an array of structures) that I'm flattening
with the DSL df.explode() API.  However, when the array is empty, the row is
skipped, so the rest of its columns never appear in my output.

This is the intended behavior, and Hive supports a SQL "OUTER explode()" to
generate the row when the explode would not yield any output.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView

Can we get this same outer explode in the DSL?  I have to jump through some
outer join hoops to get the rows where the array is empty.
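
For readers following along, the "outer join hoops" mentioned above can be sketched in plain Python (this is not Spark code). The row layout, the "id" and "items" column names, and the helper names `explode` and `rejoin_dropped_rows` are all assumptions made up for this illustration; none of them come from the Spark DSL.

```python
# Plain-Python illustration only. Rows are dicts; "items" is a hypothetical
# array-of-structures column and "id" is a hypothetical unique key.
rows = [
    {"id": 1, "items": [{"sku": "a"}, {"sku": "b"}]},
    {"id": 2, "items": []},  # empty array: the problem case
]


def explode(rows, column):
    """One output row per array element; rows with an empty array vanish."""
    return [
        {**{k: v for k, v in row.items() if k != column}, column: element}
        for row in rows
        for element in row[column]
    ]


def rejoin_dropped_rows(original, exploded, column, key="id"):
    """Left-outer-join the original rows to the exploded rows on `key`.

    Rows that explode() dropped come back with None in `column`.
    """
    matches_by_key = {}
    for row in exploded:
        matches_by_key.setdefault(row[key], []).append(row)

    joined = []
    for row in original:
        matches = matches_by_key.get(row[key])
        if matches:
            joined.extend(matches)
        else:
            rest = {k: v for k, v in row.items() if k != column}
            joined.append({**rest, column: None})
    return joined


exploded = explode(rows, "items")
joined = rejoin_dropped_rows(rows, exploded, "items")
print([r["id"] for r in exploded])  # row 2 is missing here
print([r["id"] for r in joined])    # row 2 is back, with items set to None
```

The join needs a unique key and a second pass over the data, which is the extra work an outer explode would avoid.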

Thanks.

-Don

-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143

Re: Outer Explode needed

Posted by Yong Zhang <ja...@hotmail.com>.
There has been no response because this feature is not available yet.

You can vote for and follow this JIRA, https://issues.apache.org/jira/browse/SPARK-13721, if you really need this feature.


Yong



Re: Outer Explode needed

Posted by Michael Armbrust <mi...@databricks.com>.
I don't think this would be hard to implement.  The physical explode
operator supports it (for our HiveQL compatibility).

Perhaps comment on this JIRA?
https://issues.apache.org/jira/browse/SPARK-13721

It could probably just be another argument to explode()
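
As a rough illustration of the "another argument" idea, here is a plain-Python sketch (not Spark code) of an explode function with an outer flag. The function signature, the `outer` parameter name, and the sample rows are hypothetical; this thread does not confirm what an eventual Spark API would look like.

```python
def explode(rows, column, outer=False):
    """Yield one row per element of row[column].

    With outer=False, rows whose array is empty produce nothing.
    With outer=True, such rows produce a single row with None in `column`.
    The `outer` flag is a hypothetical name chosen for this sketch.
    """
    for row in rows:
        rest = {k: v for k, v in row.items() if k != column}
        elements = row[column]
        if outer and not elements:
            elements = [None]  # keep the row, with a null element
        for element in elements:
            yield {**rest, column: element}


# Hypothetical sample data: "tags" is the array column being flattened.
sample = [
    {"id": 1, "tags": ["x", "y"]},
    {"id": 2, "tags": []},
]

inner_rows = list(explode(sample, "tags"))
outer_rows = list(explode(sample, "tags", outer=True))
print(len(inner_rows), len(outer_rows))
```

Defaulting the flag to False in this sketch keeps the existing behavior unchanged for callers that do not pass it.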

Michael

On Mon, Jul 25, 2016 at 6:12 PM, Don Drake <do...@gmail.com> wrote:

> No response on the Users list, I thought I would repost here.
>
> See below.
>
> -Don
>
> ---------- Forwarded message ----------
> From: Don Drake <do...@gmail.com>
> Date: Sun, Jul 24, 2016 at 2:18 PM
> Subject: Outer Explode needed
> To: user <us...@spark.apache.org>
>
>
> I have a nested data structure (an array of structures) that I'm flattening
> with the DSL df.explode() API.  However, when the array is empty, the row is
> skipped, so the rest of its columns never appear in my output.
>
> This is the intended behavior, and Hive supports a SQL "OUTER explode()"
> to generate the row when the explode would not yield any output.
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView
>
> Can we get this same outer explode in the DSL?  I have to jump through
> some outer join hoops to get the rows where the array is empty.
>
> Thanks.
>
> -Don
>
> --
> Donald Drake
> Drake Consulting
> http://www.drakeconsulting.com/
> https://twitter.com/dondrake
> 800-733-2143

Fwd: Outer Explode needed

Posted by Don Drake <do...@gmail.com>.
No response on the Users list, I thought I would repost here.

See below.

-Don
---------- Forwarded message ----------
From: Don Drake <do...@gmail.com>
Date: Sun, Jul 24, 2016 at 2:18 PM
Subject: Outer Explode needed
To: user <us...@spark.apache.org>


I have a nested data structure (an array of structures) that I'm flattening
with the DSL df.explode() API.  However, when the array is empty, the row is
skipped, so the rest of its columns never appear in my output.

This is the intended behavior, and Hive supports a SQL "OUTER explode()" to
generate the row when the explode would not yield any output.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LateralView

Can we get this same outer explode in the DSL?  I have to jump through some
outer join hoops to get the rows where the array is empty.

Thanks.

-Don

-- 
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143


