Posted to user@spark.apache.org by Liu Genie <ge...@outlook.com> on 2020/05/19 12:39:42 UTC

Re: array_sort function behaviour

I would extract the field I want to sort by, then combine it with the original struct into a new struct whose first field is that sort key.
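
A minimal sketch of that idea, not a definitive implementation. It assumes Spark 2.4 or later (for the SQL higher-order function transform), a local SparkSession, and that the field to sort by is "id1"; the names sort_key, item, keyed and keyed_sorted are made up for illustration, and the data is a shortened version of the sample from the original question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array_sort, expr}

// Assumption: a local session; in spark-shell or a notebook `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("array-sort-by-field").getOrCreate()
import spark.implicits._

// Shortened sample data, kept on one line so it parses as a single JSON record.
val jsonData = """{"topping":[{"id":"5002","id1":"5008","type":"Glazed"},{"id":"5007","id1":"5002","type":"Powdered Sugar"},{"id":"5006","id1":"5005","type":"Chocolate with Sprinkles"}]}"""
val json_df = spark.read.json(Seq(jsonData).toDS)

val sort_df = json_df
  // Step 1: wrap each struct so that the sort key (id1) becomes the first field.
  .select(expr("transform(topping, t -> named_struct('sort_key', t.id1, 'item', t))").as("keyed"))
  // Step 2: array_sort compares structs field by field, so sort_key decides the order.
  .select(array_sort($"keyed").as("keyed_sorted"))
  // Step 3: unwrap to recover the original structs, now ordered by id1.
  .select(expr("transform(keyed_sorted, k -> k.item)").as("sort_col"))

sort_df.show(false)
```

Ties on the sort key fall back to the remaining fields of the wrapped struct, since the comparison continues with the next field.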
________________________________
From: neeraj bhadani <bh...@gmail.com>
Sent: May 19, 2020 19:09
To: user <us...@spark.apache.org>
Subject: array_sort function behaviour

Hi All,
   I need to sort an array<struct> by a particular field of the struct. I am trying the "array_sort" function and can see that by default it does sort the array, but based on the first numerical element. Is this the expected behaviour? Please find the sample code and output below.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
//// SAMPLE CODE
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
val jsonData = """
{
"topping":
[
{ "id": "5001", "id1": "5001", "type": "None" },
{ "id": "5002", "id1": "5008", "type": "Glazed" },
{ "id": "5005", "id1": "5007", "type": "Sugar" },
{ "id": "5007", "id1": "5002", "type": "Powdered Sugar" },
{ "id": "5006", "id1": "5005", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "id1": "5004", "type": "Chocolate" },
{ "id": "5004", "id1": "5003", "type": "Maple" }
]
}
"""
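// Imports needed in environments that do not pre-import them.
import org.apache.spark.sql.functions.array_sort
import spark.implicits._
// Parse the JSON string into a DataFrame with one array<struct> column, "topping".
val json_df = spark.read.json(Seq(jsonData).toDS)
val sort_df = json_df.select(array_sort($"topping").as("sort_col"))
// display() is a notebook helper (e.g. Databricks); elsewhere use sort_df.show(false).
display(sort_df)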
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
//// OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[Screenshot 2020-05-19 12.06.30.png]
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

As you can see, the output above is sorted by the "id" element, which is the first numerical element in the struct.

Is there any way to specify the element that the sort should be based on?

Regards,
Neeraj