You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jinxing64 <gi...@git.apache.org> on 2018/10/13 10:42:10 UTC

[GitHub] spark pull request #22712: [SPARK-25724] Add sorting functionality in MapTyp...

GitHub user jinxing64 opened a pull request:

    https://github.com/apache/spark/pull/22712

    [SPARK-25724] Add sorting functionality in MapType.

    ## What changes were proposed in this pull request?
    
    This is related to https://github.com/apache/spark/pull/19330.
    As subtask of SPARK-18134, this PR proposes to add a functionality in `MapType` to sort `MapData`, in which all entries are already sorted.
    
    ## How was this patch tested?
    Tests added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinxing64/spark SPARK-25724

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22712.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22712
    
----
commit f8dd1a6008acc82b15fb63607c1ccd84e5487039
Author: jinxing <ji...@...>
Date:   2018-10-12T16:55:37Z

    [SPARK-25724] Add sorting functionality in MapType.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22712: [SPARK-25724] Add sorting functionality in MapType.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22712
  
    **[Test build #97339 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97339/testReport)** for PR 22712 at commit [`f8dd1a6`](https://github.com/apache/spark/commit/f8dd1a6008acc82b15fb63607c1ccd84e5487039).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22712: [SPARK-25724] Add sorting functionality in MapType.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22712
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22712: [SPARK-25724] Add sorting functionality in MapType.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22712
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22712: [SPARK-25724] Add sorting functionality in MapTyp...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22712#discussion_r225047888
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ordering.scala ---
    @@ -53,6 +53,10 @@ class InterpretedOrdering(ordering: Seq[SortOrder]) extends Ordering[InternalRow
                 a.interpretedOrdering.asInstanceOf[Ordering[Any]].compare(left, right)
               case a: ArrayType if order.direction == Descending =>
                 a.interpretedOrdering.asInstanceOf[Ordering[Any]].reverse.compare(left, right)
    +          case m: MapType if m.isOrdered && order.direction == Ascending =>
    +            m.interpretedOrdering.asInstanceOf[Ordering[Any]].compare(left, right)
    +          case m: MapType if m.isOrdered && order.direction == Descending =>
    +            m.interpretedOrdering.asInstanceOf[Ordering[Any]].reverse.compare(left, right)
    --- End diff --
    
    We need to care about this ordering direction? We just need comparable maps?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22712: [SPARK-25724] Add sorting functionality in MapType.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22712
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3943/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22712: [SPARK-25724] Add sorting functionality in MapType.

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/22712
  
    BTW, how does hive implement comparable maps?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22712: [SPARK-25724] Add sorting functionality in MapType.

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22712
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97339/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22712: [SPARK-25724] Add sorting functionality in MapType.

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22712
  
    **[Test build #97339 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97339/testReport)** for PR 22712 at commit [`f8dd1a6`](https://github.com/apache/spark/commit/f8dd1a6008acc82b15fb63607c1ccd84e5487039).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22712: [SPARK-25724] Add sorting functionality in MapType.

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/22712
  
    @maropu 
    I want to split https://github.com/apache/spark/pull/19330 to two parts:
    1. Approach to compare two Maps with themselves already sorted. (This PR)
    2. Approach to sort the entries of a single Map -- `SortMaps` (A following PR)
    
    Please take a look when you have time.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22712: [SPARK-25724] Add sorting functionality in MapTyp...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22712#discussion_r224961789
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/MapType.scala ---
    @@ -73,6 +74,90 @@ case class MapType(
       override private[spark] def existsRecursively(f: (DataType) => Boolean): Boolean = {
         f(this) || keyType.existsRecursively(f) || valueType.existsRecursively(f)
       }
    +
    +  private[this] class OrderedWrapper {
    +    var isOrdered: Boolean = false
    --- End diff --
    
    Thanks for quick reply :)
    
    Actually I'm not pretty sure about this.
    If we do it like below
    ```
    case class MapType(
      keyType: DataType,
      valueType: DataType,
      valueContainsNull: Boolean,
      ordered: Boolean)
    ```
    The `ordered` will be spread to lots places in the code (especially in the `...match ... case ...` ) and users can will also  see it. But I think `ordered` is a pretty internal parameter/characteristic and only used when sorting map. So I try to make it private and lazy created.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22712: [SPARK-25724] Add sorting functionality in MapTyp...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22712#discussion_r227201626
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ordering.scala ---
    @@ -53,6 +53,10 @@ class InterpretedOrdering(ordering: Seq[SortOrder]) extends Ordering[InternalRow
                 a.interpretedOrdering.asInstanceOf[Ordering[Any]].compare(left, right)
               case a: ArrayType if order.direction == Descending =>
                 a.interpretedOrdering.asInstanceOf[Ordering[Any]].reverse.compare(left, right)
    +          case m: MapType if m.isOrdered && order.direction == Ascending =>
    +            m.interpretedOrdering.asInstanceOf[Ordering[Any]].compare(left, right)
    +          case m: MapType if m.isOrdered && order.direction == Descending =>
    +            m.interpretedOrdering.asInstanceOf[Ordering[Any]].reverse.compare(left, right)
    --- End diff --
    
    You mean we can't remove this? If not necessary, better to remove it off.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22712: [SPARK-25724] Add sorting functionality in MapTyp...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22712#discussion_r225628985
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ordering.scala ---
    @@ -53,6 +53,10 @@ class InterpretedOrdering(ordering: Seq[SortOrder]) extends Ordering[InternalRow
                 a.interpretedOrdering.asInstanceOf[Ordering[Any]].compare(left, right)
               case a: ArrayType if order.direction == Descending =>
                 a.interpretedOrdering.asInstanceOf[Ordering[Any]].reverse.compare(left, right)
    +          case m: MapType if m.isOrdered && order.direction == Ascending =>
    +            m.interpretedOrdering.asInstanceOf[Ordering[Any]].compare(left, right)
    +          case m: MapType if m.isOrdered && order.direction == Descending =>
    +            m.interpretedOrdering.asInstanceOf[Ordering[Any]].reverse.compare(left, right)
    --- End diff --
    
    Actually, this is not necessary, but just to make the logic complete.
    https://github.com/apache/spark/pull/9718 did the same thing.



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22712: [SPARK-25724] Add sorting functionality in MapTyp...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22712#discussion_r224957118
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/MapType.scala ---
    @@ -73,6 +74,90 @@ case class MapType(
       override private[spark] def existsRecursively(f: (DataType) => Boolean): Boolean = {
         f(this) || keyType.existsRecursively(f) || valueType.existsRecursively(f)
       }
    +
    +  private[this] class OrderedWrapper {
    +    var isOrdered: Boolean = false
    --- End diff --
    
    I prefer not to make this mutable if we can. That can be a source of some pretty weird errors if we move from an unordered to an ordered map. Why do you need this?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22712: [SPARK-25724] Add sorting functionality in MapType.

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/22712
  
    > BTW, how does hive implement comparable maps?
    Is it below piece of code ?
    
    https://github.com/apache/hive/blob/master/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/FullMapEqualComparer.java


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org