Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/09/11 01:01:03 UTC

[GitHub] [spark] mingjialiu removed a comment on pull request #29564: [WIP][SPARK-32708] Query optimization fails to reuse exchange with DataSourceV2

mingjialiu removed a comment on pull request #29564:
URL: https://github.com/apache/spark/pull/29564#issuecomment-690599004


   Hi,
   
   Regarding test coverage: the issue is tricky to reproduce in a unit test.
   Could I get some pointers on how to produce different expression IDs for
   the same column, or any other test suggestions?
   
      - CAN'T repro example in unit test:
   
       val df = spark.read.format(classOf[AdvancedDataSourceV2].getName).load()
       val q1 = df.select(($"i" + 1).as("k"), ($"i" - 1).as("j")).filter('i > 5)
       val q2 = df.select(($"i" + 1).as("k"), ($"i" - 1).as("j")).filter('i > 5)
       val scans1 = getV2ScanExecs(q1.join(q2, "j"))
       assert(scans1(0).sameResult(scans1(1)))
   
      scans1(0).sameResult(scans1(1)) always returns true here, even when the
   filtered columns are not properly canonicalized (as circled in the
   screenshots), so this test cannot catch the bug.
   
   
   
   [image: other_canonicalized.png]
   
   [image: this_canonicalized.png]
   
   
   
   
      - CAN repro query: SPARK-32708
      <https://issues.apache.org/jira/browse/SPARK-32708>

      The key to the repro is that d_day_name and d_year are assigned
   different expression IDs in the two scans.
      Relevant implementation: preserve old expressionId if column not found
   <https://github.com/apache/spark/blob/branch-2.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala#L285>
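   
   To make the difference between the two cases concrete, here is a toy
   model in plain Scala (no Spark dependency). Attr, ToyScan and ExprIdDemo
   are hypothetical names invented for illustration, not Catalyst classes, and
   the logic is only a rough sketch of the "preserve old expressionId if
   column not found" behaviour linked above, assuming the filter columns are
   not part of the scan output.

```scala
// Toy stand-ins for illustration only; these are NOT Spark's real classes.
final case class Attr(name: String, exprId: Long)

// A toy scan: the columns it outputs, plus the attributes its pushed filters reference.
final case class ToyScan(output: Seq[Attr], pushedFilterRefs: Seq[Attr])

object ExprIdDemo {
  // Replace an attribute's id with its position in `input`;
  // if the attribute is not found there, the old id is preserved.
  def normalize(attr: Attr, input: Seq[Attr]): Attr = {
    val ordinal = input.indexWhere(_.exprId == attr.exprId)
    if (ordinal == -1) attr else attr.copy(exprId = ordinal.toLong)
  }

  // Normalize both the output and the filter references against the scan output.
  def canonicalize(scan: ToyScan): ToyScan =
    ToyScan(
      scan.output.map(normalize(_, scan.output)),
      scan.pushedFilterRefs.map(normalize(_, scan.output)))

  def sameResult(a: ToyScan, b: ToyScan): Boolean =
    canonicalize(a) == canonicalize(b)

  def main(args: Array[String]): Unit = {
    // Unit-test-like case: both sides carry the same ids for the filter column,
    // so the preserved ids happen to match.
    val sharedA = ToyScan(Seq(Attr("j", 2L)), Seq(Attr("i", 1L)))
    val sharedB = ToyScan(Seq(Attr("j", 2L)), Seq(Attr("i", 1L)))
    println(sameResult(sharedA, sharedB)) // true

    // SPARK-32708-like case: the filter column has a different id on each side
    // and is not in the output, so the preserved ids differ.
    val left  = ToyScan(Seq(Attr("d_date_sk", 1L)), Seq(Attr("d_year", 7L)))
    val right = ToyScan(Seq(Attr("d_date_sk", 50L)), Seq(Attr("d_year", 56L)))
    println(sameResult(left, right)) // false
  }
}
```

   In this model the first comparison passes only because the ids are
   identical to begin with, which may be why the unit test above cannot
   expose the problem; the specific id values are made up.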
   
   
   Thank you,
   Mingjia
   
   
   
   
   On Wed, Sep 9, 2020 at 11:05 PM Wenchen Fan <no...@github.com>
   wrote:
   
   > The fix LGTM, can you add a test?
   

