You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "olaky (via GitHub)" <gi...@apache.org> on 2023/10/25 07:17:15 UTC

[PR] [SPARK-45660] Re-use Literal objects in ComputeCurrentTime rule [spark]

olaky opened a new pull request, #43524:
URL: https://github.com/apache/spark/pull/43524

   ### What changes were proposed in this pull request?
   
   The ComputeCurrentTime optimizer rule does produce unique timestamp Literals for current time expressions of a query. For CurrentDate and LocalTimestamp the Literal objects are not re-used though, but semantically equal objects are created for each instance. This can cost unnecessary much memory in case there are many such Literal objects.
   
   This PR adds a map that caches timestamp literals in case they are used more than once.
   
   ### Why are the changes needed?
   
   A query that has a lot of equal literals could use unnecessary high amounts of memory
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Added a new Unit Test
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45660] Re-use Literal objects in ComputeCurrentTime rule [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk closed pull request #43524: [SPARK-45660] Re-use Literal objects in ComputeCurrentTime rule
URL: https://github.com/apache/spark/pull/43524


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45660] Re-use Literal objects in ComputeCurrentTime rule [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #43524:
URL: https://github.com/apache/spark/pull/43524#issuecomment-1779685990

   All GAs passed: https://github.com/olaky/spark/runs/18042506979. Merging to master.
   Thank you, @olaky.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45660] Re-use Literal objects in ComputeCurrentTime rule [spark]

Posted by "olaky (via GitHub)" <gi...@apache.org>.
olaky commented on PR #43524:
URL: https://github.com/apache/spark/pull/43524#issuecomment-1778659208

   cc @tomvanbussel @bart-samwel


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45660] Re-use Literal objects in ComputeCurrentTime rule [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43524:
URL: https://github.com/apache/spark/pull/43524#discussion_r1371561931


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ComputeCurrentTimeSuite.scala:
##########
@@ -135,6 +135,39 @@ class ComputeCurrentTimeSuite extends PlanTest {
     assert(offsetsFromQuarterHour.size == 1)
   }
 
+  val timeReferenceFactories: Seq[(String) => Expression] = Seq(
+    { _: String => CurrentTimestamp() },
+    { zoneId: String => LocalTimestamp(Some(zoneId)) },
+    { _: String => Now() },
+    { zoneId: String => CurrentDate(Some(zoneId)) }
+  )
+  for ((timeReferenceFactory, index) <- timeReferenceFactories.zipWithIndex) {
+    test(s"No duplicate literals index=$index") {

Review Comment:
   From my personal experience, it is better to put the loop inside of the test. This should simplify debugging.
   
   For example:
   ```scala
     test("No duplicate literals") {
       def checkLiterals(f: (String) => Expression, expected: Int): Unit = {
         ...
       }
   
       checkLiterals((_: String) => CurrentTimestamp(), 1)
       checkLiterals((zoneId: String) => LocalTimestamp(Some(zoneId)), numTimezones)
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45660] Re-use Literal objects in ComputeCurrentTime rule [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on code in PR #43524:
URL: https://github.com/apache/spark/pull/43524#discussion_r1371764586


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ComputeCurrentTimeSuite.scala:
##########
@@ -135,6 +135,34 @@ class ComputeCurrentTimeSuite extends PlanTest {
     assert(offsetsFromQuarterHour.size == 1)
   }
 
+  test(s"No duplicate literals") {

Review Comment:
   Please, remove s before "..."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45660] Re-use Literal objects in ComputeCurrentTime rule [spark]

Posted by "olaky (via GitHub)" <gi...@apache.org>.
olaky commented on code in PR #43524:
URL: https://github.com/apache/spark/pull/43524#discussion_r1371773983


##########
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ComputeCurrentTimeSuite.scala:
##########
@@ -135,6 +135,34 @@ class ComputeCurrentTimeSuite extends PlanTest {
     assert(offsetsFromQuarterHour.size == 1)
   }
 
+  test(s"No duplicate literals") {

Review Comment:
   right, that is not required any more indeed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org