You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2024/01/07 13:07:31 UTC

[PR] [WIP][CORE] Replace SimpleDateFormat with DateTimeFormatter [spark]

beliefer opened a new pull request, #44616:
URL: https://github.com/apache/spark/pull/44616

   ### What changes were proposed in this pull request?
   This PR propose to replace `SimpleDateFormat` with `DateTimeFormatter`.
   
   
   ### Why are the changes needed?
   According to the javadoc of `SimpleDateFormat`, it recommended to use `DateTimeFormatter` too.
   ![1](https://github.com/apache/spark/assets/8486025/97b16bbb-e5b7-4b3f-9bc8-0b0b8c907542)
   In addition, `DateTimeFormatter` have better performance than `SimpleDateFormat` too.
   
   Note: `SimpleDateFormat` and `DateTimeFormatter` are not completely compatible, for example, the formats supported by `parse` are not exactly the same.
   
   
   ### Does this PR introduce _any_ user-facing change?
   'No'.
   
   
   ### How was this patch tested?
   GA and manual test.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   'No'.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46623][CORE][MLLIB][SQL] Replace SimpleDateFormat with DateTimeFormatter [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on PR #44616:
URL: https://github.com/apache/spark/pull/44616#issuecomment-1898012328

   I had to manually resolve the jira @beliefer - can you please check if I assigned it to you ? There were two id's with same name.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46623][CORE][MLLIB][SQL] Replace SimpleDateFormat with DateTimeFormatter [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on PR #44616:
URL: https://github.com/apache/spark/pull/44616#issuecomment-1898009353

   Merged to master.
   Thanks for fixing this @beliefer !
   Thanks for the review @LuciferYang :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46623][CORE][MLLIB][SQL] Replace SimpleDateFormat with DateTimeFormatter [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm closed pull request #44616: [SPARK-46623][CORE][MLLIB][SQL] Replace SimpleDateFormat with DateTimeFormatter
URL: https://github.com/apache/spark/pull/44616


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46623][CORE][MLLIB][SQL] Replace SimpleDateFormat with DateTimeFormatter [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #44616:
URL: https://github.com/apache/spark/pull/44616#issuecomment-1884149950

   cc @mridulm 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46623][CORE][MLLIB][SQL] Replace SimpleDateFormat with DateTimeFormatter [spark]

Posted by "mridulm (via GitHub)" <gi...@apache.org>.
mridulm commented on code in PR #44616:
URL: https://github.com/apache/spark/pull/44616#discussion_r1452860600


##########
core/src/main/scala/org/apache/spark/deploy/master/Master.scala:
##########
@@ -58,7 +59,10 @@ private[deploy] class Master(
   private val appIdPattern = conf.get(APP_ID_PATTERN)
 
   // For application IDs
-  private def createDateFormat = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US)
+  private lazy val dateTimeFormatter =

Review Comment:
   nit: Does not need to be `lazy` and can be moved to `object Master`, same in other places as well.



##########
core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala:
##########
@@ -103,10 +104,13 @@ class NewHadoopRDD[K, V](
   private val confBroadcast = sc.broadcast(new SerializableConfiguration(_conf))
   // private val serializableConf = new SerializableWritable(_conf)
 
-  private val jobTrackerId: String = {
-    val formatter = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US)
-    formatter.format(new Date())
-  }
+  @transient
+  private val dateTimeFormatter =
+    DateTimeFormatter
+      .ofPattern("yyyyMMddHHmmss", Locale.US)
+      .withZone(ZoneId.systemDefault())
+
+  private val jobTrackerId: String = dateTimeFormatter.format(new Date().toInstant)

Review Comment:
   We dont need this field
   
   ```suggestion
     private val jobTrackerId: String = {
     val dateTimeFormatter =
         DateTimeFormatter
           .ofPattern("yyyyMMddHHmmss", Locale.US)
           .withZone(ZoneId.systemDefault())  
       dateTimeFormatter.format(new Date().toInstant)
     }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46623][CORE][MLLIB][SQL] Replace SimpleDateFormat with DateTimeFormatter [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #44616:
URL: https://github.com/apache/spark/pull/44616#issuecomment-1898417744

   > I had to manually resolve the jira @beliefer - can you please check if I assigned it to you ? There were two id's with same name.
   
   I checked. It's OK. Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46623][CORE][MLLIB][SQL] Replace SimpleDateFormat with DateTimeFormatter [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #44616:
URL: https://github.com/apache/spark/pull/44616#issuecomment-1895581566

   The GA failure is unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [WIP][CORE] Replace SimpleDateFormat with DateTimeFormatter [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #44616:
URL: https://github.com/apache/spark/pull/44616#issuecomment-1880231491

   cc @MaxGekk BTW, we would have to replace `FastDateFormat` too if possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46623][CORE][MLLIB][SQL] Replace SimpleDateFormat with DateTimeFormatter [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44616:
URL: https://github.com/apache/spark/pull/44616#discussion_r1452943400


##########
core/src/main/scala/org/apache/spark/deploy/master/Master.scala:
##########
@@ -58,7 +59,10 @@ private[deploy] class Master(
   private val appIdPattern = conf.get(APP_ID_PATTERN)
 
   // For application IDs
-  private def createDateFormat = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US)
+  private lazy val dateTimeFormatter =

Review Comment:
   +1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46623][CORE][MLLIB][SQL] Replace SimpleDateFormat with DateTimeFormatter [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #44616:
URL: https://github.com/apache/spark/pull/44616#issuecomment-1898319059

   @mridulm @LuciferYang @HyukjinKwon Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46623][CORE][MLLIB][SQL] Replace SimpleDateFormat with DateTimeFormatter [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #44616:
URL: https://github.com/apache/spark/pull/44616#issuecomment-1881037245

   ping @dongjoon-hyun @srowen @LuciferYang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-46623][CORE][MLLIB][SQL] Replace SimpleDateFormat with DateTimeFormatter [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #44616:
URL: https://github.com/apache/spark/pull/44616#discussion_r1454594178


##########
mllib/src/main/scala/org/apache/spark/mllib/pmml/export/PMMLModelExport.scala:
##########
@@ -34,7 +40,7 @@ private[mllib] trait PMMLModelExport {
     val version = getClass.getPackage.getImplementationVersion
     val app = new Application("Apache Spark MLlib").setVersion(version)
     val timestamp = new Timestamp()
-      .addContent(new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.US).format(new Date()))
+      .addContent(DATE_TIME_FORMATTER.format(new Date().toInstant))

Review Comment:
   ditto



##########
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTempPath.scala:
##########
@@ -41,6 +42,11 @@ class HiveTempPath(session: SparkSession, val hadoopConf: Configuration, path: P
 
   lazy val externalTempPath: Path = getExternalTmpPath(path)
 
+  private lazy val dateTimeFormatter =

Review Comment:
   This definition can also be moved to `object HiveTempPath`



##########
core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala:
##########
@@ -821,7 +820,7 @@ private[deploy] class Worker(
   }
 
   private def generateWorkerId(): String = {
-    workerIdPattern.format(createDateFormat.format(new Date), host, port)
+    workerIdPattern.format(Worker.DATE_TIME_FORMATTER.format(new Date().toInstant), host, port)

Review Comment:
   Is it possible to directly use `Instant.now()` instead of `new Date().toInstant`?
   
   



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousTextSocketSource.scala:
##########
@@ -185,8 +185,8 @@ class TextSocketContinuousStream(
             TextSocketContinuousStream.this.synchronized {
               currentOffset += 1
               val newData = (line,
-                Timestamp.valueOf(
-                  TextSocketReader.DATE_FORMAT.format(Calendar.getInstance().getTime()))
+                Timestamp.valueOf(TextSocketReader.DATE_TIME_FORMATTER.format(
+                  Calendar.getInstance().getTime().toInstant))

Review Comment:
   Is it possible to directly use `Instant.now()` instead of new `alendar.getInstance().getTime().toInstant`?



##########
core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala:
##########
@@ -104,8 +105,11 @@ class NewHadoopRDD[K, V](
   // private val serializableConf = new SerializableWritable(_conf)
 
   private val jobTrackerId: String = {
-    val formatter = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US)
-    formatter.format(new Date())
+    val dateTimeFormatter =
+      DateTimeFormatter
+        .ofPattern("yyyyMMddHHmmss", Locale.US)
+        .withZone(ZoneId.systemDefault())
+    dateTimeFormatter.format(new Date().toInstant)

Review Comment:
   ditto



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org