Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/02 19:11:11 UTC

[GitHub] [spark] Jonathancui123 commented on a diff in pull request #37327: [SPARK-39904][SQL] Rename inferDate to prefersDate and clarify semantics of the option in CSV data source

Jonathancui123 commented on code in PR #37327:
URL: https://github.com/apache/spark/pull/37327#discussion_r935939678


##########
docs/sql-data-sources-csv.md:
##########
@@ -109,9 +109,9 @@ Data source options of CSV can be set via:
     <td>read</td>
   </tr>
   <tr>
-    <td><code>inferDate</code></td> 
+    <td><code>prefersDate</code></td>
     <td>false</td>
-    <td>Whether or not to infer columns that satisfy the <code>dateFormat</code> option as <code>Date</code>. Requires <code>inferSchema</code> to be <code>true</code>. When <code>false</code>, columns with dates will be inferred as <code>String</code> (or as <code>Timestamp</code> if it fits the <code>timestampFormat</code>).</td>
+    <td>Attempts to infer string columns as <code>Date</code> if the values satisfy <code>dateFormat</code> option and failed to be parsed by the respective formatter during schema inference (<code>inferSchema</code>). When used in conjunction with a user-provided schema, attempts to parse timestamp columns as dates using <code>dateFormat</code> if they fail to conform to <code>timestampFormat</code>, the parsed values will be cast to timestamp type afterwards.</td>

Review Comment:
   ```suggestion
       <td>During schema inference (<code>inferSchema</code>), attempts to infer string columns that contain dates or timestamps as <code>Date</code> if the values satisfy the <code>dateFormat</code> option and failed to be parsed by the respective formatter. Attempts to parse timestamp columns as dates using <code>dateFormat</code> if they fail to conform to <code>timestampFormat</code>. The parsed values will be cast to timestamp type afterwards.</td>
   ```
   
   1. Re-order sentence for readability
   2. Removed: "When used in conjunction with a user-provided schema" because parsing timestamp columns as dates happens regardless of whether there is a user-provided schema
   3. Split the sentence at the comma for readability



