Posted to issues@spark.apache.org by "Oron Navon (JIRA)" <ji...@apache.org> on 2018/10/23 08:22:00 UTC
[jira] [Created] (SPARK-25807) Mitigate 1-based substr() confusion
Oron Navon created SPARK-25807:
----------------------------------
Summary: Mitigate 1-based substr() confusion
Key: SPARK-25807
URL: https://issues.apache.org/jira/browse/SPARK-25807
Project: Spark
Issue Type: Improvement
Components: Java API, PySpark
Affects Versions: 2.3.2, 1.3.0, 2.4.0, 2.5.0, 3.0.0
Reporter: Oron Navon
The method {{Column.substr()}} is 1-based, consistent with SQL and Hive's {{SUBSTRING}}, but unlike Python's string slicing and Java's {{String.substring()}}, which are zero-based. PySpark and Java API users therefore often expect a 0-based {{substr()}}. Adding to the confusion, {{substr()}} currently accepts a {{startPos}} value of 0, which returns the same result as {{startPos==1}}.
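To make the mismatch concrete, here is a minimal pure-Python sketch of the behavior described above. The function {{substr_1_based}} is a hypothetical model written for illustration, not Spark code, and it only covers non-negative arguments.

```python
def substr_1_based(s: str, start_pos: int, length: int) -> str:
    """Hypothetical model of the 1-based behavior described in this issue.

    This is NOT Spark's implementation; it only mimics the reported
    semantics for non-negative arguments.
    """
    if start_pos < 0 or length < 0:
        raise ValueError("this sketch only models non-negative arguments")
    # startPos 0 and startPos 1 both map to the first character.
    begin = max(start_pos - 1, 0)
    return s[begin:begin + length]


word = "Spark"

# 1-based model: position 1 is the first character.
one_based = substr_1_based(word, 1, 3)

# Position 0 silently behaves like position 1, which hides the mistake.
zero_start = substr_1_based(word, 0, 3)

# Python slicing is 0-based: index 1 is the second character.
python_slice = word[1:1 + 3]

print(one_based, zero_start, python_slice)
```

A user who carries the 0-based habit over and passes a start of 1 gets a different substring from the one Python slicing would return, while a start of 0 gives no hint that anything is off.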
Since changing {{substr()}} to 0-based is probably NOT a reasonable option here, I suggest making one or more of the following changes:
# Adding a method {{substr0}}, which would be zero-based
# Renaming {{substr}} to {{substr1}}
# Making the existing {{substr()}} throw an exception on {{startPos==0}}, which should catch and alert most users who expect zero-based behavior.
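As a rough illustration of the first and third options, the following pure-Python sketch models a zero-based {{substr0}} and a strict 1-based variant that rejects {{startPos==0}}. The name {{substr_strict}}, the choice of {{ValueError}}, and the error messages are assumptions made for this example; neither function exists in Spark.

```python
def substr0(s: str, start: int, length: int) -> str:
    """Hypothetical zero-based variant (option 1); not an existing Spark method."""
    if start < 0 or length < 0:
        raise ValueError("this sketch only models non-negative arguments")
    return s[start:start + length]


def substr_strict(s: str, start_pos: int, length: int) -> str:
    """Hypothetical 1-based variant that rejects startPos == 0 (option 3)."""
    if start_pos == 0:
        # Fail loudly instead of treating 0 like 1.
        raise ValueError("substr() is 1-based; startPos must not be 0")
    if start_pos < 0 or length < 0:
        raise ValueError("this sketch only models non-negative arguments")
    begin = start_pos - 1
    return s[begin:begin + length]


# Both calls select the first three characters of the string.
first_three_zero_based = substr0("Spark", 0, 3)
first_three_one_based = substr_strict("Spark", 1, 3)

# A caller assuming 0-based indexing is alerted immediately.
try:
    substr_strict("Spark", 0, 3)
    rejected = False
except ValueError:
    rejected = True
```

With the strict variant, the most common symptom of the 0-based assumption (passing 0 as the start) fails at the first call rather than silently returning the same result as {{startPos==1}}.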
This is my first discussion on this project; apologies for any faux pas.