Posted to user@spark.apache.org by William Creger <Cl...@Mscience.com> on 2019/03/06 18:51:29 UTC

PySpark date_add function suggestion

I've been looking at the source code of the PySpark date_add function (https://spark.apache.org/docs/latest/api/python/_modules/pyspark/sql/functions.html#date_add) and I'm wondering why the days argument is not converted to a Java column the way the start argument is. In effect, when working with DataFrames you can only add the same number of days to every date. I think it would make more sense to convert days to a Java column as well, so that you could add a different number of days to each date. The JVM date_add function has no problem with this: I can already add a date column and an integer column through the expr function (expr("date_add(start, days)")). And if you wanted to add the same number of days to every row, you could just pass a lit column holding that number. The same argument applies to date_sub and add_months.
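
For illustration, here is a minimal sketch of the expr workaround described above. The helper names (date_add_sql, add_days_per_row) and the default output column name are hypothetical and not part of PySpark; only expr and withColumn are existing PySpark calls. It assumes a DataFrame with a date column and an integer column.

```python
def date_add_sql(start_col: str, days_col: str) -> str:
    """Build the SQL text for a per-row date_add over two column names.

    Hypothetical helper: it only formats the string that is handed to expr.
    """
    return f"date_add({start_col}, {days_col})"


def add_days_per_row(df, start_col: str, days_col: str, out_col: str = "shifted"):
    """Return df with out_col = start_col plus days_col days, evaluated per row.

    Assumes start_col is a date column and days_col an integer column.
    """
    # Imported lazily so the string helper above works without Spark installed.
    from pyspark.sql import functions as F

    # expr parses the SQL text, so both arguments are resolved as columns.
    return df.withColumn(out_col, F.expr(date_add_sql(start_col, days_col)))
```

With a DataFrame df holding columns start and days, add_days_per_row(df, "start", "days") would add each row's own days value to its start date.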

Clay


