Posted to dev@spark.apache.org by Edmondo Porcu <ed...@gmail.com> on 2022/09/02 23:44:22 UTC

Unsafe loop in CatalystTypeConverters

Hello Spark,

As I tried to implement an improvement in Delta Lake by changing the
signature of a command to make it return some data, the code produced a
runtime exception because I didn't also override the command's output:

https://github.com/delta-io/delta/pull/1240/files#diff-f993636fe0ceb5f15de09ed4a0a71ea405a5e89ad66d36047099fcc1387a375aR123
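
For context, here is a minimal sketch of what was missing, assuming
Spark 3.x's LeafRunnableCommand (the command and column names are
hypothetical): a command that returns rows must declare matching output
attributes, otherwise the row conversion sees an empty schema.

import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeReference}
import org.apache.spark.sql.execution.command.LeafRunnableCommand
import org.apache.spark.sql.types.StringType

// Hypothetical command returning a single row. Forgetting this
// `output` override leaves the default Seq.empty in place, which no
// longer matches the row returned by run().
case class DescribeSomethingCommand(path: String) extends LeafRunnableCommand {
  override val output: Seq[Attribute] =
    Seq(AttributeReference("path", StringType, nullable = false)())

  override def run(sparkSession: SparkSession): Seq[Row] = Seq(Row(path))
}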

The loop touched by https://github.com/apache/spark/pull/37778/files
makes an assumption about the lengths of two collections: it uses the
index of the first collection to iterate over the second as well. A more
defensive approach would have been to zip the two collections, or at
least to check that this prerequisite is met.
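
To make the difference concrete, here is a minimal sketch of the two
styles (this is not the actual Spark code; `convert` stands in for the
real per-value converter):

import org.apache.spark.sql.types.{DataType, StructField}

// Stand-in for the real per-value conversion.
def convert(value: Any, dataType: DataType): Any = value

// Index-based: silently assumes the two collections have the same
// length; throws IndexOutOfBoundsException if the schema is shorter,
// and silently ignores trailing fields if it is longer.
def convertUnsafe(values: Seq[Any], fields: Seq[StructField]): Seq[Any] =
  values.indices.map(i => convert(values(i), fields(i).dataType))

// Defensive: state the precondition up front, then zip so every value
// is explicitly paired with its field.
def convertSafe(values: Seq[Any], fields: Seq[StructField]): Seq[Any] = {
  require(values.length == fields.length,
    s"row has ${values.length} values but schema has ${fields.length} fields")
  values.zip(fields).map { case (v, f) => convert(v, f.dataType) }
}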

I understand it might not be a big problem, but if you think this is
worth addressing or improving, besides the "throw a more meaningful
error" approach suggested in a PR, other suitable approaches could be:
- removing the default value for the output collection, forcing a clear
decision from the developer on each override, and providing a mixable
trait "NoOutputCommand" for commands that genuinely return no rows
(sketched after this list)
- adding ScalaDoc
to org.apache.spark.sql.catalyst.plans.logical.Command#output
- overriding the default output in LeafCommand and making it
return ???
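
A rough sketch of the first and third ideas (NoOutputCommand is the
name proposed above, not an existing Spark API):

import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.Command

// Commands that genuinely produce no rows opt in explicitly, instead
// of inheriting an easily-forgotten empty default:
trait NoOutputCommand extends Command {
  final override def output: Seq[Attribute] = Nil
}

// Alternatively, LeafCommand could drop the silent default so that a
// forgotten override fails fast:
//   override def output: Seq[Attribute] = ???  // NotImplementedError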


What do you think?
Edmondo