You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "dtenedor (via GitHub)" <gi...@apache.org> on 2024/02/28 19:13:20 UTC
[PR] [SPARK-47214][Python] Create API for 'analyze' method to differentiate constant NULL arguments and other types of arguments [spark]
dtenedor opened a new pull request, #45319:
URL: https://github.com/apache/spark/pull/45319
### What changes were proposed in this pull request?
This PR creates an API for 'analyze' method to differentiate constant NULL arguments and other types of arguments.
### Why are the changes needed?
Before this PR, each `AnalyzeArgument` had a `value` field which was non-None if the argument was a constant-foldable scalar expression:
```
@dataclass(frozen=True)
class AnalyzeArgument:
"""
The argument for Python UDTF's analyze static method.
Parameters
----------
dataType : :class:`DataType`
The argument's data type
value : any, optional
The calculated value if the argument is foldable; otherwise None
isTable : bool
If True, the argument is a table argument.
"""
dataType: DataType
value: Optional[Any]
isTable: bool
```
However, it was also `None` if the argument is literal NULL.
Now we add a new field to tell the difference:
```
isConstantExpression : bool
If True, the argument is a constant-foldable scalar expression. Then the 'value' field
contains None if the argument is a NULL literal, or a non-None value if the argument is a
non-NULL literal. In this way, we can distinguish between a literal NULL argument and other
types of arguments such as complex expression trees or table arguments where the 'value'
field is always None.
```
### Does this PR introduce _any_ user-facing change?
Yes, see above.
### How was this patch tested?
This PR applies test coverage.
### Was this patch authored or co-authored using generative AI tooling?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-47214][Python] Create UDTF API for 'analyze' method to differentiate constant NULL arguments and other types of arguments [spark]
Posted by "amaliujia (via GitHub)" <gi...@apache.org>.
amaliujia commented on code in PR #45319:
URL: https://github.com/apache/spark/pull/45319#discussion_r1506632295
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala:
##########
@@ -207,14 +207,22 @@ class UserDefinedPythonTableFunctionAnalyzeRunner(
case NamedArgumentExpression(k, v) => (Some(k), v)
case _ => (None, expr)
}
+ // Write the 'value' field to provide the argument's value if it is a foldable scalar\
Review Comment:
nit: extra `\` after `scalar`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-47214][Python] Create UDTF API for 'analyze' method to differentiate constant NULL arguments and other types of arguments [spark]
Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #45319:
URL: https://github.com/apache/spark/pull/45319#discussion_r1506689984
##########
python/pyspark/sql/worker/analyze_udtf.py:
##########
@@ -78,14 +78,16 @@ def read_arguments(infile: IO) -> Tuple[List[AnalyzeArgument], Dict[str, Analyze
kwargs: Dict[str, AnalyzeArgument] = {}
for _ in range(num_args):
dt = _parse_datatype_json_string(utf8_deserializer.loads(infile))
- if read_bool(infile): # is foldable
+ is_constant_expression = read_bool(infile)
+ if is_constant_expression:
value = pickleSer._read_with_length(infile)
if dt.needConversion():
value = dt.fromInternal(value)
else:
value = None
- is_table = read_bool(infile) # is table argument
- argument = AnalyzeArgument(dataType=dt, value=value, isTable=is_table)
+ is_table = read_bool(infile)
+ argument = AnalyzeArgument(
+ dataType=dt, value=value, isTable=is_table, isConstantExpression=is_constant_expression)
Review Comment:
Seems to be making the linter angry.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-47214][Python] Create UDTF API for 'analyze' method to differentiate constant NULL arguments and other types of arguments [spark]
Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on PR #45319:
URL: https://github.com/apache/spark/pull/45319#issuecomment-1970173842
The failed test seems not related to this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-47214][Python] Create UDTF API for 'analyze' method to differentiate constant NULL arguments and other types of arguments [spark]
Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on PR #45319:
URL: https://github.com/apache/spark/pull/45319#issuecomment-1969949502
Thanks @amaliujia @ueshin for your reviews! I made the suggestions that you proposed, please take another look.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-47214][Python] Create UDTF API for 'analyze' method to differentiate constant NULL arguments and other types of arguments [spark]
Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on code in PR #45319:
URL: https://github.com/apache/spark/pull/45319#discussion_r1506692938
##########
python/pyspark/sql/worker/analyze_udtf.py:
##########
@@ -78,14 +78,16 @@ def read_arguments(infile: IO) -> Tuple[List[AnalyzeArgument], Dict[str, Analyze
kwargs: Dict[str, AnalyzeArgument] = {}
for _ in range(num_args):
dt = _parse_datatype_json_string(utf8_deserializer.loads(infile))
- if read_bool(infile): # is foldable
+ is_constant_expression = read_bool(infile)
+ if is_constant_expression:
value = pickleSer._read_with_length(infile)
if dt.needConversion():
value = dt.fromInternal(value)
else:
value = None
- is_table = read_bool(infile) # is table argument
- argument = AnalyzeArgument(dataType=dt, value=value, isTable=is_table)
+ is_table = read_bool(infile)
+ argument = AnalyzeArgument(
+ dataType=dt, value=value, isTable=is_table, isConstantExpression=is_constant_expression)
Review Comment:
👍
```
spark$ dev/reformat-python
reformatted spark/python/pyspark/sql/worker/analyze_udtf.py
All done! ✨ 🍰 ✨
1 file reformatted, 984 files left unchanged.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-47214][Python] Create UDTF API for 'analyze' method to differentiate constant NULL arguments and other types of arguments [spark]
Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #45319:
URL: https://github.com/apache/spark/pull/45319#discussion_r1506560801
##########
python/pyspark/sql/worker/analyze_udtf.py:
##########
@@ -84,8 +84,10 @@ def read_arguments(infile: IO) -> Tuple[List[AnalyzeArgument], Dict[str, Analyze
value = dt.fromInternal(value)
else:
value = None
- is_table = read_bool(infile) # is table argument
- argument = AnalyzeArgument(dataType=dt, value=value, isTable=is_table)
+ is_table = read_bool(infile)
+ is_constant_expression = read_bool(infile)
Review Comment:
Then receive `is_constant_expression` from `read_bool(inline)` at line 81?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-47214][Python] Create API for 'analyze' method to differentiate constant NULL arguments and other types of arguments [spark]
Posted by "dtenedor (via GitHub)" <gi...@apache.org>.
dtenedor commented on PR #45319:
URL: https://github.com/apache/spark/pull/45319#issuecomment-1969675584
cc @dongjoon-hyun @ueshin @allisonwang-db
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-47214][Python] Create UDTF API for 'analyze' method to differentiate constant NULL arguments and other types of arguments [spark]
Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on code in PR #45319:
URL: https://github.com/apache/spark/pull/45319#discussion_r1506558129
##########
sql/core/src/main/scala/org/apache/spark/sql/execution/python/UserDefinedPythonFunction.scala:
##########
@@ -207,14 +207,22 @@ class UserDefinedPythonTableFunctionAnalyzeRunner(
case NamedArgumentExpression(k, v) => (Some(k), v)
case _ => (None, expr)
}
+ // Write the 'value' field to provide the argument's value if it is a foldable scalar\
+ // expression.
if (value.foldable) {
dataOut.writeBoolean(true)
Review Comment:
Can we reuse this? This `true` and the `false` at line 217 should be the same as `value.foldable`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-47214][Python] Create UDTF API for 'analyze' method to differentiate constant NULL arguments and other types of arguments [spark]
Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin closed pull request #45319: [SPARK-47214][Python] Create UDTF API for 'analyze' method to differentiate constant NULL arguments and other types of arguments
URL: https://github.com/apache/spark/pull/45319
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-47214][Python] Create UDTF API for 'analyze' method to differentiate constant NULL arguments and other types of arguments [spark]
Posted by "ueshin (via GitHub)" <gi...@apache.org>.
ueshin commented on PR #45319:
URL: https://github.com/apache/spark/pull/45319#issuecomment-1970174021
Thanks! merging to master.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org