Posted to issues@spark.apache.org by "Rui Wang (Jira)" <ji...@apache.org> on 2023/01/06 18:36:00 UTC
[jira] [Comment Edited] (SPARK-41918) Refine the naming in proto messages
[ https://issues.apache.org/jira/browse/SPARK-41918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655545#comment-17655545 ]
Rui Wang edited comment on SPARK-41918 at 1/6/23 6:35 PM:
----------------------------------------------------------
[~grundprinzip-db]
I am a bit confused about the renaming and what compatibility it offers:
```
message Foo {
  int32 a = 1;
}
```
On the receiver side, the code accesses the field `a`:
val t = foo.a + 1
Then suppose we allow the field to be renamed:
```
message Foo {
  int32 b = 1;
}
```
Won't any renaming break the receiver side's code? Or do I misunderstand `WIRE compatibility`, which I take to mean that the receiver should still be able to read the message after it has crossed the wire?
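To make the question concrete, here is a minimal sketch of what the binary protobuf encoding puts on the wire for a single integer field. It is a hand-rolled encoder and decoder written for illustration, not the protobuf library, and the names `encode_int32_field`, `decode_fields`, `SENDER_SCHEMA` and `RECEIVER_SCHEMA` are hypothetical. It assumes the binary format (the JSON mapping does carry field names) and non-negative values only.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_int32_field(field_number: int, value: int) -> bytes:
    """Encode one field as tag + value. The tag holds the field NUMBER and
    the wire type (0 = varint); the field NAME is never written."""
    tag = (field_number << 3) | 0
    return encode_varint(tag) + encode_varint(value)


def decode_fields(data: bytes) -> dict[int, int]:
    """Decode a message made only of varint fields into {field_number: value}."""
    fields = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        fields[field_number], pos = decode_varint(data, pos)
    return fields


# Hypothetical stand-ins for generated code on each side.
SENDER_SCHEMA = {"b": 1}    # sender already uses the renamed field `b = 1`
RECEIVER_SCHEMA = {1: "a"}  # receiver was compiled against the old `a = 1`

payload = encode_int32_field(SENDER_SCHEMA["b"], 150)
decoded = {RECEIVER_SCHEMA[num]: val for num, val in decode_fields(payload).items()}
t = decoded["a"] + 1  # the receiver's old accessor still works
```

Under these assumptions, the bytes are the same whether the sender calls field 1 `a` or `b`, so a receiver built against the old definition can still read the message. What a rename does break is source compatibility: once the receiver regenerates its classes from the new proto, `foo.a` no longer compiles and has to become `foo.b`.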
was (Author: amaliujia):
[~grundprinzip-db]
I am a bit confused about the renaming and what compatibility it offers:
```
message Foo {
  int32 a = 1;
}
```
On the receiver side, the code accesses the field `a`:
val t = foo.a + 1
Won't any renaming break the receiver side's code? Or do I misunderstand `WIRE compatibility`, which I take to mean that the receiver should still be able to read the message after it has crossed the wire?
> Refine the naming in proto messages
> -----------------------------------
>
> Key: SPARK-41918
> URL: https://issues.apache.org/jira/browse/SPARK-41918
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> Normally, we name the fields after the corresponding LogicalPlan or DataFrame API, but the names are not consistent across the protos. For example, the column name:
> {code:java}
> message UnresolvedRegex {
>   // (Required) The column name used to extract column with regex.
>   string col_name = 1;
> }
> {code}
> {code:java}
> message Alias {
>   // (Required) The expression that alias will be added on.
>   Expression expr = 1;
>   // (Required) A list of name parts for the alias.
>   //
>   // A scalar column has only one name part.
>   repeated string name = 2;
>   // (Optional) Alias metadata expressed as a JSON map.
>   optional string metadata = 3;
> }
> {code}
> {code:java}
> // Relation of type [[Deduplicate]], which has duplicate rows removed. It can consider either a
> // subset of the columns or all the columns.
> message Deduplicate {
>   // (Required) Input relation for a Deduplicate.
>   Relation input = 1;
>   // (Optional) Deduplicate based on a list of column names.
>   //
>   // This field cannot be used together with `all_columns_as_keys`.
>   repeated string column_names = 2;
>   // (Optional) Deduplicate based on all the columns of the input relation.
>   //
>   // This field cannot be used together with `column_names`.
>   optional bool all_columns_as_keys = 3;
> }
> {code}
> {code:java}
> // Computes basic statistics for numeric and string columns, including count, mean, stddev, min,
> // and max. If no columns are given, this function computes statistics for all numerical or
> // string columns.
> message StatDescribe {
>   // (Required) The input relation.
>   Relation input = 1;
>   // (Optional) Columns to compute statistics on.
>   repeated string cols = 2;
> }
> {code}
> We should probably unify the naming:
> single column -> `column`
> multiple columns -> `columns`