Posted to issues@spark.apache.org by "Rui Wang (Jira)" <ji...@apache.org> on 2023/01/06 18:36:00 UTC

[jira] [Comment Edited] (SPARK-41918) Refine the naming in proto messages

    [ https://issues.apache.org/jira/browse/SPARK-41918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655545#comment-17655545 ] 

Rui Wang edited comment on SPARK-41918 at 1/6/23 6:35 PM:
----------------------------------------------------------

[~grundprinzip-db]

I am a bit confused about the renaming and what compatibility it offers:

```
message Foo {
   int32 a = 1;
}
``` 

On the receiver side, the code accesses `a`:
val t = foo.a + 1


Then suppose we allow the field to be renamed:
```
message Foo {
   int32 b = 1;
}
``` 

Won't any renaming break the receiver side's code? Or am I misunderstanding `WIRE compatibility`, which I took to mean that the receiver should still be able to read the message after it has gone over the wire?
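
To make the question concrete, here is a minimal sketch of why a rename can be wire compatible and still break receiver source code. It is a hand-rolled, simplified version of the protobuf varint encoding, not the protobuf library, and every name in it (`encode_message`, `SCHEMA_V1`, `SCHEMA_V2`, and so on) is hypothetical. It assumes a single integer field with number 1, as in `Foo` above.

```python
# Simplified sketch of the protobuf varint wire format (NOT the protobuf library).
# All helper and schema names below are hypothetical, for illustration only.

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def encode_message(schema: dict, fields: dict) -> bytes:
    """Serialize {name: int} given a schema of {field_number: name}."""
    number_of = {name: num for num, name in schema.items()}
    out = bytearray()
    for name, value in fields.items():
        key = (number_of[name] << 3) | 0  # wire type 0 = varint
        out += encode_varint(key) + encode_varint(value)
    return bytes(out)

def decode_message(schema: dict, data: bytes) -> dict:
    """Parse bytes back into {name: int}, looking names up by field number."""
    fields = {}
    pos = 0
    while pos < len(data):
        key, pos = decode_varint(data, pos)
        value, pos = decode_varint(data, pos)
        fields[schema[key >> 3]] = value
    return fields

SCHEMA_V1 = {1: "a"}  # message Foo { int32 a = 1; }
SCHEMA_V2 = {1: "b"}  # message Foo { int32 b = 1; }

wire_v1 = encode_message(SCHEMA_V1, {"a": 41})
wire_v2 = encode_message(SCHEMA_V2, {"b": 41})

# The field name never reaches the wire, only the field number and the value.
same_bytes = wire_v1 == wire_v2

# A receiver still built against the old schema reads the new sender's bytes.
old_receiver = decode_message(SCHEMA_V1, wire_v2)
t = old_receiver["a"] + 1

# A receiver rebuilt against the new schema no longer has a field named "a".
new_receiver = decode_message(SCHEMA_V2, wire_v1)
```

In this sketch the old receiver keeps working on bytes from a renamed sender, because only the field number is serialized. What breaks is code that is regenerated against the renamed schema and still refers to `a`. Formats that carry field names, such as the protobuf JSON mapping, are a separate case and would be affected by a rename.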



was (Author: amaliujia):
[~grundprinzip-db]

I am a bit confused about the renaming and what compatibility it offers:

```
message Foo {
   int32 a = 1;
}
``` 

On the receiver side, the code accesses `a`:
val t = foo.a + 1


Won't any renaming break the receiver side's code? Or am I misunderstanding `WIRE compatibility`, which I took to mean that the receiver should still be able to read the message after it has gone over the wire?


> Refine the naming in proto messages
> -----------------------------------
>
>                 Key: SPARK-41918
>                 URL: https://issues.apache.org/jira/browse/SPARK-41918
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>
> Normally, we name the fields after the corresponding LogicalPlan or DataFrame API, but the names are not consistent across the protos. For example, the column name:
> {code:java}
>   message UnresolvedRegex {
>     // (Required) The column name used to extract column with regex.
>     string col_name = 1;
>   }
> {code}
> {code:java}
>   message Alias {
>     // (Required) The expression that alias will be added on.
>     Expression expr = 1;
>     // (Required) a list of name parts for the alias.
>     //
>     // Scalar columns have only one name part.
>     repeated string name = 2;
>     // (Optional) Alias metadata expressed as a JSON map.
>     optional string metadata = 3;
>   }
> {code}
> {code:java}
> // Relation of type [[Deduplicate]] which have duplicate rows removed, could consider either only
> // the subset of columns or all the columns.
> message Deduplicate {
>   // (Required) Input relation for a Deduplicate.
>   Relation input = 1;
>   // (Optional) Deduplicate based on a list of column names.
>   //
>   // This field does not co-use with `all_columns_as_keys`.
>   repeated string column_names = 2;
>   // (Optional) Deduplicate based on all the columns of the input relation.
>   //
>   // This field does not co-use with `column_names`.
>   optional bool all_columns_as_keys = 3;
> }
> {code}
> {code:java}
> // Computes basic statistics for numeric and string columns, including count, mean, stddev, min,
> // and max. If no columns are given, this function computes statistics for all numerical or
> // string columns.
> message StatDescribe {
>   // (Required) The input relation.
>   Relation input = 1;
>   // (Optional) Columns to compute statistics on.
>   repeated string cols = 2;
> }
> {code}
> We should probably unify the naming:
> single column -> `column`
> multi columns -> `columns`


