Posted to issues@spark.apache.org by "Ruifeng Zheng (Jira)" <ji...@apache.org> on 2023/01/06 02:52:00 UTC
[jira] [Created] (SPARK-41918) Refine the naming in proto messages
Ruifeng Zheng created SPARK-41918:
-------------------------------------
Summary: Refine the naming in proto messages
Key: SPARK-41918
URL: https://issues.apache.org/jira/browse/SPARK-41918
Project: Spark
Issue Type: Sub-task
Components: Connect
Affects Versions: 3.4.0
Reporter: Ruifeng Zheng
Normally, we name the fields after the corresponding LogicalPlan or DataFrame API, but the naming is not consistent across the protos. For example, the fields that hold column names:
{code:java}
message UnresolvedRegex {
  // (Required) The column name used to extract column with regex.
  string col_name = 1;
}
{code}
{code:java}
message Alias {
  // (Required) The expression that alias will be added on.
  Expression expr = 1;

  // (Required) a list of name parts for the alias.
  //
  // Scalar columns only has one name that presents.
  repeated string name = 2;

  // (Optional) Alias metadata expressed as a JSON map.
  optional string metadata = 3;
}
{code}
{code:java}
// Relation of type [[Deduplicate]] which have duplicate rows removed, could consider either only
// the subset of columns or all the columns.
message Deduplicate {
  // (Required) Input relation for a Deduplicate.
  Relation input = 1;

  // (Optional) Deduplicate based on a list of column names.
  //
  // This field does not co-use with `all_columns_as_keys`.
  repeated string column_names = 2;

  // (Optional) Deduplicate based on all the columns of the input relation.
  //
  // This field does not co-use with `column_names`.
  optional bool all_columns_as_keys = 3;
}
{code}
{code:java}
// Computes basic statistics for numeric and string columns, including count, mean, stddev, min,
// and max. If no columns are given, this function computes statistics for all numerical or
// string columns.
message StatDescribe {
  // (Required) The input relation.
  Relation input = 1;

  // (Optional) Columns to compute statistics on.
  repeated string cols = 2;
}
{code}
We should probably unify the naming:
single column -> `column`
multi columns -> `columns`
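As a rough illustration of that convention, the column-name fields quoted above might end up looking like the sketch below. This is an assumption for discussion only: the new field names are hypothetical and not taken from an actual patch, and `Alias` is left out because its `name` field is not clearly a column name.

{code:java}
// Hypothetical sketch: only the field names change, the field numbers stay the same.
message UnresolvedRegex {
  // single column -> `column` (currently `col_name`)
  string column = 1;
}

message Deduplicate {
  Relation input = 1;

  // multi columns -> `columns` (currently `column_names`)
  repeated string columns = 2;

  optional bool all_columns_as_keys = 3;
}

message StatDescribe {
  Relation input = 1;

  // multi columns -> `columns` (currently `cols`)
  repeated string columns = 2;
}
{code}

Since the protobuf binary encoding identifies fields by number rather than by name, a rename that keeps the numbers unchanged would leave the wire format unaffected. Generated accessors and JSON field names would change, so callers would need to be updated together with the protos.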