Posted to issues@spark.apache.org by "Ruifeng Zheng (Jira)" <ji...@apache.org> on 2023/01/06 02:52:00 UTC
[jira] [Created] (SPARK-41918) Refine the naming in proto messages

Ruifeng Zheng created SPARK-41918:
-------------------------------------

             Summary: Refine the naming in proto messages
                 Key: SPARK-41918
                 URL: https://issues.apache.org/jira/browse/SPARK-41918
             Project: Spark
          Issue Type: Sub-task
          Components: Connect
    Affects Versions: 3.4.0
            Reporter: Ruifeng Zheng


Normally, we name proto fields after the corresponding LogicalPlan or DataFrame API, but the naming is not consistent across the protos. For example, the fields that hold column names:


{code:java}
  message UnresolvedRegex {
    // (Required) The column name used to extract column with regex.
    string col_name = 1;
  }
{code}


{code:java}
  message Alias {
    // (Required) The expression that alias will be added on.
    Expression expr = 1;

    // (Required) a list of name parts for the alias.
    //
    // Scalar columns only has one name that presents.
    repeated string name = 2;

    // (Optional) Alias metadata expressed as a JSON map.
    optional string metadata = 3;
  }
{code}



{code:java}
// Relation of type [[Deduplicate]] which have duplicate rows removed, could consider either only
// the subset of columns or all the columns.
message Deduplicate {
  // (Required) Input relation for a Deduplicate.
  Relation input = 1;

  // (Optional) Deduplicate based on a list of column names.
  //
  // This field does not co-use with `all_columns_as_keys`.
  repeated string column_names = 2;

  // (Optional) Deduplicate based on all the columns of the input relation.
  //
  // This field does not co-use with `column_names`.
  optional bool all_columns_as_keys = 3;
}
{code}


{code:java}
// Computes basic statistics for numeric and string columns, including count, mean, stddev, min,
// and max. If no columns are given, this function computes statistics for all numerical or
// string columns.
message StatDescribe {
  // (Required) The input relation.
  Relation input = 1;

  // (Optional) Columns to compute statistics on.
  repeated string cols = 2;
}
{code}


We should probably unify the naming:

single column -> `column`

multiple columns -> `columns`
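
As a rough sketch of how this convention might look when applied to the messages quoted above (the new field names here are only an illustration of the proposal, not a decided change; comments are trimmed for brevity, and `Alias` is left out because its `name` field names the alias rather than an input column):

{code:java}
// Hypothetical renames only. Field numbers and types are unchanged.
message UnresolvedRegex {
  // (Required) The column name used to extract column with regex.
  string column = 1;  // currently: col_name
}

message Deduplicate {
  // (Required) Input relation for a Deduplicate.
  Relation input = 1;

  // (Optional) Deduplicate based on a list of column names.
  repeated string columns = 2;  // currently: column_names

  // (Optional) Deduplicate based on all the columns of the input relation.
  optional bool all_columns_as_keys = 3;
}

message StatDescribe {
  // (Required) The input relation.
  Relation input = 1;

  // (Optional) Columns to compute statistics on.
  repeated string columns = 2;  // currently: cols
}
{code}

Since the protobuf binary encoding identifies fields by number rather than by name, renames like these should not change the wire format, although generated accessors and the JSON mapping would change.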



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
