Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/11 16:09:47 UTC

[GitHub] [spark] srowen commented on a change in pull request #30323: Spelling

srowen commented on a change in pull request #30323:
URL: https://github.com/apache/spark/pull/30323#discussion_r521447840



##########
File path: docs/monitoring.md
##########
@@ -421,7 +421,7 @@ to handle the Spark Context setup and tear down.
 
 In addition to viewing the metrics in the UI, they are also available as JSON.  This gives developers
 an easy way to create new visualizations and monitoring tools for Spark.  The JSON is available for
-both running applications, and in the history server.  The endpoints are mounted at `/api/v1`.  Eg.,
+both running applications, and in the history server.  The endpoints are mounted at `/api/v1`.  E.g.,

Review comment:
       You're welcome to write out "For example" in a case like this, where "E.g." isn't really correct usage to start a sentence. But it's fine to leave it as is here.

##########
File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/DistanceMeasure.scala
##########
@@ -78,7 +78,7 @@ private[spark] abstract class DistanceMeasure extends Serializable {
   /**
    * Compute distance between centers in a distributed way.
    */
-  def computeStatisticsDistributedly(
+  def computeStatisticsDistributively(

Review comment:
       Although "Distributedly" is a weird neologism, I don't think it's wrong, or at least, "Distributively" sounds weirder. I'd revert this.

##########
File path: core/src/main/scala/org/apache/spark/util/Utils.scala
##########
@@ -1090,20 +1090,20 @@ private[spark] object Utils extends Logging {
     }
     // checks if the hostport contains IPV6 ip and parses the host, port
     if (hostPort != null && hostPort.split(":").length > 2) {
-      val indx: Int = hostPort.lastIndexOf("]:")

Review comment:
       Likewise, not necessary, but I don't mind fixing a few odd local var names. That said, it's also fine not to do this, in the name of keeping this large change from getting even larger.

##########
File path: core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
##########
@@ -968,7 +968,7 @@ private[spark] object JsonProtocolSuite extends Assertions {
   private val stackTrace = {
     Array[StackTraceElement](
       new StackTraceElement("Apollo", "Venus", "Mercury", 42),
-      new StackTraceElement("Afollo", "Vemus", "Mercurry", 420),
+      new StackTraceElement("Afollo", "Vemus", "Mercury", 420),
       new StackTraceElement("Ayollo", "Vesus", "Blackberry", 4200)

Review comment:
       I think it's intentional, for whatever reason. You can leave it or add a comment.

##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationsSuite.scala
##########
@@ -81,7 +81,7 @@ class UnsupportedOperationsSuite extends SparkFunSuite {
 
   // Commands
   assertNotSupportedInStreamingPlan(
-    "commmands",
+    "commands",
     DummyCommand(),
     outputMode = Append,
     expectedMsgs = "commands" :: Nil)

Review comment:
       I'd just leave it

##########
File path: python/pyspark/mllib/regression.py
##########
@@ -739,7 +739,7 @@ def _validate(self, dstream):
                 "dstream should be a DStream object, got %s" % type(dstream))
         if not self._model:
             raise ValueError(
-                "Model must be intialized using setInitialWeights")
+                "Model must be initialized using setInitialWeights")

Review comment:
       Yes, use this instead of 'initialised'

##########
File path: core/src/test/scala/org/apache/spark/deploy/worker/WorkerSuite.scala
##########
@@ -342,7 +342,7 @@ class WorkerSuite extends SparkFunSuite with Matchers with BeforeAndAfter {
     testWorkDirCleanupAndRemoveMetadataWithConfig(true)
   }
 
-  test("WorkdDirCleanup cleans only app dirs when" +
+  test("WorkDirCleanup cleans only app dirs when" +

Review comment:
       What's suspicious? It looks like a clean typo fix.

##########
File path: common/network-common/src/test/java/org/apache/spark/network/crypto/AuthEngineSuite.java
##########
@@ -150,8 +150,8 @@ public void testEncryptedMessage() throws Exception {
 
       ByteArrayWritableChannel channel = new ByteArrayWritableChannel(data.length);
       TransportCipher.EncryptedMessage emsg = handler.createEncryptedMessage(buf);
-      while (emsg.transfered() < emsg.count()) {
-        emsg.transferTo(channel, emsg.transfered());
+      while (emsg.transferred() < emsg.count()) {
+        emsg.transferTo(channel, emsg.transferred());

Review comment:
       Yeah, looks like this was misspelled in Netty, and there is a new, correctly spelled version which we can use safely.
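       For illustration, a rough sketch (hypothetical class name, not the exact Spark or Netty source) of how a FileRegion implementation can bridge Netty's legacy misspelled method to the corrected one, so the rest of the code only ever calls `transferred()`:

           import io.netty.channel.FileRegion;
           import io.netty.util.AbstractReferenceCounted;

           // Subclasses implement only transferred(); the deprecated spelling delegates to it.
           abstract class BridgedFileRegion extends AbstractReferenceCounted implements FileRegion {

             @Override
             @SuppressWarnings("deprecation")
             public final long transfered() {
               return transferred();
             }

             // FileRegion narrows the return types of these ReferenceCounted methods,
             // so they have to be re-overridden with the narrower type.
             @Override
             public BridgedFileRegion retain() {
               super.retain();
               return this;
             }

             @Override
             public BridgedFileRegion retain(int increment) {
               super.retain(increment);
               return this;
             }

             @Override
             public BridgedFileRegion touch() {
               return touch(null);
             }

             @Override
             public BridgedFileRegion touch(Object hint) {
               return this;
             }

             // Subclasses still provide transferred(), count(), position(),
             // transferTo(WritableByteChannel, long) and deallocate().
           }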

##########
File path: core/src/main/resources/org/apache/spark/ui/static/dataTables.rowsGroup.js
##########
@@ -166,7 +166,7 @@ RowsGroup.prototype = {
 			this._mergeColumn(newSequenceRow, (iRow-1), columnsIndexesCopy)
 	},
 	
-	_toogleDirection: function(dir)
+	_toggleDirection: function(dir)

Review comment:
       Yeah, I wouldn't bother fixing these; if we ever update this library, it'll get overwritten anyway.

##########
File path: R/pkg/tests/fulltests/test_sparkSQL.R
##########
@@ -2066,7 +2066,7 @@ test_that("higher order functions", {
     createDataFrame(data.frame(id = 1)),
     expr("CAST(array(1.0, 2.0, -3.0, -4.0) AS array<double>) xs"),
     expr("CAST(array(0.0, 3.0, 48.0) AS array<double>) ys"),
-    expr("array('FAILED', 'SUCCEDED') as vs"),
+    expr("array('FAILED', 'SUCCEEDED') as vs"),

Review comment:
       What's the issue here?

##########
File path: R/pkg/tests/fulltests/test_jvm_api.R
##########
@@ -20,11 +20,11 @@ context("JVM API")
 sparkSession <- sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE)
 
 test_that("Create and call methods on object", {
-  jarr <- sparkR.newJObject("java.util.ArrayList")
+  jarray <- sparkR.newJObject("java.util.ArrayList")

Review comment:
       I don't think you _have_ to change these as they're more abbreviation than typo, but if there aren't many, it's OK. 

##########
File path: core/src/test/scala/org/apache/spark/rpc/netty/NettyRpcEnvSuite.scala
##########
@@ -73,7 +73,7 @@ class NettyRpcEnvSuite extends RpcEnvSuite with MockitoSugar with TimeLimits {
 
     val nettyEnv = env.asInstanceOf[NettyRpcEnv]
     val client = mock[TransportClient]
-    val senderAddress = RpcAddress("locahost", 12345)
+    val senderAddress = RpcAddress("localhost", 12345)

Review comment:
       LOL, it may not matter for the test, but yes.

##########
File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala
##########
@@ -58,10 +58,10 @@ private[sql] class AvroDeserializer(
 
   private lazy val decimalConversions = new DecimalConversion()
 
-  private val dateRebaseFunc = DataSourceUtils.creteDateRebaseFuncInRead(
+  private val dateRebaseFunc = DataSourceUtils.createDateRebaseFuncInRead(
     datetimeRebaseMode, "Avro")
 
-  private val timestampRebaseFunc = DataSourceUtils.creteTimestampRebaseFuncInRead(
+  private val timestampRebaseFunc = DataSourceUtils.createTimestampRebaseFuncInRead(

Review comment:
       LOL I'm sure it is not. Just like the POSIX function `creat` was a mistake so many years ago.

##########
File path: common/network-common/src/main/java/org/apache/spark/network/client/TransportClient.java
##########
@@ -303,7 +303,7 @@ public void close() {
   @Override
   public String toString() {
     return new ToStringBuilder(this, ToStringStyle.SHORT_PREFIX_STYLE)
-      .append("remoteAdress", channel.remoteAddress())
+      .append("remoteAddress", channel.remoteAddress())

Review comment:
       This is probably fine; it's just the string representation and I can't see anyone relying on the toString of TransportClient

##########
File path: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java
##########
@@ -93,7 +93,7 @@
    * Maximum time (in ms) to wait for a child process to connect back to the launcher server
    * when using @link{#start()}.
    */
-  public static final String CHILD_CONNECTION_TIMEOUT = "spark.launcher.childConectionTimeout";
+  public static final String CHILD_CONNECTION_TIMEOUT = "spark.launcher.childConnectionTimeout";

Review comment:
       Yeah... unfortunately I would leave this as is here. We can deprecate this and recognize the correctly spelled version, but that could be separate.
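       For illustration, a hedged sketch of what "recognize the correctly spelled version" could look like: prefer the corrected key when it is set and fall back to the legacy misspelled one. The key names are taken from the diff above; the helper itself is hypothetical, not existing Spark API:

           import java.util.Map;

           final class ChildConnectionTimeout {
             // Existing constant value, kept for backwards compatibility.
             static final String LEGACY_KEY = "spark.launcher.childConectionTimeout";
             // Corrected key, as proposed in the diff; not recognized by Spark today.
             static final String CORRECTED_KEY = "spark.launcher.childConnectionTimeout";

             // Prefer the corrected key, fall back to the legacy one, then to a default.
             static long resolveMs(Map<String, String> conf, long defaultMs) {
               String value = conf.get(CORRECTED_KEY);
               if (value == null) {
                 value = conf.get(LEGACY_KEY);
               }
               return value != null ? Long.parseLong(value) : defaultMs;
             }
           }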

##########
File path: project/MimaExcludes.scala
##########
@@ -1488,7 +1488,7 @@ object MimaExcludes {
       ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.SparkEnv.getThreadLocal"),
       ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.rdd.RDDFunctions.treeReduce"),
       ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.rdd.RDDFunctions.treeAggregate"),
-      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.tree.configuration.Strategy.defaultStategy"),
+      ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.mllib.tree.configuration.Strategy.defaultStrategy"),

Review comment:
       Likewise.

##########
File path: python/pyspark/mllib/clustering.py
##########
@@ -843,7 +843,7 @@ def setInitialCenters(self, centers, weights):
     @since('1.5.0')
     def setRandomCenters(self, dim, weight, seed):
         """
-        Set the initial centres to be random samples from
+        Set the initial centers to be random samples from

Review comment:
       Yes, it's British spelling. We generally use US spelling. This is OK, but it's also OK to ignore.

##########
File path: python/pyspark/cloudpickle/cloudpickle_fast.py
##########
@@ -556,7 +556,7 @@ def dump(self, obj):
         # `dispatch` attribute.  Earlier versions of the protocol 5 CloudPickler
         # used `CloudPickler.dispatch` as a class-level attribute storing all
         # reducers implemented by cloudpickle, but the attribute name was not a
-        # great choice given the meaning of `Cloudpickler.dispatch` when
+        # great choice given the meaning of `Cloudpickle.dispatch` when

Review comment:
       This is correct already, it seems. CloudPickler is a thing.

##########
File path: mllib/src/main/scala/org/apache/spark/ml/feature/Selector.scala
##########
@@ -77,7 +77,7 @@ private[feature] trait SelectorParams extends Params
    * @group param
    */
   @Since("3.1.0")
-  final val fpr = new DoubleParam(this, "fpr", "The higest p-value for features to be kept.",
+  final val fpr = new DoubleParam(this, "fpr", "The highest p-value for features to be kept.",

Review comment:
       Not in any meaningful API sense. This is fine, just docs

##########
File path: project/MimaExcludes.scala
##########
@@ -730,7 +730,7 @@ object MimaExcludes {
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment"),
 
     // [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
-    ProblemFilters.exclude[FinalMethodProblem]("org.apache.spark.network.util.AbstractFileRegion.transfered"),
+    ProblemFilters.exclude[FinalMethodProblem]("org.apache.spark.network.util.AbstractFileRegion.transferred"),

Review comment:
       This you probably need to leave as is

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala
##########
@@ -566,7 +566,7 @@ object RewriteCorrelatedScalarSubquery extends Rule[LogicalPlan] with AliasHelpe
                 subqueryRoot = Project(projList ++ havingInputs, subqueryRoot)
               case s @ SubqueryAlias(alias, _) =>
                 subqueryRoot = SubqueryAlias(alias, subqueryRoot)
-              case op => sys.error(s"Unexpected operator $op in corelated subquery")
+              case op => sys.error(s"Unexpected operator $op in correlated subquery")

Review comment:
       It's fine.

##########
File path: python/pyspark/ml/regression.py
##########
@@ -1442,7 +1442,7 @@ def setParams(self, *, featuresCol="features", labelCol="label", predictionCol="
                   maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0,
                   maxMemoryInMB=256, cacheNodeIds=False, subsamplingRate=1.0,
                   checkpointInterval=10, lossType="squared", maxIter=20, stepSize=0.1, seed=None,
-                  impuriy="variance", featureSubsetStrategy="all", validationTol=0.01,
+                  impurity="variance", featureSubsetStrategy="all", validationTol=0.01,

Review comment:
       Yikes, this looks like a bug. I'm not sure it would have had any effect before, as it mismatches the param name. @HyukjinKwon, are there implications to changing this in Python?

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
##########
@@ -486,7 +486,7 @@ public void readDoubles(
     }
   }
 
-  public void readBinarys(
+  public void readBinaries(

Review comment:
       Yeah, I wouldn't change this here (unfortunately), even though it's probably effectively private to Spark.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org