You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2022/03/29 18:34:53 UTC

[GitHub] [daffodil] tuxji commented on a change in pull request #768: Daffodil 2663 pluggable charsets

tuxji commented on a change in pull request #768:
URL: https://github.com/apache/daffodil/pull/768#discussion_r837671129



##########
File path: daffodil-io/src/main/resources/META-INF/services/org.apache.daffodil.processors.charset.CharsetCompiler
##########
@@ -0,0 +1,54 @@
+#  Licensed to the Apache Software Foundation (ASF) under one or more
+#  contributor license agreements.  See the NOTICE file distributed with
+#  this work for additional information regarding copyright ownership.
+#  The ASF licenses this file to You under the Apache License, Version 2.0
+#  (the "License"); you may not use this file except in compliance with
+#  the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+#
+
+org.apache.daffodil.processors.charset.ASCIICompiler
+org.apache.daffodil.processors.charset.ASCIICompilerAlias
+org.apache.daffodil.processors.charset.BitsCharsetAISPayloadArmoringCompiler

Review comment:
       I have looked at this pull request as carefully as I had time to do and I think there are better ways to name some things.  I know that Mike suggested doing the same thing as daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/Layer* but I think copying `Compiler` and `Transformer` from the Layer* classes is going too far.  These names make some sense for layer classes which transform data, but make less sense for classes which only create new `BitsCharset` objects.  
   
   I suggest that we use this naming strategy instead:
   
   `BitsCharset` - Daffodil's primary abstraction for dealing with character sets
   `BitsCharsetFactory` - Daffodil's abstraction for creating a new `BitsCharset` (no `Transformer`)
   `BitsCharsetDefinition` - Daffodil's abstraction for creating a new `BitsCharsetFactory` (not `CharsetCompiler`)
   
   `BitsCharsetUSASCII` - charset for "US-ASCII" and "ASCII" character sets
   `BitsCharsetUSASCIIFactory` - factory for creating a new `BitsCharsetUSACII` (no `Transformer`)
   `BitsCharsetUSASCIIDefinition` - definition for creating a new `BitsCharsetUSASCIIFactory` from "US-ASCII"
   `BitsCharsetUSASCIIAliasDefinition` - alias definition for creating a new `BitsCharsetUSASCIIFactory` from "ASCII"
   
   Given this new naming strategy, I would make the first three lines of this file look like this and replace `Compiler` with `Definition` in the rest of the lines:
   
   ```
   org.apache.daffodil.processors.charset.BitsCharsetUSASCIIDefinition
   org.apache.daffodil.processors.charset.BitsCharsetUSASCIIAliasDefinition
   org.apache.daffodil.processors.charset.BitsCharsetAISPayloadArmoringDefinition
   ```
   
   Then I would sort all the definitions' lines alphabetically to establish a standard ordering of definitions in this file.  Finally, I would rename this file from `CharsetCompiler` to `BitsCharsetDefinition`.

##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/AISPayloadArmoring.scala
##########
@@ -44,3 +44,19 @@ object BitsCharsetAISPayloadArmoring extends {
   override val replacementCharCode = 0x30
   override val requiredBitOrder = BitOrder.MostSignificantBitFirst
 } with BitsCharsetNonByteSize
+
+final class BitsCharsetAISPayloadArmoringCompiler
+  extends CharsetCompiler("X-DAFFODIL-AIS-PAYLOAD-ARMORING") {
+
+  override def compileCharset() = {
+    new BitsCharsetAISPayloadArmoringTransformerFactory(name)

Review comment:
       Given our new naming strategy, I would make these lines look like this,
   
   ```scala
   final class BitsCharsetAISPayloadArmoringDefinition
     extends BitsCharsetDefinition("X-DAFFODIL-AIS-PAYLOAD-ARMORING") {
   
     override def newFactory() = {
       new BitsCharsetAISPayloadArmoringFactory
     }
   }
   ```
   
   Note that I would not pass `name` to the factory because I see no justification for passing it.  The factory only creates one charset statically chosen at compile time; it's not supposed to look at `name` and create one of several possible charsets dynamically chosen at runtime.  Do you agree, Mike?

##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/CharsetCompiler.scala
##########
@@ -0,0 +1,35 @@
+

Review comment:
       Please rename this file from 'CharsetCompiler.scala' to 'BitsCharsetDefinition.scala' and remove the blank first line.

##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/BitsCharsetTransformerFactory.scala
##########
@@ -0,0 +1,33 @@
+

Review comment:
       Please rename this file from `BitsCharsetTransformerFactory.scala` to `BitsCharsetFactory.scala` and remove the first line (it's just a blank line).

##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/AISPayloadArmoring.scala
##########
@@ -44,3 +44,19 @@ object BitsCharsetAISPayloadArmoring extends {
   override val replacementCharCode = 0x30
   override val requiredBitOrder = BitOrder.MostSignificantBitFirst
 } with BitsCharsetNonByteSize
+
+final class BitsCharsetAISPayloadArmoringCompiler
+  extends CharsetCompiler("X-DAFFODIL-AIS-PAYLOAD-ARMORING") {
+
+  override def compileCharset() = {
+    new BitsCharsetAISPayloadArmoringTransformerFactory(name)
+  }
+}
+
+class BitsCharsetAISPayloadArmoringTransformerFactory(name: String)

Review comment:
       ```scala
   final class BitsCharsetAISPayloadArmoringFactory()
   ```
   
   I would add 'final' (unless there is a reason to subclass factories?), remove `Transformer`, and remove `name: String` since I see no reason to pass a string that's never used inside the class and I can't imagine it would be useful for debugging either.

##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/CharsetCompilerRegistry.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.processors.charset
+
+import org.apache.daffodil.exceptions.ThrowsSDE
+import org.apache.daffodil.util.SimpleNamedServiceLoader
+
+object CharsetCompilerRegistry {

Review comment:
       Likewise, rename `CharsetCompilerRegistry` to `BitsCharsetDefinitionRegistry`.

##########
File path: daffodil-runtime1/src/main/resources/META-INF/services/org.apache.daffodil.charsets.CharsetCompiler
##########
@@ -0,0 +1,31 @@
+#  Licensed to the Apache Software Foundation (ASF) under one or more
+#  contributor license agreements.  See the NOTICE file distributed with
+#  this work for additional information regarding copyright ownership.
+#  The ASF licenses this file to You under the Apache License, Version 2.0
+#  (the "License"); you may not use this file except in compliance with
+#  the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+#
+# PLEASE NOTE:
+#
+# These are in src/test/resources not src/main/resources because they're not
+# "real" in the sense of being fully featured. E.g., AIS only does the payload
+# armoring, not everything else about AIS. IPv4 just does the IPv4 header, not
+# TCP, UDP, ICMP, etc. all of which have related checksum algorithms that should
+# share code. A real CheckDigits has to have different layer length kinds supported
+# so that one can compute check digits from elements with different
+# of length kinds.
+#
+# These are partial/subsets of "real" functionality that give us test coverage
+# of the layering features of Daffodil. They can be reused to create "real"
+# layering, but one should expect to have to adapt their code substantially.
+#
+org.apache.daffodil.processors.charset.CustomNonByteSizeCharsetCompiler
+org.apache.daffodil.processors.charset.CustomJavaCharsetCompiler

Review comment:
       Rename file from `CharsetCompiler` to `BitsCharsetDefinition`.  Replace `Compiler` with `Definition`.

##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/CharsetCompilerRegistry.scala
##########
@@ -0,0 +1,57 @@
+/*

Review comment:
       Suggest renaming this class from `CharsetCompilerRegistry.scala` to `BitsCharsetDefinitionRegistry.scala`.

##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/Base4.scala
##########
@@ -31,10 +31,42 @@ object BitsCharsetBase4LSBF extends {
   override val requiredBitOrder = BitOrder.LeastSignificantBitFirst
 } with BitsCharsetNonByteSize
 
+final class BitsCharsetBase4LSBFCompiler
+  extends CharsetCompiler("X-DFDL-BASE4-LSBF") {
+
+  override def compileCharset() = {
+    new BitsCharsetBase4LSBFTransformerFactory(name)
+  }
+}
+
+class BitsCharsetBase4LSBFTransformerFactory(name: String)

Review comment:
       Likewise, I would make similar changes to the rest of the new definitions in this pull request (`Compiler` -> `Definition`, `compileCharset` -> `newFactory`, `TransfomerFactory` -> `Factory`, add `final`, remove `name`):
   
   ```scala
   final class BitsCharsetBase4LSBFDefinition
     extends BitsCharsetDefinition("X-DFDL-BASE4-LSBF") {
   
     override def newFactory() = {
       new BitsCharsetBase4LSBFFactory
     }
   }
   
   final class BitsCharsetBase4LSBFFactory()

##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/BitsCharsetTransformerFactory.scala
##########
@@ -0,0 +1,33 @@
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.processors.charset
+
+/**
+ * Factory for a charset transformer.
+ *
+ * This is the serialized object which is saved as part of a processor.
+ * It constructs the charset transformer at runtime when newInstance() is called.
+ *
+ * This must be implmented as part of implementation of a daffodil charset.

Review comment:
       Suggest rewriting as
   
   ```
    * Factory for a Daffodil charset.
    *
    * This is the serialized object which is saved as part of a processor.
    * It constructs a charset at runtime when newInstance() is called.
    *
    * This must be implemented as part of implementing a Daffodil charset.
    ```

##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/CharsetCompilerRegistry.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.processors.charset
+
+import org.apache.daffodil.exceptions.ThrowsSDE
+import org.apache.daffodil.util.SimpleNamedServiceLoader
+
+object CharsetCompilerRegistry {
+
+  private lazy val charsetCompilerMap: Map[String, CharsetCompiler] =
+    SimpleNamedServiceLoader.loadClass[CharsetCompiler](classOf[CharsetCompiler])
+
+  /**
+   * Given name, finds the factory for the transformer. SDE otherwise.
+   *
+   * The state is passed in order to provide diagnostic context if not found.
+   */
+  def find(name: String, context: ThrowsSDE): CharsetCompiler = {
+    val optCompiler: Option[CharsetCompiler] = charsetCompilerMap.get(name)
+    if (optCompiler.isEmpty) {
+      val choices = charsetCompilerMap.keySet.mkString(", ")
+      context.SDE("The encoding '%s' was not found. Available choices are: %s", name, choices)
+    } else {
+      optCompiler.get
+    }
+  }
+
+    def find(name: String): CharsetCompiler = {
+    val optCompiler: Option[CharsetCompiler] = charsetCompilerMap.get(name)
+    if (optCompiler.isEmpty) {
+      val choices = charsetCompilerMap.keySet.mkString(", ")
+      SDE("The encoding '%s' was not found. Available choices are: %s", name, choices)
+    } else {
+      optCompiler.get
+    }
+  }
+
+      def SDE(id: String, args: Any*): Nothing = {
+      throw new Exception(id.format(args: _*))
+    }

Review comment:
       Fix indentation.

##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/CharsetCompilerRegistry.scala
##########
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.processors.charset
+
+import org.apache.daffodil.exceptions.ThrowsSDE
+import org.apache.daffodil.util.SimpleNamedServiceLoader
+
+object CharsetCompilerRegistry {
+
+  private lazy val charsetCompilerMap: Map[String, CharsetCompiler] =

Review comment:
       Change to
   
   ```scala
     private lazy val definitionMap: Map[String, BitsCharsetDefinition] =
       SimpleNamedServiceLoader.loadClass[BitsCharsetDefinition](classOf[BitsCharsetDefinition])
   ```
   
   Update rest of file likewise.

##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/CharsetUtils.scala
##########
@@ -33,11 +33,10 @@ object CharsetUtils {
    * encodings as well as the standard ones.
    */
   def getCharset(name: String): BitsCharset = {
-    val cs = DaffodilCharsetProvider.charsetForName(name)
-    cs
+    CharsetCompilerRegistry.find(name).compileCharset.newInstance()

Review comment:
       Will become
   
   ```scala
       BitsCharsetDefinitionRegistry.find(name).newFactory().newInstance()
   ```
   
   Not sure if Scala conventions would keep empty parentheses or not.

##########
File path: daffodil-test/src/test/resources/META-INF/services/org.apache.daffodil.processors.charset.CharsetCompiler
##########
@@ -0,0 +1,18 @@
+#  Licensed to the Apache Software Foundation (ASF) under one or more
+#  contributor license agreements.  See the NOTICE file distributed with
+#  this work for additional information regarding copyright ownership.
+#  The ASF licenses this file to You under the Apache License, Version 2.0
+#  (the "License"); you may not use this file except in compliance with
+#  the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+#
+
+org.apache.daffodil.charsets.BitsCharset_ISO_8859_1_Reverse_Compiler
+org.apache.daffodil.charsets.BitsCharsetTest_ISO_8859_13_Compiler

Review comment:
       Rename file from CharsetCompiler to BitsCharsetDefinition. Replace Compiler with Definition.

##########
File path: daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/CharsetCompiler.scala
##########
@@ -0,0 +1,35 @@
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.processors.charset
+
+/**
+ * Must be implemented by all charsets.
+ *
+ * These are the classes which must be dynamically loaded in order to add a charset implementation
+ * to Daffodil.
+ *
+ * These instances are NOT serialized as part of a saved processor. The compileCharset method
+ * is called and the resulting CharsetTransformerFactory is the serialized object.
+ */
+abstract class CharsetCompiler(nom: String) {
+
+  def name() = nom
+
+  def compileCharset(): BitsCharsetFactory
+
+}

Review comment:
       Suggest writing as follows
   
   ```scala
   /**
    * Defines a class which must be dynamically loaded in order to add a charset implementation
    * to Daffodil.  All charsets must implement this class.
    *
    * A saved processor does NOT serialize this class. It calls the newFactory method and serializes
    * the resulting BitsCharsetFactory.
    */
   abstract class BitsCharsetDefinition(nom: String) {
     def name() = nom
   
     def newFactory(): BitsCharsetFactory
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org