You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@daffodil.apache.org by GitBox <gi...@apache.org> on 2022/03/10 21:59:05 UTC

[GitHub] [daffodil] tuxji commented on a change in pull request #768: Daffodil 2663 pulggable charsets

tuxji commented on a change in pull request #768:
URL: https://github.com/apache/daffodil/pull/768#discussion_r824191920



##########
File path: daffodil-test/src/test/resources/META-INF/services/org.apache.daffodil.charsets.CharsetCompiler
##########
@@ -0,0 +1,31 @@
+#  Licensed to the Apache Software Foundation (ASF) under one or more
+#  contributor license agreements.  See the NOTICE file distributed with
+#  this work for additional information regarding copyright ownership.
+#  The ASF licenses this file to You under the Apache License, Version 2.0
+#  (the "License"); you may not use this file except in compliance with
+#  the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+#
+# PLEASE NOTE:
+#
+# These are in src/test/resources not src/main/resources because they're not
+# "real" in the sense of being fully featured. E.g., AIS only does the payload
+# armoring, not everything else about AIS. IPv4 just does the IPv4 header, not
+# TCP, UDP, ICMP, etc. all of which have related checksum algorithms that should
+# share code. A real CheckDigits has to have different layer length kinds supported
+# so that one can compute check digits from elements with different
+# of length kinds.
+#
+# These are partial/subsets of "real" functionality that give us test coverage
+# of the layering features of Daffodil. They can be reused to create "real"
+# layering, but one should expect to have to adapt their code substantially.

Review comment:
       Those comments were pasted here from another place and don't belong here, or should be updated to talk only about your test charsets.

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/charsets/CharsetTransformerFactory.scala
##########
@@ -0,0 +1,33 @@
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.charsets
+
+/**
+ * Factory for a charset transformer.
+ *
+ * This is the serialized object which is saved as part of a processor.
+ * It constructs the charset transformer at runtime when newInstance() is called.
+ *
+ * This must be implmented as part of implementation of a daffodil charset.
+ */
+abstract class CharsetTransformerFactory(val name: String)
+  extends Serializable {
+
+  def newInstance(): CharsetTransformer

Review comment:
       Mike said `CharsetTransformer` should simply be `BitsCharset`, so you probably want to rename `CharsetTransformerFactory` to `BitsCharsetFactory`.

##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/charsets/CustomCharset.dfdl.xsd
##########
@@ -0,0 +1,69 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:org.apache.daffodil.charsets.CustomCharset"
+  xmlns:tns="urn:org.apache.daffodil.charsets.CustomCharset"
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+
+  <include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+
+  <annotation>
+    <appinfo source="http://www.ogf.org/dfdl/">
+      <dfdl:format ref="tns:GeneralFormat" initiator="" terminator="" leadingSkip="0" trailingSkip="0" textBidi="no" floating="no" encoding="ASCII" byteOrder="bigEndian" alignment="implicit" alignmentUnits="bits" fillByte="f" occursCountKind="implicit" truncateSpecifiedLengthString="no" ignoreCase="no" representation="text" lengthKind="delimited" nilValueDelimiterPolicy="both" emptyValueDelimiterPolicy="none" documentFinalTerminatorCanBeMissing="yes" initiatedContent="no" separatorSuppressionPolicy="anyEmpty" separatorPosition="infix" textTrimKind="none" />

Review comment:
       Please override only properties that really need to be different than GeneralFormat's properties.

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/charsets/CharsetCompilerRegistry.scala
##########
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.daffodil.charsets
+
+import org.apache.daffodil.exceptions.ThrowsSDE
+import org.apache.daffodil.util.SimpleNamedServiceLoader
+import org.apache.daffodil.processors.charset.CharsetUtils
+
+
+/**
+ * Transformers have factories. This lets you find the transformer factory
+ * by the name obtained from dfdlx:layerTransform.
+ */
+object CharsetCompilerRegistry {
+
+  private lazy val charsetCompilerMap: Map[String, CharsetCompiler] =
+    SimpleNamedServiceLoader.loadClass[CharsetCompiler](classOf[CharsetCompiler])
+
+  /**
+   * Given name, finds the factory for the transformer. SDE otherwise.
+   *
+   * The state is passed in order to provide diagnostic context if not found.
+   */
+  def find(name: String, context: ThrowsSDE): CharsetCompiler = {
+    val maybeCompiler: Option[CharsetCompiler] = charsetCompilerMap.get(name)
+    if (maybeCompiler.isEmpty) {
+      val choices = charsetCompilerMap.keySet.mkString(", ")
+      context.SDE("The charset '%s' was not found. Available choices are: %s", name, choices + ", " + CharsetUtils.supportedEncodingsString)

Review comment:
       Mike also said he would like Daffodil's current charsets re-implemented as pluggable charsets, so you'll probably need to remove `CharsetUtils.supportedEncodingsString` from this line.

##########
File path: daffodil-test/src/test/resources/META-INF/services/org.apache.daffodil.charsets.CharsetCompiler
##########
@@ -0,0 +1,31 @@
+#  Licensed to the Apache Software Foundation (ASF) under one or more
+#  contributor license agreements.  See the NOTICE file distributed with
+#  this work for additional information regarding copyright ownership.
+#  The ASF licenses this file to You under the Apache License, Version 2.0
+#  (the "License"); you may not use this file except in compliance with
+#  the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+#
+# PLEASE NOTE:
+#
+# These are in src/test/resources not src/main/resources because they're not
+# "real" in the sense of being fully featured. E.g., AIS only does the payload
+# armoring, not everything else about AIS. IPv4 just does the IPv4 header, not
+# TCP, UDP, ICMP, etc. all of which have related checksum algorithms that should
+# share code. A real CheckDigits has to have different layer length kinds supported
+# so that one can compute check digits from elements with different
+# of length kinds.
+#
+# These are partial/subsets of "real" functionality that give us test coverage
+# of the layering features of Daffodil. They can be reused to create "real"
+# layering, but one should expect to have to adapt their code substantially.

Review comment:
       Those comments were pasted here from another place and don't belong here, or should be updated to talk only about your test charsets.

##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/charsets/CustomCharset.dfdl.xsd
##########
@@ -0,0 +1,69 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:org.apache.daffodil.charsets.CustomCharset"
+  xmlns:tns="urn:org.apache.daffodil.charsets.CustomCharset"
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+  xmlns:xsd="http://www.w3.org/2001/XMLSchema"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
+
+  <include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+
+  <annotation>
+    <appinfo source="http://www.ogf.org/dfdl/">
+      <dfdl:format ref="tns:GeneralFormat" initiator="" terminator="" leadingSkip="0" trailingSkip="0" textBidi="no" floating="no" encoding="ASCII" byteOrder="bigEndian" alignment="implicit" alignmentUnits="bits" fillByte="f" occursCountKind="implicit" truncateSpecifiedLengthString="no" ignoreCase="no" representation="text" lengthKind="delimited" nilValueDelimiterPolicy="both" emptyValueDelimiterPolicy="none" documentFinalTerminatorCanBeMissing="yes" initiatedContent="no" separatorSuppressionPolicy="anyEmpty" separatorPosition="infix" textTrimKind="none" />

Review comment:
       Please override only properties that really need to be different than GeneralFormat's properties, and define them on multiple lines for readability.

##########
File path: daffodil-runtime1/src/main/scala/org/apache/daffodil/charsets/CharsetTransformerFactory.scala
##########
@@ -0,0 +1,33 @@
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.charsets
+
+/**
+ * Factory for a charset transformer.
+ *
+ * This is the serialized object which is saved as part of a processor.
+ * It constructs the charset transformer at runtime when newInstance() is called.
+ *
+ * This must be implmented as part of implementation of a daffodil charset.
+ */
+abstract class CharsetTransformerFactory(val name: String)
+  extends Serializable {
+
+  def newInstance(): CharsetTransformer

Review comment:
       Mike said `CharsetTransformer` should simply be `BitsCharset`, so you probably want to rename `CharsetTransformerFactory` to `BitsCharsetFactory`.

##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/charsets/CustomCharset.tdml
##########
@@ -0,0 +1,151 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<tdml:testSuite xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+  xmlns:xs="http://www.w3.org/2001/XMLSchema"
+  xmlns:fn="http://www.w3.org/2005/xpath-functions"
+  xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions"
+  xmlns:ex="http://example.com"
+  xmlns:tns="urn:org.apache.daffodil.charsets.CustomCharset" defaultRoundTrip="none">
+
+  <tdml:parserTestCase name="parse_charsets" root="s1" model="org/apache/daffodil/charsets/CustomCharset.dfdl.xsd" roundTrip="none">

Review comment:
       We already have `defaultRoundTrip="none"` above so we don't need `roundTrip="none"` in each test.

##########
File path: daffodil-test/src/test/resources/org/apache/daffodil/charsets/CustomCharset.tdml
##########
@@ -0,0 +1,151 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<tdml:testSuite xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
+  xmlns:xs="http://www.w3.org/2001/XMLSchema"
+  xmlns:fn="http://www.w3.org/2005/xpath-functions"
+  xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions"
+  xmlns:ex="http://example.com"
+  xmlns:tns="urn:org.apache.daffodil.charsets.CustomCharset" defaultRoundTrip="none">
+
+  <tdml:parserTestCase name="parse_charsets" root="s1" model="org/apache/daffodil/charsets/CustomCharset.dfdl.xsd" roundTrip="none">

Review comment:
       We already have `defaultRoundTrip="none"` above so we don't need `roundTrip="none"` in each test.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org