Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2019/02/04 13:03:31 UTC

Best practice for replacing a numeric value with a symbolic value?

Hello DFDL community,

I am working on a DFDL schema for Windows EXE files.

There are many places in my DFDL schema where I replace a numeric value with a symbolic value.

For example, there is one field called "Storage Class". The specification enumerates a couple dozen numeric values for this field and what each value means:

IMAGE_SYM_CLASS_NULL (0) No assigned storage class

IMAGE_SYM_CLASS_AUTOMATIC (1) The automatic (stack) variable. The Value field specifies the stack frame offset.

IMAGE_SYM_CLASS_EXTERNAL (2) The Value field indicates the size if the section number is IMAGE_SYM_UNDEFINED (0). If the section number is not zero, then the Value field specifies the offset within the section.

IMAGE_SYM_CLASS_STATIC (3) The offset of the symbol within the section. If the Value field is zero, then the symbol represents a section name.
...

Do you have a recommendation for the generated XML? Which of the following is best practice?

(a) 	<Storage_Class>0</Storage_Class>

(b) 	<Storage_Class>IMAGE_SYM_CLASS_NULL</Storage_Class>

(c) 	<Storage_Class>IMAGE_SYM_CLASS_NULL (0)</Storage_Class>

(d) 	<Storage_Class>No assigned storage class</Storage_Class>

(e)	<Storage_Class>IMAGE_SYM_CLASS_NULL (0) No assigned storage class</Storage_Class>

(f) 	Other (what?)

I am eager to hear your thoughts on this.

/Roger

Re: Best practice for replacing a numeric value with a symbolic value?

Posted by Steve Lawrence <sl...@apache.org>.
I think it really depends on who will use the XML infoset and how they
plan to use it. If everyone in the world knows that 0 means
IMAGE_SYM_CLASS_NULL, then maybe you don't need the converted value or
description. Or if no one would ever know what 0 means, maybe it makes
sense to only have the description. We've come across formats where
people actually do care about the raw value because that's what they
know and what they are used to, but the converted value is more useful in
certain calculations, so we end up including both the raw and logical
values.

Some things to consider:

- Certain field types may be easier to filter on than others. For
example, numeric values can be compared in ranges. Maybe someone only
cares about fields greater than 2--maintaining the numeric values helps
with that.

- Sometimes multiple numeric values map to a single logical value. For
example, maybe 0-4 have unique meanings, but 5-15 just mean
"ILLEGAL_VALUE". If you hide the numeric value, unparsing becomes
difficult because you've lost that information: you now need
to use outputValueCalc to determine which numeric value to unparse when
the logical value is ILLEGAL_VALUE. Maybe there's an obvious answer, but
maybe not.
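
To make the many-to-one problem concrete, here is a rough Python sketch
(not DFDL, and not how Daffodil does it). The table is hypothetical: it
assumes a 4-bit field where 0-4 have made-up unique names and 5-15 all
collapse to ILLEGAL_VALUE. The function names are mine, for illustration
only.

```python
# Hypothetical 4-bit field: 0-4 have unique (made-up) names,
# 5-15 all collapse to the single label ILLEGAL_VALUE.
RAW_TO_LOGICAL = {0: "A", 1: "B", 2: "C", 3: "D", 4: "E"}
RAW_TO_LOGICAL.update({raw: "ILLEGAL_VALUE" for raw in range(5, 16)})


def candidates_for(logical):
    """Return every raw value that parses to the given logical value."""
    return sorted(raw for raw, name in RAW_TO_LOGICAL.items() if name == logical)


def unparse(logical):
    """Recover the raw value, failing when the logical value is ambiguous."""
    raws = candidates_for(logical)
    if len(raws) != 1:
        raise ValueError(f"{logical!r} maps to {len(raws)} raw values: {raws}")
    return raws[0]
```

With this table, unparse("C") has exactly one candidate, but
unparse("ILLEGAL_VALUE") has eleven, so something outside the logical
value has to pick one. Keeping the raw number in the infoset avoids
having to make that choice.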

It might make sense to use separate elements for each type of
identifier. For example, maybe something like this would be more useful:

  <Storage_Class>0</Storage_Class>
  <Storage_Class_Desc_Short>
    IMAGE_SYM_CLASS_NULL
  </Storage_Class_Desc_Short>
  <Storage_Class_Desc_Long>
    No assigned storage class
  </Storage_Class_Desc_Long>

The benefit to this is that a user could query and extract exactly what
they want without having to do any string processing.
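
As a rough illustration of that querying benefit, here is a small Python
sketch using the standard library's xml.etree.ElementTree. The Symbols
and Symbol wrapper elements and the sample data are assumptions made up
for the example; only the three Storage_Class element names come from
the layout above.

```python
import xml.etree.ElementTree as ET

# Hypothetical infoset fragment; the Symbols/Symbol wrappers are assumed.
INFOSET = """
<Symbols>
  <Symbol>
    <Storage_Class>0</Storage_Class>
    <Storage_Class_Desc_Short>IMAGE_SYM_CLASS_NULL</Storage_Class_Desc_Short>
    <Storage_Class_Desc_Long>No assigned storage class</Storage_Class_Desc_Long>
  </Symbol>
  <Symbol>
    <Storage_Class>3</Storage_Class>
    <Storage_Class_Desc_Short>IMAGE_SYM_CLASS_STATIC</Storage_Class_Desc_Short>
    <Storage_Class_Desc_Long>The offset of the symbol within the section.</Storage_Class_Desc_Long>
  </Symbol>
</Symbols>
"""

root = ET.fromstring(INFOSET)

# Range filter on the raw numeric value: no string splitting needed.
above_two = [
    sym.findtext("Storage_Class_Desc_Short").strip()
    for sym in root.iter("Symbol")
    if int(sym.findtext("Storage_Class")) > 2
]

# Pull out just the symbolic names, again by element name alone.
short_names = [
    sym.findtext("Storage_Class_Desc_Short").strip()
    for sym in root.iter("Symbol")
]
```

With a combined value such as "IMAGE_SYM_CLASS_NULL (0)", the same
filter would first have to pick the number out of the string.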

- Steve
