Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2019/07/19 15:43:08 UTC

How to output "unknown" for a CSV field containing a dash ( - ) symbol?

Hello DFDL community,

My input is a comma-separated value (CSV) file about automobiles:

Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",2999.99
1999,Chevy,Venture Extended Edition,,4900.00
1999,-,Venture Extended Edition,Very Large,5000.00
1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded",4799.00

Notice that the 4th line has a dash ( - ) in the Make field. The dash denotes "no data available." For that field, I want my DFDL schema to generate this XML:

<Make>unknown</Make>

Truthfully, I don't have any idea how to create a DFDL schema to do this. Would you give me some suggestions on how to do this, please?

/Roger

Re: How to output "unknown" for a CSV field containing a dash ( - ) symbol?

Posted by "Sloane, Brandon" <bs...@tresys.com>.
The standard approach is to have a (possibly hidden) raw element, and use dfdl:inputValueCalc on the visible element with an expression such as { if (../raw eq '-') then 'unknown' else ../raw }.
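A minimal sketch of that approach follows, as a fragment of a DFDL schema rather than a complete one. It assumes several things not in the original message: the prefix ex is bound to the schema's targetNamespace with elementFormDefault="unqualified" (the layout of Daffodil's usual schema template); the record element is called Car; the raw element is called MakeRaw and lives in a hypothetical hidden group named hiddenMakeRaw; and the file-level structure (line separators, header line) and the escape scheme for the quoted Description field are defined elsewhere in the schema.

```xml
<!-- Hypothetical global group (a direct child of xs:schema) holding the raw
     Make text exactly as it appears in the CSV. It is used only through
     dfdl:hiddenGroupRef, so MakeRaw does not appear in the parsed infoset. -->
<xs:group name="hiddenMakeRaw">
  <xs:sequence>
    <!-- Unparse: rebuild the raw text from the visible Make element. -->
    <xs:element name="MakeRaw" type="xs:string"
      dfdl:outputValueCalc="{ if (../Make eq 'unknown') then '-' else ../Make }"/>
  </xs:sequence>
</xs:group>

<!-- One CSV record. Assumes the enclosing file-level sequence and the escape
     scheme for the quoted Description field are defined elsewhere. -->
<xs:element name="Car">
  <xs:complexType>
    <xs:sequence dfdl:separator="," dfdl:separatorPosition="infix">
      <xs:element name="Year" type="xs:string"/>
      <!-- Consumes the Make column but keeps it out of the infoset. -->
      <xs:sequence dfdl:hiddenGroupRef="ex:hiddenMakeRaw"/>
      <!-- Calculated element: it has no representation in the data. -->
      <xs:element name="Make" type="xs:string"
        dfdl:inputValueCalc="{ if (../MakeRaw eq '-') then 'unknown' else ../MakeRaw }"/>
      <xs:element name="Model" type="xs:string"/>
      <xs:element name="Description" type="xs:string"/>
      <xs:element name="Price" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

With this layout, parsing the fourth line should yield <Make>unknown</Make>, while the dfdl:outputValueCalc on MakeRaw maps "unknown" back to "-" on unparse so the data round-trips. One side effect of this design: a record whose Make really is the text "unknown" would also be written back as "-".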


You could also take advantage of the new (experimental) typeCalc feature and define a type such as:


<xs:simpleType name="make"
     dfdlx:inputValueCalc="{ if (dfdlx:repTypeValue() eq '-') then 'unknown' else dfdlx:repTypeValue() }"
     dfdlx:outputValueCalc="{ if (dfdlx:logicalTypeValue() eq 'unknown') then '-' else dfdlx:logicalTypeValue() }"
     dfdlx:repType="xs:string">
  <xs:restriction base="xs:string" />
</xs:simpleType>


In theory, there should be a more declarative way of doing this by leveraging the enumeration-support part of the same typeCalc proposal, but I do not think we have a mechanism to specify "everything but '-'", which is what it seems like you need here.


