Posted to users@daffodil.apache.org by "Costello, Roger L." <co...@mitre.org> on 2019/11/10 14:27:58 UTC

How to check that every record in a CSV file has the same number of fields?

Hi Folks,

The RFC for CSV says that every record in a CSV file should have the same number of fields. That check is easily expressed in XPath:

every $i in *[position() gt 1] satisfies count($i/*) eq count($i/preceding-sibling::*[1]/*)

So I added that XPath expression in an assert for the root element (csv):

<xs:element name="csv">
    <xs:annotation>
        <xs:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:assert test="{ every $i in *[position() gt 1] satisfies fn:count($i/*) eq fn:count($i/preceding-sibling::*[1]/*) }"
                message="{'Each record should contain the same number of fields.'}" />
        </xs:appinfo>
    </xs:annotation>
    ...

But that yields this error message:

[error] Schema Definition Error: Unable to parse expression. Message: '}' expected but '$' found

What is the correct way to implement the check that all records have the same number of fields?

/Roger


Re: How to check that every record in a CSV file has the same number of fields?

Posted by Steve Lawrence <sl...@apache.org>.
Another option is to use dfdl:occursCount with an expression. That's
what we do in the csv schemas on DFDLSchemas:

https://github.com/DFDLSchemas/CSV/blob/master/src/main/resources/com/tresys/csv/xsd/csvHeaderEnforced.dfdl.xsd
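
As a rough illustration of the dfdl:occursCount idea (a sketch only, not copied from the linked schema): the field count of each record is tied to the number of header titles, so a record with too few or too many fields fails to parse. The element names header, title, record and item are hypothetical, a header row is assumed to be present, and a dfdl:format supplying the remaining required properties is assumed to be defined elsewhere in the schema.

```xml
<!-- Sketch only: hypothetical element names; assumes a default dfdl:format
     is declared elsewhere and that the first line of the file is a header. -->
<xs:element name="csv"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xs:complexType>
    <!-- One line per header/record -->
    <xs:sequence dfdl:separator="%NL;" dfdl:separatorPosition="postfix">
      <xs:element name="header">
        <xs:complexType>
          <xs:sequence dfdl:separator=",">
            <xs:element name="title" type="xs:string" maxOccurs="unbounded"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
      <xs:element name="record" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence dfdl:separator=",">
            <!-- Each record must contain exactly as many items as the
                 header has titles; no looping expression is needed. -->
            <xs:element name="item" type="xs:string" maxOccurs="unbounded"
                dfdl:occursCountKind="expression"
                dfdl:occursCount="{ fn:count(../../header/title) }"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

Because RFC 4180 makes the header line optional, a schema without a header would need a different reference point for the count.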

Note, when you complete your CSV schema, it would be nice if you could
add it to the DFDLSchemas CSV repo. The goal for this repo is to contain
schemas for all the different "variants" of CSV, so an RFC compliant
schema would be very welcome.


On 11/10/19 4:34 PM, Sloane, Brandon wrote:
> DFDL only supports a limited subset of XPath. In particular, it does not support 
> any looping construct like you are using in your XPath.
> 
> What I would do instead is put an assertion on each record, checking that it has 
> the same number of fields as the first record.


Re: How to check that every record in a CSV file has the same number of fields?

Posted by "Sloane, Brandon" <bs...@tresys.com>.
DFDL only supports a limited subset of XPath. In particular, it does not support any looping construct like you are using in your XPath.

What I would do instead is put an assertion on each record, checking that it has the same number of fields as the first record.
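
A minimal sketch of that per-record assertion, under some assumptions: the element names record and field are hypothetical, the fragment is meant to sit inside the sequence of the csv root element, and the assert is assumed to be evaluated after the record's fields have been parsed. It uses only a numeric index and fn:count, with no looping construct.

```xml
<!-- Sketch only: hypothetical element names. This local element declaration
     belongs inside the csv element's sequence; the namespace declarations are
     repeated here just to keep the fragment well-formed on its own. -->
<xs:element name="record" maxOccurs="unbounded"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- Compare this record's field count with the first record's. -->
      <dfdl:assert test="{ fn:count(./field) eq fn:count(../record[1]/field) }"
          message="Each record should contain the same number of fields." />
    </xs:appinfo>
  </xs:annotation>
  <xs:complexType>
    <xs:sequence dfdl:separator=",">
      <xs:element name="field" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
```

For the first record the comparison is trivially true, so the check only has an effect from the second record onward.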