Posted to c-dev@xerces.apache.org by "Heeg, Michael" <He...@fev.de> on 2004/05/17 08:51:09 UTC

AW: Strange problem with pattern (Xerces 2.5.0 crashes)

Hi everybody,

has anyone found a solution for my "pattern" problem? 

Regards,
Michael

> -----Original Message-----
> From: Heeg, Michael 
> Sent: Thursday, 15 April 2004 09:08
> To: 'xerces-c-dev@xml.apache.org'
> Subject: Strange problem with pattern (Xerces 2.5.0 crashes)
> 
> 
> Hi,
> 
> I am using Xerces-C 2.5.0 in my MS Visual C++ application. When validating
> XML files against a specified schema, the parser sometimes crashes with an
> "unexpected exception". I found out that the reason for the crashes is the
> following restriction of the schema (see "Body" element):
> 
> <xsd:complexType name="InputFileType">
> 	<xsd:sequence>
> 		<xsd:element name="Head" type="HeadType"/>
> 		<xsd:element name="Body">
> 			<xsd:simpleType>
> 				<xsd:restriction base="xsd:string">
> 					<xsd:pattern value="(\n*[0-9]*,[0-9]*,(\-*[0-9]*\.*[0-9]*,)*\-*[0-9]+\.*[0-9]*;\n*)*"/>
> 				</xsd:restriction>
> 			</xsd:simpleType>
> 		</xsd:element>
> 	</xsd:sequence>
> </xsd:complexType>
> 
> The restriction is meant to validate <Body> elements like the following:
> 
> <Body>
> 0,10,0.199,10.199,0.008;
> 1,20,0.389,20.389,0.059;
> 2,30,0.565,30.565,0.180;
> 3,40,0.717,40.717,0.369;
> 4,50,0.841,50.841,0.596;
> 5,60,0.932,60.932,0.810;
> ....
> </Body>
> 
> The strange thing is: when the <Body> element contains a large amount of
> data, validating the restriction leads to the unexpected exception. But
> with a small amount of data, everything works fine. (Also, when I delete
> the restriction from the schema, everything works fine.)
> 
> To me this looks like a Xerces bug. Am I wrong? Any suggestions or
> comments?
> 
> Best regards,
> Michael
> 
> 
> P.S.: I know that the way we use this <Body> element is not the best way
> to handle CSV-like data, but I had to do this because of an existing file
> format.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-c-dev-unsubscribe@xml.apache.org
> For additional commands, e-mail: xerces-c-dev-help@xml.apache.org
> 


Re: AW: Strange problem with pattern (Xerces 2.5.0 crashes)

Posted by la...@laguna1.com.
Hi Gareth,

I am not very knowledgeable about the Xerces code base; I'm just a consumer
of Xerces, but I monitor this list.  If Xerces uses GNU regex (i.e.,
regcomp(3)), the problem may be in the call to regexec() itself.

We had a similar problem trying to use a regexp to parse the body of
emails, allowing all the <CR><LF> terminated lines that were not
followed by the .<CR><LF> termination sequence.  It worked with short
emails, but not with long ones.

As near as I can tell (having followed the regex code in gdb as a
regex non-expert), every time the regex parser matches a substring
subject to the final "*", it has to push a jumpback point onto its
internal stack.  Eventually, the stack becomes too big and regexec()
gets unpredictable.  I can't remember whether it was a stack overflow, a
hardcoded limit in the lib, or something else.
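
To make the stack growth concrete, here is a toy sketch in Python. It is
not the GNU regex or Xerces code; whether either library behaves this way
is an assumption. The names match_star, depth_for and overflows are made
up for illustration. The matcher handles only a literal unit repeated by a
trailing "*", and each matched repetition costs one recursive call, which
stands in for one saved jumpback point.

```python
import sys


def match_star(unit, text, pos=0, depth=1, stats=None):
    """Match `unit*` against text[pos:] up to the end of the text.

    Each successful repetition makes one recursive call, standing in for
    the jumpback point a backtracking engine would save.
    """
    if stats is not None:
        stats["max_depth"] = max(stats["max_depth"], depth)
    if pos == len(text):
        return True  # consumed everything: overall match
    if text.startswith(unit, pos):
        # One more repetition matched; go one level deeper for the rest.
        return match_star(unit, text, pos + len(unit), depth + 1, stats)
    return False  # input remains but no further repetition fits


def depth_for(repetitions, unit="0,10,0.199;\n"):
    """Return the deepest recursion level reached for `repetitions` records."""
    stats = {"max_depth": 0}
    assert match_star(unit, unit * repetitions, stats=stats)
    return stats["max_depth"]


def overflows(repetitions, unit="0,10,0.199;\n"):
    """Return True if matching `repetitions` records exhausts the stack."""
    try:
        match_star(unit, unit * repetitions)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    print(depth_for(5), depth_for(200))
    print(overflows(2 * sys.getrecursionlimit()))
```

In this sketch depth_for(n) comes out as n + 1, so the depth grows linearly
with the number of records, and a body with more records than the
interpreter's recursion limit fails even though every record is valid.
That is the same "small input works, large input fails" shape as the
report.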

I know this is not a very satisfactory observation from the standpoint
of a fix.  We ultimately refactored the regex in the email case to
just find each <CR><LF> terminated line; that is, we got rid of the
final "*".  Your bug reporter could implement a similar workaround by
making each of his ";" terminated lines a node.
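
A rough sketch of that idea, done in application code rather than in the
schema: split the body on ";" and check each record on its own, so that no
single match has to span the whole body. This is Python, not Xerces; the
names RECORD and invalid_records are hypothetical, and the expression is
the per-record part of the reporter's pattern transcribed to Python's re
syntax on the assumption that the two dialects agree for this pattern.

```python
import re

# Per-record part of the schema pattern, without the outer "(...)*",
# the surrounding "\n*" and the terminating ";".
RECORD = re.compile(r"[0-9]*,[0-9]*,(?:-*[0-9]*\.*[0-9]*,)*-*[0-9]+\.*[0-9]*")


def invalid_records(body):
    """Return (index, text) pairs for records that fail the per-record check."""
    pieces = body.split(";")
    tail = pieces.pop()  # whatever follows the last ";"
    bad = []
    for index, piece in enumerate(pieces):
        record = piece.lstrip("\n")  # newlines are allowed before a record
        # XSD patterns must match the whole value, hence fullmatch().
        if RECORD.fullmatch(record) is None:
            bad.append((index, record))
    if tail.strip("\n"):
        # Text after the last ";" that is not just newlines is an
        # unterminated record.
        bad.append((len(pieces), tail.strip("\n")))
    return bad


if __name__ == "__main__":
    sample = "\n0,10,0.199,10.199,0.008;\n1,20,0.389,20.389,0.059;\n"
    print(invalid_records(sample))
    print(invalid_records("0,10,abc;\n1,20,0.5;"))
```

The loop is iterative, so the work per record stays small no matter how
many records the body holds.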

Hope this helps avoid a wild goose chase.

Thanks for the great product!

Regards,
Mark

On Mon, 17 May 2004 01:01:27 -0700 (PDT)
Gareth Reakes <ga...@parthenoncomputing.com> wrote:

> Hey,
> 	 this is in my court. I have a minimal sample that reproduces the
> problem. I have had a quick look at the code and saw nothing obvious. 
> I have some time scheduled for Xerces work today and tomorrow. This comes
> after the "element from the wrong document being returned" bug. If I
> can't fix it in that time then I will file a bug with the minimal
> sample to see if anyone else wants a go.
> 
> Gareth


Re: AW: Strange problem with pattern (Xerces 2.5.0 crashes)

Posted by Gareth Reakes <ga...@parthenoncomputing.com>.
Hey,
	 this is in my court. I have a minimal sample that reproduces the
problem. I have had a quick look at the code and saw nothing obvious.  I
have some time scheduled for Xerces work today and tomorrow. This comes after
the "element from the wrong document being returned" bug. If I can't fix it
in that time then I will file a bug with the minimal sample to see if
anyone else wants a go.

Gareth


On Mon, 17 May 2004, Heeg, Michael wrote:

> Hi everybody,
>
> has anyone found a solution for my "pattern" problem?
>
> Regards,
> Michael

-- 
Gareth Reakes, Managing Director      Parthenon Computing
+44-1865-811184                  http://www.parthcomp.com
