Posted to j-dev@xerces.apache.org by "Michel, Victor" <mi...@amazon.com> on 2014/11/03 09:12:20 UTC

XMLStreamReader corrupting data

Hi all,

I'd like to report something that looks like a bug in the version of Xerces included in JRE 7u71/7u72/8u20/8u25.
The StAX API seems to produce corrupted data, depending on how many bytes the underlying InputStream actually returns at each invocation of read(byte[], int, int).

The following repro case produces different results depending on the version of the JRE. Am I doing something wrong?

Thanks,

Victor

------

import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

/*
 * Correct output (7u67,8u11)
 * rugs
 * 
 * Incorrect output (7u71,7u72,8u20,8u25)
 * bugs
 */
public class XmlReaderBug {

    private static final int BYTES_PER_READ = 6;

    private static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
        "<He likes=\"rugs\" because=\"they really tie the room together\"/>";

    public static void main(String[] args) throws Exception {
        final InputStream xmlStream = new ByteArrayInputStream(XML.getBytes(Charset.forName("UTF-8")));
        final InputStream throttledXmlStream = new ThrottledInputStream(xmlStream, BYTES_PER_READ);

        final XMLInputFactory xmlFactory = XMLInputFactory.newInstance();
        final XMLStreamReader xmlStreamReader = xmlFactory.createXMLStreamReader(throttledXmlStream);
        xmlStreamReader.next();

        // bugs or rugs?
        System.out.println(xmlStreamReader.getAttributeValue(null, "likes"));
    }

    // An InputStream implementation that limits the number of bytes read by read(byte[], int, int)
    private static class ThrottledInputStream extends FilterInputStream {
        private final int bytesPerRead;

        public ThrottledInputStream(InputStream stream, int bytesPerRead) throws Exception {
            super(stream);
            this.bytesPerRead = bytesPerRead;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (off < 0 || len < 0 || len > b.length - off) {
                throw new IndexOutOfBoundsException();
            } else if (len == 0) {
                return 0;
            }

            // Limit bytes read
            int bytesToRead = Math.min(bytesPerRead, len);

            // Ensure deterministic behavior (similar to org.apache.commons.io.IOUtils.read)
            // Useless for this test case, but convenient for consistently reproducing
            // the bug with other stream implementations
            int totalBytesRead = 0;
            int bytesRead = 0;
            do {
                bytesRead = Math.max(0, in.read(b, off + totalBytesRead, bytesToRead));
                bytesToRead -= bytesRead;
                totalBytesRead += bytesRead;
            } while (bytesRead > 0);

            // No more bytes
            if (totalBytesRead == 0) {
                return -1;
            }

            return totalBytesRead;
        }
    }
}



---------------------------------------------------------------------
To unsubscribe, e-mail: j-dev-unsubscribe@xerces.apache.org
For additional commands, e-mail: j-dev-help@xerces.apache.org


RE: XMLStreamReader corrupting data

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
I understand that Oracle/Sun made significant changes to their fork of the 
code base in order to support StAX. I wouldn't assume that their version 
of the code resembles the Apache version.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


RE: XMLStreamReader corrupting data

Posted by "Michel, Victor" <mi...@amazon.com>.
Hi,

Thanks for the answer.
XMLStreamReader is implemented by:
> public class XMLStreamReaderImpl implements javax.xml.stream.XMLStreamReader (in package com.sun.org.apache.xerces.internal.impl)
It relies heavily on XMLEntityScanner, which is, I believe, the cause of the bug.

XMLEntityScanner seems to be part of the Xerces library:
https://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/impl/XMLEntityScanner.html

Maybe I misunderstood something? In any case, I have filed a bug on the Oracle website.

Thanks,

Victor

-----Original Message-----
From: Michael Glavassevich [mailto:mrglavas@ca.ibm.com] 
Sent: Monday, November 03, 2014 8:30 AM
To: j-dev@xerces.apache.org
Subject: Re: XMLStreamReader corrupting data

Hello,

Apache Xerces does not contain an implementation of the XMLStreamReader interface. The component you're using would have been developed by Oracle/Sun and has not been contributed to Apache. We wouldn't know anything about the problem you're experiencing with StAX. Probably better to ask your question on one of the JDK forums.

Thanks.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


RE: XMLStreamReader corrupting data

Posted by "Michel, Victor" <mi...@amazon.com>.
I should have named this email thread "XMLEntityScanner/XMLEntityManager corrupting data". Sorry for the confusion.
My debugger points me to these two classes, although I have not yet been able to pinpoint the bug in the code.

The repro case I've sent is fairly self-explanatory: the "b" from "because" ends up overwriting the "r" of "rugs".

I used a custom implementation of InputStream in the repro case I sent (to keep it small), but I have seen the bug happen with other implementations of InputStream and much larger XML files.
The corruption happens silently, which makes the bug pretty tricky to detect.

Thanks,

Victor Michel
Amazon Web Services
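
Because the corruption is silent, one way to check a given JRE is to parse the same document through streams that return different numbers of bytes per read and compare each result with an unthrottled parse. The following is only a minimal sketch of that idea. It assumes the unthrottled parse is itself correct, and the names ReadSizeSweep, ChunkedInputStream, likesAttribute and mismatchingChunkSizes, as well as the 1..64 chunk-size range, are illustrative choices that are not part of the original repro or of any JDK API.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ReadSizeSweep {

    // Same document as in the repro case.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
        "<He likes=\"rugs\" because=\"they really tie the room together\"/>";

    /** Hypothetical helper: returns at most chunkSize bytes per bulk read call. */
    static final class ChunkedInputStream extends FilterInputStream {
        private final int chunkSize;

        ChunkedInputStream(InputStream in, int chunkSize) {
            super(in);
            this.chunkSize = chunkSize;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // Short reads are allowed by the InputStream contract.
            return super.read(b, off, Math.min(len, chunkSize));
        }
    }

    /** Parses the stream and returns the "likes" attribute of the root element. */
    static String likesAttribute(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            reader.next(); // advance from START_DOCUMENT to the <He> start element
            return reader.getAttributeValue(null, "likes");
        } finally {
            reader.close();
        }
    }

    /** Returns the chunk sizes whose parsed value differs from the unthrottled parse. */
    static List<Integer> mismatchingChunkSizes(int maxChunkSize) throws XMLStreamException {
        byte[] bytes = XML.getBytes(StandardCharsets.UTF_8);
        String expected = likesAttribute(new ByteArrayInputStream(bytes));
        List<Integer> mismatches = new ArrayList<>();
        for (int size = 1; size <= maxChunkSize; size++) {
            String actual = likesAttribute(
                new ChunkedInputStream(new ByteArrayInputStream(bytes), size));
            if (!expected.equals(actual)) {
                mismatches.add(size);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println("Chunk sizes with a differing value: " + mismatchingChunkSizes(64));
    }
}
```

An empty list only means that no difference was observed for this document and these chunk sizes. It does not prove that larger documents are unaffected.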


RE: XMLStreamReader corrupting data

Posted by "Michel, Victor" <mi...@amazon.com>.
Oh, ok, I understand. Sorry for the spam on this mailing list then!

Victor

-----Original Message-----
From: Michael Glavassevich [mailto:mrglavas@ca.ibm.com] 
Sent: Tuesday, November 04, 2014 6:50 AM
To: j-dev@xerces.apache.org
Subject: RE: XMLStreamReader corrupting data

You may have found a bug in OpenJDK, but OpenJDK != Xerces. What you're looking at isn't the Apache codebase. We have no influence over changes made to OpenJDK.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org

"Michel, Victor" <mi...@amazon.com> wrote on 11/03/2014 10:03:50 PM:

> Here are my findings, which are pretty much a dump of what my debugger
> tells me when running the sample I posted.
> 
> The XML:
> <?xml version="1.0" encoding="UTF-8"?>
> <He likes="rugs" because="they really tie the room together"/>
> 
> For the start element "He", we first process the "likes" attribute
> and extract its value:
> Line 437:
> http://cr.openjdk.java.net/~joehw/jdk8/8029236/webrev/src/com/sun/org/apache/xerces/internal/impl/XMLNSDocumentScannerImpl.java.html
> tmpStr is declared. It is passed as the first argument of
> scanAttributeValue on the next line (438).
> 
> That invocation leads to line 835 of XMLScanner:
> http://cr.openjdk.java.net/~joehw/jdk8/8029236/webrev/src/com/sun/org/apache/xerces/internal/impl/XMLScanner.java.html
> We're in scanAttributeValue.
> "fEntityScanner.scanLiteral(quote, value)" is invoked, where "value" is
> the tmpStr variable above.
> 
> Then, on line 1156:
> http://cr.openjdk.java.net/~joehw/jdk9/8027359/webrev/src/com/sun/org/apache/xerces/internal/impl/XMLEntityScanner.java.html
> "content.setValues(fCurrentEntity.ch, offset, length);"
> So tmpStr now references the internal buffer of the current entity,
> with an offset and a length (offset=0 and length=4 in this case,
> according to my debugger).
> 
> Line 560:
> http://cr.openjdk.java.net/~joehw/jdk8/8029236/webrev/src/com/sun/org/apache/xerces/internal/impl/XMLNSDocumentScannerImpl.java.html
> attributes now references tmpStr, which points to the internal entity
> buffer.
> 
> Then, the flow continues to line 254 (next iteration of the loop), where
> we parse the next attribute, "because":
> http://cr.openjdk.java.net/~joehw/jdk8/8029236/webrev/src/com/sun/org/apache/xerces/internal/impl/XMLNSDocumentScannerImpl.java.html
> scanAttribute is invoked, which in turn invokes scanQName.
> 
> And then the corruption actually happens here, on line 779:
> http://cr.openjdk.java.net/~joehw/jdk9/8027359/webrev/src/com/sun/org/apache/xerces/internal/impl/XMLEntityScanner.java.html
> The first character of the internal buffer is overwritten, but in
> this example that character is actually part of the XMLString that
> is supposed to contain the previous attribute value (because
> offset=0). So "rugs" is changed into "bugs".
> 
> So, every time an XMLString with offset=0 is created for an
> attribute value, its first character is going to be overwritten by the
> first character of the next attribute name, if any.
> 
> I hope this is convincing! I don't know where the actual regression
> is, but it really seems to me that the bug is in Xerces, since this
> execution flow does not leave Xerces code before the corruption happens.
> 
> 
> Thanks,
> 
> Victor Michel
> Amazon Web Services
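
The aliasing described in these findings can be illustrated outside the JDK. The sketch below is not the Xerces or JDK code: CharSpan is a hypothetical stand-in for XMLString (a view made of a char[] reference, an offset, and a length), and the local array plays the role of fCurrentEntity.ch. It only shows why a view with offset=0 changes when the first character of the shared buffer is overwritten, and that a view over a private copy of the characters would not.

```java
import java.util.Arrays;

public class BufferAliasingSketch {

    /** Hypothetical stand-in for XMLString: a view over a char[]; no copy is made. */
    static final class CharSpan {
        char[] ch;
        int offset;
        int length;

        void setValues(char[] ch, int offset, int length) {
            this.ch = ch;
            this.offset = offset;
            this.length = length;
        }

        @Override
        public String toString() {
            return new String(ch, offset, length);
        }
    }

    /** Value reported by an aliased view after slot 0 of the shared buffer is overwritten. */
    static String aliasedValueAfterOverwrite() {
        char[] buffer = "rugs\" ".toCharArray(); // stands in for the entity buffer
        CharSpan value = new CharSpan();
        value.setValues(buffer, 0, 4);            // view of "rugs" at offset 0
        buffer[0] = 'b';                          // slot 0 is reused for the 'b' of "because"
        return value.toString();
    }

    /** Same sequence, but the view points at a private copy of the characters. */
    static String copiedValueAfterOverwrite() {
        char[] buffer = "rugs\" ".toCharArray();
        CharSpan value = new CharSpan();
        value.setValues(Arrays.copyOfRange(buffer, 0, 4), 0, 4);
        buffer[0] = 'b';
        return value.toString();
    }

    public static void main(String[] args) {
        System.out.println(aliasedValueAfterOverwrite()); // prints "bugs"
        System.out.println(copiedValueAfterOverwrite());  // prints "rugs"
    }
}
```

Whether the actual fix should copy the characters or avoid reusing the slot is a design question for the JDK maintainers. The sketch takes no position on that.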


RE: XMLStreamReader corrupting data

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
You may have found a bug in OpenJDK, but OpenJDK != Xerces. What you're 
looking at isn't the Apache codebase. We have no influence over changes 
made to OpenJDK.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


RE: XMLStreamReader corrupting data

Posted by "Michel, Victor" <mi...@amazon.com>.
Here are my findings, which are pretty much a dump of what my debugger tells me when running the sample I posted.

The XML:
<?xml version="1.0" encoding="UTF-8"?>
<He likes="rugs" because="they really tie the room together"/>

For the start element "He", we first process the "likes" attribute, and extract its value:
Line 437:
http://cr.openjdk.java.net/~joehw/jdk8/8029236/webrev/src/com/sun/org/apache/xerces/internal/impl/XMLNSDocumentScannerImpl.java.html
tmpStr is declared. It is passed as the first argument of scanAttributeValue on the next line (438).

That invocation lands on line 835 of XMLScanner:
http://cr.openjdk.java.net/~joehw/jdk8/8029236/webrev/src/com/sun/org/apache/xerces/internal/impl/XMLScanner.java.html
We're in scanAttributeValue.
"fEntityScanner.scanLiteral(quote, value)" is invoked
where "value" is the tmpStr variable above

then, on line 1156:
http://cr.openjdk.java.net/~joehw/jdk9/8027359/webrev/src/com/sun/org/apache/xerces/internal/impl/XMLEntityScanner.java.html
"content.setValues(fCurrentEntity.ch, offset, length);"
So, tmpStr is now referencing the internal buffer of the current entity, with an offset and a length (offset=0 and length=4 in this case, according to my debugger)

Line 560:
http://cr.openjdk.java.net/~joehw/jdk8/8029236/webrev/src/com/sun/org/apache/xerces/internal/impl/XMLNSDocumentScannerImpl.java.html
attributes now references tmpStr, which points to the internal entity buffer

Then, the flow continues to:
Line 254 (next iteration of the loop) we parse the next attribute "because"
http://cr.openjdk.java.net/~joehw/jdk8/8029236/webrev/src/com/sun/org/apache/xerces/internal/impl/XMLNSDocumentScannerImpl.java.html
scanAttribute is invoked, which in turn invokes scanQName.

And then, the corruption actually happens here, on line 779:
http://cr.openjdk.java.net/~joehw/jdk9/8027359/webrev/src/com/sun/org/apache/xerces/internal/impl/XMLEntityScanner.java.html
The first character of the internal buffer is overwritten - but in this example, this character is actually part of the XMLString which is supposed to contain the previous attribute value (because offset=0). So, "rugs" is changed into "bugs"

>> So, every time an XMLString with offset=0 is created for an attribute value, its first character is going to be overwritten by the first character of the next attribute name, if any.
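
The aliasing described above can be reproduced outside the parser. The following is a minimal sketch, not the JDK code: BufferView is a hypothetical stand-in for XMLString (a char array plus an offset and a length), and the write to buffer[0] stands in for the overwrite the debugger showed on line 779. It contrasts a view that aliases the shared buffer with one that copies its characters first.

```java
import java.util.Arrays;

// Hypothetical stand-in for an XMLString-style view: it stores a reference
// to a char array plus an offset and a length, and never copies the data.
final class BufferView {
    private final char[] ch;
    private final int offset;
    private final int length;

    BufferView(char[] ch, int offset, int length) {
        this.ch = ch;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public String toString() {
        return new String(ch, offset, length);
    }
}

class SharedBufferDemo {

    // Returns { value seen through the aliasing view, value seen through the copy }.
    static String[] run() {
        // Assumed buffer content: the attribute value sits at offset 0.
        char[] buffer = "rugs".toCharArray();

        // View that points straight into the shared buffer (offset=0, length=4).
        BufferView aliased = new BufferView(buffer, 0, 4);

        // View over a private copy of the same four characters.
        BufferView copied = new BufferView(Arrays.copyOfRange(buffer, 0, 4), 0, 4);

        // Simulates the scanner storing the first character of the next
        // attribute name ("because") in slot 0 of the shared buffer.
        buffer[0] = 'b';

        return new String[] { aliased.toString(), copied.toString() };
    }

    public static void main(String[] args) {
        String[] result = run();
        System.out.println(result[0]); // bugs
        System.out.println(result[1]); // rugs
    }
}
```

The copy costs an allocation per value, which is presumably why a parser would prefer views; the sketch only shows why a view at offset 0 is fragile when slot 0 is reused.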

I hope this is convincing! I don't know where the actual regression is, but it really seems to me that the bug is in Xerces, since this execution flow does not leave Xerces code before the corruption happens.


Thanks,

Victor Michel
Amazon Web Services


-----Original Message-----
From: Michel, Victor 
Sent: Monday, November 03, 2014 4:16 PM
To: 'j-dev@xerces.apache.org'
Subject: RE: XMLStreamReader corrupting data

I should have named this email thread "XMLEntityScanner/XMLEntityManager corrupting data" - sorry for the confusion.
My debugger points me to these two classes, although I have not yet been able to pinpoint the bug in the code.

The repro case I've sent is fairly self-explanatory - the "b" from "because" ends up overwriting the "r" of "rugs".

I used a custom implementation of InputStream in the repro case I sent (to keep it small), but I have seen the bug happen with other implementations of InputStream and much larger XML files.
The corruption happens silently, which makes this bug pretty tricky to detect.
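
Because the corruption is silent, one way to check a given JRE is to parse the same document while varying how many bytes each read call returns, and to compare the attribute value with the expected one. This is a minimal sketch under assumptions: ChunkSizeSweep, throttle and likesWithChunk are illustrative names, and the 1 to 64 range is arbitrary. On an unaffected JRE the loop should print nothing.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ChunkSizeSweep {

    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
        "<He likes=\"rugs\" because=\"they really tie the room together\"/>";

    // Wraps a stream so that each bulk read returns at most maxBytes bytes.
    static InputStream throttle(InputStream in, final int maxBytes) {
        return new FilterInputStream(in) {
            @Override
            public int read(byte[] b, int off, int len) throws IOException {
                return super.read(b, off, Math.min(len, maxBytes));
            }
        };
    }

    // Parses XML through a throttled stream and returns the "likes" attribute.
    static String likesWithChunk(int maxBytes) {
        InputStream in = throttle(
            new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)), maxBytes);
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(in);
            try {
                reader.next(); // advance from START_DOCUMENT to the "He" start element
                return reader.getAttributeValue(null, "likes");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Report every chunk size for which the value differs from "rugs".
        for (int n = 1; n <= 64; n++) {
            String value = likesWithChunk(n);
            if (!"rugs".equals(value)) {
                System.out.println("bytesPerRead=" + n + " -> " + value);
            }
        }
    }
}
```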

Thanks,

Victor Michel
Amazon Web Services


-----Original Message-----
From: Michel, Victor
Sent: Monday, November 03, 2014 10:52 AM
To: j-dev@xerces.apache.org
Subject: RE: XMLStreamReader corrupting data

Hi,

Thanks for the answer.
XMLStreamReader is implemented by:
> public class XMLStreamReaderImpl implements 
> javax.xml.stream.XMLStreamReader  (in package 
> com.sun.org.apache.xerces.internal.impl )
It relies heavily on XMLEntityScanner, which is, I believe, the cause of the bug.

XMLEntityScanner seems to be part of the Xerces library https://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/impl/XMLEntityScanner.html
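
To check which implementation actually backs the StAX interfaces in a running JVM, the class names of the factory and the reader can be printed. This is a minimal sketch; WhichStaxImpl is an illustrative name. On the JREs discussed here the reader is expected to be the XMLStreamReaderImpl quoted above, but the result depends on the classpath and on any StAX provider configured.

```java
import java.io.StringReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class WhichStaxImpl {

    // Returns the fully qualified name of the XMLStreamReader implementation
    // that the default XMLInputFactory hands out.
    static String readerClassName() {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader("<a/>"));
            try {
                return reader.getClass().getName();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("factory: " + XMLInputFactory.newInstance().getClass().getName());
        System.out.println("reader:  " + readerClassName());
    }
}
```

A package name starting with com.sun.org.apache.xerces.internal indicates the JDK's bundled copy rather than the org.apache.xerces classes shipped by Apache.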

Maybe I misunderstood something? In any case, I have filed a bug on the Oracle website.

Thanks,

Victor

-----Original Message-----
From: Michael Glavassevich [mailto:mrglavas@ca.ibm.com]
Sent: Monday, November 03, 2014 8:30 AM
To: j-dev@xerces.apache.org
Subject: Re: XMLStreamReader corrupting data

Hello,

Apache Xerces does not contain an implementation of the XMLStreamReader interface. The component you're using would have been developed by Oracle/Sun and has not been contributed to Apache. We wouldn't know anything about the problem you're experiencing with StAX. Probably better to ask your question on one of the JDK forums.

Thanks.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org


Re: XMLStreamReader corrupting data

Posted by Michael Glavassevich <mr...@ca.ibm.com>.
Hello,

Apache Xerces does not contain an implementation of the XMLStreamReader 
interface. The component you're using would have been developed by 
Oracle/Sun and has not been contributed to Apache. We wouldn't know 
anything about the problem you're experiencing with StAX. Probably better 
to ask your question on one of the JDK forums.

Thanks.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: mrglavas@ca.ibm.com
E-mail: mrglavas@apache.org