Posted to users@jena.apache.org by Martynas Jusevičius <ma...@atomgraph.com> on 2020/07/12 22:01:47 UTC

ParseError while parsing entity in TriX

Hi,

    riot --strict --stop --syntax=TriX --output=nq

gives me

21:40:07 ERROR riot                 :: [line: 2943360, col: 62] XML
error: ParseError at [row,col]:[2943360,62]

That line is in a <plainLiteral> and looks like this:

- http://sprout.ics.uci.edu/past_projects/gac/index.html&#xc;&#xD;

I'm guessing it's the &#xc; entity that riot is failing on? It's the Form Feed:
https://www.codetable.net/hex/c
&#xD; is found on other (previous) lines so it shouldn't be it.

Is the &#xc; entity not allowed? The TriX output was produced by Saxon.

JENA_VERSION=3.10.0

Re: ParseError while parsing entity in TriX

Posted by Martynas Jusevičius <ma...@atomgraph.com>.
This looks like the bug: https://bugs.openjdk.java.net/browse/JDK-8059327

It should be fixed in Java 9. I'll try that; I'm running on Java 8 right now.

On Tue, Jul 14, 2020 at 3:14 PM Martynas Jusevičius
<ma...@atomgraph.com> wrote:
>
> OK. I had to do some debugging to understand what's going on. It looks
> like the problem is in the underlying Xerces XML 1.1 parser:
> https://community.oracle.com/thread/1626288
>
> The method I was observing is
> XML11EntityScanner::scanContent(XMLString content) on line 830 (maybe
> method names have changed since 2009), but the problem is exactly the
> same - the parser just stops in the middle of the URI when the
> position reaches 8192:
>
>         // inner loop, scanning for content
>         if (external) {
>             while (fCurrentEntity.position < fCurrentEntity.count) {
>                 c = fCurrentEntity.ch[fCurrentEntity.position++];
>                 if (!XML11Char.isXML11Content(c) || c == 0x85 || c == 0x2028) {
>                     fCurrentEntity.position--;
>                     break;
>                 }
>             }
>         }
>
> Not sure how to create a test case based on that. Any advice on how to proceed?
>
> Thanks.
>
> On Tue, Jul 14, 2020 at 10:33 AM Andy Seaborne <an...@apache.org> wrote:
> >
> >
> >
> > On 13/07/2020 20:07, Martynas Jusevičius wrote:
> > > Andy,
> > >
> > > I've switched the output to XML version 1.1 and started getting a lot
> > > of inexplicable and seemingly random riot warnings, such as
> > >
> > > 18:49:18 WARN  riot                 :: [line: 181, col: 15] Bad IRI:
> > > <ply to this email directly or view it on GitHub:&#xD;
> > > htt035f94/> Spaces are not legal in URIs/IRIs.
> > >
> > > where line 181 simply reads:
> > >
> > >           <uri>https://localhost/messages/65195ff1-3549-4840-8bc2-f37a3a035f94/</uri>
> > >
> > > Those warnings were not there using XML 1.0, which concerns me. From
> > > the warning message it looks like the parser somehow read part of one
> > > term on top of another.
> >
> > TriX processes the output of the XML parser.
> >
> > org.apache.jena.riot.lang.ReaderTriX
> >
> >                  String x = parser.getElementText() ;
> >                  Node n = profile.createURI(x, line, col) ;
> >
> > > I am honestly trying to prepare a test file right away now :) I've cut
> > > it down to ~350 lines, but if I remove a single extra triple or even a
> > > line of string, the warning goes away.
> > > Can I send it off-list to you?
> >
> > 350 lines is still long.
> >
> > If I accept anything off-list, I end up with too much off-list, so I
> > don't work that way.
> >
> >      Andy
> >
> > >
> > > On Mon, Jul 13, 2020 at 11:22 AM Martynas Jusevičius
> > > <ma...@atomgraph.com> wrote:
> > >>
> > >> Thanks Andy. I was making an example when I got your message :)
> > >>
> > >> I've found that form feed is not allowed in XML 1.0 but allowed in XML 1.1
> > >> https://stackoverflow.com/questions/15034302/how-can-i-add-form-feed-character-into-text-that-i-am-creating-with-xslt/37790009
> > >>
> > >> I tried TriX as XML version 1.1 and it worked:
> > >>
> > >> <?xml version="1.1" encoding="UTF-8"?>
> > >> <trix xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> > >>        xmlns="http://www.w3.org/2004/03/trix/trix-1/"
> > >>        xsi:schemaLocation="http://www.w3.org/2004/03/trix/trix-1/ trix-1.0.xsd">
> > >>     <graph>
> > >>       <triple>
> > >>         <uri>http://example.org/Bob</uri>
> > >>         <uri>http://example.org/name</uri>
> > >>         <plainLiteral>Bob&#xc;</plainLiteral>
> > >>       </triple>
> > >>     </graph>
> > >> </trix>
> > >>
> > >> Output:
> > >>
> > >> <http://example.org/Bob> <http://example.org/name> "Bob\f" .
> > >>
> > >> I guess I need to figure out how to get Saxon to produce 1.1.
> > >>
> > >> On Mon, Jul 13, 2020 at 11:14 AM Andy Seaborne <an...@apache.org> wrote:
> > >>>
> > >>> Small example?
> > >>> Try with and without &#xc;?
> > >>>
> > >>> <TriX xmlns="http://www.w3.org/2004/03/trix/trix-1/">
> > >>>     <graph>
> > >>>       <triple>
> > >>>         <uri>http://example.org/Bob</uri>
> > >>>         <uri>http://example.org/name</uri>
> > >>>         <plainLiteral>Bob&#xc;</plainLiteral>
> > >>>       </triple>
> > >>>     </graph>
> > >>> </TriX>
> > >>>
> > >>> 10:10:19 ERROR riot            :: [line: 6, col: 29] XML error:
> > >>> ParseError at [row,col]:[6,29]
> > >>> Message: Character reference "&#xc" is an invalid XML character.
> > >>>
> > >>> The "Message:" line isn't from Jena.
> > >>>
> > >>> ReaderTriX.java
> > >>>
> > > >>>           } catch (XMLStreamException ex) {
> > > >>>               staxError(parser.getLocation(), "XML error: "+ex.getMessage()) ;
> > > >>>           }
> > >>>
> > >>>
> > > >>> (Jena 3.16.0ish with the JDK XML parser)
> > >>>
> > >>>       Andy
> > >>>
> > >>>
> > >>> On 12/07/2020 23:01, Martynas Jusevičius wrote:
> > >>>> Hi,
> > >>>>
> > >>>>       riot --strict --stop --syntax=TriX --output=nq
> > >>>>
> > >>>> gives me
> > >>>>
> > >>>> 21:40:07 ERROR riot                 :: [line: 2943360, col: 62] XML
> > >>>> error: ParseError at [row,col]:[2943360,62]
> > >>>>
> > > >>>> That line is in a <plainLiteral> and looks like this:
> > >>>>
> > >>>> - http://sprout.ics.uci.edu/past_projects/gac/index.html&#xc;&#xD;
> > >>>>
> > >>>> I'm guessing it's the &#xc; entity that riot is failing on? It's the Form Feed:
> > >>>> https://www.codetable.net/hex/c
> > >>>> &#xD; is found on other (previous) lines so it shouldn't be it.
> > >>>>
> > >>>> Is &#xc; entity not allowed? The TriX output was produced by Saxon.
> > >>>>
> > >>>> JENA_VERSION=3.10.0
> > >>>>

Re: ParseError while parsing entity in TriX

Posted by Martynas Jusevičius <ma...@atomgraph.com>.
OK. I had to do some debugging to understand what's going on. It looks
like the problem is in the underlying Xerces XML 1.1 parser:
https://community.oracle.com/thread/1626288

The method I was observing is
XML11EntityScanner::scanContent(XMLString content) on line 830 (maybe
method names have changed since 2009), but the problem is exactly the
same - the parser just stops in the middle of the URI when the
position reaches 8192:

        // inner loop, scanning for content
        if (external) {
            while (fCurrentEntity.position < fCurrentEntity.count) {
                c = fCurrentEntity.ch[fCurrentEntity.position++];
                if (!XML11Char.isXML11Content(c) || c == 0x85 || c == 0x2028) {
                    fCurrentEntity.position--;
                    break;
                }
            }
        }

Not sure how to create a test case based on that. Any advice on how to proceed?

Thanks.
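
One possible way to build a small, shareable test case is to generate the input instead of trimming the real file. The sketch below is only an assumption about what might trigger the problem: it writes an XML 1.1 TriX document that is much longer than 8192 characters, reads the subject URIs back through the JDK StAX API, and counts how many differ from what was written. The class name, the subject URI pattern and the helper methods are all hypothetical, and it may not reproduce the warnings on every JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class TrixBoundaryCase {
    // Hypothetical subject URIs, shaped like the one in the warning quoted in this thread.
    static String subject(int i) {
        return "https://localhost/messages/" + i + "/";
    }

    /** Builds an XML 1.1 TriX document with the given number of triples. */
    static String generate(int triples) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.1\" encoding=\"UTF-8\"?>\n");
        sb.append("<TriX xmlns=\"http://www.w3.org/2004/03/trix/trix-1/\">\n  <graph>\n");
        for (int i = 0; i < triples; i++) {
            sb.append("    <triple>\n")
              .append("      <uri>").append(subject(i)).append("</uri>\n")
              .append("      <uri>http://example.org/name</uri>\n")
              .append("      <plainLiteral>message ").append(i).append("</plainLiteral>\n")
              .append("    </triple>\n");
        }
        sb.append("  </graph>\n</TriX>\n");
        return sb.toString();
    }

    /** Reads back the first <uri> of every <triple> using StAX. */
    static List<String> readSubjects(String xml) throws XMLStreamException {
        List<String> subjects = new ArrayList<>();
        XMLStreamReader parser =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        boolean expectSubject = false;
        while (parser.hasNext()) {
            if (parser.next() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            String name = parser.getLocalName();
            if ("triple".equals(name)) {
                expectSubject = true;
            } else if ("uri".equals(name) && expectSubject) {
                subjects.add(parser.getElementText());
                expectSubject = false;
            }
        }
        parser.close();
        return subjects;
    }

    /** Counts subjects that were lost or came back different from what was written. */
    static int countCorrupted(int triples) throws XMLStreamException {
        List<String> subjects = readSubjects(generate(triples));
        int bad = Math.abs(subjects.size() - triples);
        for (int i = 0; i < Math.min(subjects.size(), triples); i++) {
            if (!subject(i).equals(subjects.get(i))) {
                bad++;
            }
        }
        return bad;
    }

    public static void main(String[] args) throws XMLStreamException {
        int triples = 400;
        System.out.println("document length: " + generate(triples).length());
        System.out.println("corrupted subjects: " + countCorrupted(triples));
    }
}
```

If the count is non-zero on one JDK and zero on another, the generated document could be attached to a bug report instead of the 350-line file.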


Re: ParseError while parsing entity in TriX

Posted by Andy Seaborne <an...@apache.org>.

On 13/07/2020 20:07, Martynas Jusevičius wrote:
> Andy,
> 
> I've switched the output to XML version 1.1 and started getting a lot
> of inexplicable and seemingly random riot warnings, such as
> 
> 18:49:18 WARN  riot                 :: [line: 181, col: 15] Bad IRI:
> <ply to this email directly or view it on GitHub:&#xD;
> htt035f94/> Spaces are not legal in URIs/IRIs.
> 
> where line 181 simply reads:
> 
>           <uri>https://localhost/messages/65195ff1-3549-4840-8bc2-f37a3a035f94/</uri>
> 
> Those warnings were not there using XML 1.0, which concerns me. From
> the warning message it looks like the parser somehow read part of one
> term on top of another.

TriX processes the output of the XML parser.

org.apache.jena.riot.lang.ReaderTriX

                 String x = parser.getElementText() ;
                 Node n = profile.createURI(x, line, col) ;

> I am honestly trying to prepare a test file right away now :) I've cut
> it down to ~350 lines, but if I remove a single extra triple or even a
> line of string, the warning goes away.
> Can I send it off-list to you?

350 lines is still long.

If I accept anything off-list, I end up with too much off-list, so I
don't work that way.

     Andy


Re: ParseError while parsing entity in TriX

Posted by Martynas Jusevičius <ma...@atomgraph.com>.
Andy,

I've switched the output to XML version 1.1 and started getting a lot
of inexplicable and seemingly random riot warnings, such as

18:49:18 WARN  riot                 :: [line: 181, col: 15] Bad IRI:
<ply to this email directly or view it on GitHub:&#xD;
htt035f94/> Spaces are not legal in URIs/IRIs.

where line 181 simply reads:

         <uri>https://localhost/messages/65195ff1-3549-4840-8bc2-f37a3a035f94/</uri>

Those warnings were not there using XML 1.0, which concerns me. From
the warning message it looks like the parser somehow read part of one
term on top of another.

I am honestly trying to prepare a test file right now :) I've cut
it down to ~350 lines, but if I remove a single extra triple or even a
line of a string, the warning goes away.
Can I send it off-list to you?


Re: ParseError while parsing entity in TriX

Posted by Martynas Jusevičius <ma...@atomgraph.com>.
Thanks Andy. I was making an example when I got your message :)

I've found that form feed is not allowed in XML 1.0 but is allowed in XML 1.1:
https://stackoverflow.com/questions/15034302/how-can-i-add-form-feed-character-into-text-that-i-am-creating-with-xslt/37790009

I tried TriX as XML version 1.1 and it worked:

<?xml version="1.1" encoding="UTF-8"?>
<trix xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://www.w3.org/2004/03/trix/trix-1/"
      xsi:schemaLocation="http://www.w3.org/2004/03/trix/trix-1/ trix-1.0.xsd">
   <graph>
     <triple>
       <uri>http://example.org/Bob</uri>
       <uri>http://example.org/name</uri>
       <plainLiteral>Bob&#xc;</plainLiteral>
     </triple>
   </graph>
</trix>

Output:

<http://example.org/Bob> <http://example.org/name> "Bob\f" .

I guess I need to figure out how to get Saxon to produce 1.1.
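
The difference between the two versions can be sketched directly from the Char productions in the XML 1.0 and XML 1.1 recommendations. This is a minimal illustration, not code from Jena or Xerces; the class and method names are hypothetical. Note that in XML 1.1 the form feed falls in the RestrictedChar range, so it is legal only when written as a character reference such as &#xc;, not as a literal character.

```java
final class XmlCharRanges {
    /** Char production of XML 1.0, section 2.2. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Char production of XML 1.1, section 2.2. */
    static boolean isXml11Char(int cp) {
        return (cp >= 0x1 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** RestrictedChar production of XML 1.1: allowed only as a character reference. */
    static boolean isXml11Restricted(int cp) {
        return (cp >= 0x1 && cp <= 0x8)
            || cp == 0xB || cp == 0xC
            || (cp >= 0xE && cp <= 0x1F)
            || (cp >= 0x7F && cp <= 0x84)
            || (cp >= 0x86 && cp <= 0x9F);
    }

    public static void main(String[] args) {
        int formFeed = 0xC;
        System.out.println("XML 1.0 allows U+000C: " + isXml10Char(formFeed));
        System.out.println("XML 1.1 allows U+000C: " + isXml11Char(formFeed));
        System.out.println("XML 1.1 restricts U+000C to references: " + isXml11Restricted(formFeed));
    }
}
```

A check like this could be run over literal values before serialising, to decide whether the output has to be declared as XML 1.1.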


Re: ParseError while parsing entity in TriX

Posted by Andy Seaborne <an...@apache.org>.
Small example?
Try with and without &#xc;?

<TriX xmlns="http://www.w3.org/2004/03/trix/trix-1/">
   <graph>
     <triple>
       <uri>http://example.org/Bob</uri>
       <uri>http://example.org/name</uri>
       <plainLiteral>Bob&#xc;</plainLiteral>
     </triple>
   </graph>
</TriX>

10:10:19 ERROR riot            :: [line: 6, col: 29] XML error: 
ParseError at [row,col]:[6,29]
Message: Character reference "&#xc" is an invalid XML character.

The "Message:" line isn't from Jena.

ReaderTriX.java

         } catch (XMLStreamException ex) {
             staxError(parser.getLocation(), "XML error: "+ex.getMessage()) ;
         }


(Jena 3.16.0ish with the JDK XML parser)

     Andy
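
To see the same behaviour without Jena, the example can be fed straight to the StAX API that ReaderTriX sits on. The following is a rough sketch under assumptions: the class and helper names are hypothetical, it relies on whatever XMLInputFactory the runtime provides (the JDK built-in one unless another StAX implementation is on the classpath), and it reports a rejected document by returning null instead of going through Jena's error handler.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class FormFeedStaxDemo {
    /** The small TriX example from this thread, declared with the given XML version. */
    static String doc(String version) {
        return "<?xml version=\"" + version + "\"?>"
             + "<TriX xmlns=\"http://www.w3.org/2004/03/trix/trix-1/\"><graph><triple>"
             + "<uri>http://example.org/Bob</uri>"
             + "<uri>http://example.org/name</uri>"
             + "<plainLiteral>Bob&#xc;</plainLiteral>"
             + "</triple></graph></TriX>";
    }

    /** Returns the text of the first plainLiteral element; throws if the parser rejects the input. */
    static String firstLiteral(String xml) throws XMLStreamException {
        XMLStreamReader parser =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            while (parser.hasNext()) {
                if (parser.next() == XMLStreamConstants.START_ELEMENT
                        && "plainLiteral".equals(parser.getLocalName())) {
                    return parser.getElementText();
                }
            }
            return null;
        } finally {
            parser.close();
        }
    }

    /** Same as firstLiteral, but prints the parser message and returns null on an XML error. */
    static String tryFirstLiteral(String xml) {
        try {
            return firstLiteral(xml);
        } catch (XMLStreamException ex) {
            System.out.println("XML error: " + ex.getMessage());
            return null;
        }
    }

    public static void main(String[] args) {
        String v10 = tryFirstLiteral(doc("1.0"));
        String v11 = tryFirstLiteral(doc("1.1"));
        System.out.println("XML 1.0 accepted: " + (v10 != null));
        System.out.println("XML 1.1 accepted: " + (v11 != null));
    }
}
```

With the JDK parser I would expect the 1.0 document to be rejected with the same "invalid XML character" message and the 1.1 document to be accepted, which matches what is reported in this thread.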

