Posted to dev@uima.apache.org by "Steven Bethard (JIRA)" <de...@uima.apache.org> on 2011/03/25 15:19:05 UTC
[jira] [Updated] (UIMA-2101) CasToInlineXml adds whitespace
[ https://issues.apache.org/jira/browse/UIMA-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steven Bethard updated UIMA-2101:
---------------------------------
Description:
CasToInlineXml adds indentation between adjacent XML elements. For example, for a single-character document with a single annotation covering that character, it writes:
{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<Document>
<uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" language="x-unspecified">
<uima.tcas.Annotation sofa="Sofa" begin="0" end="1"> </uima.tcas.Annotation>
</uima.tcas.DocumentAnnotation>
</Document>
{noformat}
I think it should instead write everything on a single line:
{noformat}
<?xml version="1.0" encoding="UTF-8"?>
<Document><uima.tcas.DocumentAnnotation sofa="Sofa" begin="0" end="1" language="x-unspecified"><uima.tcas.Annotation sofa="Sofa" begin="0" end="1"> </uima.tcas.Annotation></uima.tcas.DocumentAnnotation></Document>
{noformat}
I believe this could be fixed by replacing the line:
{noformat}
XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream);
{noformat}
with the line:
{noformat}
XMLSerializer sax2xml = new XMLSerializer(byteArrayOutputStream, false);
{noformat}
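For illustration, the standalone sketch below reproduces the effect of such a formatting flag using the JDK's own JAXP identity Transformer instead of UIMA's XMLSerializer, so it runs without UIMA on the classpath. It assumes that XMLSerializer's boolean argument toggles something equivalent to OutputKeys.INDENT; the class IndentDemo, its methods, and the document text "x" are hypothetical. It builds the element tree from the example above and serializes it with and without indentation.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: JAXP stands in for UIMA's XMLSerializer here.
class IndentDemo {

    // Creates an element carrying the sofa/begin/end attributes used in the report.
    private static Element span(Document dom, String name, int begin, int end) {
        Element e = dom.createElement(name);
        e.setAttribute("sofa", "Sofa");
        e.setAttribute("begin", Integer.toString(begin));
        e.setAttribute("end", Integer.toString(end));
        return e;
    }

    // Builds the inline-XML tree: Document > DocumentAnnotation > Annotation > text.
    static Document buildInlineXml(String docText) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = dom.createElement("Document");
            Element docAnnot = span(dom, "uima.tcas.DocumentAnnotation", 0, docText.length());
            docAnnot.setAttribute("language", "x-unspecified");
            Element annot = span(dom, "uima.tcas.Annotation", 0, docText.length());
            annot.appendChild(dom.createTextNode(docText));
            docAnnot.appendChild(annot);
            root.appendChild(docAnnot);
            dom.appendChild(root);
            return dom;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the tree; 'formatted' is assumed to mirror XMLSerializer's boolean argument.
    static String serialize(Document dom, boolean formatted) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            t.setOutputProperty(OutputKeys.INDENT, formatted ? "yes" : "no");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document dom = buildInlineXml("x");
        System.out.println(serialize(dom, true));   // line breaks inserted between nested elements
        System.out.println(serialize(dom, false));  // no line breaks between elements
    }
}
```

With indentation on, a line break (plus leading spaces on some JDK versions) is inserted after <Document> and between the nested elements; with it off, the elements stay adjacent. The attribute order in the output may differ from the report, since the DOM serializer does not necessarily keep insertion order.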
I think it's a bug that CasToInlineXml changes the character offsets: the added whitespace becomes part of the text content, so that text no longer lines up with the begin/end values. That said, I would also be happy with an alternate constructor or a method on CasToInlineXml that disables the formatting.
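The offset problem can be checked directly. In inline XML the document text is the concatenation of all text nodes, so any whitespace the serializer adds between elements becomes part of that text and shifts positions away from the begin/end values. The sketch below parses the two outputs as they appear in this report (the formatted one, as reproduced here, shows line breaks but no leading spaces) and compares the recovered text with the document text. It assumes the one-character document is the single space shown inside uima.tcas.Annotation; the class OffsetCheck and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical checker: does inline XML round-trip to the original document text?
class OffsetCheck {

    static final String DECL = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";

    // Formatted output as shown in the report (line breaks only).
    static final String FORMATTED = DECL
            + "<Document>\n"
            + "<uima.tcas.DocumentAnnotation sofa=\"Sofa\" begin=\"0\" end=\"1\" language=\"x-unspecified\">\n"
            + "<uima.tcas.Annotation sofa=\"Sofa\" begin=\"0\" end=\"1\"> </uima.tcas.Annotation>\n"
            + "</uima.tcas.DocumentAnnotation>\n"
            + "</Document>\n";

    // Single-line output the report asks for.
    static final String SINGLE_LINE = DECL
            + "<Document><uima.tcas.DocumentAnnotation sofa=\"Sofa\" begin=\"0\" end=\"1\" language=\"x-unspecified\">"
            + "<uima.tcas.Annotation sofa=\"Sofa\" begin=\"0\" end=\"1\"> </uima.tcas.Annotation>"
            + "</uima.tcas.DocumentAnnotation></Document>\n";

    // Concatenates every text node under the root element, i.e. the text a consumer would recover.
    static String extractText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // True only if the recovered text equals the document text, so begin/end offsets still apply.
    static boolean preservesOffsets(String xml, String documentText) {
        return extractText(xml).equals(documentText);
    }

    public static void main(String[] args) {
        String documentText = " "; // assumed: the single space covered by the annotation
        System.out.println(preservesOffsets(FORMATTED, documentText));   // false
        System.out.println(preservesOffsets(SINGLE_LINE, documentText)); // true
        System.out.println(extractText(FORMATTED).length());             // 5
    }
}
```

Even counting only line breaks, the recovered text is five characters long instead of one, and the annotated space sits at position 2 rather than at the annotation's begin offset of 0.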
> CasToInlineXml adds whitespace
> ------------------------------
>
> Key: UIMA-2101
> URL: https://issues.apache.org/jira/browse/UIMA-2101
> Project: UIMA
> Issue Type: Bug
> Affects Versions: 2.3.1SDK
> Reporter: Steven Bethard
>
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira