Posted to dev@ws.apache.org by "Andreas Veithen (JIRA)" <ji...@apache.org> on 2016/01/05 08:50:39 UTC

[jira] [Updated] (AXIOM-478) Solution for parsing large XML

     [ https://issues.apache.org/jira/browse/AXIOM-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Veithen updated AXIOM-478:
----------------------------------
    Description: 
This is LU Jie from IBM. We use Axiom to parse Atom documents in our project.
One of our CMIS APIs attaches file content to the XML, so a large file produces a large Atom entry.
If we use Entry.getExtension(QName) to parse the content, it allocates a large amount of memory (around 5-6 times the file size).
We need your help to clarify whether we can use the DOM-like API of Axiom to get the text of a certain element as a stream, that is, without allocating a large object in memory.
Or is there an alternative solution for this use case?
We do know that we can use a pull parser to parse the XML as a stream, but we would like to know whether Axiom already provides an API or solution so that we can avoid writing a parser ourselves.

Here's the sample XML. We need to parse the text of the cmisra:base64 element:

{noformat}
<atom:entry
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/"
    xmlns:chemistry="http://chemistry.apache.org/"
    xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/">
    <atom:id
        xmlns:atom="http://www.w3.org/2005/Atom">urn:uuid:00000000-0000-0000-0000-00000000000
    </atom:id>
    <atom:title
        xmlns:atom="http://www.w3.org/2005/Atom" type="text">doucment1446016556658.txt
    </atom:title>
    <atom:updated
        xmlns:atom="http://www.w3.org/2005/Atom">2015-10-28T07:15:57.594Z
    </atom:updated>
    <cmisra:content
        xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/">
        <cmisra:mediatype
            xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/">text/plain
        </cmisra:mediatype>
        <chemistry:filename
            xmlns:chemistry="http://chemistry.apache.org/">doucment1446016556658.txt
        </chemistry:filename>
        <cmisra:base64
            xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/">Base64 encoded content of large file
        </cmisra:base64>
    </cmisra:content>
    <cmisra:object
        xmlns:cmisra="http://docs.oasis-open.org/ns/cmis/restatom/200908/">
        <cmis:properties
            xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/">
            <cmis:propertyId
                xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/" propertyDefinitionId="cmis:objectTypeId">
                <cmis:value
                    xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/">snx:file
                </cmis:value>
            </cmis:propertyId>
            <cmis:propertyString
                xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/" propertyDefinitionId="cmis:name">
                <cmis:value
                    xmlns:cmis="http://docs.oasis-open.org/ns/cmis/core/200908/">doucment1446016556658.txt
                </cmis:value>
            </cmis:propertyString>
        </cmis:properties>
    </cmisra:object>
</atom:entry>
{noformat}
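
For reference, the pull-parser fallback mentioned above could look roughly like the sketch below. It is not an Axiom API: it uses only the JDK's StAX (javax.xml.stream) and java.util.Base64, and the class and method names (Base64ElementStreamer, copyBase64Content) are hypothetical. It assumes that cmisra:base64 contains only text and that the parser reports long text in several chunks when coalescing is disabled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams the decoded content of cmisra:base64 to an OutputStream.
final class Base64ElementStreamer {
    static final String CMISRA_NS = "http://docs.oasis-open.org/ns/cmis/restatom/200908/";

    // Returns true if a cmisra:base64 element was found and its content was written to out.
    static boolean copyBase64Content(InputStream xml, OutputStream out)
            throws XMLStreamException, IOException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not merge adjacent text events, so large text can arrive in chunks.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && CMISRA_NS.equals(reader.getNamespaceURI())
                        && "base64".equals(reader.getLocalName())) {
                    decodeText(reader, out);
                    return true;
                }
            }
            return false;
        } finally {
            reader.close();
        }
    }

    // Decodes text events up to the next end tag (assumes no child elements).
    private static void decodeText(XMLStreamReader reader, OutputStream out)
            throws XMLStreamException, IOException {
        Base64.Decoder decoder = Base64.getDecoder();
        StringBuilder pending = new StringBuilder();
        int event;
        while ((event = reader.next()) != XMLStreamConstants.END_ELEMENT) {
            if (event != XMLStreamConstants.CHARACTERS && event != XMLStreamConstants.CDATA) {
                continue; // skip comments, ignorable whitespace, etc.
            }
            char[] buf = reader.getTextCharacters();
            int start = reader.getTextStart();
            int end = start + reader.getTextLength();
            for (int i = start; i < end; i++) {
                if (!Character.isWhitespace(buf[i])) {
                    pending.append(buf[i]); // drop line breaks and indentation
                }
            }
            // Base64 decodes in groups of 4 characters; keep any incomplete group for later.
            int usable = pending.length() - pending.length() % 4;
            if (usable > 0) {
                out.write(decoder.decode(pending.substring(0, usable)));
                pending.delete(0, usable);
            }
        }
        if (pending.length() > 0) {
            out.write(decoder.decode(pending.toString())); // unpadded tail, if any
        }
    }
}
```

With this approach only one parser chunk plus at most three leftover characters is held in memory at a time, but it is exactly the hand-written parsing we would prefer to avoid if Axiom already offers an equivalent.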


> Solution for parsing large XML
> ------------------------------
>
>                 Key: AXIOM-478
>                 URL: https://issues.apache.org/jira/browse/AXIOM-478
>             Project: Axiom
>          Issue Type: Question
>            Reporter: LU Jie


