Posted to users@jackrabbit.apache.org by sbarriba <sb...@yahoo.co.uk> on 2007/10/05 09:40:29 UTC

Memory usage issues of importxml/exportsysview

Hi all,

During a recent thread Hot Backup Tools were discussed - see
http://www.mail-archive.com/users@jackrabbit.apache.org/msg04255.html.

 

As an outcome of that we're doing 2 things:

1)      "Low-level" backup
        o   Backing up the database
        o   Backing up the repository file system

2)      "High-level" backup
        o   Running exportsysview on each workspace

 

When migrating between environments or restoring backups, solution 2) is very
useful, but the XML files are getting very large where the content has lots
of binaries etc. The main issue is that the memory requirements of
"importxml" increase linearly with the size of the XML file. I presume this
is due to a) the memory required to parse the file, and/or b) the memory
required to hold the transient state of the import.

 

We now need a 1GB heap size for some imports, and obviously this will hit a
crunch point as the exports keep growing.

 

Any suggestions on how to resolve this memory issue? For example, could
"importxml" use a SAX event model to avoid parsing the XML into a complete
DOM? (Note that I don't know the internals of importxml as it stands.)
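
To make the SAX idea concrete, here is a minimal sketch of event-based
processing of a system-view-style file using only the JDK's SAX parser. It is
not how Jackrabbit's importxml works internally (that is not established in
this thread); the class name SaxNodeCounter and the sample document are
hypothetical. The handler only counts sv:node elements and tracks nesting
depth, so the parser never holds the whole document in memory. For reference,
the JCR 1.0 API also offers Session.getImportContentHandler(parentAbsPath,
uuidBehavior), which returns a SAX ContentHandler for imports.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: walks a system-view-style export with SAX
// events, so no DOM of the whole file is ever built.
class SaxNodeCounter {

    // Namespace URI of the JCR system view (sv:) vocabulary.
    static final String SV_NS = "http://www.jcp.org/jcr/sv/1.0";

    /** Returns {number of sv:node elements, maximum sv:node nesting depth}. */
    static int[] count(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();

        final int[] result = new int[2]; // [0] = node count, [1] = max depth
        final int[] depth = new int[1];  // current sv:node nesting depth

        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (SV_NS.equals(uri) && "node".equals(localName)) {
                    result[0]++;
                    depth[0]++;
                    result[1] = Math.max(result[1], depth[0]);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (SV_NS.equals(uri) && "node".equals(localName)) {
                    depth[0]--;
                }
            }
        });
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Tiny made-up sample; a real export would be streamed from a file.
        String xml =
            "<sv:node xmlns:sv=\"" + SV_NS + "\" sv:name=\"a\">"
          + "<sv:node sv:name=\"b\"><sv:node sv:name=\"c\"/></sv:node>"
          + "</sv:node>";
        int[] r = count(new ByteArrayInputStream(
                xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println("nodes=" + r[0] + " maxDepth=" + r[1]);
    }
}
```

With this style of parsing, memory held by the parser depends on nesting
depth rather than file size. It would not by itself reduce memory held as
transient state by the import, which is cause b) above.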

 

All suggestions welcome.

Regards,

Shaun

 


Re: Memory usage issues of importxml/exportsysview

Posted by Jacco van Weert <11...@gmail.com>.
Hello Shaun,

We use our own backup facility, which also works "hot".
I wrote a mail about it a few days ago (it's part of JeCARS).

The result of the backup is:
- a CND file
- a node structure file in plain ASCII, easily parseable
- the binary information, stored as separate files.

The solution works very well. I use it in another application in which the
repository is replicated at short intervals.

It is especially useful when existing node types are changed. In the
future we will introduce a sort of "evolution scheme".
When e.g. property names are changed, the "restore" operation can map the
old property to the new one.

The source (of the first version) is available.



Greetings,

  Jacco van Weert





-- 
-------------------------------------
Jacco van Weert -- 1111software@gmail.com
JCR Controller -- http://www.xs4all.nl/~weertj/jcr