Posted to users@jackrabbit.apache.org by MARTINEZ Antonio <An...@alcatel-lucent.com> on 2009/02/04 05:55:08 UTC

Performance of export/migration/import

Hello,

It is that time in our project where we need to support export/import
with data migration.
In our case, during migration:
- We actually need to change some parameters for most of the nodes
before we import.
- We definitely have to have Jackrabbit running during the export (and
preferably also during the import).
- We need to be able to export/migrate/import either a subtree or the
entire repository.
- We need to do an export/migration/import in under 3 hours for a 20 GB
repository (the index in this case is about 8 GB).

In the wiki
(http://wiki.apache.org/jackrabbit/FrontPage?highlight=((BackupAndMigration))
I see three options:

1) Low level: copying the file system (for the index) and the tables for
the data (MySQL in our case)
  I think we cannot use this option because I do not know how I could
modify the data (tables and index) in the process.
  Any options here?

2) Use the JCR API (export/import to/from XML)
   This would be great since it would allow Jackrabbit to be running in
the process.
   It is also great since it allows specifying a subtree for
import/export.

   However, we cannot use it for performance reasons.
   The test I have done for a typical subtree in our case:
       - Exported data: 14 MB
       - Export time: 1 min 45 s
   We can have up to 10K subtrees of this size in the repository.

3) Using a tool (like JeCARS)
   Does anybody have any data showing whether this or another tool would
support our requirements?

Any other options are very welcome!

Thanks in advance,
Antonio
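
For scale, the figures given under option 2 can be set against the 3-hour target with a quick back-of-the-envelope calculation. This is only a rough sketch: it assumes export time grows linearly with the number of subtrees, that subtrees are exported one after another, and that every subtree takes the measured 1 minute 45 seconds. The function names are made up for illustration.

```python
# Rough estimate only. Assumptions (not from the measurements themselves):
# linear scaling, sequential exports, every subtree as slow as the sample.
SECONDS_PER_SUBTREE = 1 * 60 + 45   # measured: 1 min 45 s for a 14 MB export
MAX_SUBTREES = 10_000               # "up to 10K subtrees of this size"
BUDGET_SECONDS = 3 * 3600           # 3-hour target for export + migration + import


def total_export_hours(subtrees, seconds_per_subtree=SECONDS_PER_SUBTREE):
    """Hours needed to export the given number of subtrees sequentially."""
    return subtrees * seconds_per_subtree / 3600


def subtrees_within_budget(budget_seconds=BUDGET_SECONDS,
                           seconds_per_subtree=SECONDS_PER_SUBTREE):
    """How many whole subtrees can be exported inside the time budget."""
    return budget_seconds // seconds_per_subtree


if __name__ == "__main__":
    hours = total_export_hours(MAX_SUBTREES)
    print(f"Export of {MAX_SUBTREES} subtrees: about {hours:.1f} hours")
    print(f"Subtrees exportable in 3 hours: {subtrees_within_budget()}")
```

Under these assumptions the export alone would take roughly 292 hours, and only about 100 such subtrees would fit in the 3-hour window even before any migration or import work.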


Re: Performance of export/migration/import

Posted by Jacco van Weert <11...@gmail.com>.
>
>
>
> 3) Using a tool (like JeCARS)
>   Does anybody have any data showing whether this or another tool would
> support our requirements?
>


In terms of performance, the JeCARS backup tool is IMHO fast. The non-binary
import/export is being used in a high-performance environment; for that
purpose a non-XML format is used for maximum performance.
Binary data is stored as separate files.
Regarding your situation: the JeCARS backup can only make a hot backup
within JeCARS itself; when used against a plain Jackrabbit repository it
uses the Transient repository mode.


Gr.

   Jacco


-- 
-------------------------------------
Jacco van Weert -- 1111software@gmail.com
JCR Controller -- http://www.xs4all.nl/~weertj/jcr
JeCARS -- http://jecars.sourceforge.net

Re: Performance of export/migration/import

Posted by Paco Avila <mo...@gmail.com>.
https://issues.apache.org/jira/browse/JCR-1972

On Sat, Feb 7, 2009 at 3:11 PM, Alexander Klimetschek <ak...@day.com> wrote:
> On Fri, Feb 6, 2009 at 9:02 AM, Paco Avila <mo...@gmail.com> wrote:
>> I have been working on a migration utility for OpenKM and I made
>> some changes in jackrabbit-core to enable version import, preserving
>> the modification date. I also modified
>> org.apache.jackrabbit.core.NodeImpl to preserve the UUID in the migration
>> process.
>
> Sounds interesting!
>
>> I've attached a PDF with the changes needed in jackrabbit-core. It
>> works and there were no problems with the migrated repository.
>
> This mailing list does not accept attachments. I would suggest you
> create a Jira issue and attach the patch there (a new one, because I
> couldn't find any existing issue for that with a quick jira search).
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> alexander.klimetschek@day.com
>



-- 
Paco Avila
GIT Consultors
tel: +34 971 498310
fax: +34 971496189
e-mail: pavila@git.es
http://www.git.es

Re: Performance of export/migration/import

Posted by Alexander Klimetschek <ak...@day.com>.
On Fri, Feb 6, 2009 at 9:02 AM, Paco Avila <mo...@gmail.com> wrote:
> I have been working on a migration utility for OpenKM and I made
> some changes in jackrabbit-core to enable version import, preserving
> the modification date. I also modified
> org.apache.jackrabbit.core.NodeImpl to preserve the UUID in the migration
> process.

Sounds interesting!

> I've attached a PDF with the changes needed in jackrabbit-core. It
> works and there were no problems with the migrated repository.

This mailing list does not accept attachments. I would suggest you
create a Jira issue and attach the patch there (a new one, because I
couldn't find any existing issue for that with a quick jira search).

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Performance of export/migration/import

Posted by Paco Avila <mo...@gmail.com>.
I have been working on a migration utility for OpenKM and I made
some changes in jackrabbit-core to enable version import, preserving
the modification date. I also modified
org.apache.jackrabbit.core.NodeImpl to preserve the UUID in the migration
process.

This migration process is needed because there are changes in the
repository node definition, and Jackrabbit currently can't deal with
this.

I've attached a PDF with the changes needed in jackrabbit-core. It
works and there were no problems with the migrated repository.

On Thu, Feb 5, 2009 at 3:48 PM, Ivan Latysh <iv...@gmail.com> wrote:
> Torgeir Veimo wrote:
>
>>>> I forgot an important question... Does it back up / restore document
>>>> version history?
>>>
>>> That is one major limitation: JCR does not provide an API to manipulate
>>> versions. JCR Backup will back up version history, but won't restore it.
>>>
>>> If you need to back up/restore versions, the only solution is to use DB
>>> Persistence and DB backup/restore.
>>
>> So this tool is not fully usable with versioned content if one needs to
>> move from one database vendor to another?
>
> JCR Backup uses the JCR API for backup/restore, and does not provide any
> DB-specific tools.
>
> --
> Ivan Latysh
> IvanLatysh@gmail.com
>



-- 
Paco Avila
GIT Consultors
tel: +34 971 498310
fax: +34 971496189
e-mail: pavila@git.es
http://www.git.es

Re: Performance of export/migration/import

Posted by Ivan Latysh <iv...@gmail.com>.
Torgeir Veimo wrote:

>>> I forgot an important question... Does it back up / restore document
>>> version history?
>>
>> That is one major limitation: JCR does not provide an API to
>> manipulate versions. JCR Backup will back up version history, but
>> won't restore it.
>>
>> If you need to back up/restore versions, the only solution is to use DB
>> Persistence and DB backup/restore.
>
> So this tool is not fully usable with versioned content if one needs to
> move from one database vendor to another?

JCR Backup uses the JCR API for backup/restore, and does not provide any DB-specific tools.

-- 
Ivan Latysh
IvanLatysh@gmail.com

Re: Performance of export/migration/import

Posted by Torgeir Veimo <to...@pobox.com>.
On 5 Feb 2009, at 23:55, Ivan Latysh wrote:

> Paco Avila wrote:
>
>> I forgot an important question... Does it back up / restore document
>> version history?
>
> That is one major limitation: JCR does not provide an API to
> manipulate versions. JCR Backup will back up version history, but
> won't restore it.
>
> If you need to back up/restore versions, the only solution is to use
> DB Persistence and DB backup/restore.


So this tool is not fully usable with versioned content if one needs to
move from one database vendor to another?

-- 
Torgeir Veimo
torgeir@pobox.com





Re: Performance of export/migration/import

Posted by Ivan Latysh <iv...@gmail.com>.
Paco Avila wrote:

> I forgot an important question... Does it back up / restore document
> version history?

That is one major limitation: JCR does not provide an API to manipulate versions. JCR Backup will back up version
history, but won't restore it.

If you need to back up/restore versions, the only solution is to use DB Persistence and DB backup/restore.

-- 
Ivan Latysh
IvanLatysh@gmail.com

Re: Performance of export/migration/import

Posted by Paco Avila <mo...@gmail.com>.
I forgot an important question... Does it back up / restore document
version history?

On Thu, Feb 5, 2009 at 1:56 AM, IvanLatysh <iv...@yourmail.com> wrote:
> Paco Avila wrote:
>
>>> Does it work with recent Jackrabbit versions? I have seen that it runs
>>> with Jackrabbit 1.3
>
> JCR Backup for JR1.4 and JR1.5 has been released today.
>
> --
> Ivan Latysh
> ivan@yourmail.com
>



-- 
Paco Avila
GIT Consultors
tel: +34 971 498310
fax: +34 971496189
e-mail: pavila@git.es
http://www.git.es

Re: Performance of export/migration/import

Posted by IvanLatysh <iv...@yourmail.com>.
Paco Avila wrote:

>> Does it work with recent Jackrabbit versions? I have seen that it runs
>> with Jackrabbit 1.3

JCR Backup for JR1.4 and JR1.5 has been released today.

-- 
Ivan Latysh
ivan@yourmail.com

Re: Performance of export/migration/import

Posted by Ivan Latysh <iv...@gmail.com>.
Paco Avila wrote:
> Does it work with recent Jackrabbit versions? I have seen that it runs
> with Jackrabbit 1.3

It runs with 1.4 for sure. Let me run it against 1.5.

P.S. Let's move this discussion to the JCRBackup project forum.

-- 
Ivan Latysh
IvanLatysh@gmail.com

Re: Performance of export/migration/import

Posted by Paco Avila <mo...@gmail.com>.
Does it work with recent Jackrabbit versions? I have seen that it runs
with Jackrabbit 1.3

On Wed, Feb 4, 2009 at 5:25 PM, Ivan Latysh <iv...@gmail.com> wrote:
> MARTINEZ Antonio wrote:
>
> JCRBackup can do all of it (http://sourceforge.net/projects/jcr-backup).
>
>> It is that time in our project where we need to support export/import
>> with data migration.
>> In our case, during migration:
>> - We actually need to change some parameters for most of the nodes
>> before we import.
>
> Data is exported as XML, and binaries are exported in a native format
> into separate files, so you can post-process the data at any time.
>
>> - We definitely have to have Jackrabbit running during the export (and
>> preferably also during the import)
>
> JCRBackup will do it on a live repo, no problem.
>
>> - We need to be able to export/migrate/import either a subtree or the
>> entire repository.
>
> You can specify a node to start a backup from.
>
>> - We need to do an export/migration/import in under 3 hours for a 20 GB
>> repository (the index in this case is about 8 GB)
>
> JCRBackup will export the data, not the indexes; the indexes will be
> re-created by JCR on import.
>
> One thing that JCR-Backup can't do is versions.
>
> --
> Ivan Latysh
> IvanLatysh@gmail.com
>



-- 
Paco Avila
GIT Consultors
tel: +34 971 498310
fax: +34 971496189
e-mail: pavila@git.es
http://www.git.es

Re: Performance of export/migration/import

Posted by Ivan Latysh <iv...@gmail.com>.
MARTINEZ Antonio wrote:

JCRBackup can do all of it (http://sourceforge.net/projects/jcr-backup).

> It is that time in our project where we need to support export/import
> with data migration.
> In our case, during migration:
> - We actually need to change some parameters for most of the nodes
> before we import.
Data is exported as XML, and binaries are exported in a native format into separate files,
so you can post-process the data at any time.

> - We definitely have to have Jackrabbit running during the export (and
> preferably also during the import)
JCRBackup will do it on a live repo, no problem.

> - We need to be able to export/migrate/import either a subtree or the
> entire repository.
You can specify a node to start a backup from.

> - We need to do an export/migration/import in under 3 hours for a 20 GB
> repository (the index in this case is about 8 GB)
JCRBackup will export the data, not the indexes; the indexes will be re-created by JCR on import.

One thing that JCR-Backup can't do is versions.

-- 
Ivan Latysh
IvanLatysh@gmail.com
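
To make the "post-process the exported XML" idea concrete, here is a minimal sketch of rewriting one property across an export before re-importing it. It rests on assumptions the thread does not confirm: that the export is in the JCR system view XML format (the `sv:` namespace), and that the value to change is a plain string property. The property name `owner`, the sample document, and the function name `rewrite_property` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the JCR system view XML format. That the export uses this
# format is an assumption; check what the export tool actually writes.
SV_NS = "http://www.jcp.org/jcr/sv/1.0"
ET.register_namespace("sv", SV_NS)

# Hypothetical sample export with one node and two properties.
SAMPLE = """<sv:node xmlns:sv="http://www.jcp.org/jcr/sv/1.0" sv:name="doc1">
  <sv:property sv:name="jcr:primaryType" sv:type="Name">
    <sv:value>nt:unstructured</sv:value>
  </sv:property>
  <sv:property sv:name="owner" sv:type="String">
    <sv:value>old-team</sv:value>
  </sv:property>
</sv:node>"""


def rewrite_property(xml_text, prop_name, transform):
    """Apply transform() to every value of the named property; return new XML."""
    root = ET.fromstring(xml_text)
    for prop in root.iter(f"{{{SV_NS}}}property"):
        if prop.get(f"{{{SV_NS}}}name") == prop_name:
            for value in prop.findall(f"{{{SV_NS}}}value"):
                value.text = transform(value.text or "")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(rewrite_property(SAMPLE, "owner", lambda v: v.replace("old-", "new-")))
```

One caveat with this approach: ElementTree only writes out namespace declarations that are used in element or attribute names, so prefix declarations that appear only inside values (for example a custom prefix in a property name) may be dropped and would need to be preserved separately. For exports much larger than memory, a streaming parser would be a better fit than loading the whole document.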