Posted to users@jackrabbit.apache.org by sam lee <sk...@gmail.com> on 2010/09/29 03:14:50 UTC

Batch Import from mysql?

Hey,

I need to migrate data stored in MySQL to Jackrabbit.
I'm using the JCR API (Node.addNode(), etc.), and it takes many hours to do
so.

Is there a way to speed things up?

I was wondering if there is a way to directly output JCR files and copy
those into the Jackrabbit repository, instead of using the API.

Thanks.
Sam

Re: Batch Import from mysql?

Posted by sam lee <sk...@gmail.com>.
I initially thought it would be a matter of mysqldump to XML.
But the node structure I need in JCR is radically different from the MySQL
tables, so I think I need to emit the XML myself, and doing that would be
very similar to manually creating nodes.

Is there a way to create document/system view XML without first writing the
nodes? I'm reading the API, but the only way to write document/system view
XML seems to be via the org.apache.jackrabbit.commons.xml package. And if I
have already written the nodes, I have already done the import :P
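One way to do exactly that, without touching the repository at all: the system view is just three XML elements, so the export file can be written straight from the MySQL rows with a plain StAX writer. A minimal sketch, where the node name ("page"), the "title" property, and its value are hypothetical placeholders for whatever the MySQL tables actually hold:

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

// Sketch: emit JCR system-view XML for one node directly from row data,
// without creating any nodes in a repository first. The node name and
// the "title" property are hypothetical placeholders.
public class SystemViewSketch {
    static final String SV = "http://www.jcp.org/jcr/sv/1.0";

    public static String build() throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartDocument("UTF-8", "1.0");
        w.writeStartElement("sv", "node", SV);
        w.writeNamespace("sv", SV);
        w.writeAttribute("sv", SV, "name", "page");
        // jcr:primaryType is serialized as the first property
        property(w, "jcr:primaryType", "Name", "nt:unstructured");
        property(w, "title", "String", "Hello from MySQL");
        w.writeEndElement();
        w.writeEndDocument();
        w.close();
        return out.toString();
    }

    // One sv:property element with a single sv:value child.
    static void property(XMLStreamWriter w, String name, String type,
                         String value) throws Exception {
        w.writeStartElement("sv", "property", SV);
        w.writeAttribute("sv", SV, "name", name);
        w.writeAttribute("sv", SV, "type", type);
        w.writeStartElement("sv", "value", SV);
        w.writeCharacters(value);
        w.writeEndElement();
        w.writeEndElement();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(build());
    }
}
```

Streaming like this keeps memory flat no matter how many rows are exported, and the resulting file can then be imported in one shot.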



On Wed, Sep 29, 2010 at 11:35 AM, Justin Edelson <ju...@gmail.com> wrote:

> I think it depends upon how you do it.
>
> Using the import methods on the Workspace is workspace write, so it
> shouldn't consume much memory. The import methods on the Session could
> consume a significant amount of heap, depending upon the size of your
> import. However, Session.importXML / Session.getImportContentHandler
> should not consume significantly more memory than you would consume by
> creating those same nodes manually.
>
> But yes, node structure is important.
>
> Justin
>
> On 9/29/10 10:10 AM, Rakesh Vidyadharan wrote:
> > In my experience importing from the system view XML requires huge amounts
> > of memory.  If the process requires hours to import from MySQL, I would
> > think that heap space constraints would limit the import from XML option.
> >
> > To the OP, make sure that your node structure is partitioned.  If not,
> > write performance will continue to degrade leading to very poor import
> > performance.
> >
> > Rakesh
> >
> > On 28 Sep 2010, at 20:18, Justin Edelson wrote:
> >
> >> You can generate a system view XML file from your MySQL data and then
> >> import that in one shot.
> >>
> >> Justin
> >>
> >> On 9/28/10 9:14 PM, sam lee wrote:
> >>> Hey,
> >>>
> >>> I need to migrate data stored in mysql to jackrabbit.
> >>> I'm using JCR API  (Node.addNode().. etc). And it takes many hours to do
> >>> so.
> >>>
> >>> Is there a way to speed things up?
> >>>
> >>> I was wondering if there is a way to directly output JCR files and copy
> >>> those to jackrabbit repo, instead of using API.
> >>>
> >>> Thanks.
> >>> Sam
> >>>
> >>
> >
> > Rakesh Vidyadharan
> > President & CEO
> > Sans Pareil Technologies, Inc.
> > http://www.sptci.com/
> >
> >
> > | 100 W. Chestnut, Suite 1305 | Chicago, IL 60610-3296 USA |
> > | Ph: +1 (312) 212-3933 | Mobile: +1 (312) 315-1596 | Fax: +1 (312) 276-4410 | E-mail: rakesh@sptci.com
> >
> >
> >
>
>

Re: Batch Import from mysql?

Posted by Rakesh Vidyadharan <ra...@sptci.com>.
Using the Workspace importXML method, I need about 2.5 GB of heap to restore from a 140 MB system export file (it runs in a few minutes).  Importing the same data directly from MS SQL Server takes about 30 minutes but runs comfortably with a 512 MB heap (it may run with less, but I have not tested).  Note that I am performing a session.save() literally after each node is created (about 25K in all), so the import time could probably be cut down quite significantly by doing batched saves.

Rakesh
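The batched-save variant Rakesh alludes to can be sketched as follows. JcrSession and JcrNode are deliberately minimal stand-ins for javax.jcr.Session and javax.jcr.Node so the example is self-contained, and the batch size of 500 is an arbitrary illustrative choice, not a recommendation from this thread:

```java
import java.util.List;

// Sketch of batched saves: create many nodes but call save() once per
// batch instead of once per node. The interfaces are stand-ins for the
// real JCR types; names and batch size are illustrative.
public class BatchedImport {
    interface JcrNode {
        JcrNode addNode(String name);
        void setProperty(String key, String value);
    }
    interface JcrSession {
        JcrNode getRootNode();
        void save();
    }

    static final int BATCH_SIZE = 500;

    // Create one node per row, saving once per batch instead of once per node.
    public static void importRows(JcrSession session, List<String[]> rows) {
        JcrNode root = session.getRootNode().addNode("imported");
        int pending = 0;
        for (String[] row : rows) {          // e.g. one row per MySQL record
            JcrNode n = root.addNode(row[0]);
            n.setProperty("title", row[1]);
            if (++pending >= BATCH_SIZE) {
                session.save();              // one repository write per batch
                pending = 0;
            }
        }
        session.save();                      // flush the final partial batch
    }
}
```

For 25K nodes this turns 25,000 saves into about 50, which is where most of the 30-minute wall-clock time is likely going.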

On 29 Sep 2010, at 10:35, Justin Edelson wrote:

> I think it depends upon how you do it.
> 
> Using the import methods on the Workspace is workspace write, so it
> shouldn't consume much memory. The import methods on the Session could
> consume a significant amount of heap, depending upon the size of your
> import. However, Session.importXML / Session.getImportContentHandler
> should not consume significantly more memory than you would consume by
> creating those same nodes manually.
> 
> But yes, node structure is important.
> 
> Justin
> 
> On 9/29/10 10:10 AM, Rakesh Vidyadharan wrote:
>> In my experience importing from the system view XML requires huge amounts of memory.  If the process requires hours to import from MySQL, I would think that heap space constraints would limit the import from XML option.
>> 
>> To the OP, make sure that your node structure is partitioned.  If not, write performance will continue to degrade leading to very poor import performance.
>> 
>> Rakesh
>> 
>> On 28 Sep 2010, at 20:18, Justin Edelson wrote:
>> 
>>> You can generate a system view XML file from your MySQL data and then
>>> import that in one shot.
>>> 
>>> Justin
>>> 
>>> On 9/28/10 9:14 PM, sam lee wrote:
>>>> Hey,
>>>> 
>>>> I need to migrate data stored in mysql to jackrabbit.
>>>> I'm using JCR API  (Node.addNode().. etc). And it takes many  hours to do
>>>> so.
>>>> 
>>>> Is there a way to speed things up?
>>>> 
>>>> I was wondering if there is a way to directly output JCR files and copy
>>>> those to jackrabbit repo, instead of using API.
>>>> 
>>>> Thanks.
>>>> Sam
>>>> 
>>> 
>> 
> 




Re: Batch Import from mysql?

Posted by Justin Edelson <ju...@gmail.com>.
I think it depends upon how you do it.

Using the import methods on the Workspace is workspace write, so it
shouldn't consume much memory. The import methods on the Session could
consume a significant amount of heap, depending upon the size of your
import. However, Session.importXML / Session.getImportContentHandler
should not consume significantly more memory than you would consume by
creating those same nodes manually.

But yes, node structure is important.

Justin
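A toy model of the difference Justin describes, purely illustrative and not the Jackrabbit implementation: the Session-style import accumulates transient items until save(), so its peak footprint tracks the import size, while the Workspace-style import writes each item through as it arrives:

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of why a Workspace import stays flat on memory while a
// Session import can grow the heap with the size of the import.
public class ImportHeapModel {
    final List<String> persisted = new ArrayList<>();
    final List<String> transientItems = new ArrayList<>();

    // Session-style import: everything pends in memory until save().
    int sessionImport(List<String> items) {
        transientItems.addAll(items);
        int peak = transientItems.size();   // peak transient footprint
        persisted.addAll(transientItems);   // save()
        transientItems.clear();
        return peak;
    }

    // Workspace-style import: each item is written through immediately,
    // so the transient footprint never exceeds one item.
    int workspaceImport(List<String> items) {
        int peak = 0;
        for (String item : items) {
            transientItems.add(item);
            peak = Math.max(peak, transientItems.size());
            persisted.add(item);            // written as it streams in
            transientItems.clear();
        }
        return peak;
    }
}
```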

On 9/29/10 10:10 AM, Rakesh Vidyadharan wrote:
> In my experience importing from the system view XML requires huge amounts of memory.  If the process requires hours to import from MySQL, I would think that heap space constraints would limit the import from XML option.
> 
> To the OP, make sure that your node structure is partitioned.  If not, write performance will continue to degrade leading to very poor import performance.
> 
> Rakesh
> 
> On 28 Sep 2010, at 20:18, Justin Edelson wrote:
> 
>> You can generate a system view XML file from your MySQL data and then
>> import that in one shot.
>>
>> Justin
>>
>> On 9/28/10 9:14 PM, sam lee wrote:
>>> Hey,
>>>
>>> I need to migrate data stored in mysql to jackrabbit.
>>> I'm using JCR API  (Node.addNode().. etc). And it takes many  hours to do
>>> so.
>>>
>>> Is there a way to speed things up?
>>>
>>> I was wondering if there is a way to directly output JCR files and copy
>>> those to jackrabbit repo, instead of using API.
>>>
>>> Thanks.
>>> Sam
>>>
>>
> 
> 


Re: Batch Import from mysql?

Posted by sam lee <sk...@gmail.com>.
Ah, I see.
And I think it's too much work to output the XML I want from MySQL.
I'll look into partitioning.

Thank you.
Sam

On Wed, Sep 29, 2010 at 10:10 AM, Rakesh Vidyadharan <ra...@sptci.com> wrote:

> In my experience importing from the system view XML requires huge amounts
> of memory.  If the process requires hours to import from MySQL, I would
> think that heap space constraints would limit the import from XML option.
>
> To the OP, make sure that your node structure is partitioned.  If not,
> write performance will continue to degrade leading to very poor import
> performance.
>
> Rakesh
>
> On 28 Sep 2010, at 20:18, Justin Edelson wrote:
>
> > You can generate a system view XML file from your MySQL data and then
> > import that in one shot.
> >
> > Justin
> >
> > On 9/28/10 9:14 PM, sam lee wrote:
> >> Hey,
> >>
> >> I need to migrate data stored in mysql to jackrabbit.
> >> I'm using JCR API  (Node.addNode().. etc). And it takes many hours to do
> >> so.
> >>
> >> Is there a way to speed things up?
> >>
> >> I was wondering if there is a way to directly output JCR files and copy
> >> those to jackrabbit repo, instead of using API.
> >>
> >> Thanks.
> >> Sam
> >>
> >
>
>

Re: Batch Import from mysql?

Posted by Rakesh Vidyadharan <ra...@sptci.com>.
In my experience, importing from the system view XML requires huge amounts of memory.  If the process requires hours to import from MySQL, I would think that heap space constraints would limit the import-from-XML option.

To the OP, make sure that your node structure is partitioned, i.e. avoid flat hierarchies where one parent accumulates thousands of child nodes.  If not, write performance will continue to degrade, leading to very poor import performance.

Rakesh
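One common way to keep a structure partitioned, sketched here as an assumption rather than anything from this thread, is to bucket nodes under short hash-derived intermediate levels so that no single parent accumulates thousands of child nodes:

```java
// Sketch: spread nodes across two levels of buckets derived from a hash
// of the node name, e.g. "customer-1" -> "/imported/xx/yy/customer-1".
// The bucket scheme and depth are illustrative choices.
public class PathPartitioner {
    public static String partitionedPath(String root, String name) {
        int h = name.hashCode();
        String level1 = String.format("%02x", (h >>> 8) & 0xff);
        String level2 = String.format("%02x", h & 0xff);
        return root + "/" + level1 + "/" + level2 + "/" + name;
    }
}
```

Two hex levels give 65,536 buckets, so even a few hundred thousand nodes end up with only a handful of siblings per parent.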

On 28 Sep 2010, at 20:18, Justin Edelson wrote:

> You can generate a system view XML file from your MySQL data and then
> import that in one shot.
> 
> Justin
> 
> On 9/28/10 9:14 PM, sam lee wrote:
>> Hey,
>> 
>> I need to migrate data stored in mysql to jackrabbit.
>> I'm using JCR API  (Node.addNode().. etc). And it takes many  hours to do
>> so.
>> 
>> Is there a way to speed things up?
>> 
>> I was wondering if there is a way to directly output JCR files and copy
>> those to jackrabbit repo, instead of using API.
>> 
>> Thanks.
>> Sam
>> 
> 




Re: Batch Import from mysql?

Posted by Justin Edelson <ju...@gmail.com>.
It is defined in Section 7 of JSR 283:
http://www.day.com/specs/jcr/2.0/7_Export.html
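For a concrete picture of what Section 7 defines: a system-view document is built from three elements in the http://www.jcp.org/jcr/sv/1.0 namespace, sv:node, sv:property, and sv:value, with jcr:primaryType serialized as the first property. A minimal hand-written example (the node and property names are illustrative):

```xml
<sv:node xmlns:sv="http://www.jcp.org/jcr/sv/1.0" sv:name="example">
  <sv:property sv:name="jcr:primaryType" sv:type="Name">
    <sv:value>nt:unstructured</sv:value>
  </sv:property>
  <sv:property sv:name="title" sv:type="String">
    <sv:value>Hello</sv:value>
  </sv:property>
</sv:node>
```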





On 9/29/10 9:19 AM, sam lee wrote:
> Hey,
> 
> What is "system view XML" ?
> My Google hadouken failed.
> 
> thanks.
> sam
> 
> On Tue, Sep 28, 2010 at 9:18 PM, Justin Edelson <justinedelson@gmail.com> wrote:
> 
>     You can generate a system view XML file from your MySQL data and then
>     import that in one shot.
> 
>     Justin
> 
>     On 9/28/10 9:14 PM, sam lee wrote:
>     > Hey,
>     >
>     > I need to migrate data stored in mysql to jackrabbit.
>     > I'm using JCR API  (Node.addNode().. etc). And it takes many hours to
>     > do so.
>     >
>     > Is there a way to speed things up?
>     >
>     > I was wondering if there is a way to directly output JCR files and copy
>     > those to jackrabbit repo, instead of using API.
>     >
>     > Thanks.
>     > Sam
>     >
> 
> 


Re: Batch Import from mysql?

Posted by sam lee <sk...@gmail.com>.
Hey,

What is "system view XML"?
My Google hadouken failed.

thanks.
sam

On Tue, Sep 28, 2010 at 9:18 PM, Justin Edelson <ju...@gmail.com> wrote:

> You can generate a system view XML file from your MySQL data and then
> import that in one shot.
>
> Justin
>
> On 9/28/10 9:14 PM, sam lee wrote:
> > Hey,
> >
> > I need to migrate data stored in mysql to jackrabbit.
> > I'm using JCR API  (Node.addNode().. etc). And it takes many  hours to do
> > so.
> >
> > Is there a way to speed things up?
> >
> > I was wondering if there is a way to directly output JCR files and copy
> > those to jackrabbit repo, instead of using API.
> >
> > Thanks.
> > Sam
> >
>
>

Re: Batch Import from mysql?

Posted by Justin Edelson <ju...@gmail.com>.
You can generate a system view XML file from your MySQL data and then
import that in one shot.

Justin

On 9/28/10 9:14 PM, sam lee wrote:
> Hey,
> 
> I need to migrate data stored in mysql to jackrabbit.
> I'm using JCR API  (Node.addNode().. etc). And it takes many  hours to do
> so.
> 
> Is there a way to speed things up?
> 
> I was wondering if there is a way to directly output JCR files and copy
> those to jackrabbit repo, instead of using API.
> 
> Thanks.
> Sam
>