Posted to users@jackrabbit.apache.org by Stefan Kurla <st...@gmail.com> on 2007/04/24 18:27:23 UTC

importing jackrabbit into jackrabbit

I am trying to import my jackrabbit svn directory into jackrabbit.
This dir has a few extra files like the jackrabbit 1.0 release.
Overall we are talking about 713MB on disk with 103K files and 48K
folders.

I use mysql for persistence and the only thing that gets saved in the
filesystem are the indexes. I do a session.save() after importing each
file and then I check it in.
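
For reference, the shape of the per-file import is roughly this (a
minimal sketch, not my actual code; it assumes a plain nt:file structure
made mix:versionable so that it can be checked in):

import java.io.InputStream;
import java.util.Calendar;
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

Node importFile(Session session, Node folder, String name,
                InputStream data, String mimeType)
        throws RepositoryException {
    Node file = folder.addNode(name, "nt:file");
    file.addMixin("mix:versionable");       // required for checkin()
    Node content = file.addNode("jcr:content", "nt:resource");
    content.setProperty("jcr:mimeType", mimeType);
    content.setProperty("jcr:data", data);
    content.setProperty("jcr:lastModified", Calendar.getInstance());
    session.save();                         // save after importing each file
    file.checkin();                         // then check it in
    return file;
}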

However, when I import all these files into MySQL 5.0 (default
everything), I get a "failed to write node references" error. On the
server side, the stack trace is:

com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for
column 'REFS_DATA' at row 1
	at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2868)

On the client (RMI) side, the error is:
javax.jcr.RepositoryException: /: unable to update item.: failed to
write node references: d5f7e01d-1d68-470e-ba68-02b503754b68
	at org.apache.jackrabbit.rmi.server.ServerObject.getRepositoryException(ServerObject.java:136)


This happens when the totalFiles imported is 184 and the
totalImportSize=437,643 bytes (437KB) and the totalDirs imported are
240.

Something is not right... Jackrabbit cannot croak at 437KB, can it?

Please advise.

Re: importing jackrabbit into jackrabbit

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/27/07, Stefan Kurla <st...@gmail.com> wrote:
> It helps, and yes, this does rub me the wrong way: not because I believe
> in "structure first" but because all we are doing here is working around
> the issue rather than addressing it. Frankly, I am surprised that this
> issue has not percolated up the chain of priority items to fix.

My experience with converting relational schemas (mostly in content
and document management domains) to JCR content models has been that a
large majority of "foreign key" references are best handled by
hierarchy structures, and that the remaining cases can be handled
either with JCR references or things like mixin type markers.
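
For example, a type marker can replace a reference property entirely (a
sketch; my:wordDocument stands for a mixin node type you would register
yourself):

// Tag the node; unlike a REFERENCE property, no back-reference
// set has to be maintained anywhere.
fileNode.addMixin("my:wordDocument");
fileNode.getSession().save();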

I believe that the number of cases where the current Jackrabbit limit
on the reference count is a bottleneck is quite low, so this hasn't
been a high priority for now. But the bottleneck still exists and you
are right in raising the issue. Perhaps we can see some renewed
activity on JCR-657.

BR,

Jukka Zitting

Re: importing jackrabbit into jackrabbit

Posted by Stefan Kurla <st...@gmail.com>.
Hi David,

Thank you for your post and explanation. Very nicely put.

It helps, and yes, this does rub me the wrong way: not because I believe
in "structure first" but because all we are doing here is working around
the issue rather than addressing it. Frankly, I am surprised that this
issue has not percolated up the chain of priority items to fix.

> > I think that the main problem is not really about the specific case,
> > but in general that when people design relational databases, they
> > always use references (or more properly, joins) to define data that
> > belongs logically to many entities, but should not be duplicated.
> I completely agree with your statement.
> And I think this is one of the biggest challenges that we are
> going to face.
> People are thinking within the facilities provided by a relational
> database and within the data modeling practices that they have
> been using for decades now. Which is very understandable.
> A content repository offers much richer facilities for content modelling
> primarily through features like a hierarchy, multi-value properties or
> even features like sorted children which in an RDBMS world have
> to be modeled by the application developer.
>
> > Imagine that you have a company tree, with "positions",
> > "departments", "employees", "health plans" etc.
> > An employee could belong to a department, have a position and a
> > health plan, but typically you would not make all those nodes child
> > nodes of the employee: you would instead define references to the
> > proper node in the "position" and "health plan" subtrees.
> I think a one-to-many relationship should be modeled as a hierarchy.
> So my initial gut feeling would be a datamodel like this:
> /bigco
> /bigco/marketingdept
> /bigco/marketingdept/joeshmoe
>
> and "joeshmoe" would be of nodetype
>
> [bigco:employee]
> - position
> - healthplan
>
> Now "position", "healthplan" are many-to-many relationships.
> I think that those can either be modeled as references, paths,
> names or strings.
> People that come from a "hard structured" RDBMS background
> very often think that a reference is the only option.
>
> For example "position" might very well be a "string" or a "name"
> if the application can deal with the fact that information is "dangling".
>
> If we continue to model the above tree with...
>
> /bigco/positions/
> /bigco/positions/secretary
> /bigco/positions/svp
>
> ... I think I would personally choose to store a "string"-property that is
> human readable and is actually the name of the target node in
> /bigco/positions.
> So I would store "svp" or "secretary" in the position property.
>
> Since I would not use namespaces for the names of the children
> in "positions" I would not need the overhead of true name property in
> my employee node.

This is a good workaround, and an excellent example.

>
> While this probably rubs a lot of "structure first" people the wrong
> way, I prefer this model, since the information carried in the
> string "secretary" is still valuable even if it is "dangling"
> (...as opposed to some UUID).
>
> I think it is important to understand that there certainly are use cases
> where referential integrity is very important, but also that it comes at
> a price: both in performance and, even more importantly, in how it
> constrains the flexibility of your applications from a "data-first"
> perspective.
>
> > What could be the right way to model things? Maybe using a "path"
> > property to point to the node instead? Of course, it would not be as
> > easy to use as a reference, and it would require global updates
> > if the pointed node ever changes position, but I can't see other options.
> If you would like to protect against "move"-operations but want to avoid
> the overhead of referential integrity, you can store the UUID of the target
> in a string property. In JSR-283 we are looking at a "weak-reference" to
> express a reference that can dangle in a more formal way.
>
> > It's easy to see how, in a large company, there could be thousands of
> > employees holding the same position and health plan, and those
> > specific nodes ("Secretary" and "Plan A") would have thousands of
> > references pointing to them.
> > So, given the issue  as explained by Marcel that "whenever a
> > reference is added that points to a node N the complete set of
> > references pointing to N is re-written to the persistence manager",
> > it seems that using references to a node that is very "popular" is
> > really going to create problems in the long term.
> Agreed. And I think we will not be able to re-educate everybody with
> an RDBMS background before they use Jackrabbit, so I think Jackrabbit has
> to be able to deal with very large quantities of references in a very
> efficient way.
> So I would recommend fixing that, as noted by Tom in the last sentence of:
> http://issues.apache.org/jira/browse/JCR-657


"re-educate"- I do not think that this has anything to do with RDBMS,
this is basic filing and bookkeeping procedures here. Say you are the
CEO of the company and your reference is in multiple contracts and
legal proceedings that the company is involved in. Would you not want
to keep a reference to the master file in the contracts that are being
filed as a manner of upkeeping.

+1 on fixing this problem.

Re: importing jackrabbit into jackrabbit

Posted by Alessandro Bologna <al...@gmail.com>.
I think that David did say that there are use cases where referential
integrity is important. I took his comment about "re-educating" in
this context, and honestly, I tend to agree that in some cases DB folks
use super-normalized schemas when it may not have been needed.
Sometimes I feel that, left on their own, they would get rid of
integers, because you can always replace them with references to a
table of ints (from 1 to 1e10, I guess).

Just half kidding.

I do agree that the best thing right now is to "fix" the behavior in
Jackrabbit, leveraging delta changes, and at the same time maybe
create a nice entry on the wiki about when to use and when not to use
references.

Alessandro


On 4/27/07, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On 4/27/07, Tako Schotanus <qu...@gmail.com> wrote:
> > So yes, I agree that Jackrabbit should try to make this use-case perform
> > well, not because of uneducated RDBMS programmers, but because
> > sometimes a lot of references are necessary and no amount of redesigning
> > your data structures will help.
>
> +1
>
> BR,
>
> Jukka Zitting
>

Re: importing jackrabbit into jackrabbit

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/27/07, Tako Schotanus <qu...@gmail.com> wrote:
> So yes, I agree that Jackrabbit should try to make this use-case perform
> well, not because of uneducated RDBMS programmers, but because
> sometimes a lot of references are necessary and no amount of redesigning
> your data structures will help.

+1

BR,

Jukka Zitting

Re: importing jackrabbit into jackrabbit

Posted by Tako Schotanus <qu...@gmail.com>.
On 4/27/07, David Nuescheler <da...@gmail.com> wrote:
>
>
>
> > It's easy to see how, in a large company, there could be thousands of
> > employees holding the same position and health plan, and those
> > specific nodes ("Secretary" and "Plan A") would have thousands of
> > references pointing to them.
> > So, given the issue  as explained by Marcel that "whenever a
> > reference is added that points to a node N the complete set of
> > references pointing to N is re-written to the persistence manager",
> > it seems that using references to a node that is very "popular" is
> > really going to create problems in the long term.
> Agreed. And I think we will not be able to re-educate everybody with
> an RDBMS background before they use Jackrabbit, so I think Jackrabbit has
> to be able to deal with very large quantities of references in a very
> efficient way.


I think "re-educate" is not the proper word to use here, there are very
valid reasons for wanting referential integrity. Making a system that
handles the health and pension plans for thousands of employees in a large
organization might require exactly that level of data-security. Saying you
can do the same in (newly made, largely untried) code as a (well-known,
stable, proven) RDMS might not convince everyone, especially when the data
is very important.

So yes, I agree that Jackrabbit should try to make this use-case perform
well, not because of uneducated RDBMS programmers, but because sometimes a
lot of references are necessary and no amount of redesigning your data
structures will help.

Cheers,
 -Tako

Re: importing jackrabbit into jackrabbit

Posted by David Nuescheler <da...@gmail.com>.
Hi Alessandro,

thanks a lot for your thoughtful mail.
I think you hit the nail right on the head.

> I think that the main problem is not really about the specific case,
> but in general that when people design relational databases, they
> always use references (or more properly, joins) to define data that
> belongs logically to many entities, but should not be duplicated.
I completely agree with your statement.
And I think this is one of the biggest challenges that we are
going to face.
People are thinking within the facilities provided by a relational
database and within the data modeling practices that they have
been using for decades now. Which is very understandable.
A content repository offers much richer facilities for content modelling
primarily through features like a hierarchy, multi-value properties or
even features like sorted children which in an RDBMS world have
to be modeled by the application developer.

> Imagine that you have a company tree, with "positions",
> "departments", "employees", "health plans" etc.
> An employee could belong to a department, have a position and a
> health plan, but typically you would not make all those nodes child
> nodes of the employee: you would instead define references to the
> proper node in the "position" and "health plan" subtrees.
I think a one-to-many relationship should be modeled as a hierarchy.
So my initial gut feeling would be a datamodel like this:
/bigco
/bigco/marketingdept
/bigco/marketingdept/joeshmoe

and "joeshmoe" would be of nodetype

[bigco:employee]
- position
- healthplan

Now "position", "healthplan" are many-to-many relationships.
I think that those can either be modeled as references, paths,
names or strings.
People that come from a "hard structured" RDBMS background
very often think that a reference is the only option.

For example "position" might very well be a "string" or a "name"
if the application can deal with the fact that information is "dangling".

If we continue to model the above tree with...

/bigco/positions/
/bigco/positions/secretary
/bigco/positions/svp

... I think I would personally choose to store a "string"-property that is
human readable and is actually the name of the target node in
/bigco/positions.
So I would store "svp" or "secretary" in the position property.

Since I would not use namespaces for the names of the children
in "positions" I would not need the overhead of true name property in
my employee node.
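
In code, the model above might be built like this (a minimal sketch;
it assumes the bigco:employee node type from the snippet above has been
registered and that "session" is an open javax.jcr.Session):

Node bigco = session.getRootNode().addNode("bigco");
Node positions = bigco.addNode("positions");
positions.addNode("secretary");
positions.addNode("svp");

Node dept = bigco.addNode("marketingdept");
Node joe = dept.addNode("joeshmoe", "bigco:employee");
joe.setProperty("position", "svp");       // plain string, allowed to dangle
joe.setProperty("healthplan", "plana");   // likewise just a string
session.save();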

While this probably rubs a lot of "structure first" people the wrong
way, I prefer this model, since the information carried in the
string "secretary" is still valuable even if it is "dangling"
(...as opposed to some UUID).

I think it is important to understand that there certainly are use cases
where referential integrity is very important, but also that it comes at
a price: both in performance and, even more importantly, in how it
constrains the flexibility of your applications from a "data-first"
perspective.

> What could be the right way to model things? Maybe using a "path"
> property to point to the node instead? Of course, it would not be as
> easy to use as a reference, and it would require global updates
> if the pointed node ever changes position, but I can't see other options.
If you would like to protect against "move"-operations but want to avoid
the overhead of referential integrity, you can store the UUID of the target
in a string property. In JSR-283 we are looking at a "weak-reference" to
express a reference that can dangle in a more formal way.
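
A sketch of that workaround (the property name "positionRef" is made up,
and the target node must be mix:referenceable for getUUID() to work):

// Store: keep the UUID as a plain string, so no back-reference
// set is maintained for the target.
Node target = session.getRootNode().getNode("bigco/positions/svp");
employee.setProperty("positionRef", target.getUUID());
session.save();

// Resolve: tolerate a dangling value.
try {
    Node resolved = session.getNodeByUUID(
            employee.getProperty("positionRef").getString());
} catch (javax.jcr.ItemNotFoundException e) {
    // the target is gone; the "weak reference" dangles
}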

> It's easy to see how, in a large company, there could be thousands of
> employee holding the same position and health plan, and those
> specific nodes ("Secretary" and "Plan A")  would have thousand of
> references pointing to them.
> So, given the issue  as explained by Marcel that "whenever a
> reference is added that points to a node N the complete set of
> references pointing to N is re-written to the persistence manager",
> it seems that using references to a node that is very "popular" is
> really going to create problems in the long term.
Agreed. And I think we will not be able to re-educate everybody with
an RDBMS background before they use Jackrabbit, so I think Jackrabbit has
to be able to deal with very large quantities of references in a very
efficient way.
So I would recommend fixing that, as noted by Tom in the last sentence of:
http://issues.apache.org/jira/browse/JCR-657

regards,
david

Re: importing jackrabbit into jackrabbit

Posted by Alessandro Bologna <al...@gmail.com>.
I think that the main problem is not really about the specific case,  
but in general that when people design relational databases, they  
always use references (or more properly, joins) to define data that  
belongs logically to many entities, but should not be duplicated.

Imagine that you have a company tree, with "positions",  
"departments", "employees", "health plans" etc.
An employee could belong to a department, have a position and a
health plan, but typically you would not make all those nodes child
nodes of the employee: you would instead define references to the  
proper node in the "position" and "health plan" subtrees.
It's easy to see how, in a large company, there could be thousands of  
employees holding the same position and health plan, and those
specific nodes ("Secretary" and "Plan A") would have thousands of
references pointing to them.
So, given the issue  as explained by Marcel that "whenever a  
reference is added that points to a node N the complete set of  
references pointing to N is re-written to the persistence manager",  
it seems that using references to a node that is very "popular" is  
really going to create problems in the long term.

What could be the right way to model things? Maybe using a "path"  
property to point to the node instead? Of course, it would not be as  
easy to use as a reference, and it would require global updates
if the pointed node ever changes position, but I can't see other options.
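
For what it's worth, the "path" property variant might look like this (a
sketch; "positionPath" is an invented property name):

// Store the target's path as a plain string.
employee.setProperty("positionPath", position.getPath());
session.save();

// Resolve it later, tolerating moved or deleted targets.
try {
    Node resolved = (Node) session.getItem(
            employee.getProperty("positionPath").getString());
} catch (javax.jcr.PathNotFoundException e) {
    // the target was moved or removed; this is where the global
    // update problem shows up
}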

Any suggestions?

Alessandro Bologna


On Apr 26, 2007, at 2:38 PM, Jukka Zitting wrote:

> Hi,
>
> On 4/26/07, Stefan Kurla <st...@gmail.com> wrote:
>> I would appreciate the thoughts on references though. Reason being
>> that one of the biggest strengths of JSR-170 is the ability to store
>> references. I imagine a situation where I could have a nodetype called
>> docType, which is either the "pdf" or "word" string. Say 80% of my
>> documents are Word documents. Then the docType will have a reference
>> to 80% of all documents in my repository. If my repository is 100,000
>> files, then docType references 80,000 nodes.
>>
>> If what you say is correct, that at every new reference the complete
>> set of references is rewritten, then obviously this is a bottleneck.
>>
>> Should such a situation be avoided?
>
> Why would you need such a reference structure? I would rather
> use the node types to model such information. A search query like
> //element(*,my:wordDocument) will efficiently return you all such Word
> documents in your workspace.
>
> BR,
>
> Jukka Zitting


Re: importing jackrabbit into jackrabbit

Posted by Stefan Kurla <st...@gmail.com>.
Hi,

On 4/26/07, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On 4/26/07, Stefan Kurla <st...@gmail.com> wrote:
> > I would appreciate the thoughts on references though. Reason being
> > that one of the biggest strengths of JSR-170 is the ability to store
> > references. I imagine a situation where I could have a nodetype called
> > docType, which is either the "pdf" or "word" string. Say 80% of my
> > documents are Word documents. Then the docType will have a reference
> > to 80% of all documents in my repository. If my repository is 100,000
> > files, then docType references 80,000 nodes.
> >
> > If what you say is correct, that at every new reference the complete
> > set of references is rewritten, then obviously this is a bottleneck.
> >
> > Should such a situation be avoided?
>
> Why would you need such a reference structure? I would rather
> use the node types to model such information. A search query like
> //element(*,my:wordDocument) will efficiently return you all such Word
> documents in your workspace.
>
Maybe not in that specific case, but what about the situation where you
need to control access to all the my:document nodes? Say a superuser
needs to be granted access to all documents in the repository (1M
documents); then this instance of a my:user security property could
have 1M UUIDs in the reference structure.

Should this situation be avoided?

Re: importing jackrabbit into jackrabbit

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/26/07, Stefan Kurla <st...@gmail.com> wrote:
> I would appreciate the thoughts on references though. Reason being
> that one of the biggest strengths of JSR-170 is the ability to store
> references. I imagine a situation where I could have a nodetype called
> docType, which is either the "pdf" or "word" string. Say 80% of my
> documents are Word documents. Then the docType will have a reference
> to 80% of all documents in my repository. If my repository is 100,000
> files, then docType references 80,000 nodes.
>
> If what you say is correct, that at every new reference the complete
> set of references is rewritten, then obviously this is a bottleneck.
>
> Should such a situation be avoided?

Why would you need such a reference structure? I would rather
use the node types to model such information. A search query like
//element(*,my:wordDocument) will efficiently return you all such Word
documents in your workspace.
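
In code that would be something like this (a sketch using the
javax.jcr.query API, assuming my:wordDocument is a registered node type
or mixin and "session" is an open session):

QueryManager qm = session.getWorkspace().getQueryManager();
Query q = qm.createQuery("//element(*, my:wordDocument)", Query.XPATH);
NodeIterator it = q.execute().getNodes();
while (it.hasNext()) {
    Node doc = it.nextNode();   // each Word document in the workspace
}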

BR,

Jukka Zitting

Re: importing jackrabbit into jackrabbit

Posted by Stefan Kurla <st...@gmail.com>.
> Stefan Kurla wrote:
> > As far as the file nodetype is concerned, this is a custom nodetype
> > which has 4 references per file imported, and currently all the
> > references are made to the same UUID since we are testing; this could
> > change in the future.
>
> this may be the time-consuming factor. whenever a reference is added that points
> to a node N, the complete set of references pointing to N is re-written to the
> persistence manager. with an increasing number of references to N this will slow
> down your import. is there a reason why all files point to the same node?
>
Imagine that you are the admin node N and you have access to every
file in the system. That could be one reason why all the nodes could
have references to N. This is the case when you have a security
structure inside your workspace.

If this is the case, would it be wise to take the security out of the
workspace and store security information in a separate DB or workspace?


> > Any tips or ideas? I will update the results of the test. Right now I
> > have imported 1K out of 12K files and the import time has gone up to 4
> > seconds per file. Is this normal? Remember since I am importing the
> > jackrabbit SVN all files are put under one nt:folder which is
> > "jackrabbit". This is a pretty normal case of about 12K files and only
> > 78MB. We have plans of a 1TB repository.
>
> I did a quick test with an adapted version of
> http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/TextExtractorTest.java
> that saves changes whenever 100 files have been imported.
>
> I used the svn export of jackrabbit/trunk (~3000 files in ~900 folders)
>
> configuration:
> - jackrabbit in-process
> - o.a.j.c.persistence.db.DerbyPersistenceManager (externalBlobs = false)
> - text extractors: pdf, xml and plain text
>
> test result:
>
> Imported 2978 files in 50484 ms.

I ran the test case in a main class, accessing the repository over RMI
on localhost and connecting to MySQL, also running on localhost.
Test case size: 2226 files and 136MB; it took 397 seconds.

Will try this test case with my code and node type structure now. Will
keep this thread updated.

I would appreciate the thoughts on references though. Reason being
that one of the biggest strengths of JSR-170 is the ability to store
references. I imagine a situation where I could have a nodetype called
docType, which is either the "pdf" or "word" string. Say 80% of my
documents are Word documents. Then the docType will have a reference
to 80% of all documents in my repository. If my repository is 100,000
files, then docType references 80,000 nodes.

If what you say is correct, that at every new reference the complete
set of references is rewritten, then obviously this is a bottleneck.

Should such a situation be avoided?

Thanks.
>
> regards
>   marcel
>

Re: importing jackrabbit into jackrabbit

Posted by Marcel Reutegger <ma...@gmx.net>.
Stefan Kurla wrote:
> As far as the file nodetype is concerned, this is a custom nodetype
> which has 4 references per file imported, and currently all the
> references are made to the same UUID since we are testing; this could
> change in the future.

this may be the time-consuming factor. whenever a reference is added that points
to a node N, the complete set of references pointing to N is re-written to the
persistence manager. with an increasing number of references to N this will slow
down your import. is there a reason why all files point to the same node?
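
in code, the pattern looks like this (a sketch; the property name
"owner" and the use of nt:unstructured are illustrative only):

// all imported files reference the same target node N
Node n = session.getNodeByUUID(sharedUuid);
for (int i = 0; i < numFiles; i++) {
    Node file = folder.addNode("file-" + i, "nt:unstructured");
    file.setProperty("owner", n);  // setProperty(String, Node) creates a
                                   // REFERENCE property pointing to N
    session.save();  // re-writes N's complete back-reference set, so the
                     // cost of the i-th save grows with i
}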

> Any tips or ideas? I will update the results of the test. Right now I
> have imported 1K out of 12K files and the import time has gone up to 4
> seconds per file. Is this normal? Remember since I am importing the
> jackrabbit SVN all files are put under one nt:folder which is
> "jackrabbit". This is a pretty normal case of about 12K files and only
> 78MB. We have plans of a 1TB repository.

I did a quick test with an adapted version of 
http://svn.apache.org/repos/asf/jackrabbit/trunk/jackrabbit-core/src/test/java/org/apache/jackrabbit/core/query/TextExtractorTest.java
that saves changes whenever 100 files have been imported.

I used the svn export of jackrabbit/trunk (~3000 files in ~900 folders)

configuration:
- jackrabbit in-process
- o.a.j.c.persistence.db.DerbyPersistenceManager (externalBlobs = false)
- text extractors: pdf, xml and plain text

test result:

Imported 2978 files in 50484 ms.
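
the batching looks roughly like this (a sketch, not the actual test
code; addFileNode stands for a helper that creates the nt:file
structure for one file):

int count = 0;
for (File f : files) {
    addFileNode(folder, f);
    if (++count % 100 == 0) {
        session.save();        // save in batches of 100 files
    }
}
session.save();                // flush the remainder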

regards
  marcel

Re: importing jackrabbit into jackrabbit

Posted by Stefan Kurla <st...@gmail.com>.
Hi,

Thanks for the link. I found it yesterday and made the changes.
I ran the exact same configuration against MySQL with mediumblobs and
ran the importing system overnight. Again, we are talking about the
same number of files. The test machine that I was running on had only
512MB RAM. Maybe because of that, each new file was taking about
30 seconds on average to import into the repository; I was up to
1K files when I stopped the import.

So I am now testing the importing system against a machine which has
2GB RAM. The data being imported is 78.6 MB on disk and has
12K files and 8K folders (a fresh SVN update of jackrabbit). The
importer is running over RMI against the Tomcat server on localhost.
MySQL is also on localhost.

The import has been running for the last hour, and I see that at file
number 500 the import time has gone up to 3 seconds per file; the mysql
process runs at 30% CPU every couple of seconds.

As far as the file nodetype is concerned, this is a custom nodetype
which has 4 references per file imported, and currently all the
references are made to the same UUID since we are testing; this could
change in the future.

Any tips or ideas? I will update the results of the test. Right now I
have imported 1K out of 12K files and the import time has gone up to 4
seconds per file. Is this normal? Remember, since I am importing the
jackrabbit SVN, all files are put under one nt:folder, which is
"jackrabbit". This is a pretty normal case of about 12K files and only
78MB. We have plans for a 1TB repository.

Stefan.


On 4/25/07, Stefan Guggisberg <st...@gmail.com> wrote:
> hi stefan,
>
> On 4/24/07, Stefan Kurla <st...@gmail.com> wrote:
> > I am trying to import my jackrabbit svn directory into jackrabbit.
> > This dir has a few extra files like the jackrabbit 1.0 release.
> > Overall we are talking about 713MB on disk with 103K files and 48K
> > folders.
> >
> > I use mysql for persistence and the only thing that gets saved in the
> > filesystem are the indexes. I do a session.save() after importing each
> > file and then I check it in.
> >
> > However, when I import all these files into MySQL 5.0 (default
> > everything), I get a "failed to write node references" error. On the
> > server side, the stack trace is:
> >
> > com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for
> > column 'REFS_DATA' at row 1
> >         at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2868)
>
> you probably hit the size limit of the 'blob' data type. for more information
> see https://issues.apache.org/jira/browse/JCR-760. please note that this
> issue has been fixed in the latest 1.3 release. you can also make those
> changes on an existing database using 'alter table' commands in the
> mysql console.
>
> however, the previous schema allowed for roughly 800-1000 references
> (depending on the ref. property name size) on a given target node.
>
> do you explicitly create references? can you share some code (fragments)?
>
> cheers
> stefan
>
>
> >
> > On the client (RMI) side, the error is:
> > javax.jcr.RepositoryException: /: unable to update item.: failed to
> > write node references: d5f7e01d-1d68-470e-ba68-02b503754b68
> >         at org.apache.jackrabbit.rmi.server.ServerObject.getRepositoryException(ServerObject.java:136)
> >
> >
> > This happens when the totalFiles imported is 184 and the
> > totalImportSize=437,643 bytes (437KB) and the totalDirs imported are
> > 240.
> >
> > Something is not right... Jackrabbit cannot croak at 437KB, can it?
> >
> > Please advise.
> >
>

Re: importing jackrabbit into jackrabbit

Posted by Stefan Guggisberg <st...@gmail.com>.
hi stefan,

On 4/24/07, Stefan Kurla <st...@gmail.com> wrote:
> I am trying to import my jackrabbit svn directory into jackrabbit.
> This dir has a few extra files like the jackrabbit 1.0 release.
> Overall we are talking about 713MB on disk with 103K files and 48K
> folders.
>
> I use mysql for persistence and the only thing that gets saved in the
> filesystem are the indexes. I do a session.save() after importing each
> file and then I check it in.
>
> However, when I import all these files into MySQL 5.0 (default
> everything), I get a "failed to write node references" error. On the
> server side, the stack trace is:
>
> com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for
> column 'REFS_DATA' at row 1
>         at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:2868)

you probably hit the size limit of the 'blob' data type. for more information
see https://issues.apache.org/jira/browse/JCR-760. please note that this
issue has been fixed in the latest 1.3 release. you can also make those
changes on an existing database using 'alter table' commands in the
mysql console.
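
for example, something along these lines (the actual table name depends
on the schemaObjectPrefix configured for your persistence manager in
repository.xml; check the mysql.ddl that ships with 1.3 for the exact
column types):

alter table <prefix>REFS modify REFS_DATA mediumblob;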

however, the previous schema allowed for roughly 800-1000 references
(depending on the ref. property name size) on a given target node.

do you explicitly create references? can you share some code (fragments)?

cheers
stefan


>
> On the client (RMI) side, the error is:
> javax.jcr.RepositoryException: /: unable to update item.: failed to
> write node references: d5f7e01d-1d68-470e-ba68-02b503754b68
>         at org.apache.jackrabbit.rmi.server.ServerObject.getRepositoryException(ServerObject.java:136)
>
>
> This happens when the totalFiles imported is 184 and the
> totalImportSize=437,643 bytes (437KB) and the totalDirs imported are
> 240.
>
> Something is not right... Jackrabbit cannot croak at 437KB, can it?
>
> Please advise.
>