Posted to user@ofbiz.apache.org by Jonathon -- Improov <jo...@improov.com> on 2007/04/13 06:06:39 UTC

MySQL connection times out, how about PostgreSQL?

MySQL connections in the database connection pool time out (usually set to 8 hours). Possible 
workarounds(?):

1. Increase the time out value and hope that someone will connect to OFBiz
    before the connections time out.

2. Fix OFBiz to allow a "validateQuery" mechanism.

3. Use PostgreSQL.

In Tomcat, we usually set a "validateQuery" so the DBCP will test each connection before giving 
it to the application. If all connections in the pool have timed out (say no one has accessed OFBiz 
in 8 hours), the DBCP creates new connections for the pool.
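In DBCP terms, that mechanism would look something like this (a sketch only; the resource name, URL and credentials are made up for illustration, but validationQuery and testOnBorrow are standard commons-dbcp attributes):

```xml
<!-- context.xml: validationQuery makes DBCP run this statement against each
     pooled connection before handing it out, so connections the server has
     timed out are discarded and replaced instead of being given to the app. -->
<Resource name="jdbc/ofbiz" auth="Container" type="javax.sql.DataSource"
          driverClassName="com.mysql.jdbc.Driver"
          url="jdbc:mysql://localhost:3306/ofbiz"
          username="ofbiz" password="ofbiz"
          validationQuery="SELECT 1"
          testOnBorrow="true"/>
```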

If someone can tell me that this doesn't happen for PostgreSQL, I'll simply make the switch to 
PostgreSQL rather than fix things in OFBiz for MySQL.

Thanks.

Jonathon

Re: MySQL connection times out, how about PostgreSQL?

Posted by Jonathon -- Improov <jo...@improov.com>.
Hans,

Thanks. That's the precise fix I needed. Why isn't it in the SVN trunk?

Jonathon

Hans Holmlund wrote:
> Read this post:
> *http://www.nabble.com/MySQL-connection-problem-tf2486297.html#a9092103
> 
> / Hans
> *
> Jonathon -- Improov skrev:
>> MySQL connections in the database connection pool time out (usually 
>> set to 8 hours). Possible workarounds(?):
>>
>> 1. Increase the time out value and hope that someone will connect to 
>> OFBiz
>>    before the connections time out.
>>
>> 2. Fix OFBiz to allow to a "validateQuery" mechanism.
>>
>> 3. Use PostgreSQL.
>>
>> In Tomcat, we usually use the "validateQuery" so the DBCP will test 
>> each connection before giving it to the application. If all 
>> connections in the pool has timed out (say no one has accessed OFBiz 
>> in 8 hours), the DBCP creates new connections for the pool.
>>
>> If someone will tell me that this doesn't happen for PostgreSQL, I'll 
>> simply make the switch to PostgreSQL rather than fix things in OFBiz 
>> for MySQL.
>>
>> Thanks.
>>
>> Jonathon
>>
>>
> 
> 


Re: MySQL connection times out, how about PostgreSQL?

Posted by Hans Holmlund <ha...@excito.se>.
Read this post:
*http://www.nabble.com/MySQL-connection-problem-tf2486297.html#a9092103

/ Hans
*
Jonathon -- Improov skrev:
> MySQL connections in the database connection pool time out (usually 
> set to 8 hours). Possible workarounds(?):
>
> 1. Increase the time out value and hope that someone will connect to 
> OFBiz
>    before the connections time out.
>
> 2. Fix OFBiz to allow to a "validateQuery" mechanism.
>
> 3. Use PostgreSQL.
>
> In Tomcat, we usually use the "validateQuery" so the DBCP will test 
> each connection before giving it to the application. If all 
> connections in the pool has timed out (say no one has accessed OFBiz 
> in 8 hours), the DBCP creates new connections for the pool.
>
> If someone will tell me that this doesn't happen for PostgreSQL, I'll 
> simply make the switch to PostgreSQL rather than fix things in OFBiz 
> for MySQL.
>
> Thanks.
>
> Jonathon
>
>


Re: MySQL connection times out, how about PostgreSQL?

Posted by Jonathon -- Improov <jo...@improov.com>.
Sorry, it's "validationQuery", not validateQuery.

Jonathon

Jonathon -- Improov wrote:
> MySQL connections in the database connection pool time out (usually set 
> to 8 hours). Possible workarounds(?):
> 
> 1. Increase the time out value and hope that someone will connect to OFBiz
>    before the connections time out.
> 
> 2. Fix OFBiz to allow to a "validateQuery" mechanism.
> 
> 3. Use PostgreSQL.
> 
> In Tomcat, we usually use the "validateQuery" so the DBCP will test each 
> connection before giving it to the application. If all connections in 
> the pool has timed out (say no one has accessed OFBiz in 8 hours), the 
> DBCP creates new connections for the pool.
> 
> If someone will tell me that this doesn't happen for PostgreSQL, I'll 
> simply make the switch to PostgreSQL rather than fix things in OFBiz for 
> MySQL.
> 
> Thanks.
> 
> Jonathon
> 


Re: MySQL connection times out, how about PostgreSQL?

Posted by Jonathon -- Improov <jo...@improov.com>.
Chris, Hans,

Hans, first, thanks for your link to:

http://www.nabble.com/MySQL-connection-problem-tf2486297.html#a9092103

FYI to all. I just tested the fix. It doesn't work. I still need the XAPoolDataSource source code to 
fix this.

Did Katherine Morgan post anything else? The fix was suggested by her.

For those of you using MySQL, if you wanna test this out, edit your my.cnf file, maybe at 
/etc/my.cnf. Under [mysqld], enter "wait_timeout=60". When OFBiz is fully started up, wait for 1 
minute, then try to log in.

Don't set it to 10 seconds, or you might never get it to work even if the DBCP functions 
correctly! OFBiz processes may hold a DB connection for more than 10 seconds, you never know.
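For reference, the test setup described above is just (path varies by distro, and 60 seconds is for testing only, not production):

```ini
# /etc/my.cnf (location varies) -- shorten the server-side idle timeout so a
# pooled connection goes stale after one minute instead of the default 8 hours.
[mysqld]
wait_timeout=60
```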

Jonathon

Jonathon -- Improov wrote:
> Chris,
> 
> You mean go from MySQL to Xindice to PostgreSQL? Yeah, I know, data 
> migration can be a pain, even without data-mapping efforts to go with it 
> (ie, same structure migrated to same structure).
> 
>  > My limited reading through mailing lists on mysql and postgres show 
> that the
>  > timeout handling should be handled by the client (ofbiz) and that mysql
 > decided to provide an additional safeguard by moving an excessive,
>  > configurable, "catch-all" timeout to the server.
> 
> If you're saying that a DBCP should assume that a DB connection can 
> never time out (always good), I rest my case. :) Personally, I think 
> MySQL could've provided the option to drop the time out feature.
> 
> FYI to all. If Hans is right, it appears OFBiz's DBCP XAPoolDataSource 
> does handle connection time out after all! I'll check it out.
> 
> Anyway, the fix, as mentioned by Hans, can be done in 
> MinervaConnectionFactory. I'm gonna try just that. Problem solved. On 
> with real work. You're right, OFBiz does abstract tons of infrastructure 
> details away from application developers.
> 
> Would be good if we can get our hands on the XAPoolDataSource source 
> codes, just to confirm that the issue is fixed.
> 
> Jonathon
> 
> Chris Howe wrote:
>> Most of the data migration issues should be solvable by reading the
>> migration data files (entity-engine-xml) into a Xindice XML database
>> and running ofbiz services from there.  This will allow XPath queries
>> for the hierarchical stuff, and XUpdate of children elements for error
>> messages and thus group error handling using XPath.  I hope to be able
>> to get some of this together shortly as we've been using OFBiz to
>> slowly bring our legacy data in line with the OFBiz data model and the
>> legacy data is still in MySQL (and much more critical and can't really
>> take the risk that I "think" I changed the right data).
>>
>> If you don't have data, there shouldn't be any issue.  Simply run the
>> ant run-install with a delegator calling the postgres datasource.  The
>> impression I get is that most of the community in production is using
>> postgres unless they've purchased a license to a mssql or oracle.
>>
>> My limited reading through mailing lists on mysql and postgres show
>> that the timeout handling should be handled by the client (ofbiz) and
>> that mysql decided to provide an additional safeguard by moving an
>> excessive, configurable, "catch-all" timeout to the server.
>>
>> One of my favorite things about OFBiz is that you can get pretty far in
>> developing something useful without ever knowing or worrying about
>> database administration like this and can come back and address it when
>> you're fine tuning your deployment for bottlenecks before going live.
>>
>> --- Jonathon -- Improov <jo...@improov.com> wrote:
>>
>>> Chris,
>>>
>>> Wow, thanks. You sound like you've really been through it (the
>>> migration).
>>>
>>> Seems like quite a pain migrating to PostgreSQL from MySQL. But what
>>> if I don't have data to migrate? What if I just start over with 
>>> PostgreSQL? Any problems?
>>> I'll take your advice regarding the migration gotchas. Thanks.
>>>
>>> Actually, about DBCPs. For RDBMSes, it's correct for database
>>> connections to time out after a set interval of inactivity. That's 
>>> just prudence. When connections are
>>> used inside of DBCPs, it is the DBCP's responsibility to refresh 
>>> timed out connections in the pool. I
>>> did my very first very own DBCP more than a decade ago, and that was 
>>> one of the must-have
>>> functionalities for a DBCP. I was beaten up real bad for missing that 
>>> out. :)
>>>
>>> The fix for this, in OFBiz, is in OFBiz's DBCP --- XAPoolDataSource.
>>> I posted another message asking for the source codes to 
>>> ofbiz-minerva.jar.
>>>
>>> Jonathon
>>>
>>> Chris Howe wrote:
>>>> This does not occur in PostgreSQL.  It is a "feature" of MySQL and
>>> they
>>>> (mysql) will smugly say that OFBiz doesn't handle the connection
>>> pool
>>>> correctly. I don't know and don't really care to know if it's true
>>> or
>>>> not.  I switched over about 2 months ago and have had smooth
>>> sailing
>>>> since (even seemingly eliminated that UserLoginHistory bug that
>>> you're
>>>> aware of). 
>>>> Be warned, it's a bit of a pain to convert from MySQL to Postgres. 
>>>> Most of the issues seem to be about how lax MySQL is with data and how
>>>> stringent PostgreSQL is (at least in the default installation).  These
>>> were
>>>> some of the issues I came across with my data using the
>>> export/import
>>>> in webtools
>>>>
>>>> 1. the createdBy fields in the various entities weren't in the
>>> correct
>>>> case (i believe this has been solved in OFBiz, I just had data that
>>>> predated the fix)
>>>> 2. UserLogin and Party entities end up with a circular dependency
>>> based
>>>> on the partyId admin if the UserLogin admin created parties. 
>>> Either
>>>> load the single Party record for partyId before loading the
>>> UserLogin
>>>> entity or remove the createdBy data from the Party entity
>>>> 3. Hierarchical parent->child relationships.  This occurs with the
>>> *Type
>>>> entities.  They simply need to be loaded in the correct order. 
>>> There
>>>> is a JIRA issue which solves this problem for about the *Type
>>> entities
>>>> where the child is childId and the parent is parentChildId (e.g.
>>>> partyTypeId -> parentPartyTypeId)
>>>>
>>>> There may have been other referential integrity issues, but I think
>>>> they were self created and not created by OFBiz.
>>>>
>>>> --- Jonathon -- Improov <jo...@improov.com> wrote:
>>>>
>>>>> MySQL connections in the database connection pool time out
>>> (usually
>>>>> set to 8 hours). Possible workarounds(?):
>>>>>
>>>>> 1. Increase the time out value and hope that someone will connect
>>> to
>>>>> OFBiz
>>>>>     before the connections time out.
>>>>>
>>>>> 2. Fix OFBiz to allow to a "validateQuery" mechanism.
>>>>>
>>>>> 3. Use PostgreSQL.
>>>>>
>>>>> In Tomcat, we usually use the "validateQuery" so the DBCP will
>>> test
>>>>> each connection before giving it to the application. If all 
>>>>> connections in the pool has timed
>>> out
>>>>> (say no one has accessed OFBiz in 8 hours), the DBCP creates new 
>>>>> connections for the pool.
>>>>>
>>>>> If someone will tell me that this doesn't happen for PostgreSQL,
>>> I'll
>>>>> simply make the switch to PostgreSQL rather than fix things in 
>>>>> OFBiz for MySQL.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> Jonathon
>>>>>
>>>>
>>>
>>
>>
> 
> 


Re: Data Import

Posted by Jacques Le Roux <ja...@les7arts.com>.
And it worked well for me; in such cases you really love it :o)

Jacques

>
> There is also a tool as I mentioned before to remove and re-add all
> foreign keys. It is on the Check/Update page in WebTools.
>
> -David
>
>
> On Apr 15, 2007, at 9:40 PM, Chris Howe wrote:
>
> > Comments inline...
> > --- Jonathon -- Improov <jo...@improov.com> wrote:
> >
> >> Chris, David,
> >>
> >> Wow, this is getting complicated.
> >>
> >> 2 questions.
> >>
> >> This data import tool, can it do a data dump with foreign_key_checks
> >> turned off? I'd like to do a
> >> data import from XML, but not have to do the graph walk.
> >>
> >
> > If by this tool, you mean the one currently in OFBiz, it does it by
> > using an "insert dummy key" indicator.  It's only recommended if you
> > know your data is clean. As I've mentioned previously, I tried using
> > this to go from MySQL to Postgres and it did not work.  I did not look
> > into why.  There are a multitude of reasons why this may not have
> > worked. 95% of those reasons are on this side of the keyboard as I've
> > touched the data myself, so it likely isn't clean.
> >
> >> Anybody know of any tools to dump MySQL data (verbatim, with only
> >> platform-specific translations)
> >> into a PostgreSQL?
> >>
> >
> > There are quite a number of solutions.  A google search should find
> > them.  I didn't see any free choices when I looked.
> >
> >> I've looked into such "walk the graph/hierarchy" thing when I did my
> >> first data migration (not
> >> even data mapping, just dumping same structure to same structure).
> >> After spending a week to
> >> finally "get it!", I was slapped for wasting my time. Somebody did a
> >> "foreign_key_checks=0",
> >> dumped the data into the database, and set the flag back to "on". I'd
> >> agree this is a tough nut to
> >> crack; haven't found anyone in the IT industry (other than PhDs doing
> >> database design theses?)
> >> who'd want to bother cracking this.
> >>
> > That would have the same use scenario as the insert dummy key
> > approach: you have to be certain that your data is clean.  Depending on
> > how old your data is in OFBiz, this may be unlikely.  For instance,
> > before case-sensitive logins were implemented as the default, the case
> > in which the login was typed is the case that was stored in the various
> > entities that have a relationship to UserLogin.  I believe MySQL in
> > the default installation mode treats "ofbiz" and "OFBIZ" the same and
> > so found related records as expected.  Postgres in the default
> > installation mode does not treat the two as the same and your
> > application will not work as expected (I'm dealing with the fallout of
> > this at the moment).
> >
> >
> >> About highly normalized data. Yes, I agree there are loops. In the
> >> real world, a Vendor can often
> >> be a Customer of another Vendor, who is a Customer of the first
> >> Vendor. Ah, you get the picture
> >> (I'm getting a headache).
> >>
> >
> > Because OFBiz is designed for the most part to be a generic data
> > model,
> > this case scenario doesn't create a loop for referential
> > integrity.  In
> > this case, there are four entities of importance;  RoleType, Party,
> > PartyRole and PartyRelationship.  RoleType and Party do not rely on
> > the
> > later two, so they get entered cleanly.  PartyRole only depends on
> > RoleType and Party, and since all of those records are already in,
it
> > gets entered cleanly.  PartyRelationship relies on PartyRole and
Party
> > and since all of those records are already in, they get entered
> > cleanly.
> >
> >> Jonathon
> >>
> >
>
>


Re: Data Import

Posted by "David E. Jones" <jo...@hotwaxmedia.com>.
There is also a tool as I mentioned before to remove and re-add all  
foreign keys. It is on the Check/Update page in WebTools.

-David


On Apr 15, 2007, at 9:40 PM, Chris Howe wrote:

> Comments inline...
> --- Jonathon -- Improov <jo...@improov.com> wrote:
>
>> Chris, David,
>>
>> Wow, this is getting complicated.
>>
>> 2 questions.
>>
>> This data import tool, can it do a data dump with foreign_key_checks
>> turned off? I'd like to do a
>> data import from XML, but not have to do the graph walk.
>>
>
> If by this tool, you mean the one currently in OFBiz, it does it by
> using a "insert dummy key" indicator.  It's only recommended if you
> know your data is clean. As I've mentioned previously, I tried using
> this to go from MySQL to Postgres and it did not work.  I did not look
> into why.  There are a multitude of reasons why this may not have
> worked. 95% of those reasons are on this side of the keyboard as I've
> touched the data myself, so it likely isn't clean.
>
>> Anybody knows of any tools to dump MySQL data (verbatim, with only
>> platform-specific translations)
>> into a PostgreSQL?
>>
>
> There are quite a number of solutions.  A google search should find
> them.  I didn't see any free choices when I looked.
>
>> I've looked into such "walk the graph/hierarchy" thing when I did my
>> first data migration (not
>> even data mapping, just dumping same structure to same structure).
>> After spending a week to
>> finally "get it!", I was slapped for wasting my time. Somebody did a
>> "foreign_key_checks=0",
>> dumped the data into database, and set the flag back to "on". I'd
>> agree this is a tough nut to
>> crack; haven't found anyone in the IT industry (other than PhDs doing
>> database design theses?)
>> who'd want to bother cracking this.
>>
> That would have the same use scenario with the insert dummy key
> approach, you have to be certain that you data is clean.  Depending on
> how old your data is in OFBiz, this may be unlikely.  For instance,
> before case sensitive logins were implemented as the default the case
> in which the login was typed is the case that was stored in the  
> various
> entities that have a relationship to UserLogin.  I believe, MySQL in
> the default installation mode treats "ofbiz" and "OFBIZ" the same and
> so found related records as expected.  Postgres in the default
> installation mode does not treat the two as the same and your
> application will not work as expected (I'm dealing with the fallout of
> this at the moment).
>
>
>> About highly normalized data. Yes, I agree there are loops. In the
>> real world, a Vendor can often
>> be a Customer of another Vendor, who is a Customer of the first
>> Vendor. Ah, you get the picture
>> (I'm getting a headache).
>>
>
> Because OFBiz is designed for the most part to be a generic data  
> model,
> this case scenario doesn't create a loop for referential  
> integrity.  In
> this case, there are four entities of importance;  RoleType, Party,
> PartyRole and PartyRelationship.  RoleType and Party do not rely on  
> the
> later two, so they get entered cleanly.  PartyRole only depends on
> RoleType and Party, and since all of those records are already in, it
> gets entered cleanly.  PartyRelationship relies on PartyRole and Party
> and since all of those records are already in, they get entered
> cleanly.
>
>> Jonathon
>>
>


Re: Data Import

Posted by Jonathon -- Improov <jo...@improov.com>.
Chris,

 > you have to be certain that your data is clean.  Depending on how old your
 > data is in OFBiz, this may be unlikely.

You just reminded me that my existing data can still get "unclean" simply because new OFBiz 
updates change the data structure.

Adding another item to my SOP for "Importing OFBiz updates": Check changes to entity models, and 
perform any necessary data-mapping and data-migration.

 > I believe, MySQL in the default installation mode treats "ofbiz" and "OFBIZ"
 > the same and so found related records as expected.

If you're talking about table names, I think the default installation on Windows treats user_login 
and USER_LOGIN the same. But the default on Linux is totally case-sensitive.

If you're talking about database collation, there really isn't much in the way of "default". We 
set our own. In general, I always opt for _cs (case sensitive), never for _ci. The default in 
OFBiz's entityengine.xml is latin1_general_cs; I've been using that for all OFBiz installations.
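The collation setting mentioned above lives on the datasource element in entityengine.xml; a trimmed sketch (most attributes omitted, and exact attribute names may vary between OFBiz versions):

```xml
<!-- entityengine.xml: the character-set and collate attributes on the MySQL
     datasource; latin1_general_cs gives case-sensitive comparisons. -->
<datasource name="localmysql"
        field-type-name="mysql"
        check-on-start="true"
        character-set="latin1"
        collate="latin1_general_cs">
    <inline-jdbc
            jdbc-driver="com.mysql.jdbc.Driver"
            jdbc-uri="jdbc:mysql://127.0.0.1/ofbiz?autoReconnect=true"
            jdbc-username="ofbiz"
            jdbc-password="ofbiz"/>
</datasource>
```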

 > Postgres in the default installation mode does not treat the two as the same
 > and your application will not work as expected (I'm dealing with the fallout
 > of this at the moment).

Convert all affected keys to lower case, and tell all users about the change?
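A rough sketch of the kind of one-off conversion that would involve, done here against an entity-engine-xml export rather than live tables (the attribute name is illustrative, and a real conversion must cover every entity that references UserLogin):

```python
# Lowercase every userLoginId attribute in an entity-engine-xml export, so
# case-sensitive Postgres lookups match the way case-insensitive MySQL
# lookups did. A sketch only: real data needs all referencing columns done.
import xml.etree.ElementTree as ET

def lowercase_login_ids(xml_text):
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if "userLoginId" in elem.attrib:
            elem.attrib["userLoginId"] = elem.attrib["userLoginId"].lower()
    return ET.tostring(root, encoding="unicode")
```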

Jonathon

Chris Howe wrote:
> Comments inline...
> --- Jonathon -- Improov <jo...@improov.com> wrote:
> 
>> Chris, David,
>>
>> Wow, this is getting complicated.
>>
>> 2 questions.
>>
>> This data import tool, can it do a data dump with foreign_key_checks
>> turned off? I'd like to do a 
>> data import from XML, but not have to do the graph walk.
>>
> 
> If by this tool, you mean the one currently in OFBiz, it does it by
> using a "insert dummy key" indicator.  It's only recommended if you
> know your data is clean. As I've mentioned previously, I tried using
> this to go from MySQL to Postgres and it did not work.  I did not look
> into why.  There are a multitude of reasons why this may not have
> worked. 95% of those reasons are on this side of the keyboard as I've
> touched the data myself, so it likely isn't clean.
> 
>> Anybody knows of any tools to dump MySQL data (verbatim, with only
>> platform-specific translations) 
>> into a PostgreSQL?
>>
> 
> There are quite a number of solutions.  A google search should find
> them.  I didn't see any free choices when I looked.
> 
>> I've looked into such "walk the graph/hierarchy" thing when I did my
>> first data migration (not 
>> even data mapping, just dumping same structure to same structure).
>> After spending a week to 
>> finally "get it!", I was slapped for wasting my time. Somebody did a
>> "foreign_key_checks=0", 
>> dumped the data into database, and set the flag back to "on". I'd
>> agree this is a tough nut to 
>> crack; haven't found anyone in the IT industry (other than PhDs doing
>> database design theses?) 
>> who'd want to bother cracking this.
>>
> That would have the same use scenario with the insert dummy key
> approach, you have to be certain that you data is clean.  Depending on
> how old your data is in OFBiz, this may be unlikely.  For instance,
> before case sensitive logins were implemented as the default the case
> in which the login was typed is the case that was stored in the various
> entities that have a relationship to UserLogin.  I believe, MySQL in
> the default installation mode treats "ofbiz" and "OFBIZ" the same and
> so found related records as expected.  Postgres in the default
> installation mode does not treat the two as the same and your
> application will not work as expected (I'm dealing with the fallout of
> this at the moment).
> 
> 
>> About highly normalized data. Yes, I agree there are loops. In the
>> real world, a Vendor can often 
>> be a Customer of another Vendor, who is a Customer of the first
>> Vendor. Ah, you get the picture 
>> (I'm getting a headache).
>>
> 
> Because OFBiz is designed for the most part to be a generic data model,
> this case scenario doesn't create a loop for referential integrity.  In
> this case, there are four entities of importance;  RoleType, Party,
> PartyRole and PartyRelationship.  RoleType and Party do not rely on the
> later two, so they get entered cleanly.  PartyRole only depends on
> RoleType and Party, and since all of those records are already in, it
> gets entered cleanly.  PartyRelationship relies on PartyRole and Party
> and since all of those records are already in, they get entered
> cleanly.
> 
>> Jonathon
>>
> 
> 


Re: Data Import

Posted by Chris Howe <cj...@yahoo.com>.
Comments inline...
--- Jonathon -- Improov <jo...@improov.com> wrote:

> Chris, David,
> 
> Wow, this is getting complicated.
> 
> 2 questions.
> 
> This data import tool, can it do a data dump with foreign_key_checks
> turned off? I'd like to do a 
> data import from XML, but not have to do the graph walk.
> 

If by this tool, you mean the one currently in OFBiz, it does it by
using an "insert dummy key" indicator.  It's only recommended if you
know your data is clean. As I've mentioned previously, I tried using
this to go from MySQL to Postgres and it did not work.  I did not look
into why.  There are a multitude of reasons why this may not have
worked. 95% of those reasons are on this side of the keyboard as I've
touched the data myself, so it likely isn't clean.

> Anybody knows of any tools to dump MySQL data (verbatim, with only
> platform-specific translations) 
> into a PostgreSQL?
> 

There are quite a number of solutions.  A google search should find
them.  I didn't see any free choices when I looked.

> I've looked into such "walk the graph/hierarchy" thing when I did my
> first data migration (not 
> even data mapping, just dumping same structure to same structure).
> After spending a week to 
> finally "get it!", I was slapped for wasting my time. Somebody did a
> "foreign_key_checks=0", 
> dumped the data into database, and set the flag back to "on". I'd
> agree this is a tough nut to 
> crack; haven't found anyone in the IT industry (other than PhDs doing
> database design theses?) 
> who'd want to bother cracking this.
> 
That would have the same use scenario as the insert dummy key
approach: you have to be certain that your data is clean.  Depending on
how old your data is in OFBiz, this may be unlikely.  For instance,
before case-sensitive logins were implemented as the default, the case
in which the login was typed is the case that was stored in the various
entities that have a relationship to UserLogin.  I believe MySQL in
the default installation mode treats "ofbiz" and "OFBIZ" the same and
so found related records as expected.  Postgres in the default
installation mode does not treat the two as the same and your
application will not work as expected (I'm dealing with the fallout of
this at the moment).


> About highly normalized data. Yes, I agree there are loops. In the
> real world, a Vendor can often 
> be a Customer of another Vendor, who is a Customer of the first
> Vendor. Ah, you get the picture 
> (I'm getting a headache).
> 

Because OFBiz is designed for the most part to be a generic data model,
this case scenario doesn't create a loop for referential integrity.  In
this case, there are four entities of importance: RoleType, Party,
PartyRole and PartyRelationship.  RoleType and Party do not rely on the
latter two, so they get entered cleanly.  PartyRole only depends on
RoleType and Party, and since all of those records are already in, it
gets entered cleanly.  PartyRelationship relies on PartyRole and Party
and since all of those records are already in, they get entered
cleanly.
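That load order can be checked mechanically; a small sketch, with the entity names from the paragraph and the dependency pairs assumed from it:

```python
# Verify that loading RoleType and Party first, then PartyRole, then
# PartyRelationship never inserts a record before its FK targets exist.
# Dependency map is assumed from the discussion, not read from OFBiz.
DEPENDS_ON = {
    "RoleType": [],
    "Party": [],
    "PartyRole": ["RoleType", "Party"],
    "PartyRelationship": ["PartyRole", "Party"],
}

def is_safe_order(order):
    loaded = set()
    for entity in order:
        if any(dep not in loaded for dep in DEPENDS_ON[entity]):
            return False  # an FK target would be missing at insert time
        loaded.add(entity)
    return True
```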

> Jonathon
> 


Re: Data Import

Posted by Jonathon -- Improov <jo...@improov.com>.
Chris, David,

Wow, this is getting complicated.

2 questions.

This data import tool, can it do a data dump with foreign_key_checks turned off? I'd like to do a 
data import from XML, but not have to do the graph walk.

Anybody know of any tools to dump MySQL data (verbatim, with only platform-specific translations) 
into a PostgreSQL?

I've looked into such "walk the graph/hierarchy" thing when I did my first data migration (not 
even data mapping, just dumping same structure to same structure). After spending a week to 
finally "get it!", I was slapped for wasting my time. Somebody did a "foreign_key_checks=0", 
dumped the data into the database, and set the flag back to "on". I'd agree this is a tough nut to 
crack; haven't found anyone in the IT industry (other than PhDs doing database design theses?) 
who'd want to bother cracking this.

About highly normalized data. Yes, I agree there are loops. In the real world, a Vendor can often 
be a Customer of another Vendor, who is a Customer of the first Vendor. Ah, you get the picture 
(I'm getting a headache).

Jonathon

Chris Howe wrote:
> First, thank you very much for the feedback!  Your comments have been
> extremely helpful in me wrapping my brain around this nut.  Comments
> inline.
> 
> --- "David E. Jones" <jo...@hotwaxmedia.com> wrote:
> 
>> I'm not sure what you used to find loops in the graph but I know just
>> off the top of my head that there are others, and I'm guessing quite 
>> a few others. Actually a highly normalized data model that fulfills a
>> large number of different requirements probably tends to cause more  
>> loops.
>>
> 
> I put the logic at the end for critique
> 
>> Some off the top of my head real quick:
>>
>> - ProductCategory points to self
> 
> All of the self referential issues have been accounted for.  These are
> found with a relationship of entity=relEntity and therefore records
> must be ordered for the pass to be successful.
> 
>> - ProductStore points to Facility, Facility points to ProductStore
> 
> This is a one-nofk relationship.  I assumed that this type of
> relationship doesn't require maintaining referential integrity.  Was
> this assumption correct?  If so, would this be a better relationship
> type for UserLogin->Party?
> 
>> Certain other ones are loops but are actually workable, like  
>> ProductCategory(1) -> ProductCategoryRollup -> ProductCategory(2) -> 
>> ProductCategoryRollup -> ProductCategory(1) which is okay because you
>> can do the ProductCategory records first and then do the rollups.
> 
> This is another one-nofk
> 
>> One thing to consider is that there are certain loops that can  
>> possibly be resolved by sorting records instead of entities. That  
>> makes things significantly more complicated, but for certain  
>> scenarios it's the only solution.
> 
> This is how the UserLogin -> Party would be handled assuming we don't
> change the relationship to one-nofk :-)
> 
>> If you really want to go for this and try to code something that will
>> sort both entities and records within entities to get something that 
>> will handle most but not all cases, go for it, I won't stop you.
>>
>> I'm just warning you that I looked into this once and did the  
>> background work and found it to be a tough nut to crack, probably not
>> worth the effort as other solutions are easier. Also, this has been  
>> tried before by other people who made some progress when they got  
>> into testing with real data and such ran into problem after problem  
>> and never found a workable solution.
>>
>> Maybe the best thing for me to do is shut up and let you give it a  
>> try. Let me know how it goes.
> 
> I wish you wouldn't.  The gotcha scenarios are very helpful in the
> cracking process.  If the gotchas can't be ungotched...then warnings and
> caveats can be put into the most workable solution so that it can be
> left for someone else to be able to pick it up and only work on the
> uncracked portions.
> 
>> -David
>>
> 
> Two Entities:
> 1) EntityMap [entity, insertionPass, insertionScenario]
> 2) EntityRelationshipMap [primaryEntity, relatedEntity,
> ignoreRelationshipFlag]
> 
> Logic process:
> Setup-
> 1) Load all entities into EntityMaps;
> 2) Load all relationships of type one into EntityRelationshipMaps; set
> all ignoreRelationshipFlags to "N";
> 
> Locate relationship scenario A->B->A
> 3) Iterate list looking for instances of primaryEntity_relatedEntity =
> relatedEntity_primaryEntity. When found set ignoreRelationshipFlag to
> "Y" and set EntityMap.insertionScenario for the primaryEntity and
> relatedEntity to "ABA"
> 
> Locate relationship scenario A->A
> 4) Iterate list looking for instances where primaryEntity =
> relatedEntity; set ignoreRelationshipFlag to "Y" and set
> EntityMap.insertionScenario to "AA"
> 
> Determine order
> 5) Find all EntityMaps where insertionPass is null
> 6) Iterate list; find the number of times it exists in the
> EntityRelationshipMap where ignoreRelationshipFlag = "N"
> 7) For all that return 0, set EntityMaps.insertionPass to current pass
> number; find all records where the current entity exists in
> EntityRelationshipMap.relatedEntity and set
> EntityRelationshipMap.ignoreRelationshipFlag to "Y"
> 8) increment the current pass; go to 5 until no more exist.
> 
> 
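The setup/ordering logic quoted above is essentially a topological sort with the flagged edges removed. A compact sketch, in Python rather than OFBiz's entity engine, using the loop-flagging rules from steps 3-4 (entity and relationship names in the test are illustrative):

```python
# Assign each entity an insertion pass, per the quoted logic: flag A->A and
# A->B->A relationships as ignored (steps 3-4), then repeatedly place every
# entity with no remaining unsatisfied type-one relationship (steps 5-8).
def insertion_passes(entities, relations):
    """relations: set of (primary, related) pairs; `related` holds a
    type-one relationship to `primary`, so `primary` must load first."""
    ignored = {(p, r) for (p, r) in relations
               if p == r or (r, p) in relations}           # AA and ABA scenarios
    active = set(relations) - ignored

    passes, remaining = [], set(entities)
    while remaining:
        # Entities that no active relationship still forces to wait.
        ready = {e for e in remaining
                 if not any(r == e for (_, r) in active)}
        if not ready:
            raise ValueError("unresolvable loop among: %s" % sorted(remaining))
        passes.append(sorted(ready))
        remaining -= ready
        # Placed entities satisfy the relationships they were primary in.
        active = {(p, r) for (p, r) in active if p not in ready}
    return passes
```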


Re: Data Import

Posted by Chris Howe <cj...@yahoo.com>.
First, thank you very much for the feedback!  Your comments have been
extremely helpful in me wrapping my brain around this nut.  Comments
inline.

--- "David E. Jones" <jo...@hotwaxmedia.com> wrote:

> 
> I'm not sure what you used to find loops in the graph but I know just
> off the top of my head that there are others, and I'm guessing quite 
> a few others. Actually a highly normalized data model that fulfills a
> large number of different requirements probably tends to cause more  
> loops.
> 

I put the logic at the end for critique

> Some off the top of my head real quick:
> 
> - ProductCategory points to self

All of the self referential issues have been accounted for.  These are
found with a relationship of entity=relEntity and therefore records
must be ordered for the pass to be successful.

> - ProductStore points to Facility, Facility points to ProductStore

This is a one-nofk relationship.  I assumed that this type of
relationship doesn't require maintaining referential integrity.  Was
this assumption correct?  If so, would this be a better relationship
type for UserLogin->Party?

> 
> Certain other ones are loops but are actually workable, like  
> ProductCategory(1) -> ProductCategoryRollup -> ProductCategory(2) -> 
> ProductCategoryRollup -> ProductCategory(1) which is okay because you
> can do the ProductCategory records first and then do the rollups.

This is another one-nofk

> One thing to consider is that there are certain loops that can  
> possibly be resolved by sorting records instead of entities. That  
> makes things significantly more complicated, but for certain  
> scenarios it's the only solution.

This is how the UserLogin -> Party would be handled assuming we don't
change the relationship to one-nofk :-)

> 
> If you really want to go for this and try to code something that will
> sort both entities and records within entities to get something that 
> will handle most but not all cases, go for it, I won't stop you.
> 
> I'm just warning you that I looked into this once and did the  
> background work and found it to be a tough nut to crack, probably not
> worth the effort as other solutions are easier. Also, this has been  
> tried before by other people who made some progress when they got  
> into testing with real data and such ran into problem after problem  
> and never found a workable solution.
> 
> Maybe the best thing for me to do is shut up and let you give it a  
> try. Let me know how it goes.

I wish you wouldn't.  The gotcha scenarios are very helpful in the
cracking process.  If the gotchas can't be ungotched...then warnings and
caveats can be put into the most workable solution so that it can be
left for someone else to be able to pick it up and only work on the
uncracked portions.

> 
> -David
> 

Two Entities:
1) EntityMap [entity, insertionPass, insertionScenario]
2) EntityRelationshipMap [primaryEntity, relatedEntity,
ignoreRelationshipFlag]

Logic process:
Setup-
1) Load all entities into EntityMaps;
2) Load all relationships of type one into EntityRelationshipMaps; set
all ignoreRelationshipFlags to "N";

Locate relationship scenario A->B->A
3) Iterate list looking for instances of primaryEntity_relatedEntity =
relatedEntity_primaryEntity. When found set ignoreRelationshipFlag to
"Y" and set EntityMap.insertionScenario for the primaryEntity and
relatedEntity to "ABA"

Locate relationship scenario A->A
4) Iterate list looking for instances where primaryEntity =
relatedEntity; set ignoreRelationshipFlag to "Y" and set
EntityMap.insertionScenario to "AA"

Determine order
5) Find all EntityMaps where insertionPass is null
6) Iterate list; find the number of times it exists in the
EntityRelationshipMap where ignoreRelationshipFlag = "N"
7) For all that return 0, set EntityMaps.insertionPass to current pass
number; find all records where the current entity exists in
EntityRelationshipMap.relatedEntity and set
EntityRelationshipMap.ignoreRelationshipFlag to "Y"
8) increment the current pass; go to 5 until no more exist.
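The steps above can be sketched roughly as follows. This is a minimal illustration of the described logic, not OFBiz code; the entity names in the usage notes are examples and the relationship list is assumed to already contain only type-one (FK) relationships:

```python
# Rough sketch of the EntityMap/EntityRelationshipMap ordering logic.
# relations is a list of (primaryEntity, relatedEntity) pairs, where
# primaryEntity carries an FK pointing at relatedEntity.

def order_entities(entities, relations):
    """Return (insertionPass per entity, insertionScenario per entity)."""
    ignore = [False] * len(relations)
    scenario = {}
    pairs = set(relations)

    # Step 3: A->B->A loops -- both directions exist, so neither can go first
    for i, (p, r) in enumerate(relations):
        if p != r and (r, p) in pairs:
            ignore[i] = True
            scenario[p] = scenario[r] = "ABA"

    # Step 4: A->A self-references -- records must be ordered within the entity
    for i, (p, r) in enumerate(relations):
        if p == r:
            ignore[i] = True
            scenario[p] = "AA"

    # Steps 5-8: repeated passes over entities with no unresolved dependencies
    passes, remaining, current = {}, set(entities), 1
    while remaining:
        ready = [e for e in remaining
                 if not any(p == e and not ignore[i]
                            for i, (p, _) in enumerate(relations))]
        if not ready:
            break  # a loop the AA/ABA handling didn't cover
        for e in ready:
            passes[e] = current
            remaining.discard(e)
            # relationships pointing at e are now satisfied
            for i, (_, r) in enumerate(relations):
                if r == e:
                    ignore[i] = True
        current += 1
    return passes, scenario
```

With, say, StatusItem -> StatusType plus a UserLogin<->Party loop and a self-referencing ProductCategory, StatusType lands in pass 1 and StatusItem in pass 2, while the loop members get tagged "ABA" and "AA" for special handling.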

Re: Data Import

Posted by "David E. Jones" <jo...@hotwaxmedia.com>.
I'm not sure what you used to find loops in the graph but I know just  
off the top of my head that there are others, and I'm guessing quite  
a few others. Actually a highly normalized data model that fulfills a  
large number of different requirements probably tends to cause more  
loops.

Some off the of my head real quick:

- ProductCategory points to self
- ProductStore points to Facility, Facility points to ProductStore

Certain other ones are loops but are actually workable, like  
ProductCategory(1) -> ProductCategoryRollup -> ProductCategory(2) ->  
ProductCategoryRollup -> ProductCategory(1) which is okay because you  
can do the ProductCategory records first and then do the rollups.

One thing to consider is that there are certain loops that can  
possibly be resolved by sorting records instead of entities. That  
makes things significantly more complicated, but for certain  
scenarios it's the only solution.

If you really want to go for this and try to code something that will  
sort both entities and records within entities to get something that  
will handle most but not all cases, go for it, I won't stop you.

I'm just warning you that I looked into this once and did the  
background work and found it to be a tough nut to crack, probably not  
worth the effort as other solutions are easier. Also, this has been  
tried before by other people who made some progress when they got  
into testing with real data and such ran into problem after problem  
and never found a workable solution.

Maybe the best thing for me to do is shut up and let you give it a  
try. Let me know how it goes.

-David


On Apr 15, 2007, at 4:22 AM, Chris Howe wrote:

>
> --- "David E. Jones" <jo...@hotwaxmedia.com> wrote:
>
>>
>> Yeah, there are various loops: self-referencing entities (A -> A) and
>> multi-entity loops (A->B->A; A->B->C->A; etc).
>> -David
>>
>
> So, I went ahead and wrote a script to walk and order the entities and
> it turns out there are only two loops which are actually more like
> kinks (granted it takes 17 passes to reduce the relationships to these
> two loops, but it does get there ).  Knowing the order of entering
> entity data that won't fail and need to be retried on subsequent  
> passes
> will more than make up for the three minutes of processing time it
> takes to determine.
>
> These should be identifiable by A-B = B-A.  The A->B->C->A loops and
> greater would obviously be difficult to identify, but it doesn't
> currently exist in OFBiz, so I'll assume that it's theoretical and not
> likely to exist in a highly normalized generic data model.
>
> You have the self-referencing entities (A=A) which you can avoid
> referential integrity issues by walking the record hierarchy of that
> entity parent->child.  These are easily identified by having both the
> entity and rel-entity equal to one another.
>
> The two restricting loops are both A->B->A
> 1. UserLogin->Party->UserLogin
> This is caused by a denormalized(non-normalized) field Party.createdBy
> and the application specific field UserLogin.partyId.
>
> 2. FinAccountTrans->Payment->FinAccountTrans
> I haven't looked at the application logic, but it appears by  
> looking at
> the data model that either FinAccountTrans.paymentId or
> Payment.finAccountTransId is redundant.  Judging by the rest of
> FinAccountTrans, I would say that the paymentId is the one  
> misplaced as
> there is much denormalized information.  I wouldn't suspect that  this
> is a heavily read area of the data model that requires  
> denormalization.
>
> #1 can be addressed by ordering the records or by treating as a graph
> whereby creating a two column temporary join table (A__B ie
> UserLogin__Party) hold the referential data, set the fK to null, load
> all the records, then run an update from the temporary table to the
> original entity.
>
> #2 can probably be addressed by fixing the logic as there are  
> likely
> 1:1 relationships between the records and therefore a misplaced fk.


Re: Data Import

Posted by Chris Howe <cj...@yahoo.com>.
--- "David E. Jones" <jo...@hotwaxmedia.com> wrote:

> 
> Yeah, there are various loops: self-referencing entities (A -> A) and
> multi-entity loops (A->B->A; A->B->C->A; etc).
> -David
> 

So, I went ahead and wrote a script to walk and order the entities and
it turns out there are only two loops which are actually more like
kinks (granted it takes 17 passes to reduce the relationships to these
two loops, but it does get there ).  Knowing the order of entering
entity data that won't fail and need to be retried on subsequent passes
will more than make up for the three minutes of processing time it
takes to determine.

These should be identifiable by A-B = B-A.  The A->B->C->A loops and
greater would obviously be difficult to identify, but they don't
currently exist in OFBiz, so I'll assume they're theoretical and not
likely to exist in a highly normalized generic data model.

You have the self-referencing entities (A=A), for which you can avoid
referential integrity issues by walking the record hierarchy of that
entity parent->child.  These are easily identified by having both the
entity and rel-entity equal to one another.

The two restricting loops are both A->B->A 
1. UserLogin->Party->UserLogin
This is caused by a denormalized (non-normalized) field Party.createdBy
and the application specific field UserLogin.partyId.

2. FinAccountTrans->Payment->FinAccountTrans
I haven't looked at the application logic, but it appears by looking at
the data model that either FinAccountTrans.paymentId or
Payment.finAccountTransId is redundant.  Judging by the rest of
FinAccountTrans, I would say that the paymentId is the one misplaced as
there is much denormalized information.  I wouldn't suspect that  this
is a heavily read area of the data model that requires denormalization.

#1 can be addressed by ordering the records, or by treating it as a
graph: create a two-column temporary join table (A__B, i.e.
UserLogin__Party) to hold the referential data, set the fk to null, load
all the records, then run an update from the temporary table to the
original entity.

#2 can probably be addressed by fixing the logic as there are likely
1:1 relationships between the records and therefore a misplaced fk.
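The temporary join table idea in #1 can be sketched like this; the record shape and the created_by field are assumptions for illustration, and a real implementation would use SQL inserts and an UPDATE rather than in-memory dicts:

```python
# Break an A->B->A loop by deferring the back-reference: insert rows with
# the looping FK nulled, park the (pk, fk) pairs in a temporary "join
# table", then restore them in a second pass once all rows exist.

def load_with_deferred_fk(records, fk_field):
    loaded, join_table = [], []
    for rec in records:
        rec = dict(rec)                         # don't mutate the caller's data
        fk = rec.pop(fk_field, None)
        if fk is not None:
            join_table.append((rec["id"], fk))  # remember the reference
        rec[fk_field] = None                    # insert without the back-reference
        loaded.append(rec)
    return loaded, join_table

def apply_deferred_fks(loaded, join_table, fk_field):
    by_id = {r["id"]: r for r in loaded}
    for pk, fk in join_table:                   # "update from the temporary table"
        by_id[pk][fk_field] = fk
```

For the UserLogin->Party->UserLogin case, this lets the admin Party that references its own creator load cleanly, with the createdBy values filled in afterwards.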

Re: Data Import

Posted by Chris Howe <cj...@yahoo.com>.
--- "David E. Jones" <jo...@hotwaxmedia.com> wrote:
[...] 
> 2. in the import use the dummy-fks option which will insert dummy  
> records as needed with pk only values to satisfy primary key
> constraints
> 

Are the dummy records supposed to actually be inserted into the
database or simply pacify the entity engine?  If it's the former, it
did not occur in my migration with code from about middle December '06
with the target being postgres.  Thanks!

Re: Data Import

Posted by "David E. Jones" <jo...@hotwaxmedia.com>.
With the OFBiz tools there are a couple of options.

1. on the check/update page in webtools you can drop and recreate all  
foreign keys
2. in the import use the dummy-fks option which will insert dummy  
records as needed with pk only values to satisfy primary key constraints

Yeah, there are various loops: self-referencing entities (A -> A) and  
multi-entity loops (A->B->A; A->B->C->A; etc).

-David


On Apr 14, 2007, at 12:25 AM, Jonathon -- Improov wrote:

> David,
>
> Yeah, you're right. You just reminded me of the loops in the data  
> structure (self-referencing entities?).
>
> I'll look for a way to port MySQL data directly to PostgreSQL then.  
> Even using standard SQL shouldn't be a problem, I suppose.
>
> I think it's time I started getting in touch with PostgreSQL.
>
> Jonathon
>
> David E. Jones wrote:
>> On Apr 13, 2007, at 9:42 PM, Jonathon -- Improov wrote:
>>> The hierarchical walk you mentioned seems like too much work,  
>>> perhaps even computationally expensive. I would've preferred a  
>>> dumb dump (same structure to same structure), complete with  
>>> referential keys. I'm sure there are very largely similar SQL- 
>>> standard features between MySQL and PostgreSQL.
>> Actually, it's worse, it is not possible. The problem is it isn't  
>> an hierarchy, it's a graph, and this graph has loops in it.... So,  
>> yeah, an ordered traversal is impossible.
>> -David
>


Re: Data Import

Posted by Jonathon -- Improov <jo...@improov.com>.
David,

Yeah, you're right. You just reminded me of the loops in the data structure (self-referencing 
entities?).

I'll look for a way to port MySQL data directly to PostgreSQL then. Even using standard SQL 
shouldn't be a problem, I suppose.

I think it's time I started getting in touch with PostgreSQL.

Jonathon

David E. Jones wrote:
> 
> On Apr 13, 2007, at 9:42 PM, Jonathon -- Improov wrote:
> 
>> The hierarchical walk you mentioned seems like too much work, perhaps 
>> even computationally expensive. I would've preferred a dumb dump (same 
>> structure to same structure), complete with referential keys. I'm sure 
>> there are very largely similar SQL-standard features between MySQL and 
>> PostgreSQL.
> 
> Actually, it's worse, it is not possible. The problem is it isn't an 
> hierarchy, it's a graph, and this graph has loops in it.... So, yeah, an 
> ordered traversal is impossible.
> 
> -David
> 
> 


Re: Data Import

Posted by "David E. Jones" <jo...@hotwaxmedia.com>.
On Apr 13, 2007, at 9:42 PM, Jonathon -- Improov wrote:

> The hierarchical walk you mentioned seems like too much work,  
> perhaps even computationally expensive. I would've preferred a dumb  
> dump (same structure to same structure), complete with referential  
> keys. I'm sure there are very largely similar SQL-standard features  
> between MySQL and PostgreSQL.

Actually, it's worse, it is not possible. The problem is it isn't a  
hierarchy, it's a graph, and this graph has loops in it.... So, yeah,  
an ordered traversal is impossible.

-David



Re: Data Import

Posted by Jonathon -- Improov <jo...@improov.com>.
Chris,

Yeah, MyISAM was... "difficult". Other than lack of referential integrity, transaction atomicity 
had to be specially handled. I don't know where MySQL would be today if not for InnoDB. I don't 
know where they'll be in the future, since the InnoDB license seems to be changing soon.

As for your question of whether the MySQL database is consistent: it is. The OFBiz entity 
engine does create the referential keys, so the MySQL-based OFBiz database is 
consistent.

The hierarchical walk you mentioned seems like too much work, perhaps even computationally 
expensive. I would've preferred a dumb dump (same structure to same structure), complete with 
referential keys. I'm sure there are very largely similar SQL-standard features between MySQL and 
PostgreSQL.

A typical application migration (turn off old app, migrate CURRENT data and whatnot, turn on new 
app) must take less than 8 hours (overnight). All migration work must be scripted to run 
automatically, completely free of human error.

Should we enhance the data import utility, perhaps? You mentioned it can only take in a few(?) 
tables/files (XML) at a time?

Jonathon

Chris Howe wrote:
> Jonathon,
> 
> That's interesting.  Until you just said that, it has never even
> occurred to me to trust that the data being imported was consistent. 
> When I first started playing with php and mysql a few years ago, I had
> only used MyISAM tables, which doesn't (or at least didn't, perhaps
> still doesn't) handle the enforcement of referential integrity.  So, I
> guess that's where my bias lies.  
> 
> In regards to the hierarchy, it's already specified in the entity
> model.  It just needs to be walked.  In any regards, I'll be playing
> with this in the near term as it has several advantages and will help
> me with several other ideas that deal with the loads of XML in OFBiz.  
> 
> --- Jonathon -- Improov <jo...@improov.com> wrote:
> 
>> Chris,
>>
>> Usually, in MySQL (or MSSQL or many other RDBMSes), we disable the
>> foreign key constraints check 
>> first, then dump in all the data, then re-enable the key constraints.
>> That way, we don't have to 
>> figure out the top-down hierarchy of the database structure and
>> insert data from the top to the 
>> bottom.
>>
>> Does the data import tool allow the above? It's usually just a single
>> flag to toggle on/off 
>> (foreign_key_checks for MySQL).
>>
>> Jonathon
>>
>> Chris Howe wrote:
>>> The current approach is this
>>> 1.Export MySQL (or origin database) to XML files
>>> 2.Import the XML files in one at a time and either succeed or fail
>> the
>>> entire file.
>>>
>>> Because of the SAX parser you're limited to in best case scenario
>> to
>>> what you can fit into memory (because of the data OFBiz's data
>> import
>>> logic, you're limited to one file at a time, even if two or more
>> will
>>> fit in memory).  Therefore you're guessing if referential integrity
>> is
>>> maintained (exists) in subsequent XML data files.  
>>>
>>> In the case of Postgres, once OFBiz creates the database schema,
>>> postgres handles referential integrity constraints.  Because of
>> this,
>>> it's not enough just to have the entity engine ignore its error on
>>> referential integrity with a dummy key, the dummy key actually has
>> to
>>> be written to the destination database(which in my experience did
>> not
>>> happen, I didn't look any deeper into this as a solution because of
>> the
>>> other remaining issues).  
>>>
>>> By reading all of the XML data files into an XML database first,
>>> children elements can be added to each record with error
>> information. 
>>> You'll also be able to trace through referential integrity to
>> ensure
>>> it's maintained in subsequent records and then actually import the
>> data
>>> with the PK records going in first.  Additionally, you can test an
>>> entire data set for importation, report back all the errors and let
>> the
>>> user make adjustments as is needed.  It should be very interesting
>> and
>>> fairly simple to implement as the solution is more logic based
>> instead
>>> of code based.
>>>
>>> It would be great if someone could review OFBIZ-851 and maybe add
>> it as
>>> a specialpurpose app so that others can play with it and contribute
>>> ideas.
>>>
>>> --- Jonathon -- Improov <jo...@improov.com> wrote:
>>>
>>>> Chris,
>>>>
>>>> You mean go from MySQL to Xindice to PostgreSQL? Yeah, I know,
>> data
>>>> migration can be a pain, even 
>>>> without data-mapping efforts to go with it (ie, same structure
>>>> migrated to same structure).
>>>
>>
> 
> 


Re: Data Import

Posted by Chris Howe <cj...@yahoo.com>.
Jonathon,

That's interesting.  Until you just said that, it has never even
occurred to me to trust that the data being imported was consistent. 
When I first started playing with php and mysql a few years ago, I had
only used MyISAM tables, which doesn't (or at least didn't, perhaps
still doesn't) handle the enforcement of referential integrity.  So, I
guess that's where my bias lies.  

In regards to the hierarchy, it's already specified in the entity
model.  It just needs to be walked.  In any regards, I'll be playing
with this in the near term as it has several advantages and will help
me with several other ideas that deal with the loads of XML in OFBiz.  

--- Jonathon -- Improov <jo...@improov.com> wrote:

> Chris,
> 
> Usually, in MySQL (or MSSQL or many other RDBMSes), we disable the
> foreign key constraints check 
> first, then dump in all the data, then re-enable the key constraints.
> That way, we don't have to 
> figure out the top-down hierarchy of the database structure and
> insert data from the top to the 
> bottom.
> 
> Does the data import tool allow the above? It's usually just a single
> flag to toggle on/off 
> (foreign_key_checks for MySQL).
> 
> Jonathon
> 
> Chris Howe wrote:
> > The current approach is this
> > 1.Export MySQL (or origin database) to XML files
> > 2.Import the XML files in one at a time and either succeed or fail
> the
> > entire file.
> > 
> > Because of the SAX parser you're limited to in best case scenario
> to
> > what you can fit into memory (because of the data OFBiz's data
> import
> > logic, you're limited to one file at a time, even if two or more
> will
> > fit in memory).  Therefore you're guessing if referential integrity
> is
> > maintained (exists) in subsequent XML data files.  
> > 
> > In the case of Postgres, once OFBiz creates the database schema,
> > postgres handles referential integrity constraints.  Because of
> this,
> > it's not enough just to have the entity engine ignore its error on
> > referential integrity with a dummy key, the dummy key actually has
> to
> > be written to the destination database(which in my experience did
> not
> > happen, I didn't look any deeper into this as a solution because of
> the
> > other remaining issues).  
> > 
> > By reading all of the XML data files into an XML database first,
> > children elements can be added to each record with error
> information. 
> > You'll also be able to trace through referential integrity to
> ensure
> > it's maintained in subsequent records and then actually import the
> data
> > with the PK records going in first.  Additionally, you can test an
> > entire data set for importation, report back all the errors and let
> the
> > user make adjustments as is needed.  It should be very interesting
> and
> > fairly simple to implement as the solution is more logic based
> instead
> > of code based.
> > 
> > It would be great if someone could review OFBIZ-851 and maybe add
> it as
> > a specialpurpose app so that others can play with it and contribute
> > ideas.
> > 
> > --- Jonathon -- Improov <jo...@improov.com> wrote:
> > 
> >> Chris,
> >>
> >> You mean go from MySQL to Xindice to PostgreSQL? Yeah, I know,
> data
> >> migration can be a pain, even 
> >> without data-mapping efforts to go with it (ie, same structure
> >> migrated to same structure).
> > 
> > 
> 
> 


Re: Data Import (was Re: MySQL connection times out, how about PostgreSQL?)

Posted by Jonathon -- Improov <jo...@improov.com>.
Chris,

Usually, in MySQL (or MSSQL or many other RDBMSes), we disable the foreign key constraints check 
first, then dump in all the data, then re-enable the key constraints. That way, we don't have to 
figure out the top-down hierarchy of the database structure and insert data from the top to the 
bottom.

Does the data import tool allow the above? It's usually just a single flag to toggle on/off 
(foreign_key_checks for MySQL).
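That MySQL-style flow looks roughly like this; foreign_key_checks is the real MySQL session variable, but the wrapper function here is purely illustrative:

```python
# Wrap an arbitrary dump in the MySQL foreign_key_checks toggle so rows
# can be inserted in any order. Note that re-enabling the flag only turns
# checking back on for later statements; it does not re-validate rows
# already inserted.

def wrap_dump(statements):
    return (["SET foreign_key_checks = 0;"]
            + list(statements)
            + ["SET foreign_key_checks = 1;"])
```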

Jonathon

Chris Howe wrote:
> The current approach is this
> 1.Export MySQL (or origin database) to XML files
> 2.Import the XML files in one at a time and either succeed or fail the
> entire file.
> 
> Because of the SAX parser you're limited to in best case scenario to
> what you can fit into memory (because of the data OFBiz's data import
> logic, you're limited to one file at a time, even if two or more will
> fit in memory).  Therefore you're guessing if referential integrity is
> maintained (exists) in subsequent XML data files.  
> 
> In the case of Postgres, once OFBiz creates the database schema,
> postgres handles referential integrity constraints.  Because of this,
> it's not enough just to have the entity engine ignore its error on
> referential integrity with a dummy key, the dummy key actually has to
> be written to the destination database(which in my experience did not
> happen, I didn't look any deeper into this as a solution because of the
> other remaining issues).  
> 
> By reading all of the XML data files into an XML database first,
> children elements can be added to each record with error information. 
> You'll also be able to trace through referential integrity to ensure
> it's maintained in subsequent records and then actually import the data
> with the PK records going in first.  Additionally, you can test an
> entire data set for importation, report back all the errors and let the
> user make adjustments as is needed.  It should be very interesting and
> fairly simple to implement as the solution is more logic based instead
> of code based.
> 
> It would be great if someone could review OFBIZ-851 and maybe add it as
> a specialpurpose app so that others can play with it and contribute
> ideas.
> 
> --- Jonathon -- Improov <jo...@improov.com> wrote:
> 
>> Chris,
>>
>> You mean go from MySQL to Xindice to PostgreSQL? Yeah, I know, data
>> migration can be a pain, even 
>> without data-mapping efforts to go with it (ie, same structure
>> migrated to same structure).
> 
> 


Data Import (was Re: MySQL connection times out, how about PostgreSQL?)

Posted by Chris Howe <cj...@yahoo.com>.
The current approach is this:
1. Export MySQL (or origin database) to XML files
2. Import the XML files one at a time and either succeed or fail the
entire file.

Because of the SAX parser, you're limited in the best case to what you
can fit into memory (because of OFBiz's data import logic, you're
limited to one file at a time, even if two or more would fit in
memory).  Therefore you're guessing whether referential integrity is
maintained (exists) in subsequent XML data files.  

In the case of Postgres, once OFBiz creates the database schema,
postgres handles referential integrity constraints.  Because of this,
it's not enough just to have the entity engine ignore its error on
referential integrity with a dummy key, the dummy key actually has to
be written to the destination database (which in my experience did not
happen; I didn't look any deeper into this as a solution because of the
other remaining issues).  

By reading all of the XML data files into an XML database first,
children elements can be added to each record with error information. 
You'll also be able to trace through referential integrity to ensure
it's maintained in subsequent records and then actually import the data
with the PK records going in first.  Additionally, you can test an
entire data set for importation, report back all the errors and let the
user make adjustments as is needed.  It should be very interesting and
fairly simple to implement as the solution is more logic based instead
of code based.

It would be great if someone could review OFBIZ-851 and maybe add it as
a specialpurpose app so that others can play with it and contribute
ideas.
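The whole-data-set pre-validation step described above could look something like this; the entity names and the fk_map shape are assumptions for illustration, not the OFBiz entity model:

```python
# Check referential integrity across a complete data set before importing
# anything, so all errors are reported up front instead of the import
# failing file by file. records is a list of (entity, {field: value})
# pairs; fk_map says which field of which entity points at which target.

def precheck_integrity(records, fk_map):
    known_pks = {(entity, rec["id"]) for entity, rec in records}
    errors = []
    for entity, rec in records:
        for (ent, field), target in fk_map.items():
            value = rec.get(field)
            if ent == entity and value is not None:
                if (target, value) not in known_pks:
                    errors.append("%s %s: %s -> missing %s %s"
                                  % (entity, rec["id"], field, target, value))
    return errors
```

Because the lookup spans every record regardless of which file it came from, a reference satisfied by a later file is no longer reported as an error.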

--- Jonathon -- Improov <jo...@improov.com> wrote:

> Chris,
> 
> You mean go from MySQL to Xindice to PostgreSQL? Yeah, I know, data
> migration can be a pain, even 
> without data-mapping efforts to go with it (ie, same structure
> migrated to same structure).


Re: MySQL connection times out, how about PostgreSQL?

Posted by Jonathon -- Improov <jo...@improov.com>.
Chris,

You mean go from MySQL to Xindice to PostgreSQL? Yeah, I know, data migration can be a pain, even 
without data-mapping efforts to go with it (ie, same structure migrated to same structure).

 > My limited reading through mailing lists on mysql and postgres shows that the
 > timeout handling should be handled by the client (ofbiz) and that mysql
 > decided to provide an additional safeguard by moving an excessive,
 > configurable, "catch-all" timeout to the server.

If you're saying that a DBCP should assume that a DB connection can never time out (always good), 
I rest my case. :) Personally, I think MySQL could've provided the option to drop the time out 
feature.

FYI to all. If Hans is right, it appears OFBiz's DBCP XAPoolDataSource does handle connection time 
out after all! I'll check it out.

Anyway, the fix, as mentioned by Hans, can be done in MinervaConnectionFactory. I'm gonna try just 
that. Problem solved. On with real work. You're right, OFBiz does abstract tons of infrastructure 
details away from application developers.

It would be good if we could get our hands on the XAPoolDataSource source code, just to confirm that 
the issue is fixed.
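The validation-query behaviour being discussed can be sketched roughly like this; the pool API and the "SELECT 1" probe are generic, not Minerva's actual internals:

```python
# A pool that probes each idle connection with a cheap validation query
# before handing it out; stale connections (e.g. ones MySQL dropped after
# its 8-hour wait_timeout) are discarded and replaced with fresh ones.

class ValidatingPool:
    def __init__(self, connect, validation_query="SELECT 1"):
        self._connect = connect          # factory for fresh connections
        self._query = validation_query
        self._idle = []

    def checkout(self):
        while self._idle:
            conn = self._idle.pop()
            try:
                conn.execute(self._query)   # probe: is the server still there?
                return conn
            except Exception:
                pass                        # timed out server-side: discard it
        return self._connect()              # no usable idle conn: make a new one

    def checkin(self, conn):
        self._idle.append(conn)
```

This is the behaviour Tomcat's DBCP gives you via validationQuery: even if every pooled connection has been idle past the server timeout, the next checkout transparently replaces them.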

Jonathon

Chris Howe wrote:
> Most of the data migration issues should be solvable by reading the
> migration data files (entity-engine-xml) into a Xindice XML database
> and running ofbiz services from there.  This will allow XPath queries
> for the hierarchical stuff, and XUpdate of children elements for error
> messages and thus group error handling using XPath.  I hope to be able
> to get some of this together shortly as we've been using OFBiz to
> slowly bring our legacy data in line with the OFBiz data model and the
> legacy data is still in MySQL (and much more critical and can't really
> take the risk that I "think" I changed the right data).
> 
> If you don't have data, there shouldn't be any issue.  Simply run the
> ant run-install with a delegator calling the postgres datasource.  The
> impression I get is that most of the community in production is using
> postgres unless they've purchased a license to a mssql or oracle.
> 
> My limited reading through mailing lists on mysql and postgres shows
> that the timeout handling should be handled by the client (ofbiz) and
> that mysql decided to provide an additional safeguard by moving an
> excessive, configurable, "catch-all" timeout to the server.
> 
> One of my favorite things about OFBiz is that you can get pretty far in
> developing something useful without ever knowing or worrying about
> database administration like this and can come back and address it when
> you're fine tuning your deployment for bottlenecks before going live.
> 
> --- Jonathon -- Improov <jo...@improov.com> wrote:
> 
>> Chris,
>>
>> Wow, thanks. You sound like you've really been through it (the
>> migration).
>>
>> Seems like quite a pain migrating to PostgreSQL from MySQL. But what
>> if I don't have data to 
>> migrate? What if I just start over with PostgreSQL? Any problems?
>> I'll take your advice regarding 
>> the migration gotchas. Thanks.
>>
>> Actually, about DBCPs. For RDBMSes, it's correct for database
>> connections to time out after a set 
>> interval of inactivity. That's just prudence. When connections are
>> used inside of DBCPs, it is the 
>> DBCP's responsibility to refresh timed out connections in the pool. I
>> did my very first very own 
>> DBCP more than a decade ago, and that was one of the must-have
>> functionalities for a DBCP. I was 
>> beaten up real bad for missing that out. :)
>>
>> The fix for this, in OFBiz, is in OFBiz's DBCP --- XAPoolDataSource.
>> I posted another message 
>> asking for the source code to ofbiz-minerva.jar.
>>
>> Jonathon
>>
>> Chris Howe wrote:
>>> This does not occur in PostgreSQL.  It is a "feature" of MySQL and
>> they
>>> (mysql) will smugly say that OFBiz doesn't handle the connection
>> pool
>>> correctly. I don't know and don't really care to know if it's true
>> or
>>> not.  I switched over about 2 months ago and have had smooth
>> sailing
>>> since (even seemingly eliminated that UserLoginHistory bug that
>> you're
>>> aware of).  
>>>
>>> Be warned, it's a bit of a pain to convert from MySQL to Postgres. 
>>> Most of the issues seem to be how lax MySQL is with data and how
>>> stringent PostgreSQL is(at least the default installation).  These
>> were
>>> some of the issues I came across with my data using the
>> export/import
>>> in webtools
>>>
>>> 1. the createdBy fields in the various entities weren't in the
>> correct
>>> case (i believe this has been solved in OFBiz, I just had data that
>>> predated the fix)
>>> 2. UserLogin and Party entites end up with a circular dependency
>> based
>>> on the partyId admin if the UserLogin admin created parties. 
>> Either
>>> load the single Party record for partyId before loading the
>> UserLogin
>>> entity or remove the createdBy data from the Party entity
>>> 3. Hierarchical parent->child relationships.  This occurs with the
>> *Type
>>> entities.  They simply need to be loaded in the correct order. 
>> There
>>> is a JIRA issue which solves this problem for about the *Type
>> entities
>>> where the child is childId and the parent is parentChildId (e.g.
>>> partyTypeId -> parentPartyTypeId)
>>>
>>> There may have been other referential integrity issues, but I think
>>> they were self created and not created by OFBiz.
>>>
>>> --- Jonathon -- Improov <jo...@improov.com> wrote:
>>>
>>>> MySQL connections in the database connection pool time out
>> (usually
>>>> set to 8 hours). Possible 
>>>> workarounds(?):
>>>>
>>>> 1. Increase the time out value and hope that someone will connect
>> to
>>>> OFBiz
>>>>     before the connections time out.
>>>>
>>>> 2. Fix OFBiz to allow to a "validateQuery" mechanism.
>>>>
>>>> 3. Use PostgreSQL.
>>>>
>>>> In Tomcat, we usually use the "validateQuery" so the DBCP will
>> test
>>>> each connection before giving 
>>>> it to the application. If all connections in the pool has timed
>> out
>>>> (say no one has accessed OFBiz 
>>>> in 8 hours), the DBCP creates new connections for the pool.
>>>>
>>>> If someone will tell me that this doesn't happen for PostgreSQL,
>> I'll
>>>> simply make the switch to 
>>>> PostgreSQL rather than fix things in OFBiz for MySQL.
>>>>
>>>> Thanks.
>>>>
>>>> Jonathon
>>>>
>>>
>>
> 
> 


Re: MySQL connection times out, how about PostgreSQL?

Posted by Chris Howe <cj...@yahoo.com>.
Most of the data migration issues should be solvable by reading the
migration data files (entity-engine-xml) into a Xindice XML database
and running ofbiz services from there.  This would allow XPath queries
for the hierarchical stuff, plus XUpdate of child elements for error
messages, and thus grouped error handling using XPath.  I hope to get
some of this together shortly, as we've been using OFBiz to slowly
bring our legacy data in line with the OFBiz data model. The legacy
data is still in MySQL (and is much more critical, so I can't really
take the risk that I only "think" I changed the right data).
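The XPath part of this idea can be sketched without Xindice at all: the snippet below just parses an entity-engine-xml string with the JDK's own DOM/XPath classes and pulls out the *Type rows that have no parent (the ones that must be loaded first). The element and attribute names (PartyType, parentPartyTypeId) follow the naming used in this thread; treat them as illustrative.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Finds *Type roots (rows with no parent attribute) in an
 *  entity-engine-xml document; roots must be loaded before children. */
public class TypeLoadOrder {
    public static List<String> rootTypeIds(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // XPath: every PartyType element with no parentPartyTypeId attribute.
        NodeList roots = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//PartyType[not(@parentPartyTypeId)]", doc,
                          XPathConstants.NODESET);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < roots.getLength(); i++) {
            ids.add(roots.item(i).getAttributes()
                    .getNamedItem("partyTypeId").getNodeValue());
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<entity-engine-xml>"
                + "<PartyType partyTypeId=\"PARTY\"/>"
                + "<PartyType partyTypeId=\"PERSON\" parentPartyTypeId=\"PARTY\"/>"
                + "<PartyType partyTypeId=\"PARTY_GROUP\" parentPartyTypeId=\"PARTY\"/>"
                + "</entity-engine-xml>";
        System.out.println(rootTypeIds(xml));   // the roots to load first
    }
}
```

A full solution would repeat the query per level (children whose parents are already loaded), but the root query above is the core of it.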

If you don't have data, there shouldn't be any issue.  Simply run
"ant run-install" with a delegator calling the postgres datasource.  The
impression I get is that most of the community in production is using
postgres unless they've purchased a license for MSSQL or Oracle.
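For reference, pointing the delegator at postgres is an entityengine.xml change along these lines. This is only a sketch: the datasource name, JDBC URI, and credentials are illustrative, and most datasource attributes are omitted; start from the stock "localpostgres" entry in your own entityengine.xml rather than this.

```xml
<!-- entityengine.xml sketch; names, URI, and credentials are illustrative -->
<delegator name="default" entity-model-reader="main" entity-group-reader="main">
    <group-map group-name="org.ofbiz" datasource-name="localpostgres"/>
</delegator>

<datasource name="localpostgres" field-type-name="postgres" join-style="ansi">
    <!-- other datasource attributes omitted; see the stock entry -->
    <inline-jdbc
            jdbc-driver="org.postgresql.Driver"
            jdbc-uri="jdbc:postgresql://127.0.0.1/ofbiz"
            jdbc-username="ofbiz"
            jdbc-password="ofbiz"
            isolation-level="ReadCommitted"
            pool-minsize="2"
            pool-maxsize="250"/>
</datasource>
```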

My limited reading through mailing lists on mysql and postgres shows
that timeout handling should be done by the client (ofbiz), and
that mysql decided to provide an additional safeguard by moving an
excessive, configurable, "catch-all" timeout to the server.

One of my favorite things about OFBiz is that you can get pretty far in
developing something useful without ever knowing or worrying about
database administration issues like this, and can come back and address
them when you're fine-tuning your deployment for bottlenecks before
going live.

--- Jonathon -- Improov <jo...@improov.com> wrote:

> Chris,
> 
> Wow, thanks. You sound like you've really been through it (the
> migration).
> 
> Seems like quite a pain migrating to PostgreSQL from MySQL. But what
> if I don't have data to 
> migrate? What if I just start over with PostgreSQL? Any problems?
> I'll take your advice regarding 
> the migration gotchas. Thanks.
> 
> Actually, about DBCPs. For RDBMSes, it's correct for database
> connections to time out after a set 
> interval of inactivity. That's just prudence. When connections are
> used inside of DBCPs, it is the 
> DBCP's responsibility to refresh timed out connections in the pool. I
> did my very first very own 
> DBCP more than a decade ago, and that was one of the must-have
> functionalities for a DBCP. I was 
> beaten up real bad for missing that out. :)
> 
> The fix for this in OFBiz is in OFBiz's DBCP, XAPoolDataSource.
> I posted another message
> asking for the source code of ofbiz-minerva.jar.
> 
> Jonathon
> 


Re: MySQL connection times out, how about PostgreSQL?

Posted by Jonathon -- Improov <jo...@improov.com>.
Chris,

Wow, thanks. You sound like you've really been through it (the migration).

Seems like quite a pain migrating to PostgreSQL from MySQL. But what if I don't have data to 
migrate? What if I just start over with PostgreSQL? Any problems? I'll take your advice regarding 
the migration gotchas. Thanks.

Actually, about DBCPs: for RDBMSes, it's correct for database connections
to time out after a set interval of inactivity. That's just prudence. When
connections are used inside DBCPs, it is the DBCP's responsibility to
refresh timed-out connections in the pool. I wrote my very first DBCP more
than a decade ago, and that was one of the must-have functionalities for a
DBCP. I was beaten up real bad for missing that out. :)
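The refresh-on-borrow behavior described above (what Tomcat's DBCP gets from "validationQuery", e.g. "SELECT 1") boils down to a few lines. This is a generic sketch, not OFBiz or Minerva code; the validator stands in for running the validation query against a real connection.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Minimal sketch of a validate-on-borrow connection pool. */
public class ValidatingPool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory;      // opens a fresh connection
    private final Predicate<C> validator;   // e.g. runs "SELECT 1"

    public ValidatingPool(Supplier<C> factory, Predicate<C> validator) {
        this.factory = factory;
        this.validator = validator;
    }

    public synchronized C borrow() {
        // Discard idle connections that fail validation (e.g. closed
        // server-side after MySQL's wait_timeout) and fall back to a
        // freshly created one.
        while (!idle.isEmpty()) {
            C c = idle.poll();
            if (validator.test(c)) {
                return c;
            }
        }
        return factory.get();
    }

    public synchronized void release(C c) {
        idle.push(c);
    }
}
```

With this in place, a pool whose every connection has timed out overnight simply rebuilds itself on the first morning request instead of handing the application a dead connection.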

The fix for this in OFBiz is in OFBiz's DBCP, XAPoolDataSource. I posted
another message asking for the source code of ofbiz-minerva.jar.

Jonathon

Chris Howe wrote:
> This does not occur in PostgreSQL.  It is a "feature" of MySQL and they
> (mysql) will smugly say that OFBiz doesn't handle the connection pool
> correctly. I don't know and don't really care to know if it's true or
> not.  I switched over about 2 months ago and have had smooth sailing
> since (even seemingly eliminated that UserLoginHistory bug that you're
> aware of).  
> 
> Be warned, it's a bit of a pain to convert from MySQL to Postgres.
> Most of the issues stem from how lax MySQL is with data and how
> stringent PostgreSQL is (at least in the default installation).  These
> were some of the issues I came across with my data, using the
> export/import in webtools:
> 
> 1. The createdBy fields in the various entities weren't in the correct
> case (I believe this has been solved in OFBiz, I just had data that
> predated the fix)
> 2. UserLogin and Party entities end up with a circular dependency based
> on the partyId "admin" if the UserLogin "admin" created parties.  Either
> load the single Party record for that partyId before loading the
> UserLogin entity, or remove the createdBy data from the Party entity
> 3. Hierarchical parent->child relationships.  These occur with the *Type
> entities.  They simply need to be loaded in the correct order.  There
> is a JIRA issue which solves this problem for the *Type entities
> where the child is childId and the parent is parentChildId (e.g.
> partyTypeId -> parentPartyTypeId)
> 
> There may have been other referential integrity issues, but I think
> they were self created and not created by OFBiz.
> 


Re: MySQL connection times out, how about PostgreSQL?

Posted by Chris Howe <cj...@yahoo.com>.
This does not occur in PostgreSQL.  It is a "feature" of MySQL and they
(mysql) will smugly say that OFBiz doesn't handle the connection pool
correctly. I don't know and don't really care to know if it's true or
not.  I switched over about 2 months ago and have had smooth sailing
since (even seemingly eliminated that UserLoginHistory bug that you're
aware of).  

Be warned, it's a bit of a pain to convert from MySQL to Postgres.
Most of the issues stem from how lax MySQL is with data and how
stringent PostgreSQL is (at least in the default installation).  These
were some of the issues I came across with my data, using the
export/import in webtools:

1. The createdBy fields in the various entities weren't in the correct
case (I believe this has been solved in OFBiz, I just had data that
predated the fix)
2. UserLogin and Party entities end up with a circular dependency based
on the partyId "admin" if the UserLogin "admin" created parties.  Either
load the single Party record for that partyId before loading the
UserLogin entity, or remove the createdBy data from the Party entity
3. Hierarchical parent->child relationships.  These occur with the *Type
entities.  They simply need to be loaded in the correct order.  There
is a JIRA issue which solves this problem for the *Type entities
where the child is childId and the parent is parentChildId (e.g.
partyTypeId -> parentPartyTypeId)
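Point 3 in practice is just ordering within the seed file: parents before children, so each parent foreign key already resolves. The PartyType rows below are hypothetical and follow the parentPartyTypeId naming used in this thread (stock OFBiz data may name the field differently).

```xml
<!-- Parents first, so parentPartyTypeId resolves when children load. -->
<entity-engine-xml>
    <PartyType partyTypeId="PARTY"/>
    <PartyType partyTypeId="PERSON" parentPartyTypeId="PARTY"/>
    <PartyType partyTypeId="PARTY_GROUP" parentPartyTypeId="PARTY"/>
</entity-engine-xml>
```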

There may have been other referential integrity issues, but I think
they were self created and not created by OFBiz.

--- Jonathon -- Improov <jo...@improov.com> wrote:

> MySQL connections in the database connection pool time out (usually
> set to 8 hours). Possible 
> workarounds(?):
> 
> 1. Increase the time out value and hope that someone will connect to
> OFBiz
>     before the connections time out.
> 
> 2. Fix OFBiz to allow a "validateQuery" mechanism.
> 
> 3. Use PostgreSQL.
> 
> In Tomcat, we usually use the "validateQuery" so the DBCP will test
> each connection before giving it to the application. If all connections
> in the pool have timed out (say no one has accessed OFBiz in 8 hours),
> the DBCP creates new connections for the pool.
> 
> If someone will tell me that this doesn't happen for PostgreSQL, I'll
> simply make the switch to 
> PostgreSQL rather than fix things in OFBiz for MySQL.
> 
> Thanks.
> 
> Jonathon
>