Posted to user@cayenne.apache.org by Giulio Cesare Solaroli <gi...@gmail.com> on 2007/09/21 16:31:20 UTC

Optimize cascade deletes

Hello everybody,

I am facing a performance bottleneck while committing the DataContext
when many "cascade" rules are applied to the relevant relationships.

The scenario is quite simple: my DB schema has a core entity, the
User; this entity has many pieces of subordinate information that are
collected in order to provide a useful service.

When I try to delete a user, all the cascade rules are correctly
applied, but the whole process takes about 7 minutes for a regular
account. This is much more than we can bear for any operation.

My understanding is that Cayenne, in order to perform a cascade
delete, loads all the objects matched by the given relationship and
deletes them one at a time. This works beautifully when I have a few
records attached to the main record, but it fails miserably when
the number of dependent records grows considerably.
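
(For reference, the object-level cascade delete described here amounts
to two calls; a minimal sketch, where "context" and "user" are assumed
to be an existing DataContext and a fetched User object:)

    // Cayenne applies the "cascade" delete rules from the model, faulting
    // each dependent object into memory before marking it as deleted.
    context.deleteObject(user);
    // At commit time the dependent rows are removed with per-table batch
    // DELETE statements, one bind per row (as the logs later in this
    // thread show).
    context.commitChanges();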

Looking into the documentation I found a very promising
"BatchDeleteQuery", but I have not understood how I am supposed to use
it. What I would like to do is use the query structure stored in
the model to tell Cayenne which objects to delete in batch.

The SQL I am expecting Cayenne to generate would look like this:

delete from user_detail where id_user = <id of the user I want to delete>;


Am I barking up the wrong tree, or is BatchDeleteQuery the right place
to look to achieve this result?
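
(For what it's worth, on Cayenne 2.0.x a raw SQLTemplate can issue
exactly the statement above without faulting the dependent objects
first; a minimal sketch, where the UserDetail class and the "userId"
parameter are assumed names; imports: org.apache.cayenne.query.SQLTemplate,
java.util.Collections:)

    // Bulk delete via raw SQL; this bypasses the object graph, so any
    // UserDetail objects already registered in the DataContext will not
    // see the change.
    SQLTemplate delete = new SQLTemplate(UserDetail.class,
        "DELETE FROM user_detail WHERE id_user = #bind($userId)");
    delete.setParameters(Collections.singletonMap("userId", userId));
    context.performNonSelectingQuery(delete);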

Thank you in advance for any comment or suggestion.

Best regards,

Giulio Cesare

Re: Optimize cascade deletes

Posted by Giulio Cesare Solaroli <gi...@gmail.com>.
Hello Tore,

On 10/2/07, Tore Halset <ha...@pvv.ntnu.no> wrote:
> Hello.
>
> This does not sound like a Cayenne issue any more.

Definitely.


> Here is what I would do to get going.
>
>   * Remove the extra OS layer if you are still using it. Was it
> Parallels?

Parallels could alter the absolute values of the operation timings,
but I have seen a consistent difference, between 20 and 100 times
faster, on the newly created table; this makes me think that the
problem is not bound to the use of a virtual machine.


>   * Update to latest PostgreSQL and JDBC driver.

I may try to update to the latest PostgreSQL version, but it is not a
JDBC issue as I got the same problem issuing the query right from
psql.


>   * Use vacuum analyze and autovacuum

I have already tried this.


>   * Find slow queries and use "explain <query>"

The slow query is a simple delete on a table accessed using the
primary key. The explain plan reports an index scan, as expected.


>   * Make sure PostgreSQL is set up with enough memory.

The current instance is probably running with a very low memory
footprint, but if a copy of the same table on the same PostgreSQL
instance runs the same query 20 to 100 times faster, there is
something else to fix before tuning the memory.


>   * Ask on pgsql-performance@postgresql.org

Great pointer. Thanks!!!! :-)

Best regards,

Giulio Cesare





Re: Optimize cascade deletes

Posted by Tore Halset <ha...@pvv.ntnu.no>.
Hello.

This does not sound like a Cayenne issue any more. Here is what I
would do to get going.

  * Remove the extra OS layer if you are still using it. Was it
Parallels?
  * Update to latest PostgreSQL and JDBC driver.
  * Use vacuum analyze and autovacuum
  * Find slow queries and use "explain <query>" (see the sketch below)
  * Make sure PostgreSQL is set up with enough memory.
  * Ask on pgsql-performance@postgresql.org
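
(A minimal JDBC sketch of the "explain" suggestion, with "connection"
assumed and a table name and id taken from the logs earlier in the
thread; plain EXPLAIN only plans the statement, it does not execute
the DELETE; imports: java.sql.Statement, java.sql.ResultSet:)

    // Print PostgreSQL's plan for one of the slow deletes.
    Statement st = connection.createStatement();
    ResultSet rs = st.executeQuery(
        "EXPLAIN DELETE FROM clipperz.RCRVRS WHERE ID_RCRVRS = 1433");
    while (rs.next()) {
        System.out.println(rs.getString(1)); // one plan line per row
    }
    rs.close();
    st.close();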

Regards,
  - Tore.



Re: Optimize cascade deletes

Posted by Giulio Cesare Solaroli <gi...@gmail.com>.
Yes,

I have vacuumed it and also rebuilt the indexes, to no avail. :-(

I am not a PostgreSQL expert (and not even an expert DBA in general),
so it is very possible that I am doing something very wrong.

This is both good news (the application should be fine, and I don't
need to tune it much further) and bad news (I need to learn how to
tune a PostgreSQL DB, or find someone who can help).

Best regards,

Giulio Cesare





Re: Optimize cascade deletes

Posted by Michael Gentry <bl...@gmail.com>.
Have you vacuumed the original (slow) table?

/dev/mrg


Re: Optimize cascade deletes

Posted by Giulio Cesare Solaroli <gi...@gmail.com>.
Hello everybody,

today I have been able to do some more tests, and I have found out
that the problem seems to be caused by "fragmentation" of the data
in the PostgreSQL table space.

I have created a copy of the table where delete statements were very
slow (using the "create table .... as select from ..." syntax),
matching the structure of the new copy to the original one (index-
and constraint-wise); in this new table the delete statements were
from 20 to 100 times faster than in the original table. :-(

This rules out Cayenne as the culprit, but leaves me wondering how to
avoid this degradation of performance on PostgreSQL; but this is
probably not the right place to start.
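
(For reference, the standard PostgreSQL remedies for this kind of
table bloat rewrite the table and its indexes in place; a minimal JDBC
sketch, with "connection" assumed and a table name taken from the logs
earlier in the thread:)

    // VACUUM cannot run inside a transaction block, so enable autocommit.
    connection.setAutoCommit(true);
    Statement st = connection.createStatement();
    st.execute("VACUUM FULL clipperz.RCRVRS");   // compact the table, reclaiming dead space
    st.execute("REINDEX TABLE clipperz.RCRVRS"); // rebuild all of its indexes from scratch
    st.close();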

Thank you everybody for your attention and for the very useful
suggestions and pointers that greatly helped me in understanding this
problem.

Best regards,

Giulio Cesare








Re: Optimize cascade deletes

Posted by Giulio Cesare Solaroli <gi...@gmail.com>.
Hello,

I have collected some more data, as suggested by Aristedes.

On 9/23/07, Giulio Cesare Solaroli <gi...@gmail.com> wrote:
> On 9/23/07, Aristedes Maniatis <ar...@maniatis.org> wrote:
> >
> > On 23/09/2007, at 5:38 PM, Giulio Cesare Solaroli wrote:
> >
> > > log4j.logger.org.apache.cayenne=ERROR
> > > log4j.logger.org.apache.cayenne.access.QueryLogger=DEBUG
> > >
> > >
> > > Is there anything going on in the Cayenne code between the last [batch
> > > bind:xxx] log and the "updated" log?
> >
> > Well, you could always try:
> >
> > log4j.logger.org.apache.cayenne=DEBUG
>
> Argh. I should have thought about this myself. :-(

Even with full logging enabled I got the exact same log, so it looks
like all the time is really spent in the DB. And the next logs confirm
this.


> > > Is all this time spent on the DB only?
> >
> > What does turning on log_min_duration_statement tell you?

I have been able to run PostgreSQL with logging enabled only on my
Parallels instance, as I cannot start/stop the DB on the deployment
server; this makes the absolute times not that relevant, but I
hope we can understand something more anyway.

First of all, there is a huge difference between the deletion times of
the "cascade" records in different tables.

[...]
LOG:  duration: 0.020 ms  statement: EXECUTE <unnamed>  [PREPARE:
DELETE FROM connection.USRCNNRQS WHERE ID_USRCNNRQS = $1]
LOG:  duration: 0.021 ms  statement: EXECUTE <unnamed>  [PREPARE:
DELETE FROM connection.USRCNNRQS WHERE ID_USRCNNRQS = $1]
LOG:  duration: 0.020 ms  statement: EXECUTE <unnamed>  [PREPARE:
DELETE FROM connection.USRCNNRQS WHERE ID_USRCNNRQS = $1]
[...]

To delete rows from the USRCNNRQS (aka UserConnectionRequest) table, it
averages about 0.020 ms.


For another table (USRCNN aka UserConnection) the times range from
0.145 to 0.060 ms, later settling at 0.040 ms:

[...]
LOG:  duration: 0.090 ms  statement: EXECUTE <unnamed>  [PREPARE:
DELETE FROM connection.USRCNN WHERE ID_USRCNN = $1]
LOG:  duration: 0.071 ms  statement: EXECUTE <unnamed>  [PREPARE:
DELETE FROM connection.USRCNN WHERE ID_USRCNN = $1]
LOG:  duration: 0.067 ms  statement: EXECUTE <unnamed>  [PREPARE:
DELETE FROM connection.USRCNN WHERE ID_USRCNN = $1]
[...]


It later took 43 ms (!!) to parse and execute the deletion of a
single record, ...
[...]
LOG:  statement: PREPARE S_5 AS DELETE FROM clipperz.RCR WHERE ID_RCR = $1
LOG:  statement: <BIND>
LOG:  statement: EXECUTE <unnamed>  [PREPARE:  DELETE FROM
clipperz.RCR WHERE ID_RCR = $1]
LOG:  duration: 42.998 ms  statement: EXECUTE <unnamed>  [PREPARE:
DELETE FROM clipperz.RCR WHERE ID_RCR = $1]
LOG:  statement: <BIND>
LOG:  statement: EXECUTE <unnamed>  [PREPARE:  DELETE FROM
clipperz.RCR WHERE ID_RCR = $1]
[...]

...later showing times as low as 0.042 ms for the execution of the same query:
[...]
LOG:  duration: 0.042 ms  statement: EXECUTE <unnamed>  [PREPARE:
DELETE FROM clipperz.RCR WHERE ID_RCR = $1]
LOG:  statement: <BIND>
LOG:  statement: EXECUTE <unnamed>  [PREPARE:  DELETE FROM
clipperz.RCR WHERE ID_RCR = $1]
[...]

While I can accept that this variable timing may be due to the
interaction of the virtual machine with the real server, what is
puzzling me is that deletions from the last table had a
"constant" time of about 2 seconds each:
[...]
LOG:  duration: 1929.043 ms  statement: EXECUTE <unnamed>  [PREPARE:
DELETE FROM clipperz.RCRVRS WHERE ID_RCRVRS = $1]
LOG:  duration: 1991.311 ms  statement: EXECUTE <unnamed>  [PREPARE:
DELETE FROM clipperz.RCRVRS WHERE ID_RCRVRS = $1]
LOG:  duration: 2078.249 ms  statement: EXECUTE <unnamed>  [PREPARE:
DELETE FROM clipperz.RCRVRS WHERE ID_RCRVRS = $1]
LOG:  duration: 2015.047 ms  statement: EXECUTE <unnamed>  [PREPARE:
DELETE FROM clipperz.RCRVRS WHERE ID_RCRVRS = $1]
[...]


If nobody has something else to suggest, I will go back to the
DB schema to see if there is some constraint/index that is slowing
down the deletion of these records.
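
(One classic cause of this pattern in PostgreSQL is a foreign key in
some other table that references the slow table without an index on
the referencing column: each deleted row then forces a sequential scan
of the referencing table during the constraint check.)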

Thank you very much for your attention.

Best regards,

Giulio Cesare

Re: Optimize cascade deletes

Posted by Giulio Cesare Solaroli <gi...@gmail.com>.
On 9/23/07, Aristedes Maniatis <ar...@maniatis.org> wrote:
>
> On 23/09/2007, at 5:38 PM, Giulio Cesare Solaroli wrote:
>
> > log4j.logger.org.apache.cayenne=ERROR
> > log4j.logger.org.apache.cayenne.access.QueryLogger=DEBUG
> >
> >
> > Is there anything going on in the Cayenne code between the last [batch
> > bind:xxx] log and the "updated" log?
>
> Well, you could always try:
>
> log4j.logger.org.apache.cayenne=DEBUG

Argh. I should have thought about this myself. :-(


> > Is all this time spent on the DB only?
>
> What does turning on log_min_duration_statement tell you?

Wow!! This is a great reference for PostgreSQL debugging!!

I will try to apply both suggestions and run the tests again.

Thank you very much for your attention.

Best regards,

Giulio Cesare



Re: Optimize cascade deletes

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 23/09/2007, at 5:38 PM, Giulio Cesare Solaroli wrote:

> log4j.logger.org.apache.cayenne=ERROR
> log4j.logger.org.apache.cayenne.access.QueryLogger=DEBUG
>
>
> Is there anything going on in the Cayenne code between the last [batch
> bind:xxx] log and the "updated" log?

Well, you could always try:

log4j.logger.org.apache.cayenne=DEBUG

> Is all this time spent on the DB only?

What does turning on log_min_duration_statement tell you?
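
(For reference, this setting can also be enabled for a single session
without touching postgresql.conf; a minimal JDBC sketch, with
"connection" assumed and superuser privileges required:)

    // Log every statement of this session together with its duration;
    // a threshold of 0 ms means everything gets logged.
    Statement st = connection.createStatement();
    st.execute("SET log_min_duration_statement = 0");
    st.close();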

Ari Maniatis



-------------------------->
Aristedes Maniatis
phone +61 2 9660 9700
PGP fingerprint 08 57 20 4B 80 69 59 E2  A9 BF 2D 48 C2 20 0C C8



Re: Optimize cascade deletes

Posted by Giulio Cesare Solaroli <gi...@gmail.com>.
Hello Aristedes,

On 9/23/07, Aristedes Maniatis <ar...@maniatis.org> wrote:
>
> On 22/09/2007, at 11:41 PM, Giulio Cesare Solaroli wrote:
>
> > 15 seconds are a lot of time, but they would be bearable. The problem
> > is that I suppose this time will gradually increase with the growth of
> > the number of records to delete.
>
> That is still about 1000 times the duration I'd expect this to take.
> There is still a significant problem which you might be able to
> identify with some profiling or logging.

[15:21:12] DELETE FROM clipperz.RCRVRS WHERE ID_RCRVRS = ?
[15:21:12] [batch bind: 1433]
[15:21:12] [batch bind: 797]
[....]
[15:21:12] [batch bind: 55637]
[15:21:12] [batch bind: 51363]
[15:21:27] === updated 182 rows.

the above log was generated using these log4j preferences:

log4j.logger.org.apache.cayenne=ERROR
log4j.logger.org.apache.cayenne.access.QueryLogger=DEBUG


Is there anything going on in the Cayenne code between the last [batch
bind:xxx] log and the "updated" log?

Is all this time spent on the DB only?

Thanks for your feedback.

Best regards,

Giulio Cesare

Re: Optimize cascade deletes

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 22/09/2007, at 11:41 PM, Giulio Cesare Solaroli wrote:

> 15 seconds is a lot of time, but it would be bearable. The problem
> is that I suppose this time will gradually increase with the growth of
> the number of records to delete.

That is still about 1000 times the duration I'd expect this to take.  
There is still a significant problem which you might be able to  
identify with some profiling or logging.

Ari



-------------------------->
Aristedes Maniatis
phone +61 2 9660 9700
PGP fingerprint 08 57 20 4B 80 69 59 E2  A9 BF 2D 48 C2 20 0C C8



Re: Optimize cascade deletes

Posted by Giulio Cesare Solaroli <gi...@gmail.com>.
Hello Andrus,

thank you very much for the pointers; I will start digging into them soon.

Best regards,

Giulio Cesare




Re: Optimize cascade deletes

Posted by Andrus Adamchik <an...@objectstyle.org>.
On Sep 22, 2007, at 4:41 PM, Giulio Cesare Solaroli wrote:
>
>>> The SQL I am expecting Cayenne to generate would look like this:
>>>
>>> delete from user_detail where id_user = <id of the user I want to
>>> delete>;
>>
>> Cayenne 3.0M2 (yet unreleased) supports such functionality via an
>> EJBQL query, but maybe you don't need to do direct DB deletion  
>> just yet.
>
> Where should I start looking to try to integrate the new revision of
> Cayenne in my project in order to take advantage of this feature?


Here are a few links that should help you get started:

* Today's nightly build: http://people.apache.org/~aadamchik/nightly/09222007/
* The jars from the build that you need (cayenne-server.jar +
dependencies): http://cayenne.apache.org/doc/jar-files-and-dependencies.html
* EJBQLQuery documentation: http://cayenne.apache.org/doc/ejbqlquery.html
* Additional details of EJBQL syntax can be found in the JPA spec,
Chapter 4: http://jcp.org/aboutJava/communityprocess/final/jsr220/index.html

Your query may look like this:

   String ejbql = "delete from UserDetail ud where ud.user = :user";
   EJBQLQuery query = new EJBQLQuery(ejbql);
   query.setParameter("user", user);
   context.performGenericQuery(query);
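
(Note that such an EJBQL bulk delete runs directly against the
database: objects already registered in the DataContext are not
updated, so any cached UserDetail instances would need to be
invalidated separately.)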

Andrus




Re: Optimize cascade deletes

Posted by Giulio Cesare Solaroli <gi...@gmail.com>.
On 9/22/07, Andrus Adamchik <an...@objectstyle.org> wrote:
>
> On Sep 21, 2007, at 7:45 PM, Giulio Cesare Solaroli wrote:
>
> >
> > [18:09:40] DELETE FROM clipperz.RCRVRS WHERE ID_RCRVRS = ?
> > [18:09:40] [batch bind: 968]
> > [18:09:40] [batch bind: 875]
> > [......]
> > [18:09:40] [batch bind: 1177]
> > [18:09:40] [batch bind: 2792]
> > [18:11:54] === updated 68 rows.
> >
> > In this case, to delete just 68 rows it took more than two minutes,
> > but I don't know if this time is spent somewhere inside the Cayenne
> > code, or if this is simply the time it took PostgreSQL to "physically"
> > delete the rows.
>
> Initially I suspected performance problems with faulting the
> relationships, but if the delete itself is so slow, it points to
> PostgreSQL as the bottleneck as Ari said (coincidentally the query
> above is a BatchDeleteQuery generated by Cayenne). If it takes that
> long, something is wrong with the DB. I second the suggestion to try
> installing PostgreSQL on the Mac and running it without Parallels.

I have been able to run the same test on the deployment server, which
is running in a Solaris zone c/o Joyent, and the issue is much
smaller:

[15:21:12] DELETE FROM clipperz.RCRVRS WHERE ID_RCRVRS = ?
[15:21:12] [batch bind: 1433]
[15:21:12] [batch bind: 797]
[....]
[15:21:12] [batch bind: 55637]
[15:21:12] [batch bind: 51363]
[15:21:27] === updated 182 rows.

So, to delete 182 rows it took about 15 seconds. I was a little
skeptical, but it really seems that even with such small operations,
running inside a virtual machine can make a huge difference for
the DB engine.

15 seconds is a lot of time, but it would be bearable. The problem
is that I suppose this time will gradually increase with the growth of
the number of records to delete.

And if I end up needing to delete thousands of records, the
outcome will be unbearable.

So I can probably live with these limits for a few more weeks, but I
need to take action immediately to fix the problem for good.



> > The SQL I am expecting Cayenne to generate would look like this:
> >
> > delete from user_detail where id_user = <id of the user I want to
> > delete>;
>
> Cayenne 3.0M2 (yet unreleased) supports such functionality via an
> EJBQL query, but maybe you don't need to do direct DB deletion just yet.

Where should I start looking to try to integrate the new revision of
Cayenne in my project in order to take advantage of this feature?

Thank you very much for your support.

Best regards,

Giulio Cesare

Re: Optimize cascade deletes

Posted by Giulio Cesare Solaroli <gi...@gmail.com>.
Hello,

On 9/22/07, Michael Gentry <bl...@gmail.com> wrote:
> FWIW:
>
> I run PostgreSQL on OS X and it is plenty fast enough.  The downside
> to PostgreSQL on OS X is there isn't an installer, so it is more work
> to build and install.  If you are Unix-savvy, though, this won't be
> much of a problem.

I have avoided installing PostgreSQL on my Mac, as it is not easy
to remove, and I really hate having the "postgres" user on
my Mac login window. :-)


> I've used PostgreSQL on Windows, too, which works alright overall.
> One thing that really harmed performance for me on Windows, though,
> was virus scanners and Pointsec, which does full HD encryption -- the
> combination of virus scanning every read/write + encryption really
> applies the brakes.  Speaking of encryption, do you have File Vault
> turned on under OS X?  If so, is your Parallels VM running on that
> file system?  I'd imagine that would really harm performance.

I don't have FileVault active, and I really have no issue with the
performance of PostgreSQL running on the Parallels virtual machine,
other than for this task.

I will try to collect the same log from our deployment server too,
which is in a Solaris zone c/o Joyent.

I will keep you updated.

Thanks for your attention.

Best regards,

Giulio Cesare

Re: Optimize cascade deletes

Posted by Michael Gentry <bl...@gmail.com>.
FWIW:

I run PostgreSQL on OS X and it is plenty fast enough.  The downside
to PostgreSQL on OS X is there isn't an installer, so it is more work
to build and install.  If you are Unix-savvy, though, this won't be
much of a problem.

I've used PostgreSQL on Windows, too, which works alright overall.
One thing that really harmed performance for me on Windows, though,
was virus scanners and Pointsec, which does full HD encryption -- the
combination of virus scanning every read/write + encryption really
applies the brakes.  Speaking of encryption, do you have File Vault
turned on under OS X?  If so, is your Parallels VM running on that
file system?  I'd imagine that would really harm performance.

/dev/mrg



Re: Optimize cascade deletes

Posted by Andrus Adamchik <an...@objectstyle.org>.
On Sep 21, 2007, at 7:45 PM, Giulio Cesare Solaroli wrote:

>
> [18:09:40] DELETE FROM clipperz.RCRVRS WHERE ID_RCRVRS = ?
> [18:09:40] [batch bind: 968]
> [18:09:40] [batch bind: 875]
> [......]
> [18:09:40] [batch bind: 1177]
> [18:09:40] [batch bind: 2792]
> [18:11:54] === updated 68 rows.
>
> In this case, to delete just 68 rows it took more than two minutes,
> but I don't know if this time is spent somewhere inside the Cayenne
> code, or if this is simply the time it took PostgreSQL to "physically"
> delete the rows.

Initially I suspected performance problems with faulting the
relationships, but if the delete itself is so slow, it points to
PostgreSQL as the bottleneck, as Ari said (coincidentally, the query
above is a BatchDeleteQuery generated by Cayenne). If it takes that
long, something is wrong with the DB. I second the suggestion to try
installing PostgreSQL on the Mac and running it without Parallels.


> The SQL I am expecting Cayenne to generate would look like this:
>
> delete from user_detail where id_user = <id of the user I want to  
> delete>;

Cayenne 3.0M2 (yet unreleased) supports such functionality via an  
EJBQL query, but maybe you don't need to do direct DB deletion just yet.

Andrus


Re: Optimize cascade deletes

Posted by Craig L Russell <Cr...@Sun.COM>.
On Sep 21, 2007, at 9:45 AM, Giulio Cesare Solaroli wrote:

> I have tried to attach the logs and the model to the message, but the
> mailing list system has rejected it as spam. :-(

Apache has strong anti-spam controls in place for email.

You can open a jira issue and upload files to the issue...

Craig

Craig Russell
Architect, Sun Java Enterprise System http://java.sun.com/products/jdo
408 276-5638 mailto:Craig.Russell@sun.com
P.S. A good JDO? O, Gasp!


Re: Optimize cascade deletes

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 22/09/2007, at 2:45 AM, Giulio Cesare Solaroli wrote:

> In this case, to delete just 68 rows it took more than two minutes,
> but I don't know if this time is spent somewhere inside the Cayenne
> code, or if this is simply the time it took PostgreSQL to "physically"
> delete the rows.
>
> Do these details help in understanding what is going on?

From these logs (and I don't use PostgreSQL, so I don't know much
about it), it would appear that the speed issue is inside PostgreSQL.
I'd recommend trying this without Parallels (PostgreSQL should run OK
on OS X) and seeing what extra logging you can switch on at the
database. MySQL has a concept of a 'slow query log'; perhaps
PostgreSQL has something similar.

Cheers
Ari Maniatis



-------------------------->
Aristedes Maniatis
phone +61 2 9660 9700
PGP fingerprint 08 57 20 4B 80 69 59 E2  A9 BF 2D 48 C2 20 0C C8



Re: Optimize cascade deletes

Posted by Giulio Cesare Solaroli <gi...@gmail.com>.
I have tried to attach the logs and the model to the message, but the
mailing list system has rejected it as spam. :-(

If the full log and/or model could be useful in understanding the
problem, I will upload them somewhere else.


Hello Aristedes,

thanks for your reply.

Before answering your questions, let me describe my "environment":

I am running everything on my own MacBook Pro, with PostgreSQL running
in a CentOS-based Parallels virtual machine, while Tomcat is running
in the Mac OS environment.

This is definitely not a performance-wise reference configuration, but
I am not comfortable with these times anyway.


On 9/21/07, Aristedes Maniatis <ar...@maniatis.org> wrote:
>
> On 22/09/2007, at 12:31 AM, Giulio Cesare Solaroli wrote:
>
> > When I try to delete a user, all the cascade rules are correctly
> > applied, but the whole process takes about 7 minutes for a regular
> > account. This is much more than we can bear for any operation.
>
> That does seem extraordinary. Can you give us some background:
>
> * how many records are deleted

about 100 objects, from 6/7 different tables.


> * what version of Cayenne you are using

2.0.2


> * what database are you using

PostgreSQL, probably version 8.1


> * can you get a log of the SQL being executed and discover what
> precisely is taking so long

I have attached the full log (and also a zip of the Cayenne model,
in case it is relevant); the odd part is highlighted here (at the
beginning of each line is the timestamp, in "hh:mm:ss" format):


[18:09:40] DELETE FROM clipperz.RCRVRS WHERE ID_RCRVRS = ?
[18:09:40] [batch bind: 968]
[18:09:40] [batch bind: 875]
[......]
[18:09:40] [batch bind: 1177]
[18:09:40] [batch bind: 2792]
[18:11:54] === updated 68 rows.

In this case, to delete just 68 rows it took more than two minutes,
but I don't know if this time is spent somewhere inside the Cayenne
code, or if this is simply the time it took PostgreSQL to "physically"
delete the rows.

Do these details help in understanding what is going on?

Thanks,

Giulio Cesare

Re: Optimize cascade deletes

Posted by Aristedes Maniatis <ar...@maniatis.org>.
On 22/09/2007, at 12:31 AM, Giulio Cesare Solaroli wrote:

> When I try to delete a user, all the cascade rules are correctly
> applied, but the whole process takes about 7 minutes for a regular
> account. This is much more than we can bear for any operation.

That does seem extraordinary. Can you give us some background:

* how many records are deleted
* what version of Cayenne you are using
* what database are you using
* can you get a log of the SQL being executed and discover what
precisely is taking so long

Ari Maniatis



-------------------------->
Aristedes Maniatis
phone +61 2 9660 9700
PGP fingerprint 08 57 20 4B 80 69 59 E2  A9 BF 2D 48 C2 20 0C C8