You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@pekko.apache.org by "wjohnsonbynder (via GitHub)" <gi...@apache.org> on 2024/02/08 14:01:26 UTC

[I] Slow JDBC query for eventsByTag [incubator-pekko-persistence-jdbc]

wjohnsonbynder opened a new issue, #105:
URL: https://github.com/apache/incubator-pekko-persistence-jdbc/issues/105

   The eventsByTag query is very slow because of a subquery and the incorrect placement of the where clause.
   
   Dumped from our logs
   
   select x2.x3, x2.x4, x2.x5, x2.x6, x2.x7, x2.x8, x2.x9, x2.x10, x2.x11, x2.x12, x2.x13, x2.x14, x2.x15 from (select "deleted" as x4, "meta_ser_id" as x14, "meta_ser_manifest" as x15, "sequence_number" as x6, "writer" as x7, "event_ser_manifest" as x12, "write_timestamp" as x8, "persistence_id" as x5, "adapter_manifest" as x9, "ordering" as x3, "event_payload" as x10, "event_ser_id" as x11, "meta_payload" as x13 from "event_journal" where "deleted" = false) x2, "event_tag" x16 where ((x16."tag" = ?) and ((x2.x3 > ?) and (x2.x3 <= ?))) and (x2.x3 = x16."event_id") order by x2.x3 limit ?
   
   This subquery
   select "deleted" as x4, "meta_ser_id" as x14, "meta_ser_manifest" as x15, "sequence_number" as x6, "writer" as x7, "event_ser_manifest" as x12, "write_timestamp" as x8, "persistence_id" as x5, "adapter_manifest" as x9, "ordering" as x3, "event_payload" as x10, "event_ser_id" as x11, "meta_payload" as x13 from "event_journal" where "deleted" = false
   
   Doesn't have the where clause for the Ordering field. It will run the entire event-journal each time.
   
   Initial PR that seems to have created the issue. https://github.com/akka/akka-persistence-jdbc/pull/467
   
   The query in question
   https://github.com/apache/incubator-pekko-persistence-jdbc/blob/a9ebfe6da617f8cd2abc5813d2a530b1e1de70bb/core/src/main/scala/org/apache/pekko/persistence/jdbc/query/dao/ReadJournalQueries.scala#L61
   
   The issue is the Join happens before the filters on the event_journal table. This pushes the where clause to the main query forcing a full event journal query in the subquery.
   
     private def baseTableWithTagsQuery() = {
       baseTableQuery().join(TagTable).on(_.ordering === _.eventId)
     }
   
     private def _eventsByTag(
         tag: Rep[String],
         offset: ConstColumn[Long],
         maxOffset: ConstColumn[Long],
         max: ConstColumn[Long]) = {
       baseTableWithTagsQuery()
         .filter(_._2.tag === tag)
         .sortBy(_._1.ordering.asc)
         .filter(row => row._1.ordering > offset && row._1.ordering <= maxOffset)
         .take(max)
         .map(_._1)
     }
   
   
   
   I think the fix is as simple as reordering the query to push the filter above the join.
   
     private def _eventsByTag(
         tag: Rep[String],
         offset: ConstColumn[Long],
         maxOffset: ConstColumn[Long],
         max: ConstColumn[Long]
     ) = {
       baseTableQuery()
         .filter(row => row.ordering > offset && row.ordering <= maxOffset)
         .sortBy(_.ordering.asc)
         .join(TagTable)
         .on(_.ordering === _.eventId)
         .filter(_._2.tag === tag)
         .take(max)
         .map(_._1)
     }
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org
For additional commands, e-mail: notifications-help@pekko.apache.org


Re: [I] Slow JDBC query for eventsByTag [incubator-pekko-persistence-jdbc]

Posted by "pjfanning (via GitHub)" <gi...@apache.org>.
pjfanning commented on issue #105:
URL: https://github.com/apache/incubator-pekko-persistence-jdbc/issues/105#issuecomment-1934486708

   Could you give us a PR with your changes? We don't necessarily need performance tests. Someone else might be able to perf test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org
For additional commands, e-mail: notifications-help@pekko.apache.org


Re: [I] Slow JDBC query for eventsByTag [incubator-pekko-persistence-jdbc]

Posted by "wjohnsonbynder (via GitHub)" <gi...@apache.org>.
wjohnsonbynder commented on issue #105:
URL: https://github.com/apache/incubator-pekko-persistence-jdbc/issues/105#issuecomment-1934416212

   A bit out of my depth with for the project but I messed around with the following test with both the existing code and my fix.
   Run locally
   https://github.com/apache/incubator-pekko-persistence-jdbc/blob/69a80e3259e0a625ee99f247b1f6ec00276408da/core/src/test/scala/org/apache/pekko/persistence/jdbc/query/EventsByTagTest.scala#L211
   
   I ran it with 100, 1000 and 1300 actors with both old select and my fix. More than the tests were timing out locally.
   Old followed by new
   100 - old: 3.708s new: 2.370s
   1000 - old: 30.447s new: 8.902s
   1300 - old: 53.257s new: 12.082s
   
   The other tests all run about the same as the size of the data is small. Since this is an error with selecting the whole table, it really only shows up at scale.
   
   Should I go ahead and make a PR?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org
For additional commands, e-mail: notifications-help@pekko.apache.org


Re: [I] Slow JDBC query for eventsByTag [incubator-pekko-persistence-jdbc]

Posted by "Roiocam (via GitHub)" <gi...@apache.org>.
Roiocam commented on issue #105:
URL: https://github.com/apache/incubator-pekko-persistence-jdbc/issues/105#issuecomment-1934285257

   Thanks @wjohnsonbynder for the reporting. If you were interested in creating a PR, you could start by adding a perf unit test on: https://github.com/apache/incubator-pekko-persistence-jdbc/blob/main/core/src/test/scala/org/apache/pekko/persistence/jdbc/query/EventsByTagTest.scala
   
   And then report the cost before and after the change, If you want to contribute, the pekko community will help you complete this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org
For additional commands, e-mail: notifications-help@pekko.apache.org


Re: [I] Slow JDBC query for eventsByTag [incubator-pekko-persistence-jdbc]

Posted by "pjfanning (via GitHub)" <gi...@apache.org>.
pjfanning commented on issue #105:
URL: https://github.com/apache/incubator-pekko-persistence-jdbc/issues/105#issuecomment-1934266800

   Thanks @wjohnsonbynder for the detailed analysis.
   
   Would you be interested in creating a PR? We have a fairly limited number of regular contributors.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org
For additional commands, e-mail: notifications-help@pekko.apache.org


Re: [I] Slow JDBC query for eventsByTag [incubator-pekko-persistence-jdbc]

Posted by "wjohnsonbynder (via GitHub)" <gi...@apache.org>.
wjohnsonbynder commented on issue #105:
URL: https://github.com/apache/incubator-pekko-persistence-jdbc/issues/105#issuecomment-1934521848

   Not sure I understand.
   
   This subquery currently has nothing on the where clause other than deleted = false
   
   select "deleted" as x4, "meta_ser_id" as x14, "meta_ser_manifest" as x15, "sequence_number" as x6, "writer" as x7, "event_ser_manifest" as x12, "write_timestamp" as x8, "persistence_id" as x5, "adapter_manifest" as x9, "ordering" as x3, "event_payload" as x10, "event_ser_id" as x11, "meta_payload" as x13 from "event_journal" where "deleted" = false
   
   This part of the parent where clause only pertains to the subquery.
   ((x2.x3 > ?) and (x2.x3 <= ?))
   
   This is the ordering field. Moving this to the subquery since it is whole contained to that table means only a portion of event-journal is selected. Right now the subquery must execute and it returns the entire event-journal. Limiting the results from the subquery instead of doing it in the parent stops the full scan of the journal


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org
For additional commands, e-mail: notifications-help@pekko.apache.org


Re: [I] Slow JDBC query for eventsByTag [incubator-pekko-persistence-jdbc]

Posted by "Roiocam (via GitHub)" <gi...@apache.org>.
Roiocam commented on issue #105:
URL: https://github.com/apache/incubator-pekko-persistence-jdbc/issues/105#issuecomment-1934507227

   > Should I go ahead and make a PR?
   
   Thanks for those verifications again, the reason why join before the offset windows, is because these queries always start from the event_tag table.
   
   It is possible that selected journal rows query before tag insert execution? We have to make sure it won't break any semantics.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org
For additional commands, e-mail: notifications-help@pekko.apache.org


Re: [I] Slow JDBC query for eventsByTag [incubator-pekko-persistence-jdbc]

Posted by "wjohnsonbynder (via GitHub)" <gi...@apache.org>.
wjohnsonbynder closed issue #105: Slow JDBC query for eventsByTag
URL: https://github.com/apache/incubator-pekko-persistence-jdbc/issues/105


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@pekko.apache.org
For additional commands, e-mail: notifications-help@pekko.apache.org