Posted to dev@oozie.apache.org by "Andras Piros (JIRA)" <ji...@apache.org> on 2018/09/04 13:25:00 UTC

[jira] [Created] (OOZIE-3336) [persistence] Refactor entity classes to feature PK, FK, and UQ constraints

Andras Piros created OOZIE-3336:
-----------------------------------

             Summary: [persistence] Refactor entity classes to feature PK, FK, and UQ constraints
                 Key: OOZIE-3336
                 URL: https://issues.apache.org/jira/browse/OOZIE-3336
             Project: Oozie
          Issue Type: Improvement
          Components: core
    Affects Versions: 5.0.0
            Reporter: Andras Piros
             Fix For: 5.2.0


When an Oozie database grows substantially in size, say, to more than a few hundred thousand {{WorkflowActionBean}} and {{CoordinatorActionBean}} instances, we face a couple of performance issues. Here is an analysis of why.

The current Oozie JPA {{@Entity}} usage, and the resulting database DDL, suffer from a couple of drawbacks from a performance point of view:
* {{@Id}} fields are {{String}}:
** leaving no space for database primary key indices to work effectively
** those values are calculated in case of {{WorkflowActionBean}}, {{CoordinatorActionBean}}, and {{BundleActionBean}} instances
* no foreign key constraint is set from {{WorkflowActionBean}} to {{WorkflowJobBean}}, from {{CoordinatorActionBean}} to {{CoordinatorJobBean}}, or from {{BundleActionBean}} to {{BundleJobBean}} instances:
** JPA queries that discover parent-child relationships have to be written and assessed by hand
** no database indices are created on these relationships, and hence queries that contain a {{JOIN}} are slower
* no use of unique constraints whatsoever
* JPA queries are created by hand instead of relying on OpenJPA
* JPA entities are filled by hand instead of relying on OpenJPA

The following enhancements are necessary:
# keeping the existing {{String compositeId}} fields, let's break down their contents into the following new fields:
## {{@Id long id}} - an auto-increment value that is unique across the Oozie database
## {{long currentSequence}} - the sequence number of the current run since the last Oozie server restart. The first part of the {{compositeId}}
## {{Timestamp serverStartupTimestamp}} - the timestamp when the Oozie server was last started. The second part of the {{compositeId}}
## {{String serverName}} - the third part of the {{compositeId}}
## {{String name}} - the fourth and last part of the {{compositeId}}
## {{compositeId}} might be calculated when an entity is loaded / persisted, and then stored
# FK constraints:
## {{@OneToMany}} fields where we have a list of child references inside parent
## {{@ManyToOne}} fields where we have a parent reference inside child
## pay attention to {{FetchType}}, most of the time {{LAZY}} will be needed
## the containment fields should not be {{@Transient}} anymore
# UQ constraints:
## on {{currentSequence}} and {{serverStartupTimestamp}}
## on {{currentSequence}} and {{name}}
# new JPQL queries:
## to cover changed parent-child relationships
## to make use of each disassembled part of {{originalId}} when doing e.g. filtering
# let JPA fill entities instead of performing this by hand
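
As a rough illustration of item 1 above, the following sketch splits a composite ID into its four parts and reassembles it. It is not existing Oozie code: the class {{CompositeIdParts}}, the assumed layout {{<currentSequence>-<serverStartupTimestamp>-<serverName>-<name>}}, the seven-digit zero padding, and the sample ID are all assumptions made for the example. The timestamp is kept as raw digits rather than a {{Timestamp}}, because the exact date pattern is not stated here. The server name is allowed to contain hyphens, so only the first two and the last hyphen are treated as separators.

```java
import java.util.Locale;

/** Hypothetical value holder for the parts of a composite ID; not an existing Oozie class. */
final class CompositeIdParts {
    final long currentSequence;
    final String serverStartupTimestamp; // raw digits; the issue proposes a Timestamp field
    final String serverName;
    final String name;

    CompositeIdParts(long currentSequence, String serverStartupTimestamp,
                     String serverName, String name) {
        this.currentSequence = currentSequence;
        this.serverStartupTimestamp = serverStartupTimestamp;
        this.serverName = serverName;
        this.name = name;
    }

    /** Splits an ID of the assumed form sequence-timestamp-serverName-name. */
    static CompositeIdParts parse(String compositeId) {
        int first = compositeId.indexOf('-');
        int second = compositeId.indexOf('-', first + 1);
        int last = compositeId.lastIndexOf('-');
        if (first < 0 || second < 0 || last <= second) {
            throw new IllegalArgumentException("Unexpected composite ID: " + compositeId);
        }
        return new CompositeIdParts(
                Long.parseLong(compositeId.substring(0, first)),
                compositeId.substring(first + 1, second),
                compositeId.substring(second + 1, last), // may itself contain hyphens
                compositeId.substring(last + 1));
    }

    /** Rebuilds the String form; the 7-digit padding is an assumption of this sketch. */
    String toCompositeId() {
        return String.format(Locale.ROOT, "%07d-%s-%s-%s",
                currentSequence, serverStartupTimestamp, serverName, name);
    }

    public static void main(String[] args) {
        // Made-up sample ID, used only to exercise the sketch.
        CompositeIdParts parts = parse("0000012-180904132500123-oozie-oozi-W");
        System.out.println(parts.currentSequence + " / " + parts.serverStartupTimestamp
                + " / " + parts.serverName + " / " + parts.name);
        System.out.println(parts.toCompositeId());
    }
}
```

With the parts stored in separate columns, filters could compare a numeric {{currentSequence}} or an exact {{serverName}} instead of pattern-matching on the whole String ID, which is the kind of query the UQ constraints and new JPQL queries above are meant to support.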

The following enhancements can be considered nice-to-have:
* upgrade to an OpenJPA version that features JPA 2.1's composite indexing capability
* see whether an optimistic locking field using {{@Version}}, instead of ZooKeeper-based pessimistic locking, would improve High Availability characteristics
* also refactor SLA-related entity classes
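
To make the optimistic locking idea above more concrete, here is a minimal in-memory sketch of the version check that a {{@Version}} field implies: an update is applied only if the row still carries the version the writer originally read, and the version is then incremented. This is plain Java, not JPA or Oozie code; {{VersionedRow}} and {{OptimisticStore}} are hypothetical names, and a real JPA provider would signal a stale write with an exception rather than a {{false}} return value.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical immutable row: a status plus the version counter guarding it. */
final class VersionedRow {
    final String status;
    final long version;

    VersionedRow(String status, long version) {
        this.status = status;
        this.version = version;
    }
}

/** Hypothetical in-memory stand-in for a table with a version column. */
final class OptimisticStore {
    private final ConcurrentMap<Long, VersionedRow> rows = new ConcurrentHashMap<>();

    void insert(long id, String status) {
        rows.put(id, new VersionedRow(status, 0L));
    }

    VersionedRow read(long id) {
        return rows.get(id);
    }

    /** Applies the update only if nobody changed the row since expectedVersion was read. */
    boolean update(long id, long expectedVersion, String newStatus) {
        final boolean[] updated = {false};
        // computeIfPresent runs atomically per key on ConcurrentHashMap.
        rows.computeIfPresent(id, (key, current) -> {
            if (current.version != expectedVersion) {
                return current; // stale writer: leave the row untouched
            }
            updated[0] = true;
            return new VersionedRow(newStatus, current.version + 1);
        });
        return updated[0];
    }

    public static void main(String[] args) {
        OptimisticStore store = new OptimisticStore();
        store.insert(1L, "PREP");
        long seen = store.read(1L).version;
        System.out.println(store.update(1L, seen, "RUNNING")); // first writer
        System.out.println(store.update(1L, seen, "KILLED"));  // second writer with a stale version
    }
}
```

No lock is held between reading and writing; a conflict is detected only at write time, and the losing writer has to re-read and retry. Whether that trade-off suits Oozie servers in an HA setup is exactly what the item above proposes to investigate.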

It's necessary to run performance benchmarks on several database types, such as MySQL/MariaDB and PostgreSQL, before and after the changes for the following use cases:
* {{CoordinatorJobBean}} and {{WorkflowJobBean}} instances up to millions
* {{CoordinatorActionBean}} and {{WorkflowActionBean}} instances up to tens of millions
* performance for JPQLs that get a list of entities
* performance of persisting a new entity
* performance of querying lists of entities based on popular / possible filters like the ones used by {{VxJobsServlet}}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)