Posted to commits@nifi.apache.org by "Matt Gilman (JIRA)" <ji...@apache.org> on 2014/12/08 15:18:12 UTC

[jira] [Created] (NIFI-128) General Framework Performance Improvements

Matt Gilman created NIFI-128:
--------------------------------

             Summary: General Framework Performance Improvements
                 Key: NIFI-128
                 URL: https://issues.apache.org/jira/browse/NIFI-128
             Project: Apache NiFi
          Issue Type: Improvement
          Components: Core Framework
            Reporter: Matt Gilman
            Priority: Minor


- VolatileContentRepository.createLossTolerant() calls ContentClaims.incrementClaimantCount(ContentClaim), which in turn calls LinkedBlockingQueue.remove(). This is quite expensive, but since we know that we are incrementing a newly created ContentClaim, we can skip the queue removal if we add an overload: ContentClaims.incrementClaimantCount(ContentClaim claim, boolean newClaim) (sketch 1 below).
- StandardProvenanceReporter should provide getEventBuilders() rather than getEvents(). Because we would then have the Builders instead of the finished events, calls to StandardProcessSession.enrich would not have to duplicate the ProvenanceEventRecord via Builder.fromEvent() in order to enrich the record before creating it; the session could simply update the builder before calling build() (sketch 2 below).
- StandardProcessorNode.toString() is called A LOT and should be pre-computed in the constructor and returned, as opposed to being rebuilt on every toString() call (sketch 3 below).
- StandardProcessSession should call summarizeEvents() ONLY if LOG.isInfoEnabled() returns true (sketch 4 below).
- ContentClaims.markDestructable calls LinkedBlockingQueue.offer, which is quite slow. Consider changing to something like the LMAX Disruptor (sketch 5 below)?
- StandardContentClaim.equals() is called a lot. It uses the instanceof operator, which is slow. Before doing that check, we should compare this.hashCode() to other.hashCode(), because hashCode() returns a precomputed final value for StandardContentClaim and we are more likely to be comparing against another StandardContentClaim than not (sketch 6 below).
- Expression Language: add a method to create an Abstract Syntax Tree, Tree Query.compileTree(String), and then a method to create a Query object from it, Query.forTree(Tree) (sketch 7 below). This avoids re-parsing and re-validating the String on every evaluation.
- StandardFlowFileQueue: the write lock has VERY high contention when running at very high rates. This is the bottleneck for processors like UpdateAttribute and RouteOnAttribute (especially when running in volatile mode). Session.get(), specifically, blocks a lot here. Perhaps this can be made better?
- PersistentProv Repo: no need to obtain a write lock to update the Map. Instead, create a method that updates it atomically using an AtomicReference, copying the SortedMap on each update (sketch 8 below). This may be a bit slower per update, but it doesn't block other threads.
- PersistentProv Repo: rollover obtains a write lock while merging journals. This is done only so that merging the journals and setting the repoDirty flag happen atomically, but it causes huge performance degradation: by locking the Prov Repo while we merge journals, we block the entire JVM's processing. If we merge journals without that atomicity, the problem is that we could merge the journals and delete the journal files, then another thread sets repoDirty to true, and then we set it back to false. In that case we could write to a 'dirty repo', meaning the last record wasn't fully written, resulting in a half-written record followed by another record, i.e. corruption. This can be avoided, though, if we instead make the writer itself know that its file is full/dirty, rather than keeping a flag on the entire repo (sketch 9 below). This is better anyway when using multiple directories.
- FileSystemRepository: on create(), do not do Files.create(). This was done to guard against the following scenario: we create a claim but don't write anything to it (the processor calls write() and then doesn't touch the output stream in the callback). As a result, the file isn't created, so when the next processor attempts to read it, we get a ContentNotFoundException and remove the FlowFile. We create a 0-byte file up front to guard against this case. Instead, we can detect in the StandardProcessSession that the FlowFile length is 0 bytes and, rather than going to the Content Repo for content, just return a ByteArrayInputStream that wraps a byte[0] (sketch 10 below). Alternatively, the session could detect after session.write() returns from the callback that nothing was written and, as a result, either create a 0-byte file or not add the ContentClaim at all (though adding no ContentClaim could be confusing when looking at a Provenance Event). Either way, we avoid always creating a 0-byte file, which is very expensive!
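
Rough sketches of several of the ideas above follow. These are illustrative
only: except where a name is quoted from the codebase above, the classes,
fields, and method bodies are assumptions rather than actual NiFi code.

Sketch 1, for the incrementClaimantCount overload. A minimal sketch, assuming
the claim manager tracks destructable claims in a LinkedBlockingQueue as
described above:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.atomic.AtomicInteger;

    class ClaimManagerSketch {
        private final BlockingQueue<CountedClaim> destructableClaims = new LinkedBlockingQueue<>();

        public int incrementClaimantCount(final CountedClaim claim) {
            return incrementClaimantCount(claim, false);
        }

        // Proposed overload: callers that just created the claim pass newClaim=true,
        // skipping the O(n) LinkedBlockingQueue.remove() scan entirely.
        public int incrementClaimantCount(final CountedClaim claim, final boolean newClaim) {
            final int count = claim.claimantCount.incrementAndGet();
            if (!newClaim) {
                // a pre-existing claim may already be queued for destruction; un-queue it
                destructableClaims.remove(claim);
            }
            return count;
        }

        static class CountedClaim {
            final AtomicInteger claimantCount = new AtomicInteger(0);
        }
    }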
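
Sketch 2, for getEventBuilders(). EventBuilder and Event are stand-ins; the
real ProvenanceEventBuilder API differs in detail:

    import java.util.ArrayList;
    import java.util.List;

    class ProvenanceSketch {
        static class EventBuilder {
            private String details;
            EventBuilder setDetails(final String details) { this.details = details; return this; }
            Event build() { return new Event(details); }
        }

        static class Event {
            final String details;
            Event(final String details) { this.details = details; }
        }

        private final List<EventBuilder> builders = new ArrayList<>();

        // Proposed accessor: hand out the builders, not finished events, so that
        // enrichment does not need Builder.fromEvent() to clone each record.
        List<EventBuilder> getEventBuilders() {
            return builders;
        }

        // The session enriches each builder in place, then builds exactly once.
        List<Event> enrichAndBuild(final String extraDetails) {
            final List<Event> events = new ArrayList<>(builders.size());
            for (final EventBuilder builder : builders) {
                events.add(builder.setDetails(extraDetails).build());
            }
            return events;
        }
    }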
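
Sketch 3, for the pre-computed toString(). Field and constructor parameters
are assumptions:

    class ProcessorNodeSketch {
        private final String description;

        ProcessorNodeSketch(final String processorType, final String uuid) {
            // built exactly once here, instead of on every toString() call
            this.description = processorType + "[id=" + uuid + "]";
        }

        @Override
        public String toString() {
            return description;
        }
    }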
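
Sketch 4, for the summarizeEvents() guard, using slf4j:

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    class SessionLoggingSketch {
        private static final Logger LOG = LoggerFactory.getLogger(SessionLoggingSketch.class);

        void commitLogging() {
            // summarizeEvents() is expensive, so only pay for it when INFO is enabled
            if (LOG.isInfoEnabled()) {
                LOG.info(summarizeEvents());
            }
        }

        private String summarizeEvents() {
            return "...";  // placeholder for the real, costly summary
        }
    }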
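
Sketch 5, for replacing the LinkedBlockingQueue with the LMAX Disruptor.
ClaimEvent and the handler body are assumptions; the Disruptor calls follow
the com.lmax.disruptor 3.x DSL:

    import com.lmax.disruptor.RingBuffer;
    import com.lmax.disruptor.dsl.Disruptor;
    import com.lmax.disruptor.util.DaemonThreadFactory;

    class DestructableClaimsSketch {
        static class ClaimEvent {
            Object claim;  // stand-in for ContentClaim
        }

        private final RingBuffer<ClaimEvent> ringBuffer;

        DestructableClaimsSketch() {
            final Disruptor<ClaimEvent> disruptor =
                    new Disruptor<>(ClaimEvent::new, 1024, DaemonThreadFactory.INSTANCE);
            disruptor.handleEventsWith((event, sequence, endOfBatch) -> destroy(event.claim));
            this.ringBuffer = disruptor.start();
        }

        // replaces LinkedBlockingQueue.offer(); publishing to the ring buffer avoids the lock
        void markDestructable(final Object claim) {
            ringBuffer.publishEvent((event, sequence, c) -> event.claim = c, claim);
        }

        private void destroy(final Object claim) {
            // placeholder: actually reclaim the content backing this claim
        }
    }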
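
Sketch 6, for the equals() fast path. The real StandardContentClaim has
different fields; note that when the hash codes do match, we still need the
instanceof check:

    class ContentClaimSketch {
        private final String container;
        private final long id;
        private final int hashCode;  // computed once, constant thereafter

        ContentClaimSketch(final String container, final long id) {
            this.container = container;
            this.id = id;
            this.hashCode = 31 * container.hashCode() + Long.hashCode(id);
        }

        @Override
        public int hashCode() {
            return hashCode;
        }

        @Override
        public boolean equals(final Object other) {
            if (other == this) {
                return true;
            }
            if (other == null) {
                return false;
            }
            // cheap int comparison first; instanceof runs only on the rare hash match
            if (hashCode != other.hashCode()) {
                return false;
            }
            if (!(other instanceof ContentClaimSketch)) {
                return false;
            }
            final ContentClaimSketch o = (ContentClaimSketch) other;
            return id == o.id && container.equals(o.container);
        }
    }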
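
Sketch 7, for the compileTree()/forTree() split. Tree and the two Query
methods are the API proposed above, stubbed out here so the caching pattern
is runnable:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class ExpressionCacheSketch {
        private final Map<String, Tree> compiledTrees = new ConcurrentHashMap<>();

        String evaluate(final String expression, final Map<String, String> attributes) {
            // parse + validate once per unique expression String
            final Tree tree = compiledTrees.computeIfAbsent(expression, Query::compileTree);
            // cheap per-evaluation step: wrap the pre-built AST in a Query
            return Query.forTree(tree).evaluate(attributes);
        }

        interface Tree {}

        static class Query {
            static Tree compileTree(final String expression) { return new Tree() {}; }
            static Query forTree(final Tree tree) { return new Query(); }
            String evaluate(final Map<String, String> attributes) { return ""; }
        }
    }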
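
Sketch 8, for the lock-free Map update in the PersistentProv Repo. Key and
value types are assumptions:

    import java.util.SortedMap;
    import java.util.TreeMap;
    import java.util.concurrent.atomic.AtomicReference;

    class IndexMapSketch {
        private final AtomicReference<SortedMap<Long, String>> mapRef =
                new AtomicReference<>(new TreeMap<>());

        void put(final Long firstEventId, final String journalPath) {
            SortedMap<Long, String> current;
            SortedMap<Long, String> updated;
            do {
                current = mapRef.get();
                updated = new TreeMap<>(current);   // copy the map
                updated.put(firstEventId, journalPath);
            } while (!mapRef.compareAndSet(current, updated));  // retry on contention
        }

        SortedMap<Long, String> snapshot() {
            return mapRef.get();  // readers never block
        }
    }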
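
Sketch 9, for moving the full/dirty flag from the repository onto the
individual writer. The shape here is entirely an assumption, not the actual
journal writer:

    import java.io.IOException;

    class RecordWriterSketch {
        private boolean dirty = false;

        synchronized void writeRecord(final byte[] record) throws IOException {
            if (dirty) {
                // the last write to THIS file failed partway; never append after it
                throw new IOException("Writer is dirty; roll over to a new file");
            }
            try {
                append(record);
            } catch (final IOException ioe) {
                dirty = true;  // poison only this writer, not the whole repository
                throw ioe;
            }
        }

        private void append(final byte[] record) throws IOException {
            // placeholder: write a length-prefixed record to the journal file
        }
    }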
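
Sketch 10, for short-circuiting reads of 0-byte FlowFiles in the session.
ContentRepo is a stand-in for the session's real content-repository
collaborator:

    import java.io.ByteArrayInputStream;
    import java.io.InputStream;

    class SessionReadSketch {
        private static final byte[] EMPTY = new byte[0];

        InputStream read(final long flowFileSize, final ContentRepo repo, final Object claim) {
            if (flowFileSize == 0) {
                // nothing was ever written: skip creating/opening a 0-byte file entirely
                return new ByteArrayInputStream(EMPTY);
            }
            return repo.read(claim);
        }

        interface ContentRepo {
            InputStream read(Object claim);
        }
    }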



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)