You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "David Capwell (Jira)" <ji...@apache.org> on 2019/10/22 01:31:00 UTC

[jira] [Comment Edited] (CASSANDRA-15364) Avoid over scanning data directories in LogFile.verify()

    [ https://issues.apache.org/jira/browse/CASSANDRA-15364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16956431#comment-16956431 ] 

David Capwell edited comment on CASSANDRA-15364 at 10/22/19 1:30 AM:
---------------------------------------------------------------------

* As a general comment, almost every case I see where absolutePath.ifPresent is called is known to be a ADD or REMOVE so the lambda version could be replaced with .get() (style comment, feel free to ignore)

 * [https://github.com/apache/cassandra/compare/trunk...krummas:marcuse/15364#diff-d8721a4dd04f4f35b65e7edeb6c883f6R357]
 The makeRecord call above will do a listing (in make(Type, SSTable) method), so this change looks to be because deleteRecordFiles(LogRecord) no longer exists and was replaced by deleteRecordFiles(List<File>); LGTM

 * [https://github.com/apache/cassandra/compare/trunk...krummas:marcuse/15364#diff-d8721a4dd04f4f35b65e7edeb6c883f6R386]
 Is this to handle failure in the middle of the log (while deleting on commit we hard-stop in the middle, so recovery may see missing files)? Seems fine, was just curious if there was a different reason (there was a comment in the code about people copying tx files to different nodes, hence why I wanted to call out).

 * [https://github.com/apache/cassandra/compare/trunk...krummas:marcuse/15364#diff-2e4c884e6aace427bdb6b0a122164ce4R308-R328]
 Took a while to parse this change, so I am curious what was the motivation? Was it just to avoid File::getCanonicalPath? As far as I can tell Descriptor::fromFilenameWithComponent is only string parsing with a call to File::getCanonicalFile.
 While reading this change, what bothered me was the question "why is the file name a prefix" and I missed the comment in the javadoc which expresses that expectation ("absoluteFilePaths contains full file parts up to the component name", "up to" being the important part); the code before relied on Descriptor:fromFilename so didn't really care.
The new version looks fine, I just wanted to better understand the motive

Overall, looks good to me; I want to do one more pass and test this, ETA EOD.


was (Author: dcapwell):
* As a general comment, almost every case I see where absolutePath.ifPresent is called is known to be a ADD or REMOVE so the lambda version could be replaced with .get() (style comment, feel free to ignore)

 * [https://github.com/apache/cassandra/compare/trunk...krummas:marcuse/15364#diff-d8721a4dd04f4f35b65e7edeb6c883f6R357]
 The makeRecord call above will do a listing (in make(Type, SSTable) method), so this change looks to be because deleteRecordFiles(LogRecord) no longer exists and was replaced by deleteRecordFiles(List<File>); LGTM

 * [https://github.com/apache/cassandra/compare/trunk...krummas:marcuse/15364#diff-d8721a4dd04f4f35b65e7edeb6c883f6R386]
 Is this to handle failure in the middle of the log (while deleting on commit we hard-stop in the middle, so recovery may see missing files)? Seems fine, was just curious if there was a different reason (there was a comment in the code about people copying tx files to different nodes, hence why I wanted to call out).

 * [https://github.com/apache/cassandra/compare/trunk...krummas:marcuse/15364#diff-2e4c884e6aace427bdb6b0a122164ce4R308-R328]
 Took a while to parse this change, so I am curious what was the motivation? Was it just to avoid File::getCanonicalPath? As far as I can tell Descriptor::fromFilenameWithComponent is only string parsing with a call to File::getCanonicalFile.
 While reading this change, what bothered me was the question "why is the file name a prefix" and I missed the comment in the javadoc which expresses that expectation ("absoluteFilePaths contains full file parts up to the component name", "up to" being the important part); the code before relied on Descriptor:fromFilename so didn't really care.

Overall, looks good to me; I want to do one more pass and test this, ETA EOD.

> Avoid over scanning data directories in LogFile.verify()
> --------------------------------------------------------
>
>                 Key: CASSANDRA-15364
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15364
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/Compaction
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We currently list the data directory for every {{REMOVE}} record in the file in {{LogFile.verify()}} - this can get very expensive during startup when we call {{LogTransaction.removeUnfinishedLeftovers()}}. In {{LogRecord.getExistingFiles(Set<String> absoluteFilePaths)}} we also fully parse the file name of the sstables found, here we only need to prefix match.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org