You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@couchdb.apache.org by "nickva (via GitHub)" <gi...@apache.org> on 2023/05/16 22:56:30 UTC

[GitHub] [couchdb] nickva opened a new pull request, #4604: Optimize mem3:dbname/1 function

nickva opened a new pull request, #4604:
URL: https://github.com/apache/couchdb/pull/4604

   `mem3:dbname/1` with a `<<"shard/...">>` binary is called quite a few times as seen when profiling with fprof: https://gist.github.com/nickva/38760462c1545bf55d98f4898ae1983d
   
   In that case `mem3:dbname` is removing the timestamp suffix. However, because it uses `filename:rootname/1` which handles cases pertaining to file system paths and such, it ends up being a bit more expensive than necessary.
   
   To optimize it, try avoid translating it a list and parse the extension starting at the end of the binary. To avoid surprises, attempt to emulate exact behavior of the previously used filename:rootname/1 calls for various edge cases (`.`, `/.` etc). Most of those should be asserted in eunit test.
   
   A quick speed comparison test shows a 4x speedup or so:
   ```
   speed_compare_test() ->
       DbName = <<"fabricbenchdb-1684269710714470.1684269710">>,
       Ids = lists:seq(1, 1000),
       {LocalUSec, ok} = timer:tc(fun() -> lists:foreach(fun(_) -> strip_suffix(DbName) end, Ids) end),
       {FilenameUSec, ok} = timer:tc(fun() -> lists:foreach(fun(_) -> list_to_binary(filename:rootname(binary_to_list(DbName))) end, Ids) end),
   
   > mem3:speed_compare_test().
   {0.327,1.324}
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] pgj commented on pull request #4604: Optimize mem3:dbname/1 function

Posted by "pgj (via GitHub)" <gi...@apache.org>.
pgj commented on PR #4604:
URL: https://github.com/apache/couchdb/pull/4604#issuecomment-1551202606

   The PR in its current form does not remove any old code (i.e. no replace) and the new function is not used anywhere.  Is this still a draft?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] iilyak commented on pull request #4604: Optimize mem3:dbname/1 function

Posted by "iilyak (via GitHub)" <gi...@apache.org>.
iilyak commented on PR #4604:
URL: https://github.com/apache/couchdb/pull/4604#issuecomment-1551320600

   There are other places in the codebase which use `filename:basname/1`.  `mem3` application depend on `couch` app. Most of the things depend on `couch`. I would suggest to place this function in `couch_util`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] nickva merged pull request #4604: Optimize mem3:dbname/1 function

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva merged PR #4604:
URL: https://github.com/apache/couchdb/pull/4604


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] willholley commented on pull request #4604: Optimize mem3:dbname/1 function

Posted by "willholley (via GitHub)" <gi...@apache.org>.
willholley commented on PR #4604:
URL: https://github.com/apache/couchdb/pull/4604#issuecomment-1550593024

   @nickva do we need to handle Windows path separators here as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] nickva commented on pull request #4604: Optimize mem3:dbname/1 function

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on PR #4604:
URL: https://github.com/apache/couchdb/pull/4604#issuecomment-1552079498

   @davisp Inspired by your `drop_dot_couch_ext/1` function from while back though we could do something similar here. That proved to be the fastest approach:
   
   https://github.com/apache/couchdb/blob/6b4cbaa72ab766e0a08c592971fb86be99a7d182/src/couch/src/couch_util.erl#L385-L394
   
   The slight complication is we don't know the exact extension binary, we just expect it to be a 10 digit integer, so we use the `binary_to_integer` BIF to test it. That seems speedy enough.
   
   We tried a few approaches like matching all the 10 digits separately but the BIF seems to a tiny bit faster and is a bit shorter overall.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [couchdb] nickva commented on pull request #4604: Optimize mem3:dbname/1 function

Posted by "nickva (via GitHub)" <gi...@apache.org>.
nickva commented on PR #4604:
URL: https://github.com/apache/couchdb/pull/4604#issuecomment-1552073263

   > @nickva do we need to handle Windows path separators here as well?
   
   Good point, @willholley. I had noticed in other parts of the code we don't handle Windows paths as just use `/` and expect Windows to handle that format as well. Since I had updated the PR to do a simpler check like and fall back to the previous method if that fails.
   
   > I would suggest to place this function in couch_util.
   
   @iilyak I started doing it that way but then in the latest version, opted to just specialize the function to handle timestamp specifically, so then mem3 seemed like a good place for it. If those other places are hotspots seen in profiling we can optimize those too if needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@couchdb.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org