Posted to commits@daffodil.apache.org by "scholarsmate (via GitHub)" <gi...@apache.org> on 2023/07/21 14:26:07 UTC

[GitHub] [daffodil-vscode] scholarsmate opened a new issue, #714: Handle large scale search and replace

scholarsmate opened a new issue, #714:
URL: https://github.com/apache/daffodil-vscode/issues/714

   Now that we can load and edit very large files, we'll also need to handle searches that produce more than 1K matches, and replacements of more than 1K matches.
   
   As a test, I created a 1MB file containing only the character 'a' and ran a search. This produced 1M matches and took only 0.1s. Next, I created a file of 100M lines with 'CDE' on every line. The file is 1.6GB and there are 100M matches; this took XXX.
   
   Search has scale issues, and replace is worse. On top of managing the search, replace tracks and materializes every change on the fly, so 1M matches results in tracking 2M changes in 1M transactions (for each match, a delete of the original token and an insertion of the new token). This took about 3 hours, but could take closer to 0.1s if we did it as a single transaction instead of tracking and materializing all of those individual changes on the fly.
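
A minimal sketch of the bookkeeping difference, in Python for illustration only. `Change`, `ChangeLog`, `find_all`, and both replace functions are hypothetical names, not Ωedit's API, and the whole buffer is held in memory. The point is only the accounting: replacing N matches one at a time records N transactions of 2 changes each, while a single-transaction replace records one transaction no matter how many matches there are.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Change:
    kind: str    # "delete" or "insert"
    offset: int  # position in the buffer when the change is applied
    data: bytes  # bytes removed (delete) or added (insert)


@dataclass
class ChangeLog:
    # Each transaction is a list of changes that are undone together.
    transactions: List[List[Change]] = field(default_factory=list)

    def change_count(self) -> int:
        return sum(len(t) for t in self.transactions)


def find_all(data: bytes, pattern: bytes) -> List[int]:
    """Offsets of non-overlapping matches, scanning left to right like bytes.replace."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    hits, pos = [], data.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = data.find(pattern, pos + len(pattern))
    return hits


def replace_per_match(data: bytes, pattern: bytes, repl: bytes, log: ChangeLog) -> bytes:
    """Per-match behavior: one transaction (delete + insert) for every match."""
    out = bytearray(data)
    shift = 0  # earlier replacements move later offsets when lengths differ
    for off in find_all(data, pattern):
        at = off + shift
        out[at:at + len(pattern)] = repl  # delete the token, insert the replacement
        log.transactions.append([Change("delete", at, pattern), Change("insert", at, repl)])
        shift += len(repl) - len(pattern)
    return bytes(out)


def replace_single_transaction(data: bytes, pattern: bytes, repl: bytes, log: ChangeLog) -> bytes:
    """Single-transaction behavior: the whole replace is logged once.

    This sketch copies both buffers into the log purely for simplicity.
    """
    new = data.replace(pattern, repl)
    log.transactions.append([Change("delete", 0, data), Change("insert", 0, new)])
    return new
```

With `b"CDE\n" * 1000` and the replacement `b"XYZW"`, both functions produce the same bytes, but the per-match log holds 1,000 transactions and 2,000 changes, versus 1 transaction and 2 changes.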
   
   If we don't do the large-scale replace in a single transaction, undo becomes effectively worthless: if you made some changes and then ran a replace that produced 1M transactional changes, going back to the state you were in before running the replacements has effectively been disabled. I view doing the 1M changes as a single transaction as a much more appealing solution for the UI anyway.
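
The undo consequence can be modeled with a toy buffer whose history is a stack of transactions. `EditBuffer` and `undos_to_reach` are hypothetical helpers, and storing full before/after snapshots is a simplification that a real editor would not use.

```python
from typing import List, Tuple


class EditBuffer:
    """Toy buffer with transaction-level undo (hypothetical, not the Ωedit API).

    Each committed transaction stores a (before, after) snapshot, so one undo
    reverts exactly one transaction, however many bytes it changed.
    """

    def __init__(self, data: bytes):
        self.data = data
        self.history: List[Tuple[bytes, bytes]] = []

    def commit(self, new_data: bytes) -> None:
        self.history.append((self.data, new_data))
        self.data = new_data

    def undo(self) -> None:
        before, _ = self.history.pop()
        self.data = before


def undos_to_reach(buf: EditBuffer, target: bytes) -> int:
    """Undo until the buffer equals target; return how many undos that took."""
    count = 0
    while buf.data != target:
        buf.undo()
        count += 1
    return count
```

Starting from `b"CDE\n" * 1000`, making one edit of your own and then committing 1,000 one-match replacements means 1,000 undos to get back to your edit; committing the whole replace as one transaction means a single undo.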


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@daffodil.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [daffodil-vscode] scholarsmate commented on issue #714: Handle large scale search and replace

Posted by "scholarsmate (via GitHub)" <gi...@apache.org>.
scholarsmate commented on issue #714:
URL: https://github.com/apache/daffodil-vscode/issues/714#issuecomment-1645713022

   I am testing and developing a solution in Ωedit for dealing with large-scale transforms in general. These will be memory-efficient, single-transaction transforms, and they will enable us to do the large-scale replace described above. In addition, we can support any number of other transforms, such as upper-, lower-, and swap-casing, compression/decompression, encoding/decoding, etc. You can think of this like running a stream through `sed` or `awk` and collecting the results, except that the whole transformation is done in a single edit transaction.
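
A rough sketch of the `sed`-like shape of such a transform, assuming a per-chunk callback: the content is piped through a function in fixed-size chunks so memory use stays bounded, and the editor would then (hypothetically) adopt the output as one edit transaction. `transform_stream` and `transform_file` are illustrative names, not Ωedit functions. Python's `bytes.upper`, `bytes.lower`, and `bytes.swapcase` stand in for the casing transforms; they only change ASCII letters.

```python
import io
from typing import BinaryIO, Callable

Transform = Callable[[bytes], bytes]


def transform_stream(src: BinaryIO, dst: BinaryIO, fn: Transform,
                     chunk_size: int = 1 << 20) -> int:
    """Pipe src through fn into dst one chunk at a time; return bytes written.

    Memory use is bounded by chunk_size. This is only correct for transforms
    that act on each byte independently (such as ASCII case changes). A replace
    whose pattern can straddle a chunk boundary needs overlap handling, and
    compression needs a stateful compressor with a final flush.
    """
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        written += dst.write(fn(chunk))
    return written


def transform_file(src_path: str, dst_path: str, fn: Transform,
                   chunk_size: int = 1 << 20) -> int:
    """File-to-file variant: the transformed copy never has to fit in memory."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        return transform_stream(src, dst, fn, chunk_size)


if __name__ == "__main__":
    out = io.BytesIO()
    transform_stream(io.BytesIO(b"Hello, World"), out, bytes.swapcase, chunk_size=4)
    print(out.getvalue())  # b'hELLO, wORLD'
```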
   
   I'll also consider non-transformational "scanning" functions like search, profiling, hashing, etc., where the data is only read and some result object is populated. You can think of this like running a stream through `grep`, `wc`, or `md5sum` and collecting the results. No edit transactions are made, but up to the entire stream is materialized at once and then read. Having access to the entire stream benefits searching, since you don't need to manage overlapping segments for a sliding-window technique (which is what search currently does).
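
The overlap problem is easiest to see side by side. Below, `search_stream` is a generic sliding-window search (not Ωedit's actual implementation) that carries the last `len(pattern) - 1` bytes of each buffer into the next, so matches straddling a window boundary are neither missed nor reported twice; `search_whole` does the same search once the whole stream is materialized and needs none of that bookkeeping. Both report overlapping matches, which is an assumption about the desired semantics.

```python
import io
from typing import BinaryIO, Iterator, List


def search_stream(src: BinaryIO, pattern: bytes, window: int = 1 << 20) -> Iterator[int]:
    """Yield the absolute offset of every match (overlapping matches included).

    Each read is prefixed with the last len(pattern) - 1 bytes of the previous
    buffer, so a match straddling a window boundary is found exactly once.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    keep = len(pattern) - 1
    tail = b""  # bytes carried over from the previous buffer
    base = 0    # absolute stream offset of tail[0]
    while True:
        chunk = src.read(window)
        if not chunk:
            return
        buf = tail + chunk
        pos = buf.find(pattern)
        while pos != -1:
            yield base + pos
            pos = buf.find(pattern, pos + 1)
        # Any match starting before `cut` ends inside buf and was just reported,
        # so only bytes from `cut` onward can begin a match not yet complete.
        cut = max(len(buf) - keep, 0)
        tail = buf[cut:]
        base += cut


def search_whole(data: bytes, pattern: bytes) -> List[int]:
    """The same search with the entire content materialized: no window bookkeeping."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    hits, pos = [], data.find(pattern)
    while pos != -1:
        hits.append(pos)
        pos = data.find(pattern, pos + 1)
    return hits


if __name__ == "__main__":
    data = b"a" * 10_000
    streamed = list(search_stream(io.BytesIO(data), b"a", window=4096))
    print(len(streamed), streamed == search_whole(data, b"a"))  # 10000 True
```

Running both with a tiny window (1 to 7 bytes) over data full of boundary-straddling matches should yield identical offset lists, which is a cheap way to exercise the carry-over logic.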


[GitHub] [daffodil-vscode] stricklandrbls closed issue #714: Handle large scale search and replace

Posted by "stricklandrbls (via GitHub)" <gi...@apache.org>.
stricklandrbls closed issue #714: Handle large scale search and replace
URL: https://github.com/apache/daffodil-vscode/issues/714

