Posted to issues@jena.apache.org by GitBox <gi...@apache.org> on 2022/08/31 14:13:25 UTC

[GitHub] [jena] ieugen opened a new issue, #1500: Improve fuseki backup to consider failures (fuseki crash) and clean up incomplete backups

ieugen opened a new issue, #1500:
URL: https://github.com/apache/jena/issues/1500

   ### Version
   
   4.6.0
   
   ### Feature
   
   This was asked on the mailing list: https://lists.apache.org/thread/rdt5otow263xhvwymfsgnxwwy2bxh60r
   
   > We are using fuseki and we would like to implement a backup policy similar in capabilities to what [autopostgresqlbackup] has to offer.
   > Are there any existing solutions out there that can do all or part of this?
   > We would like to take:
   > * daily backups for a week
   > * weekly backups: 1 per week, last 4 weeks
   > * monthly backups: 1 per month, last 6 months
   >
   > I believe this could be scripted via the HTTP API plus directory access.
   > The backup API in [fuseki-server-protocol] can trigger a backup and can also list existing backups.
   > Unfortunately, in the current implementation, backup is not consistent.
   > In case of a server crash during backup, the file will remain there incomplete.
   
   > Also, since tasks are stored in memory and cleaned (periodically / on restart), there is no way to know for sure whether the backup was successful.
   > I have encountered the above quite often in some workloads.
   
   > The inconsistency could be solved by writing the backup to a temporary file name and, on completion, renaming it to the final file name.
   > Rename is usually an atomic operation on POSIX file systems.
   
   > The /backup-list API can list all backups or split backups into complete / incomplete. IMO for now, it can list all of them.
   
   > The in-progress backup could be stored alongside the other backups with a file marker like: dataset_date.nq.gz.INCOMPLETE .
   > Once it's done it can be renamed to dataset_date.nq.gz .
   
   > Cleanup might be handled externally. In case of a crash, the file will remain INCOMPLETE until the system removes it, after checking that a specific amount of time (1-2 days) has passed since the backup was started.
   
   @afs replied:
   
   > Yes, the backup should be written then atomically moved (i.e. same directory). Cleanup would then be "delete" by pattern in the server startup script.
   > As to putting a process script around the functionality, it is an external script which needs access to the server file area (to know the state of backups). The file system state is the definitive state - not the jobs (that's a UI feature).
   
   > This would make a good independent project or contribution, or a published example as a starting point, because the requirements will depend on the deployment environment and it seems unlikely to me that there is a one-size-fits-all solution.
   
   > (The codebase already has some "safe write" code in IOX.safeWrite) 
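
The daily / weekly / monthly policy requested above could be computed by an external script once the backup dates are known (for example, parsed from file names in the backup directory). The sketch below is illustrative only and is not part of Jena or Fuseki: the function names, the parameters, and the reading of "1 per week" as "the newest backup in each ISO week" are all assumptions.

```python
from datetime import date, timedelta


def newest_per_bucket(dates_newest_first, bucket_key, limit):
    """Keep the newest date in each bucket, for the `limit` most recent buckets."""
    buckets = {}
    for d in dates_newest_first:
        # Dates arrive newest first, so the first date seen in a bucket is its newest.
        buckets.setdefault(bucket_key(d), d)
    return list(buckets.values())[:limit]


def select_backups_to_keep(backup_dates, today, daily=7, weekly=4, monthly=6):
    """Return the set of backup dates to keep (hypothetical policy helper)."""
    dates = sorted(set(backup_dates), reverse=True)
    keep = set()
    # Daily: every backup taken within the last `daily` days.
    keep.update(d for d in dates if 0 <= (today - d).days < daily)
    # Weekly: newest backup in each of the most recent ISO weeks that have one.
    keep.update(newest_per_bucket(dates, lambda d: d.isocalendar()[:2], weekly))
    # Monthly: newest backup in each of the most recent months that have one.
    keep.update(newest_per_bucket(dates, lambda d: (d.year, d.month), monthly))
    return keep


if __name__ == "__main__":
    today = date(2022, 8, 31)
    all_dates = [today - timedelta(days=i) for i in range(200)]
    keep = select_backups_to_keep(all_dates, today)
    for d in sorted(set(all_dates) - keep):
        print("would delete backup from", d.isoformat())
```

Everything not returned by `select_backups_to_keep` is a candidate for deletion. The script only decides; actually removing files is left to the deployment, in line with the point that the file system state is the definitive state.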
   
   
   ### Are you interested in contributing a solution yourself?
   
   Perhaps?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@jena.apache.org
For additional commands, e-mail: issues-help@jena.apache.org


[GitHub] [jena] afs commented on issue #1500: Improve fuseki backup to consider failures (fuseki crash) and clean up incomplete backups

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1500:
URL: https://github.com/apache/jena/issues/1500#issuecomment-1235563990

   > Any idea when the next release is due?
   
   This is not in 4.6.1 (which is currently in its VOTE phase).
   
   You can get development builds (daily): See https://ci-builds.apache.org/job/Jena/
   These go to `https://repository.apache.org/content/repositories/snapshots/`.
   




[GitHub] [jena] ieugen commented on issue #1500: Improve fuseki backup to consider failures (fuseki crash) and clean up incomplete backups

Posted by GitBox <gi...@apache.org>.
ieugen commented on issue #1500:
URL: https://github.com/apache/jena/issues/1500#issuecomment-1235885970

   I can vote against 4.6.1 so the release will be re-done :D (kidding, I know it's a lot of effort to make a release). 
   Thanks for implementing it so fast.
   
   Did you also implement cleanup? 
   I did not see that part.
   I hope the tmp files are not written to /tmp, which is very small.




[GitHub] [jena] kdejaeger commented on issue #1500: Improve fuseki backup to consider failures (fuseki crash) and clean up incomplete backups

Posted by GitBox <gi...@apache.org>.
kdejaeger commented on issue #1500:
URL: https://github.com/apache/jena/issues/1500#issuecomment-1376945031

   The documentation for 4.7.0 now points out that you can build backup policies, but it does not explain how to do so: https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html#backup




[GitHub] [jena] afs commented on issue #1500: Improve fuseki backup to consider failures (fuseki crash) and clean up incomplete backups

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1500:
URL: https://github.com/apache/jena/issues/1500#issuecomment-1235253465

   PR #1513 ensures the backup is written atomically. The intermediate files have names ending in ".tmp", chosen by the JDK library function `Files.createTempFile`.
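
The write-then-rename technique can be illustrated outside the JVM as well. The sketch below is a Python analogue of the idea, not the code in PR #1513: `write_backup_atomically` is a hypothetical helper, and `tempfile.mkstemp` plus `os.replace` stand in for `Files.createTempFile` and an atomic move within one directory.

```python
import gzip
import os
import tempfile


def write_backup_atomically(final_path, data: bytes):
    """Write gzip-compressed data to a ".tmp" file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Create the temporary file in the SAME directory as the final file,
    # so the rename never crosses a file system boundary.
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "wb") as raw:
            with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
                gz.write(data)
            # Push the bytes to disk before the file becomes visible
            # under its final name.
            raw.flush()
            os.fsync(raw.fileno())
        # Atomic on POSIX when source and target are on one file system.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Design choice of this sketch only: on an ordinary error, remove
        # the partial file. After a hard crash the ".tmp" file remains.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "dataset_2022-08-31.nq.gz")
    write_backup_atomically(target, b"<s> <p> <o> <g> .\n")
    print("wrote", target)
```

A reader of the directory then sees either no file with the final name or a complete one, never a partially written backup under that name.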




[GitHub] [jena] afs commented on issue #1500: Improve fuseki backup to consider failures (fuseki crash) and clean up incomplete backups

Posted by GitBox <gi...@apache.org>.
afs commented on issue #1500:
URL: https://github.com/apache/jena/issues/1500#issuecomment-1235919484

   No cleanup. The PR puts in what the Fuseki engine should do, with no hard-coded policy. There is no assumption about JVM cleanup.
   
   The rest is best done outside the JVM. Maybe the deployment wants to leave them around to see what went wrong.
   
   The tmp files are in the same directory as the final file. This means there is enough space (/tmp may be on another file system).
   
   At worst (an OS crash), the file contents may have two names (old and new, as hard links) but never no name, so deleting the old name fixes that. In the same directory on a local filesystem, it is even less likely to end up linked twice. See `man 2 rename`.
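
Since cleanup is left to the deployment, an external script could delete leftover temporary files by pattern and age. The following is a minimal sketch under stated assumptions: the helper name `remove_stale_temp_files`, the ".tmp" suffix match, and the two-day default threshold are illustrative choices, not Fuseki behaviour.

```python
import os
import time


def remove_stale_temp_files(backup_dir, max_age_seconds=2 * 24 * 3600,
                            suffix=".tmp", now=None):
    """Delete regular files ending in `suffix` that are older than the threshold.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    with os.scandir(backup_dir) as entries:
        for entry in entries:
            # Only consider plain files matching the temporary-file pattern.
            if not entry.is_file(follow_symlinks=False):
                continue
            if not entry.name.endswith(suffix):
                continue
            # A young ".tmp" file may belong to a backup still in progress.
            age = now - entry.stat(follow_symlinks=False).st_mtime
            if age > max_age_seconds:
                os.remove(entry.path)
                removed.append(entry.name)
    return sorted(removed)
```

A deployment that prefers to keep failed backups for diagnosis could log the candidates instead of removing them.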
   
   
   
   




[GitHub] [jena] ieugen commented on issue #1500: Improve fuseki backup to consider failures (fuseki crash) and clean up incomplete backups

Posted by GitBox <gi...@apache.org>.
ieugen commented on issue #1500:
URL: https://github.com/apache/jena/issues/1500#issuecomment-1236055924

   Thanks. 
   I think we should add this as documentation. I made this PR: https://github.com/apache/jena-site/pull/117 . 
   I am an Apache committer and have a signed CLA.




[GitHub] [jena] ieugen closed issue #1500: Improve fuseki backup to consider failures (fuseki crash) and clean up incomplete backups

Posted by GitBox <gi...@apache.org>.
ieugen closed issue #1500: Improve fuseki backup to consider failures (fuseki crash) and clean up incomplete backups
URL: https://github.com/apache/jena/issues/1500




[GitHub] [jena] ieugen commented on issue #1500: Improve fuseki backup to consider failures (fuseki crash) and clean up incomplete backups

Posted by GitBox <gi...@apache.org>.
ieugen commented on issue #1500:
URL: https://github.com/apache/jena/issues/1500#issuecomment-1235476041

   Thanks. 
   I will see if I can test it. 
   Any idea when the next release is due? 




[GitHub] [jena] ieugen commented on issue #1500: Improve fuseki backup to consider failures (fuseki crash) and clean up incomplete backups

Posted by GitBox <gi...@apache.org>.
ieugen commented on issue #1500:
URL: https://github.com/apache/jena/issues/1500#issuecomment-1274981437

   @afs : Can we add the name of the database being backed up to the JSON task output?
   
   Right now we get the output below.
   From it I cannot tell which DB is being backed up.
   This is important, especially if several tasks are in progress.
   ```
   [ { 
       "task" : "Backup" ,
       "taskId" : "1" ,
       "started" : "2022-10-11T16:25:47.083+00:00"
     }
   ]
   ```
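
For a script that polls the task list, output of this shape can be filtered for tasks still in progress. This is a sketch only: `tasks_in_progress` is a hypothetical helper, and it assumes that a completed task carries a "finished" field, which the output above does not confirm.

```python
import json


def tasks_in_progress(tasks_json: str):
    """Return the task objects that have no "finished" field yet."""
    tasks = json.loads(tasks_json)
    # Assumption: a completed task is reported with a "finished" timestamp.
    return [t for t in tasks if "finished" not in t]


sample = """
[ {
    "task" : "Backup" ,
    "taskId" : "1" ,
    "started" : "2022-10-11T16:25:47.083+00:00"
  }
]
"""

for t in tasks_in_progress(sample):
    print(t["taskId"], t["task"], t["started"])
```

As the comment above notes, nothing in these fields identifies the dataset, so a script would still have to track which task id it started for which database.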

