Posted to dev@jackrabbit.apache.org by Андрей Троицкий <an...@gmail.com> on 2020/06/22 13:24:12 UTC

Checksum function for files on Jackrabbit server

Hi everybody,
I have a question regarding the functionality of the Jackrabbit WebDAV server.

We run a standard server and have uploaded some files to it. When we upload
a new file with the same name, we need to check whether the file on the
server is already up to date and shouldn't be replaced. The initial idea was
to use ETags, but we found they are weak and not suitable for our
application. So now the idea is to compare the checksums (hashes) of the
incoming and existing files. As files can be pretty big and downloading them
every time can be time-consuming, it would be better to have a way to easily
obtain the checksum of already uploaded files.

So my question: are there any options to get a checksum for files uploaded
to the server? Maybe there are some other options or features that will
help in such file handling?

Thanks in advance for your reply!
Regards
Andrey

Re: Checksum function for files on Jackrabbit server

Posted by Julian Reschke <ju...@gmx.de>.

There is no such way, but it could be added.

There's
<https://greenbytes.de/tech/webdav/draft-ietf-httpbis-digest-headers-02.html>
which is currently under development and could be ready a few months
from now.

In the meantime, a custom WebDAV REPORT would probably be the simplest
approach.

That said, keep in mind that forcing a server to calculate hashes can be
used for DoS attacks, so we need to be a bit careful here, unless we
can figure out a robust way to store the hash with the content.
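
To illustrate the "store the hash with the content" idea, here is a minimal sketch that computes the digest while the upload is being written, so no second pass over the data is needed later. It uses only the Java standard library. The class and method names (HashOnStore, storeWithDigest) are hypothetical, SHA-256 is an assumed choice of algorithm, and where the resulting value would be persisted or how it would be exposed is left open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper, not part of Jackrabbit.
public class HashOnStore {

    /**
     * Copies the upload to the given output and returns the Base64-encoded
     * SHA-256 digest of the bytes that were copied. The upload stream is
     * closed when the copy finishes.
     */
    static String storeWithDigest(InputStream upload, OutputStream store) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // Every read through DigestInputStream also updates the digest.
            try (DigestInputStream in = new DigestInputStream(upload, md)) {
                byte[] buffer = new byte[8192];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    store.write(buffer, 0, n);
                }
            }
            return Base64.getEncoder().encodeToString(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned string could then be saved next to the binary at write time and served on request without re-reading the content, which would avoid the on-demand hashing cost mentioned above.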

Best regards, Julian

Re: Checksum function for files on Jackrabbit server

Posted by Woonsan Ko <wo...@apache.org>.

I don't think there's a built-in feature for retrieving the hash of the
file content in WebDAV or Jackrabbit WebDAV.
If it is feasible to use the JCR API and add custom code (e.g., a servlet)
alongside the default JR WebDAV servlet, then the following info might be
helpful:
- JR WebDAV is just a WebDAV binding for JCR, and Jackrabbit JCR uses a
DataStore to save binary (file) content that exceeds the threshold
size setting. [1]
- The DataStore calculates and uses a hash value for each binary data
item. This identifier, the hash value, can be retrieved using the
Jackrabbit API. [2] For example,

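import org.apache.jackrabbit.api.JackrabbitValue;
//...
// node is the node holding the jcr:data property; getContentIdentity()
// may return null if no content identity is available for the value
String binaryId = ((JackrabbitValue)
        node.getProperty("jcr:data").getValue()).getContentIdentity();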

So, it is technically possible to have custom code read the hash value of
an existing file item's binary data and return it, even though it will
take some time and effort to look into the JCR node structure and API in
detail.
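
As a rough sketch of the client side of such a comparison: assuming the custom code returns the content identity as a hex string, the client could hash the local file and skip the upload when the values match. The names LocalContentHash, hexDigest and sameContent are hypothetical, and the digest algorithm is a parameter because which one the DataStore uses depends on the Jackrabbit version and configuration (that part is an assumption to verify against your setup).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical client-side helper, not part of Jackrabbit.
public class LocalContentHash {

    /** Streams the input through a MessageDigest and returns the lowercase hex digest. */
    static String hexDigest(InputStream in, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unsupported digest: " + algorithm, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if the local file hashes to the value reported by the server. */
    static boolean sameContent(Path localFile, String serverHash, String algorithm) {
        if (serverHash == null) {
            return false; // no identity available, so treat the content as different
        }
        try (InputStream in = Files.newInputStream(localFile)) {
            return hexDigest(in, algorithm).equalsIgnoreCase(serverHash);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The file is read in fixed-size chunks, so even large files are hashed without loading them into memory. A null server value is treated as "different" so that the upload proceeds when no identity is available.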

Regards,

Woonsan

[1] http://jackrabbit.apache.org/archive/wiki/JCR/DataStore_115513387.html
[2] http://jackrabbit.apache.org/archive/wiki/JCR/DataStore_115513387.html#DataStore-RetrievetheIdentifier
