Posted to dev@jackrabbit.apache.org by Mathijs den Burger <m....@onehippo.com> on 2014/09/03 10:27:02 UTC

Syncing a local directory using FileVault

Hi,

I'm forwarding this question to the dev list, which seems more
appropriate than the users list.

I'm trying to use FileVault to automatically sync a local directory
from the filesystem into a Hippo repository (which is based on
Jackrabbit). The main reason for using FileVault is the automatic
handling of meta-data in the .content.xml files.

So far I've successfully used the Importer class to import an Archive
based on files or a zip file. I've also implemented a simple file
watcher that triggers a reimport whenever something changes. This
nicely imports changes and newly added files and directories. So far
so good.
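[Editorial note: the mail does not include the watcher code; the following is an illustrative sketch of such a "simple file watcher" using plain java.nio — the class and method names are mine, not the actual sync code.]

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Minimal sketch of a directory watcher; a caller would trigger a
// FileVault reimport whenever awaitChange reports an event.
class DirWatcher implements AutoCloseable {
    private final WatchService service;

    DirWatcher(Path dir) throws IOException {
        service = dir.getFileSystem().newWatchService();
        dir.register(service,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY,
                StandardWatchEventKinds.ENTRY_DELETE);
    }

    // Blocks for up to timeoutMillis; returns the kind of the first
    // observed event (e.g. "ENTRY_CREATE"), or null if nothing changed.
    public String awaitChange(long timeoutMillis) throws InterruptedException {
        WatchKey key = service.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return null;
        }
        List<WatchEvent<?>> events = key.pollEvents();
        key.reset(); // re-arm the key so later events are delivered too
        return events.isEmpty() ? null : events.get(0).kind().name();
    }

    @Override
    public void close() throws IOException {
        service.close();
    }
}
```

A production version would register subdirectories recursively and debounce bursts of events before triggering a reimport.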

The problem is that deleting files or directories does not delete
their counterparts in JCR. I'd expect that using ImportMode.REPLACE
would completely replace the JCR node structure and automatically
delete everything that is no longer present in the imported archive,
but that's not the case.

Reading through code it seems the vlt tool uses the meta-data in the
.vlt directory to track deleted files, which are then processed during
a commit. Is there a way to add such meta-data to an Archive or
educate the Importer somehow?

I've also read through the code of the vault-sync tool, which
basically implements my use case. I cannot use it as-is though, since
Hippo does not use OSGI. IIUC vault-sync does not use Archives or
Packages at all, but uses its own TreeSync class that does lots of
plain JCR calls. Does that mean that vault-sync does not process
meta-data files at all?

best,
Mathijs

Re: Syncing a local directory using FileVault

Posted by Mathijs den Burger <m....@onehippo.com>.
On Wed, Sep 10, 2014 at 7:04 PM, Tobias Bocanegra <tr...@apache.org> wrote:
> I will look into the test case - from what you describe it should work
> - so it might be a bug....
> if the package is setup properly, it must remove the missing file.

OK, thanks for investigating. I've created
https://issues.apache.org/jira/browse/JCRVLT-59 to track the issue.

On a side note: I've changed my directory sync code so it detects
which files/directories actually changed. It will then first delete
the part of the JCR node tree that changed before reimporting a subset
of the files. That works fairly well, since most file system changes
occur on a fine-grained level. But having FileVault process deletions
automatically would be nicer, of course :).
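[Editorial note: the change detection described above can be sketched as a pure-Java snapshot diff; the class below is illustrative, not the actual sync code.]

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Diff two snapshots of relative path -> last-modified time. The
// "deleted" set is what gets removed from the JCR tree before the
// partial reimport; "added"/"changed" drive the reimport itself.
class SnapshotDiff {
    public static Map<String, Set<String>> diff(Map<String, Long> before,
                                                Map<String, Long> after) {
        Set<String> added = new TreeSet<>(after.keySet());
        added.removeAll(before.keySet());

        Set<String> deleted = new TreeSet<>(before.keySet());
        deleted.removeAll(after.keySet());

        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            Long old = before.get(e.getKey());
            if (old != null && !old.equals(e.getValue())) {
                changed.add(e.getKey());
            }
        }

        Map<String, Set<String>> result = new LinkedHashMap<>();
        result.put("added", added);
        result.put("changed", changed);
        result.put("deleted", deleted);
        return result;
    }
}
```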

best,
Mathijs

Re: Syncing a local directory using FileVault

Posted by Tobias Bocanegra <tr...@apache.org>.
I will look into the test case - from what you describe it should work
- so it might be a bug....
if the package is setup properly, it must remove the missing file.

regards, toby

On Wed, Sep 10, 2014 at 2:48 AM, Mathijs den Burger
<m....@onehippo.com> wrote:
> Hi Toby,
>
> On Tue, Sep 9, 2014 at 7:23 PM, Tobias Bocanegra <tr...@adobe.com> wrote:
>> Hi,
>>
>> On Tue, Sep 9, 2014 at 5:02 AM, Mathijs den Burger
>> <m....@onehippo.com> wrote:
>>> Hi Toby,
>>>
>>> The import code is something along the lines of:
>>>
>>>   private void importArchive(final Session session, final Archive
>>> archive, final ImportMode mode) throws ... {
>>>         archive.open(true);
>>>
>>>         ImportOptions options = new ImportOptions();
>>>         options.setImportMode(mode);
>>>
>>>         Importer importer = new Importer(options);
>>>         importer.run(archive, <some JCR node>);
>>>   }
>>>
>>> The archive implementations override getMetaInf to return some static setup:
>>>
>>>   class MyArchive extends AbstractArchive {
>>>
>>>       @Override
>>>       public MetaInf getMetaInf() {
>>>           DefaultMetaInf metaInf = new DefaultMetaInf();
>>>           metaInf.setSettings(VaultSettings.createDefault());
>>>
>>>           DefaultWorkspaceFilter includeAll = new DefaultWorkspaceFilter();
>>>           includeAll.add(PathFilterSet.INCLUDE_ALL);
>>>           metaInf.setFilter(includeAll);
>>>
>>>           return metaInf;
>>>       }
>>>
>> how do you actually provide the files for the archive?
>
> They are read from the filesystem. The archive implementation is very
> similar to FileArchive (e.g. openInputStream and getInputSource read
> files, and Entry#getChildren iterates over the files in a directory).
> I rewrote getJcrRoot and getRoot so I don't need jcr_root and META-INF
> directories for my archive, but that does not seem relevant. I've also
> tried using a FileArchive as-is, with the same result: deleting a file
> and then reimporting the archive with less content does not delete the
> nt:file node created by the previous import.
>
> The following additional unit test in
> vault-core/src/test/java/org/apache/jackrabbit/vault/packaging/integration/ImportTests.java
> demonstrates the issue:
>
>     @Test
>     public void testReimportLess() throws IOException, RepositoryException, ConfigurationException {
>         ZipArchive archive = new ZipArchive(getTempFile("testpackages/tmp.zip"));
>         archive.open(true);
>         Node rootNode = admin.getRootNode();
>         ImportOptions opts = getDefaultOptions();
>         Importer importer = new Importer(opts);
>         importer.run(archive, rootNode);
>
>         assertNodeExists("/tmp/foo/bar/tobi");
>
>         ZipArchive archive2 = new ZipArchive(getTempFile("testpackages/tmp-less.zip"));
>         archive2.open(true);
>         importer.run(archive2, rootNode);
>
>         assertNodeMissing("/tmp/foo/bar/tobi");
>     }
>
> The 'testpackages/tmp-less.zip' file is a copy of
> 'testpackages/tmp.zip', but without the directory /tmp/foo/bar/tobi.
>
> The last assert fails: /tmp/foo/bar/tobi still exists after importing
> an archive that no longer contains it.
>
>>> But it seems the Importer is simply the wrong corner of FileVault to
>>> use when I want to keep a local directory in sync. AFAICS the Importer
>>> is only used in the context of installing Packages.
>>
>> well, but that's exactly what you want. the package manager is
>> specifically built to "assemble" packages that look like a zip of a
>> vlt checkout. and it also allows for installing packages that are
>> built that way.
>
> Is it also possible to install a newer version of a package with
> less files/directories, in such a way that the nt:file/nt:folder nodes
> that used to be 'present' in the old package are deleted once the new
> package has been installed? Or should you then first remove the old
> package before installing the new package?
>
>> as you might know, we moved the entire vlt/packaging stuff from the
>> day/adobe code into jackrabbit. "we" still use the package manager for
>> all our content deployment and development. we also use maven plugins
>> to create and install packages automatically.
>
> That's why I can't imagine deletion of nt:file/nt:folder nodes via
> Packages or Archives is not possible :). How do "you" handle that
> case?
>
> regards,
> Mathijs

Re: Syncing a local directory using FileVault

Posted by Mathijs den Burger <m....@onehippo.com>.
Hi Toby,

On Tue, Sep 9, 2014 at 7:23 PM, Tobias Bocanegra <tr...@adobe.com> wrote:
> Hi,
>
> On Tue, Sep 9, 2014 at 5:02 AM, Mathijs den Burger
> <m....@onehippo.com> wrote:
>> Hi Toby,
>>
>> The import code is something along the lines of:
>>
>>   private void importArchive(final Session session, final Archive
>> archive, final ImportMode mode) throws ... {
>>         archive.open(true);
>>
>>         ImportOptions options = new ImportOptions();
>>         options.setImportMode(mode);
>>
>>         Importer importer = new Importer(options);
>>         importer.run(archive, <some JCR node>);
>>   }
>>
>> The archive implementations override getMetaInf to return some static setup:
>>
>>   class MyArchive extends AbstractArchive {
>>
>>       @Override
>>       public MetaInf getMetaInf() {
>>           DefaultMetaInf metaInf = new DefaultMetaInf();
>>           metaInf.setSettings(VaultSettings.createDefault());
>>
>>           DefaultWorkspaceFilter includeAll = new DefaultWorkspaceFilter();
>>           includeAll.add(PathFilterSet.INCLUDE_ALL);
>>           metaInf.setFilter(includeAll);
>>
>>           return metaInf;
>>       }
>>
> how do you actually provide the files for the archive?

They are read from the filesystem. The archive implementation is very
similar to FileArchive (e.g. openInputStream and getInputSource read
files, and Entry#getChildren iterates over the files in a directory).
I rewrote getJcrRoot and getRoot so I don't need jcr_root and META-INF
directories for my archive, but that does not seem relevant. I've also
tried using a FileArchive as-is, with the same result: deleting a file
and then reimporting the archive with less content does not delete the
nt:file node created by the previous import.

The following additional unit test in
vault-core/src/test/java/org/apache/jackrabbit/vault/packaging/integration/ImportTests.java
demonstrates the issue:

    @Test
    public void testReimportLess() throws IOException, RepositoryException, ConfigurationException {
        ZipArchive archive = new ZipArchive(getTempFile("testpackages/tmp.zip"));
        archive.open(true);
        Node rootNode = admin.getRootNode();
        ImportOptions opts = getDefaultOptions();
        Importer importer = new Importer(opts);
        importer.run(archive, rootNode);

        assertNodeExists("/tmp/foo/bar/tobi");

        ZipArchive archive2 = new ZipArchive(getTempFile("testpackages/tmp-less.zip"));
        archive2.open(true);
        importer.run(archive2, rootNode);

        assertNodeMissing("/tmp/foo/bar/tobi");
    }

The 'testpackages/tmp-less.zip' file is a copy of
'testpackages/tmp.zip', but without the directory /tmp/foo/bar/tobi.

The last assert fails: /tmp/foo/bar/tobi still exists after importing
an archive that no longer contains it.

>> But it seems the Importer is simply the wrong corner of FileVault to
>> use when I want to keep a local directory in sync. AFAICS the Importer
>> is only used in the context of installing Packages.
>
> well, but that's exactly what you want. the package manager is
> specifically built to "assemble" packages that look like a zip of a
> vlt checkout. and it also allows for installing packages that are
> built that way.

Is it also possible to install a newer version of a package with
less files/directories, in such a way that the nt:file/nt:folder nodes
that used to be 'present' in the old package are deleted once the new
package has been installed? Or should you then first remove the old
package before installing the new package?

> as you might know, we moved the entire vlt/packaging stuff from the
> day/adobe code into jackrabbit. "we" still use the package manager for
> all our content deployment and development. we also use maven plugins
> to create and install packages automatically.

That's why I can't imagine deletion of nt:file/nt:folder nodes via
Packages or Archives is not possible :). How do "you" handle that
case?

regards,
Mathijs

Re: Syncing a local directory using FileVault

Posted by Tobias Bocanegra <tr...@adobe.com>.
Hi,

On Tue, Sep 9, 2014 at 5:02 AM, Mathijs den Burger
<m....@onehippo.com> wrote:
> Hi Toby,
>
> The import code is something along the lines of:
>
>   private void importArchive(final Session session, final Archive
> archive, final ImportMode mode) throws ... {
>         archive.open(true);
>
>         ImportOptions options = new ImportOptions();
>         options.setImportMode(mode);
>
>         Importer importer = new Importer(options);
>         importer.run(archive, <some JCR node>);
>   }
>
> The archive implementations override getMetaInf to return some static setup:
>
>   class MyArchive extends AbstractArchive {
>
>       @Override
>       public MetaInf getMetaInf() {
>           DefaultMetaInf metaInf = new DefaultMetaInf();
>           metaInf.setSettings(VaultSettings.createDefault());
>
>           DefaultWorkspaceFilter includeAll = new DefaultWorkspaceFilter();
>           includeAll.add(PathFilterSet.INCLUDE_ALL);
>           metaInf.setFilter(includeAll);
>
>           return metaInf;
>       }
>
how do you actually provide the files for the archive?

> But it seems the Importer is simply the wrong corner of FileVault to
> use when I want to keep a local directory in sync. AFAICS the Importer
> is only used in the context of installing Packages.
well, but that's exactly what you want. the package manager is
specifically built to "assemble" packages that look like a zip of a
vlt checkout. and it also allows for installing packages that are
built that way.

as you might know, we moved the entire vlt/packaging stuff from the
day/adobe code into jackrabbit. "we" still use the package manager for
all our content deployment and development. we also use maven plugins
to create and install packages automatically.

I wanted to progress with the donation a long time ago, and finish the
HTTP API for the package manager, and also make the maven plugins
public, but I haven't had time yet.

> I'm going to try
> the Mounter now to 'mount' a part of the local filesystem in a
> repository, which then gives me access to a VaultFileSystem in which I
> can start a Transaction.

ok.
regards, toby

Re: Syncing a local directory using FileVault

Posted by Mathijs den Burger <m....@onehippo.com>.
Hi Toby,

The import code is something along the lines of:

  private void importArchive(final Session session, final Archive archive,
                             final ImportMode mode) throws ... {
      archive.open(true);

      ImportOptions options = new ImportOptions();
      options.setImportMode(mode);

      Importer importer = new Importer(options);
      importer.run(archive, <some JCR node>);
  }

The archive implementations override getMetaInf to return some static setup:

  class MyArchive extends AbstractArchive {

      @Override
      public MetaInf getMetaInf() {
          DefaultMetaInf metaInf = new DefaultMetaInf();
          metaInf.setSettings(VaultSettings.createDefault());

          DefaultWorkspaceFilter includeAll = new DefaultWorkspaceFilter();
          includeAll.add(PathFilterSet.INCLUDE_ALL);
          metaInf.setFilter(includeAll);

          return metaInf;
      }
  }

But it seems the Importer is simply the wrong corner of FileVault to
use when I want to keep a local directory in sync. AFAICS the Importer
is only used in the context of installing Packages. I'm going to try
the Mounter now to 'mount' a part of the local filesystem in a
repository, which then gives me access to a VaultFileSystem in which I
can start a Transaction.

best,
Mathijs

Re: Syncing a local directory using FileVault

Posted by Tobias Bocanegra <tr...@apache.org>.
Hi Mathijs,

it is difficult to imagine what the problem could be. how exactly do
you import the package?
when you use the normal file archive it uses by default the
workspace filter from the META-INF/vault/filter.xml
but you can override this in the import options.
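[Editorial note: for reference, a minimal META-INF/vault/filter.xml looks like the following; the root path is illustrative and must cover every node the import should be allowed to update or delete.]

```xml
<?xml version="1.0" encoding="UTF-8"?>
<workspaceFilter version="1.0">
    <!-- every node under /tmp/foo is covered: it may be added,
         updated, or deleted by a package installation -->
    <filter root="/tmp/foo"/>
</workspaceFilter>
```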

maybe you can give me an example?
regards, toby


On Mon, Sep 8, 2014 at 9:50 AM, Mathijs den Burger
<m....@onehippo.com> wrote:
> Hi Toby,
>
> On Wed, Sep 3, 2014 at 8:27 PM, Tobias Bocanegra <tr...@apache.org> wrote:
>> Hi Mathijs,
>>
>>
>> On Wed, Sep 3, 2014 at 1:27 AM, Mathijs den Burger
>> <m....@onehippo.com> wrote:
>>> Hi,
>>>
>>> I'm forwarding this question to the dev list, which seems more
>>> appropriate than the users list.
>>>
>>> I'm trying to use FileVault to automatically sync a local directory
>>> from filesystem into a Hippo repository (which is based on
>>> JackRabbit). The main reason for using FileVault is the automatic
>>> handling of meta-data in the .content.xml files.
>>>
>>> So far I've successfully used the Importer class to import an Archive
>>> based on files or a zip file. I've also implemented a simple file
>>> watcher that triggers a reimport whenever something changes. This
>>> nicely imports changes and newly added files and directories. So far
>>> so good.
>>
>> great!
>>
>>> The problem is that deleting files or directories does not delete
>>> their counterparts in JCR. I'd expect that using ImportMode.REPLACE
>>> would completely replace the JCR node structure and automatically
>>> delete everything that is no longer present in the imported archive,
>>> but that's not the case.
>>
>> how do you set up the WorkspaceFilter? Only nodes covered by the
>> filter are updated. IIRC, if you don't use a filter at all, the
>> importer only does updates.
>
> I did not set up any filter, which made the Importer use a plain
> DefaultWorkspaceFilter, which returned an empty list of filter sets.
>
> So I tried using my own DefaultWorkspaceFilter to which I added
> PathFilterSet.INCLUDE_ALL. But to no avail: a deleted file or deleted
> directory is still not deleted by the Importer. Also additional nodes
> declared in .content.xml files are not deleted when they are deleted
> again from the .content.xml files and reimported.
>
> To summarize: the Importer only seems to process what's in an archive,
> and not what's no longer in the archive :)
> Still odd, since the javadoc of ImportMode.REPLACE states:
>
>   "Normal behavior. Existing content is replaced completely by the
> imported content, i.e. is overridden or deleted accordingly."
>
>>> Reading through code it seems the vlt tool uses the meta-data in the
>>> .vlt directory to track deleted files, which are then processed during
>>> a commit. Is there a way to add such meta-data to an Archive or
>>> educate the Importer somehow?
>>
>> The vault cli tool works completely differently from the packaging. it
>> mimics a subversion-like behaviour and is probably unsuited for your
>> needs. The main magic is done in the
>> org.apache.jackrabbit.vault.fs.impl.TransactionImpl which uses the
>> recorded changes to figure out which nodes to sync.
>
> OK, so then the only options seem to be
>
> a) delete everything before importing again (which is a bit crude and
> won't perform very well for large imports)
> b) build up a TransactionImpl from the observed file system changes
> and commit that
>
> Or am I missing a viable alternative?
>
>>> I've also read through the code of the vault-sync tool, which
>>> basically implements my use case. I cannot use it as-is though, since
>>> Hippo does not use OSGI. IIUC vault-sync does not use Archives or
>>> Packages at all, but uses its own TreeSync class that does lots of
>>> plain JCR calls. Does that mean that vault-sync does not process
>>> meta-data files at all?
>>
>> no. currently vault-sync only handles simple files and folders.
>
> Ah, that's too limited.
>
> best,
> Mathijs

Re: Syncing a local directory using FileVault

Posted by Mathijs den Burger <m....@onehippo.com>.
Hi Toby,

On Wed, Sep 3, 2014 at 8:27 PM, Tobias Bocanegra <tr...@apache.org> wrote:
> Hi Mathijs,
>
>
> On Wed, Sep 3, 2014 at 1:27 AM, Mathijs den Burger
> <m....@onehippo.com> wrote:
>> Hi,
>>
>> I'm forwarding this question to the dev list, which seems more
>> appropriate than the users list.
>>
>> I'm trying to use FileVault to automatically sync a local directory
>> from filesystem into a Hippo repository (which is based on
>> JackRabbit). The main reason for using FileVault is the automatic
>> handling of meta-data in the .content.xml files.
>>
>> So far I've successfully used the Importer class to import an Archive
>> based on files or a zip file. I've also implemented a simple file
>> watcher that triggers a reimport whenever something changes. This
>> nicely imports changes and newly added files and directories. So far
>> so good.
>
> great!
>
>> The problem is that deleting files or directories does not delete
>> their counterparts in JCR. I'd expect that using ImportMode.REPLACE
>> would completely replace the JCR node structure and automatically
>> delete everything that is no longer present in the imported archive,
>> but that's not the case.
>
> how do you set up the WorkspaceFilter? Only nodes covered by the
> filter are updated. IIRC, if you don't use a filter at all, the
> importer only does updates.

I did not set up any filter, which made the Importer use a plain
DefaultWorkspaceFilter, which returned an empty list of filter sets.

So I tried using my own DefaultWorkspaceFilter to which I added
PathFilterSet.INCLUDE_ALL. But to no avail: a deleted file or deleted
directory is still not deleted by the Importer. Also additional nodes
declared in .content.xml files are not deleted when they are deleted
again from the .content.xml files and reimported.

To summarize: the Importer only seems to process what's in an archive,
and not what's no longer in the archive :)
Still odd, since the javadoc of ImportMode.REPLACE states:

  "Normal behavior. Existing content is replaced completely by the
imported content, i.e. is overridden or deleted accordingly."

>> Reading through code it seems the vlt tool uses the meta-data in the
>> .vlt directory to track deleted files, which are then processed during
>> a commit. Is there a way to add such meta-data to an Archive or
>> educate the Importer somehow?
>
> The vault cli tool works completely differently from the packaging. it
> mimics a subversion-like behaviour and is probably unsuited for your
> needs. The main magic is done in the
> org.apache.jackrabbit.vault.fs.impl.TransactionImpl which uses the
> recorded changes to figure out which nodes to sync.

OK, so then the only options seem to be

a) delete everything before importing again (which is a bit crude and
won't perform very well for large imports)
b) build up a TransactionImpl from the observed file system changes
and commit that

Or am I missing a viable alternative?

>> I've also read through the code of the vault-sync tool, which
>> basically implements my use case. I cannot use it as-is though, since
>> Hippo does not use OSGI. IIUC vault-sync does not use Archives or
>> Packages at all, but uses its own TreeSync class that does lots of
>> plain JCR calls. Does that mean that vault-sync does not process
>> meta-data files at all?
>
> no. currently vault-sync only handles simple files and folders.

Ah, that's too limited.

best,
Mathijs

Re: Syncing a local directory using FileVault

Posted by Tobias Bocanegra <tr...@apache.org>.
Hi Mathijs,


On Wed, Sep 3, 2014 at 1:27 AM, Mathijs den Burger
<m....@onehippo.com> wrote:
> Hi,
>
> I'm forwarding this question to the dev list, which seems more
> appropriate than the users list.
>
> I'm trying to use FileVault to automatically sync a local directory
> from filesystem into a Hippo repository (which is based on
> JackRabbit). The main reason for using FileVault is the automatic
> handling of meta-data in the .content.xml files.
>
> So far I've successfully used the Importer class to import an Archive
> based on files or a zip file. I've also implemented a simple file
> watcher that triggers a reimport whenever something changes. This
> nicely imports changes and newly added files and directories. So far
> so good.

great!

> The problem is that deleting files or directories does not delete
> their counterparts in JCR. I'd expect that using ImportMode.REPLACE
> would completely replace the JCR node structure and automatically
> delete everything that is no longer present in the imported archive,
> but that's not the case.

how do you set up the WorkspaceFilter? Only nodes covered by the
filter are updated. IIRC, if you don't use a filter at all, the
importer only does updates.
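[Editorial note: the coverage rule can be illustrated with a toy matcher — plain Java, not FileVault's actual PathFilterSet logic: a path is covered when it equals a filter root or lies below one, and an empty filter covers nothing, which is why nothing gets deleted.]

```java
import java.util.List;

// Toy model of root-based workspace filter coverage: a path is covered
// iff it equals some filter root or is a descendant of one.
class FilterCoverage {
    public static boolean covers(List<String> filterRoots, String path) {
        for (String root : filterRoots) {
            if (root.equals("/") || path.equals(root)
                    || path.startsWith(root + "/")) {
                return true;
            }
        }
        return false; // an empty filter covers nothing
    }
}
```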

> Reading through code it seems the vlt tool uses the meta-data in the
> .vlt directory to track deleted files, which are then processed during
> a commit. Is there a way to add such meta-data to an Archive or
> educate the Importer somehow?

The vault cli tool works completely differently from the packaging. it
mimics a subversion-like behaviour and is probably unsuited for your
needs. The main magic is done in the
org.apache.jackrabbit.vault.fs.impl.TransactionImpl which uses the
recorded changes to figure out which nodes to sync.

> I've also read through the code of the vault-sync tool, which
> basically implements my use case. I cannot use it as-is though, since
> Hippo does not use OSGI. IIUC vault-sync does not use Archives or
> Packages at all, but uses its own TreeSync class that does lots of
> plain JCR calls. Does that mean that vault-sync does not process
> meta-data files at all?

no. currently vault-sync only handles simple files and folders.

regards, toby