Posted to users@jackrabbit.apache.org by Luiz Fernando Teston <fe...@caravelatech.com> on 2009/11/27 16:46:14 UTC

question about xpath on not saved nodes

I'm a core developer of a project that uses JCR a lot. We are using
Jackrabbit as our implementation and it works very well for us in a lot of
situations.
I don't know if I did something wrong, but this is what happens in my
environment:
Working in a single open JCR session, I added a node and tried to
retrieve it using XPath. The query doesn't return the node. But if I save
the node before running the XPath query, it works.
In SQL, inside a transaction, it is possible to retrieve a newly inserted
row within the same transaction. I think a new node should behave the same
way inside a given session.
So, I have a question: is it possible to change this behavior so that these
nodes can be retrieved before saving (only in the current session, of
course)? Maybe this could be made configurable in repository.xml or
something like that.

Here is the code used inside the test:

    // opens a session...
    Node rootNode = session.getRootNode();
    rootNode.addNode("abc");
    // If the next line is commented out, hasNext is false; otherwise it is true.
    session.save();
    QueryResult result = XPathTest.session.getWorkspace()
            .getQueryManager().createQuery("abc", Query.XPATH).execute();
    boolean hasNext = result.getNodes().hasNext();

Before sending this email I also searched Google and the Jackrabbit site,
but I didn't find anything.
So, I'd appreciate it if you could help me.


Best regards,



Fernando Teston

Re: question about xpath on not saved nodes

Posted by Luiz Fernando Teston <fe...@caravelatech.com>.

Using two different folders would add a huge amount of complexity to our
codebase, and we deal with huge amounts of data. Creating a temporary folder
based on an existing one, and then merging or replacing that folder after the
process is done, would take a long time because of the data volume.
I think the only practical way to solve my problem will be to do this in a
single thread :-(

Regards,


Fernando Teston

On Mon, Nov 30, 2009 at 9:44 AM, Alexander Klimetschek <ak...@day.com> wrote:

> On Fri, Nov 27, 2009 at 20:54, Luiz Fernando Teston
> <fe...@caravelatech.com> wrote:
> > I was thinking it should be possible because of this:
> > http://www.day.com/specs/jcr/1.0/8.1.3_Save_vs._Commit.html
>
> As it is stated there:
>
> "Within a transaction, changes made by save (or other,
> workspace-direct, methods) are transactionalized and are only
> persisted and published (made visible to other sessions), upon commit
> of the transaction."
>
> Thus persistence (incl. search index update) happens only upon commit
> in the context of transactions.
>
> > I have the following scenario: I work on a software which parses a lot of
> > data and store its resulted metadata on jcr. The stored metadata is very
> > complex and we need to do some complex search during a massive data
> > inclusion. This search should see everything that was already included on
> > this session, but it should not reflect in the "saved data" until it
> > finishes.
>
> Why not use the content model for that, eg. have a temporary "import"
> or "staging" folder that you import stuff into, set the metadata, save
> it and then run searches on top of it to integrate it with the rest of
> your content (or whatever you need to do here). You can copy it over
> to the real, active content tree later, eg. through one or more moves.
> If the data-accessing part of your application only looks at the
> active content tree, you still have proper transactional separation.
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> alexander.klimetschek@day.com
>

Re: question about xpath on not saved nodes

Posted by Alexander Klimetschek <ak...@day.com>.

On Fri, Nov 27, 2009 at 20:54, Luiz Fernando Teston
<fe...@caravelatech.com> wrote:
> I was thinking it should be possible because of this:
> http://www.day.com/specs/jcr/1.0/8.1.3_Save_vs._Commit.html

As it is stated there:

"Within a transaction, changes made by save (or other,
workspace-direct, methods) are transactionalized and are only
persisted and published (made visible to other sessions), upon commit
of the transaction."

Thus persistence (incl. search index update) happens only upon commit
in the context of transactions.

> I have the following scenario: I work on a software which parses a lot of
> data and store its resulted metadata on jcr. The stored metadata is very
> complex and we need to do some complex search during a massive data
> inclusion. This search should see everything that was already included on
> this session, but it should not reflect in the "saved data" until it
> finishes.

Why not use the content model for that, e.g. have a temporary "import"
or "staging" folder that you import stuff into, set the metadata, save
it and then run searches on top of it to integrate it with the rest of
your content (or whatever you need to do here)? You can copy it over
to the real, active content tree later, e.g. through one or more moves.
If the data-accessing part of your application only looks at the
active content tree, you still have proper transactional separation.
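
To make the staging idea concrete, here is a minimal sketch in plain Java. It is a toy model under stated assumptions, not the JCR or Jackrabbit API: ToyRepository, saveNode, queryUnder and moveSubtree are hypothetical names, and the paths are made up for illustration. It only shows the flow: save the imported data under a staging folder so it becomes searchable, query that subtree, then move it to the active tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only: NOT the JCR or Jackrabbit API. All names are hypothetical.
class ToyRepository {
    // Saved (and therefore searchable) absolute paths, kept in sorted order.
    private final TreeSet<String> saved = new TreeSet<>();

    // Stands in for "add the node and save it".
    void saveNode(String path) {
        saved.add(path);
    }

    // A query restricted to one subtree, e.g. only the staging folder.
    List<String> queryUnder(String folder) {
        List<String> hits = new ArrayList<>();
        String prefix = folder + "/";
        for (String path : saved) {
            if (path.startsWith(prefix)) {
                hits.add(path);
            }
        }
        return hits;
    }

    // Moves every node under 'from' to the same relative path under 'to'.
    void moveSubtree(String from, String to) {
        for (String path : queryUnder(from)) {
            saved.remove(path);
            saved.add(to + path.substring(from.length()));
        }
    }
}

class StagingDemo {
    public static void main(String[] args) {
        ToyRepository repo = new ToyRepository();

        // Import into the staging folder; readers of /active see nothing yet.
        repo.saveNode("/staging/doc1");
        repo.saveNode("/staging/doc2");
        System.out.println(repo.queryUnder("/staging"));
        System.out.println(repo.queryUnder("/active"));

        // Publish once the import is in a valid state.
        repo.moveSubtree("/staging", "/active");
        System.out.println(repo.queryUnder("/active"));
    }
}
```

Code that only reads below /active never sees the half-finished import, which is the separation described above.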

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: question about xpath on not saved nodes

Posted by Luiz Fernando Teston <fe...@caravelatech.com>.

Alexander,

I was thinking it should be possible because of this:
http://www.day.com/specs/jcr/1.0/8.1.3_Save_vs._Commit.html
I have the following scenario: I work on software that parses a lot of
data and stores the resulting metadata in JCR. The stored metadata is very
complex and we need to do some complex searches during a massive data
import. These searches should see everything that was already added in
this session, but the changes should not be reflected in the "saved data"
until the import finishes.
The first workaround we tried was to create different versions and only do a
check-in when the work is in a valid state. This approach seems to work
well in a single-threaded environment, but in a multi-threaded environment it
throws errors when different threads deal with the same nodes, since each
thread has its own version state.
I don't know if there's any other way to isolate the changes until a given
state while still running XPath queries on this transient data. Do you know
any way to achieve that isolation?

Thanks a lot for the help!

Fernando Teston

On Fri, Nov 27, 2009 at 4:13 PM, Alexander Klimetschek <ak...@day.com> wrote:

> On Fri, Nov 27, 2009 at 11:44, Luiz Fernando Teston
> <fe...@caravelatech.com> wrote:
> > I'll try to get this behavior by using XASessions on jackrabbit.
>
> This won't change it, because an XASession will defer the actual
> persistence (and thus the search index update) from the session.save()
> call to the commit of the transaction.
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> alexander.klimetschek@day.com
>

Re: question about xpath on not saved nodes

Posted by Alexander Klimetschek <ak...@day.com>.

On Fri, Nov 27, 2009 at 11:44, Luiz Fernando Teston
<fe...@caravelatech.com> wrote:
> I'll try to get this behavior by using XASessions on jackrabbit.

This won't change it, because an XASession will defer the actual
persistence (and thus the search index update) from the session.save()
call to the commit of the transaction.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: question about xpath on not saved nodes

Posted by Luiz Fernando Teston <fe...@caravelatech.com>.

Alexander,

Thanks for the fast reply. So this behavior is in the spec and isn't
particular to Jackrabbit.
I'll try to get this behavior by using XASessions in Jackrabbit.

Best regards,


Fernando Teston

On Fri, Nov 27, 2009 at 2:11 PM, Alexander Klimetschek <ak...@day.com> wrote:

> On Sat, Nov 28, 2009 at 01:16, Luiz Fernando Teston
> <fe...@caravelatech.com> wrote:
> > I'm a core developer of a project that uses JCR a lot. We are using
> > JackRabbit as our implementation and it works very well for us on a lot
> of
> > situations.
> > I don't know if I did something wrong, but this is what happens on my
> > environment:
> > Working on the same opened Jcr Session, I just added a node and tried to
> > retrieve it using xpath. It doesn't retrieve its node. But if I save this
> > node before doing the xpath it works.
>
> A JCR 1.0 query always searches the persisted storage only:
>
> http://www.day.com/specs/jcr/1.0/6.6.7_Search_Scope.html
>
> JCR 2.0 adds a "*may* search transient space as well" statement;
> however, that is not supported by Jackrabbit 2.0 and its Lucene search
> index implementation:
>
> http://www.day.com/specs/jcr/2.0/6_Query.html#6.5%20Search%20Scope
>
> The complexity of having such a separate full-text index for each
> session, which must be merged with the persistent index, outweighs the
> very few uses for it - at least I haven't seen a case where you'd
> want to search the transient space. Since the application created
> that transient data itself, there is rarely a need to search in
> it.
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> alexander.klimetschek@day.com
>

Re: question about xpath on not saved nodes

Posted by Alexander Klimetschek <ak...@day.com>.

On Sat, Nov 28, 2009 at 01:16, Luiz Fernando Teston
<fe...@caravelatech.com> wrote:
> I'm a core developer of a project that uses JCR a lot. We are using
> JackRabbit as our implementation and it works very well for us on a lot of
> situations.
> I don't know if I did something wrong, but this is what happens on my
> environment:
> Working on the same opened Jcr Session, I just added a node and tried to
> retrieve it using xpath. It doesn't retrieve its node. But if I save this
> node before doing the xpath it works.

A JCR 1.0 query always searches the persisted storage only:

http://www.day.com/specs/jcr/1.0/6.6.7_Search_Scope.html

JCR 2.0 adds a "*may* search transient space as well" statement;
however, that is not supported by Jackrabbit 2.0 and its Lucene search
index implementation:

http://www.day.com/specs/jcr/2.0/6_Query.html#6.5%20Search%20Scope

The complexity of having such a separate full-text index for each
session, which must be merged with the persistent index, outweighs the
very few uses for it - at least I haven't seen a case where you'd
want to search the transient space. Since the application created
that transient data itself, there is rarely a need to search in
it.
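
The search scope described above can be illustrated with a small toy model in plain Java. This is a sketch under stated assumptions, not the JCR or Jackrabbit API: ToySession and its methods are hypothetical names. It models a session whose direct item access sees its own pending changes, while a query consults only the persisted index, which save() updates.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only: NOT the JCR or Jackrabbit API. All names are hypothetical.
class ToySession {
    // Shared, persisted state; stands in for the workspace and its search index.
    private final Set<String> persistedIndex;
    // Pending additions, visible only to this session until save().
    private final Set<String> transientNodes = new HashSet<>();

    ToySession(Set<String> persistedIndex) {
        this.persistedIndex = persistedIndex;
    }

    void addNode(String name) {
        transientNodes.add(name);
    }

    // Direct item access sees this session's pending changes as well.
    boolean hasNode(String name) {
        return transientNodes.contains(name) || persistedIndex.contains(name);
    }

    // A query consults only the persisted index, never the transient space.
    boolean query(String name) {
        return persistedIndex.contains(name);
    }

    // save() persists the pending changes, which also updates the index.
    void save() {
        persistedIndex.addAll(transientNodes);
        transientNodes.clear();
    }
}

class SearchScopeDemo {
    public static void main(String[] args) {
        ToySession session = new ToySession(new HashSet<>());
        session.addNode("abc");
        System.out.println(session.hasNode("abc")); // true: direct access
        System.out.println(session.query("abc"));   // false: not yet saved
        session.save();
        System.out.println(session.query("abc"));   // true: now in the index
    }
}
```

In this model, as in the test from the original question, the query only finds "abc" after save(), while navigating to the node within the same session works either way.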

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com