Posted to users@jackrabbit.apache.org by Lei Zhou <Le...@pointalliance.com> on 2006/11/21 23:16:23 UTC
Question on jcr:deref usage
Hi,
I'd like to build a custom query which is quite complicated. I'd like to
use jcr:deref to achieve the SQL Join style query but am not sure if this
is doable in Jackrabbit. Could anyone comment on the following use case?
The objects: Document has text properties and Category references
The query: users need to search for documents by specifying any
combination of text values and/or category values
The query result: users want a categorized result view, which contains
expandable/collapsible categories
For example, a document may have text properties Subject, author,
Description; and refers to one or more entries of category "Products".
I'd like to be able to create something like below:
SELECT Document.Subject, Document.Description,
Document.ProductReference->categoryName
where Subject='Manual' AND description contains 'maintenance' AND
Document.ProductReference->categoryName='Product #1'
order by Document.ProductReference->categoryName, Document.Subject
Is this possible in Jackrabbit with an XPath query?
Thanks,
Re: Question on jcr:deref usage
Posted by Lei Zhou <Le...@pointalliance.com>.
Hi,
Not sure I understand what you meant by "extend JCR API" - I guess adding
new features to JCR2.0? Or adding new features to Jackrabbit?
Simply put, it would be good to see added support for SELECT DISTINCT and
GROUP BY. I haven't tried JOINs between primary node type tables in
Jackrabbit; it would be nice if they are supported.
An example, if one needs to create a content navigator (similar to the
JTree format) for search results, one way is to grab ALL data and use
application code to create the navigational structure. Another way is to
use one query (like SELECT DISTINCT, and/or GROUP BY ) to get the
navigation data (tree nodes), and use another query (when the user selects
one category) to get the requested details.
>Is this now not more an implementation issue (how to do it) than a
>user issue (what features are required, how the API looks like)?
No. See below.
>AFAIK, Lucene (not the RDBMS) is used for queries (except for
>direct, relative Node access and references).
That's what I thought. I just wonder: wouldn't the RDBMS search/index engine
be faster than an added layer of external code? This is part of the reason
I feel it may be beneficial to expand the DB schema and not use Blobs -
let the DB do the indexing and search - if I'm using DB repository.
>For me, the main reasons to use RDBMS are transaction support and speed.
So I have a file system index from Lucene, on top of another DB index for
Nodes...
>I know, many people think like that. On the other hand, if using Blobs
>is faster, and people don't want to access the data store directly,
>why not use Blobs?
Agreed. Like I said before, this is more of a "convincing others" issue.
In business, it could be one of the factors that affect decision making by
business users. (Aren't we IT people paid by business :-).
Regards,
Lei
Re: Question on jcr:deref usage
Posted by Thomas Mueller <th...@gmail.com>.
Hi,
> So I wouldn't dare to say "here is how I
> think things should be done".
Maybe you have some ideas how to extend the JCR API to simplify using
it, or just what additional features are required.
> This approach also has a drawback: by treating local file systems and RDB
> systems the same way - same set of features, same indexing & searching API
> (??), a lot of good stuff from RDBMS is wasted.
Is this now not more an implementation issue (how to do it) than a
user issue (what features are required, how the API looks like)?
> These features are very useful even in querying structural data model
AFAIK, Lucene (not the RDBMS) is used for queries (except for
direct, relative Node access and references).
> My personal experience is that production-level content management systems
> are more often implemented on RDBMS than on local file systems.
For me, the main reasons to use RDBMS are transaction support and speed.
> 2. When presenting architecture design to a business client (usually with
> 'some' knowledge of the IT
> systems/products), the first question would be "is this a serious
> design? why are all the data in
> Blobs?". Although we as developers know that there are good reasons
> for that, it may not be easily
> conveyed to the client.
I know, many people think like that. On the other hand, if using Blobs
is faster, and people don't want to access the data store directly,
why not use Blobs?
Thomas
Re: Question on jcr:deref usage
Posted by Lei Zhou <Le...@pointalliance.com>.
Hi Thomas,
Thanks for responding.
>I think there are two solutions: Add the missing features to the JCR
>API, and provide the missing features in some other way. Do you have a
>suggestion how to extend the API to support the features you like to
>have (seems to be: aggregation, join, ordering)?
I came to JCR and Jackrabbit more from the perspective of an
integrator/end user, and did not have a chance to study the implementation
in greater technical details. So I wouldn't dare to say "here is how I
think things should be done".
I could provide some thoughts & observations though.
Jackrabbit (or JCR) handles all kinds of possible persistence storage
(local files, DB, etc.) through unified PersistenceManager interface. This
is great because it makes the product easily adapt to any usage scenario.
This approach also has a drawback: by treating local file systems and RDB
systems the same way - same set of features, same indexing & searching API
(??), a lot of good stuff from RDBMS is wasted. For example, SELECT
DISTINCT, JOIN, GROUP BY, triggers, stored procedures etc. Some may argue
that JCR is about "Structural", not "Relational" data, why would we care?
These features are very useful even in querying structural data model - my
previous email discussed one use case as an example.
I'm not saying we should use all RDBMS features wherever they exist, because
there are compatibility & portability issues. Since we have already
provided DB persistence manager and schema DDLs for several RDBMS, it
wouldn't hurt if we extend that effort to do more with the 'native'
features of supported RDBMS.
One idea is to have an "extended set" of features for RDBMS, that can be
queried by Repository.getDescriptorKeys(). These features would support
extended SQL capabilities like SELECT DISTINCT, JOIN, GROUP BY, and ORDER
BY etc.
And I'm not proposing to completely "normalize" the DB schema, there is
always a line between "better" and "extreme".
My personal experience is that production-level content management systems
are more often implemented on RDBMS than on local file systems. If this applies
to most of the community (??), why would we restrict ourselves?
>If you want to integrate other products in the DB schema level, then
>the current schema may not be the best. However I don't think it was
>the idea that other software accesses the schema of Jackrabbit
>directly.
As described above, I'm not trying to manipulate the repository at DB
level. There are two reasons for me to raise that point:
1. For same reasons as mentioned above, and previous emails, I felt it
would be more beneficial for
people who use RDBMS for repository - and I would bet that represents
a good portion of
JCR/Jackrabbit based applications.
2. When presenting architecture design to a business client (usually with
'some' knowledge of the IT
systems/products), the first question would be "is this a serious
design? why are all the data in
Blobs?". Although we as developers know that there are good reasons
for that, it may not be easily
conveyed to the client.
Again, these are just personal observations and I'm not yet an expert in
JCR/Jackrabbit. Any comments / corrections are appreciated.
Best regards,
Lei
In reply to: "Thomas Mueller" <th...@gmail.com>, 11/24/06 03:53 AM, users@jackrabbit.apache.org
Re: Question on jcr:deref usage
Posted by Thomas Mueller <th...@gmail.com>.
Hi,
> So it seems that due to the limitation of JCR (no aggregation query
> support).
I think there are two solutions: Add the missing features to the JCR
API, and provide the missing features in some other way. Do you have a
suggestion how to extend the API to support the features you like to
have (seems to be: aggregation, join, ordering)?
One option is to make the structured part of the JCR repository
accessible like a 'standard' SQL database. Existing (SQL based) report
generators could then be used as well. If you could access the data
stored in the repository using the JDBC API using the following SQL
query, would this provide the convenience you are looking for?
select m.uuid from manual m, product p, region r
where p.uuid = m.product and r.uuid = m.region
and p.name in ('TV', 'VCR', 'DVD')
and r.name in ('North America', 'Europe')
and p.availableFor in ('distributor', 'repairHouse')
order by r.name, p.name
My idea is to add support for 'jcr views' to my database
(http://www.h2database.com).
> #2. The RDBMS based repository, current DB schema is not very convincing
> for large enterprise level applications. A more normalized schema might
> help both performance and #1, but yes, more DB level code may be needed
> (for performance's sake) and that may limit the portability of the
> product.
If you want to integrate other products in the DB schema level, then
the current schema may not be the best. However I don't think it was
the idea that other software accesses the schema of Jackrabbit
directly.
Thomas
Re: Question on jcr:deref usage
Posted by Marcel Reutegger <ma...@gmx.net>.
Hi Lei,
the exception message is indeed wrong. I've just fixed it (see:
http://issues.apache.org/jira/browse/JCR-646)
Thanks for reporting this issue.
regards
marcel
Lei Zhou wrote:
> Hi,
>
> Just found out that I have to quote the * char, and '%' indeed doesn't
> work, although it is the one used for jcr:like.
>
> //element(*, Document)[@Subject='Manual']/jcr:deref(@ProductReference,'*')
>
> regards,
> Lei
Re: Question on jcr:deref usage
Posted by Lei Zhou <Le...@pointalliance.com>.
Hi,
Just found out that I have to quote the * char, and '%' indeed doesn't
work, although it is the one used for jcr:like.
//element(*, Document)[@Subject='Manual']/jcr:deref(@ProductReference,'*')
regards,
Lei
>Thanks Marcel,
>I just tried the query below and got exceptions complaining about "second
>argument type for jcr:like".
>//element(*, Document)[@Subject = 'Manual']/jcr:deref(@ProductReference,*)
>Here is the error: javax.jcr.query.InvalidQueryException: Wrong second
>argument type for jcr:like
>I then replaced '*' with '%', which seems to be accepted, but did not
>return any result:
>//element(*, Document)[@Subject = 'Manual']/jcr:deref(@ProductReference,
>'%')
>Did I miss anything?
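The finding in this message (the second argument of jcr:deref has to be a quoted string literal, '*', not a bare *) can be captured in a small helper. This is only a sketch in plain Java that assembles the query string; the class and method names are hypothetical and not part of the JCR or Jackrabbit API, and the resulting string would still have to be handed to the repository's query manager.

```java
/** Hypothetical helper, not part of JCR or Jackrabbit. */
final class DerefStep {

    /**
     * Appends a jcr:deref step to an XPath query.
     * The name test is always emitted as a quoted string literal,
     * because an unquoted * was rejected in this thread.
     */
    static String deref(String basePath, String referenceProperty, String nameTest) {
        return basePath + "/jcr:deref(@" + referenceProperty + ", '" + nameTest + "')";
    }

    public static void main(String[] args) {
        // Document, Subject and ProductReference are the names used in this thread.
        System.out.println(deref("//element(*, Document)[@Subject='Manual']",
                "ProductReference", "*"));
    }
}
```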
Re: Question on jcr:deref usage
Posted by Lei Zhou <Le...@pointalliance.com>.
Thanks Marcel,
I just tried the query below and got exceptions complaining about "second
argument type for jcr:like".
//element(*, Document)[@Subject = 'Manual']/jcr:deref(@ProductReference, *)
Here is the error: javax.jcr.query.InvalidQueryException: Wrong second
argument type for jcr:like
I then replaced '*' with '%', which seems to be accepted, but did not
return any result:
//element(*, Document)[@Subject = 'Manual']/jcr:deref(@ProductReference, '%')
Did I miss anything?
Thanks,
Lei
Re: Question on jcr:deref usage
Posted by Marcel Reutegger <ma...@gmx.net>.
Lei Zhou wrote:
> Thanks Marcel!
>
> So it seems that due to the limitation of JCR (no aggregation query
> support), it would be much slower to support this type of application than
> RDBMS.
>
> Is that a correct assessment?
An RDBMS certainly provides a wider range of operations through SQL than JCR
with the current set of XPath or SQL syntax. Depending on your needs, some of the
queries won't be possible in JCR, but others will simply be unnecessary. E.g. in JCR
you don't have to execute a query to follow a reference; you simply call the
method Property.getNode().
> Also, to articulate, if I have to present to users with a query result
> view that is categorized (or grouped) by ProductName, I'd have to do the
> following:
>
> 1. Run query #1
> //element(*, Document)[@Subject = 'Manual' and
> jcr:contains(@description,
> 'maintenance')]
>
> 2. iterate through the entire RowIterator (may have thousands of
> entries), use Java code
> to create an aggregated ProductNames/ProductReference pairs collection
>
> (since JCR doesn't have this type of query),
>
> 3. No "Order By" clause is used because the ProductReferences won't be in
> same order as
> the ProductNames, manual sorting is required in Java post-processing
The same can be achieved in one step:
//element(*, Document)[@Subject = 'Manual' and jcr:contains(@description,
'maintenance')]/jcr:deref(@ProductReference, *) order by @ProductName
this will return an ordered list of product names which contain matches.
> 4. Depending on which category has been selected by user to expand, run
> query #2, limiting
> results to that single product category:
> (query #2)
> //element(*, Document)[@Subject = 'Manual' and
> jcr:contains(@description,
> 'maintenance') and @ProductReference = '<uuid-of-Product-#1>']
Correct.
> 5. Again, product names have to be de-referenced manually, and ordering has
> to be moved from
> the query to the java post-processing
This step I don't understand. What's the purpose of this step and why is it
needed? Isn't all information already available?
> I'm fairly new to JCR and Jackrabbit. I've found them very helpful in many
> aspects of managing contents. But I do feel that certain improvements
> could make Jackrabbit a better choice for enterprise use.
>
> #1. In the many years of enterprise application development, I've seen a
> lot of our content based applications in need of support for complicated
> search, e.g, search by arbitrary combination of document properties, and
> grouping of search results (it is not uncommon to see 2, even 3 levels of
> nested grouping).
> -- Aggregations and Joins are definitely a big plus for querying a
> complicated content model.
Such requirements are also discussed in the expert group of JSR 283. You can
comment on the current spec and post enhancement wishes to jsr-283-comments@jcp.org.
> I've seen posts mentioning use of Node references to compensate the lack
> of SQL Join, but what if I need to perform a search like below
> (ProductNames, Regions and AvailableFors would most likely be categories
> that are referenced by all documents):
> FIND all manuals
> THAT (ProductName is 'TV' or 'VCR' or 'DVD')
> and (Region is 'North America' or 'Europe')
> and (AvailableFor is 'distributor' or 'repairHouse')
> GROUP BY Region, ProductName
Such a query is certainly not possible with the current set of XPath or SQL in
JCR. You would have to break up the query into multiple queries, e.g. retrieve
the uuids for products with names 'TV', 'VCR' and 'DVD' and use those uuids in a
query. The same applies to Region and AvailableFor.
IMO XQuery would be a nice fit for those requirements.
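A minimal sketch of the multi-query approach described above, assuming the product and region UUIDs were already collected by earlier queries. It only assembles the XPath string in plain Java. The class and method names and the RegionReference property are hypothetical; only Document, Subject and ProductReference come from this thread. Doubling apostrophes inside string literals is assumed to be the applicable escaping rule.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper, not part of JCR or Jackrabbit. */
final class ReferencePredicates {

    /** Builds "(@prop = 'u1' or @prop = 'u2' ...)" for a reference property. */
    static String anyOf(String property, List<String> uuids) {
        if (uuids.isEmpty()) {
            throw new IllegalArgumentException("at least one UUID is required");
        }
        StringJoiner joiner = new StringJoiner(" or ", "(", ")");
        for (String uuid : uuids) {
            // Assumption: apostrophes in a literal are escaped by doubling them.
            joiner.add("@" + property + " = '" + uuid.replace("'", "''") + "'");
        }
        return joiner.toString();
    }

    /** Combines the per-category predicates into one query for manuals. */
    static String manualsQuery(List<String> productUuids, List<String> regionUuids) {
        return "//element(*, Document)[@Subject = 'Manual' and "
                + anyOf("ProductReference", productUuids) + " and "
                + anyOf("RegionReference", regionUuids) + "]";
    }

    public static void main(String[] args) {
        // Placeholder UUIDs; real values would come from queries on the category nodes.
        System.out.println(manualsQuery(
                Arrays.asList("uuid-tv", "uuid-vcr", "uuid-dvd"),
                Arrays.asList("uuid-north-america", "uuid-europe")));
    }
}
```

The grouping by Region and ProductName still has to happen in application code, since the query itself only filters.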
> #2. The RDBMS based repository, current DB schema is not very convincing
> for large enterprise level applications. A more normalized schema might
> help both performance and #1, but yes, more DB level code may be needed
> (for performance's sake) and that may limit the portability of the
> product.
I'm not sure that's really the case. Usually a normalized schema means less
performance. There were attempts to create a persistence manager using a
normalized schema, but in the end the currently used schema turned out to be the
most practical one.
regards
marcel
Re: Question on jcr:deref usage
Posted by Lei Zhou <Le...@pointalliance.com>.
Thanks Marcel!
So it seems that due to the limitation of JCR (no aggregation query
support), it would be much slower to support this type of application than
RDBMS.
Is that a correct assessment?
Also, to articulate, if I have to present to users with a query result
view that is categorized (or grouped) by ProductName, I'd have to do the
following:
1. Run query #1
//element(*, Document)[@Subject = 'Manual' and
jcr:contains(@description,
'maintenance')]
2. iterate through the entire RowIterator (may have thousands of
entries), use Java code
to create an aggregated ProductNames/ProductReference pairs collection
(since JCR doesn't have this type of query),
3. No "Order By" clause is used because the ProductReferences won't be in
same order as
the ProductNames, manual sorting is required in Java post-processing
4. Depending on which category has been selected by user to expand, run
query #2, limiting
results to that single product category:
(query #2)
//element(*, Document)[@Subject = 'Manual' and
jcr:contains(@description,
'maintenance') and @ProductReference = '<uuid-of-Product-#1>']
5. Again, product names have to be de-referenced manually, and ordering has
to be moved from
the query to the java post-processing
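Steps 2 and 3 above could be sketched in plain Java roughly as follows. The Hit class is a hypothetical stand-in for a query result row whose product reference has already been resolved to a name (for example via Property.getNode()); it is not a JCR type. A TreeMap keeps the product names sorted, which covers the manual ordering mentioned in step 3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical post-processing helper, not part of JCR or Jackrabbit. */
final class CategorizedResults {

    /** Hypothetical row: a matching document and its dereferenced product name. */
    static final class Hit {
        final String productName;
        final String subject;

        Hit(String productName, String subject) {
            this.productName = productName;
            this.subject = subject;
        }
    }

    /** Groups document subjects by product name; keys and values are sorted. */
    static Map<String, List<String>> groupByProduct(List<Hit> hits) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Hit hit : hits) {
            groups.computeIfAbsent(hit.productName, k -> new ArrayList<>())
                  .add(hit.subject);
        }
        for (List<String> subjects : groups.values()) {
            Collections.sort(subjects);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("VCR", "Manual B"),
                new Hit("TV", "Manual C"),
                new Hit("VCR", "Manual A"));
        System.out.println(groupByProduct(hits));
    }
}
```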
I'm fairly new to JCR and Jackrabbit. I've found them very helpful in many
aspects of managing contents. But I do feel that certain improvements
could make Jackrabbit a better choice for enterprise use.
#1. In the many years of enterprise application development, I've seen a
lot of our content based applications in need of support for complicated
search, e.g, search by arbitrary combination of document properties, and
grouping of search results (it is not uncommon to see 2, even 3 levels of
nested grouping).
-- Aggregations and Joins are definitely a big plus for querying a
complicated content model.
I've seen posts mentioning use of Node references to compensate the lack
of SQL Join, but what if I need to perform a search like below
(ProductNames, Regions and AvailableFors would most likely be categories
that are referenced by all documents):
FIND all manuals
THAT (ProductName is 'TV' or 'VCR' or 'DVD')
and (Region is 'North America' or 'Europe')
and (AvailableFor is 'distributor' or 'repairHouse')
GROUP BY Region, ProductName
#2. The RDBMS based repository, current DB schema is not very convincing
for large enterprise level applications. A more normalized schema might
help both performance and #1, but yes, more DB level code may be needed
(for performance's sake) and that may limit the portability of the
product.
This is just my perspective from evaluating Jackrabbit, and I'd welcome any
comments or corrections of misunderstandings.
All in all, I understand that Jackrabbit is only a reference
implementation of JCR 1.0, and it is really a great product. I just hope it
can be even better and be adopted as widely as the Apache HTTP
server.
Best Regards,
Lei
In reply to: Marcel Reutegger <ma...@gmx.net>, 11/23/06 09:25 AM, users@jackrabbit.apache.org
Re: Question on jcr:deref usage
Posted by Marcel Reutegger <ma...@gmx.net>.
Lei Zhou wrote:
> SELECT Document.Subject, Document.Description,
> Document.ProductReference->categoryName
> where Subject='Manual' AND description contains 'maintenance' AND
> Document.ProductReference->categoryName='Product #1'
> order by Document.ProductReference->categoryName, Document.Subject
>
> Is this possible in Jackrabbit with Xpath query?
no, not quite. the jcr:deref() function cannot be used in a predicate, which
would be required for that use case. furthermore the select clause and order by
clause may only contain property names.
the closest you can get is something like:
//element(*, Document)[@Subject = 'Manual' and jcr:contains(@description,
'maintenance') and @ProductReference = '<uuid-of-Product-#1>'] order by
@ProductReference, @Subject
and then you have to do some post processing. basically dereferencing the
ProductReference to get the name of the product.
regards
marcel
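A minimal sketch of how the query above might be assembled from user input, in plain Java. The class and method names are hypothetical and not part of the JCR or Jackrabbit API. Doubling apostrophes is assumed to be the escaping rule for string literals; the search expression inside jcr:contains has its own syntax, which this sketch does not handle.

```java
import java.util.Objects;

/** Hypothetical helper, not part of JCR or Jackrabbit. */
final class DocumentQueryBuilder {

    /** Wraps a value in apostrophes, doubling any apostrophe it contains. */
    static String quote(String value) {
        return "'" + Objects.requireNonNull(value).replace("'", "''") + "'";
    }

    /** Builds the query from this message for the given search values. */
    static String build(String subject, String fullText, String productUuid) {
        return "//element(*, Document)[@Subject = " + quote(subject)
                + " and jcr:contains(@description, " + quote(fullText) + ")"
                + " and @ProductReference = " + quote(productUuid) + "]"
                + " order by @ProductReference, @Subject";
    }

    public static void main(String[] args) {
        // The UUID is a placeholder, as in the message above.
        System.out.println(build("Manual", "maintenance", "<uuid-of-Product-#1>"));
    }
}
```

The product name is not part of the result; it still has to be looked up by dereferencing ProductReference on each returned node.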