You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@jackrabbit.apache.org by Christian Stocker <ch...@liip.ch> on 2012/03/31 08:18:56 UTC

About the missing COUNT() feature in jackrabbit

Hi

As we all know, there's no COUNT feature in SQL2. This is not that bad,
if you're not trying to get a count via remoting, since you can get it
with .getSize() locally.  But if you're working via the webdav layer,
it's not really efficient right now, especially if your resultsets are
larger.

There are in my opinion 2 opinions to solve this.

1) add a count() feature to the SQL2 parser, there's already a ticket
for that (among other things)
https://issues.apache.org/jira/browse/JCR-2605
That of course would be a great solutions, it solves the problem for all
cases who need that

2) add an option to the SEARCH webdav interface to get the count if
results instead of the whole result set.

For 2) I made a little prototype (which is really just that ;)) to see
if that would be feasible. That patch can be found here

https://github.com/chregu/jackrabbit/compare/count-in-webdav

It adds a D:count option to a SEARCH query like this

<D:searchrequest xmlns:D="DAV:">
	<JCR-SQL2>
		<![CDATA[SELECT * FROM [nt:unstructured] ]]>
	</JCR-SQL2>
	<D:count>1</D:count>	
</D:searchrequest>

and if that is set it returns only the count of results. Currently like
below, but that format has certainly to be changed (it was just the
shortest way to get something back):

<D:multistatus xmlns:D="DAV:">
	<D:response>
		<D:href>#count</D:href>
		<D:responsedescription>433</D:responsedescription>
	</D:response>
<D:responsedescription>nt:unstructured.jcr:primaryType</D:responsedescription>
</D:multistatus>

My questions:

- Is there any chance that something like this goes into jackrabbit and
we can build on that feature?
- Or is there a better way to just get the count of a query?
- How should the response look like for something like this? The above
is certainly not right ;)
- And last but not least, are there any plans to implement COUNT() some
time in the future directly in SQL2?

Thanks for any feedback.

chregu

Re: About the missing COUNT() feature in jackrabbit

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

It's true that COUNT() queries are slow in Jackrabbit. Databases have the
same problem: if you have no index (and in many cases even if you have
one), the database internally still needs to read all rows that match, at
least in a lot of cases.

Regards,
Thomas


On 4/4/12 11:08 AM, "Bart van der Schans" <b....@onehippo.com>
wrote:

>Hi,
>
>The problem with providing a COUNT feature is authorization. The get
>to a correct count you need to fetch all the nodes from the PM and
>authorize them one-by-one. This works fine for a few hundred nodes,
>but will take forever when you scale up.For example "SELECT COUNT(*)
>FROM nt:base" would directly translate to fetch every single bundle
>from the persistence layer (database) and do that with a single query
>for every node and then authorize each of those nodes.
>
>The only way I can see a well performing count feature is when the
>authorization tokens are indexed as well, aka running authorized
>queries. For JR 2 this is probably not an option.
>
>IMO providing a count feature in the query language would give the
>wrong impression to developers. It would look like something you can
>"just use", but in fact it will not work once you start putting
>content in your production repository.
>
>Regard,
>Bart
>
>On Wed, Apr 4, 2012 at 8:35 AM, Christian Stocker
><ch...@liip.ch> wrote:
>> Hi
>>
>> Any opinions on this? I'm mainly wondering if there's a chance that
>> something like this goes into jackrabbit or if there's opposition to
>> that. Not the actual implementation, just the idea of it. I'll work more
>> on it, if you think that could be something for jackrabbit-jcr-server
>> and propose something more decent.
>>
>> If not, I'll pursue other challenges ;)
>>
>> Greetings
>>
>> chregu
>>
>> On 31.03.12 08:18, Christian Stocker wrote:
>>> Hi
>>>
>>> As we all know, there's no COUNT feature in SQL2. This is not that bad,
>>> if you're not trying to get a count via remoting, since you can get it
>>> with .getSize() locally.  But if you're working via the webdav layer,
>>> it's not really efficient right now, especially if your resultsets are
>>> larger.
>>>
>>> There are in my opinion 2 opinions to solve this.
>>>
>>> 1) add a count() feature to the SQL2 parser, there's already a ticket
>>> for that (among other things)
>>> https://issues.apache.org/jira/browse/JCR-2605
>>> That of course would be a great solutions, it solves the problem for
>>>all
>>> cases who need that
>>>
>>> 2) add an option to the SEARCH webdav interface to get the count if
>>> results instead of the whole result set.
>>>
>>> For 2) I made a little prototype (which is really just that ;)) to see
>>> if that would be feasible. That patch can be found here
>>>
>>> https://github.com/chregu/jackrabbit/compare/count-in-webdav
>>>
>>> It adds a D:count option to a SEARCH query like this
>>>
>>> <D:searchrequest xmlns:D="DAV:">
>>>       <JCR-SQL2>
>>>               <![CDATA[SELECT * FROM [nt:unstructured] ]]>
>>>       </JCR-SQL2>
>>>       <D:count>1</D:count>
>>> </D:searchrequest>
>>>
>>> and if that is set it returns only the count of results. Currently like
>>> below, but that format has certainly to be changed (it was just the
>>> shortest way to get something back):
>>>
>>> <D:multistatus xmlns:D="DAV:">
>>>       <D:response>
>>>               <D:href>#count</D:href>
>>>               <D:responsedescription>433</D:responsedescription>
>>>       </D:response>
>>> 
>>><D:responsedescription>nt:unstructured.jcr:primaryType</D:responsedescri
>>>ption>
>>> </D:multistatus>
>>>
>>> My questions:
>>>
>>> - Is there any chance that something like this goes into jackrabbit and
>>> we can build on that feature?
>>> - Or is there a better way to just get the count of a query?
>>> - How should the response look like for something like this? The above
>>> is certainly not right ;)
>>> - And last but not least, are there any plans to implement COUNT() some
>>> time in the future directly in SQL2?
>>>
>>> Thanks for any feedback.
>>>
>>> chregu
>>>
>>>
>>
>> --
>> Liip AG  //  Feldstrasse 133 //  CH-8004 Zurich
>> Tel +41 43 500 39 81 // Mobile +41 76 561 88 60
>> www.liip.ch // blog.liip.ch // GnuPG 0x0748D5FE
>>
>
>
>
>-- 
>Amsterdam - Oosteinde 11, 1017 WT Amsterdam
>Boston - 1 Broadway, Cambridge, MA 02142
>
>US +1 877 414 4776 (toll free)
>Europe +31(0)20 522 4466
>www.onehippo.com

Re: About the missing COUNT() feature in jackrabbit

Posted by Bart van der Schans <b....@onehippo.com>.

Hi,

The problem with providing a COUNT feature is authorization. The get
to a correct count you need to fetch all the nodes from the PM and
authorize them one-by-one. This works fine for a few hundred nodes,
but will take forever when you scale up.For example "SELECT COUNT(*)
FROM nt:base" would directly translate to fetch every single bundle
from the persistence layer (database) and do that with a single query
for every node and then authorize each of those nodes.

The only way I can see a well performing count feature is when the
authorization tokens are indexed as well, aka running authorized
queries. For JR 2 this is probably not an option.

IMO providing a count feature in the query language would give the
wrong impression to developers. It would look like something you can
"just use", but in fact it will not work once you start putting
content in your production repository.

Regard,
Bart

On Wed, Apr 4, 2012 at 8:35 AM, Christian Stocker
<ch...@liip.ch> wrote:
> Hi
>
> Any opinions on this? I'm mainly wondering if there's a chance that
> something like this goes into jackrabbit or if there's opposition to
> that. Not the actual implementation, just the idea of it. I'll work more
> on it, if you think that could be something for jackrabbit-jcr-server
> and propose something more decent.
>
> If not, I'll pursue other challenges ;)
>
> Greetings
>
> chregu
>
> On 31.03.12 08:18, Christian Stocker wrote:
>> Hi
>>
>> As we all know, there's no COUNT feature in SQL2. This is not that bad,
>> if you're not trying to get a count via remoting, since you can get it
>> with .getSize() locally.  But if you're working via the webdav layer,
>> it's not really efficient right now, especially if your resultsets are
>> larger.
>>
>> There are in my opinion 2 opinions to solve this.
>>
>> 1) add a count() feature to the SQL2 parser, there's already a ticket
>> for that (among other things)
>> https://issues.apache.org/jira/browse/JCR-2605
>> That of course would be a great solutions, it solves the problem for all
>> cases who need that
>>
>> 2) add an option to the SEARCH webdav interface to get the count if
>> results instead of the whole result set.
>>
>> For 2) I made a little prototype (which is really just that ;)) to see
>> if that would be feasible. That patch can be found here
>>
>> https://github.com/chregu/jackrabbit/compare/count-in-webdav
>>
>> It adds a D:count option to a SEARCH query like this
>>
>> <D:searchrequest xmlns:D="DAV:">
>>       <JCR-SQL2>
>>               <![CDATA[SELECT * FROM [nt:unstructured] ]]>
>>       </JCR-SQL2>
>>       <D:count>1</D:count>
>> </D:searchrequest>
>>
>> and if that is set it returns only the count of results. Currently like
>> below, but that format has certainly to be changed (it was just the
>> shortest way to get something back):
>>
>> <D:multistatus xmlns:D="DAV:">
>>       <D:response>
>>               <D:href>#count</D:href>
>>               <D:responsedescription>433</D:responsedescription>
>>       </D:response>
>> <D:responsedescription>nt:unstructured.jcr:primaryType</D:responsedescription>
>> </D:multistatus>
>>
>> My questions:
>>
>> - Is there any chance that something like this goes into jackrabbit and
>> we can build on that feature?
>> - Or is there a better way to just get the count of a query?
>> - How should the response look like for something like this? The above
>> is certainly not right ;)
>> - And last but not least, are there any plans to implement COUNT() some
>> time in the future directly in SQL2?
>>
>> Thanks for any feedback.
>>
>> chregu
>>
>>
>
> --
> Liip AG  //  Feldstrasse 133 //  CH-8004 Zurich
> Tel +41 43 500 39 81 // Mobile +41 76 561 88 60
> www.liip.ch // blog.liip.ch // GnuPG 0x0748D5FE
>



-- 
Amsterdam - Oosteinde 11, 1017 WT Amsterdam
Boston - 1 Broadway, Cambridge, MA 02142

US +1 877 414 4776 (toll free)
Europe +31(0)20 522 4466
www.onehippo.com

Re: About the missing COUNT() feature in jackrabbit

Posted by Angela Schreiber <an...@adobe.com>.

hi chregu

as far as i am concerned i would feel comfortable with either
solution. however, i am not a query expert and thus i would
like to leave the decision regarding 1) to those working on
the query code base.
on the other hand, i don't see any problem in addressing 2).
just a minor detail that i happen to see: i wouldn't use the
DAV: uri for a custom extension.

the best way to proceed was imo if you would create an jira
issue and attach a patch.

kind regards
angela


On 4/4/12 8:35 AM, Christian Stocker wrote:
> Hi
>
> Any opinions on this? I'm mainly wondering if there's a chance that
> something like this goes into jackrabbit or if there's opposition to
> that. Not the actual implementation, just the idea of it. I'll work more
> on it, if you think that could be something for jackrabbit-jcr-server
> and propose something more decent.
>
> If not, I'll pursue other challenges ;)
>
> Greetings
>
> chregu
>
> On 31.03.12 08:18, Christian Stocker wrote:
>> Hi
>>
>> As we all know, there's no COUNT feature in SQL2. This is not that bad,
>> if you're not trying to get a count via remoting, since you can get it
>> with .getSize() locally.  But if you're working via the webdav layer,
>> it's not really efficient right now, especially if your resultsets are
>> larger.
>>
>> There are in my opinion 2 opinions to solve this.
>>
>> 1) add a count() feature to the SQL2 parser, there's already a ticket
>> for that (among other things)
>> https://issues.apache.org/jira/browse/JCR-2605
>> That of course would be a great solutions, it solves the problem for all
>> cases who need that
>>
>> 2) add an option to the SEARCH webdav interface to get the count if
>> results instead of the whole result set.
>>
>> For 2) I made a little prototype (which is really just that ;)) to see
>> if that would be feasible. That patch can be found here
>>
>> https://github.com/chregu/jackrabbit/compare/count-in-webdav
>>
>> It adds a D:count option to a SEARCH query like this
>>
>> <D:searchrequest xmlns:D="DAV:">
>> 	<JCR-SQL2>
>> 		<![CDATA[SELECT * FROM [nt:unstructured] ]]>
>> 	</JCR-SQL2>
>> 	<D:count>1</D:count>	
>> </D:searchrequest>
>>
>> and if that is set it returns only the count of results. Currently like
>> below, but that format has certainly to be changed (it was just the
>> shortest way to get something back):
>>
>> <D:multistatus xmlns:D="DAV:">
>> 	<D:response>
>> 		<D:href>#count</D:href>
>> 		<D:responsedescription>433</D:responsedescription>
>> 	</D:response>
>> <D:responsedescription>nt:unstructured.jcr:primaryType</D:responsedescription>
>> </D:multistatus>
>>
>> My questions:
>>
>> - Is there any chance that something like this goes into jackrabbit and
>> we can build on that feature?
>> - Or is there a better way to just get the count of a query?
>> - How should the response look like for something like this? The above
>> is certainly not right ;)
>> - And last but not least, are there any plans to implement COUNT() some
>> time in the future directly in SQL2?
>>
>> Thanks for any feedback.
>>
>> chregu
>>
>>
>

Re: About the missing COUNT() feature in jackrabbit

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On Wed, Apr 4, 2012 at 9:06 AM, Thomas Mueller <mu...@adobe.com> wrote:
> For Oak, COUNT(*) is definitely possible. Also important is GROUP BY I
> believe, in combination with COUNT(*). I'm not sure about DISTINCT.
>
> But for Jackrabbit 2.x, I can't tell how easy it would be to support it.

I looked at this last year. It's doable if someone has the cycles for
it, but requires quite a bit of effort as the relevant bits in
Jackrabbit 2.x weren't designed with aggregate queries in mind.

Implementing option 2 as outlined by Christian is far easier for this
specific use case.

BR,

Jukka Zitting

Re: About the missing COUNT() feature in jackrabbit

Posted by Thomas Mueller <mu...@adobe.com>.

Hi,

For Oak, COUNT(*) is definitely possible. Also important is GROUP BY I
believe, in combination with COUNT(*). I'm not sure about DISTINCT.

But for Jackrabbit 2.x, I can't tell how easy it would be to support it.

Regards,
Thomas


On 4/4/12 8:35 AM, "Christian Stocker" <ch...@liip.ch> wrote:

>Hi
>
>Any opinions on this? I'm mainly wondering if there's a chance that
>something like this goes into jackrabbit or if there's opposition to
>that. Not the actual implementation, just the idea of it. I'll work more
>on it, if you think that could be something for jackrabbit-jcr-server
>and propose something more decent.
>
>If not, I'll pursue other challenges ;)
>
>Greetings
>
>chregu
>
>On 31.03.12 08:18, Christian Stocker wrote:
>> Hi
>> 
>> As we all know, there's no COUNT feature in SQL2. This is not that bad,
>> if you're not trying to get a count via remoting, since you can get it
>> with .getSize() locally.  But if you're working via the webdav layer,
>> it's not really efficient right now, especially if your resultsets are
>> larger.
>> 
>> There are in my opinion 2 opinions to solve this.
>> 
>> 1) add a count() feature to the SQL2 parser, there's already a ticket
>> for that (among other things)
>> https://issues.apache.org/jira/browse/JCR-2605
>> That of course would be a great solutions, it solves the problem for all
>> cases who need that
>> 
>> 2) add an option to the SEARCH webdav interface to get the count if
>> results instead of the whole result set.
>> 
>> For 2) I made a little prototype (which is really just that ;)) to see
>> if that would be feasible. That patch can be found here
>> 
>> https://github.com/chregu/jackrabbit/compare/count-in-webdav
>> 
>> It adds a D:count option to a SEARCH query like this
>> 
>> <D:searchrequest xmlns:D="DAV:">
>> 	<JCR-SQL2>
>> 		<![CDATA[SELECT * FROM [nt:unstructured] ]]>
>> 	</JCR-SQL2>
>> 	<D:count>1</D:count>	
>> </D:searchrequest>
>> 
>> and if that is set it returns only the count of results. Currently like
>> below, but that format has certainly to be changed (it was just the
>> shortest way to get something back):
>> 
>> <D:multistatus xmlns:D="DAV:">
>> 	<D:response>
>> 		<D:href>#count</D:href>
>> 		<D:responsedescription>433</D:responsedescription>
>> 	</D:response>
>> 
>><D:responsedescription>nt:unstructured.jcr:primaryType</D:responsedescrip
>>tion>
>> </D:multistatus>
>> 
>> My questions:
>> 
>> - Is there any chance that something like this goes into jackrabbit and
>> we can build on that feature?
>> - Or is there a better way to just get the count of a query?
>> - How should the response look like for something like this? The above
>> is certainly not right ;)
>> - And last but not least, are there any plans to implement COUNT() some
>> time in the future directly in SQL2?
>> 
>> Thanks for any feedback.
>> 
>> chregu
>> 
>> 
>
>-- 
>Liip AG  //  Feldstrasse 133 //  CH-8004 Zurich
>Tel +41 43 500 39 81 // Mobile +41 76 561 88 60
>www.liip.ch // blog.liip.ch // GnuPG 0x0748D5FE
>

Re: About the missing COUNT() feature in jackrabbit

Posted by Christian Stocker <ch...@liip.ch>.

Hi

Any opinions on this? I'm mainly wondering if there's a chance that
something like this goes into jackrabbit or if there's opposition to
that. Not the actual implementation, just the idea of it. I'll work more
on it, if you think that could be something for jackrabbit-jcr-server
and propose something more decent.

If not, I'll pursue other challenges ;)

Greetings

chregu

On 31.03.12 08:18, Christian Stocker wrote:
> Hi
> 
> As we all know, there's no COUNT feature in SQL2. This is not that bad,
> if you're not trying to get a count via remoting, since you can get it
> with .getSize() locally.  But if you're working via the webdav layer,
> it's not really efficient right now, especially if your resultsets are
> larger.
> 
> There are in my opinion 2 opinions to solve this.
> 
> 1) add a count() feature to the SQL2 parser, there's already a ticket
> for that (among other things)
> https://issues.apache.org/jira/browse/JCR-2605
> That of course would be a great solutions, it solves the problem for all
> cases who need that
> 
> 2) add an option to the SEARCH webdav interface to get the count if
> results instead of the whole result set.
> 
> For 2) I made a little prototype (which is really just that ;)) to see
> if that would be feasible. That patch can be found here
> 
> https://github.com/chregu/jackrabbit/compare/count-in-webdav
> 
> It adds a D:count option to a SEARCH query like this
> 
> <D:searchrequest xmlns:D="DAV:">
> 	<JCR-SQL2>
> 		<![CDATA[SELECT * FROM [nt:unstructured] ]]>
> 	</JCR-SQL2>
> 	<D:count>1</D:count>	
> </D:searchrequest>
> 
> and if that is set it returns only the count of results. Currently like
> below, but that format has certainly to be changed (it was just the
> shortest way to get something back):
> 
> <D:multistatus xmlns:D="DAV:">
> 	<D:response>
> 		<D:href>#count</D:href>
> 		<D:responsedescription>433</D:responsedescription>
> 	</D:response>
> <D:responsedescription>nt:unstructured.jcr:primaryType</D:responsedescription>
> </D:multistatus>
> 
> My questions:
> 
> - Is there any chance that something like this goes into jackrabbit and
> we can build on that feature?
> - Or is there a better way to just get the count of a query?
> - How should the response look like for something like this? The above
> is certainly not right ;)
> - And last but not least, are there any plans to implement COUNT() some
> time in the future directly in SQL2?
> 
> Thanks for any feedback.
> 
> chregu
> 
> 

-- 
Liip AG  //  Feldstrasse 133 //  CH-8004 Zurich
Tel +41 43 500 39 81 // Mobile +41 76 561 88 60
www.liip.ch // blog.liip.ch // GnuPG 0x0748D5FE