Posted to dev@jackrabbit.apache.org by Marcel Reutegger <ma...@gmx.net> on 2007/09/06 10:19:16 UTC

master plan for jsr 283 query implementation

well, those are actually just my thoughts on how we should implement the 
query enhancements specified in JSR 283.

there are basically three major blocks that we need to implement:

- JQOM, allows you to programmatically create a query
- JCR-SQL2, the new SQL query syntax
- additional query features (joins, etc.)

In a first step I already introduced temporary interfaces for the JQOM and 
implementing classes.

I'd like to keep the current design of the query sub system for a while until we 
are ready to switch to the new JQOM as the basis for syntax independent query 
representation.

That is, in a first phase my suggestion is the following:

XPath---+
        +--->AQT----+
SQL-----+           +---->LuceneQuery
                    |
SQL2------->JQOM----+


AQT: abstract query tree

And once the path SQL2->JQOM->LuceneQuery is stable:

XPath---+     AQT (deprecated)
        |
SQL-----+---->JQOM----->LuceneQuery
        |
SQL2----+
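
To make the two phases concrete, here is a rough Java sketch of the routing only. It is an illustration under assumptions: QueryPipelineSketch, aqt(), jqom() and lucene() are hypothetical names, plain strings stand in for the real trees, and nothing here is existing Jackrabbit code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the two phases. Strings stand in for real trees;
// none of these names are Jackrabbit classes.
final class QueryPipelineSketch {
    static String aqt(String stmt) { return "AQT(" + stmt + ")"; }
    static String jqom(String stmt) { return "JQOM(" + stmt + ")"; }
    static String lucene(String tree) { return "LuceneQuery(" + tree + ")"; }

    // Phase 1: XPath and SQL keep using the AQT, only SQL2 goes through JQOM.
    static Map<String, Function<String, String>> phaseOne() {
        Map<String, Function<String, String>> routes = new LinkedHashMap<>();
        routes.put("XPath", s -> lucene(aqt(s)));
        routes.put("SQL", s -> lucene(aqt(s)));
        routes.put("SQL2", s -> lucene(jqom(s)));
        return routes;
    }

    // Phase 2: every syntax is translated to JQOM; the AQT is no longer used.
    static Map<String, Function<String, String>> phaseTwo() {
        Map<String, Function<String, String>> routes = new LinkedHashMap<>();
        for (String syntax : new String[] {"XPath", "SQL", "SQL2"}) {
            routes.put(syntax, s -> lucene(jqom(s)));
        }
        return routes;
    }
}
```

The point of the sketch is that only the translation into LuceneQuery has to exist twice during phase 1, and that the switch to phase 2 changes the route for XPath and SQL without touching SQL2.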


Comments and suggestions are welcome.

regards
  marcel

Re: jsr 283 query implementation

Posted by Marcel Reutegger <ma...@gmx.net>.

Hi Ramesh,

you can find the syntax details in the public review version of the JSR 283 
specification:

http://jcp.org/aboutJava/communityprocess/pr/jsr283/

regards
  marcel

Ramesh wrote:
> Hi Marcel
> 
> Can you give me an idea of what the JCR-SQL2 query syntax looks like, with
> some examples if possible?
> 
> Also, how do I convert JCR-SQL2 to JQOM?
> 
> Kind Regards
> Ramesh
> 
> 
>

jsr 283 query implementation

Posted by Ramesh <dr...@yahoo.com>.

Hi Marcel

Can you give me an idea of what the JCR-SQL2 query syntax looks like, with
some examples if possible?

Also, how do I convert JCR-SQL2 to JQOM?

Kind Regards
Ramesh

Re: master plan for jsr 283 query implementation

Posted by Julian Reschke <ju...@gmx.de>.

Christoph Kiehl wrote:
> Marcel Reutegger wrote:
> 
>> well, those are actually just my thoughts how I think we should 
>> implement the query enhancements specified in JSR 283.
>>
>> there are basically three major blocks that we need to implement:
>>
>> - JQOM, allows you to programmatically create a query
>> - JCR-SQL2, the new SQL query syntax
>> - additional query features (joins, etc.)
>>
>> In a first step I already introduced temporary interfaces for the JQOM 
>> and implementing classes.
>>
>> I'd like to keep the current design of the query sub system for a 
>> while until we are ready to switch to the new JQOM as the basis for 
>> syntax independent query representation.
>>
>> That is, in a first phase my suggestion is the following:
>>
>> XPath---+
>>         +--->AQT----+
>> SQL-----+           +---->LuceneQuery
>>                     |
>> SQL2------->JQOM----+
>>
>>
>> AQT: abstract query tree
>>
>> And once the path SQL2->JQOM->LuceneQuery is stable:
>>
>> XPath---+     AQT (deprecated)
>>         |
>> SQL-----+---->JQOM----->LuceneQuery
>>         |
>> SQL2----+
>>
>>
>> Comments and suggestions are welcome.
> 
> +1. Sounds reasonable. Do you want to use javacc for SQL2 parsing?

Looks good to me as well (sorry, somehow missed the original post).

Best regards, Julian

Re: master plan for jsr 283 query implementation

Posted by Christoph Kiehl <ch...@sulu3000.de>.

Marcel Reutegger wrote:
> Thomas Mueller wrote:
>>> use javacc for SQL2 parsing
>>
>> I would use a hand-written recursive descent parser. I know I'm
>> probably the only one suggesting this...
> 
> what are the advantages of a hand-written parser over a generated one?
> 
> probably performance, but are there other?

In my view the goal should be to make it as easy as possible for people to 
understand the Jackrabbit code and contribute to it, without significantly 
sacrificing performance or reinventing the wheel. It may be easy to 
understand the javacc grammar once you know javacc, but it is still a hurdle to 
jump the first time.
I'm still unable to contribute to the parser because I have not yet had time to 
thoroughly learn javacc.
We just went back from antlr to hand-written Java code for a very simple query 
language we use. The resulting code was much simpler (less code), easier to 
understand, and more flexible with regard to how tokens are handled in 
different contexts. None of this necessarily applies to JCR-SQL2, so I can't 
judge what the best way to implement the parser is.

Cheers,
Christoph

Re: master plan for jsr 283 query implementation

Posted by "Padraic I. Hannon" <pi...@wasabicowboy.com>.

I concur. JavaCC, while not something a lot of people use day to day, has 
been around a long while now and is pretty standard. It would be better to 
leverage something that others know or can pick up than to write something 
from scratch. I think there is time better spent doing other things than 
writing parsers :-)

-paddy

Re: master plan for jsr 283 query implementation

Posted by Marcel May <ma...@consol.de>.

Marcel Reutegger wrote:
> Thomas Mueller wrote:
>>> use javacc for SQL2 parsing
>>
>> I would use a hand-written recursive descent parser. I know I'm
>> probably the only one suggesting this...
>
> what are the advantages of a hand-written parser over a generated one?
>
> probably performance, but are there other?
>
- Extensibility and easier maintenance

- ANTLR/JavaCC are more or less 'standards'

- Don't reinvent the wheel :-) It's nice to write your own parser and
  lexer, but why do that if you get a perfectly generated one?
  You'll be faster, too, in terms of performance and implementation time.
  The generated lexers/parsers are proven to work in many other projects.

- You need to define the grammar anyway, and e.g. JavaCC can generate
  Javadoc-like grammar documentation using JJDoc

> regards
>  marcel

Cheers,
Marcel

Re: master plan for jsr 283 query implementation

Posted by Marcel Reutegger <ma...@gmx.net>.

Thomas Mueller wrote:
>> use javacc for SQL2 parsing
> 
> I would use a hand-written recursive descent parser. I know I'm
> probably the only one suggesting this...

what are the advantages of a hand-written parser over a generated one?

probably performance, but are there others?

regards
  marcel

Re: master plan for jsr 283 query implementation

Posted by Felix Meschberger <fm...@gmail.com>.

Hi,

After having read all the messages in this thread starting with this
note, I just want to throw my $.02 into the pot:

I admit feeling more comfortable with good hand-written parsers, too. It
is easy after all: You just convert each BNF statement into a method and
you are almost done. Tokenizing is definitely a separate step - yet
depending on the set of terminal symbols not that complicated, either.

Reading and understanding a clean hand-written parser containing the
actual BNF as comments is much easier than understanding JavaCC code (don't
know about ANTLR, and yacc is similar :-) ). And this has a direct
influence on the quality of the code :-)

BTW: I also have encountered parsers implemented as a State Machine with
one huge method looping over the symbols and setting state and having a
huge switch statement ... Well, in such cases, I would rather suggest
using a generator :-)

Regards
Felix

Am Dienstag, den 11.09.2007, 09:08 +0200 schrieb Thomas Mueller:
> > use javacc for SQL2 parsing
> 
> I would use a hand-written recursive descent parser. I know I'm
> probably the only one suggesting this...
> 
> Thomas

Re: master plan for jsr 283 query implementation

Posted by Marcel Reutegger <ma...@gmx.net>.

I agree, this is a good point, though you can generate a parser with JavaCC 
that emits very detailed debug messages about the parse progress. This has been 
very helpful to me when debugging a parser. But again, I agree that being able 
to debug the actual parser class is more convenient.

regards
  marcel

Christoph Kiehl wrote:
> Thomas Mueller wrote:
> 
>> Two more advantages of a hand-written parser:
>>
>> - You can actually debug the parser. No chance with JavaCC or ANTLR
> 
> A very important point in my opinion! I learned a lot about Jackrabbit's 
> internals by debugging.
> 
> Cheers,
> Christoph
> 
> 
>

Re: master plan for jsr 283 query implementation

Posted by Christoph Kiehl <ch...@sulu3000.de>.

Thomas Mueller wrote:

> Two more advantages of a hand-written parser:
> 
> - You can actually debug the parser. No chance with JavaCC or ANTLR

A very important point in my opinion! I learned a lot about Jackrabbit's 
internals by debugging.

Cheers,
Christoph

Re: master plan for jsr 283 query implementation

Posted by Bertrand Delacretaz <bd...@apache.org>.

On 9/12/07, Thomas Mueller <th...@gmail.com> wrote:

> >... context-sensitive tokenizing
>
> I'm not sure what you refer to. Keywords versus identifiers? Example
> token types are: 'integer value', 'decimal value', 'text value',
> 'operator', 'quoted identifier', 'name'. The keywords are well defined
> in Java, but for SQL, I wouldn't decide if it's a keyword or
> identifier while tokenizing....

That's what I meant, being too strict in tokenizing can make things
harder downstream. I agree about having a soft boundary between
tokenizing and the actual parsing.

-Bertrand

Re: master plan for jsr 283 query implementation

Posted by Thomas Mueller <th...@gmail.com>.

Hi,

Two more advantages of a hand-written parser:

- You can actually debug the parser. No chance with JavaCC or ANTLR
- Better tools support (refactoring, autocomplete)

> sorry for my somewhat ironic statement about you being the only one
> wanting a hand-written parser,

To my surprise, it turns out I was wrong!

> Just curious, don't you use a separate tokenizing step in your
> hand-written parsers (I'm asking because of the literal "AND" above)?

Lexing (tokenizing, scanning) is done at a lower level. It can be
hand-written, or done using a tool (for example StringTokenizer, or JFlex).
The boundary between tokenizing, lexing and parsing is soft. In my
example tokenizing is done in 'read(): read a token'.

> I usually prefer a separate tokenizing step, if only to make testing
> easier.

Sure! Not sure how to do that in JavaCC or ANTLR, but it is probably
possible as well.

> context-sensitive tokenizing

I'm not sure what you refer to. Keywords versus identifiers? Example
token types are: 'integer value', 'decimal value', 'text value',
'operator', 'quoted identifier', 'name'. The keywords are well defined
in Java, but for SQL, I wouldn't decide if it's a keyword or
identifier while tokenizing. Remarks are usually silently eaten by the
tokenizer (except for @deprecated in Javac).
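
As an illustration of such a tokenizer, here is a minimal Java sketch. It is not taken from H2 or Jackrabbit: SqlTokenizer, Type and Token are hypothetical names, "--" is assumed as the remark syntax, and escaped quotes and multi-character operators are left out to keep it short. Keywords and identifiers both come out as NAME, so the parser decides later which is which.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tokenizer sketch: keywords and identifiers both become NAME.
final class SqlTokenizer {
    enum Type { INTEGER, DECIMAL, TEXT, OPERATOR, QUOTED_IDENTIFIER, NAME }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + ":" + text; }
    }

    static List<Token> tokenize(String sql) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        int n = sql.length();
        while (i < n) {
            char c = sql.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '-' && i + 1 < n && sql.charAt(i + 1) == '-') {
                // Remark (assumed "--" syntax): silently skipped to end of line.
                while (i < n && sql.charAt(i) != '\n') i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                boolean decimal = false;
                while (i < n && (Character.isDigit(sql.charAt(i))
                        || (!decimal && sql.charAt(i) == '.'))) {
                    if (sql.charAt(i) == '.') decimal = true;
                    i++;
                }
                out.add(new Token(decimal ? Type.DECIMAL : Type.INTEGER,
                        sql.substring(start, i)));
            } else if (c == '\'' || c == '"') {
                // Single quotes give a text value, double quotes a quoted identifier.
                int end = sql.indexOf(c, i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated quote at " + i);
                }
                out.add(new Token(c == '\'' ? Type.TEXT : Type.QUOTED_IDENTIFIER,
                        sql.substring(i + 1, end)));
                i = end + 1;
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < n && (Character.isLetterOrDigit(sql.charAt(i))
                        || sql.charAt(i) == '_')) i++;
                // No keyword table here: SELECT and mycolumn are both NAME.
                out.add(new Token(Type.NAME, sql.substring(start, i)));
            } else {
                // Anything else is treated as a single-character operator.
                out.add(new Token(Type.OPERATOR, String.valueOf(c)));
                i++;
            }
        }
        return out;
    }
}
```

Because the token list is plain data, it can be unit-tested on its own, which fits the preference for a separate tokenizing step.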

> The final answer to this question is probably "whoever implements it
> gets to decide". For me, the easiest way to understand a parser would
> be the unit tests which demonstrate its functionality, anyway.

I fully agree.

Some example parser code:

Derby JavaCC source file (313 KB):
http://svn.apache.org/repos/asf/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/sqlgrammar.jj
(the generated .java files are 691 + 314 + 20 + 5 = 1030 KB)

H2 hand-written parser (161 KB):
http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/command/Parser.java

Thomas

Re: master plan for jsr 283 query implementation

Posted by Bertrand Delacretaz <bd...@apache.org>.

Hi Thomas,

Thanks for the clarifications (and sorry for my somewhat ironic
statement about you being the only one wanting a hand-written parser,
but you kind of called for that ;-)

On 9/11/07, Thomas Mueller <th...@gmail.com> wrote:
> ...I have used JavaCC, ANTLR, and made hand-written parsers. Hand-written
> parsers are more flexible:...

Agreed, if you write it yourself you can do whatever.

> ...Many people think that hand-written parsers are hard to read, I don't think so:
>
> ...Java:
> private Expression readAnd() throws ParseException {
>     Expression r = readUnary();
>     while (readIf("AND")) {
>         r = new AndExpression(r, readUnary());
>     }
>     return r;
> }

Cool - very readable indeed.

Just curious, don't you use a separate tokenizing step in your
hand-written parsers (I'm asking because of the literal "AND" above)?

I usually prefer a separate tokenizing step, if only to make testing
easier. Is this due to the context-sensitive tokenizing that you
mention?

> > ...I think there is time better spent doing other things than writing parsers :-)
> I agree, if writing the parser yourself actually takes more time than
> using JavaCC. In my view, it doesn't, but this is just my opinion...

The final answer to this question is probably "whoever implements it
gets to decide". For me, the easiest way to understand a parser would
be the unit tests which demonstrate its functionality, anyway.

-Bertrand

Re: master plan for jsr 283 query implementation

Posted by Thomas Mueller <th...@gmail.com>.

Hi,

I have used JavaCC, ANTLR, and made hand-written parsers. Hand-written
parsers are more flexible:

- Returning meaningful error messages is easy
- Tokens that are sometimes identifiers and sometimes keywords
  (many in SQL) are not problematic
- Strange grammar can be supported (SQL is strange, but not sure about JCR SQL)
- You can better optimize pure Java (probably irrelevant for Jackrabbit)
- You can support conditional grammar (irrelevant for Jackrabbit)
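
A small Java sketch of the first two points, under assumptions: ContextualKeywords and its toy grammar are hypothetical and not JCR-SQL2, and the input is split on whitespace only. A word counts as a keyword only where the grammar expects one, and every error names the expected token and where it was expected.

```java
// Hypothetical sketch: contextual keywords and readable error messages.
final class ContextualKeywords {
    private final String[] tokens;
    private int pos;

    ContextualKeywords(String sql) {
        String s = sql.trim();
        tokens = s.isEmpty() ? new String[0] : s.split("\\s+");
    }

    private String current() {
        return pos < tokens.length ? tokens[pos] : null;
    }

    // Consume the current token if it is the given keyword.
    private boolean readIf(String keyword) {
        if (keyword.equalsIgnoreCase(current())) {
            pos++;
            return true;
        }
        return false;
    }

    // Consume a required keyword or fail with a descriptive message.
    private void read(String keyword) {
        if (!readIf(keyword)) {
            throw error(keyword);
        }
    }

    // Any word is accepted as a name here, even one that is a keyword elsewhere.
    private String readName() {
        String t = current();
        if (t == null) {
            throw error("a name");
        }
        pos++;
        return t;
    }

    private IllegalArgumentException error(String expected) {
        String found = current() == null ? "end of input" : "'" + current() + "'";
        return new IllegalArgumentException(
                "Expected " + expected + " but found " + found
                + " at token " + (pos + 1));
    }

    // select ::= "SELECT" name "FROM" name [ "ORDER" "BY" name ]
    String parse() {
        read("SELECT");
        String column = readName();
        read("FROM");
        String table = readName();
        String orderBy = null;
        if (readIf("ORDER")) {
            read("BY");
            orderBy = readName();
        }
        if (current() != null) {
            throw error("end of input");
        }
        return "column=" + column + ", table=" + table + ", orderBy=" + orderBy;
    }
}
```

With this sketch, "SELECT order FROM t ORDER BY order" parses with "order" as a column name, while "SELECT a FRM t" fails with "Expected FROM but found 'FRM' at token 3".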

Flexibility is not always an advantage, especially if you develop a new
language: ambiguity in the grammar is easily found using a parser
generator. On the other hand, you could write the BNF and still use a
hand-written parser. I wrote a BNF parser / auto-complete tool, of
course hand-written ;-) Maybe this would be interesting for Jackrabbit
as well (a query tool with auto-complete).

Another advantage of a hand-written parser is that there is no new
language, just Java:

- Simplifies the build process a bit
- No need to learn JavaCC / ANTLR

Many people think that hand-written parsers are hard to read, I don't think so:

JavaCC:
void AndExpression() #void :
{}
{
  (
    UnaryExpression() (<AND> UnaryExpression())*
  ) #AndExpression(>1)
}

Java:
private Expression readAnd() throws ParseException {
    Expression r = readUnary();
    while (readIf("AND")) {
        r = new AndExpression(r, readUnary());
    }
    return r;
}

The main functions of a hand-written parser are usually:

- read(): reads a token
- boolean readIf(String token): checks whether the current token is
  'token', and eats it if so.
- read(String expected): eats a required token or throws an exception.

But (obviously) you will add more convenience methods.
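
Put together, a minimal self-contained version of these three functions could look like the sketch below. MiniParser is a hypothetical class, not H2 or Jackrabbit code. It assumes tokens are separated by whitespace, it accepts any token as a name, and it returns a string instead of building Expression objects so that it runs on its own.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the read() / readIf() / read(expected) pattern.
final class MiniParser {
    private final List<String> tokens = new ArrayList<>();
    private int pos;
    private String current; // current token, null at end of input

    MiniParser(String query) {
        // Naive whitespace tokenizer, only to keep the sketch short.
        for (String t : query.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        read();
    }

    // read(): advance to the next token.
    private void read() {
        current = pos < tokens.size() ? tokens.get(pos++) : null;
    }

    // readIf(token): consume the current token if it matches.
    private boolean readIf(String token) {
        if (token.equalsIgnoreCase(current)) {
            read();
            return true;
        }
        return false;
    }

    // read(expected): consume a required token or throw an exception.
    private void read(String expected) {
        if (!readIf(expected)) {
            throw new IllegalArgumentException(
                    "Expected " + expected + " but found " + current);
        }
    }

    // and ::= unary { "AND" unary }
    String parseAnd() {
        String r = parseUnary();
        while (readIf("AND")) {
            r = "AND(" + r + ", " + parseUnary() + ")";
        }
        return r;
    }

    // unary ::= "(" and ")" | name
    private String parseUnary() {
        if (readIf("(")) {
            String r = parseAnd();
            read(")");
            return r;
        }
        if (current == null) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String name = current;
        read();
        return name;
    }
}
```

For example, new MiniParser("a AND ( b AND c )").parseAnd() returns "AND(a, AND(b, c))".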

Of course, some things are complicated in both JavaCC and
hand-written parsers (in Jackrabbit, I don't understand JCRSQL.jjt,
Predicate(), lines 300 - 377).

In terms of 'work required': In my view, a hand-written parser requires
about the same amount of work as a JavaCC / ANTLR one, and is
easier to understand for a developer / maintainer.

> - Don't reinvent the wheel :-)
"Re-invent the wheel" would be if you write JavaCC or ANTLR yourself,
I don't suggest doing that. It's more like "using a GUI builder"
versus "writing the GUI code yourself".

> - Extensibility and easier maintenance
Having done both, I don't agree.

>  You'll be faster, too, in terms of performance and implementation time.
Performance: a hand-written one can be better optimized (if you want
to). Implementation time: it depends on whether you already know the tool
or have templates (for both approaches).

> - You need to define the grammar
I agree, needs to be done for the spec.

> JavaCC can generate Javadoc like grammar documentation
I don't think this generated documentation is 'suitable for human
consumption'. So far I found this:
http://www.w3.org/2002/11/xquery-xpath-applets/xpath-jjdoc.html and I
wouldn't want to learn the grammar from this file.

> I think there is time better spent doing other things than writing parsers :-)
I agree, if writing the parser yourself actually takes more time than
using JavaCC. In my view, it doesn't, but this is just my opinion.

Thomas

Re: master plan for jsr 283 query implementation

Posted by Bertrand Delacretaz <bd...@apache.org>.

On 9/11/07, Christoph Kiehl <ch...@sulu3000.de> wrote:

> ...does anyone know of any easier to understand solutions than using
> javacc? Maybe it is just that complex. Is antlr a better choice?...

I've used both, and found them comparable in terms of complexity and power.

Once you understand the concepts (took me a while initially because I
was ignorant about parsing and compilation techniques), they're really
not hard to use, and IMHO learning at least one of these tools is time
very well invested.

-Bertrand

Re: master plan for jsr 283 query implementation

Posted by Christoph Kiehl <ch...@sulu3000.de>.

Bertrand Delacretaz wrote:
> On 9/11/07, Christoph Kiehl <ch...@sulu3000.de> wrote:
> 
>> ...WDOT?...
> 
> I agree with Thomas that he'll probably be the only one to suggest a
> hand-written parser ;-)

Ok ;) So does anyone know of any easier to understand solutions than using 
javacc? Maybe it is just that complex. Is antlr a better choice? If there is no 
real other option than using javacc I wouldn't mind using javacc, but I just 
thought it might be a good point in time to think about alternatives.

Cheers,
Christoph

Re: master plan for jsr 283 query implementation

Posted by Bertrand Delacretaz <bd...@apache.org>.

On 9/11/07, Christoph Kiehl <ch...@sulu3000.de> wrote:

> ...WDOT?...

I agree with Thomas that he'll probably be the only one to suggest a
hand-written parser ;-)

-Bertrand

Re: master plan for jsr 283 query implementation

Posted by Christoph Kiehl <ch...@sulu3000.de>.

Thomas Mueller wrote:

>> use javacc for SQL2 parsing
> 
> I would use a hand-written recursive descent parser. I know I'm
> probably the only one suggesting this...

Well, not quite ;) I asked because currently you need to have knowledge about 
javacc to extend the parsers. I would like to make it easier to understand and 
extend the query parsing. But I'm not sure if it really helps if we use a 
hand-written parser. It might become quite complex as well.

WDOT?

Cheers,
Christoph

Re: master plan for jsr 283 query implementation

Posted by Thomas Mueller <th...@gmail.com>.

> use javacc for SQL2 parsing

I would use a hand-written recursive descent parser. I know I'm
probably the only one suggesting this...

Thomas

Re: master plan for jsr 283 query implementation

Posted by Christoph Kiehl <ch...@sulu3000.de>.

Marcel Reutegger wrote:

> well, those are actually just my thoughts how I think we should 
> implement the query enhancements specified in JSR 283.
> 
> there are basically three major blocks that we need to implement:
> 
> - JQOM, allows you to programmatically create a query
> - JCR-SQL2, the new SQL query syntax
> - additional query features (joins, etc.)
> 
> In a first step I already introduced temporary interfaces for the JQOM 
> and implementing classes.
> 
> I'd like to keep the current design of the query sub system for a while 
> until we are ready to switch to the new JQOM as the basis for syntax 
> independent query representation.
> 
> That is, in a first phase my suggestion is the following:
> 
> XPath---+
>         +--->AQT----+
> SQL-----+           +---->LuceneQuery
>                     |
> SQL2------->JQOM----+
> 
> 
> AQT: abstract query tree
> 
> And once the path SQL2->JQOM->LuceneQuery is stable:
> 
> XPath---+     AQT (deprecated)
>         |
> SQL-----+---->JQOM----->LuceneQuery
>         |
> SQL2----+
> 
> 
> Comments and suggestions are welcome.

+1. Sounds reasonable. Do you want to use javacc for SQL2 parsing?

Cheers,
Christoph