Posted to dev@jackrabbit.apache.org by Marcel Reutegger <ma...@gmx.net> on 2007/09/06 10:19:16 UTC
master plan for jsr 283 query implementation
well, these are just my thoughts on how we should implement the
query enhancements specified in JSR 283.
there are basically three major blocks that we need to implement:
- JQOM, allows you to programmatically create a query
- JCR-SQL2, the new SQL query syntax
- additional query features (joins, etc.)
In a first step I already introduced temporary interfaces for the JQOM and
implementing classes.
I'd like to keep the current design of the query sub system for a while until we
are ready to switch to the new JQOM as the basis for syntax independent query
representation.
That is, in a first phase my suggestion is the following:
XPath---+
        +--->AQT----+
SQL-----+           +---->LuceneQuery
                    |
SQL2------->JQOM----+
AQT: abstract query tree
And once the path SQL2->JQOM->LuceneQuery is stable:
XPath---+             AQT (deprecated)
        |
SQL-----+---->JQOM----->LuceneQuery
        |
SQL2----+
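As a rough illustration of this second phase, the sketch below sends every syntax through one syntax-independent model before a single back end sees it. All names in it (QueryModel, QuerySyntaxParser, QueryPipeline) are hypothetical and are not Jackrabbit or JSR 283 API. The "parser" only records the statement, and the back end returns a string where a real one would build a Lucene query.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the syntax-independent model (the role JQOM plays in the plan).
final class QueryModel {
    final String language;
    final String statement;

    QueryModel(String language, String statement) {
        this.language = language;
        this.statement = statement;
    }
}

// Hypothetical front end: one implementation per query syntax.
interface QuerySyntaxParser {
    QueryModel parse(String statement);
}

// Hypothetical pipeline: many front ends, one model, one back end.
final class QueryPipeline {
    private final Map<String, QuerySyntaxParser> parsers =
            new HashMap<String, QuerySyntaxParser>();

    void register(final String language) {
        // A real parser would build a tree; this placeholder only records the input.
        parsers.put(language, new QuerySyntaxParser() {
            public QueryModel parse(String statement) {
                return new QueryModel(language, statement);
            }
        });
    }

    // Single translation step shared by all registered syntaxes.
    String translate(String language, String statement) {
        QuerySyntaxParser parser = parsers.get(language);
        if (parser == null) {
            throw new IllegalArgumentException("Unsupported query language: " + language);
        }
        QueryModel model = parser.parse(statement);
        return "LuceneQuery[" + model.language + ": " + model.statement + "]";
    }
}
```

The point of the shape is that adding a syntax means registering one more front end, while the translation step stays untouched.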
Comments and suggestions are welcome.
regards
marcel
Re: jsr 283 query implementation
Posted by Marcel Reutegger <ma...@gmx.net>.
Hi Ramesh,
you can find the syntax details in the public review version of the JSR 283
specification:
http://jcp.org/aboutJava/communityprocess/pr/jsr283/
regards
marcel
Ramesh wrote:
> Hi Marcel
>
> Can you give me an idea of what the JCR-SQL2 query syntax looks like, with
> some examples if possible?
>
> Also, how do I convert JCR-SQL2 to JQOM?
>
> Kind Regards
> Ramesh
>
>
>
jsr 283 query implementation
Posted by Ramesh <dr...@yahoo.com>.
Hi Marcel
Can you give me an idea of what the JCR-SQL2 query syntax looks like, with
some examples if possible?
Also, how do I convert JCR-SQL2 to JQOM?
Kind Regards
Ramesh
Re: master plan for jsr 283 query implementation
Posted by Julian Reschke <ju...@gmx.de>.
Christoph Kiehl wrote:
> Marcel Reutegger wrote:
>
>> well, those are actually just my thoughts how I think we should
>> implement the query enhancements specified in JSR 283.
>>
>> there are basically three major blocks that we need to implement:
>>
>> - JQOM, allows you to programmatically create a query
>> - JCR-SQL2, the new SQL query syntax
>> - additional query features (joins, etc.)
>>
>> In a first step I already introduced temporary interfaces for the JQOM
>> and implementing classes.
>>
>> I'd like to keep the current design of the query sub system for a
>> while until we are ready to switch to the new JQOM as the basis for
>> syntax independent query representation.
>>
>> That is, in a first phase my suggestion is the following:
>>
>> XPath---+
>>         +--->AQT----+
>> SQL-----+           +---->LuceneQuery
>>                     |
>> SQL2------->JQOM----+
>>
>>
>> AQT: abstract query tree
>>
>> And once the path SQL2->JQOM->LuceneQuery is stable:
>>
>> XPath---+             AQT (deprecated)
>>         |
>> SQL-----+---->JQOM----->LuceneQuery
>>         |
>> SQL2----+
>>
>>
>> Comments and suggestions are welcome.
>
> +1. Sounds reasonable. Do you want to use javacc for SQL2 parsing?
Looks good to me as well (sorry, somehow missed the original post).
Best regards, Julian
Re: master plan for jsr 283 query implementation
Posted by Christoph Kiehl <ch...@sulu3000.de>.
Marcel Reutegger wrote:
> Thomas Mueller wrote:
>>> use javacc for SQL2 parsing
>>
>> I would use a hand-written recursive descent parser. I know I'm
>> probably the only one suggesting this...
>
> what are the advantages of a hand-written parser over a generated one?
>
> probably performance, but are there others?
In my view the goal should be to make it as easy as possible for people to
understand the jackrabbit code and contribute to it, without significantly
sacrificing performance or reinventing the wheel. The javacc grammar may be
easy to understand once you know javacc, but it is still a hurdle to
jump the first time.
I'm still unable to contribute to the parser because I had no time yet to
thoroughly learn javacc.
We just went back from antlr to hand-written java code for a very simple query
language we use. The resulting code was much simpler (less code), easier to
understand and more flexible with regard to how tokens are handled in
different contexts. None of this necessarily applies to JCR-SQL2, so I can't
judge what the best way to implement the parser is.
Cheers,
Christoph
Re: master plan for jsr 283 query implementation
Posted by "Padraic I. Hannon" <pi...@wasabicowboy.com>.
I concur. javacc, while not something a lot of people use day to day, has
been around a long while now and is pretty standard. It would be better to
leverage something that others know or can pick up than to write something
from scratch. I think there is time better spent doing other things than
writing parsers :-)
-paddy
Re: master plan for jsr 283 query implementation
Posted by Marcel May <ma...@consol.de>.
Marcel Reutegger wrote:
> Thomas Mueller wrote:
>>> use javacc for SQL2 parsing
>>
>> I would use a hand-written recursive descent parser. I know I'm
>> probably the only one suggesting this...
>
> what are the advantages of a hand-written parser over a generated one?
>
> probably performance, but are there others?
>
- Extensibility and easier maintenance
- ANTLR/JavaCC are more or less 'standards'
- Don't reinvent the wheel :-) It's nice to write your own parser and
  lexer, but why do that if you get a perfectly generated one?
  You'll be faster, too, in terms of performance and implementation time.
  The generated lexer/parser are proven to work in many other projects.
- You need to define the grammar anyway, and e.g. JavaCC can generate
  Javadoc-like grammar documentation using JJDoc
> regards
> marcel
Cheers,
Marcel
Re: master plan for jsr 283 query implementation
Posted by Marcel Reutegger <ma...@gmx.net>.
Thomas Mueller wrote:
>> use javacc for SQL2 parsing
>
> I would use a hand-written recursive descent parser. I know I'm
> probably the only one suggesting this...
what are the advantages of a hand-written parser over a generated one?
probably performance, but are there others?
regards
marcel
Re: master plan for jsr 283 query implementation
Posted by Felix Meschberger <fm...@gmail.com>.
Hi,
After having read all the messages in this thread starting with this
note, I just want to throw my $.02 into the pot:
I admit feeling more comfortable with good hand-written parsers, too. It
is easy after all: You just convert each BNF statement into a method and
you are almost done. Tokenizing is definitely a separate step - yet
depending on the set of terminal symbols not that complicated, either.
Reading and understanding a clean hand-written parser containing the
actual BNF as comments is much easier than understanding JavaCC code (don't
know about ANTLR, and yacc is similar :-) ). And this has a direct
influence on the quality of the code :-)
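To make the "one method per BNF statement" idea concrete, here is a minimal sketch. The two-rule grammar is invented for this example and is not JCR-SQL2; the class and method names are likewise made up. Tokenizing is done up front as a separate step, and each grammar rule appears as a comment above the method that implements it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy grammar, invented for this example (not JCR-SQL2):
//   select  ::= 'SELECT' columns 'FROM' name
//   columns ::= '*' | name (',' name)*
final class TinySelectParser {
    private static final Pattern TOKEN =
            Pattern.compile("\\s*([A-Za-z_][A-Za-z0-9_]*|\\*|,)");
    private final List<String> tokens = new ArrayList<String>();
    private int pos;

    TinySelectParser(String input) {
        // Separate tokenizing step: split the input into names, '*' and ','.
        Matcher m = TOKEN.matcher(input);
        int end = 0;
        while (m.lookingAt()) {
            tokens.add(m.group(1));
            end = m.end();
            m.region(end, input.length());
        }
        if (!input.substring(end).trim().isEmpty()) {
            throw new IllegalArgumentException("Unexpected character near offset " + end);
        }
    }

    // select ::= 'SELECT' columns 'FROM' name
    String parseSelect() {
        expect("SELECT");
        List<String> columns = parseColumns();
        expect("FROM");
        String table = parseName();
        if (pos < tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + tokens.get(pos));
        }
        return "select " + columns + " from " + table;
    }

    // columns ::= '*' | name (',' name)*
    private List<String> parseColumns() {
        List<String> columns = new ArrayList<String>();
        if (accept("*")) {
            columns.add("*");
            return columns;
        }
        columns.add(parseName());
        while (accept(",")) {
            columns.add(parseName());
        }
        return columns;
    }

    // name is a terminal: any token that starts like an identifier.
    private String parseName() {
        if (pos >= tokens.size()
                || !Character.isJavaIdentifierStart(tokens.get(pos).charAt(0))) {
            throw new IllegalArgumentException("Name expected at token " + pos);
        }
        return tokens.get(pos++);
    }

    // Consume the current token if it matches (case-insensitive).
    private boolean accept(String token) {
        if (pos < tokens.size() && tokens.get(pos).equalsIgnoreCase(token)) {
            pos++;
            return true;
        }
        return false;
    }

    // Consume a required token or fail.
    private void expect(String token) {
        if (!accept(token)) {
            throw new IllegalArgumentException("Expected " + token + " at token " + pos);
        }
    }
}
```

For example, new TinySelectParser("SELECT a, b FROM t").parseSelect() returns "select [a, b] from t". Note that the sketch does not reserve keywords, so a column named FROM is accepted as a name.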
BTW: I have also encountered parsers implemented as a State Machine with
one huge method looping over the symbols and setting state and having a
huge switch statement ... Well, in such cases I would rather suggest
using a generator :-)
Regards
Felix
On Tuesday, 11.09.2007, at 09:08 +0200, Thomas Mueller wrote:
> > use javacc for SQL2 parsing
>
> I would use a hand-written recursive descent parser. I know I'm
> probably the only one suggesting this...
>
> Thomas
Re: master plan for jsr 283 query implementation
Posted by Marcel Reutegger <ma...@gmx.net>.
I agree, this is a good point, though you can generate a parser with JavaCC
that emits very detailed debug messages about the parse progress. this has been
very helpful to me when debugging a parser. But again, I agree that being able
to debug the actual parser class is more convenient.
regards
marcel
Christoph Kiehl wrote:
> Thomas Mueller wrote:
>
>> Two more advantages of a hand-written parser:
>>
>> - You can actually debug the parser. No chance with JavaCC or ANTLR
>
> A very important point in my opinion! I learned a lot about Jackrabbit's
> internals by debugging.
>
> Cheers,
> Christoph
>
>
>
Re: master plan for jsr 283 query implementation
Posted by Christoph Kiehl <ch...@sulu3000.de>.
Thomas Mueller wrote:
> Two more advantages of a hand-written parser:
>
> - You can actually debug the parser. No chance with JavaCC or ANTLR
A very important point in my opinion! I learned a lot about Jackrabbit's
internals by debugging.
Cheers,
Christoph
Re: master plan for jsr 283 query implementation
Posted by Bertrand Delacretaz <bd...@apache.org>.
On 9/12/07, Thomas Mueller <th...@gmail.com> wrote:
> >... context-sensitive tokenizing
>
> I'm not sure what you refer to. Keywords versus identifiers? Example
> token types are: 'integer value', 'decimal value', 'text value',
> 'operator', 'quoted identifier', 'name'. The keywords are well defined
> in Java, but for SQL, I wouldn't decide if it's a keyword or
> identifier while tokenizing....
That's what I meant, being too strict in tokenizing can make things
harder downstream. I agree about having a soft boundary between
tokenizing and the actual parsing.
-Bertrand
Re: master plan for jsr 283 query implementation
Posted by Thomas Mueller <th...@gmail.com>.
Hi,
Two more advantages of a hand-written parser:
- You can actually debug the parser. No chance with JavaCC or ANTLR
- Better tools support (refactoring, autocomplete)
> sorry for my somewhat ironic statement about you being the only one
> wanting a hand-written parser,
To my surprise, it turns out I was wrong!
> Just curious, don't you use a separate tokenizing step in your
> hand-written parsers (I'm asking because of the literal "AND" above)?
Lexing (tokenizing, scanning) is done at a lower level. It can be
hand-written or done with a tool (for example StringTokenizer, or JFlex).
The boundary between tokenizing, lexing and parsing is soft. In my
example tokenizing is done in 'read(): read a token'.
> I usually prefer a separate tokenizing step, if only to make testing
> easier.
Sure! Not sure how to do that in JavaCC or ANTLR, but it is probably
possible as well.
> context-sensitive tokenizing
I'm not sure what you refer to. Keywords versus identifiers? Example
token types are: 'integer value', 'decimal value', 'text value',
'operator', 'quoted identifier', 'name'. The keywords are well defined
in Java, but for SQL, I wouldn't decide if it's a keyword or
identifier while tokenizing. Remarks are usually silently eaten by the
tokenizer (except for @deprecated in Javac).
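A minimal sketch of what "not deciding while tokenizing" could look like, using the token types listed above. SqlToken and SoftTokenizer are hypothetical names made up for this example, and the lexical rules are simplified (no escapes, single-character operators only). The tokenizer has no keyword type at all; the parser asks in context whether a name acts as a keyword.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token model: there is deliberately no KEYWORD type.
final class SqlToken {
    enum Type { INTEGER, DECIMAL, TEXT, OPERATOR, QUOTED_IDENTIFIER, NAME }

    final Type type;
    final String text;

    SqlToken(Type type, String text) {
        this.type = type;
        this.text = text;
    }

    // The parser, not the tokenizer, asks whether a name acts as a keyword here.
    boolean isKeyword(String keyword) {
        return type == Type.NAME && text.equalsIgnoreCase(keyword);
    }

    public String toString() {
        return type + ":" + text;
    }
}

final class SoftTokenizer {
    static List<SqlToken> tokenize(String s) {
        List<SqlToken> out = new ArrayList<SqlToken>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int start = i;
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '\'' || c == '"') {
                // 'text value' or "quoted identifier"; escapes are not handled.
                int close = s.indexOf(c, i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("Unterminated quote at offset " + i);
                }
                SqlToken.Type type = c == '\''
                        ? SqlToken.Type.TEXT : SqlToken.Type.QUOTED_IDENTIFIER;
                out.add(new SqlToken(type, s.substring(i + 1, close)));
                i = close + 1;
            } else if (Character.isDigit(c)) {
                // Digits with at most one '.' form an integer or decimal value.
                boolean decimal = false;
                while (i < s.length() && (Character.isDigit(s.charAt(i))
                        || (s.charAt(i) == '.' && !decimal))) {
                    decimal |= s.charAt(i) == '.';
                    i++;
                }
                SqlToken.Type type = decimal ? SqlToken.Type.DECIMAL : SqlToken.Type.INTEGER;
                out.add(new SqlToken(type, s.substring(start, i)));
            } else if (Character.isLetter(c) || c == '_') {
                while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i))
                        || s.charAt(i) == '_')) {
                    i++;
                }
                // Words such as SELECT or ORDER stay plain names at this level.
                out.add(new SqlToken(SqlToken.Type.NAME, s.substring(start, i)));
            } else {
                // Everything else is treated as a single-character operator.
                out.add(new SqlToken(SqlToken.Type.OPERATOR, String.valueOf(c)));
                i++;
            }
        }
        return out;
    }
}
```

With this split, the word "order" comes out as a NAME token whether it appears in "ORDER BY" or as a column name, and only the parser's call to isKeyword("ORDER") at a given position gives it keyword meaning.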
> The final answer to this question is probably "whoever implements it
> gets to decide". For me, the easiest way to understand a parser would
> be the unit tests which demonstrate its functionality, anyway.
I fully agree.
Some example parser code:
Derby JavaCC source file (313 KB):
http://svn.apache.org/repos/asf/db/derby/code/trunk/java/engine/org/apache/derby/impl/sql/compile/sqlgrammar.jj
(the generated .java files are 691 + 314 + 20 + 5 = 1030 KB)
H2 hand-written parser (161 KB):
http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/command/Parser.java
Thomas
Re: master plan for jsr 283 query implementation
Posted by Bertrand Delacretaz <bd...@apache.org>.
Hi Thomas,
Thanks for the clarifications (and sorry for my somewhat ironic
statement about you being the only one wanting a hand-written parser,
but you kind of called for that ;-)
On 9/11/07, Thomas Mueller <th...@gmail.com> wrote:
> ...I have used JavaCC, ANTLR, and made hand-written parsers. Hand-written
> parsers are more flexible:...
Agreed, if you write it yourself you can do whatever.
> ...Many people think that hand-written parsers are hard to read, I don't think so:
>
> ...Java:
> private Expression readAnd() throws ParseException {
>     Expression r = readUnary();
>     while (readIf("AND")) {
>         r = new AndExpression(r, readUnary());
>     }
>     return r;
> }
Cool - very readable indeed.
Just curious, don't you use a separate tokenizing step in your
hand-written parsers (I'm asking because of the literal "AND" above)?
I usually prefer a separate tokenizing step, if only to make testing
easier. Is this due to the context-sensitive tokenizing that you
mention?
> > ...I think there is time better spent doing other things than writing parsers :-)
> I agree, if writing the parser yourself actually takes more time than
> using JavaCC. In my view, it doesn't, but this is just my opinion...
The final answer to this question is probably "whoever implements it
gets to decide". For me, the easiest way to understand a parser would
be the unit tests which demonstrate its functionality, anyway.
-Bertrand
Re: master plan for jsr 283 query implementation
Posted by Thomas Mueller <th...@gmail.com>.
Hi,
I have used JavaCC, ANTLR, and made hand-written parsers. Hand-written
parsers are more flexible:
- Returning meaningful error messages is easy
- Tokens that are sometimes identifiers and sometimes keywords
  (many in SQL) are not problematic
- Strange grammar can be supported (SQL is strange, but not sure about JCR SQL)
- You can better optimize pure Java (probably irrelevant for Jackrabbit)
- You can support conditional grammar (irrelevant for Jackrabbit)
Flexibility is not always an advantage, especially if you develop a new
language: ambiguity in the grammar is easily found using a parser
generator. On the other hand, you could write the BNF and still use a
hand-written parser. I wrote a BNF parser / auto-complete tool, of
course hand-written ;-) Maybe this would be interesting for Jackrabbit
as well (a query tool with auto-complete).
Another advantage of a hand-written parser is that there is no new
language, just Java:
- Simplifies the build process a bit
- No need to learn JavaCC / ANTLR
Many people think that hand-written parsers are hard to read, I don't think so:
JavaCC:
void AndExpression() #void :
{}
{
  (
    UnaryExpression() (<AND> UnaryExpression())*
  ) #AndExpression(>1)
}
Java:
private Expression readAnd() throws ParseException {
    Expression r = readUnary();
    while (readIf("AND")) {
        r = new AndExpression(r, readUnary());
    }
    return r;
}
The main functions of a hand-written parser are usually:
- read(): read a token
- boolean readIf(String token): checks whether the current token is
  'token', and eats it if so.
- read(String expected): eat a required token or throw an exception.
But (obviously) you will add more convenience methods.
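Here is a minimal, self-contained sketch of those three helpers around the readAnd() method above. It makes several simplifying assumptions: MiniAndParser is a made-up name, tokens are split on whitespace with StringTokenizer (so parentheses must be surrounded by spaces), the unary rule is invented for the example, and expressions are rendered as strings instead of Expression objects.

```java
import java.text.ParseException;
import java.util.StringTokenizer;

// Sketch of read(), readIf(token) and read(expected) around a current-token field.
final class MiniAndParser {
    private final StringTokenizer tokenizer;
    private String current; // null once the input is exhausted
    private int index;      // 1-based position of the current token, for error messages

    MiniAndParser(String input) {
        tokenizer = new StringTokenizer(input);
        read();
    }

    // read(): advance to the next token.
    private void read() {
        current = tokenizer.hasMoreTokens() ? tokenizer.nextToken() : null;
        index++;
    }

    // readIf(token): consume the current token only if it matches.
    private boolean readIf(String token) {
        if (token.equalsIgnoreCase(current)) {
            read();
            return true;
        }
        return false;
    }

    // read(expected): consume a required token or throw an exception.
    private void read(String expected) throws ParseException {
        if (!readIf(expected)) {
            throw new ParseException("Expected " + expected + " but found " + current, index);
        }
    }

    // Entry point: parse one expression and require that all input was used.
    String parse() throws ParseException {
        String r = readAnd();
        if (current != null) {
            throw new ParseException("Unexpected token " + current, index);
        }
        return r;
    }

    // and ::= unary ('AND' unary)*
    private String readAnd() throws ParseException {
        String r = readUnary();
        while (readIf("AND")) {
            r = "(" + r + " AND " + readUnary() + ")";
        }
        return r;
    }

    // unary ::= 'NOT' unary | '(' and ')' | name   (rule invented for this sketch)
    private String readUnary() throws ParseException {
        if (readIf("NOT")) {
            return "(NOT " + readUnary() + ")";
        }
        if (readIf("(")) {
            String r = readAnd();
            read(")");
            return r;
        }
        if (current == null || !Character.isLetter(current.charAt(0))) {
            throw new ParseException("Name expected but found " + current, index);
        }
        String name = current;
        read();
        return name;
    }
}
```

For example, new MiniAndParser("a AND ( NOT b )").parse() returns "(a AND (NOT b))", and an input such as "a AND" fails with a ParseException that names what was expected.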
Of course, some things are complicated in both JavaCC and
hand-written parsers (in Jackrabbit, I don't understand JCRSQL.jjt,
Predicate(), lines 300 - 377).
In terms of 'work required': In my view, a hand-written parser requires
about the same amount of work as a JavaCC / ANTLR one, and is
easier to understand for a developer / maintainer.
> - Don't reinvent the wheel :-)
"Re-inventing the wheel" would be writing JavaCC or ANTLR yourself, and
I don't suggest doing that. It's more like "using a GUI builder"
versus "writing the GUI code yourself".
> - Extensibility and easier maintenance
Having done both, I don't agree.
> You'll be faster, too, in terms of performance and implementation time.
Performance: a hand-written one can be better optimized (if you want
to). Implementation time: it depends on whether you already know the tool
or have templates (for both approaches).
> - You need to define the grammar
I agree, needs to be done for the spec.
> JavaCC can generate Javadoc like grammar documentation
I don't think this generated documentation is 'suitable for human
consumption'. So far I found this:
http://www.w3.org/2002/11/xquery-xpath-applets/xpath-jjdoc.html and I
wouldn't want to learn the grammar from this file.
> I think there is time better spent doing other things than writing parsers :-)
I agree, if writing the parser yourself actually takes more time than
using JavaCC. In my view, it doesn't, but this is just my opinion.
Thomas
Re: master plan for jsr 283 query implementation
Posted by Bertrand Delacretaz <bd...@apache.org>.
On 9/11/07, Christoph Kiehl <ch...@sulu3000.de> wrote:
> ...does anyone know of any easier to understand solutions than using
> javacc? Maybe it is just that complex. Is antlr a better choice?...
I've used both, and found them comparable in terms of complexity and power.
Once you understand the concepts (took me a while initially because I
was ignorant about parsing and compilation techniques), they're really
not hard to use, and IMHO learning at least one of these tools is time
very well invested.
-Bertrand
Re: master plan for jsr 283 query implementation
Posted by Christoph Kiehl <ch...@sulu3000.de>.
Bertrand Delacretaz wrote:
> On 9/11/07, Christoph Kiehl <ch...@sulu3000.de> wrote:
>
>> ...WDOT?...
>
> I agree with Thomas that he'll probably be the only one to suggest a
> hand-written parser ;-)
Ok ;) So does anyone know of any easier to understand solutions than using
javacc? Maybe it is just that complex. Is antlr a better choice? If there is no
real option other than javacc I wouldn't mind using it, but I just
thought it might be a good point in time to think about alternatives.
Cheers,
Christoph
Re: master plan for jsr 283 query implementation
Posted by Bertrand Delacretaz <bd...@apache.org>.
On 9/11/07, Christoph Kiehl <ch...@sulu3000.de> wrote:
> ...WDOT?...
I agree with Thomas that he'll probably be the only one to suggest a
hand-written parser ;-)
-Bertrand
Re: master plan for jsr 283 query implementation
Posted by Christoph Kiehl <ch...@sulu3000.de>.
Thomas Mueller wrote:
>> use javacc for SQL2 parsing
>
> I would use a hand-written recursive descent parser. I know I'm
> probably the only one suggesting this...
Well, not quite ;) I asked because currently you need to have knowledge about
javacc to extend the parsers. I would like to make it easier to understand and
extend the query parsing. But I'm not sure if it really helps if we use a
hand-written parser. It might become quite complex as well.
WDOT?
Cheers,
Christoph
Re: master plan for jsr 283 query implementation
Posted by Thomas Mueller <th...@gmail.com>.
> use javacc for SQL2 parsing
I would use a hand-written recursive descent parser. I know I'm
probably the only one suggesting this...
Thomas
Re: master plan for jsr 283 query implementation
Posted by Christoph Kiehl <ch...@sulu3000.de>.
Marcel Reutegger wrote:
> well, those are actually just my thoughts how I think we should
> implement the query enhancements specified in JSR 283.
>
> there are basically three major blocks that we need to implement:
>
> - JQOM, allows you to programmatically create a query
> - JCR-SQL2, the new SQL query syntax
> - additional query features (joins, etc.)
>
> In a first step I already introduced temporary interfaces for the JQOM
> and implementing classes.
>
> I'd like to keep the current design of the query sub system for a while
> until we are ready to switch to the new JQOM as the basis for syntax
> independent query representation.
>
> That is, in a first phase my suggestion is the following:
>
> XPath---+
>         +--->AQT----+
> SQL-----+           +---->LuceneQuery
>                     |
> SQL2------->JQOM----+
>
>
> AQT: abstract query tree
>
> And once the path SQL2->JQOM->LuceneQuery is stable:
>
> XPath---+             AQT (deprecated)
>         |
> SQL-----+---->JQOM----->LuceneQuery
>         |
> SQL2----+
>
>
> Comments and suggestions are welcome.
+1. Sounds reasonable. Do you want to use javacc for SQL2 parsing?
Cheers,
Christoph