Posted to doxia-dev@maven.apache.org by Vincent Massol <vi...@massol.net> on 2007/10/21 18:02:34 UTC

List parsing with Confluence parser wrong?

Hi,

The APT and Confluence parsers behave differently when parsing lists.

The APT parser generates paragraph()/paragraph_() events for each  
list item whereas the Confluence parser doesn't.
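
To make the difference concrete, here is a minimal sketch of the two event sequences for a one-item list. The recorder class and the "text:" string convention are assumptions made for illustration; only the event names come from the Sink methods mentioned above, and this is not the real Doxia Sink interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical recorder: it borrows the event names used in this thread
// but is NOT the real Doxia Sink interface.
class TraceRecorder {
    final List<String> events = new ArrayList<>();

    void list()         { events.add("list"); }
    void list_()        { events.add("list_"); }
    void listItem()     { events.add("listItem"); }
    void listItem_()    { events.add("listItem_"); }
    void paragraph()    { events.add("paragraph"); }
    void paragraph_()   { events.add("paragraph_"); }
    void text(String s) { events.add("text:" + s); }
}

class ListEventTraces {
    // One-item list with the item text wrapped in a paragraph
    // (the APT parser behaviour as described in this message).
    static List<String> aptStyle(String item) {
        TraceRecorder r = new TraceRecorder();
        r.list();
        r.listItem();
        r.paragraph();
        r.text(item);
        r.paragraph_();
        r.listItem_();
        r.list_();
        return r.events;
    }

    // The same list without paragraph events
    // (the Confluence parser behaviour as described in this message).
    static List<String> confluenceStyle(String item) {
        TraceRecorder r = new TraceRecorder();
        r.list();
        r.listItem();
        r.text(item);
        r.listItem_();
        r.list_();
        return r.events;
    }

    public static void main(String[] args) {
        System.out.println("APT:        " + aptStyle("item"));
        System.out.println("Confluence: " + confluenceStyle("item"));
    }
}
```

The two traces differ only by the paragraph/paragraph_ pair around the text event.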

So my questions are:

1) Who's right? This is very important since a Sink will output  
different results if the parsers behave differently
2) How do we ensure parsers are correct in the events they send?

For 2), we should probably have an abstract test case similar to what  
is done in AbstractSinkTest for Sinks.

For 1) I've checked and it seems TWiki also doesn't output paragraph()
events for list items.

So is the AptParser wrong?

Thanks
-Vincent

Re: List parsing with Confluence parser wrong?

Posted by Dave Syer <da...@hotmail.com>.

I don't think you can extrapolate "correct" behaviour for Confluence and
Twiki modules from Apt in this way.  What is correct for Confluence (read
Textile) users?  Doxia should try to get the output format as close as
possible to what the users of these external markup languages expect.

E.g. I can't find a way in Confluence to get a paragraph to render inside an
item list.  You can get a line break (with new line or \\ escape).  That's
what Doxia should do.

Jason van Zyl-2 wrote:
> 
> 
>>
>> For 1) I've checked and it seems TWiki also doesn't output paragraph()
>> events for list items.
>>
>> So is the AptParser wrong?
>>
> 
> No, I don't think so. The Confluence and TWiki parsers were done together,
> but the APT parser came first and was originally written by the first
> author of the framework. So I would say the APT parser is right.
> 


Re: List parsing with Confluence parser wrong?

Posted by Jason van Zyl <ja...@maven.org>.

On 21 Oct 07, at 9:02 AM, Vincent Massol wrote:

> Hi,
>
> The APT and Confluence parsers behave differently when parsing lists.
>
> The APT parser generates paragraph()/paragraph_() events for each  
> list item whereas the Confluence parser doesn't.
>
> So my questions are:
>
> 1) Who's right? This is very important since a Sink will output  
> different results if the parsers behave differently

The APT parser came first and is the canonical parser and it came  
from the Pixware folks. I would use it as your rule of thumb.

> 2) How do we ensure parsers are correct in the events they send?
>

I would say parse a document into an in-memory model of a Sink and
then validate that.
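
A rough sketch of that idea, assuming the events have already been captured as plain strings by some recording sink (the string encoding and the class name below are hypothetical, not part of Doxia): validation then reduces to comparing a parser's trace against a reference trace and reporting where they first diverge.

```java
import java.util.Arrays;
import java.util.List;

class EventTraceDiff {
    // Returns the index of the first position where the two traces differ,
    // or -1 if they are identical. If one trace is a strict prefix of the
    // other, the length of the shorter trace is returned.
    static int firstDifference(List<String> expected, List<String> actual) {
        int common = Math.min(expected.size(), actual.size());
        for (int i = 0; i < common; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return i;
            }
        }
        return expected.size() == actual.size() ? -1 : common;
    }

    public static void main(String[] args) {
        // Reference trace: list item wrapped in a paragraph (APT behaviour).
        List<String> reference = Arrays.asList(
            "list", "listItem", "paragraph", "text:item",
            "paragraph_", "listItem_", "list_");
        // Candidate trace: no paragraph events (Confluence behaviour).
        List<String> candidate = Arrays.asList(
            "list", "listItem", "text:item", "listItem_", "list_");

        int at = firstDifference(reference, candidate);
        System.out.println(at < 0 ? "traces match" : "traces diverge at event " + at);
    }
}
```

With these two traces the comparison stops at index 2, where the reference has a paragraph event and the candidate already has the text.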

> For 2), we should probably have an abstract test case similar to  
> what is done in AbstractSinkTest for Sinks.
>
> For 1) I've checked and it seems TWiki also doesn't output paragraph()
> events for list items.
>
> So is the AptParser wrong?
>

No, I don't think so. The Confluence and TWiki parsers were done together,
but the APT parser came first and was originally written by the first
author of the framework. So I would say the APT parser is right.

> Thanks
> -Vincent
>

Thanks,

Jason

----------------------------------------------------------
Jason van Zyl
Founder,  Apache Maven
jason at sonatype dot com
----------------------------------------------------------

Re: List parsing with Confluence parser wrong?

Posted by Lukas Theussl <lt...@apache.org>.


Jason van Zyl wrote:
> 
> On 25 Oct 07, at 3:21 AM, Lukas Theussl wrote:
> 
>>
>>
>> Vincent Massol wrote:
>>
>>> Hi,
>>> The APT and Confluence parsers behave differently when parsing lists.
>>> The APT parser generates paragraph()/paragraph_() events for each   
>>> list item whereas the Confluence parser doesn't.
>>> So my questions are:
>>> 1) Who's right? This is very important since a Sink will output   
>>> different results if the parsers behave differently
>>
>>
>> I think in this case the AptParser is wrong. I have recently  modified 
>> the xhtml sink [1] which, before, didn't emit paragraphs  within list 
>> items. I don't see a reason for that since paragraphs  are legal and 
>> significant in list items (ie <li>item</li> is  different from 
>> <li><p>item</p></li> and both are legal and  meaningful). However, the 
>> AptParser behavior remains to be  corrected, it's one of the few 
>> reasons why the apt module currently  doesn't pass the identity test 
>> (see DOXIA-134).
>>
> 
> Unless someone changed the parser, that is the canonical parser as it
> landed here.

By 'landed here', you mean: as it came from aptconvert? If yes, then
nothing has changed yet, an apt list item

       * item

always emits a paragraph (in html: <li><p>item</p></li>). If this is the
desired behavior then fine, but it means that you cannot mark up list
items without paragraphs in apt (i.e. you cannot produce <li>item</li>).

If you want to be able to mark up <li>item</li> in apt, then the apt
parser behavior (and probably the apt format) has to be changed.
However, I don't know what would be the best way to distinguish the two
cases in apt.
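
For illustration, a toy renderer shows how the two event sequences end up as different markup. It is only a sketch: the string-encoded events and the class name are assumptions, text is not escaped, and it is not the Doxia xhtml sink.

```java
import java.util.Arrays;
import java.util.List;

class TinyXhtmlRenderer {
    // Maps each recorded event name to markup; "text:" events carry literal
    // content. No escaping is done, so this is for illustration only.
    static String render(List<String> events) {
        StringBuilder out = new StringBuilder();
        for (String e : events) {
            switch (e) {
                case "list":       out.append("<ul>");  break;
                case "list_":      out.append("</ul>"); break;
                case "listItem":   out.append("<li>");  break;
                case "listItem_":  out.append("</li>"); break;
                case "paragraph":  out.append("<p>");   break;
                case "paragraph_": out.append("</p>");  break;
                default:
                    if (e.startsWith("text:")) {
                        out.append(e.substring("text:".length()));
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // With paragraph events, as the apt parser emits them today.
        System.out.println(render(Arrays.asList(
            "list", "listItem", "paragraph", "text:item",
            "paragraph_", "listItem_", "list_")));
        // Without paragraph events, which apt currently cannot express.
        System.out.println(render(Arrays.asList(
            "list", "listItem", "text:item", "listItem_", "list_")));
    }
}
```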

> 
>>> 2) How do we ensure parsers are correct in the events they send?
>>
>>
>> See related DOXIA-132. We don't have a mechanism yet to test  parsing 
>> events and since doxia is only about events (no object  model), I 
>> don't quite see how this can be done in general. In  practice, I think 
>> the standard is set by the AptParser, and the  model emitted by the 
>> SinkTestDocument, all parsers should try to be  consistent with that.
>>
>>> For 2), we should probably have an abstract test case similar to  
>>> what  is done in AbstractSinkTest for Sinks.
>>
>>
>> There is already an AbstractParserTest, it currently only does a  
>> simple check with the WellformednessCheckingSink, but it should be  
>> extended.
>>
> 
> There is a structure sink, no? You should be able to use that to parse into
> a model and verify.

Don't know what you mean here, StructureSink.java is not really a sink, 
it only contains two static utility methods...

> 
>> HTH,
>> -Lukas
>>
>> [1] https://svn.apache.org/viewvc?view=rev&revision=583579
>>
>>> For 1) I've checked and it seems TWiki also doesn't output paragraph()
>>> events for list items.
>>> So is the AptParser wrong?
>>> Thanks
>>> -Vincent
>>
>>
> 
> Thanks,
> 
> Jason

Re: List parsing with Confluence parser wrong?

Posted by Jason van Zyl <ja...@maven.org>.

On 25 Oct 07, at 3:21 AM, Lukas Theussl wrote:

>
>
> Vincent Massol wrote:
>> Hi,
>> The APT and Confluence parsers behave differently when parsing lists.
>> The APT parser generates paragraph()/paragraph_() events for each   
>> list item whereas the Confluence parser doesn't.
>> So my questions are:
>> 1) Who's right? This is very important since a Sink will output   
>> different results if the parsers behave differently
>
> I think in this case the AptParser is wrong. I have recently  
> modified the xhtml sink [1] which, before, didn't emit paragraphs  
> within list items. I don't see a reason for that since paragraphs  
> are legal and significant in list items (ie <li>item</li> is  
> different from <li><p>item</p></li> and both are legal and  
> meaningful). However, the AptParser behavior remains to be  
> corrected, it's one of the few reasons why the apt module currently  
> doesn't pass the identity test (see DOXIA-134).
>

Unless someone changed the parser, that is the canonical parser as it
landed here.

>> 2) How do we ensure parsers are correct in the events they send?
>
> See related DOXIA-132. We don't have a mechanism yet to test  
> parsing events and since doxia is only about events (no object  
> model), I don't quite see how this can be done in general. In  
> practice, I think the standard is set by the AptParser, and the  
> model emitted by the SinkTestDocument, all parsers should try to be  
> consistent with that.
>
>> For 2), we should probably have an abstract test case similar to  
>> what  is done in AbstractSinkTest for Sinks.
>
> There is already an AbstractParserTest, it currently only does a  
> simple check with the WellformednessCheckingSink, but it should be  
> extended.
>

There is a structure sink, no? You should be able to use that to parse
into a model and verify.

> HTH,
> -Lukas
>
> [1] https://svn.apache.org/viewvc?view=rev&revision=583579
>
>> For 1) I've checked and it seems TWiki also doesn't output
>> paragraph() events for list items.
>> So is the AptParser wrong?
>> Thanks
>> -Vincent
>

Thanks,

Jason

Re: List parsing with Confluence parser wrong?

Posted by Lukas Theussl <lt...@apache.org>.

Vincent Massol wrote:
> Hi,
> 
> The APT and Confluence parsers behave differently when parsing lists.
> 
> The APT parser generates paragraph()/paragraph_() events for each  list 
> item whereas the Confluence parser doesn't.
> 
> So my questions are:
> 
> 1) Who's right? This is very important since a Sink will output  
> different results if the parsers behave differently

I think in this case the AptParser is wrong. I have recently modified 
the xhtml sink [1] which, before, didn't emit paragraphs within list 
items. I don't see a reason for that since paragraphs are legal and 
significant in list items (ie <li>item</li> is different from 
<li><p>item</p></li> and both are legal and meaningful). However, the 
AptParser behavior remains to be corrected, it's one of the few reasons 
why the apt module currently doesn't pass the identity test (see DOXIA-134).

> 2) How do we ensure parsers are correct in the events they send?

See related DOXIA-132. We don't have a mechanism yet to test parsing 
events and since doxia is only about events (no object model), I don't 
quite see how this can be done in general. In practice, I think the 
standard is set by the AptParser, and the model emitted by the 
SinkTestDocument, all parsers should try to be consistent with that.

> 
> For 2), we should probably have an abstract test case similar to what  
> is done in AbstractSinkTest for Sinks.

There is already an AbstractParserTest, it currently only does a simple 
check with the WellformednessCheckingSink, but it should be extended.
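
To show what such a well-formedness check covers, and what it misses, here is a minimal stack-based sketch over string-encoded events. It is my own illustration under the assumption that a closing event is the opening name plus a trailing underscore; it is not the actual WellformednessCheckingSink code.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class NestingCheck {
    // Returns true if every closing event "x_" matches the most recently
    // opened, still unclosed event "x", and nothing is left open at the end.
    static boolean isWellFormed(List<String> events) {
        Deque<String> open = new ArrayDeque<>();
        for (String e : events) {
            if (e.startsWith("text:")) {
                continue; // content events neither open nor close anything
            }
            if (e.endsWith("_")) {
                String name = e.substring(0, e.length() - 1);
                if (open.isEmpty() || !open.pop().equals(name)) {
                    return false; // close without a matching open
                }
            } else {
                open.push(e);
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        List<String> withParagraph = Arrays.asList(
            "list", "listItem", "paragraph", "text:item",
            "paragraph_", "listItem_", "list_");
        List<String> withoutParagraph = Arrays.asList(
            "list", "listItem", "text:item", "listItem_", "list_");
        System.out.println(isWellFormed(withParagraph));
        System.out.println(isWellFormed(withoutParagraph));
    }
}
```

Both list traces pass this check, so a nesting test alone cannot tell the two parser behaviours apart. Catching the paragraph discrepancy needs a comparison against an expected event sequence.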

HTH,
-Lukas

[1] https://svn.apache.org/viewvc?view=rev&revision=583579

> 
> For 1) I've checked and it seems TWiki also doesn't output paragraph()
> events for list items.
> 
> So is the AptParser wrong?
> 
> Thanks
> -Vincent
> 
>