Posted to dev@cocoon.apache.org by Bruno Dumon <br...@outerthought.org> on 2003/10/29 11:40:09 UTC

HTML editor widget (was Re: [proposal] Doco)

On Tue, 2003-10-28 at 19:20, Stefano Mazzocchi wrote:
> 
> > [1] Spoiling Bruno's "lonesome hacking cowboy" thought train, I just 
> > want to confirm that he actually started working on this.
> 
> Yey!!!
> 
> > He's still in a grumpy "friggin' stupid and unstable web browsers and 
> > Javascript as a development hosting environment"

for the record: I've never said any of that.

>  mood, though, so 
> > please light a candle for him. ;-)
> 
> I can do more: I'm willing to help!! Bruno, ask me if you need anything 
> (even privately, if you think it's better)

I've only just started with some little javascript experiments, so it's
not like any code has been written yet.

Here are some first random thoughts:

* different users of the widget (like the doco project vs the project
where we need it) will likely require different subsets of HTML to be
used.

* support for both Mozilla and IE is important. Other browsers should
fall back to a textarea with raw HTML in it.

* the HTML produced by the editor should be cleaned (i.e. unsupported
tags & attributes removed) and normalized (formatted). The goal of this
is to deliver a nice XHTML-subset-doc for storage, and to show nice HTML
to people editing it manually. Hopefully this will also make it possible
to do meaningful text-based diffs.

My first thought was to do this cleanup stuff serverside (could be as
simple as an XSL, which would make it easily customisable too). However
it seems like you want to do all that on the client side?

* Currently in e.g. Linotype the source for the editor (thus of the
iframe) is fetched separately from the main page. This is harder to do
with cforms since then the pipeline from which the content is fetched
should also have access to the cforms Form which is stored somewhere in
a variable in a flowscript. For the cforms widget it would be easier I
think to embed the HTML directly in the page (e.g. as a Javascript
variable). This also makes it possible to assign the content either to
the html editor or the textarea depending on what the client supports.

* Automatic image upload: still need to think more about this. After
pressing the submit button (and afterwards possibly showing the form
again), the images will need to become available in the URL space. How
that's done will probably differ from application to application so we
could put that behaviour behind an interface.

* wiki syntax support: we have no need for this, so don't expect any
effort from me on that.
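
A minimal sketch of the "embed the HTML directly in the page as a
Javascript variable" idea from the list above. It is written in Python
purely for illustration (in cforms this would happen somewhere in the
pipeline), and the function name html_to_js_variable and the variable
name editorContent are made up for the example. The main point is that
the content has to be escaped so it cannot break out of the script
element.

```python
import json


def html_to_js_variable(name: str, html: str) -> str:
    """Return a <script> element that assigns `html` to a Javascript variable."""
    # json.dumps yields a double-quoted literal with quotes, backslashes and
    # newlines escaped, which is also a valid Javascript string literal.
    literal = json.dumps(html)
    # Escape markup-significant characters so that content such as
    # "</script>" cannot end the surrounding script element early.
    for char, escaped in (("<", "\\u003c"), (">", "\\u003e"), ("&", "\\u0026")):
        literal = literal.replace(char, escaped)
    return f'<script type="text/javascript">var {name} = {literal};</script>'


# Hypothetical editor content, including a nasty embedded script element.
content = '<p class="intro">Hi & bye</p><script>alert(1)</script>'
snippet = html_to_js_variable("editorContent", content)
print(snippet)
```

Client-side code could then assign the variable either to the editor
iframe or to the fallback textarea, depending on what the browser
supports.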

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org


Re: HTML editor widget (was Re: [proposal] Doco)

Posted by Bruno Dumon <br...@outerthought.org>.
On Mon, 2003-11-03 at 14:59, Marc van Kempen wrote:
> Bruno Dumon wrote:
> > On Sat, 2003-11-01 at 16:20, Marc van Kempen wrote:
> > 
> >>Bruno Dumon wrote:
> >>
> >>>On Fri, 2003-10-31 at 22:54, Marc van Kempen wrote:
> >>>
> >>>
> >>>>Bruno Dumon wrote:
> >>>
> >>><snip/>
> >>>
> >>>>>My first thought was to do this cleanup stuff serverside (could be as
> >>>>>simple as an XSL, which would make it easily customisable too). However
> >>>>>it seems like you want to do all that on the client side?
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>This won't work, you need valid xml to use xsl,
> >>>
> >>>
> >>>Ever heard of JTidy?
> >>>
> >>
> >>Yep, I'm not 100% sure if I tried JTidy (I'm pretty sure though), but I 
> >>did try the commandline program html tidy, and this crashed on some of 
> >>the html garbage that Word spits out. The last release of the program 
> >>was in 2000, so I didn't place a lot of trust in it.
> >>
> >>
> >>>>and the IE html in 
> >>>>particular can be very troublesome to fix.
> >>>
> >>><snip/>
> >>>
> >>>Thanks for sharing your experiences.
> >>>
> >>
> >>No problem, are you interested in my offer, or shouldn't I bother 
> >>discussing it (releasing my editor as open source)?
> > 
> > 
> > Ah sorry, didn't pay attention to that. Of course I'm interested, but on
> > the other hand I can't promise that we'll make use of it without ever
> > having seen the code.
> > 
> > What would be included? Only client side code or also the server-side
> > cleanup code?
> > 
> 
> Both if there is interest. It consists of the following components:
> 
> - editor.js + images
>    The implementation of the editor object and supporting images.
> - js-gui.js
>    Javascript gui components.
> 
> This will give you the editor,
> (I haven't worked out the image upload problem yet, for now you can 
> write a user-defined function and configure the editor object to use 
> that function. A default function is provided that will take a list of 
> images and let the user choose from it in a popup form.)
> 
> - HTMLFix.java
>    html clean up component, accepts a text stream, spits out a text
>    stream.
> 
> This component fixes the html, I also wrote a component that will remove 
> unwanted tags and attributes (actually it will only leave those tags and 
> attributes that you want, and remove all the rest). I'd have to cut that 
> out of an existing component.
> 
> Actually I just discussed it, and we can release the components as open 
> source

cool

>  (I believe the apache license is similar to the BSD license?)

yep

> 
> How shall I distribute it?

can you put it on an FTP server somewhere?

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org


Re: HTML editor widget (was Re: [proposal] Doco)

Posted by Marc van Kempen <ma...@bowtie.nl>.
Bruno Dumon wrote:
> On Sat, 2003-11-01 at 16:20, Marc van Kempen wrote:
> 
>>Bruno Dumon wrote:
>>
>>>On Fri, 2003-10-31 at 22:54, Marc van Kempen wrote:
>>>
>>>
>>>>Bruno Dumon wrote:
>>>
>>><snip/>
>>>
>>>>>My first thought was to do this cleanup stuff serverside (could be as
>>>>>simple as an XSL, which would make it easily customisable too). However
>>>>>it seems like you want to do all that on the client side?
>>>>>
>>>>>
>>>>>
>>>>
>>>>This won't work, you need valid xml to use xsl,
>>>
>>>
>>>Ever heard of JTidy?
>>>
>>
>>Yep, I'm not 100% sure if I tried JTidy (I'm pretty sure though), but I 
>>did try the commandline program html tidy, and this crashed on some of 
>>the html garbage that Word spits out. The last release of the program 
>>was in 2000, so I didn't place a lot of trust in it.
>>
>>
>>>>and the IE html in 
>>>>particular can be very troublesome to fix.
>>>
>>><snip/>
>>>
>>>Thanks for sharing your experiences.
>>>
>>
>>No problem, are you interested in my offer, or shouldn't I bother 
>>discussing it (releasing my editor as open source)?
> 
> 
> Ah sorry, didn't pay attention to that. Of course I'm interested, but on
> the other hand I can't promise that we'll make use of it without ever
> having seen the code.
> 
> What would be included? Only client side code or also the server-side
> cleanup code?
> 

Both if there is interest. It consists of the following components:

- editor.js + images
   The implementation of the editor object and supporting images.
- js-gui.js
   Javascript gui components.

This will give you the editor,
(I haven't worked out the image upload problem yet, for now you can 
write a user-defined function and configure the editor object to use 
that function. A default function is provided that will take a list of 
images and let the user choose from it in a popup form.)

- HTMLFix.java
   html clean up component, accepts a text stream, spits out a text
   stream.

This component fixes the html, I also wrote a component that will remove 
unwanted tags and attributes (actually it will only leave those tags and 
attributes that you want, and remove all the rest). I'd have to cut that 
out of an existing component.

Actually I just discussed it, and we can release the components as open 
source (I believe the apache license is similar to the BSD license?)

How shall I distribute it?

Regards,
Marc.


Re: HTML editor widget (was Re: [proposal] Doco)

Posted by Bruno Dumon <br...@outerthought.org>.
On Sat, 2003-11-01 at 16:20, Marc van Kempen wrote:
> Bruno Dumon wrote:
> > On Fri, 2003-10-31 at 22:54, Marc van Kempen wrote:
> > 
> >>Bruno Dumon wrote:
> > 
> > <snip/>
> > 
> >>>My first thought was to do this cleanup stuff serverside (could be as
> >>>simple as an XSL, which would make it easily customisable too). However
> >>>it seems like you want to do all that on the client side?
> >>>
> >>> 
> >>>
> >>
> >>This won't work, you need valid xml to use xsl,
> > 
> > 
> > Ever heard of JTidy?
> > 
> 
> Yep, I'm not 100% sure if I tried JTidy (I'm pretty sure though), but I 
> did try the commandline program html tidy, and this crashed on some of 
> the html garbage that Word spits out. The last release of the program 
> was in 2000, so I didn't place a lot of trust in it.
> 
> > 
> >> and the IE html in 
> >>particular can be very troublesome to fix.
> > 
> > <snip/>
> > 
> > Thanks for sharing your experiences.
> > 
> 
> No problem, are you interested in my offer, or shouldn't I bother 
> discussing it (releasing my editor as open source)?

Ah sorry, didn't pay attention to that. Of course I'm interested, but on
the other hand I can't promise that we'll make use of it without ever
having seen the code.

What would be included? Only client side code or also the server-side
cleanup code?

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org


Re: HTML editor widget (was Re: [proposal] Doco)

Posted by Marc van Kempen <ma...@bowtie.nl>.
Bruno Dumon wrote:
> On Fri, 2003-10-31 at 22:54, Marc van Kempen wrote:
> 
>>Bruno Dumon wrote:
> 
> <snip/>
> 
>>>My first thought was to do this cleanup stuff serverside (could be as
>>>simple as an XSL, which would make it easily customisable too). However
>>>it seems like you want to do all that on the client side?
>>>
>>> 
>>>
>>
>>This won't work, you need valid xml to use xsl,
> 
> 
> Ever heard of JTidy?
> 

Yep, I'm not 100% sure if I tried JTidy (I'm pretty sure though), but I 
did try the commandline program html tidy, and this crashed on some of 
the html garbage that Word spits out. The last release of the program 
was in 2000, so I didn't place a lot of trust in it.

> 
>> and the IE html in 
>>particular can be very troublesome to fix.
> 
> <snip/>
> 
> Thanks for sharing your experiences.
> 

No problem, are you interested in my offer, or shouldn't I bother 
discussing it (releasing my editor as open source)?

Regards,
Marc.


Re: HTML editor widget (was Re: [proposal] Doco)

Posted by Bruno Dumon <br...@outerthought.org>.
On Fri, 2003-10-31 at 22:54, Marc van Kempen wrote:
> Bruno Dumon wrote:
<snip/>
> >My first thought was to do this cleanup stuff serverside (could be as
> >simple as an XSL, which would make it easily customisable too). However
> >it seems like you want to do all that on the client side?
> >
> >  
> >
> This won't work, you need valid xml to use xsl,

Ever heard of JTidy?

>  and the IE html in 
> particular can be very troublesome to fix.
<snip/>

Thanks for sharing your experiences.

-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org


Re: HTML editor widget (was Re: [proposal] Doco)

Posted by Marc van Kempen <ma...@bowtie.nl>.
Bruno Dumon wrote:
...

>* different users of the widget (like the doco project vs the project
>where we need it) will likely require different subsets of HTML to be
>used.
>
>* support for both Mozilla and IE is important. Other browsers should
>fall back to a textarea with raw HTML in it.
>
>* the HTML produced by the editor should be cleaned (i.e. not supported
>tags & attributes removed) and normalized (formatted). The goal of this
>is to deliver a nice XHTML-subset-doc for storage, and to show nice HTML
>to people editing it manually. Hopefully this will also make it possible
>to do meaningful text-based diffs.
>
>  
>
I have done some work on this. I have first written a js html editor for 
IE (>5.5) to be used in an XML content management system. For this we 
needed to clean the html and convert it to xhtml in order to be able to 
process it with xslt upon displaying pages.

One approach that I've tried is to generate the xhtml from the browser 
dom page with javascript, i.e. walk the tree and recursively generate 
<TAG> ... </TAG> entries, while surrounding all attributes with quotes. 
This could then be postprocessed on the server by parsing it with an XML 
parser and manipulating the DOM tree. This however proved to be a slight 
nightmare due to js/dom bugs in IE 5.5. If you'd be willing to drop 5.5 
support it would be easier, but it might also be possible to do this 
using more IE-specific js constructions, with which I'm not particularly 
familiar.
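
The tree-walking approach described above can be sketched roughly as 
follows. This is only an illustration in Python, using the standard 
xml.dom.minidom as a stand-in for the browser DOM; it is not the 
javascript that was actually tried, the function name serialize is made 
up, and it conveniently avoids the IE 5.5 quirks that made the real 
thing painful.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr


def serialize(node) -> str:
    """Recursively serialize a DOM node as XHTML-style markup."""
    if node.nodeType == node.TEXT_NODE:
        return escape(node.data)
    if node.nodeType != node.ELEMENT_NODE:
        return ""  # comments, processing instructions etc. are dropped
    tag = node.tagName.lower()
    # Lowercase attribute names and always surround the values with quotes.
    attrs = "".join(
        f" {name.lower()}={quoteattr(value)}"
        for name, value in sorted(node.attributes.items())
    )
    children = "".join(serialize(child) for child in node.childNodes)
    if not children:
        return f"<{tag}{attrs} />"
    return f"<{tag}{attrs}>{children}</{tag}>"


# A small well-formed input standing in for the editor's DOM tree.
doc = minidom.parseString('<P ALIGN="left">Hello <B>world</B><BR/></P>')
result = serialize(doc.documentElement)
print(result)
```

Note that this sketch writes every childless element in the short 
"<tag />" form, which is fine for an XML parser on the server but may 
not be what you want to feed back to a browser.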

Eventually we ended up doing this completely server side: I wrote one 
component to fix the html into xhtml, and after that I use an XML parser 
to remove all unwanted attributes and tags.
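
As a rough illustration of that second step (keeping only the wanted 
tags and attributes), here is a minimal Python sketch. It is not the 
actual component: the ALLOWED table and the clean function are 
hypothetical, and it assumes the html has already been fixed into 
well-formed xhtml by the first step.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Hypothetical whitelist: tag name -> attribute names allowed on that tag.
ALLOWED = {"div": set(), "p": set(), "b": set(), "i": set(), "a": {"href"}}


def clean(elem: ET.Element) -> str:
    """Serialize `elem`, keeping only whitelisted tags and attributes.

    A tag that is not whitelisted is unwrapped: its content is kept,
    only the tag itself disappears. A real filter would probably want to
    drop some elements (script, style) together with their content.
    """
    inner = escape(elem.text or "") + "".join(
        clean(child) + escape(child.tail or "") for child in elem
    )
    if elem.tag not in ALLOWED:
        return inner
    attrs = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in elem.attrib.items()
        if name in ALLOWED[elem.tag]
    )
    return f"<{elem.tag}{attrs}>{inner}</{elem.tag}>"


# Word-flavoured input, already well-formed.
dirty = (
    '<div><p class="MsoNormal" style="margin:0">Hello '
    '<font face="Arial"><b>world</b></font> '
    '<a href="http://example.org/" target="_blank">link</a></p></div>'
)
cleaned = clean(ET.fromstring(dirty))
print(cleaned)
```

Keeping the whitelist as plain data means that different users of the 
editor can configure their own HTML subset without touching the code.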

The biggest problem while handling the html is that you also have to 
parse Word html that is pasted into the editor, and the html that Word 
produces is truly gruesome!

While the server side solution works well for all html garbage that I 
have encountered until now, it is not completely satisfactory: when you 
paste html into the editor you're looking at the unprocessed html, and 
once it has been processed by the server a lot will have been removed 
and it can look rather different. One could try to explain this to the 
user, but it's better to filter the html directly after pasting it, so 
the user will not get confused.

I'm now in the process of writing an editor component that can handle IE 
and Mozilla. It is in a working state, but the code needs to be cleaned 
up and some stuff still needs to be written (a table editor, a url 
editor, etc.). It is however for a closed source system. I could discuss 
it to see if we would be willing to release it as open source.

>My first thought was to do this cleanup stuff serverside (could be as
>simple as an XSL, which would make it easily customisable too). However
>it seems like you want to do all that on the client side?
>
>  
>
This won't work, you need valid xml to use xsl, and the IE html in 
particular can be very troublesome to fix.

>* Currently in e.g. Linotype the source for the editor (thus of the
>iframe) is fetched separately from the main page. This is harder to do
>with cforms since then the pipeline from which the content is fetched
>should also have access to the cforms Form which is stored somewhere in
>a variable in a flowscript. For the cforms widget it would be easier I
>think to embed the HTML directly in the page (e.g. as a Javascript
>variable). This also makes it possible to assign the content either to
>the html editor or the textarea depending on what the client supports.
>
>* Automatic image upload: still need to think more about this. After
>pressing the submit button (and afterwards possibly showing the form
>again), the images will need to become available in the URL space. How
>that's done will probably differ from application to application so we
>could put that behaviour behind an interface.
>
>  
>

This is an interesting problem, Stefano talked about embedding it into 
the document, how would you want to do this? That would be the best 
solution for an embeddable component!

>* wiki syntax support: we have no need for this, so don't expect any
>effort from me on that.
>
>  
>

Regards,
Marc.


Re: HTML editor widget (was Re: [proposal] Doco)

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Bruno Dumon wrote:
> I've only just started with some little javascript experiments, so it's
> not like any code has been written yet.

You can look at
  http://bitfluxeditor.org/
for a start.

> * support for both Mozilla and IE is important.
That's going to give one or two headaches more :-)

J.Pietschmann


Re: HTML editor widget (was Re: [proposal] Doco)

Posted by Bruno Dumon <br...@outerthought.org>.
On Thu, 2003-10-30 at 10:34, Stefano Mazzocchi wrote: 
> On Wednesday, Oct 29, 2003, at 11:40 Europe/Rome, Bruno Dumon wrote:
> 
> > I've only just started with some little javascript experiments, so it's
> > not like any code has been written yet.
> 
> ok, but it's great to see you doing this
> 
> > Here are some first random thoughts:
> >
> > * different users of the widget (like the doco project vs the project
> > where we need it) will likely require different subsets of HTML to be
> > used.
> 
> True, even if, for XHTML, you can support different modules. For 
> example, I didn't support tables in Linotype.
> 
> > * support for both Mozilla and IE is important. Other browsers should
> > fall back to a textarea with raw HTML in it.
> 
> yes
> 
> > * the HTML produced by the editor should be cleaned (i.e. not supported
> > tags & attributes removed) and normalized (formatted). The goal of this
> > is to deliver a nice XHTML-subset-doc for storage, and to show nice 
> > HTML
> > to people editing it manually. Hopefully this will also make it 
> > possible
> > to do meaningful text-based diffs.
> 
> Yep
> 
> > My first thought was to do this cleanup stuff serverside (could be as
> > simple as an XSL, which would make it easily customisable too). However
> > it seems like you want to do all that on the client side?
> 
> Linotype already includes a DOM serializer, I think it already does 
> some pretty formatting and already has the ability to distinguish 
> between whitespace-safe elements and non.
> 
> > * Currently in e.g. Linotype the source for the editor (thus of the
> > iframe) is fetched separately from the main page. This is harder to do
> > with cforms since then the pipeline from which the content is fetched
> > should also have access to the cforms Form which is stored somewhere in
> > a variable in a flowscript. For the cforms widget it would be easier I
> > think to embed the HTML directly in the page (e.g. as a Javascript
> > variable). This also makes it possible to assign the content either to
> > the html editor or the textarea depending on what the client supports.
> 
> I thought about that too: my solution would be to have woody draw the 
> widget as an empty <iframe> and then fill it up at page load time from 
> some client-side javascript.
> 
> In theory it's easy, in practice, I expect tons of bugs and 
> incompatibilities between browsers (but haven't tried yet)

I think it's feasible.

I found this thing called "htmlArea", which is some javascript that
exploits the html editors in both IE and Mozilla, and it does things
like that without trouble.

See here:
http://www.interactivetools.com/products/htmlarea/
and here for an online demo:
http://dynarch.com/mishoo/htmlarea.epl

I'm wondering if maybe we should start from that one (it's BSD-style
licensed).

> Another thing I wanted to try is to embed the icons right into the page 
> instead of having them fetched from outside, this makes it easier since 
> you don't have to mount your icons somewhere in your URI space.
> 
> > * Automatic image upload: still need to think more about this. After
> > pressing the submit button (and afterwards possibly showing the form
> > again), the images will need to become available in the URL space. How
> > that's done will probably differ from application to application so we
> > could put that behaviour behind an interface.
> 
> hmmm, what about giving the uploaded "Parts" back into the object 
> model that is accessible to the flowscript? The flow will handle them 
> and put them in the proper place... at this point, the flow will have 
> to be able to call a "link translation" on the page.

something like that, yes. I'll come back to this after I've been able to
put some thought to it.

> > * wiki syntax support: we have no need for this, so don't expect any
> > effort from me on that.
> 
> Fair enough, but please keep in mind that the editor will have "multi 
> views" and need to be defined in the description of the widget for that 
> particular form.

ok.


-- 
Bruno Dumon                             http://outerthought.org/
Outerthought - Open Source, Java & XML Competence Support Center
bruno@outerthought.org                          bruno@apache.org


Re: HTML editor widget (was Re: [proposal] Doco)

Posted by Stefano Mazzocchi <st...@apache.org>.
On Wednesday, Oct 29, 2003, at 11:40 Europe/Rome, Bruno Dumon wrote:

> I've only just started with some little javascript experiments, so it's
> not like any code has been written yet.

ok, but it's great to see you doing this

> Here are some first random thoughts:
>
> * different users of the widget (like the doco project vs the project
> where we need it) will likely require different subsets of HTML to be
> used.

True, even if, for XHTML, you can support different modules. For 
example, I didn't support tables in Linotype.

> * support for both Mozilla and IE is important. Other browsers should
> fall back to a textarea with raw HTML in it.

yes

> * the HTML produced by the editor should be cleaned (i.e. not supported
> tags & attributes removed) and normalized (formatted). The goal of this
> is to deliver a nice XHTML-subset-doc for storage, and to show nice 
> HTML
> to people editing it manually. Hopefully this will also make it 
> possible
> to do meaningful text-based diffs.

Yep

> My first thought was to do this cleanup stuff serverside (could be as
> simple as an XSL, which would make it easily customisable too). However
> it seems like you want to do all that on the client side?

Linotype already includes a DOM serializer, I think it already does 
some pretty formatting and already has the ability to distinguish 
between whitespace-safe elements and non.

> * Currently in e.g. Linotype the source for the editor (thus of the
> iframe) is fetched separately from the main page. This is harder to do
> with cforms since then the pipeline from which the content is fetched
> should also have access to the cforms Form which is stored somewhere in
> a variable in a flowscript. For the cforms widget it would be easier I
> think to embed the HTML directly in the page (e.g. as a Javascript
> variable). This also makes it possible to assign the content either to
> the html editor or the textarea depending on what the client supports.

I thought about that too: my solution would be to have woody draw the 
widget as an empty <iframe> and then fill it up at page load time from 
some client-side javascript.

In theory it's easy, in practice, I expect tons of bugs and 
incompatibilities between browsers (but haven't tried yet)

Another thing I wanted to try is to embed the icons right into the page 
instead of having them fetched from outside, this makes it easier since 
you don't have to mount your icons somewhere in your URI space.

> * Automatic image upload: still need to think more about this. After
> pressing the submit button (and afterwards possibly showing the form
> again), the images will need to become available in the URL space. How
> that's done will probably differ from application to application so we
> could put that behaviour behind an interface.

hmmm, what about giving the uploaded "Parts" back into the object 
model that is accessible to the flowscript? The flow will handle them 
and put them in the proper place... at this point, the flow will have 
to be able to call a "link translation" on the page.
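
To make the idea a bit more concrete, here is a very rough sketch of 
what "put the parts somewhere, then translate the links" could look 
like. It is in Python only for illustration, and everything in it is an 
assumption: the ImageStore interface, the InMemoryStore class, the 
translate_links function and the convention that an uploaded image is 
referenced in the page by its part name are all invented for the 
example, not existing Cocoon APIs.

```python
import re
from typing import Dict, Protocol


class ImageStore(Protocol):
    """Application-specific behaviour hidden behind an interface."""

    def store(self, filename: str, data: bytes) -> str:
        """Make the image available in the URL space and return its URL."""
        ...


class InMemoryStore:
    """Toy implementation; a real one might write to disk or a repository."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url
        self.files: Dict[str, bytes] = {}

    def store(self, filename: str, data: bytes) -> str:
        self.files[filename] = data
        return f"{self.base_url}/{filename}"


def translate_links(html: str, parts: Dict[str, bytes], store: ImageStore) -> str:
    """Store each referenced upload and rewrite its img src to the final URL."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(2)
        if name not in parts:
            return match.group(0)  # not an uploaded part, leave untouched
        return match.group(1) + store.store(name, parts[name]) + match.group(3)

    return re.sub(r'(<img\b[^>]*\bsrc=")([^"]*)(")', replace, html)


store = InMemoryStore("/images")
page = '<p><img src="photo.png" alt="x"/><img src="http://example.org/a.png"/></p>'
translated = translate_links(page, {"photo.png": b"\x89PNG"}, store)
print(translated)
```

The regular expression is only good enough for a sketch; on cleaned-up 
XHTML the same rewrite would be better done on the parsed document.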

> * wiki syntax support: we have no need for this, so don't expect any
> effort from me on that.

Fair enough, but please keep in mind that the editor will have "multi 
views" and those need to be defined in the description of the widget for 
that particular form.

--
Stefano.