Posted to users@cocoon.apache.org by philguillard <ph...@gmail.com> on 2007/01/14 20:45:28 UTC
Convert string from html to xhtml
Hi,
I'd like to convert HTML code to XHTML, but that code comes from a
database. I know how to use the HTML generator (and thus JTidy), but this
time it is not an HTML file that I download, and I can't call a generator
for each database field.
Does anybody have an idea? Is there a way to call JTidy from Java to
convert a string?
Phil
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org
Re: Convert string from html to xhtml
Posted by philguillard <ph...@gmail.com>.
Thanks a lot for this quick response.
Phil
Steven D. Majewski wrote:
>
> On Jan 14, 2007, at 2:45 PM, philguillard wrote:
>
>> Hi,
>>
>> I'd like to convert html code to xhtml, but that code is coming from
>> database, i know to use html generator -thus jtidy-, but this time it
>> is not a html file that i download, and i can't call a generator for
>> each database field.
>>
>> Anybody any idea? Is there a way to call jtidy to convert a string in
>> java?
>>
>
>
>
> See the jtidy howto docs:
> <http://jtidy.sourceforge.net/howto.html>
> and the API javadocs:
> <http://jtidy.sourceforge.net/apidocs/>
>
>
>
>
Re: Convert string from html to xhtml
Posted by "Steven D. Majewski" <sd...@virginia.edu>.
On Jan 14, 2007, at 2:45 PM, philguillard wrote:
> Hi,
>
> I'd like to convert html code to xhtml, but that code is coming
> from database, i know to use html generator -thus jtidy-, but this
> time it is not a html file that i download, and i can't call a
> generator for each database field.
>
> Anybody any idea? Is there a way to call jtidy to convert a string
> in java?
>
See the jtidy howto docs:
<http://jtidy.sourceforge.net/howto.html>
and the API javadocs:
<http://jtidy.sourceforge.net/apidocs/>
Re: Convert string from html to xhtml
Posted by Abbas Mousavi <ab...@yahoo.com>.
I had this problem too. This change in org.apache.cocoon.transformation.HTMLTransformer
solved it for me. The change is near line 173, in normalize(): pass an explicit charset to getBytes >>>
new ByteArrayInputStream(text.getBytes("UTF-8"));
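Why the explicit charset matters, as a minimal JDK-only sketch: String.getBytes() without an argument encodes with the platform default charset, so the bytes reaching the parser can differ from one machine or JVM setting to another, whereas naming UTF-8 pins them down. The class and method names below are made up for illustration and are not part of Cocoon or JTidy.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ExplicitCharsetDemo {

    /** Encodes text as UTF-8 regardless of the platform default charset. */
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Wraps the UTF-8 bytes in a stream, the form a byte-oriented parser expects. */
    static ByteArrayInputStream utf8Stream(String text) {
        return new ByteArrayInputStream(utf8Bytes(text));
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the e

        // This value depends on the JVM and platform configuration.
        System.out.println("platform default: " + Charset.defaultCharset());

        // The same string produces byte sequences of different length per charset.
        System.out.println("UTF-8 length:      " + utf8Bytes(text).length);
        System.out.println("ISO-8859-1 length: "
                + text.getBytes(StandardCharsets.ISO_8859_1).length);

        // The stream carries exactly the UTF-8 bytes.
        System.out.println("stream bytes:      " + utf8Stream(text).available());
    }
}
```

Whether the parser then reads those bytes as UTF-8 is a separate setting on the parser side, which this sketch does not cover.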
---------------------------------------------------------------------------------------
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.cocoon.transformation;
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.StringTokenizer;
import org.apache.avalon.framework.configuration.Configurable;
import org.apache.avalon.framework.configuration.Configuration;
import org.apache.avalon.framework.configuration.ConfigurationException;
import org.apache.avalon.framework.parameters.Parameters;
import org.apache.cocoon.ProcessingException;
import org.apache.cocoon.environment.SourceResolver;
import org.apache.cocoon.transformation.AbstractSAXTransformer;
import org.apache.cocoon.xml.XMLUtils;
import org.apache.cocoon.xml.IncludeXMLConsumer;
import org.apache.excalibur.source.Source;
import org.w3c.tidy.Tidy;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
/**
 * Converts (escaped) HTML snippets into JTidied HTML.
 * This transformer expects a list of elements, passed as comma separated
 * values of the "tags" parameter. It records the text enclosed in such
 * elements and passes it through JTidy to obtain valid XHTML.
 *
 * <p>TODO: Add namespace support.
 * <p><strong>WARNING:</strong> This transformer should be considered unstable.
 *
 * @author <a href="mailto:d.madama@pro-netics.com">Daniele Madama</a>
 * @author <a href="mailto:gianugo@apache.org">Gianugo Rabellino</a>
 *
 * @version CVS $Id: HTMLTransformer.java 433543 2006-08-22 06:22:54Z crossley $
 */
public class HTMLTransformer
    extends AbstractSAXTransformer
    implements Configurable {

    /**
     * Properties for Tidy format
     */
    private Properties properties;

    /**
     * Tags that must be normalized
     */
    private Map tags;
    /**
     * React on endElement calls that contain a tag to be
     * tidied and run Jtidy on it, otherwise passthru.
     *
     * @see org.xml.sax.ContentHandler#endElement(java.lang.String, java.lang.String, java.lang.String)
     */
    public void endElement(String uri, String name, String raw)
    throws SAXException {
        if (this.tags.containsKey(name)) {
            String toBeNormalized = this.endTextRecording();
            try {
                this.normalize(toBeNormalized);
            } catch (ProcessingException e) {
                // Propagate the failure instead of printing it and carrying on silently.
                throw new SAXException("Could not normalize content of <" + name + ">", e);
            }
        }
        super.endElement(uri, name, raw);
    }
    /**
     * Start buffering text if inside a tag to be normalized,
     * passthru otherwise.
     *
     * @see org.xml.sax.ContentHandler#startElement(java.lang.String, java.lang.String, java.lang.String, org.xml.sax.Attributes)
     */
    public void startElement(
        String uri,
        String name,
        String raw,
        Attributes attr)
    throws SAXException {
        super.startElement(uri, name, raw, attr);
        if (this.tags.containsKey(name)) {
            this.startTextRecording();
        }
    }
    /**
     * Configure this transformer, possibly passing to it
     * a jtidy configuration file location.
     */
    public void configure(Configuration config) throws ConfigurationException {
        super.configure(config);

        String configUrl = config.getChild("jtidy-config").getValue(null);
        if (configUrl != null) {
            org.apache.excalibur.source.SourceResolver resolver = null;
            Source configSource = null;
            try {
                resolver = (org.apache.excalibur.source.SourceResolver)
                    this.manager.lookup(org.apache.excalibur.source.SourceResolver.ROLE);
                configSource = resolver.resolveURI(configUrl);
                if (getLogger().isDebugEnabled()) {
                    getLogger().debug(
                        "Loading configuration from " + configSource.getURI());
                }
                this.properties = new Properties();
                this.properties.load(configSource.getInputStream());
            } catch (Exception e) {
                getLogger().warn("Cannot load configuration from " + configUrl);
                throw new ConfigurationException(
                    "Cannot load configuration from " + configUrl,
                    e);
            } finally {
                if (null != resolver) {
                    // Release the source while the resolver is still held,
                    // then hand the resolver back to the manager.
                    if (null != configSource) {
                        resolver.release(configSource);
                    }
                    this.manager.release(resolver);
                }
            }
        }
    }
    /**
     * The beef: run JTidy on the buffered text and stream
     * the result
     *
     * @param text the string to be tidied
     */
    private void normalize(String text) throws ProcessingException {
        try {
            // Setup an instance of Tidy.
            Tidy tidy = new Tidy();
            tidy.setXmlOut(true);

            if (this.properties == null) {
                tidy.setXHTML(true);
            } else {
                tidy.setConfigurationFromProps(this.properties);
            }

            //Set Jtidy warnings on-off
            tidy.setShowWarnings(getLogger().isWarnEnabled());
            //Set Jtidy final result summary on-off
            tidy.setQuiet(!getLogger().isInfoEnabled());
            //Set Jtidy infos to a String (will be logged) instead of System.out
            StringWriter stringWriter = new StringWriter();
            PrintWriter errorWriter = new PrintWriter(stringWriter);
            tidy.setErrout(errorWriter);

            // Extract the document using JTidy and stream it.
            ByteArrayInputStream bais =
                new ByteArrayInputStream(text.getBytes("UTF-8"));
            org.w3c.dom.Document doc =
                tidy.parseDOM(new BufferedInputStream(bais), null);

            // FIXME: Jtidy doesn't warn or strip duplicate attributes in same
            // tag; stripping.
            XMLUtils.stripDuplicateAttributes(doc, null);

            errorWriter.flush();
            errorWriter.close();
            if (getLogger().isWarnEnabled()) {
                getLogger().warn(stringWriter.toString());
            }

            IncludeXMLConsumer.includeNode(doc, this.contentHandler, this.lexicalHandler);
        } catch (Exception e) {
            throw new ProcessingException(
                "Exception in HTMLTransformer.normalize()",
                e);
        }
    }
    /**
     * Setup this component, passing the tag names to be tidied.
     */
    public void setup(
        SourceResolver resolver,
        Map objectModel,
        String src,
        Parameters par)
    throws ProcessingException, SAXException, IOException {
        super.setup(resolver, objectModel, src, par);
        String tagsParam = par.getParameter("tags", "");
        if (getLogger().isDebugEnabled()) {
            getLogger().debug("tags: " + tagsParam);
        }
        this.tags = new HashMap();
        StringTokenizer tokenizer = new StringTokenizer(tagsParam, ",");
        while (tokenizer.hasMoreTokens()) {
            String tok = tokenizer.nextToken().trim();
            this.tags.put(tok, tok);
        }
    }
}
-----------------------------------------------------------------------------------------
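Judging only from the configure() and setup() methods in the listing above, wiring this transformer into a sitemap might look roughly like the fragment below. The transformer name "htmltidy", the element names "body" and "summary", and the properties path are placeholders, not values from this thread; check them against your own Cocoon version before relying on them.

```xml
<!-- Sitemap sketch; names and paths are illustrative placeholders. -->
<map:components>
  <map:transformers>
    <map:transformer name="htmltidy"
                     src="org.apache.cocoon.transformation.HTMLTransformer">
      <!-- Optional: configure() reads this value as the location of a
           JTidy properties file and loads it with Properties.load(). -->
      <jtidy-config>context://WEB-INF/jtidy.properties</jtidy-config>
    </map:transformer>
  </map:transformers>
</map:components>

<!-- Inside a pipeline, after the generator that emits the database rows -->
<map:transform type="htmltidy">
  <!-- Comma-separated element names; setup() splits and trims them. -->
  <map:parameter name="tags" value="body,summary"/>
</map:transform>
```

When a properties file is given, normalize() applies it via setConfigurationFromProps() instead of calling setXHTML(true), so any JTidy options, including encoding-related ones if your JTidy version offers them, would have to be set in that file.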
Re: Convert string from html to xhtml
Posted by philguillard <ph...@gmail.com>.
I use the (JTidy) HTMLTransformer.
Before that I had XSP (XML file in UTF-8, database tables and JDBC
connection in UTF-8) --> HTML or XML serializer in UTF-8, and this was OK.
After inserting the HTMLTransformer I get character encoding troubles in
my language. Confusion between ISO-8859-1 and UTF-8?
I guess it is not possible to configure that in the HTML transformer?
Phil
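The symptom described here is typical of UTF-8 bytes being decoded as ISO-8859-1 somewhere along the way. A small JDK-only sketch of how that garbling arises follows; the class and method names are hypothetical, and it does not reproduce the Cocoon pipeline itself.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {

    /** Simulates a component that reads UTF-8 bytes as if they were ISO-8859-1. */
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Reverses the misreading above; only valid if no bytes were altered in between. */
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9t\u00e9"; // three characters, two of them accented

        // Each accented character occupies two bytes in UTF-8, so the
        // misread string shows two characters where there was one.
        String garbled = misreadUtf8AsLatin1(original);
        System.out.println(original.length() + " chars became " + garbled.length());

        // Plain ASCII is unaffected, which is why only some text looks wrong.
        System.out.println(misreadUtf8AsLatin1("abc").equals("abc"));

        System.out.println(repair(garbled).equals(original));
    }
}
```

If the output of a pipeline shows this two-characters-for-one pattern, the bytes were most likely produced as UTF-8 and read back with a single-byte charset at one of the steps.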
Torsten Curdt wrote:
>
> On 15.01.2007, at 07:38, Bertrand Delacretaz wrote:
>
>> On 1/14/07, philguillard <ph...@gmail.com> wrote:
>>
>>> ... Anybody any idea? Is there a way to call jtidy to convert a
>>> string in java?...
>>
>> You could also use the (Neko)HtmlTransformer, see the samples at
>> http://cocoon.zones.apache.org/demos/release/samples/blocks/html/welcome.
>> This allows you to cleanup HTML code contained in XML elements.
>
>
> If you are just after a library - I found tagsoup to do a very good job!
>
> http://mercury.ccil.org/~cowan/XML/tagsoup/
>
> cheers
> --
> Torsten
>
>
>
Re: Convert string from html to xhtml
Posted by Torsten Curdt <tc...@apache.org>.
On 15.01.2007, at 07:38, Bertrand Delacretaz wrote:
> On 1/14/07, philguillard <ph...@gmail.com> wrote:
>
>> ... Anybody any idea? Is there a way to call jtidy to convert a
>> string in java?...
>
> You could also use the (Neko)HtmlTransformer, see the samples at
> http://cocoon.zones.apache.org/demos/release/samples/blocks/html/welcome.
> This allows you to cleanup HTML code contained in XML elements.
If you are just after a library - I found tagsoup to do a very good job!
http://mercury.ccil.org/~cowan/XML/tagsoup/
cheers
--
Torsten
Re: Convert string from html to xhtml
Posted by Bertrand Delacretaz <bd...@apache.org>.
On 1/14/07, philguillard <ph...@gmail.com> wrote:
> ... Anybody any idea? Is there a way to call jtidy to convert a
> string in java?...
You could also use the (Neko)HtmlTransformer; see the samples at
http://cocoon.zones.apache.org/demos/release/samples/blocks/html/welcome.
This allows you to clean up HTML code contained in XML elements.
-Bertrand
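A rough, JDK-only sketch of the idea of picking out the text held inside selected XML elements, which is the step that comes before handing that text to an HTML cleaner. It uses plain SAX rather than Cocoon's transformer classes; TextRecordingDemo, Recorder and record are illustrative names, the cleaning step itself is left out, and nested watched elements are not handled.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class TextRecordingDemo {

    /** Collects the character data found inside elements whose name is in the tag set. */
    static final class Recorder extends DefaultHandler {
        private final Set<String> tags;
        private final List<String> recorded = new ArrayList<>();
        private StringBuilder buffer; // non-null only while inside a watched element

        Recorder(Set<String> tags) {
            this.tags = tags;
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (tags.contains(qName)) {
                buffer = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (buffer != null) {
                buffer.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (buffer != null && tags.contains(qName)) {
                recorded.add(buffer.toString());
                buffer = null;
            }
        }

        List<String> recorded() {
            return recorded;
        }
    }

    /** Parses the XML string and returns the text recorded for the given element names. */
    static List<String> record(String xml, String... tagNames) {
        Recorder recorder = new Recorder(new HashSet<>(Arrays.asList(tagNames)));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), recorder);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return recorder.recorded();
    }

    public static void main(String[] args) {
        // The escaped HTML inside <body> stands in for a database field.
        String xml = "<row><id>1</id>"
                + "<body>&lt;p&gt;Hello &lt;b&gt;world&lt;/p&gt;</body></row>";
        System.out.println(record(xml, "body"));
    }
}
```

Each recorded string is the unescaped, possibly malformed HTML; that string is what a cleaner such as JTidy, NekoHTML or TagSoup would then be asked to turn into well-formed markup.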