Posted to users@cocoon.apache.org by Rafael Alvarado <al...@Princeton.EDU> on 2004/03/09 22:47:44 UTC

My XSP is entifying my UTF

I have an XSP file that reads some UTF8 content from a MySQL database and
spits out an XML document.  The problem is that all the multi-byte UTF8
stuff gets split into 8-bit characters and then entified.  I can find no
place to control this.  It also occurs in the generation phase, since I used
a view to check it out at that point.  Do I need to write my own generator
to fix this?!?


Rafael C. Alvarado
Manager of Humanities Computing Research Applications
316 87 Prospect | Princeton University


-----Original Message-----
From: Mark Lundquist [mailto:ml@wrinkledog.com] 
Sent: Tuesday, March 09, 2004 4:42 PM
To: users@cocoon.apache.org
Subject: Re: Need help w/ <wd:action> (and flowscript)


On Mar 9, 2004, at 1:14 PM, I wrote:

> Today I'm trying to move my application to Cocoon 2.1.4, and I'm 
> trying out some of the new stuff.  In particular, I'm having trouble 
> getting <wd:action> to work.
>
> Problem #1:  Validation is being triggered when I click my action 
> widget.  I didn't think that was possible with an action widget, but 
> apparently I've found a way :-/, or maybe I just didn't understand 
> this part.
>
> Problem #2:  The action for my widget needs visibility to objects that 
> are local to the caller of showForm(), and I can't figure out how to 
> get the right visibility inside the <on-action> javascript snippet.
>
> I tried this flowscript:
>
> 	function blah() {
> 	.
> 	.
> 		var doSomething = function() {
> 				// whatever
> 			}
> 		form.form.setAttribute ("doSomething", doSomething);
> 		form.showForm()
> 	.
> 	.
> 	}
>
> ...and this in the form definition file:
>
> 	<wd:action id="doSomething-button" action-command="doSomething">
> 		<wd:label>Do something!</wd:label>
> 		<wd:on-action>
> 			<javascript>
> 				event.getSourceWidget().getForm().doSomething();
> 			</javascript>
> 		</wd:on-action>
> 	</wd:action>
>
> ...but Rhino says "doSomething is not a function" (actually, if I log 
> the value I see that it's undefined).

Aha, it worked when I wrote this in the <on-action> snippet:

	var doSomething = event.getSourceWidget().getForm().getAttribute("doSomething");
	doSomething();

...and once I got that straightened out, the spurious validation problem
took care of itself as well!
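
For anyone hitting the same "not a function" error, here is a minimal sketch
of why the first attempt fails and the second works. It assumes the form
keeps its attributes in a separate map rather than exposing them as
JavaScript properties. MockForm is a hypothetical stand-in written only for
this illustration (it is not the Woody Form class), and the snippet runs as
plain JavaScript outside Cocoon.

```javascript
// Hypothetical stand-in for the form object; NOT the real Woody API.
// Assumption: attributes live in their own map, separate from properties.
class MockForm {
  constructor() {
    this.attributes = new Map();
  }
  setAttribute(name, value) {
    this.attributes.set(name, value);
  }
  getAttribute(name) {
    return this.attributes.has(name) ? this.attributes.get(name) : null;
  }
}

function runDemo() {
  const calls = [];
  // A value local to the caller; the closure below can still see it.
  const localState = "visible only inside runDemo";
  const doSomething = function () {
    calls.push(localState);
  };

  const form = new MockForm();
  form.setAttribute("doSomething", doSomething);

  // First attempt: the attribute is not a property, so this is undefined
  // and calling form.doSomething() would throw "not a function".
  const asProperty = typeof form.doSomething;

  // Second attempt: fetch the function back out of the attribute map.
  const handler = form.getAttribute("doSomething");
  handler();

  return { asProperty, calls };
}

const result = runDemo();
console.log(result.asProperty, result.calls);
```

Because the stored value is a closure, it keeps access to the variables of
the function that created it, which is what gives the on-action snippet the
visibility asked about in Problem #2.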

I'd still be interested to know if I'm doing this in the "normative" 
way, or if there's some better way that I've missed...

Thanks,
Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org





Re: My XSP is entifying my UTF

Posted by Joerg Heinicke <jo...@gmx.de>.
On 09.03.2004 23:54, Rafael Alvarado wrote:

> Here is my situation.  I run an etext server with documents written in
> several languages.  In creating a search interface for a collection of
> Hebrew documents, for example, I want to pull a distinct list of words from
> a db and create a set of lists for users to search with.  The values have to
> be in unicode, since they will be sent back to the database as a query
> string.  I don't want to have to translate entities back and forth into UTF8
> -- I would rather work in UTF8 and forget entities forever.  

It seems we have to go into detail to find a possible solution to the
problem. I thought it was only a question of convenience when
viewing the HTML output source. Are the HTML pages containing the UTF-8
characters shown correctly? If so, it should be possible to get them
back as UTF-8 in Cocoon and store them in the database.

> By the way, I had a similar problem with the HTML generator that uses Jtidy
> -- is this, too, the fault of Xalan?

Hmm, difficult to say from here. What exactly is the problem? As I said,
I thought your problem was only about serializing UTF-8 characters, so
maybe ö is escaped to &#246;, but in the end (i.e. in the browser) they
are the same. But you seem to have something different in mind.
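
To make the two cases concrete, here is a small sketch in plain JavaScript
(Node.js, not Cocoon code). It assumes, purely for illustration, that the
"split" symptom comes from UTF-8 bytes being decoded as ISO-8859-1 somewhere
between the database and the generator; that is a guess at the cause, not a
diagnosis. The entify helper is hypothetical and only mimics what a
serializer does when the output encoding is limited to ASCII.

```javascript
// Hypothetical helper: replace each non-ASCII code point with a decimal
// character reference, roughly what an ASCII-only serializer would emit.
function entify(text) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += cp > 0x7f ? `&#${cp};` : ch;
  }
  return out;
}

const original = "\u00f6"; // the single character o-umlaut

// Harmless case: one character becomes one reference.
const harmless = entify(original);

// Assumed failure case: the two UTF-8 bytes of the character are decoded
// as if each byte were its own ISO-8859-1 character, then entified.
const utf8Bytes = Buffer.from(original, "utf8");
const misdecoded = utf8Bytes.toString("latin1");
const split = entify(misdecoded);

console.log(harmless); // one reference
console.log(split);    // two references, one per byte
```

In the first case a browser shows the right character again; in the second
it cannot, because the damage happened before serialization and the
references now name two unrelated characters.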

Joerg



RE: My XSP is entifying my UTF

Posted by Conal Tuohy <co...@paradise.net.nz>.
Rafael:

It's important to realise that character entities exist only in the
serialization of XML, and that inside a Cocoon pipeline the XML is not
serialized, so the entities do not exist. Inside Cocoon these "special"
characters are no different to any other characters: as far as Cocoon is
concerned every character is just a Java char. It's only when the characters
are serialized with a particular encoding that entity references are created
at all. And when a document containing these character entities is parsed,
the entities are converted back into Java characters. Therefore they only
appear where Cocoon interfaces with something else.

So problems could arise where the text is submitted to the database, or
where the text is serialised to a browser, or something like that. Though in
general you should not have to do any translation at all, since the xml
parser and serialiser should do it for you.
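
As a rough illustration of that round trip, here is a sketch in plain
JavaScript rather than Cocoon code. The two functions are hypothetical
stand-ins for a serialiser writing ASCII-only output and a parser reading it
back; they handle only numeric character references and ignore everything
else a real XML parser or serialiser does (escaping of & and <, named
entities, and so on).

```javascript
// Stand-in for a serialiser with an ASCII-only output encoding:
// every non-ASCII code point becomes a decimal character reference.
function serializeAsAscii(text) {
  let out = "";
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    out += cp > 0x7f ? `&#${cp};` : ch;
  }
  return out;
}

// Stand-in for a parser: decimal (&#246;) and hexadecimal (&#xf6;)
// references are turned back into the characters they name.
function parseCharacterReferences(markup) {
  return markup.replace(/&#(x[0-9a-fA-F]+|[0-9]+);/g, (match, ref) => {
    const cp = ref[0] === "x"
      ? parseInt(ref.slice(1), 16)
      : parseInt(ref, 10);
    return String.fromCodePoint(cp);
  });
}

// Text as it exists inside the pipeline: just characters.
const inMemory = "sch\u00f6n \u05e9\u05dc\u05d5\u05dd";

// References appear only at the boundary, when the text is written out...
const onTheWire = serializeAsAscii(inMemory);

// ...and disappear again as soon as it is parsed.
const reparsed = parseCharacterReferences(onTheWire);

console.log(onTheWire);
console.log(reparsed === inMemory);
```

The string that comes back is identical to the one that went out, which is
why no manual translation between references and UTF-8 should be needed.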

Where do the entities cause problems in your system?

Con





RE: My XSP is entifying my UTF

Posted by Rafael Alvarado <al...@Princeton.EDU>.
Here is my situation.  I run an etext server with documents written in
several languages.  In creating a search interface for a collection of
Hebrew documents, for example, I want to pull a distinct list of words from
a db and create a set of lists for users to search with.  The values have to
be in unicode, since they will be sent back to the database as a query
string.  I don't want to have to translate entities back and forth into UTF8
-- I would rather work in UTF8 and forget entities forever.  

By the way, I had a similar problem with the HTML generator that uses Jtidy
-- is this, too, the fault of Xalan?


Rafael C. Alvarado
Manager of Humanities Computing Research Applications
316 87 Prospect | Princeton University




Re: My XSP is entifying my UTF

Posted by Joerg Heinicke <jo...@gmx.de>.
On 09.03.2004 23:31, Rafael Alvarado wrote:

> OK, thanks for the clarification. So, then, if Xalan is the culprit, can it
> be replaced in Cocoon? My memory says no, but I'll have to take a look.  If
> it cannot be replaced, then I'll probably have to drop Cocoon! 

Much to my regret! Why are the entified characters so problematic?

The good news: Cocoon does not depend on Xalan, but only on a JAXP
compatible processor. So it can be replaced. I know a few people using
Saxon, for example. The bad news: if you use JDK 1.4, Xalan is delivered
with the JDK and it will be a bit more difficult to keep Cocoon from
using it (e.g. by means of the ParanoidCocoonServlet).

Joerg



RE: My XSP is entifying my UTF

Posted by Rafael Alvarado <al...@Princeton.EDU>.
OK, thanks for the clarification. So, then, if Xalan is the culprit, can it
be replaced in Cocoon? My memory says no, but I'll have to take a look.  If
it cannot be replaced, then I'll probably have to drop Cocoon! 


Rafael C. Alvarado
Manager of Humanities Computing Research Applications
316 87 Prospect | Princeton University




Re: My XSP is entifying my UTF

Posted by Joerg Heinicke <jo...@gmx.de>.
On 09.03.2004 23:14, Rafael Alvarado wrote:
> Do you mean that the problem is not in the XSP generator?  I have written a
> simple PHP application to grab and present the data with no problems, so I
> am totally baffled that an XML publication framework would not handle such a
> basic task.   

Hmm, what shall I say. An entified character is in the end the same as
a non-entified character, isn't it? So the spec makes no distinction
between them and leaves it up to the processor how to handle this when
serializing the XML. In our case (default installation) you must put the
blame on Xalan. With PHP it's probably Sablotron, isn't it?

Joerg



RE: My XSP is entifying my UTF

Posted by Rafael Alvarado <al...@Princeton.EDU>.
Do you mean that the problem is not in the XSP generator?  I have written a
simple PHP application to grab and present the data with no problems, so I
am totally baffled that an XML publication framework would not handle such a
basic task.   


Rafael C. Alvarado
Manager of Humanities Computing Research Applications
316 87 Prospect | Princeton University




Re: My XSP is entifying my UTF

Posted by Joerg Heinicke <jo...@gmx.de>.
On 09.03.2004 22:47, Rafael Alvarado wrote:

> I have an XSP file that reads some UTF8 content from a MySQL database and
> spits out an XML document.  The problem is that all the multi-byte UTF8
> stuff gets split into 8-bit characters and then entified.  I can find no
> place to control this.  It also occurs in the generation phase, since I used
> a view to check it out at that point.  Do I need to write my own generator
> to fix this?!?

AFAIK this is not controllable at all - maybe one processor or another
provides an option, but it is not in the standard. Raising this
issue on Mulberry's XSL list (after searching the archives, as this might
be a frequently asked question) might lead to more profound answers:
http://www.mulberrytech.com/xsl/xsl-list/.

Joerg
