Posted to j-dev@xerces.apache.org by Rick Reumann <ri...@coxtarget.com> on 2001/01/12 14:46:19 UTC

Re: ampersand problem still, still

> Am I missing something obvious, or wouldn't &amp; do the job?
> 
> Ian

Actually, I tried that first, then I tried both &#x26; and &#038; as
others have suggested. I also haven't had much luck using CDATA
sections to escape it: when I do this I don't get an error, but the
parsing then breaks at the CDATA section, which is just as bad. To
recap, I'm trying to parse an XML doc using SAX2, and when I get to
an ampersand in the doc it causes an error. Any help or direction on
where to go from here would be much appreciated.
Thanks,
Rick
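
For reference, the well-formedness side of this can be checked in
isolation. The following is a minimal sketch, assuming the JAXP SAX
parser bundled with a current JDK rather than Xerces 1.2.3; the class
name AmpersandCheck and the helper isWellFormed are made up for
illustration. It only reports whether a snippet parses, so it says
nothing about how the text is later delivered to a handler.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Xerces or of the original post.
class AmpersandCheck {

    // Returns true if the parser accepts the text as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            // DefaultHandler rethrows fatal errors as SAXParseException.
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A bare ampersand is rejected; the escaped forms are accepted.
        System.out.println(isWellFormed("<bullet>Elrick & Lavidge</bullet>"));      // false
        System.out.println(isWellFormed("<bullet>Elrick &amp; Lavidge</bullet>"));  // true
        System.out.println(isWellFormed("<bullet>Elrick &#x26; Lavidge</bullet>")); // true
        System.out.println(isWellFormed("<bullet>Elrick &#038; Lavidge</bullet>")); // true
    }
}
```

If the escaped forms parse without an exception, as they do here, any
remaining trouble is more likely in the handler than in the document.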

> On Thu, 11 Jan 2001, Rick Reumann wrote:
> 
> > Hi,
> > I've searched the archives for information related to problems when
> > trying to parse a document that contains ampersands and the threads
> > seem to stop around July but with no solutions. I've just installed
> > version 1.2.3 of xerces.jar hoping this would help but I'm still
> > running into the same problem: when using SAX2/Xerces any ampersand
> > in a document that I try to parse causes the error:
> > org.xml.sax.SAXParseException: The entity name must immediately
> > follow the '&' in the entity reference. I've tried replacing the "&"
> > with various substitutions (such as &#x26;) but still no luck. Any
> > suggestions/help?
> 
> Am I missing something obvious, or wouldn't &amp; do the job?
> 
> Ian
> 
> -- 
> Ian Roberts                     | irr@decisionsoft.com
> DecisionSoft Ltd.               | http://www.decisionsoft.com/
>

Re: ampersand problem still, still

Posted by Rick Reumann <ri...@coxtarget.com>.

Yes, when I change it to &amp; the parser splits the line there into
two pieces. Although I don't get an error when I use &amp;, it still
produces less than desirable results.
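
The splitting described here is consistent with how SAX delivers
text: a parser may report one element's character data through
several characters() calls, and an entity reference such as &amp; is
a common place for a chunk boundary. A handler that stores only the
most recent chunk would then lose part of the text. Below is a
minimal sketch of accumulating the chunks instead, assuming the JAXP
parser bundled with a current JDK rather than Xerces 1.2.3. The class
BulletCollector is hypothetical and stands in for the bean-based
handler; it is not the original code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of each <bullet> element.
class BulletCollector extends DefaultHandler {
    private final List<String> bullets = new ArrayList<>();
    private StringBuilder text; // non-null only while inside a <bullet>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("bullet")) {
            text = new StringBuilder(); // start a fresh buffer per element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length); // append, never overwrite
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("bullet")) {
            bullets.add(text.toString()); // the element is complete only here
            text = null;
        }
    }

    // Parses the XML text and returns one string per <bullet>.
    static List<String> parse(String xml) {
        BulletCollector handler = new BulletCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler.bullets;
    }

    public static void main(String[] args) {
        System.out.println(parse(
            "<marketingBullets>"
            + "<bullet>*Source: 1998 Elrick &amp; Lavidge</bullet>"
            + "</marketingBullets>"));
    }
}
```

The value is handed over in endElement rather than in characters, so
it does not matter how many pieces the parser chooses to deliver. The
same approach should also cover text that is split around a CDATA
section.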


On 12 Jan 2001, at 11:44, David Waite wrote:

> Rick Reumann wrote:
> 
> > 
> > <bullet>Val-Pak supports mailings with media campaigns on national
> > TV and consumer publications.*Source: 1998 Elrick & Lavidge</bullet>
> 
> You mean "Elrick &amp; Lavidge" ?
> 
> -David Waite
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org For
> additional commands, e-mail: xerces-j-dev-help@xml.apache.org
>

Re: ampersand problem still, still

Posted by David Waite <dw...@jabber.com>.

Rick Reumann wrote:

> 
> <bullet>Val-Pak supports mailings with media campaigns on national TV and consumer publications.*Source: 1998 Elrick & Lavidge</bullet>

You mean "Elrick &amp; Lavidge" ?

-David Waite

Re: ampersand problem still, still

Posted by Rick Reumann <ri...@coxtarget.com>.


On 12 Jan 2001, at 8:13, Luke Blanshard wrote:

> Can you please post a sample document that demonstrates this error
> when you run against the SAX2Count sample program?  The following
> sample file is parsed fine by SAX2Count in Xerces version 1.2.1:
> 

Thanks for looking into this for me, I really appreciate it.
The actual doc looks like this:

<marketingBullets>
<bullet>The Val-Pak exclusive network of Neighborhood Trade Areas reach your best prospective customers without wasting advertising dollars. *Source: 1998 Elrick & Lavidge</bullet>
<bullet>Val-Pak mails over 15 billion ads annually in over 500 million familiar blue envelops to over 50 million unduplicated addresses.</bullet>
<bullet>Val-Pak supports mailings with media campaigns on national TV and consumer publications.*Source: 1998 Elrick & Lavidge</bullet>
<bullet>For over 32 years, Val-Pak mailings have been on-time and cost effective.</bullet>
</marketingBullets>

The parser class that I modified is below. Possibly something in here
is causing the problem. This class is supposed to put each row that
starts with <bullet> and ends with </bullet> into a bean and then
store that bean in a collection. It works fine if I don't have the
ampersands. I'm new to all this, so pardon me if I'm making some
really stupid newbie mistake. If someone would rather, I could send
the actual files.
 

import java.io.FileReader;
import org.xml.sax.XMLReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;
import java.util.*;
import FHPMarketingBulletsBean;
import Content;

public class FHPMarketingBulletsParser extends DefaultHandler
{
	
	private Collection col = new ArrayList();
	private FHPMarketingBulletsBean bulletsBean = null;
	private String currentElement = null;
	private String filename = null;
	 
	public FHPMarketingBulletsParser()
	{
		 super();
	}
	
	public Collection getParsedFile(String contentName) throws Exception
	{
		 
		Content cont = new Content( contentName );
		filename = cont.getFilename();
 		XMLReader xr = new org.apache.xerces.parsers.SAXParser();
 		FHPMarketingBulletsParser handler = new FHPMarketingBulletsParser();
 		xr.setContentHandler(handler);
  		xr.setErrorHandler(handler);
  		FileReader r = new FileReader(filename);
 		xr.parse(new InputSource(r));
 		return ( handler.col );
	} 

	public void startDocument ()
	{
 		bulletsBean = new FHPMarketingBulletsBean();
 	}
 
	public void endDocument ()
	{
		//endDocument
	}
 
	public void startElement (String uri, String name, String qName, Attributes atts)
	{
		currentElement = name;
 		if ( name.equals("bullet") )
		{
 			bulletsBean = new FHPMarketingBulletsBean();
 		}
	 
	}
 
	public void endElement (String uri, String name, String qName)
	{
 		if ( name.equals("bullet"))
		{
			col.add( bulletsBean );
		}
	}
 
	public void characters (char ch[], int start, int length)
	{
 		StringBuffer elementValue = new StringBuffer(length);
		for (int i = start; i < start + length; i++) 
		{
			switch (ch[i]) 
			{
				 
				case '\\':
				elementValue.append(ch[i]);
				break;
				
				case '"':
				elementValue.append(ch[i]);
				break;
				
				case '\n':
				//elementValue.append(ch[i]);
				break;
				
				case '\r':
				//elementValue.append(ch[i]);
				break;
				
				case '\t':
				//elementValue.append(ch[i]);
				break;
				
				default:
				elementValue.append(ch[i]);
				break;
			}
			
		}
		String temp = null;
		if ( elementValue.length() > 0 && !( elementValue.toString() ).equals(" ") )
		{
			temp = elementValue.toString();
		}
 		bulletsBean.setElementValue( currentElement, temp );
 	}
}


> <test>
>     Here&apos;s some sample text with ampersands &amp; other character
>     entity refs embedded. <test attr='&amp; here&apos;s an attribute
>     value with the same'/>
> </test>
> 
> Luke
> 

Re: ampersand problem still, still

Posted by Luke Blanshard <lu...@quiq.com>.

Can you please post a sample document that demonstrates this error when you
run against the SAX2Count sample program?  The following sample file is
parsed fine by SAX2Count in Xerces version 1.2.1:

<test>
    Here&apos;s some sample text with ampersands &amp; other character
    entity refs embedded.
    <test attr='&amp; here&apos;s an attribute value with the same'/>
</test>
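
A check of this kind can be approximated without the Xerces samples.
The sketch below is a rough stand-in only, not the SAX2Count program
itself: it assumes the JAXP parser bundled with a current JDK, and
the class CountingHandler is made up for illustration. It parses the
sample above and tallies elements, attributes and characters.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical counting handler, loosely modelled on a "count" sample.
class CountingHandler extends DefaultHandler {
    int elements;
    int attributes;
    int characters;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
        attributes += atts.getLength();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        characters += length; // total over all chunks, however they are split
    }

    // Parses the XML text and returns the handler holding the tallies.
    static CountingHandler count(String xml) {
        CountingHandler handler = new CountingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return handler;
    }

    public static void main(String[] args) {
        String sample =
            "<test>\n"
            + "    Here&apos;s some sample text with ampersands &amp; other character\n"
            + "    entity refs embedded.\n"
            + "    <test attr='&amp; here&apos;s an attribute value with the same'/>\n"
            + "</test>";
        CountingHandler c = count(sample);
        System.out.println(c.elements + " elems, " + c.attributes + " attrs, "
            + c.characters + " chars");
    }
}
```");
if (c.elements != 2) throw new AssertionError("expected 2 elements, got " + c.elements);
if (c.attributes != 1) throw new AssertionError("expected 1 attribute, got " + c.attributes);
if (c.characters <= 0) throw new AssertionError("expected some character data");
</test>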

Luke
