Posted to j-dev@xerces.apache.org by Rick Reumann <ri...@coxtarget.com> on 2001/01/11 19:13:26 UTC

ampersand problem still?

Hi,
I've searched the archives for information related to problems when 
trying to parse a document that contains ampersands, and the threads 
seem to stop around July with no solutions. I've just installed 
version 1.2.3 of xerces.jar hoping this would help, but I'm still 
running into the same problem: when using SAX2/Xerces, any ampersand 
in a document that I try to parse causes the error:
org.xml.sax.SAXParseException: The entity name must immediately follow the '&' in the entity reference.
I've tried replacing the "&" with various substitutions (such as 
&#x26;) but still no luck.
Any suggestions/help?
Thanks,
Rick
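
For reference, the error quoted above is what a conforming parser reports for a bare '&' in character data: XML requires it to be written as &amp; or as a character reference such as &#x26;. Below is a minimal sketch of that check. It assumes the JDK's built-in JAXP parser rather than the Xerces 1.2.3 jar discussed here, and the names AmpersandCheck and parses are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class AmpersandCheck {
    // Returns true if the XML string parses as well-formed, false if the
    // parser reports a SAXParseException (DefaultHandler rethrows fatal errors).
    public static boolean parses(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Bare ampersand: not well-formed.
        System.out.println(parses("<bullet>Elrick & Lavidge</bullet>"));
        // Predefined entity reference: well-formed.
        System.out.println(parses("<bullet>Elrick &amp; Lavidge</bullet>"));
        // Hexadecimal character reference: well-formed.
        System.out.println(parses("<bullet>Elrick &#x26; Lavidge</bullet>"));
    }
}
```

If an escaped form still fails in practice, it may be worth checking whether another unescaped '&' remains elsewhere in the same file.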

Re: ampersand problem still, still

Posted by Rick Reumann <ri...@coxtarget.com>.
Yes, when I change it to &amp; the parser splits the line there into 
two pieces. Although I don't get an error when I use &amp;, it still 
produces less than desirable results.
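
One possible explanation for the "two pieces" symptom, offered as a sketch rather than a confirmed diagnosis: SAX allows a parser to deliver an element's text through several characters() calls, and a parser may start a new call at an entity reference such as &amp;. A handler that stores the text from each call separately ends up keeping only the last chunk. The version below appends to a buffer and reads it in endElement instead. It assumes the JDK's built-in JAXP parser. BulletCollector and its parse method are hypothetical names, and a plain list of strings stands in for the bean class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BulletCollector extends DefaultHandler {
    private final List<String> bullets = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The default factory is not namespace-aware, so match on qName.
        if (qName.equals("bullet")) {
            text.setLength(0); // start a fresh buffer for each <bullet>
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element: append, never overwrite.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("bullet")) {
            bullets.add(text.toString()); // the text is complete only here
        }
    }

    // Parses the given XML string and returns the text of each <bullet>.
    public static List<String> parse(String xml) throws Exception {
        BulletCollector handler = new BulletCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.bullets;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<marketingBullets>"
            + "<bullet>*Source: 1998 Elrick &amp; Lavidge</bullet>"
            + "<bullet>On-time and cost effective.</bullet>"
            + "</marketingBullets>";
        for (String b : parse(xml)) {
            System.out.println(b);
        }
    }
}
```

The same accumulation would also cover text that is split around a CDATA section, since that text can arrive through separate characters() calls too.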


On 12 Jan 2001, at 11:44, David Waite wrote:

> Rick Reumann wrote:
> 
> > 
> > <bullet>Val-Pak supports mailings with media campaigns on national
> > TV and consumer publications.*Source: 1998 Elrick & Lavidge</bullet>
> 
> You mean "Elrick &amp; Lavidge" ?
> 
> -David Waite
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-dev-unsubscribe@xml.apache.org For
> additional commands, e-mail: xerces-j-dev-help@xml.apache.org
> 



Re: ampersand problem still, still

Posted by David Waite <dw...@jabber.com>.
Rick Reumann wrote:

> 
> <bullet>Val-Pak supports mailings with media campaigns on national TV and consumer publications.*Source: 1998 Elrick & Lavidge</bullet>

You mean "Elrick &amp; Lavidge" ?

-David Waite


Re: ampersand problem still, still

Posted by Rick Reumann <ri...@coxtarget.com>.

On 12 Jan 2001, at 8:13, Luke Blanshard wrote:

> Can you please post a sample document that demonstrates this error
> when you run against the SAX2Count sample program?  The following
> sample file is parsed fine by SAX2Count in Xerces version 1.2.1:
> 

Thanks for looking into this for me; I really appreciate it.
The actual doc looks like this:

<marketingBullets>
<bullet>The Val-Pak exclusive network of Neighborhood Trade Areas reach your best prospective customers without wasting advertising dollars. *Source: 1998 Elrick & Lavidge</bullet>
<bullet>Val-Pak mails over 15 billion ads annually in over 500 million familiar blue envelops to over 50 million unduplicated addresses.</bullet>
<bullet>Val-Pak supports mailings with media campaigns on national TV and consumer publications.*Source: 1998 Elrick & Lavidge</bullet>
<bullet>For over 32 years, Val-Pak mailings have been on-time and cost effective.</bullet>
</marketingBullets>

The parser class that I modified is below (possibly something in here 
could be causing the problem). This class is supposed to put each 
row that starts with <bullet> and ends with </bullet> into a bean and 
then store that bean in a collection. It works fine if I don't have the 
ampersands. I'm new to all this, so pardon me if I'm making some really 
stupid newbie mistake. If someone would rather, I could send the 
actual files.
 

import java.io.FileReader;
import org.xml.sax.XMLReader;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.helpers.DefaultHandler;
import java.util.*;
import FHPMarketingBulletsBean;
import Content;

public class FHPMarketingBulletsParser extends DefaultHandler
{
	
	private Collection col = new ArrayList();
	private FHPMarketingBulletsBean bulletsBean = null;
	private String currentElement = null;
	private String filename = null;
	 
	public FHPMarketingBulletsParser()
	{
		 super();
	}
	
	public Collection getParsedFile(String contentName) throws Exception
	{
		 
		Content cont = new Content( contentName );
		filename = cont.getFilename();
 		XMLReader xr = new org.apache.xerces.parsers.SAXParser();
 		FHPMarketingBulletsParser handler = new FHPMarketingBulletsParser();
 		xr.setContentHandler(handler);
  		xr.setErrorHandler(handler);
  		FileReader r = new FileReader(filename);
 		xr.parse(new InputSource(r));
 		return ( handler.col );
	} 

	public void startDocument ()
	{
 		bulletsBean = new FHPMarketingBulletsBean();
 	}
 
	public void endDocument ()
	{
		//endDocument
	}
 
	public void startElement (String uri, String name, String qName, Attributes atts)
	{
		currentElement = name;
 		if ( name.equals("bullet") )
		{
 			bulletsBean = new FHPMarketingBulletsBean();
 		}
	 
	}
 
	public void endElement (String uri, String name, String qName)
	{
 		if ( name.equals("bullet"))
		{
			col.add( bulletsBean );
		}
	}
 
	public void characters (char ch[], int start, int length)
	{
 		StringBuffer elementValue = new StringBuffer(length);
		for (int i = start; i < start + length; i++) 
		{
			switch (ch[i]) 
			{
				 
				case '\\':
				elementValue.append(ch[i]);
				break;
				
				case '"':
				elementValue.append(ch[i]);
				break;
				
				case '\n':
				//elementValue.append(ch[i]);
				break;
				
				case '\r':
				//elementValue.append(ch[i]);
				break;
				
				case '\t':
				//elementValue.append(ch[i]);
				break;
				
				default:
				elementValue.append(ch[i]);
				break;
			}
			
		}
		String temp = null;
		if ( elementValue.length() > 0 && !( elementValue.toString() ).equals(" ") )
		{
			temp = elementValue.toString();
		}
 		bulletsBean.setElementValue( currentElement, temp );
 	}
}


> <test>
>     Here&apos;s some sample text with ampersands &amp; other character
>     entity refs embedded. <test attr='&amp; here&apos;s an attribute
>     value with the same'/>
> </test>
> 
> Luke



Re: ampersand problem still, still

Posted by Luke Blanshard <lu...@quiq.com>.
Can you please post a sample document that demonstrates this error when you
run against the SAX2Count sample program?  The following sample file is
parsed fine by SAX2Count in Xerces version 1.2.1:

<test>
    Here&apos;s some sample text with ampersands &amp; other character
    entity refs embedded.
    <test attr='&amp; here&apos;s an attribute value with the same'/>
</test>

Luke



Re: ampersand problem still, still

Posted by Rick Reumann <ri...@coxtarget.com>.
> Am I missing something obvious, or wouldn't &amp; do the job?
> 
> Ian

Actually I tried that first, then I tried both &#x26; and &#038; as 
others have suggested. I also haven't had much luck using CDATA 
sections to escape it: when I do this I don't get an error, 
but the parsing then breaks at the CDATA section, which is just as 
bad. To recap, I'm trying to parse an XML doc using SAX2, and when I 
get to an ampersand in the doc it causes an error. Any help or 
direction on where to go from here would be much appreciated. 
Thanks,
Rick

 
> On Thu, 11 Jan 2001, Rick Reumann wrote:
> 
> > Hi,
> > I've searched the archives for information related to problems when
> > trying to parse a document that contains ampersands and the threads
> > seem to stop around July but with no solutions. I've just installed
> > version 1.2.3 of xerces.jar hoping this would help but I'm still
> > running into the same problem: when using SAX2 /xerces any ampersand
> > in a document that I try to parse causes the error:
> > org.xml.sax.SAXParseException: The entity name must immediately
> > follow the '&' in the entity reference. I've tried replacing the "&"
> > with various substitutions (such as &#x26;) but still no luck.  Any
> > suggestions/help ?
> 
> Am I missing something obvious, or wouldn't &amp; do the job?
> 
> Ian
> 
> -- 
> Ian Roberts                     | irr@decisionsoft.com
> DecisionSoft Ltd.               | http://www.decisionsoft.com/
>