Posted to j-users@xalan.apache.org by da...@egcrc.net on 2002/02/27 00:00:51 UTC

Re: Differences between explicit and implicit parsing for docs with DTDs?

Thanks for taking the time to look at my problem.
I am including the following below:
1. The TransformerException case, with backtrace.
2. The case when I explicitly parse using DocumentBuilder.
3. The raw XML file.

I have backtracked through NCBI's collection of DTDs and have found
that the TransformerException occurs while trying to process the following:

(from http://www.ncbi.nlm.nih.gov:80/entrez/query/DTD/nlmcommon_011101.dtd)

<!ATTLIST DataBankList
	CompleteYN (Y | N) "Y"
>

Could this fragment be causing the problem?
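
One way to see what a SAX parser reports for this declaration is a small DeclHandler probe. This is only a sketch: the class name AttlistProbe and the inline test document are made up for illustration and are not part of GetURL. The SAX2 DeclHandler documentation says the "mode" argument of attributeDecl is null when the declaration uses none of #IMPLIED, #REQUIRED, or #FIXED, which is the case for CompleteYN. Whether the serializer trips over that null is a guess based on the trace below, not something confirmed here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DeclHandler;

// Hypothetical helper: records the arguments a SAX parser passes to
// DeclHandler.attributeDecl for every <!ATTLIST ...> declaration it scans.
class AttlistProbe {

    // Each entry is { elementName, attributeName, type, mode, defaultValue }.
    static List<String[]> probe(String xml) throws Exception {
        final List<String[]> seen = new ArrayList<String[]>();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

        // Standard SAX2 property id for registering a DeclHandler.
        reader.setProperty("http://xml.org/sax/properties/declaration-handler",
            new DeclHandler() {
                public void attributeDecl(String eName, String aName,
                                          String type, String mode, String value) {
                    // mode may be null: SAX2 documents this for declarations
                    // without #IMPLIED, #REQUIRED, or #FIXED.
                    seen.add(new String[] { eName, aName, type, mode, value });
                }
                public void elementDecl(String name, String model) { }
                public void internalEntityDecl(String name, String value) { }
                public void externalEntityDecl(String name, String publicId,
                                               String systemId) { }
            });

        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // Assumed minimal document that embeds the same ATTLIST in an internal subset.
        String xml = "<!DOCTYPE DataBankList [\n"
                   + "<!ELEMENT DataBankList EMPTY>\n"
                   + "<!ATTLIST DataBankList CompleteYN (Y | N) \"Y\">\n"
                   + "]>\n"
                   + "<DataBankList/>";
        for (String[] d : probe(xml)) {
            System.out.println(d[0] + "/" + d[1] + " type=" + d[2]
                    + " mode=" + d[3] + " default=" + d[4]);
        }
    }
}
```

If the mode does come back as null for this declaration, any handler that writes it out without a null check would be a candidate for the NullPointerException.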

Thanks again,
d


**** Without parsing

% java GetURL dontParse "http://www.ncbi.nlm.nih.gov:80/" "entrez/query.fcgi?cmd=Save&db=PubMed&uid=11855986&dopt=XML"
Getting content stream
Printing document directly from stream
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 14th January 2002//EN" "/entrez/query/DTD/pubmed_020114.dtd" [
<!ENTITY %ArticleTitle.Ref "ArticleTitle">
<!ENTITY %ISSN.Ref "ISSN?">
<!ENTITY %DateCreated.Ref "DateCreated">
<!ENTITY %PubDate.Ref "PubDate">
<!ENTITY %PMID.Ref "PMID">
<!ENTITY %MedlineID.Ref "MedlineID?">
<!ENTITY %GrantID.Ref "GrantID?">
<!ENTITY %Agency.Ref "Agency">
<!ENTITY %Acronym.Ref "Acronym?">
<!ENTITY %personal.name "(LastName,(ForeName|(FirstName,MiddleName?))?,Initials?,Suffix?)">
<!ENTITY %author.name "((LastName,(ForeName|(FirstName,MiddleName?))?,Initials?,Suffix?) | CollectiveName)">
<!ELEMENT FirstName (#PCDATA)>
<!ELEMENT ForeName (#PCDATA)>
<!ELEMENT MiddleName (#PCDATA)>
<!ELEMENT LastName (#PCDATA)>
<!ELEMENT Initials (#PCDATA)>
<!ELEMENT Suffix (#PCDATA)>
<!ELEMENT CollectiveName (#PCDATA)>
<!ENTITY %normal.date "(Year,Month,Day,(Hour,(Minute,Second?)?)?)">
<!ENTITY %pub.date "((Year, ((Month, Day?) | Season)?) | MedlineDate)">
<!ELEMENT Year (#PCDATA)>
<!ELEMENT Month (#PCDATA)>
<!ELEMENT Day (#PCDATA)>
<!ELEMENT Season (#PCDATA)>
<!ELEMENT MedlineDate (#PCDATA)>
<!ELEMENT Hour (#PCDATA)>
<!ELEMENT Minute (#PCDATA)>
<!ELEMENT Second (#PCDATA)>
<!ENTITY %data.template "#PCDATA">
<!ENTITY %Abstract "(AbstractText,CopyrightInformation?)">
<!ELEMENT AbstractText (#PCDATA)>
<!ELEMENT CopyrightInformation (#PCDATA)>
<!ELEMENT NCBIArticle (PMID,Article,MedlineJournalInfo?)>
<!ELEMENT Article ((Journal|Book),ArticleTitle,Pagination,Abstract?,Affiliation?,AuthorList?,Language+,DataBankList?,GrantList?,PublicationTypeList,VernacularTitle?,DateOfElectronicPublication?)>
<!ELEMENT DataBankList (DataBank+)>
<!ELEMENT DataBank (DataBankName,AccessionNumberList?)>
<!ELEMENT DataBankName (#PCDATA)>
<!ELEMENT AccessionNumberList (AccessionNumber+)>
<!ELEMENT AccessionNumber (#PCDATA)>
javax.xml.transform.TransformerException: java.lang.NullPointerException
	at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:469)
	at GetURL.printParsedStream(GetURL.java:66)
	at GetURL.main(GetURL.java:155)
---------
java.lang.NullPointerException
	at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1193)
	at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:452)
	at GetURL.printParsedStream(GetURL.java:66)
	at GetURL.main(GetURL.java:155)
---------
java.lang.NullPointerException
	at org.apache.xalan.serialize.WriterToUTF8.write(WriterToUTF8.java:163)
	at org.apache.xalan.serialize.SerializerToXML.attributeDecl(SerializerToXML.java:2009)
	at org.apache.xalan.transformer.TransformerIdentityImpl.attributeDecl(TransformerIdentityImpl.java:1322)
	at org.apache.xerces.parsers.AbstractSAXParser.attributeDecl(AbstractSAXParser.java:918)
	at org.apache.xerces.impl.dtd.XMLDTDValidator.attributeDecl(XMLDTDValidator.java:1563)
	at org.apache.xerces.impl.XMLDTDScannerImpl.scanAttlistDecl(XMLDTDScannerImpl.java:1120)
	at org.apache.xerces.impl.XMLDTDScannerImpl.scanDecls(XMLDTDScannerImpl.java:1819)
	at org.apache.xerces.impl.XMLDTDScannerImpl.scanDTDExternalSubset(XMLDTDScannerImpl.java:295)
	at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(XMLDocumentScannerImpl.java:820)
	at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:333)
	at org.apache.xerces.parsers.StandardParserConfiguration.parse(StandardParserConfiguration.java:525)
	at org.apache.xerces.parsers.StandardParserConfiguration.parse(StandardParserConfiguration.java:581)
	at org.apache.xerces.parsers.XMLParser.parse(XMLParser.java:147)
	at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1157)
	at org.apache.xalan.transformer.TransformerIdentityImpl.transform(TransformerIdentityImpl.java:452)
	at GetURL.printParsedStream(GetURL.java:66)
	at GetURL.main(GetURL.java:155)
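
For reference, the two code paths compared in this message can be sketched roughly as follows. GetURL's source is not included here, so the class and method names (ParsePaths, copy, implicitParse, explicitParse) are placeholders and the real program may differ in its details. The sketch only assumes the standard JAXP calls: an identity Transformer fed either a StreamSource ("dontParse", the transformer parses the stream itself) or a DOMSource built with DocumentBuilder ("parse").

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

// Hypothetical reconstruction of the two modes; not the actual GetURL code.
class ParsePaths {

    // Runs an identity transform (no stylesheet) and returns the serialized result.
    static String copy(Source source) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    // "Implicit" parsing: the transformer drives the parser itself, so the
    // parser's events go straight to the transformer and its serializer.
    static String implicitParse(InputStream in, String systemId) throws Exception {
        return copy(new StreamSource(in, systemId));
    }

    // "Explicit" parsing: build a DOM first, then hand the tree to the transformer.
    static String explicitParse(InputStream in, String systemId) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in, systemId);
        return copy(new DOMSource(doc));
    }

    public static void main(String[] args) throws Exception {
        // Small stand-in document; the systemId is only a base for relative references.
        byte[] xml = "<root><child>text</child></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println(implicitParse(new ByteArrayInputStream(xml), "file:///tmp/doc.xml"));
        System.out.println(explicitParse(new ByteArrayInputStream(xml), "file:///tmp/doc.xml"));
    }
}
```

In the trace above, the failing frames (TransformerIdentityImpl.attributeDecl into SerializerToXML.attributeDecl) are reached only on the first path, where the DTD declarations are forwarded to the serializer as they are scanned.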


**** With Parsing

% java GetURL parse "http://www.ncbi.nlm.nih.gov:80/" "entrez/query.fcgi?cmd=Save&db=PubMed&uid=11855986&dopt=XML"
Getting content stream
Parsing stream to document
Printing document
<?xml version="1.0" encoding="UTF-8"?>
<!--    
	   This is the Current DTD which NLM has written for 
        External  Use.  If you are a NCBI User, use the information
        from the PubMedArticle Set.

        Comments and suggestions are welcome.
        (May 9, 2000)

        


       --><!-- ================================================================= --><!-- ================================================================= --><!-- Reference to Where the MEDLINECITATION DTD is located  --><!-- NLM Medline DTD              

        This is the DTD which NLM has written for External Use. 
        If you are a data Licensee, use the information
        from the MedlineCitation Set.       
 
        Comments and suggestions are welcome.
        November 1, 2001
       

      
--><!-- ================================================================= --><!--   NLM Medline DTD   --><!-- Typical usage:   

  <!DOCTYPE MedlineCitationSet PUBLIC "-//NLM//DTD NLM//EN">

--><!-- ================================================================= --><!--   Revision Notes Section  
 
  The following changes were made in the nlmmedline_011101.dtd:

       a.  Made MedlineID optional
 
  The following changes were made in the nlmmedline_010319.dtd:

       a.  The entity reference changed to:
          nlmmedlinecitation_010322.dtd.    
       b.  Added PMID entity 

       c.  The entity reference changed back to 
          nlmmedlinecitation_010319.dtd.
 
       d.  Added GrantID, Agency, Acronym entity references 

       e.  Added MedlineID entity  

 
--><!-- ================================================================= --><!-- ================================================================= --><!-- Reference to Where the NLM MedlineCitation DTD is located  --><!-- MedlineCitation DTD              

     This is the DTD which NLM has written for External Use.      
      
        Comments and suggestions are welcome.
        November 1, 2001 
  
--><!-- ================================================================= --><!--   Revision Notes Section  
  
 The following changes were made in the nlmmedlinecitation_011101.dtd:
 
     a.  Added NOTNLM to Owner entity.

     b.  Added Status Entity.
    
     c.  Added Status attribute for MedlineCitation.

     d.  Added RegistryNumber to Chemical Element.

     e.  Added PMID and MedlineID to CommentsCorrections. 
   
     f.  Added NRCBL value to SOURCE entity. 

 The following changes were made in the nlmmedlinecitation_010319.dtd:

     a.  Added the following values to the Owner entity: (NASA | PIP | KIE | HSR | HMD | SIS )
     
     b.  Removed AdditionalInformation element from MedlineCitation
       (note, this element had been optional and was never used)
 
     c.  Added the following elements to MedlineCitation:
         OtherID
         OtherAbstract 
         KeywordList
         SpaceFlightMission
         InvestigatorList
         GeneralNote 
   
     d.  Added Affiliation to Investigator

     e.  Removed Procurement Source from DTD

     f.  Removed AbstractAuthor from OtherAbstract

     g.  Added Attribute List for OtherAbstract Type  
   
     h.  Added NLM default to MedlineCitationOwner attribute. 

     i.  Added %PMID.Ref Entity

     j.  Added Attribute for Keyword 

     k.  Removed Element Owner and Element Type from DTD. 
 
     l.  Added NLM default to KeywordList & GeneralNote Owner attribute. 

     m.  Added attribute values for OtherID Source.

     n.  Added ErratumFor to the CommentsCorrections element.
   
     o.  Changed Investigator element to reference personalname.

     p.  Added SummaryForPatientsIn and OriginalReportIn to 
         CommentCorrections

     q.  Added %MedlineID.Ref Entity


--><!-- ================================================================= --><!--   NLM Medline DTD   --><!-- Typical usage:   

  <!DOCTYPE MedlineCitationSet PUBLIC "-//NLM//DTD NLM//EN">

--><!-- ================================================================= --><!-- Reference to Where the NLM Common DTD is located  --><!-- "http://www.nlm.nih.gov/databases/dtd/nlmcommon_011101.dtd" --><!-- NLMCommon DTD
      
     This is the DTD for data elements that are shared 
     among various applications at the NLM. 
     Comments and suggestions are welcome.

     November 1, 2001

  * = 0 or more occurrences (optional element, repeatable)
  ? = 0 or 1 occurrences (optional element, at most 1)
  + = 1 or more occurrences (required element, repeatable)
  | = choice, one or the other but not both 
  no symbol = required element

    
--><!--    NLMCommon.dtd

        Document Type Definition for the PubMed Article DTD
        $Id$

       
--><!-- ====================================================================== --><!--   Revision Notes Section

 The following changes were made in the nlmcommon_011101.dtd:

       a.  Added DescriptorName to MeshHeading field.

       b.  Added QualifierName to SubHeading field.

       c.  Added attribute for DescriptorName.

       d.  Added attribute for QualifierName.

       e.  Added ForeName to personal name field.  
        
 The following changes were made in the nlmcommon_010319.dtd:

       a.  Added Entity % Abstract to dtd.

       b.  Element Abstract now links to % Abstract.

       c.  Moved element definition of AbstractText & Copyright 
           Information to nlmcommon_0010319.dtd. They were previously 
           defined in nlmmedlinecitation.dtd.

       d.  Made Country and MedlineCode optional elements.

       e.  Element Grant now links to %GrantID.Ref, %Agency.Ref &
           %Acronym.Ref.

 The following change was made in the nlmcommon_001211.dtd:

       a. addition of NLMUniqueID to the DTD.

--><!-- Personal and Author names --><!-- Dates --><!-- ================================================================= --><!-- ================================================================= --><!-- This is the top level element for NCBIArticle --><!-- ================================================================= --><!-- This is the top level element for Article --><!-- Sometime in the future, MedlineCode will change to
     NLMUniqueID   --><!-- ================================================================= --><!-- ================================================================= --><!-- internal DTD entities --><!-- ================================================================= --><!-- This is the top level element for MedlineCitation --><!-- End of MedlineCitation group --><!-- ================================================================= --><!--             Further Definition of NLM Tags         --><!-- ================================================================= --><!-- ================================================================= --><!-- ================================================================= --><!-- ================================================================= --><!-- ================================================================= --><!-- ================================================================= --><!-- This is the top level element for PubMedArticle --><!-- ================================================================= --><!-- ================================================================= --><PubmedArticleSet>

<PubmedArticle>
<MedlineCitation Owner="NLM">
	<PMID>11855986</PMID>
	<DateCreated>
		<Year>2002</Year>
		<Month>Feb</Month>
		<Day>21</Day>
	</DateCreated>
	<Article>
		<Journal>
			<ISSN>0022-2623</ISSN>
			<JournalIssue>
				<Volume>45</Volume>
				<Issue>5</Issue>
				<PubDate>
					<Year>2002</Year>
					<Month>Feb</Month>
					<Day>28</Day>
				</PubDate>
			</JournalIssue>
		</Journal>
		<ArticleTitle>Synthesis and Nicotinic Binding Studies on Enantiopure Diazine Analogues of the Novel (2-Chloro-5-pyridyl)-9-azabicyclo[4.2.1]non-2-ene UB-165.</ArticleTitle>
		<Pagination>
			<MedlinePgn>1064-1072</MedlinePgn>
		</Pagination>
		<Abstract>
			<AbstractText>As part of our program aimed at optimizing therapeutic effects over toxic effects (as observed in the naturally occurring nicotinic acetylcholine receptor modulators (−)-nicotine, (−)-epibatidine, (−)-ferruginine, and (+)-anatoxin-a), we investigated the bioisosteric potential of diazines in the field of (+)-anatoxin-a-type structures. In the series of diazine analogues of deschloro-UB-165 (DUB-165, 6), bioisosteric replacement of the 3-pyridyl pharmacophoric element by a 4-pyridazinyl, 5-pyrimidinyl, or 2-pyrazinyl moiety resulted in novel nAChR ligands 7, 8, and 9. A palladium-catalyzed Suzuki cross-coupling of the 3-diethylboranylpyridine (14) and a Stille cross-coupling of the corresponding tributylstannyl diazines 15−17 with the vinyl triflate 13 of the N-protected 9-azabicyclo[4.2.1]nonan-2-one 12 constitute the key steps in the syntheses of these enantiopure anatoxinoids 6−9. Studies of the in vitro affinity for (α4)(2)(β2)(3), α3β4, and α7 nAChR subtypes by radioligand binding assays demonstrated that the diazine analogues 7−9 can be considered as pharmacologically attractive bioisosteres of DUB-165 (6) but with different effects on the binding affinity with regard to the diazine moiety. The pyrimidine-containing bioisostere 8 turned out to be the most active diazine analogue, which interacts potently (K(i) = 0.14 nM) with the (α4)(2)(β2)(3) subtype and differentiates significantly among the nAChR subtypes investigated. The nitrogens in this anatoxinoid 8 show by far the most negative atomic charges (calculated using the AM1 Hamiltonian). This qualitatively correlates with the highest binding affinity observed for 8 for all subtypes under consideration.</AbstractText>
		</Abstract>
		<Affiliation>Institut für Pharmazeutische Chemie der Philipps-Universität Marburg, Marbacher Weg 6, D-35032 Marburg, Germany, and Institut für Pharmazeutische Chemie der Rheinischen Friedrich-Wilhelms-Universität Bonn, Kreuzbergweg 26, D-53115 Bonn, Germany.</Affiliation>
		<AuthorList CompleteYN="Y">
			<Author>
				<LastName>Gohlke</LastName>
				<FirstName>Holger</FirstName>
				<Initials>H</Initials>
			</Author>
			<Author>
				<LastName>Gündisch</LastName>
				<FirstName>Daniela</FirstName>
				<Initials>D</Initials>
			</Author>
			<Author>
				<LastName>Schwarz</LastName>
				<FirstName>Simone</FirstName>
				<Initials>S</Initials>
			</Author>
			<Author>
				<LastName>Seitz</LastName>
				<FirstName>Gunther</FirstName>
				<Initials>G</Initials>
			</Author>
			<Author>
				<LastName>Tilotta</LastName>
				<FirstName>Maria</FirstName>
				<MiddleName>Cristina</MiddleName>
				<Initials>MC</Initials>
			</Author>
			<Author>
				<LastName>Wegge</LastName>
				<FirstName>Thomas</FirstName>
				<Initials>T</Initials>
			</Author>
		</AuthorList>
		<Language>ENG</Language>
		<PublicationTypeList>
			<PublicationType>JOURNAL ARTICLE</PublicationType>
		</PublicationTypeList>
	</Article>
	<MedlineJournalInfo>
		<Country/>
		<MedlineTA>J Med Chem</MedlineTA>
		<MedlineCode>J0F</MedlineCode>
		<NlmUniqueID>9716531</NlmUniqueID>
	</MedlineJournalInfo>
</MedlineCitation>
<PubmedData>
	<History>
		<PubMedPubDate PubStatus="pubmed">
			<Year>2002</Year>
			<Month>2</Month>
			<Day>22</Day>
			<Hour>10</Hour>
			<Minute>0</Minute>
		</PubMedPubDate>
		<PubMedPubDate PubStatus="medline">
			<Year>2002</Year>
			<Month>2</Month>
			<Day>22</Day>
			<Hour>10</Hour>
			<Minute>0</Minute>
		</PubMedPubDate>
	</History>
	<PublicationStatus>ppublish</PublicationStatus>
	<ArticleIdList>
		<ArticleId IdType="pubmed">11855986</ArticleId>
		<ArticleId IdType="pii">jm010936y</ArticleId>
	</ArticleIdList>
</PubmedData>
</PubmedArticle>



</PubmedArticleSet>


**** The raw XML file

<?xml version="1.0"?>
<!DOCTYPE PubmedArticleSet PUBLIC "-//NLM//DTD PubMedArticle, 14th January 2002//EN" "/test/dtd/pubmed_020114.dtd">
<PubmedArticleSet>

<PubmedArticle>
<MedlineCitation>
	<PMID>11855986</PMID>
	<DateCreated>
		<Year>2002</Year>
		<Month>Feb</Month>
		<Day>21</Day>
	</DateCreated>
	<Article>
		<Journal>
			<ISSN>0022-2623</ISSN>
			<JournalIssue>
				<Volume>45</Volume>
				<Issue>5</Issue>
				<PubDate>
					<Year>2002</Year>
					<Month>Feb</Month>
					<Day>28</Day>
				</PubDate>
			</JournalIssue>
		</Journal>
		<ArticleTitle>Synthesis and Nicotinic Binding Studies on Enantiopure Diazine Analogues of the Novel (2-Chloro-5-pyridyl)-9-azabicyclo[4.2.1]non-2-ene UB-165.</ArticleTitle>
		<Pagination>
			<MedlinePgn>1064-1072</MedlinePgn>
		</Pagination>
		<Abstract>
			<AbstractText>As part of our program aimed at optimizing therapeutic effects over toxic effects (as observed in the naturally occurring nicotinic acetylcholine receptor modulators (−)-nicotine, (−)-epibatidine, (−)-ferruginine, and (+)-anatoxin-a), we investigated the bioisosteric potential of diazines in the field of (+)-anatoxin-a-type structures. In the series of diazine analogues of deschloro-UB-165 (DUB-165, 6), bioisosteric replacement of the 3-pyridyl pharmacophoric element by a 4-pyridazinyl, 5-pyrimidinyl, or 2-pyrazinyl moiety resulted in novel nAChR ligands 7, 8, and 9. A palladium-catalyzed Suzuki cross-coupling of the 3-diethylboranylpyridine (14) and a Stille cross-coupling of the corresponding tributylstannyl diazines 15−17 with the vinyl triflate 13 of the N-protected 9-azabicyclo[4.2.1]nonan-2-one 12 constitute the key steps in the syntheses of these enantiopure anatoxinoids 6−9. Studies of the in vitro affinity for (α4)(2)(β2)(3), α3β4, and α7 nAChR subtypes by radioligand binding assays demonstrated that the diazine analogues 7−9 can be considered as pharmacologically attractive bioisosteres of DUB-165 (6) but with different effects on the binding affinity with regard to the diazine moiety. The pyrimidine-containing bioisostere 8 turned out to be the most active diazine analogue, which interacts potently (K(i) = 0.14 nM) with the (α4)(2)(β2)(3) subtype and differentiates significantly among the nAChR subtypes investigated. The nitrogens in this anatoxinoid 8 show by far the most negative atomic charges (calculated using the AM1 Hamiltonian). This qualitatively correlates with the highest binding affinity observed for 8 for all subtypes under consideration.</AbstractText>
		</Abstract>
		<Affiliation>Institut für Pharmazeutische Chemie der Philipps-Universität Marburg, Marbacher Weg 6, D-35032 Marburg, Germany, and Institut für Pharmazeutische Chemie der Rheinischen Friedrich-Wilhelms-Universität Bonn, Kreuzbergweg 26, D-53115 Bonn, Germany.</Affiliation>
		<AuthorList>
			<Author>
				<LastName>Gohlke</LastName>
				<FirstName>Holger</FirstName>
				<Initials>H</Initials>
			</Author>
			<Author>
				<LastName>Gündisch</LastName>
				<FirstName>Daniela</FirstName>
				<Initials>D</Initials>
			</Author>
			<Author>
				<LastName>Schwarz</LastName>
				<FirstName>Simone</FirstName>
				<Initials>S</Initials>
			</Author>
			<Author>
				<LastName>Seitz</LastName>
				<FirstName>Gunther</FirstName>
				<Initials>G</Initials>
			</Author>
			<Author>
				<LastName>Tilotta</LastName>
				<FirstName>Maria</FirstName>
				<MiddleName>Cristina</MiddleName>
				<Initials>MC</Initials>
			</Author>
			<Author>
				<LastName>Wegge</LastName>
				<FirstName>Thomas</FirstName>
				<Initials>T</Initials>
			</Author>
		</AuthorList>
		<Language>ENG</Language>
		<PublicationTypeList>
			<PublicationType>JOURNAL ARTICLE</PublicationType>
		</PublicationTypeList>
	</Article>
	<MedlineJournalInfo>
		<Country></Country>
		<MedlineTA>J Med Chem</MedlineTA>
		<MedlineCode>J0F</MedlineCode>
		<NlmUniqueID>9716531</NlmUniqueID>
	</MedlineJournalInfo>
</MedlineCitation>
<PubmedData>
	<History>
		<PubMedPubDate PubStatus="pubmed">
			<Year>2002</Year>
			<Month>2</Month>
			<Day>22</Day>
			<Hour>10</Hour>
			<Minute>0</Minute>
		</PubMedPubDate>
		<PubMedPubDate PubStatus="medline">
			<Year>2002</Year>
			<Month>2</Month>
			<Day>22</Day>
			<Hour>10</Hour>
			<Minute>0</Minute>
		</PubMedPubDate>
	</History>
	<PublicationStatus>ppublish</PublicationStatus>
	<ArticleIdList>
		<ArticleId IdType="pubmed">11855986</ArticleId>
		<ArticleId IdType="pii">jm010936y</ArticleId>
	</ArticleIdList>
</PubmedData>
</PubmedArticle>



</PubmedArticleSet>


Joseph Kesselman/CAM/Lotus wrote:
> 
> Hmmm. DOM2DTM _does_ have code which should skip the DOCUMENT_TYPE_NODE and
> its entire subtree as if they don't exist.
> 
> It would have helped a bit if you'd given us a complete stack trace for the
> Transformer Exception.
> 
> I've got my hands full at the moment, but I can try to look at this in
> greater detail later this week if someone else doesn't get to it first.