Posted to common-user@hadoop.apache.org by Ted Yu <yu...@gmail.com> on 2010/03/04 20:06:00 UTC

Failed to set setXIncludeAware(true) for parser

Hi,
We use Nutch 1.0.
In Nutch, we define the following, as suggested in
http://issues.apache.org/jira/browse/HADOOP-5254:

NUTCH_OPTS="$NUTCH_OPTS -Dhadoop.log.dir=$NUTCH_LOG_DIR
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl"

But we still see:
ERROR org.apache.hadoop.conf.Configuration: Failed to set
setXIncludeAware(true) for parser
org.apache.xerces.jaxp.DocumentBuilderFactoryImpl@17fe1feb:java.lang.UnsupportedOperationException:
This parser does not support specification "null" version "null"
java.lang.UnsupportedOperationException: This parser does not support
specification "null" version "null"

Can someone provide a hint as to why the above error still appears?

Here is the command line (I symlinked the JDK's rt.jar as 1rt.jar, which appears
before xerces-2_6_2.jar in the classpath):
/usr/local/jdk1.6.0_14/bin/java -Xmx1000m
-Dhadoop.log.dir=/opt/kindsight/nutchbase/logs
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
-Dhadoop.log.file=hadoop.log
-Djava.library.path=/opt/kindsight/nutchbase/lib/native/Linux-amd64-64
-classpath
/opt/kindsight/nutchbase:/opt/kindsight/nutchbase/conf:/opt/kindsight/nutchbase/conf/batchclient:/opt/kindsight/nutchbase/lib/1rt.jar:/opt/kindsight/nutchbase/lib/batchplatform.jar:/opt/kindsight/nutchbase/lib/colo_common.jar:/opt/kindsight/nutchbase/lib/csreader.jar:/opt/kindsight/nutchbase/lib/pr_common.jar:/opt/kindsight/nutchbase/lib/nutch-1.0.job:/opt/kindsight/nutchbase/lib/3rdparty/commons-collections-3.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/servlet-api.jar:/opt/kindsight/nutchbase/lib/3rdparty/lucene-misc-2.4.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/tika-0.1-incubating.jar:/opt/kindsight/nutchbase/lib/3rdparty/junit-3.8.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/oozie-core-0.20.0.o0.1-SNAPSHOT.jar:/opt/kindsight/nutchbase/lib/3rdparty/hbase-0.20.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/lucene-core-2.4.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/apache-solr-solrj-1.3.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/json_simple-1.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/xerces-2_6_2.jar:/opt/kindsight/nutchbase/lib/3rdparty/jetty-5.1.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/jets3t-0.6.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-lang-2.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/xerces-2_6_2-apis.jar:/opt/kindsight/nutchbase/lib/3rdparty/apache-solr-common-1.3.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/jdom-1.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-fileupload-1.3-SNAPSHOT.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-httpclient-3.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-logging-1.0.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-logging-api-1.0.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-beanutils-1.8.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/nutch-1.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-io-1.3.2.jar:/opt/kindsight/nutchbase/lib/3rdparty/icu4j-4_0_1.jar:/opt/kindsight/nutchbase/lib/3rdparty/log4j-1.2.15.jar:/opt/kindsight/nutchbase/lib/3rdparty/oozie-client-0.20.0.o0.1-SNAPSHOT.jar:/opt/kindsight/nutchbase/lib/3rdparty/batch/hbase-0.20.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/batch/hadoop-core.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-logging-1.1.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-codec-1.3.jar:/opt/kindsight/nutchbase/lib/3rdparty/jakarta-oro-2.0.8.jar:/opt/kindsight/nutchbase/lib/3rdparty/hsqldb-1.8.0.7.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-pool-1.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-dbcp-1.2.2.jar:/opt/kindsight/nutchbase/lib/3rdparty/mysql-connector-java-5.1.10-bin.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-httpclient-3.0.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/zookeeper-3.2.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/taglibs-i18n.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-collections-3.2.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-cli-1.2.jar:/usr/local/jdk1.6.0_14/lib/tools.jar:/opt/kindsight/nutchbase/build/nutch-*.job:/opt/kindsight/nutchbase/nutch-*.job:/opt/kindsight/nutchbase/lib/1rt.jar:/opt/kindsight/nutchbase/lib/batchplatform.jar:/opt/kindsight/nutchbase/lib/colo_common.jar:/opt/kindsight/nutchbase/lib/csreader.jar:/opt/kindsight/nutchbase/lib/pr_common.jar:/opt/kindsight/nutchbase/lib/jetty-ext/*.jar
com.rialto.nutchbase.fetcher.Fetcher -D db.max.outlinks.per.page=1000
domaincrawltable lpm/1-100303152908371-tomcatadmin/generate/3
lpm/1-100303152908371-tomcatadmin/parse/3 -threads 10 -actionid
1-100303152908371-tomcatadmin@domain_crawl

Thanks

RE: Tracking Metrics in Hadoop by User

Posted by sagar_shukla <sa...@persistent.co.in>.
Hi Steve,
      I have observed issues with Ganglia refreshing its data when nodes go down or are removed from the cluster. It could be because of the complexity of the environment, but I found Nagios useful on that front.

There is a Hadoop plugin available for Nagios which provides node-based statistics. I have not used it myself, but you can give it a try and see whether it provides the details that you want.
http://exchange.nagios.org/directory/Plugins/Others/check_hadoop%252Ddfs-2Esh/details

Thanks,
Sagar Shukla

-----Original Message-----
From: Stephen Watt [mailto:swatt@us.ibm.com] 
Sent: Tuesday, March 09, 2010 11:37 PM
To: common-user@hadoop.apache.org
Subject: Tracking Metrics in Hadoop by User

I'm interested in the ability to track metrics (such as CPU time, storage 
used per machine, across the cluster) in Hadoop by User. I've taken a look 
at the Fair and Capacity Schedulers and they seem oriented towards 
ensuring fair use between users' jobs rather than providing a feature 
which also reports what resources the users actually used on the cluster. 
The same goes for other tools like Ganglia, which appear to be concerned with 
reporting metrics by machine (and not by job). I've also taken a look 
through the common/metrics tickets in JIRA and there does not seem to be 
any open work that addresses this requirement. 

Have I missed something? Has anyone been able to do this? Is there a way 
to capture metrics by job (which could be correlated back to a user)? If 
not, is there any current or planned work in the project that addresses 
this requirement? 

Kind regards
Steve Watt

Tracking Metrics in Hadoop by User

Posted by Stephen Watt <sw...@us.ibm.com>.
I'm interested in the ability to track metrics (such as CPU time, storage 
used per machine, across the cluster) in Hadoop by User. I've taken a look 
at the Fair and Capacity Schedulers and they seem oriented towards 
ensuring fair use between users' jobs rather than providing a feature 
which also reports what resources the users actually used on the cluster. 
The same goes for other tools like Ganglia, which appear to be concerned with 
reporting metrics by machine (and not by job). I've also taken a look 
through the common/metrics tickets in JIRA and there does not seem to be 
any open work that addresses this requirement. 

Have I missed something? Has anyone been able to do this? Is there a way 
to capture metrics by job (which could be correlated back to a user)? If 
not, is there any current or planned work in the project that addresses 
this requirement? 

Kind regards
Steve Watt
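
As an illustration of the correlation step asked about above, here is a minimal sketch that rolls per-job figures up to per-user totals. It assumes that per-job values (CPU time, bytes written) and the submitting user are already available from somewhere such as job history or counters; the UsageByUser and JobUsage names, the fields, and the sample values are all hypothetical and not part of any Hadoop API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UsageByUser {

    /** Hypothetical per-job record; how these values are collected is outside this sketch. */
    public static final class JobUsage {
        final String jobId;
        final String user;
        final long cpuMillis;
        final long bytesWritten;

        public JobUsage(String jobId, String user, long cpuMillis, long bytesWritten) {
            this.jobId = jobId;
            this.user = user;
            this.cpuMillis = cpuMillis;
            this.bytesWritten = bytesWritten;
        }
    }

    /** Sums job figures per user: index 0 is CPU milliseconds, index 1 is bytes written. */
    public static Map<String, long[]> totalsByUser(List<JobUsage> jobs) {
        Map<String, long[]> totals = new TreeMap<String, long[]>();
        for (JobUsage job : jobs) {
            long[] sum = totals.get(job.user);
            if (sum == null) {
                sum = new long[2];
                totals.put(job.user, sum);
            }
            sum[0] += job.cpuMillis;
            sum[1] += job.bytesWritten;
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<JobUsage> jobs = Arrays.asList(
                new JobUsage("job_a", "alice", 1200L, 4096L),
                new JobUsage("job_b", "bob", 300L, 1024L),
                new JobUsage("job_c", "alice", 800L, 2048L));

        for (Map.Entry<String, long[]> entry : totalsByUser(jobs).entrySet()) {
            System.out.println(entry.getKey()
                    + " cpuMillis=" + entry.getValue()[0]
                    + " bytesWritten=" + entry.getValue()[1]);
        }
    }
}
```

The sketch only covers the aggregation; it says nothing about which Hadoop component, if any, exposes these figures per job.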

Re: Failed to set setXIncludeAware(true) for parser

Posted by Steve Loughran <st...@apache.org>.
Ted Yu wrote:
> Hi,
> We use nutch 1.0
> In nutch, we define the following according to
> http://issues.apache.org/jira/browse/HADOOP-5254:
> 
> NUTCH_OPTS="$NUTCH_OPTS -Dhadoop.log.dir=$NUTCH_LOG_DIR
> -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl"
> 
> But we still see:
> ERROR org.apache.hadoop.conf.Configuration: Failed to set
> setXIncludeAware(true) for parser
> org.apache.xerces.jaxp.DocumentBuilderFactoryImpl@17fe1feb:java.lang.UnsupportedOperationException:
> This parser does not support specification "null" version "null"
> java.lang.UnsupportedOperationException: This parser does not support
> specification "null" version "null"
> 
> Can someone provide hint why the above error still appears ?
> 
> Here is the command line (I symlinked jdk's rt.jar as 1rt.jar which appears
> before xerces-2_6_2.jar in the classpath):

I'm not sure that helps, as it complicates the parser factory lookup.

Try running ant -diagnostics and post the results here; it runs some XML 
parser diagnostics.
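
For a quick check that does not depend on Ant, a small stand-alone class can report which DocumentBuilderFactory the JVM actually picks up and whether it accepts setXIncludeAware(true). This is only a sketch: the XmlFactoryCheck class name is hypothetical, and it uses nothing beyond the standard JAXP and java.security APIs. To be meaningful it would need to run with the same -D options and classpath as the failing command.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

public class XmlFactoryCheck {

    /** Describes where a class was loaded from; classes from the JDK itself may report no code source. */
    public static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return (source == null || source.getLocation() == null)
                ? "bootstrap classpath (JDK)"
                : source.getLocation().toString();
    }

    /** Returns true if the factory accepts setXIncludeAware(true), false if it throws UnsupportedOperationException. */
    public static boolean supportsXInclude(DocumentBuilderFactory factory) {
        try {
            factory.setXIncludeAware(true);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The system property is one of the places JAXP consults when choosing a factory.
        System.out.println("javax.xml.parsers.DocumentBuilderFactory property = "
                + System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println("factory class  = " + factory.getClass().getName());
        System.out.println("loaded from    = " + locationOf(factory.getClass()));
        System.out.println("XInclude aware = " + supportsXInclude(factory));
    }
}
```

If the factory class printed is still org.apache.xerces.jaxp.DocumentBuilderFactoryImpl, the "loaded from" line shows which jar on the classpath supplied it.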