Posted to common-user@hadoop.apache.org by Ted Yu <yu...@gmail.com> on 2010/03/04 20:06:00 UTC
Failed to set setXIncludeAware(true) for parser
Hi,
We use Nutch 1.0.
In Nutch, we define the following according to
http://issues.apache.org/jira/browse/HADOOP-5254:
NUTCH_OPTS="$NUTCH_OPTS -Dhadoop.log.dir=$NUTCH_LOG_DIR
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl"
But we still see:
ERROR org.apache.hadoop.conf.Configuration: Failed to set
setXIncludeAware(true) for parser
org.apache.xerces.jaxp.DocumentBuilderFactoryImpl@17fe1feb:java.lang.UnsupportedOperationException:
This parser does not support specification "null" version "null"
java.lang.UnsupportedOperationException: This parser does not support
specification "null" version "null"
Can someone provide a hint as to why the above error still appears?
Here is the command line (I symlinked the JDK's rt.jar as 1rt.jar so that it
appears before xerces-2_6_2.jar in the classpath):
/usr/local/jdk1.6.0_14/bin/java -Xmx1000m
-Dhadoop.log.dir=/opt/kindsight/nutchbase/logs
-Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl
-Dhadoop.log.file=hadoop.log
-Djava.library.path=/opt/kindsight/nutchbase/lib/native/Linux-amd64-64
-classpath
/opt/kindsight/nutchbase:/opt/kindsight/nutchbase/conf:/opt/kindsight/nutchbase/conf/batchclient:/opt/kindsight/nutchbase/lib/1rt.jar:/opt/kindsight/nutchbase/lib/batchplatform.jar:/opt/kindsight/nutchbase/lib/colo_common.jar:/opt/kindsight/nutchbase/lib/csreader.jar:/opt/kindsight/nutchbase/lib/pr_common.jar:/opt/kindsight/nutchbase/lib/nutch-1.0.job:/opt/kindsight/nutchbase/lib/3rdparty/commons-collections-3.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/servlet-api.jar:/opt/kindsight/nutchbase/lib/3rdparty/lucene-misc-2.4.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/tika-0.1-incubating.jar:/opt/kindsight/nutchbase/lib/3rdparty/junit-3.8.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/oozie-core-0.20.0.o0.1-SNAPSHOT.jar:/opt/kindsight/nutchbase/lib/3rdparty/hbase-0.20.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/lucene-core-2.4.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/apache-solr-solrj-1.3.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/json_simple-1.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/xerces-2_6_2.jar:/opt/kindsight/nutchbase/lib/3rdparty/jetty-5.1.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/jets3t-0.6.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-lang-2.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/xerces-2_6_2-apis.jar:/opt/kindsight/nutchbase/lib/3rdparty/apache-solr-common-1.3.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/jdom-1.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-fileupload-1.3-SNAPSHOT.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-httpclient-3.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-logging-1.0.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-logging-api-1.0.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-beanutils-1.8.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/nutch-1.0.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-io-1.3.2.jar:/opt/kindsight/nutchbase/lib/3rdparty/icu4j-4_0_1.jar:/opt/kindsight/nutchbase/lib/3rdparty/log4j-1.2.15.jar:/opt/kindsight/nutchbase/lib/3rdparty/oozie-client-0.20.0.o0.1-SNAPSHOT.jar:/opt/kindsight/nutchbase/lib/3rdparty/batch/hbase-0.20.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/batch/hadoop-core.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-logging-1.1.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-codec-1.3.jar:/opt/kindsight/nutchbase/lib/3rdparty/jakarta-oro-2.0.8.jar:/opt/kindsight/nutchbase/lib/3rdparty/hsqldb-1.8.0.7.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-pool-1.4.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-dbcp-1.2.2.jar:/opt/kindsight/nutchbase/lib/3rdparty/mysql-connector-java-5.1.10-bin.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-httpclient-3.0.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/zookeeper-3.2.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/taglibs-i18n.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-collections-3.2.1.jar:/opt/kindsight/nutchbase/lib/3rdparty/commons-cli-1.2.jar:/usr/local/jdk1.6.0_14/lib/tools.jar:/opt/kindsight/nutchbase/build/nutch-*.job:/opt/kindsight/nutchbase/nutch-*.job:/opt/kindsight/nutchbase/lib/1rt.jar:/opt/kindsight/nutchbase/lib/batchplatform.jar:/opt/kindsight/nutchbase/lib/colo_common.jar:/opt/kindsight/nutchbase/lib/csreader.jar:/opt/kindsight/nutchbase/lib/pr_common.jar:/opt/kindsight/nutchbase/lib/jetty-ext/*.jar
com.rialto.nutchbase.fetcher.Fetcher -D db.max.outlinks.per.page=1000
domaincrawltable lpm/1-100303152908371-tomcatadmin/generate/3
lpm/1-100303152908371-tomcatadmin/parse/3 -threads 10 -actionid
1-100303152908371-tomcatadmin@domain_crawl
Thanks
RE: Tracking Metrics in Hadoop by User
Posted by sagar_shukla <sa...@persistent.co.in>.
Hi Steve,
I have observed issues with Ganglia refreshing its data when nodes go down or are removed from the cluster. That could be because of the complexity of the environment, but I found Nagios useful on that front.
There is a Hadoop plugin available for Nagios which provides node-based statistics. I have not used it myself, but you can give it a try and see whether it provides the details that you want.
http://exchange.nagios.org/directory/Plugins/Others/check_hadoop%252Ddfs-2Esh/details
Thanks,
Sagar Shukla
-----Original Message-----
From: Stephen Watt [mailto:swatt@us.ibm.com]
Sent: Tuesday, March 09, 2010 11:37 PM
To: common-user@hadoop.apache.org
Subject: Tracking Metrics in Hadoop by User
I'm interested in the ability to track metrics (such as CPU time, storage
used per machine, across the cluster) in Hadoop by User. I've taken a look
at the Fair and Capacity Schedulers and they seem oriented towards
ensuring fair use between users' jobs rather than providing a feature
which also reports what resources the users actually used on the cluster.
The same goes for other tools like Ganglia, which appear to be concerned with
reporting metrics by machine (and not by job). I've also taken a look
through the common/metrics tickets in JIRA and there does not seem to be
any open work that addresses this requirement.
Have I missed something? Has anyone been able to do this? Is there a way
to capture metrics by job (which could be correlated back to a user)? If
not, is there any current or forecasted work in the project that addresses
this requirement?
Kind regards
Steve Watt
Tracking Metrics in Hadoop by User
Posted by Stephen Watt <sw...@us.ibm.com>.
I'm interested in the ability to track metrics (such as CPU time, storage
used per machine, across the cluster) in Hadoop by User. I've taken a look
at the Fair and Capacity Schedulers and they seem oriented towards
ensuring fair use between users' jobs rather than providing a feature
which also reports what resources the users actually used on the cluster.
The same goes for other tools like Ganglia, which appear to be concerned with
reporting metrics by machine (and not by job). I've also taken a look
through the common/metrics tickets in JIRA and there does not seem to be
any open work that addresses this requirement.
Have I missed something? Has anyone been able to do this? Is there a way
to capture metrics by job (which could be correlated back to a user)? If
not, is there any current or forecasted work in the project that addresses
this requirement?
Kind regards
Steve Watt
Re: Failed to set setXIncludeAware(true) for parser
Posted by Steve Loughran <st...@apache.org>.
Ted Yu wrote:
> Hi,
> We use Nutch 1.0.
> In Nutch, we define the following according to
> http://issues.apache.org/jira/browse/HADOOP-5254:
>
> NUTCH_OPTS="$NUTCH_OPTS -Dhadoop.log.dir=$NUTCH_LOG_DIR
> -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl"
>
> But we still see:
> ERROR org.apache.hadoop.conf.Configuration: Failed to set
> setXIncludeAware(true) for parser
> org.apache.xerces.jaxp.DocumentBuilderFactoryImpl@17fe1feb:java.lang.UnsupportedOperationException:
> This parser does not support specification "null" version "null"
> java.lang.UnsupportedOperationException: This parser does not support
> specification "null" version "null"
>
> Can someone provide a hint as to why the above error still appears?
>
> Here is the command line (I symlinked the JDK's rt.jar as 1rt.jar so that it
> appears before xerces-2_6_2.jar in the classpath):
I'm not sure that helps, as it complicates the factory lookup.
Try running ant -diagnostics and post the results here, as that does some
XML parser diagnostics work.
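
For anyone hitting the same error, the factory that JAXP actually selects can also be checked directly from Java. What follows is only a minimal sketch, not code from Hadoop or Nutch: the class name ParserCheck and its helper methods are hypothetical. It prints the javax.xml.parsers.DocumentBuilderFactory system property, the concrete factory class returned by DocumentBuilderFactory.newInstance(), where that class was loaded from, and whether setXIncludeAware(true) is accepted. As far as I know, a factory that predates JAXP 1.3 does not override setXIncludeAware, so the call falls through to the base class and throws UnsupportedOperationException, which would be consistent with the log above naming org.apache.xerces.jaxp.DocumentBuilderFactoryImpl.

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical diagnostic helper; not part of Hadoop, Nutch, or Ant.
class ParserCheck {

    // Name of the concrete factory class that JAXP selected.
    static String implementationName(DocumentBuilderFactory factory) {
        return factory.getClass().getName();
    }

    // Where the factory class was loaded from. A null code source
    // usually means the class came from the JDK itself.
    static String loadedFrom(DocumentBuilderFactory factory) {
        CodeSource source = factory.getClass().getProtectionDomain().getCodeSource();
        return source == null ? "JDK (no code source)" : String.valueOf(source.getLocation());
    }

    // True if the factory accepts setXIncludeAware(true); false if it
    // throws UnsupportedOperationException, as in the reported error.
    static boolean supportsXInclude(DocumentBuilderFactory factory) {
        try {
            factory.setXIncludeAware(true);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println("system property:  "
                + System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));
        System.out.println("factory class:    " + implementationName(factory));
        System.out.println("loaded from:      " + loadedFrom(factory));
        System.out.println("XInclude support: " + supportsXInclude(factory));
    }
}
```

Run it with the same -D options and -classpath as the failing command, in the same JVM that logs the error. If the factory class printed is still org.apache.xerces.jaxp.DocumentBuilderFactoryImpl, the system property is not taking effect in that JVM.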