Posted to user@chukwa.apache.org by Kirk True <ki...@mustardgrain.com> on 2010/04/29 03:19:01 UTC

Re: Chukwa can't find Demux class - POSSIBLE FIX

Hi all,

The problem seems to stem from the fact that the call to 
DistributedCache.addFileToClassPath is passing in a Path that is in URI 
form, i.e. hdfs://localhost:9000/chukwa/demux/mydemux.jar whereas the 
DistributedCache API expects it to be a filesystem-based path (i.e. 
/chukwa/demux/mydemux.jar). I'm not sure why, but the FileStatus object 
returned by FileSystem.listStatus is returning a URL-based path instead 
of a filesystem-based path.

I kludged the Demux class' addParsers to strip the 
"hdfs://localhost:9000" portion of the string and now my class is found.

It's frustrating when stuff silently fails :) I even turned up the 
logging in Hadoop and Chukwa to TRACE and nothing was reported.

So, my question is, do I have something misconfigured that causes 
FileSystem.listStatus to return a URL-based path? Or does the code need 
to be changed?

Thanks,
Kirk

On 4/28/10 5:41 PM, Kirk True wrote:
> Hi all,
>
> Just for grins I copied the Java source byte-for-byte to the Chukwa 
> source folder and then ran:
>
>     ant clean main && cp build/*.jar .
>
>
> And it worked, as expected.
>
> When one adds custom demux classes to a JAR, sticks it in 
> hdfs://localhost:9000/chukwa/demux/mydemux.jar, is that JAR somehow 
> magically merged with chukwa-core-0.4.0.jar to produce "job.jar" or do 
> they remain separate?
>
> Thanks,
> Kirk
>
> On 4/28/10 5:09 PM, Kirk True wrote:
>> Hi Jerome,
>>
>> Yes, they're all using $JAVA_HOME which is 1.6.0_18.
>>
>> I did notice that the JAVA_PLATFORM environment variable in 
>> chukwa-env.sh was set to 32-bit while Hadoop was defaulting to 64-bit 
>> (this is a 64-bit machine), but setting that to Linux-amd64-64 didn't 
>> make any difference.
>>
>> Thanks,
>> Kirk
>>
>> On 4/28/10 4:00 PM, Jerome Boulon wrote:
>>> Are you using the same version of Java for your jar and Hadoop?
>>> /Jerome.
>>>
>>> On 4/28/10 3:33 PM, "Kirk True" <ki...@mustardgrain.com> wrote:
>>>
>>>     Hi Eric,
>>>
>>>     I added these to Hadoop's mapred-site.xml:
>>>
>>>
>>>     <property>
>>>     <name>keep.failed.task.files</name>
>>>     <value>true</value>
>>>     </property>
>>>     <property>
>>>     <name>mapred.job.tracker.persist.jobstatus.active</name>
>>>     <value>true</value>
>>>     </property>
>>>
>>>
>>>     This seems to have caused the task tracker directory to stick
>>>     around after the job is complete. So, for example, I have this
>>>     directory:
>>>
>>>
>>>     /tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001
>>>
>>>
>>>     Under this directory I have the following files:
>>>
>>>
>>>     jars/
>>>     job.jar
>>>     org/ . . .
>>>     job.xml
>>>
>>>     My Demux (XmlBasedDemux) doesn't appear in the job.jar or the
>>>     (apparently exploded job.jar) jars/org/... directory. However,
>>>     my demux JAR appears in three places in the job.xml:
>>>
>>>
>>>     <property>
>>>     <name>mapred.job.classpath.files</name>
>>>     <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>>>     </property>
>>>     <property>
>>>     <name>mapred.jar</name>
>>>     <value>/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001/jars/job.jar</value>
>>>     </property>
>>>     <property>
>>>     <name>mapred.cache.files</name>
>>>     <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>>>     </property>
>>>
>>>
>>>     So it looks like when Demux.addParsers calls
>>>     DistributedCache.addFileToClassPath it's working, since the above
>>>     job conf properties include my JAR.
>>>
>>>     Here's my JAR contents:
>>>
>>>
>>>     [kirk@skinner data-collection]$ unzip -l
>>>     data-collection-demux/target/data-collection-demux-0.1.jar
>>>     Archive:  data-collection-demux/target/data-collection-demux-0.1.jar
>>>       Length     Date   Time    Name
>>>      --------    ----   ----    ----
>>>             0  04-28-10 15:19   META-INF/
>>>           123  04-28-10 15:19   META-INF/MANIFEST.MF
>>>             0  04-28-10 15:19   org/
>>>             0  04-28-10 15:19   org/apache/
>>>             0  04-28-10 15:19   org/apache/hadoop/
>>>             0  04-28-10 15:19   org/apache/hadoop/chukwa/
>>>             0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/
>>>             0  04-28-10 15:19
>>>       org/apache/hadoop/chukwa/extraction/demux/
>>>             0  04-28-10 15:19
>>>       org/apache/hadoop/chukwa/extraction/demux/processor/
>>>             0  04-28-10 15:19
>>>       org/apache/hadoop/chukwa/extraction/demux/processor/mapper/
>>>          1697  04-28-10 15:19
>>>       org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux.class
>>>             0  04-28-10 15:19   META-INF/maven/
>>>             0  04-28-10 15:19
>>>       META-INF/maven/com.cisco.flip.datacollection/
>>>             0  04-28-10 15:19
>>>       META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/
>>>          1448  04-28-10 00:23
>>>       META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.xml
>>>           133  04-28-10 15:19
>>>       META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.properties
>>>      --------                   -------
>>>          3401                   16 files
>>>
>>>
>>>     Here's how I'm copying the JAR into HDFS:
>>>
>>>
>>>     hadoop fs -mkdir /chukwa/demux
>>>     hadoop fs -copyFromLocal /path/to/data-collection-demux-0.1.jar
>>>     /chukwa/demux
>>>
>>>     Any ideas of more things to try?
>>>
>>>     Thanks,
>>>     Kirk
>>>
>>>
>>>     On Wed, 28 Apr 2010 14:48 -0700, "Eric Yang"
>>>     <ey...@yahoo-inc.com> wrote:
>>>     > Kirk,
>>>     >
>>>     > The shell script and job related information are stored
>>>     temporarily in
>>>     > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0xx
>>>     > x/, while the job is running.
>>>     >
>>>     > You should go into the jars directory and find out if the
>>>     compressed jar
>>>     > contains your class file.
>>>     >
>>>     > Regards,
>>>     > Eric
>>>     >
>>>     > On 4/28/10 1:57 PM, "Kirk True" <ki...@mustardgrain.com> wrote:
>>>     >
>>>     > > Hi Eric,
>>>     > >
>>>     > > I updated MapProcessorFactory.getProcessor to dump the URLs
>>>     from the
>>>     > > URLClassLoader from the MapProcessorFactory.class. This is
>>>     what I see:
>>>     > >
>>>     > >
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/conf/
>>>     > > file:/home/kirk/bin/jdk1.6.0_18/lib/tools.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-cli-1.2.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-codec-1.3.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-el-1.0.jar
>>>     > >
>>>     file:/home/kirk/bin/hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-1.0.4.jar
>>>     > >
>>>     file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-net-1.4.1.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/core-3.1.1.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-compiler-5.5.12.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-runtime-5.5.12.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jets3t-0.6.1.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-6.1.14.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-util-6.1.14.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/junit-3.8.1.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/kfs-0.2.2.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/log4j-1.2.15.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/mockito-all-1.8.0.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/oro-2.0.8.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-api-1.4.3.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-log4j12-1.4.3.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/xmlenc-0.52.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-2.1.jar
>>>     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-api-2.1.jar
>>>     > >
>>>     file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/
>>>     > > attempt_201004281320_0001_m_000000_0/work/
>>>     > >
>>>     file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/
>>>     > > jars/classes
>>>     > >
>>>     file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/
>>>     > > jars/
>>>     > >
>>>     file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/
>>>     > > attempt_201004281320_0001_m_000000_0/work/
>>>     > >
>>>     > >
>>>     > > Is that the expected classpath? I don't see any reference to
>>>     my JAR or the
>>>     > > Chukwa JARs.
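
For reference, a dump like the one quoted above can be produced with something
along these lines (a sketch only; the actual change made to
MapProcessorFactory.getProcessor may differ):

    // Print every URL that the class loader of MapProcessorFactory will search.
    ClassLoader cl = MapProcessorFactory.class.getClassLoader();
    if (cl instanceof java.net.URLClassLoader) {
      for (java.net.URL url : ((java.net.URLClassLoader) cl).getURLs()) {
        System.out.println(url);
      }
    }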
>>>     > >
>>>     > > Also, when I try to view the contents of my
>>>     "job_<timestamp>_0001" directory,
>>>     > > it's automatically removed, so I can't really do any
>>>     forensics after the fact.
>>>     > > I know this is probably a Hadoop question, but is it possible to
>>>     prevent that
>>>     > > auto-removal from occurring?
>>>     > >
>>>     > > Thanks,
>>>     > > Kirk
>>>     > >
>>>     > > On Wed, 28 Apr 2010 13:16 -0700, "Kirk True"
>>>     <ki...@mustardgrain.com> wrote:
>>>     > >> Hi Eric,
>>>     > >>
>>>     > >> On 4/28/10 10:23 AM, Eric Yang wrote:
>>>     > >>> Hi Kirk,
>>>     > >>>
>>>     > >>> Is the ownership of the jar file set up correctly as the
>>>     user that runs
>>>     > >>> demux?
>>>     > >>
>>>     > >> When browsing via the NameNode web UI, it lists permissions of
>>>     > >> "rw-r--r--" and "kirk" as the owner (which is also the user
>>>     ID running
>>>     > >> the Hadoop and Chukwa processes).
>>>     > >>
>>>     > >>>    You may find more information by looking at running
>>>     mapper task or
>>>     > >>> reducer task, and try to find out the task attempt shell
>>>     script.
>>>     > >>
>>>     > >> Where is the task attempt shell script located?
>>>     > >>
>>>     > >>>    Make sure
>>>     > >>> the files are downloaded correctly from distributed cache,
>>>     and referenced in
>>>     > >>> the locally generated jar file.  Hope this helps.
>>>     > >>>
>>>     > >>
>>>     > >> Sorry for asking such basic questions, but where is the locally
>>>     > >> generated JAR file found? I'm assuming under
>>>     /tmp/hadoop-<user>, by
>>>     > >> default? I saw one file named job_<timestamp>.jar but it
>>>     appeared to be a
>>>     > >> byte-for-byte copy of chukwa-core-0.4.0.jar, i.e. my
>>>     "XmlBasedDemux"
>>>     > >> class was nowhere to be found.
>>>     > >>
>>>     > >> Thanks,
>>>     > >> Kirk
>>>     > >>
>>>     > >>> Regards,
>>>     > >>> Eric
>>>     > >>>
>>>     > >>> On 4/28/10 9:37 AM, "Kirk True"<ki...@mustardgrain.com>  wrote:
>>>     > >>>
>>>     > >>>
>>>     > >>>> Hi guys,
>>>     > >>>>
>>>     > >>>> I have a custom Demux that I need to run to process my
>>>     input, but I'm
>>>     > >>>> getting
>>>     > >>>> ClassNotFoundException when running in Hadoop. This is
>>>     with the released
>>>     > >>>> 0.4.0
>>>     > >>>> build.
>>>     > >>>>
>>>     > >>>> I've done the following:
>>>     > >>>>
>>>     > >>>> 1. I put my Demux class in the correct package
>>>     > >>>> (org.apache.hadoop.chukwa.extraction.demux.processor.mapper)
>>>     > >>>> 2. I've added the JAR containing the Demux implementation
>>>     to HDFS at
>>>     > >>>> /chukwa/demux
>>>     > >>>> 3. I've added an alias to it in chukwa-demux-conf.xml
>>>     > >>>>
>>>     > >>>> The map/reduce job is picking up on the fact that I have a
>>>     custom Demux and
>>>     > >>>> is
>>>     > >>>> trying to load it, but I get a ClassNotFoundException. The
>>>     HDFS-based URL
>>>     > >>>> to
>>>     > >>>> the JAR is showing up in the job configuration in Hadoop,
>>>     which is further
>>>     > >>>> evidence that Chukwa and Hadoop know where the JAR lives
>>>     and that it's part
>>>     > >>>> of
>>>     > >>>> the Chukwa-initiated job.
>>>     > >>>>
>>>     > >>>> My Demux is very simple. I've stripped it down to a
>>>     System.out.println with
>>>     > >>>> no dependencies on classes/JARs other than Chukwa,
>>>     Hadoop, and the
>>>     > >>>> core
>>>     > >>>> JDK. I've double-checked that my JAR is being built up
>>>     correctly. I'm
>>>     > >>>> completely flummoxed as to what I'm doing wrong.
>>>     > >>>>
>>>     > >>>> Any ideas what I'm missing? What other information can I
>>>     provide?
>>>     > >>>>
>>>     > >>>> Thanks!
>>>     > >>>> Kirk
>>>     > >>>>
>>>     > >>>>
>>>     > >>>
>>>     > >>
>>>     > >
>>>     > >
>>>     >
>>>     >
>>>
>>>

Re: Chukwa can't find Demux class - POSSIBLE FIX

Posted by Kirk True <ki...@mustardgrain.com>.
Hi Eric,

I've added CHUKWA-488 to track this issue. I hope to have a patch by EOD 
that fixes it (for me). If needed, it can be cleaned up before committing.
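
Roughly, the idea is to filter the fs.default.name prefix off the JAR path
before handing it to the DistributedCache API. A sketch of that approach
(illustrative only, not the actual CHUKWA-488 patch; here conf and status
stand for the job Configuration and the parser JAR's FileStatus):

    String fsDefaultName = conf.get("fs.default.name", "");  // e.g. hdfs://host.example.com:9000
    String jar = status.getPath().toString();
    if (fsDefaultName.length() > 0 && jar.startsWith(fsDefaultName)) {
      jar = jar.substring(fsDefaultName.length());  // -> /chukwa/demux/mydemux.jar
    }
    DistributedCache.addFileToClassPath(new Path(jar), conf);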

Thanks,
Kirk

On 5/19/10 6:51 PM, Eric Yang wrote:
> Kirk,
>
> Yes, it should be trivial to filter fs.default.name in Demux.java.  Please
> file a jira.  Thanks
>
> Regards,
> Eric
>
> On 5/19/10 6:01 PM, "Kirk True"<ki...@mustardgrain.com>  wrote:
>
>    
>> Hi Eric,
>>
>> On 4/29/10 9:55 AM, Eric Yang wrote:
>>      
>>> Kirk,
>>>
>>> Is your tasktracker node on the same machine?  If it's referring to
>>> hdfs://localhost:9000, it means that your tasktracker will attempt to
>>> contact localhost as the namenode.  Make sure your fs.default.name is
>>> configured as your real hostname instead of localhost to prevent
>>> unexpected corner cases similar to this one.
>>>
>>>        
>> I grabbed the latest from SVN, and still see this problem :( I'm no
>> longer specifying the HDFS URL in Chukwa as CHUKWA-460 no longer
>> requires it. I updated the $HADOOP_HOME/conf/core-site.xml to specify
>> the actual host name (both full and short forms), and it still leaves the
>> "hdfs://host.example.com" prefix in the classpath properties that Hadoop
>> is using. According to "Pro Hadoop" (as mentioned previously in this
>> email thread), the DistributedCache API wants the Path object to be
>> "/chukwa/demux/mydemux.jar", not
>> "hdfs://host.example.com:9000/chukwa/demux/mydemux.jar".
>>
>> Would it be possible to (somehow) grab the value of the
>> "fs.default.name" property in Demux.java and strip it off the path
>> before calling the DistributedCache API?
>>
>> Thanks,
>> Kirk

Re: Chukwa can't find Demux class - POSSIBLE FIX

Posted by Eric Yang <ey...@yahoo-inc.com>.
Kirk,

Yes, it should be trivial to filter fs.default.name in Demux.java.  Please
file a jira.  Thanks

Regards,
Eric

On 5/19/10 6:01 PM, "Kirk True" <ki...@mustardgrain.com> wrote:

> Hi Eric,
> 
> On 4/29/10 9:55 AM, Eric Yang wrote:
>> Kirk,
>> 
>> Is your tasktracker node on the same machine?  If it's referring to
>> hdfs://localhost:9000, it means that your tasktracker will attempt to
>> contact localhost as the namenode.  Make sure your fs.default.name is
>> configured as your real hostname instead of localhost to prevent
>> unexpected corner cases similar to this one.
>>    
> 
> I grabbed the latest from SVN, and still see this problem :( I'm no
> longer specifying the HDFS URL in Chukwa as CHUKWA-460 no longer
> requires it. I updated the $HADOOP_HOME/conf/core-site.xml to specify
> the actual host name (both full and short forms), and it still leaves the
> "hdfs://host.example.com" prefix in the classpath properties that Hadoop
> is using. According to "Pro Hadoop" (as mentioned previously in this
> email thread), the DistributedCache API wants the Path object to be
> "/chukwa/demux/mydemux.jar", not
> "hdfs://host.example.com:9000/chukwa/demux/mydemux.jar".
> 
> Would it be possible to (somehow) grab the value of the
> "fs.default.name" property in Demux.java and strip it off the path
> before calling the DistributedCache API?
> 
> Thanks,
> Kirk


Re: Chukwa can't find Demux class - POSSIBLE FIX

Posted by Kirk True <ki...@mustardgrain.com>.
Hi Eric,

On 4/29/10 9:55 AM, Eric Yang wrote:
> Kirk,
>
> Is your tasktracker node on the same machine?  If it's referring to
> hdfs://localhost:9000, it means that your tasktracker will attempt to
> contact localhost as the namenode.  Make sure your fs.default.name is
> configured as your real hostname instead of localhost to prevent
> unexpected corner cases similar to this one.
>    

I grabbed the latest from SVN, and still see this problem :( I'm no 
longer specifying the HDFS URL in Chukwa as CHUKWA-460 no longer 
requires it. I updated the $HADOOP_HOME/conf/core-site.xml to specify 
the actual host name (both full and short forms), and it still leaves the 
"hdfs://host.example.com" prefix in the classpath properties that Hadoop 
is using. According to "Pro Hadoop" (as mentioned previously in this 
email thread), the DistributedCache API wants the Path object to be 
"/chukwa/demux/mydemux.jar", not 
"hdfs://host.example.com:9000/chukwa/demux/mydemux.jar".

Would it be possible to (somehow) grab the value of the 
"fs.default.name" property in Demux.java and strip it off the path 
before calling the DistributedCache API?

Thanks,
Kirk

> Regards,
> Eric
>
> On 4/29/10 9:46 AM, "Eric Yang"<ey...@yahoo-inc.com>  wrote:
>
>    
>> Kirk,
>>
>> On my system, it is returning /chukwa/demux/parsers.jar as URL.  I think
>> it¹s best to fix this in the code level.  Please file a jira, and I will
>> take care of this.  Thanks.
>>
>> Regards,
>> Eric
>>
>> On 4/28/10 6:50 PM, "Kirk True"<ki...@mustardgrain.com>  wrote:
>>
>>      
>>> Hi Eric,
>>>
>>> If I grep "hdfs://" in $CHUKWA_HOME/conf, the string shows up in two places:
>>> one is in the README and the other is in chukwa-collector-conf.xml for the
>>> writer.hdfs.filesystem property. I didn't change this file, so that should be
>>> the default. chukwa-common.xml's chukwa.data.dir is still just "/chukwa".
>>>
>>> Thanks,
>>> Kirk
>>>
>>> On 4/28/10 6:34 PM, Eric Yang wrote:
>>>        
>>>>   Re: Chukwa can't find Demux class - POSSIBLE FIX Hi Kirk,
>>>>
>>>> Check chukwa-common.xml and make sure that chukwa.data.dir does not have
>>>> hdfs://localhost:9000 pre-append to it.  It¹s best to leave namenode address
>>>> out of this path for portability.
>>>>
>>>> Regards,
>>>> Eric
>>>>
>>>>
>>>> On 4/28/10 6:19 PM, "Kirk True"<ki...@mustardgrain.com>  wrote:
>>>>
>>>>
>>>>          
>>>>> Hi all,
>>>>>
>>>>> The problem seems to stem from the fact that the call to
>>>>> DistributedCache.addFileToClassPath is passing in a Path that is in URI
>>>>> form, i.e. hdfs://localhost:9000/chukwa/demux/mydemux.jar whereas the
>>>>> DistributedCache API expects it to be a filesystem-based path (i.e.
>>>>> /chukwa/demux/mydemux.jar). I'm not sure why, but the FileStatus object
>>>>> returned by FileSystem.listStatus is returning a URL-based path instead of
>>>>> a
>>>>> filesystem-based path.
>>>>>
>>>>> I kludged the Demux class' addParsers to strip the "hdfs://localhost:9000"
>>>>> portion of the string and now my class is found.
>>>>>
>>>>> It's frustrating when stuff silently fails :) I even turned up the logging
>>>>> in Hadoop and Chukwa to TRACE and nothing was reported.
>>>>>
>>>>> So, my question is, do I have something misconfigured that causes
>>>>> FileSystem.listStatus to return a URL-based path? Or does the code need to
>>>>> be changed?
>>>>>
>>>>> Thanks,
>>>>> Kirk
>>>>>
>>>>> On 4/28/10 5:41 PM, Kirk True wrote:
>>>>>
>>>>>            
>>>>>>   Hi all,
>>>>>>
>>>>>> Just for grins I copied the Java source byte-for-byte to the Chukwa source
>>>>>> folder and then ran:
>>>>>>
>>>>>>
>>>>>>
>>>>>>              
>>>>>>> ant clean main&&  cp build/*.jar .
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>>
>>>>>> And it worked, as expected.
>>>>>>
>>>>>> When one adds custom demux classes to a JAR, sticks it in
>>>>>> hdfs://localhost:9000/chukwa/demux/mydemux.jar, is that JAR somehow
>>>>>> magically merged with chukwa-core-0.4.0.jar to produce "job.jar" or do
>>>>>> they
>>>>>> remain separate?
>>>>>>
>>>>>> Thanks,
>>>>>> Kirk
>>>>>>
>>>>>> On 4/28/10 5:09 PM, Kirk True wrote:
>>>>>>
>>>>>>              
>>>>>>>    Hi Jerome,
>>>>>>>
>>>>>>> Yes, they're all using $JAVA_HOME which is 1.6.0_18.
>>>>>>>
>>>>>>> I did notice that the JAVA_PLATFORM environment variable in chukwa-env.sh
>>>>>>> was set to 32-bit but Hadoop was defaulting to 64-bit (this is a 64-bit
>>>>>>> machine), but setting that to Linux-amd64-64 didn't make any difference.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Kirk
>>>>>>>
>>>>>>> On 4/28/10 4:00 PM, Jerome Boulon wrote:
>>>>>>>
>>>>>>>                
>>>>>>>>   Re: Chukwa can't find Demux class Are you using the same version of
>>>>>>>> Java
>>>>>>>> for your jar and Hadoop?
>>>>>>>> /Jerome.
>>>>>>>>
>>>>>>>> On 4/28/10 3:33 PM, "Kirk True"<ki...@mustardgrain.com>  wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                  
>>>>>>>>> Hi Eric,
>>>>>>>>>
>>>>>>>>> I added these to Hadoop's mapred-site.xml:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>    <property>
>>>>>>>>>          <name>keep.failed.task.files</name>
>>>>>>>>>          <value>true</value>
>>>>>>>>>    </property>
>>>>>>>>>    <property>
>>>>>>>>>          <name>mapred.job.tracker.persist.jobstatus.active</name>
>>>>>>>>>          <value>true</value>
>>>>>>>>>    </property>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> This seems to have caused the task tracker directory to stick around
>>>>>>>>> after the job is complete. So, for example, I have this directory:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>> /tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Under this directory I have the following files:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> jars/
>>>>>>>>> job.jar
>>>>>>>>> org/ . . .
>>>>>>>>> job.xml
>>>>>>>>>
>>>>>>>>> My Demux (XmlBasedDemux) doesn't appear in the job.jar or the
>>>>>>>>> (apparently exploded job.jar) jars/org/... directory. However, my demux
>>>>>>>>> JAR appears in three places in the job.xml:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>   <property>
>>>>>>>>>      <name>mapred.job.classpath.files</name>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>> <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>>>>>>>>> </property>
>>>>>>>>> <property>
>>>>>>>>>      <name>mapred.jar</name>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>> <value>/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001/jars/job.jar</value>
>>>>>>>>> </property>
>>>>>>>>> <property>
>>>>>>>>>      <name>mapred.cache.files</name>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>> <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>>>>>>>>> </property>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So it looks like when Demux.addParsers calls
>>>>>>>>> DistributedCache.addFileToClassPath it's working as the above job conf
>>>>>>>>> properties include my JAR.
>>>>>>>>>
>>>>>>>>> Here's my JAR contents:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>   [kirk@skinner data-collection]$ unzip -l
>>>>>>>>> data-collection-demux/target/data-collection-demux-0.1.jar
>>>>>>>>> Archive:  data-collection-demux/target/data-collection-demux-0.1.jar
>>>>>>>>>    Length     Date   Time    Name
>>>>>>>>>   --------    ----   ----    ----
>>>>>>>>>          0  04-28-10 15:19   META-INF/
>>>>>>>>>        123  04-28-10 15:19   META-INF/MANIFEST.MF
>>>>>>>>>          0  04-28-10 15:19   org/
>>>>>>>>>          0  04-28-10 15:19   org/apache/
>>>>>>>>>          0  04-28-10 15:19   org/apache/hadoop/
>>>>>>>>>          0  04-28-10 15:19   org/apache/hadoop/chukwa/
>>>>>>>>>          0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/
>>>>>>>>>          0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/
>>>>>>>>>          0  04-28-10 15:19
>>>>>>>>> org/apache/hadoop/chukwa/extraction/demux/processor/
>>>>>>>>>          0  04-28-10 15:19
>>>>>>>>> org/apache/hadoop/chukwa/extraction/demux/processor/mapper/
>>>>>>>>>       1697  04-28-10 15:19
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>> org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux.class
>>>>>>>>>          0  04-28-10 15:19   META-INF/maven/
>>>>>>>>>          0  04-28-10 15:19
>>>>>>>>> META-INF/maven/com.cisco.flip.datacollection/
>>>>>>>>>          0  04-28-10 15:19
>>>>>>>>> META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/
>>>>>>>>>       1448  04-28-10 00:23
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>> META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.xml
>>>>>>>>>        133  04-28-10 15:19
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>> META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.properties
>>>>>>>>>   --------                   -------
>>>>>>>>>       3401                   16 files
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here's how I'm copying the JAR into HDFS:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>   hadoop fs -mkdir /chukwa/demux
>>>>>>>>> hadoop fs -copyFromLocal /path/to/data-collection-demux-0.1.jar
>>>>>>>>> /chukwa/demux
>>>>>>>>>
>>>>>>>>> Any ideas of more things to try?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Kirk
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Wed, 28 Apr 2010 14:48 -0700, "Eric Yang"<ey...@yahoo-inc.com>
>>>>>>>>> wrote:
>>>>>>>>>                    
>>>>>>>>>> Kirk,
>>>>>>>>>>
>>>>>>>>>> The shell script and job related information are stored temporarily in
>>>>>>>>>>
>>>>>>>>>>                      
>>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0xxx/, while the job is running.
>>>>>>>>>>
>>>>>>>>>> You should go into the jars directory and find out if the compressed
>>>>>>>>>> jar
>>>>>>>>>> contains your class file.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Eric
>>>>>>>>>>
>>>>>>>>>> On 4/28/10 1:57 PM, "Kirk True"<ki...@mustardgrain.com>  wrote:
>>>>>>>>>>
>>>>>>>>>>                      
>>>>>>>>>>> Hi Eric,
>>>>>>>>>>>
>>>>>>>>>>> I updated MapProcessorFactory.getProcessor to dump the URLs from the
>>>>>>>>>>> URLClassLoader from the MapProcessorFactory.class. This is what I
>>>>>>>>>>> see:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/conf/
>>>>>>>>>>> file:/home/kirk/bin/jdk1.6.0_18/lib/tools.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-cli-1.2.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-codec-1.3.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-el-1.0.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-1.0.4.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-net-1.4.1.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/core-3.1.1.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-compiler-5.5.12.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-runtime-5.5.12.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jets3t-0.6.1.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-6.1.14.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-util-6.1.14.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/junit-3.8.1.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/kfs-0.2.2.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/log4j-1.2.15.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/mockito-all-1.8.0.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/oro-2.0.8.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-api-1.4.3.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-log4j12-1.4.3.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/xmlenc-0.52.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-2.1.jar
>>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-api-2.1.jar
>>>>>>>>>>>
>>>>>>>>>>>                        
>>>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
>>>>>>>>>>>
>>>>>>>>>>>                        
>>>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/classes
>>>>>>>>>>>
>>>>>>>>>>>                        
>>>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/
>>>>>>>>>>>
>>>>>>>>>>>                        
>>>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Is that the expected classpath? I don't see any reference to my JAR
>>>>>>>>>>> or
>>>>>>>>>>> the
>>>>>>>>>>> Chukwa JARs.
>>>>>>>>>>>
>>>>>>>>>>> Also, when I try to view the contents of my "job_<timestamp>_0001"
>>>>>>>>>>> directory,
>>>>>>>>>>> it's automatically removed, so I can't really do any forensics after
>>>>>>>>>>> the fact.
>>>>>>>>>>> I know this is probably a Hadoop question, is it possible to prevent
>>>>>>>>>>> that
>>>>>>>>>>> auto-removal from occurring?
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Kirk
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 28 Apr 2010 13:16 -0700, "Kirk True"<ki...@mustardgrain.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>                        
>>>>>>>>>>>> Hi Eric,
>>>>>>>>>>>>
>>>>>>>>>>>> On 4/28/10 10:23 AM, Eric Yang wrote:
>>>>>>>>>>>>                          
>>>>>>>>>>>>> Hi Kirk,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is the ownership of the jar file setup correctly as the user that
>>>>>>>>>>>>> runs
>>>>>>>>>>>>> demux?
>>>>>>>>>>>>>                            
>>>>>>>>>>>> When browsing via the NameNode web UI, it lists permissions of
>>>>>>>>>>>> "rw-r--r--" and "kirk" as the owner (which is also the user ID
>>>>>>>>>>>> running
>>>>>>>>>>>> the Hadoop and Chukwa processes).
>>>>>>>>>>>>
>>>>>>>>>>>>                          
>>>>>>>>>>>>>     You may find more information by looking at running mapper task
>>>>>>>>>>>>> or
>>>>>>>>>>>>> reducer task, and try to find out the task attempt shell script.
>>>>>>>>>>>>>                            
>>>>>>>>>>>> Where is the task attempt shell script located?
>>>>>>>>>>>>
>>>>>>>>>>>>                          
>>>>>>>>>>>>>     Make sure
>>>>>>>>>>>>> the files are downloaded correctly from distributed cache, and
>>>>>>>>>>>>> referenced in
>>>>>>>>>>>>> the locally generated jar file.  Hope this helps.
>>>>>>>>>>>>>
>>>>>>>>>>>>>                            
>>>>>>>>>>>> Sorry for asking such basic questions, but where is the locally
>>>>>>>>>>>> generated JAR file found? I'm assuming under /tmp/hadoop-<user>, by
>>>>>>>>>>>> default? I saw one file named job_<timstamp>.jar but it appeared to
>>>>>>>>>>>> be a
>>>>>>>>>>>> byte-for-byte copy of chukwa-core-0.4.0.jar, i.e. my "XmlBasedDemux"
>>>>>>>>>>>> class was nowhere to be found.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Kirk
>>>>>>>>>>>>
>>>>>>>>>>>>                          
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Eric
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 4/28/10 9:37 AM, "Kirk True"<ki...@mustardgrain.com>   wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have a custom Demux that I need to run to process my input, but
>>>>>>>>>>>>> I'm
>>>>>>>>>>>>> getting
>>>>>>>>>>>>> ClassNotFoundException when running in Hadoop. This is with the
>>>>>>>>>>>>> released
>>>>>>>>>>>>> 0.4.0
>>>>>>>>>>>>> build.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I've done the following:
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. I put my Demux class in the correct package
>>>>>>>>>>>>> (org.apache.hadoop.chukwa.extraction.demux.processor.mapper)
>>>>>>>>>>>>> 2. I've added the JAR containing the Demux implementation to HDFS
>>>>>>>>>>>>> at
>>>>>>>>>>>>> /chuka/demux
>>>>>>>>>>>>> 3. I've added an alias to it in chukwa-demux-conf.xml
>>>>>>>>>>>>>
>>>>>>>>>>>>> The map/reduce job is picking up on the fact that I have a custom
>>>>>>>>>>>>> Demux and
>>>>>>>>>>>>> is
>>>>>>>>>>>>> trying to load it, but I get a ClassNotFoundException. The
>>>>>>>>>>>>> HDFS-based URL
>>>>>>>>>>>>> to
>>>>>>>>>>>>> the JAR is showing up in the job configuration in Hadoop, which is
>>>>>>>>>>>>> another
>>>>>>>>>>>>> evidence that Chukwa and Hadoop know where the JAR lives and that
>>>>>>>>>>>>> it's part
>>>>>>>>>>>>> of
>>>>>>>>>>>>> the Chukwa-initiated job.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My Demux is very simple. I've stripped it down to a
>>>>>>>>>>>>> System.out.println with
>>>>>>>>>>>>> dependencies on no other classes/JARs other than Chukwa, Hadoop,
>>>>>>>>>>>>> and the
>>>>>>>>>>>>> core
>>>>>>>>>>>>> JDK. I've double-checked that my JAR is being built up correctly.
>>>>>>>>>>>>> I'm
>>>>>>>>>>>>> completely flummoxed as to what I'm doing wrong.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Any ideas what I'm missing? What other information can I provide?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> Kirk
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>                            
>>>>>>>>>>>>                          
>>>>>>>>>>>
>>>>>>>>>>>                        
>>>>>>>>>>
>>>>>>>>>>                      
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                    
>>>>>>>>
>>>>>>>>
>>>>>>>>                  
>>>>>>>
>>>>>>>
>>>>>>>                
>>>>>>
>>>>>>              
>>>>>
>>>>>
>>>>>            
>>>        
>>      
>    

Re: Chukwa can't find Demux class - POSSIBLE FIX

Posted by Eric Yang <ey...@yahoo-inc.com>.
Kirk,

Is your tasktracker node on the same machine?  If it's referring to
hdfs://localhost:9000, it means that your tasktracker will attempt to
contact localhost as the namenode.  Make sure your fs.default.name is
configured with your real hostname instead of localhost to prevent
unexpected corner cases like this one.
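
For example, something along these lines in Hadoop's core-site.xml (the
hostname below is only a placeholder for whatever your namenode machine is
actually called):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>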

Regards,
Eric

On 4/29/10 9:46 AM, "Eric Yang" <ey...@yahoo-inc.com> wrote:

> Kirk,
> 
> On my system, it is returning /chukwa/demux/parsers.jar as the URL.  I think
> it's best to fix this at the code level.  Please file a jira, and I will
> take care of this.  Thanks.
> 
> Regards,
> Eric
> 
> On 4/28/10 6:50 PM, "Kirk True" <ki...@mustardgrain.com> wrote:
> 
>> Hi Eric,
>> 
>> If I grep "hdfs://" in $CHUKWA_HOME/conf, the string shows up in two places:
>> one is in the README and the other is in chukwa-collector-conf.xml for the
>> writer.hdfs.filesystem property. I didn't change this file, so that should be
>> the default. chukwa-common.xml's chukwa.data.dir is still just "/chukwa".
>> 
>> Thanks,
>> Kirk
>> 
>> On 4/28/10 6:34 PM, Eric Yang wrote:
>>>  Hi Kirk,
>>>  
>>> Check chukwa-common.xml and make sure that chukwa.data.dir does not have
>>> hdfs://localhost:9000 prepended to it.  It's best to leave the namenode address
>>> out of this path for portability.
>>>  
>>> Regards,
>>> Eric
>>>  
>>>  
>>> On 4/28/10 6:19 PM, "Kirk True" <ki...@mustardgrain.com> wrote:
>>>  
>>>   
>>>> Hi all,
>>>>  
>>>> The problem seems to stem from the fact that the call to
>>>> DistributedCache.addFileToClassPath is passing in a Path that is in URI
>>>> form, i.e. hdfs://localhost:9000/chukwa/demux/mydemux.jar whereas the
>>>> DistributedCache API expects it to be a filesystem-based path (i.e.
>>>> /chukwa/demux/mydemux.jar). I'm not sure why, but the FileStatus object
>>>> returned by FileSystem.listStatus is returning a URL-based path instead of
>>>> a
>>>> filesystem-based path.
>>>>  
>>>> I kludged the Demux class' addParsers to strip the "hdfs://localhost:9000"
>>>> portion of the string and now my class is found.
>>>>  
>>>> It's frustrating when stuff silently fails :) I even turned up the logging
>>>> in Hadoop and Chukwa to TRACE and nothing was reported.
>>>>  
>>>> So, my question is, do I have something misconfigured that causes
>>>> FileSystem.listStatus to return a URL-based path? Or does the code need to
>>>> be changed?
>>>>  
>>>> Thanks,
>>>> Kirk
>>>>  
>>>> On 4/28/10 5:41 PM, Kirk True wrote:
>>>>   
>>>>>  Hi all,
>>>>>  
>>>>> Just for grins I copied the Java source byte-for-byte to the Chukwa source
>>>>> folder and then ran:
>>>>>  
>>>>>  
>>>>>   
>>>>>> ant clean main && cp build/*.jar .
>>>>>>  
>>>>>>  
>>>>>   
>>>>> And it worked, as expected.
>>>>>  
>>>>> When one adds custom demux classes to a JAR, sticks it in
>>>>> hdfs://localhost:9000/chukwa/demux/mydemux.jar, is that JAR somehow
>>>>> magically merged with chukwa-core-0.4.0.jar to produce "job.jar" or do
>>>>> they
>>>>> remain separate?
>>>>>  
>>>>> Thanks,
>>>>> Kirk
>>>>>  
>>>>> On 4/28/10 5:09 PM, Kirk True wrote:
>>>>>   
>>>>>>   Hi Jerome,
>>>>>>  
>>>>>> Yes, they're all using $JAVA_HOME which is 1.6.0_18.
>>>>>>  
>>>>>> I did notice that the JAVA_PLATFORM environment variable in chukwa-env.sh
>>>>>> was set to 32-bit but Hadoop was defaulting to 64-bit (this is a 64-bit
>>>>>> machine), but setting that to Linux-amd64-64 didn't make any difference.
>>>>>>  
>>>>>> Thanks,
>>>>>> Kirk
>>>>>>  
>>>>>> On 4/28/10 4:00 PM, Jerome Boulon wrote:
>>>>>>   
>>>>>>> Are you using the same version of Java for your jar and Hadoop?
>>>>>>> /Jerome.
>>>>>>>  
>>>>>>> On 4/28/10 3:33 PM, "Kirk True" <ki...@mustardgrain.com> wrote:
>>>>>>>  
>>>>>>>   
>>>>>>>   
>>>>>>>> Hi Eric,
>>>>>>>>  
>>>>>>>> I added these to Hadoop's mapred-site.xml:
>>>>>>>>  
>>>>>>>>  
>>>>>>>>   <property>
>>>>>>>>         <name>keep.failed.task.files</name>
>>>>>>>>         <value>true</value>
>>>>>>>>   </property>
>>>>>>>>   <property>
>>>>>>>>         <name>mapred.job.tracker.persist.jobstatus.active</name>
>>>>>>>>         <value>true</value>
>>>>>>>>   </property>
>>>>>>>>   
>>>>>>>>  
>>>>>>>> This seems to have caused the task tracker directory to stick around
>>>>>>>> after the job is complete. So, for example, I have this directory:
>>>>>>>>  
>>>>>>>>  
>>>>>>>> 
>>>>>>>> /tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001
>>>>>>>>  
>>>>>>>>  
>>>>>>>> Under this directory I have the following files:
>>>>>>>>  
>>>>>>>>  
>>>>>>>> jars/
>>>>>>>> job.jar
>>>>>>>> org/ . . .
>>>>>>>> job.xml
>>>>>>>>  
>>>>>>>> My Demux (XmlBasedDemux) doesn't appear in the job.jar or the
>>>>>>>> (apparently exploded job.jar) jars/org/... directory. However, my demux
>>>>>>>> JAR appears in three places in the job.xml:
>>>>>>>>  
>>>>>>>>  
>>>>>>>>  <property>
>>>>>>>>     <name>mapred.job.classpath.files</name>
>>>>>>>>     
>>>>>>>> 
>>>>>>>> <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>>>>>>>> </property>
>>>>>>>> <property>
>>>>>>>>     <name>mapred.jar</name>
>>>>>>>>     
>>>>>>>> 
>>>>>>>> <value>/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001/jars/job.jar</value>
>>>>>>>> </property>
>>>>>>>> <property>
>>>>>>>>     <name>mapred.cache.files</name>
>>>>>>>>     
>>>>>>>> 
>>>>>>>> <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>>>>>>>> </property>
>>>>>>>>   
>>>>>>>>  
>>>>>>>> So it looks like when Demux.addParsers calls
>>>>>>>> DistributedCache.addFileToClassPath it's working as the above job conf
>>>>>>>> properties include my JAR.
>>>>>>>>  
>>>>>>>> Here's my JAR contents:
>>>>>>>>  
>>>>>>>>  
>>>>>>>>  [kirk@skinner data-collection]$ unzip -l
>>>>>>>> data-collection-demux/target/data-collection-demux-0.1.jar
>>>>>>>> Archive:  data-collection-demux/target/data-collection-demux-0.1.jar
>>>>>>>>   Length     Date   Time    Name
>>>>>>>>  --------    ----   ----    ----
>>>>>>>>         0  04-28-10 15:19   META-INF/
>>>>>>>>       123  04-28-10 15:19   META-INF/MANIFEST.MF
>>>>>>>>         0  04-28-10 15:19   org/
>>>>>>>>         0  04-28-10 15:19   org/apache/
>>>>>>>>         0  04-28-10 15:19   org/apache/hadoop/
>>>>>>>>         0  04-28-10 15:19   org/apache/hadoop/chukwa/
>>>>>>>>         0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/
>>>>>>>>         0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/
>>>>>>>>         0  04-28-10 15:19
>>>>>>>> org/apache/hadoop/chukwa/extraction/demux/processor/
>>>>>>>>         0  04-28-10 15:19
>>>>>>>> org/apache/hadoop/chukwa/extraction/demux/processor/mapper/
>>>>>>>>      1697  04-28-10 15:19
>>>>>>>> 
>>>>>>>> org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux.class
>>>>>>>>         0  04-28-10 15:19   META-INF/maven/
>>>>>>>>         0  04-28-10 15:19
>>>>>>>> META-INF/maven/com.cisco.flip.datacollection/
>>>>>>>>         0  04-28-10 15:19
>>>>>>>> META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/
>>>>>>>>      1448  04-28-10 00:23
>>>>>>>> 
>>>>>>>> META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.xml
>>>>>>>>       133  04-28-10 15:19
>>>>>>>> 
>>>>>>>> META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.properties
>>>>>>>>  --------                   -------
>>>>>>>>      3401                   16 files
>>>>>>>>   
>>>>>>>>  
>>>>>>>> Here's how I'm copying the JAR into HDFS:
>>>>>>>>  
>>>>>>>>  
>>>>>>>>  hadoop fs -mkdir /chukwa/demux
>>>>>>>> hadoop fs -copyFromLocal /path/to/data-collection-demux-0.1.jar
>>>>>>>> /chukwa/demux
>>>>>>>>   
>>>>>>>> Any ideas of more things to try?
>>>>>>>>  
>>>>>>>> Thanks,
>>>>>>>> Kirk
>>>>>>>>  
>>>>>>>>  
>>>>>>>> On Wed, 28 Apr 2010 14:48 -0700, "Eric Yang" <ey...@yahoo-inc.com>
>>>>>>>> wrote:
>>>>>>>>> Kirk,
>>>>>>>>> 
>>>>>>>>> The shell script and job related information are stored temporarily in
>>>>>>>>> 
>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0xxx/, while the job is running.
>>>>>>>>> 
>>>>>>>>> You should go into the jars directory and find out if the compressed
>>>>>>>>> jar
>>>>>>>>> contains your class file.
>>>>>>>>> 
>>>>>>>>> Regards,
>>>>>>>>> Eric
>>>>>>>>> 
>>>>>>>>> On 4/28/10 1:57 PM, "Kirk True" <ki...@mustardgrain.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Eric,
>>>>>>>>>> 
>>>>>>>>>> I updated MapProcessorFactory.getProcessor to dump the URLs from the
>>>>>>>>>> URLClassLoader from the MapProcessorFactory.class. This is what I
>>>>>>>>>> see:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/conf/
>>>>>>>>>> file:/home/kirk/bin/jdk1.6.0_18/lib/tools.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-cli-1.2.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-codec-1.3.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-el-1.0.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-1.0.4.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-net-1.4.1.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/core-3.1.1.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-compiler-5.5.12.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-runtime-5.5.12.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jets3t-0.6.1.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-6.1.14.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-util-6.1.14.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/junit-3.8.1.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/kfs-0.2.2.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/log4j-1.2.15.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/mockito-all-1.8.0.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/oro-2.0.8.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-api-1.4.3.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-log4j12-1.4.3.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/xmlenc-0.52.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-2.1.jar
>>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-api-2.1.jar
>>>>>>>>>> 
>>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
>>>>>>>>>> 
>>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/classes
>>>>>>>>>> 
>>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/
>>>>>>>>>> 
>>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Is that the expected classpath? I don't see any reference to my JAR
>>>>>>>>>> or
>>>>>>>>>> the
>>>>>>>>>> Chukwa JARs.
>>>>>>>>>> 
>>>>>>>>>> Also, when I try to view the contents of my "job_<timestamp>_0001"
>>>>>>>>>> directory,
>>>>>>>>>> it's automatically removed, so I can't really do any forensics after
>>>>>>>>>> the fact.
>>>>>>>>>> I know this is probably a Hadoop question, is it possible to prevent
>>>>>>>>>> that
>>>>>>>>>> auto-removal from occurring?
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Kirk
>>>>>>>>>> 
>>>>>>>>>> On Wed, 28 Apr 2010 13:16 -0700, "Kirk True" <ki...@mustardgrain.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi Eric,
>>>>>>>>>>> 
>>>>>>>>>>> On 4/28/10 10:23 AM, Eric Yang wrote:
>>>>>>>>>>>> Hi Kirk,
>>>>>>>>>>>> 
>>>>>>>>>>>> Is the ownership of the jar file setup correctly as the user that
>>>>>>>>>>>> runs
>>>>>>>>>>>> demux?
>>>>>>>>>>> 
>>>>>>>>>>> When browsing via the NameNode web UI, it lists permissions of
>>>>>>>>>>> "rw-r--r--" and "kirk" as the owner (which is also the user ID
>>>>>>>>>>> running
>>>>>>>>>>> the Hadoop and Chukwa processes).
>>>>>>>>>>> 
>>>>>>>>>>>>    You may find more information by looking at running mapper task
>>>>>>>>>>>> or
>>>>>>>>>>>> reducer task, and try to find out the task attempt shell script.
>>>>>>>>>>> 
>>>>>>>>>>> Where is the task attempt shell script located?
>>>>>>>>>>> 
>>>>>>>>>>>>    Make sure
>>>>>>>>>>>> the files are downloaded correctly from distributed cache, and
>>>>>>>>>>>> referenced in
>>>>>>>>>>>> the locally generated jar file.  Hope this helps.
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Sorry for asking such basic questions, but where is the locally
>>>>>>>>>>> generated JAR file found? I'm assuming under /tmp/hadoop-<user>, by
>>>>>>>>>>> default? I saw one file named job_<timstamp>.jar but it appeared to
>>>>>>>>>>> be a
>>>>>>>>>>> byte-for-byte copy of chukwa-core-0.4.0.jar, i.e. my "XmlBasedDemux"
>>>>>>>>>>> class was nowhere to be found.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Kirk
>>>>>>>>>>> 
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Eric
>>>>>>>>>>>> 
>>>>>>>>>>>> On 4/28/10 9:37 AM, "Kirk True"<ki...@mustardgrain.com>  wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>> 
>>>>>>>>>>>> I have a custom Demux that I need to run to process my input, but
>>>>>>>>>>>> I'm
>>>>>>>>>>>> getting
>>>>>>>>>>>> ClassNotFoundException when running in Hadoop. This is with the
>>>>>>>>>>>> released
>>>>>>>>>>>> 0.4.0
>>>>>>>>>>>> build.
>>>>>>>>>>>> 
>>>>>>>>>>>> I've done the following:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. I put my Demux class in the correct package
>>>>>>>>>>>> (org.apache.hadoop.chukwa.extraction.demux.processor.mapper)
>>>>>>>>>>>> 2. I've added the JAR containing the Demux implementation to HDFS
>>>>>>>>>>>> at
>>>>>>>>>>>> /chuka/demux
>>>>>>>>>>>> 3. I've added an alias to it in chukwa-demux-conf.xml
>>>>>>>>>>>> 
>>>>>>>>>>>> The map/reduce job is picking up on the fact that I have a custom
>>>>>>>>>>>> Demux and
>>>>>>>>>>>> is
>>>>>>>>>>>> trying to load it, but I get a ClassNotFoundException. The
>>>>>>>>>>>> HDFS-based URL
>>>>>>>>>>>> to
>>>>>>>>>>>> the JAR is showing up in the job configuration in Hadoop, which is
>>>>>>>>>>>> another
>>>>>>>>>>>> evidence that Chukwa and Hadoop know where the JAR lives and that
>>>>>>>>>>>> it's part
>>>>>>>>>>>> of
>>>>>>>>>>>> the Chukwa-initiated job.
>>>>>>>>>>>> 
>>>>>>>>>>>> My Demux is very simple. I've stripped it down to a
>>>>>>>>>>>> System.out.println with
>>>>>>>>>>>> dependencies on no other classes/JARs other than Chukwa, Hadoop,
>>>>>>>>>>>> and the
>>>>>>>>>>>> core
>>>>>>>>>>>> JDK. I've double-checked that my JAR is being built up correctly.
>>>>>>>>>>>> I'm
>>>>>>>>>>>> completely flummoxed as to what I'm doing wrong.
>>>>>>>>>>>> 
>>>>>>>>>>>> Any ideas what I'm missing? What other information can I provide?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Kirk
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>  
>>>>>>>>  
>>>>>>>>   
>>>>>>>>  
>>>>>>>   
>>>>>>>  
>>>>>>   
>>>>>>  
>>>>>  
>>>>  
>>>>  
>> 
> 


Re: Chukwa can't find Demux class - POSSIBLE FIX

Posted by Eric Yang <ey...@yahoo-inc.com>.
Kirk,

On my system, it is returning /chukwa/demux/parsers.jar as the URL.  I think
it's best to fix this at the code level.  Please file a jira, and I will
take care of this.  Thanks.
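
The kind of change I have in mind is roughly the sketch below (illustrative
only, not the actual patch; jarDir, fs and conf are placeholders for whatever
Demux.addParsers already has in scope):

    // Hypothetical sketch: keep only the filesystem part of each listed Path
    // so DistributedCache never sees the hdfs://host:port prefix.
    // Imports assumed: java.io.IOException, org.apache.hadoop.conf.Configuration,
    // org.apache.hadoop.fs.FileStatus, org.apache.hadoop.fs.FileSystem,
    // org.apache.hadoop.fs.Path, org.apache.hadoop.filecache.DistributedCache.
    static void addJarsToClassPath(String jarDir, FileSystem fs, Configuration conf)
        throws IOException {
      for (FileStatus status : fs.listStatus(new Path(jarDir))) {
        // Path.toUri().getPath() yields /chukwa/demux/foo.jar even when
        // listStatus hands back hdfs://localhost:9000/chukwa/demux/foo.jar
        Path jarPath = new Path(status.getPath().toUri().getPath());
        DistributedCache.addFileToClassPath(jarPath, conf);
      }
    }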

Regards,
Eric

On 4/28/10 6:50 PM, "Kirk True" <ki...@mustardgrain.com> wrote:

> Hi Eric,
> 
> If I grep "hdfs://" in $CHUKWA_HOME/conf, the string shows up in two places:
> one is in the README and the other is in chukwa-collector-conf.xml for the
> writer.hdfs.filesystem property. I didn't change this file, so that should be
> the default. chukwa-common.xml's chukwa.data.dir is still just "/chukwa".
> 
> Thanks,
> Kirk
> 
> On 4/28/10 6:34 PM, Eric Yang wrote:
>>  Hi Kirk,
>>  
>> Check chukwa-common.xml and make sure that chukwa.data.dir does not have
>> hdfs://localhost:9000 prepended to it.  It's best to leave the namenode address
>> out of this path for portability.
>>  
>> Regards,
>> Eric
>>  
>>  
>> On 4/28/10 6:19 PM, "Kirk True" <ki...@mustardgrain.com> wrote:
>>  
>>   
>>> Hi all,
>>>  
>>> The problem seems to stem from the fact that the call to
>>> DistributedCache.addFileToClassPath is passing in a Path that is in URI
>>> form, i.e. hdfs://localhost:9000/chukwa/demux/mydemux.jar whereas the
>>> DistributedCache API expects it to be a filesystem-based path (i.e.
>>> /chukwa/demux/mydemux.jar). I'm not sure why, but the FileStatus object
>>> returned by FileSystem.listStatus is returning a URL-based path instead of a
>>> filesystem-based path.
>>>  
>>> I kludged the Demux class' addParsers to strip the "hdfs://localhost:9000"
>>> portion of the string and now my class is found.
>>>  
>>> It's frustrating when stuff silently fails :) I even turned up the logging
>>> in Hadoop and Chukwa to TRACE and nothing was reported.
>>>  
>>> So, my question is, do I have something misconfigured that causes
>>> FileSystem.listStatus to return a URL-based path? Or does the code need to
>>> be changed?
>>>  
>>> Thanks,
>>> Kirk
>>>  
>>> On 4/28/10 5:41 PM, Kirk True wrote:
>>>   
>>>>  Hi all,
>>>>  
>>>> Just for grins I copied the Java source byte-for-byte to the Chukwa source
>>>> folder and then ran:
>>>>  
>>>>  
>>>>   
>>>>> ant clean main && cp build/*.jar .
>>>>>  
>>>>>  
>>>>   
>>>> And it worked, as expected.
>>>>  
>>>> When one adds custom demux classes to a JAR, sticks it in
>>>> hdfs://localhost:9000/chukwa/demux/mydemux.jar, is that JAR somehow
>>>> magically merged with chukwa-core-0.4.0.jar to produce "job.jar" or do they
>>>> remain separate?
>>>>  
>>>> Thanks,
>>>> Kirk
>>>>  
>>>> On 4/28/10 5:09 PM, Kirk True wrote:
>>>>   
>>>>>   Hi Jerome,
>>>>>  
>>>>> Yes, they're all using $JAVA_HOME which is 1.6.0_18.
>>>>>  
>>>>> I did notice that the JAVA_PLATFORM environment variable in chukwa-env.sh
>>>>> was set to 32-bit but Hadoop was defaulting to 64-bit (this is a 64-bit
>>>>> machine), but setting that to Linux-amd64-64 didn't make any difference.
>>>>>  
>>>>> Thanks,
>>>>> Kirk
>>>>>  
>>>>> On 4/28/10 4:00 PM, Jerome Boulon wrote:
>>>>>   
>>>>>> Are you using the same version of Java for your jar and Hadoop?
>>>>>> /Jerome.
>>>>>>  
>>>>>> On 4/28/10 3:33 PM, "Kirk True" <ki...@mustardgrain.com> wrote:
>>>>>>  
>>>>>>   
>>>>>>   
>>>>>>> Hi Eric,
>>>>>>>  
>>>>>>> I added these to Hadoop's mapred-site.xml:
>>>>>>>  
>>>>>>>  
>>>>>>>   <property>
>>>>>>>         <name>keep.failed.task.files</name>
>>>>>>>         <value>true</value>
>>>>>>>   </property>
>>>>>>>   <property>
>>>>>>>         <name>mapred.job.tracker.persist.jobstatus.active</name>
>>>>>>>         <value>true</value>
>>>>>>>   </property>
>>>>>>>   
>>>>>>>  
>>>>>>> This seems to have caused the task tracker directory to stick around
>>>>>>> after the job is complete. So, for example, I have this directory:
>>>>>>>  
>>>>>>>  
>>>>>>> /tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001
>>>>>>>  
>>>>>>>  
>>>>>>> Under this directory I have the following files:
>>>>>>>  
>>>>>>>  
>>>>>>> jars/
>>>>>>> job.jar
>>>>>>> org/ . . .
>>>>>>> job.xml
>>>>>>>  
>>>>>>> My Demux (XmlBasedDemux) doesn't appear in the job.jar or the
>>>>>>> (apparently exploded job.jar) jars/org/... directory. However, my demux
>>>>>>> JAR appears in three places in the job.xml:
>>>>>>>  
>>>>>>>  
>>>>>>>  <property>
>>>>>>>     <name>mapred.job.classpath.files</name>
>>>>>>>     
>>>>>>> <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar<
>>>>>>> /value>
>>>>>>> </property>
>>>>>>> <property>
>>>>>>>     <name>mapred.jar</name>
>>>>>>>     
>>>>>>> <value>/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_2010042815
>>>>>>> 19_0001/jars/job.jar</value>
>>>>>>> </property>
>>>>>>> <property>
>>>>>>>     <name>mapred.cache.files</name>
>>>>>>>     
>>>>>>> <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar<
>>>>>>> /value>
>>>>>>> </property>
>>>>>>>   
>>>>>>>  
>>>>>>> So it looks like when Demux.addParsers calls
>>>>>>> DistributedCache.addFileToClassPath it's working as the above job conf
>>>>>>> properties include my JAR.
>>>>>>>  
>>>>>>> Here's my JAR contents:
>>>>>>>  
>>>>>>>  
>>>>>>>  [kirk@skinner data-collection]$ unzip -l
>>>>>>> data-collection-demux/target/data-collection-demux-0.1.jar
>>>>>>> Archive:  data-collection-demux/target/data-collection-demux-0.1.jar
>>>>>>>   Length     Date   Time    Name
>>>>>>>  --------    ----   ----    ----
>>>>>>>         0  04-28-10 15:19   META-INF/
>>>>>>>       123  04-28-10 15:19   META-INF/MANIFEST.MF
>>>>>>>         0  04-28-10 15:19   org/
>>>>>>>         0  04-28-10 15:19   org/apache/
>>>>>>>         0  04-28-10 15:19   org/apache/hadoop/
>>>>>>>         0  04-28-10 15:19   org/apache/hadoop/chukwa/
>>>>>>>         0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/
>>>>>>>         0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/
>>>>>>>         0  04-28-10 15:19
>>>>>>> org/apache/hadoop/chukwa/extraction/demux/processor/
>>>>>>>         0  04-28-10 15:19
>>>>>>> org/apache/hadoop/chukwa/extraction/demux/processor/mapper/
>>>>>>>      1697  04-28-10 15:19
>>>>>>> org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux
>>>>>>> .class
>>>>>>>         0  04-28-10 15:19   META-INF/maven/
>>>>>>>         0  04-28-10 15:19
>>>>>>> META-INF/maven/com.cisco.flip.datacollection/
>>>>>>>         0  04-28-10 15:19
>>>>>>> META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/
>>>>>>>      1448  04-28-10 00:23
>>>>>>> META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.x
>>>>>>> ml
>>>>>>>       133  04-28-10 15:19
>>>>>>> META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.p
>>>>>>> roperties
>>>>>>>  --------                   -------
>>>>>>>      3401                   16 files
>>>>>>>   
>>>>>>>  
>>>>>>> Here's how I'm copying the JAR into HDFS:
>>>>>>>  
>>>>>>>  
>>>>>>>  hadoop fs -mkdir /chukwa/demux
>>>>>>> hadoop fs -copyFromLocal /path/to/data-collection-demux-0.1.jar
>>>>>>> /chukwa/demux
>>>>>>>   
>>>>>>> Any ideas of more things to try?
>>>>>>>  
>>>>>>> Thanks,
>>>>>>> Kirk
>>>>>>>  
>>>>>>>  
>>>>>>> On Wed, 28 Apr 2010 14:48 -0700, "Eric Yang" <ey...@yahoo-inc.com>
>>>>>>> wrote:
>>>>>>>> Kirk,
>>>>>>>> 
>>>>>>>> The shell script and job related information are stored temporarily in
>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_20100428132
>>>>>>>> 0_0xx
>>>>>>>> x/, while the job is running.
>>>>>>>> 
>>>>>>>> You should go into the jars directory and find out if the compressed
>>>>>>>> jar
>>>>>>>> contains your class file.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Eric
>>>>>>>> 
>>>>>>>> On 4/28/10 1:57 PM, "Kirk True" <ki...@mustardgrain.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi Eric,
>>>>>>>>> 
>>>>>>>>> I updated MapProcessorFactory.getProcessor to dump the URLs from the
>>>>>>>>> URLClassLoader from the MapProcessorFactory.class. This is what I see:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/conf/
>>>>>>>>> file:/home/kirk/bin/jdk1.6.0_18/lib/tools.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-cli-1.2.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-codec-1.3.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-el-1.0.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-1.0.4.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/commons-net-1.4.1.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/core-3.1.1.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-compiler-5.5.12.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-runtime-5.5.12.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jets3t-0.6.1.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-6.1.14.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-util-6.1.14.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/junit-3.8.1.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/kfs-0.2.2.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/log4j-1.2.15.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/mockito-all-1.8.0.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/oro-2.0.8.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-api-1.4.3.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-log4j12-1.4.3.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/xmlenc-0.52.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-2.1.jar
>>>>>>>>> file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-api-2.1.jar
>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_2010042813
>>>>>>>>> 20_0001/
>>>>>>>>> attempt_201004281320_0001_m_000000_0/work/
>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_2010042813
>>>>>>>>> 20_0001/
>>>>>>>>> jars/classes
>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_2010042813
>>>>>>>>> 20_0001/
>>>>>>>>> jars/
>>>>>>>>> file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_2010042813
>>>>>>>>> 20_0001/
>>>>>>>>> attempt_201004281320_0001_m_000000_0/work/
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Is that the expected classpath? I don't see any reference to my JAR or
>>>>>>>>> the
>>>>>>>>> Chukwa JARs.
>>>>>>>>> 
>>>>>>>>> Also, when I try to view the contents of my "job_<timestamp>_0001"
>>>>>>>>> directory,
>>>>>>>>> it's automatically removed, so I can't really do any forensics after
>>>>>>>>> the fact.
>>>>>>>>> I know this is probably a Hadoop question, is it possible to prevent
>>>>>>>>> that
>>>>>>>>> auto-removal from occurring?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Kirk
>>>>>>>>> 
>>>>>>>>> On Wed, 28 Apr 2010 13:16 -0700, "Kirk True" <ki...@mustardgrain.com>
>>>>>>>>> wrote:
>>>>>>>>>> Hi Eric,
>>>>>>>>>> 
>>>>>>>>>> On 4/28/10 10:23 AM, Eric Yang wrote:
>>>>>>>>>>> Hi Kirk,
>>>>>>>>>>> 
>>>>>>>>>>> Is the ownership of the jar file setup correctly as the user that
>>>>>>>>>>> runs
>>>>>>>>>>> demux?
>>>>>>>>>> 
>>>>>>>>>> When browsing via the NameNode web UI, it lists permissions of
>>>>>>>>>> "rw-r--r--" and "kirk" as the owner (which is also the user ID
>>>>>>>>>> running
>>>>>>>>>> the Hadoop and Chukwa processes).
>>>>>>>>>> 
>>>>>>>>>>>    You may find more information by looking at running mapper task
>>>>>>>>>>> or
>>>>>>>>>>> reducer task, and try to find out the task attempt shell script.
>>>>>>>>>> 
>>>>>>>>>> Where is the task attempt shell script located?
>>>>>>>>>> 
>>>>>>>>>>>    Make sure
>>>>>>>>>>> the files are downloaded correctly from distributed cache, and
>>>>>>>>>>> referenced in
>>>>>>>>>>> the locally generated jar file.  Hope this helps.
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Sorry for asking such basic questions, but where is the locally
>>>>>>>>>> generated JAR file found? I'm assuming under /tmp/hadoop-<user>, by
>>>>>>>>>> default? I saw one file named job_<timstamp>.jar but it appeared to
>>>>>>>>>> be a
>>>>>>>>>> byte-for-byte copy of chukwa-core-0.4.0.jar, i.e. my "XmlBasedDemux"
>>>>>>>>>> class was nowhere to be found.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Kirk
>>>>>>>>>> 
>>>>>>>>>>> Regards,
>>>>>>>>>>> Eric
>>>>>>>>>>> 
>>>>>>>>>>> On 4/28/10 9:37 AM, "Kirk True"<ki...@mustardgrain.com>  wrote:
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>> 
>>>>>>>>>>>> I have a custom Demux that I need to run to process my input, but
>>>>>>>>>>>> I'm
>>>>>>>>>>>> getting
>>>>>>>>>>>> ClassNotFoundException when running in Hadoop. This is with the
>>>>>>>>>>>> released
>>>>>>>>>>>> 0.4.0
>>>>>>>>>>>> build.
>>>>>>>>>>>> 
>>>>>>>>>>>> I've done the following:
>>>>>>>>>>>> 
>>>>>>>>>>>> 1. I put my Demux class in the correct package
>>>>>>>>>>>> (org.apache.hadoop.chukwa.extraction.demux.processor.mapper)
>>>>>>>>>>>> 2. I've added the JAR containing the Demux implementation to HDFS
>>>>>>>>>>>> at
>>>>>>>>>>>> /chuka/demux
>>>>>>>>>>>> 3. I've added an alias to it in chukwa-demux-conf.xml
>>>>>>>>>>>> 
>>>>>>>>>>>> The map/reduce job is picking up on the fact that I have a custom
>>>>>>>>>>>> Demux and
>>>>>>>>>>>> is
>>>>>>>>>>>> trying to load it, but I get a ClassNotFoundException. The
>>>>>>>>>>>> HDFS-based URL
>>>>>>>>>>>> to
>>>>>>>>>>>> the JAR is showing up in the job configuration in Hadoop, which is
>>>>>>>>>>>> another
>>>>>>>>>>>> evidence that Chukwa and Hadoop know where the JAR lives and that
>>>>>>>>>>>> it's part
>>>>>>>>>>>> of
>>>>>>>>>>>> the Chukwa-initiated job.
>>>>>>>>>>>> 
>>>>>>>>>>>> My Demux is very simple. I've stripped it down to a
>>>>>>>>>>>> System.out.println with
>>>>>>>>>>>> dependencies on no other classes/JARs other than Chukwa, Hadoop,
>>>>>>>>>>>> and the
>>>>>>>>>>>> core
>>>>>>>>>>>> JDK. I've double-checked that my JAR is being built up correctly.
>>>>>>>>>>>> I'm
>>>>>>>>>>>> completely flummoxed as to what I'm doing wrong.
>>>>>>>>>>>> 
>>>>>>>>>>>> Any ideas what I'm missing? What other information can I provide?
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> Kirk
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>  
>>>>>>>  
>>>>>>>   
>>>>>>>  
>>>>>>   
>>>>>>  
>>>>>   
>>>>>  
>>>>  
>>>  
>>>  
> 


Re: Chukwa can't find Demux class - POSSIBLE FIX

Posted by Kirk True <ki...@mustardgrain.com>.
Hi Eric,

If I grep "hdfs://" in $CHUKWA_HOME/conf, the string shows up in two 
places: one is in the README and the other is in 
chukwa-collector-conf.xml for the writer.hdfs.filesystem property. I 
didn't change this file, so that should be the default. 
chukwa-common.xml's chukwa.data.dir is still just "/chukwa".
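
(For the record, the check was nothing fancier than something like

    grep -r "hdfs://" $CHUKWA_HOME/conf

run from a shell.)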

Thanks,
Kirk

On 4/28/10 6:34 PM, Eric Yang wrote:
> Hi Kirk,
>
> Check chukwa-common.xml and make sure that chukwa.data.dir does not
> have hdfs://localhost:9000 prepended to it.  It's best to leave the
> namenode address out of this path for portability.
>
> Regards,
> Eric
>
>
> On 4/28/10 6:19 PM, "Kirk True" <ki...@mustardgrain.com> wrote:
>
>     Hi all,
>
>     The problem seems to stem from the fact that the call to
>     DistributedCache.addFileToClassPath is passing in a Path that is
>     in URI form, i.e. hdfs://localhost:9000/chukwa/demux/mydemux.jar
>     whereas the DistributedCache API expects it to be a
>     filesystem-based path (i.e. /chukwa/demux/mydemux.jar). I'm not
>     sure why, but the FileStatus object returned by
>     FileSystem.listStatus is returning a URL-based path instead of a
>     filesystem-based path.
>
>     I kludged the Demux class' addParsers to strip the
>     "hdfs://localhost:9000" portion of the string and now my class is
>     found.
>
>     It's frustrating when stuff silently fails :) I even turned up the
>     logging in Hadoop and Chukwa to TRACE and nothing was reported.
>
>     So, my question is, do I have something misconfigured that causes
>     FileSystem.listStatus to return a URL-based path? Or does the code
>     need to be changed?
>
>     Thanks,
>     Kirk
>
>     On 4/28/10 5:41 PM, Kirk True wrote:
>
>         Hi all,
>
>         Just for grins I copied the Java source byte-for-byte to the
>         Chukwa source folder and then ran:
>
>
>             ant clean main && cp build/*.jar .
>
>
>         And it worked, as expected.
>
>         When one adds custom demux classes to a JAR, sticks it in
>         hdfs://localhost:9000/chukwa/demux/mydemux.jar, is that JAR
>         somehow magically merged with chukwa-core-0.4.0.jar to produce
>         "job.jar" or do they remain separate?
>
>         Thanks,
>         Kirk
>
>         On 4/28/10 5:09 PM, Kirk True wrote:
>
>              Hi Jerome,
>
>             Yes, they're all using $JAVA_HOME which is 1.6.0_18.
>
>             I did notice that the JAVA_PLATFORM environment variable
>             in chukwa-env.sh was set to 32-bit but Hadoop was
>             defaulting to 64-bit (this is a 64-bit machine), but
>             setting that to Linux-amd64-64 didn't make any difference.
>
>             Thanks,
>             Kirk
>
>             On 4/28/10 4:00 PM, Jerome Boulon wrote:
>
>                 Are you using the same version of Java for your jar and Hadoop?
>                 /Jerome.
>
>                 On 4/28/10 3:33 PM, "Kirk True"
>                 <ki...@mustardgrain.com> wrote:
>
>
>                     Hi Eric,
>
>                     I added these to Hadoop's mapred-site.xml:
>
>
>                     <property>
>                     <name>keep.failed.task.files</name>
>                     <value>true</value>
>                     </property>
>                     <property>
>                     <name>mapred.job.tracker.persist.jobstatus.active</name>
>                     <value>true</value>
>                     </property>
>
>
>                     This seems to have caused the task tracker
>                     directory to stick around after the job is
>                     complete. So, for example, I have this directory:
>
>
>                     /tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001
>
>
>                     Under this directory I have the following files:
>
>
>                     jars/
>                     job.jar
>                     org/ . . .
>                     job.xml
>
>                     My Demux (XmlBasedDemux) doesn't appear in the
>                     job.jar or the (apparently exploded job.jar)
>                     jars/org/... directory. However, my demux JAR
>                     appears in three places in the job.xml:
>
>
>                     <property>
>                     <name>mapred.job.classpath.files</name>
>                     <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>                     </property>
>                     <property>
>                     <name>mapred.jar</name>
>                     <value>/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001/jars/job.jar</value>
>                     </property>
>                     <property>
>                     <name>mapred.cache.files</name>
>                     <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>                     </property>
>
>
>                     So it looks like when Demux.addParsers calls
>                     DistributedCache.addFileToClassPath it's working
>                     as the above job conf properties include my JAR.
>
>                     Here's my JAR contents:
>
>
>                     [kirk@skinner data-collection]$ unzip -l
>                     data-collection-demux/target/data-collection-demux-0.1.jar
>
>                     Archive:
>                      data-collection-demux/target/data-collection-demux-0.1.jar
>                       Length     Date   Time    Name
>                      --------    ----   ----    ----
>                             0  04-28-10 15:19   META-INF/
>                           123  04-28-10 15:19   META-INF/MANIFEST.MF
>                             0  04-28-10 15:19   org/
>                             0  04-28-10 15:19   org/apache/
>                             0  04-28-10 15:19   org/apache/hadoop/
>                             0  04-28-10 15:19   org/apache/hadoop/chukwa/
>                             0  04-28-10 15:19
>                       org/apache/hadoop/chukwa/extraction/
>                             0  04-28-10 15:19
>                       org/apache/hadoop/chukwa/extraction/demux/
>                             0  04-28-10 15:19
>                       org/apache/hadoop/chukwa/extraction/demux/processor/
>                             0  04-28-10 15:19
>                       org/apache/hadoop/chukwa/extraction/demux/processor/mapper/
>                          1697  04-28-10 15:19
>                       org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux.class
>                             0  04-28-10 15:19   META-INF/maven/
>                             0  04-28-10 15:19
>                       META-INF/maven/com.cisco.flip.datacollection/
>                             0  04-28-10 15:19
>                       META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/
>                          1448  04-28-10 00:23
>                       META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.xml
>                           133  04-28-10 15:19
>                       META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.properties
>                      --------                   -------
>                          3401                   16 files
>
>
>                     Here's how I'm copying the JAR into HDFS:
>
>
>                     hadoop fs -mkdir /chukwa/demux
>                     hadoop fs -copyFromLocal
>                     /path/to/data-collection-demux-0.1.jar /chukwa/demux
>
>                     Any ideas of more things to try?
>
>                     Thanks,
>                     Kirk
>
>
>                     On Wed, 28 Apr 2010 14:48 -0700, "Eric Yang"
>                     <ey...@yahoo-inc.com> wrote:
>                     > Kirk,
>                     >
>                     > The shell script and job related information are
>                     stored temporarily in
>                     > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0xx
>                     > x/, while the job is running.
>                     >
>                     > You should go into the jars directory and find
>                     out if the compressed jar
>                     > contains your class file.
>                     >
>                     > Regards,
>                     > Eric
>                     >
>                     > On 4/28/10 1:57 PM, "Kirk True"
>                     <ki...@mustardgrain.com> wrote:
>                     >
>                     > > Hi Eric,
>                     > >
>                     > > I updated MapProcessorFactory.getProcessor to
>                     dump the URLs from the
>                     > > URLClassLoader from the
>                     MapProcessorFactory.class. This is what I see:
>                     > >
>                     > >
>                     > > file:/home/kirk/bin/hadoop-0.20.2/conf/
>                     > > file:/home/kirk/bin/jdk1.6.0_18/lib/tools.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/
>                     > >
>                     file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar
>                     > >
>                     file:/home/kirk/bin/hadoop-0.20.2/lib/commons-cli-1.2.jar
>                     > >
>                     file:/home/kirk/bin/hadoop-0.20.2/lib/commons-codec-1.3.jar
>                     > >
>                     file:/home/kirk/bin/hadoop-0.20.2/lib/commons-el-1.0.jar
>                     > >
>                     file:/home/kirk/bin/hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar
>                     > >
>                     file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-1.0.4.jar
>                     > >
>                     file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar
>                     > >
>                     file:/home/kirk/bin/hadoop-0.20.2/lib/commons-net-1.4.1.jar
>                     > >
>                     file:/home/kirk/bin/hadoop-0.20.2/lib/core-3.1.1.jar
>                     > >
>                     file:/home/kirk/bin/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
>                     > >
>                     file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-compiler-5.5.12.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-runtime-5.5.12.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jets3t-0.6.1.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-6.1.14.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-util-6.1.14.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/junit-3.8.1.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/kfs-0.2.2.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/log4j-1.2.15.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/mockito-all-1.8.0.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/oro-2.0.8.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-api-1.4.3.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-log4j12-1.4.3.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/xmlenc-0.52.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-2.1.jar
>                     > > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-api-2.1.jar
>                     > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
>                     > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/classes
>                     > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/
>                     > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
>                     > >
>                     > >
>                     > > Is that the expected classpath? I don't see any reference to my JAR or the
>                     > > Chukwa JARs.
>                     > >
>                     > > Also, when I try to view the contents of my "job_<timestamp>_0001" directory,
>                     > > it's automatically removed, so I can't really do any forensics after the fact.
>                     > > I know this is probably a Hadoop question, is it possible to prevent that
>                     > > auto-removal from occurring?
>                     > >
>                     > > Thanks,
>                     > > Kirk
>                     > >
>                     > > On Wed, 28 Apr 2010 13:16 -0700, "Kirk True" <ki...@mustardgrain.com> wrote:
>                     > >> Hi Eric,
>                     > >>
>                     > >> On 4/28/10 10:23 AM, Eric Yang wrote:
>                     > >>> Hi Kirk,
>                     > >>>
>                     > >>> Is the ownership of the jar file setup correctly as the user that runs
>                     > >>> demux?
>                     > >>
>                     > >> When browsing via the NameNode web UI, it lists permissions of
>                     > >> "rw-r--r--" and "kirk" as the owner (which is also the user ID running
>                     > >> the Hadoop and Chukwa processes).
>                     > >>
>                     > >>>    You may find more information by looking at running mapper task or
>                     > >>> reducer task, and try to find out the task attempt shell script.
>                     > >>
>                     > >> Where is the task attempt shell script located?
>                     > >>
>                     > >>>    Make sure
>                     > >>> the files are downloaded correctly from distributed cache, and referenced in
>                     > >>> the locally generated jar file.  Hope this helps.
>                     > >>>
>                     > >>
>                     > >> Sorry for asking such basic questions, but where is the locally
>                     > >> generated JAR file found? I'm assuming under /tmp/hadoop-<user>, by
>                     > >> default? I saw one file named job_<timstamp>.jar but it appeared to be a
>                     > >> byte-for-byte copy of chukwa-core-0.4.0.jar, i.e. my "XmlBasedDemux"
>                     > >> class was nowhere to be found.
>                     > >>
>                     > >> Thanks,
>                     > >> Kirk
>                     > >>
>                     > >>> Regards,
>                     > >>> Eric
>                     > >>>
>                     > >>> On 4/28/10 9:37 AM, "Kirk True"<ki...@mustardgrain.com>  wrote:
>                     > >>>
>                     > >>>
>                     > >>>> Hi guys,
>                     > >>>>
>                     > >>>> I have a custom Demux that I need to run to process my input, but I'm
>                     > >>>> getting ClassNotFoundException when running in Hadoop. This is with the
>                     > >>>> released 0.4.0 build.
>                     > >>>>
>                     > >>>> I've done the following:
>                     > >>>>
>                     > >>>> 1. I put my Demux class in the correct package
>                     > >>>> (org.apache.hadoop.chukwa.extraction.demux.processor.mapper)
>                     > >>>> 2. I've added the JAR containing the Demux implementation to HDFS at
>                     > >>>> /chuka/demux
>                     > >>>> 3. I've added an alias to it in chukwa-demux-conf.xml
>                     > >>>>
>                     > >>>> The map/reduce job is picking up on the fact that I have a custom Demux
>                     > >>>> and is trying to load it, but I get a ClassNotFoundException. The
>                     > >>>> HDFS-based URL to the JAR is showing up in the job configuration in
>                     > >>>> Hadoop, which is another evidence that Chukwa and Hadoop know where the
>                     > >>>> JAR lives and that it's part of the Chukwa-initiated job.
>                     > >>>>
>                     > >>>> My Demux is very simple. I've stripped it down to a System.out.println
>                     > >>>> with dependencies on no other classes/JARs other than Chukwa, Hadoop,
>                     > >>>> and the core JDK. I've double-checked that my JAR is being built up
>                     > >>>> correctly. I'm completely flummoxed as to what I'm doing wrong.
>                     > >>>>
>                     > >>>> Any ideas what I'm missing? What other information can I provide?
>                     > >>>>
>                     > >>>> Thanks!
>                     > >>>> Kirk
>                     > >>>>
>                     > >>>>
>                     > >>>
>                     > >>
>                     > >
>                     > >
>                     >
>                     >
>
>
>
>
>
>

Re: Chukwa can't find Demux class - POSSIBLE FIX

Posted by Eric Yang <ey...@yahoo-inc.com>.
Hi Kirk,

Check chukwa-common.xml and make sure that chukwa.data.dir does not have
hdfs://localhost:9000 prepended to it.  It's best to leave the namenode address
out of this path for portability.
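
For example (assuming the data lives under /chukwa, as in the paths earlier in
this thread; adjust the value to your setup), the property would look something
like this:

    <property>
      <name>chukwa.data.dir</name>
      <!-- path only; no hdfs://namenode:port prefix -->
      <value>/chukwa</value>
    </property>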

Regards,
Eric


On 4/28/10 6:19 PM, "Kirk True" <ki...@mustardgrain.com> wrote:

> Hi all,
> 
> The problem seems to stem from the fact that the call to
> DistributedCache.addFileToClassPath is passing in a Path that is in URI form,
> i.e. hdfs://localhost:9000/chukwa/demux/mydemux.jar whereas the
> DistributedCache API expects it to be a filesystem-based path (i.e.
> /chukwa/demux/mydemux.jar). I'm not sure why, but the FileStatus object
> returned by FileSystem.listStatus is returning a URL-based path instead of a
> filesystem-based path.
> 
> I kludged the Demux class' addParsers to strip the "hdfs://localhost:9000"
> portion of the string and now my class is found.
> 
> It's frustrating when stuff silently fails :) I even turned up the logging in
> Hadoop and Chukwa to TRACE and nothing was reported.
> 
> So, my question is, do I have something misconfigured that causes
> FileSystem.listStatus to return a URL-based path? Or does the code need to be
> changed?
> 
> Thanks,
> Kirk
> 
> On 4/28/10 5:41 PM, Kirk True wrote:
>>  Hi all,
>>  
>> Just for grins I copied the Java source byte-for-byte to the Chukwa source
>> folder and then ran:
>>  
>>  
>>> ant clean main && cp build/*.jar .
>>>  
>>  
>> And it worked, as expected.
>>  
>> When one adds custom demux classes to a JAR, sticks it in
>> hdfs://localhost:9000/chukwa/demux/mydemux.jar, is that JAR somehow magically
>> merged with chukwa-core-0.4.0.jar to produce "job.jar" or do they remain
>> separate?
>>  
>> Thanks,
>> Kirk
>>  
>> On 4/28/10 5:09 PM, Kirk True wrote:
>>>   Hi Jerome,
>>>  
>>> Yes, they're all using $JAVA_HOME which is 1.6.0_18.
>>>  
>>> I did notice that the JAVA_PLATFORM environment variable in chukwa-env.sh
>>> was set to 32-bit but Hadoop was defaulting to 64-bit (this is a 64-bit
>>> machine), but setting that to Linux-amd64-64 didn't make any difference.
>>>  
>>> Thanks,
>>> Kirk
>>>  
>>> On 4/28/10 4:00 PM, Jerome Boulon wrote:
>>>>  Are you using the same version of Java for your jar and Hadoop?
>>>> /Jerome.
>>>>  
>>>> On 4/28/10 3:33 PM, "Kirk True" <ki...@mustardgrain.com> wrote:
>>>>  
>>>>   
>>>>> Hi Eric,
>>>>>  
>>>>> I added these to Hadoop's mapred-site.xml:
>>>>>  
>>>>>  
>>>>>    <property>
>>>>>         <name>keep.failed.task.files</name>
>>>>>         <value>true</value>
>>>>>   </property>
>>>>>   <property>
>>>>>         <name>mapred.job.tracker.persist.jobstatus.active</name>
>>>>>         <value>true</value>
>>>>>   </property>
>>>>>   
>>>>>  
>>>>> This seems to have caused the task tracker directory to stick around after
>>>>> the job is complete. So, for example, I have this directory:
>>>>>  
>>>>>  
>>>>> /tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001
>>>>>  
>>>>>  
>>>>> Under this directory I have the following files:
>>>>>  
>>>>>  
>>>>> jars/
>>>>> job.jar
>>>>> org/ . . .
>>>>> job.xml
>>>>>  
>>>>> My Demux (XmlBasedDemux) doesn't appear in the job.jar or the (apparently
>>>>> exploded job.jar) jars/org/... directory. However, my demux JAR appears in
>>>>> three places in the job.xml:
>>>>>  
>>>>>  
>>>>>  <property>
>>>>>     <name>mapred.job.classpath.files</name>
>>>>>     <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>>>>> </property>
>>>>> <property>
>>>>>     <name>mapred.jar</name>
>>>>>     <value>/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281519_0001/jars/job.jar</value>
>>>>> </property>
>>>>> <property>
>>>>>     <name>mapred.cache.files</name>
>>>>>     <value>hdfs://localhost:9000/chukwa/demux/data-collection-demux-0.1.jar</value>
>>>>> </property>
>>>>>   
>>>>>  
>>>>> So it looks like when Demux.addParsers calls
>>>>> DistributedCache.addFileToClassPath it's working as the above job conf
>>>>> properties include my JAR.
>>>>>  
>>>>> Here's my JAR contents:
>>>>>  
>>>>>  
>>>>>  [kirk@skinner data-collection]$ unzip -l data-collection-demux/target/data-collection-demux-0.1.jar
>>>>> Archive:  data-collection-demux/target/data-collection-demux-0.1.jar
>>>>>   Length     Date   Time    Name
>>>>>  --------    ----   ----    ----
>>>>>         0  04-28-10 15:19   META-INF/
>>>>>       123  04-28-10 15:19   META-INF/MANIFEST.MF
>>>>>         0  04-28-10 15:19   org/
>>>>>         0  04-28-10 15:19   org/apache/
>>>>>         0  04-28-10 15:19   org/apache/hadoop/
>>>>>         0  04-28-10 15:19   org/apache/hadoop/chukwa/
>>>>>         0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/
>>>>>         0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/
>>>>>         0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/processor/
>>>>>         0  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/processor/mapper/
>>>>>      1697  04-28-10 15:19   org/apache/hadoop/chukwa/extraction/demux/processor/mapper/XmlBasedDemux.class
>>>>>         0  04-28-10 15:19   META-INF/maven/
>>>>>         0  04-28-10 15:19   META-INF/maven/com.cisco.flip.datacollection/
>>>>>         0  04-28-10 15:19   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/
>>>>>      1448  04-28-10 00:23   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.xml
>>>>>       133  04-28-10 15:19   META-INF/maven/com.cisco.flip.datacollection/data-collection-demux/pom.properties
>>>>>  --------                   -------
>>>>>      3401                   16 files
>>>>>   
>>>>>  
>>>>> Here's how I'm copying the JAR into HDFS:
>>>>>  
>>>>>  
>>>>>  hadoop fs -mkdir /chukwa/demux
>>>>> hadoop fs -copyFromLocal /path/to/data-collection-demux-0.1.jar /chukwa/demux
>>>>>   
>>>>> Any ideas of more things to try?
>>>>>  
>>>>> Thanks,
>>>>> Kirk
>>>>>  
>>>>>  
>>>>> On Wed, 28 Apr 2010 14:48 -0700, "Eric Yang" <ey...@yahoo-inc.com> wrote:
>>>>>> > Kirk,
>>>>>> >
>>>>>> > The shell script and job related information are stored temporarily in
>>>>>> > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0xxx/,
>>>>>> > while the job is running.
>>>>>> >
>>>>>> > You should go into the jars directory and find out if the compressed jar
>>>>>> > contains your class file.
>>>>>> >
>>>>>> > Regards,
>>>>>> > Eric
>>>>>> >
>>>>>> > On 4/28/10 1:57 PM, "Kirk True" <ki...@mustardgrain.com> wrote:
>>>>>> >
>>>>>>> > > Hi Eric,
>>>>>>> > >
>>>>>>> > > I updated MapProcessorFactory.getProcessor to dump the URLs from the
>>>>>>> > > URLClassLoader from the MapProcessorFactory.class. This is what I see:
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/conf/
>>>>>>> > > file:/home/kirk/bin/jdk1.6.0_18/lib/tools.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/hadoop-0.20.2-core.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-cli-1.2.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-codec-1.3.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-el-1.0.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-httpclient-3.0.1.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-1.0.4.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-logging-api-1.0.4.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/commons-net-1.4.1.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/core-3.1.1.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/hsqldb-1.8.0.10.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-compiler-5.5.12.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/jasper-runtime-5.5.12.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/jets3t-0.6.1.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-6.1.14.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/jetty-util-6.1.14.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/junit-3.8.1.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/kfs-0.2.2.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/log4j-1.2.15.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/mockito-all-1.8.0.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/oro-2.0.8.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/servlet-api-2.5-6.1.14.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-api-1.4.3.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/slf4j-log4j12-1.4.3.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/xmlenc-0.52.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-2.1.jar
>>>>>>> > > file:/home/kirk/bin/hadoop-0.20.2/lib/jsp-2.1/jsp-api-2.1.jar
>>>>>>> > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
>>>>>>> > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/classes
>>>>>>> > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/jars/
>>>>>>> > > file:/tmp/hadoop-kirk/mapred/local/taskTracker/jobcache/job_201004281320_0001/attempt_201004281320_0001_m_000000_0/work/
>>>>>>> > >
>>>>>>> > >
>>>>>>> > > Is that the expected classpath? I don't see any reference to my JAR or the
>>>>>>> > > Chukwa JARs.
>>>>>>> > >
>>>>>>> > > Also, when I try to view the contents of my "job_<timestamp>_0001" directory,
>>>>>>> > > it's automatically removed, so I can't really do any forensics after the fact.
>>>>>>> > > I know this is probably a Hadoop question, is it possible to prevent that
>>>>>>> > > auto-removal from occurring?
>>>>>>> > >
>>>>>>> > > Thanks,
>>>>>>> > > Kirk
>>>>>>> > >
>>>>>>> > > On Wed, 28 Apr 2010 13:16 -0700, "Kirk True" <ki...@mustardgrain.com> wrote:
>>>>>>>> > >> Hi Eric,
>>>>>>>> > >>
>>>>>>>> > >> On 4/28/10 10:23 AM, Eric Yang wrote:
>>>>>>>>> > >>> Hi Kirk,
>>>>>>>>> > >>>
>>>>>>>>> > >>> Is the ownership of the jar file setup correctly as the user that runs
>>>>>>>>> > >>> demux?
>>>>>>>> > >>
>>>>>>>> > >> When browsing via the NameNode web UI, it lists permissions of
>>>>>>>> > >> "rw-r--r--" and "kirk" as the owner (which is also the user ID running
>>>>>>>> > >> the Hadoop and Chukwa processes).
>>>>>>>> > >>
>>>>>>>>> > >>>    You may find more information by looking at running mapper task or
>>>>>>>>> > >>> reducer task, and try to find out the task attempt shell script.
>>>>>>>> > >>
>>>>>>>> > >> Where is the task attempt shell script located?
>>>>>>>> > >>
>>>>>>>>> > >>>    Make sure
>>>>>>>>> > >>> the files are downloaded correctly from distributed cache, and referenced in
>>>>>>>>> > >>> the locally generated jar file.  Hope this helps.
>>>>>>>>> > >>>
>>>>>>>> > >>
>>>>>>>> > >> Sorry for asking such basic questions, but where is the locally
>>>>>>>> > >> generated JAR file found? I'm assuming under /tmp/hadoop-<user>, by
>>>>>>>> > >> default? I saw one file named job_<timstamp>.jar but it appeared to be a
>>>>>>>> > >> byte-for-byte copy of chukwa-core-0.4.0.jar, i.e. my "XmlBasedDemux"
>>>>>>>> > >> class was nowhere to be found.
>>>>>>>> > >>
>>>>>>>> > >> Thanks,
>>>>>>>> > >> Kirk
>>>>>>>> > >>
>>>>>>>>> > >>> Regards,
>>>>>>>>> > >>> Eric
>>>>>>>>> > >>>
>>>>>>>>> > >>> On 4/28/10 9:37 AM, "Kirk True"<ki...@mustardgrain.com>  wrote:
>>>>>>>>> > >>>
>>>>>>>>> > >>>
>>>>>>>>>> > >>>> Hi guys,
>>>>>>>>>> > >>>>
>>>>>>>>>> > >>>> I have a custom Demux that I need to run to process my input, but I'm
>>>>>>>>>> > >>>> getting ClassNotFoundException when running in Hadoop. This is with
>>>>>>>>>> > >>>> the released 0.4.0 build.
>>>>>>>>>> > >>>>
>>>>>>>>>> > >>>> I've done the following:
>>>>>>>>>> > >>>>
>>>>>>>>>> > >>>> 1. I put my Demux class in the correct package
>>>>>>>>>> > >>>> (org.apache.hadoop.chukwa.extraction.demux.processor.mapper)
>>>>>>>>>> > >>>> 2. I've added the JAR containing the Demux implementation to HDFS at
>>>>>>>>>> > >>>> /chuka/demux
>>>>>>>>>> > >>>> 3. I've added an alias to it in chukwa-demux-conf.xml
>>>>>>>>>> > >>>>
>>>>>>>>>> > >>>> The map/reduce job is picking up on the fact that I have a custom Demux
>>>>>>>>>> > >>>> and is trying to load it, but I get a ClassNotFoundException. The
>>>>>>>>>> > >>>> HDFS-based URL to the JAR is showing up in the job configuration in
>>>>>>>>>> > >>>> Hadoop, which is another evidence that Chukwa and Hadoop know where the
>>>>>>>>>> > >>>> JAR lives and that it's part of the Chukwa-initiated job.
>>>>>>>>>> > >>>>
>>>>>>>>>> > >>>> My Demux is very simple. I've stripped it down to a System.out.println
>>>>>>>>>> > >>>> with dependencies on no other classes/JARs other than Chukwa, Hadoop,
>>>>>>>>>> > >>>> and the core JDK. I've double-checked that my JAR is being built up
>>>>>>>>>> > >>>> correctly. I'm completely flummoxed as to what I'm doing wrong.
>>>>>>>>>> > >>>>
>>>>>>>>>> > >>>> Any ideas what I'm missing? What other information can I provide?
>>>>>>>>>> > >>>>
>>>>>>>>>> > >>>> Thanks!
>>>>>>>>>> > >>>> Kirk
>>>>>>>>>> > >>>>
>>>>>>>>>> > >>>>
>>>>>>>>> > >>>
>>>>>>>> > >>
>>>>>>> > >
>>>>>>> > >
>>>>>> >
>>>>>> >
>>>>>  
>>>>>  
>>>>>  
>>>>  
>>>  
>
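
For reference, below is a minimal sketch of the workaround described above,
i.e. stripping the scheme and authority from the Path before it is handed to
DistributedCache.addFileToClassPath. It assumes Hadoop 0.20-era APIs; the class
and method names are hypothetical and are not the actual Demux.addParsers code:

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DemuxClasspathWorkaround {

      // Add every jar found under demuxDir (e.g. /chukwa/demux) to the job's
      // classpath, keeping only the path component of each entry so that
      // hdfs://localhost:9000/chukwa/demux/mydemux.jar becomes
      // /chukwa/demux/mydemux.jar before it reaches the DistributedCache.
      public static void addDemuxJars(Configuration conf, String demuxDir) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        FileStatus[] jars = fs.listStatus(new Path(demuxDir));
        if (jars == null) {
          return; // directory does not exist; nothing to add
        }
        for (FileStatus status : jars) {
          URI uri = status.getPath().toUri();       // URI-style path from listStatus
          Path pathOnly = new Path(uri.getPath());  // filesystem-style path
          DistributedCache.addFileToClassPath(pathOnly, conf);
        }
      }
    }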