Posted to user@uima.apache.org by rohan rai <hi...@gmail.com> on 2008/06/11 11:31:17 UTC

import location over Hadoop

Hi
  A simple thing such as a name annotator whose descriptor imports a type
system by location starts throwing exceptions when I create a jar of the
application I am developing and run it over Hadoop.

If I load the descriptor itself from a Java class file, I can use:

XMLInputSource in = new
XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);

But the relative paths in annotators, analysis engines, etc. still throw
exceptions.

Please Help

Regards
Rohan

Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
To simplify my question even further: I have a resource XML, let's say a.xml,
which goes inside the Hadoop job jar created by me. After that, in the mapper,
I want to read it. What should I do?

Secondly, to make it more complex: there is some information in a.xml about the
location of another resource file, b.xml. The path specified is relative to
a.xml. For example, if a.xml is in folder 'a' and b.xml is in folder 'b', and
both the 'a' and 'b' folders are in folder 'c', then the path specified in
a.xml is ../b/b.xml. After reading this info from a.xml, I also have to parse
b.xml.

How do I do that?
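Assuming the layout described above (the job jar contains c/a/a.xml and c/b/b.xml; the class name and paths here are illustrative, not from the thread), one approach is to read a.xml as a classpath resource and resolve the relative reference to b.xml against a.xml's own path:

```java
import java.io.InputStream;
import java.net.URI;

public class RelativeResourceSketch {
    public static void main(String[] args) throws Exception {
        ClassLoader cl = RelativeResourceSketch.class.getClassLoader();

        // a.xml is packed in the job jar, so read it through the classloader,
        // not as a java.io.File (path is hypothetical)
        String aPath = "c/a/a.xml";
        try (InputStream aStream = cl.getResourceAsStream(aPath)) {
            // ... parse a.xml here and pull out the relative reference it holds
        }

        // a.xml refers to b.xml as "../b/b.xml"; resolve that against a.xml's
        // path to get b.xml's location inside the jar
        String bRelative = "../b/b.xml";
        String bPath = URI.create(aPath).resolve(bRelative).normalize().toString();
        System.out.println(bPath); // prints "c/b/b.xml"
        try (InputStream bStream = cl.getResourceAsStream(bPath)) {
            // ... parse b.xml here
        }
    }
}
```

The same resolution works for any depth of ../ segments, since URI normalization collapses them against the base path.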

Regards
Rohan

On Wed, Jun 11, 2008 at 5:29 PM, rohan rai <hi...@gmail.com> wrote:

> Well, the question is about running UIMA over Hadoop. How to do that, given
> that in UIMA there are XML descriptors which have relative URLs and
> locations, which throw exceptions?
>
> But I can probably do without that answer
>
> Simplifying the problem
>
> I create a jar for my application and I am trying to run a map reduce job
>
> In the map I am trying to read an XML resource, which gives this kind of
> exception:
>
> java.io.FileNotFoundException: /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806102252_0028/task_200806102252_0028_m_000000_0/./descriptors/annotators/RecordCandidateAnnotator.xml (No such file or directory)
>
> 	at java.io.FileInputStream.open(Native Method)
> 	at java.io.FileInputStream.<init>(FileInputStream.java:106)
> 	at java.io.FileInputStream.<init>(FileInputStream.java:66)
> 	at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
>
> 	at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
> 	at java.net.URL.openStream(URL.java:1009)
> 	at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:83)
>
>
> I think I need to pass the content of the jar which contains the resource XML and classes (other than the job class) to each and every taskXXXXXXX that gets created.
>
> How can I do that?
>
> Regards
> Rohan
>
>
>
>
> On Wed, Jun 11, 2008 at 5:12 PM, Michael Baessler <mb...@michael-baessler.de>
> wrote:
>
>> rohan rai wrote:
>> > Hi
>> >   A simple thing such as a name annotator whose descriptor imports a type
>> > system by location starts throwing exceptions when I create a jar of the
>> > application I am developing and run it over Hadoop.
>> >
>> > If I load the descriptor itself from a Java class file, I can use:
>> > XMLInputSource in = new
>> > XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);
>> >
>> > But the relative paths in annotators, analysis engines, etc. still throw
>> > exceptions.
>> >
>> > Please Help
>> >
>> > Regards
>> > Rohan
>> >
>> I'm not sure I understand your question, but I think you need some help
>> with the exceptions you get.
>> Can you provide the exception stack trace?
>>
>> -- Michael
>>
>
>

Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
Yup, yup, yup... it has the files. All the required classes and XML files...

On Thu, Jun 12, 2008 at 12:34 AM, Marshall Schor <ms...@schor.com> wrote:

> In the Jar that is being deployed, can you unzip it (Jars can be unzipped
> by any unzip tool) and see if it has in it (among many other things):
>
> <the top level / directory>
>   |
>   + types
>        |
>        + recordCandidateType.xml
> in other words, right below the top level, a directory called "types", and
> in that directory, a file called "recordCandidateType.xml" ?
>
> -Marshall
>
>
> rohan rai wrote:
>
>> Anyway, just to clarify: neither import by name nor import by location
>> works. Import by name results in the following exception. If there is some
>> other way to specify the classpath, then I don't know it.
>>
>> org.apache.uima.resource.ResourceInitializationException: An import
>> could not be resolved.  No .xml file with name
>> "types.recordCandidateType" was found in the class path or data path.
>> (Descriptor: <unknown>)
>>        at
>> org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:121)
>>        at
>> org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:109)
>>        at
>> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:124)
>>        at
>> org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>        at
>> org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>        at
>> org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:258)
>>        at
>> org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:303)
>>        at
>> org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:383)
>>        at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:64)
>>        at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:44)
>>        at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
>>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
>>        at
>> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
>>
>>
>> On Wed, Jun 11, 2008 at 7:17 PM, rohan rai <hi...@gmail.com> wrote:
>>
>>
>>
>>> I am sorry, which jar are you talking about? To run the UIMA app as a
>>> standalone I do not have to create a jar.
>>> Are you saying: create a jar of the app and then run it as a standalone?
>>>
>>> Regards
>>> Rohan
>>>
>>>
>>>
>>> On Wed, Jun 11, 2008 at 7:10 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>
>>>
>>>
>>>> So when you run it in Eclipse, it should run with
>>>> just the jar in the classpath, and no special setup
>>>> for the descriptors.  I assume you tried that?
>>>>
>>>> --Thilo
>>>>
>>>>
>>>> rohan rai wrote:
>>>>
>>>>
>>>>
>>>>> All the descriptors are in the jar. The whole app is in the jar, and
>>>>> that is the jar I am running on Hadoop.
>>>>>
>>>>> Regards
>>>>> Rohan
>>>>>
>>>>> On Wed, Jun 11, 2008 at 6:54 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>>>
>>>>>> Best to put the descriptor in the jar, as I
>>>>>> said earlier...
>>>>>>
>>>>>>
>>>>>> rohan rai wrote:
>>>>>>
>>>>>>> Damn, it can be run... somebody really ought to put that on the web
>>>>>>> ASAP... I promise that if I somehow make it run on my machine I will
>>>>>>> definitely put it up on my blog....
>>>>>>>
>>>>>>> Hey, by the way: to run a UIMA annotator via Eclipse with import by
>>>>>>> name, I have to add a classpath entry in the build path (using
>>>>>>> Eclipse). Do I have to do something special to take care of that when
>>>>>>> running the same app on Hadoop, running Hadoop via the command line?
>>>>>>>
>>>>>>> Regards
>>>>>>> Rohan
>>>>>>>
>>>>>>> On Wed, Jun 11, 2008 at 6:47 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>>>>>
>>>>>>>  I know for a fact that UIMA applications can be run on hadoop,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> so don't give up too quickly.  In your local tests, you need
>>>>>>>> to make sure that the system is really using the descriptor
>>>>>>>> you think it's using (which is why I suggested you test on a
>>>>>>>> different machine), not something it picks up from the environment.
>>>>>>>>
>>>>>>>> --Thilo
>>>>>>>>
>>>>>>>>
>>>>>>>> rohan rai wrote:
>>>>>>>>
>>>>>>>>> Yes, with name import, if I run it as a standalone it works
>>>>>>>>> perfectly fine, but
>>>>>>>>> when I try to do it over Hadoop then it goes haywire.
>>>>>>>>>
>>>>>>>>> I have to assume then that a simple UIMA application which does a
>>>>>>>>> simple name
>>>>>>>>> annotation will also not run in that case.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Rohan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>
>>
>
>

Re: import location over Hadoop

Posted by Thilo Goetz <tw...@gmx.de>.
rohan rai wrote:
> Just edited it. Hopefully it is explanatory enough

That's great, thanks Rohan.

> 
> On Thu, Jun 12, 2008 at 2:24 PM, Thilo Goetz <tw...@gmx.de> wrote:
> 
>> Hi Rohan,
>>
>> good question.  I added a page under "developer tips" I
>> suggest you use:
>> http://cwiki.apache.org/confluence/display/UIMA/Running+UIMA+Apps+on+Hadoop
>>
>> --Thilo
>>
>>
>> rohan rai wrote:
>>
>>> Hi Thilo
>>>
>>> Sorry for asking such a simple thing ...Under which topic should I add
>>> this
>>> info
>>>
>>> Regards
>>> Rohan
>>>
>>> On Thu, Jun 12, 2008 at 2:21 AM, Thilo Goetz <tw...@gmx.de> wrote:
>>>
>>>  Hi Rohan,
>>>> I'm glad you got it to work.  This is useful information.  It would
>>>> be great if you could put it up on the UIMA Wiki:
>>>> http://cwiki.apache.org/UIMA/
>>>>
>>>> --Thilo
>>>>
>>>>
>>>> rohan rai wrote:
>>>>
>>>>  I think I got it.....Thanks for all the help you guys.........To make a
>>>>> simple UIMA app work over hadoop (I did it on pseudo distributed
>>>>> environment) 3-4 factors come together..
>>>>>
>>>>> 1) the UIMA app along with the mapper reducer and your job main file +
>>>>> the
>>>>> the resources should be contained within the job jar you created
>>>>>
>>>>> 2) probably all import in the descriptor should be import by name
>>>>> (haven't
>>>>> verified this works with location)
>>>>>
>>>>> 3) any resource being read in any of the class file should be done via
>>>>> Classloader
>>>>>  E.g XMLInputSource in = new
>>>>>
>>>>>
>>>>> XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);
>>>>>
>>>>> 4) the When any AnalysisEngine or something like that of UIMA  is being
>>>>> getting produced (I am doing it in mapper) then ResourceManager should
>>>>> be
>>>>> used
>>>>>  E.g. ResourceManager rMng=UIMAFramework.newDefaultResourceManager();
>>>>>               rMng.setExtensionClassPath(str, true); //Here str is the
>>>>> path to any of the resources which can be obtained via
>>>>>
>>>>> //ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
>>>>>               rMng.setDataPath(str);
>>>>>               aEngine =
>>>>> UIMAFramework.produceAnalysisEngine(aSpecifier,rMng,null);
>>>>>
>>>>> This 4th point has to be considered as when we read a xml without using
>>>>> classloader by default it reads from temp task directory eg.
>>>>>
>>>>>
>>>>>
>>>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/
>>>>>
>>>>> But all the resources and classes gets unjarred in
>>>>>
>>>>>
>>>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work
>>>>>
>>>>> directory
>>>>>
>>>>> So to tell the system to look out for the resources in the correct
>>>>> directory when not using classloader (which is what UIMA's
>>>>> XMLInputSource does)
>>>>> we have to use resource manager
>>>>>
>>>>> Regards
>>>>> Rohan
>>>>>
>>>>>  ...
>>>>
>>>>
> 

Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
Just edited it. Hopefully it is explanatory enough

On Thu, Jun 12, 2008 at 2:24 PM, Thilo Goetz <tw...@gmx.de> wrote:

> Hi Rohan,
>
> good question.  I added a page under "developer tips" I
> suggest you use:
> http://cwiki.apache.org/confluence/display/UIMA/Running+UIMA+Apps+on+Hadoop
>
> --Thilo
>
>
> rohan rai wrote:
>
>> Hi Thilo
>>
>> Sorry for asking such a simple thing ...Under which topic should I add
>> this
>> info
>>
>> Regards
>> Rohan
>>
>> On Thu, Jun 12, 2008 at 2:21 AM, Thilo Goetz <tw...@gmx.de> wrote:
>>
>>  Hi Rohan,
>>>
>>> I'm glad you got it to work.  This is useful information.  It would
>>> be great if you could put it up on the UIMA Wiki:
>>> http://cwiki.apache.org/UIMA/
>>>
>>> --Thilo
>>>
>>>
>>> rohan rai wrote:
>>>
>>>  I think I got it.....Thanks for all the help you guys.........To make a
>>>> simple UIMA app work over hadoop (I did it on pseudo distributed
>>>> environment) 3-4 factors come together..
>>>>
>>>> 1) the UIMA app along with the mapper reducer and your job main file +
>>>> the
>>>> the resources should be contained within the job jar you created
>>>>
>>>> 2) probably all import in the descriptor should be import by name
>>>> (haven't
>>>> verified this works with location)
>>>>
>>>> 3) any resource being read in any of the class file should be done via
>>>> Classloader
>>>>  E.g XMLInputSource in = new
>>>>
>>>>
>>>> XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);
>>>>
>>>> 4) the When any AnalysisEngine or something like that of UIMA  is being
>>>> getting produced (I am doing it in mapper) then ResourceManager should
>>>> be
>>>> used
>>>>  E.g. ResourceManager rMng=UIMAFramework.newDefaultResourceManager();
>>>>               rMng.setExtensionClassPath(str, true); //Here str is the
>>>> path to any of the resources which can be obtained via
>>>>
>>>> //ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
>>>>               rMng.setDataPath(str);
>>>>               aEngine =
>>>> UIMAFramework.produceAnalysisEngine(aSpecifier,rMng,null);
>>>>
>>>> This 4th point has to be considered as when we read a xml without using
>>>> classloader by default it reads from temp task directory eg.
>>>>
>>>>
>>>>
>>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/
>>>>
>>>> But all the resources and classes gets unjarred in
>>>>
>>>>
>>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work
>>>>
>>>> directory
>>>>
>>>> So to tell the system to look out for the resources in the correct
>>>> directory when not using classloader (which is what UIMA's
>>>> XMLInputSource does)
>>>> we have to use resource manager
>>>>
>>>> Regards
>>>> Rohan
>>>>
>>>>  ...
>>>
>>>
>>>
>>

Re: import location over Hadoop

Posted by Thilo Goetz <tw...@gmx.de>.
Hi Rohan,

good question.  I added a page under "developer tips" I
suggest you use:
http://cwiki.apache.org/confluence/display/UIMA/Running+UIMA+Apps+on+Hadoop

--Thilo

rohan rai wrote:
> Hi Thilo
> 
> Sorry for asking such a simple thing ...Under which topic should I add this
> info
> 
> Regards
> Rohan
> 
> On Thu, Jun 12, 2008 at 2:21 AM, Thilo Goetz <tw...@gmx.de> wrote:
> 
>> Hi Rohan,
>>
>> I'm glad you got it to work.  This is useful information.  It would
>> be great if you could put it up on the UIMA Wiki:
>> http://cwiki.apache.org/UIMA/
>>
>> --Thilo
>>
>>
>> rohan rai wrote:
>>
>>> I think I got it.....Thanks for all the help you guys.........To make a
>>> simple UIMA app work over hadoop (I did it on pseudo distributed
>>> environment) 3-4 factors come together..
>>>
>>> 1) the UIMA app along with the mapper reducer and your job main file + the
>>> the resources should be contained within the job jar you created
>>>
>>> 2) probably all import in the descriptor should be import by name (haven't
>>> verified this works with location)
>>>
>>> 3) any resource being read in any of the class file should be done via
>>> Classloader
>>>   E.g XMLInputSource in = new
>>>
>>> XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);
>>>
>>> 4) the When any AnalysisEngine or something like that of UIMA  is being
>>> getting produced (I am doing it in mapper) then ResourceManager should be
>>> used
>>>  E.g. ResourceManager rMng=UIMAFramework.newDefaultResourceManager();
>>>                rMng.setExtensionClassPath(str, true); //Here str is the
>>> path to any of the resources which can be obtained via
>>>
>>> //ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
>>>                rMng.setDataPath(str);
>>>                aEngine =
>>> UIMAFramework.produceAnalysisEngine(aSpecifier,rMng,null);
>>>
>>> This 4th point has to be considered as when we read a xml without using
>>> classloader by default it reads from temp task directory eg.
>>>
>>>
>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/
>>>
>>> But all the resources and classes gets unjarred in
>>>
>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work
>>>
>>> directory
>>>
>>> So to tell the system to look out for the resources in the correct
>>> directory when not using classloader (which is what UIMA's
>>> XMLInputSource does)
>>> we have to use resource manager
>>>
>>> Regards
>>> Rohan
>>>
>> ...
>>
>>
> 

Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
Hi Thilo

Sorry for asking such a simple thing ...Under which topic should I add this
info

Regards
Rohan

On Thu, Jun 12, 2008 at 2:21 AM, Thilo Goetz <tw...@gmx.de> wrote:

> Hi Rohan,
>
> I'm glad you got it to work.  This is useful information.  It would
> be great if you could put it up on the UIMA Wiki:
> http://cwiki.apache.org/UIMA/
>
> --Thilo
>
>
> rohan rai wrote:
>
>> I think I got it.....Thanks for all the help you guys.........To make a
>> simple UIMA app work over hadoop (I did it on pseudo distributed
>> environment) 3-4 factors come together..
>>
>> 1) the UIMA app along with the mapper reducer and your job main file + the
>> the resources should be contained within the job jar you created
>>
>> 2) probably all import in the descriptor should be import by name (haven't
>> verified this works with location)
>>
>> 3) any resource being read in any of the class file should be done via
>> Classloader
>>   E.g XMLInputSource in = new
>>
>> XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);
>>
>> 4) the When any AnalysisEngine or something like that of UIMA  is being
>> getting produced (I am doing it in mapper) then ResourceManager should be
>> used
>>  E.g. ResourceManager rMng=UIMAFramework.newDefaultResourceManager();
>>                rMng.setExtensionClassPath(str, true); //Here str is the
>> path to any of the resources which can be obtained via
>>
>> //ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
>>                rMng.setDataPath(str);
>>                aEngine =
>> UIMAFramework.produceAnalysisEngine(aSpecifier,rMng,null);
>>
>> This 4th point has to be considered as when we read a xml without using
>> classloader by default it reads from temp task directory eg.
>>
>>
>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/
>>
>> But all the resources and classes gets unjarred in
>>
>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work
>>
>> directory
>>
>> So to tell the system to look out for the resources in the correct
>> directory when not using classloader (which is what UIMA's
>> XMLInputSource does)
>> we have to use resource manager
>>
>> Regards
>> Rohan
>>
> ...
>
>

Re: import location over Hadoop

Posted by Thilo Goetz <tw...@gmx.de>.
Hi Rohan,

I'm glad you got it to work.  This is useful information.  It would
be great if you could put it up on the UIMA Wiki:
http://cwiki.apache.org/UIMA/

--Thilo

rohan rai wrote:
> I think I got it.....Thanks for all the help you guys.........To make a
> simple UIMA app work over hadoop (I did it on pseudo distributed
> environment) 3-4 factors come together..
> 
> 1) the UIMA app along with the mapper reducer and your job main file + the
> the resources should be contained within the job jar you created
> 
> 2) probably all import in the descriptor should be import by name (haven't
> verified this works with location)
> 
> 3) any resource being read in any of the class file should be done via
> Classloader
>    E.g XMLInputSource in = new
> XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);
> 
> 4) the When any AnalysisEngine or something like that of UIMA  is being
> getting produced (I am doing it in mapper) then ResourceManager should be
> used
>   E.g. ResourceManager rMng=UIMAFramework.newDefaultResourceManager();
>                 rMng.setExtensionClassPath(str, true); //Here str is the
> path to any of the resources which can be obtained via
> 
> //ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
>                 rMng.setDataPath(str);
>                 aEngine =
> UIMAFramework.produceAnalysisEngine(aSpecifier,rMng,null);
> 
> This 4th point has to be considered as when we read a xml without using
> classloader by default it reads from temp task directory eg.
> 
> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/
> 
> But all the resources and classes gets unjarred in
> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work
> 
> directory
> 
> So to tell the system to look out for the resources in the correct
> directory when not using classloader (which is what UIMA's
> XMLInputSource does)
> we have to use resource manager
> 
> Regards
> Rohan
...


Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
I think I got it. Thanks for all the help, you guys. To make a simple UIMA app
work over Hadoop (I did it in a pseudo-distributed environment), 3-4 factors
come together:

1) The UIMA app, along with the mapper, reducer, and your job main file, plus
the resources, should be contained within the job jar you create.

2) Probably all imports in the descriptors should be imports by name (I
haven't verified whether this works with imports by location).

3) Any resource read in any of the class files should be read via the
classloader, e.g.:

   XMLInputSource in = new
   XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor), null);

4) When any AnalysisEngine (or similar UIMA component) is being produced (I am
doing it in the mapper), a ResourceManager should be used, e.g.:

   ResourceManager rMng = UIMAFramework.newDefaultResourceManager();
   // Here str is the path to any of the resources, which can be obtained via
   // ClassLoader.getSystemResource(aeXmlDescriptor).getPath()
   rMng.setExtensionClassPath(str, true);
   rMng.setDataPath(str);
   aEngine = UIMAFramework.produceAnalysisEngine(aSpecifier, rMng, null);

This 4th point has to be considered because when we read an XML file without
using the classloader, by default it is read from the temporary task
directory, e.g.

/tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/task_200806112341_0002_m_000000_0/

but all the resources and classes get unjarred into the

/tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806112341_0002/work

directory.

So to tell the system to look for the resources in the correct directory when
not using the classloader (which is what UIMA's XMLInputSource does), we have
to use the resource manager.

Regards
Rohan
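Putting points 3) and 4) together, the mapper-side setup might look roughly like this. This is a sketch only: the descriptor path passed in is whatever path the descriptor has inside the job jar, and trimming the resource URL down to its directory for the data path is an assumption, not something verified on a cluster.

```java
import org.apache.uima.UIMAFramework;
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.resource.ResourceManager;
import org.apache.uima.resource.ResourceSpecifier;
import org.apache.uima.util.XMLInputSource;

public class EngineFactorySketch {
    public static AnalysisEngine create(String aeXmlDescriptor) throws Exception {
        // 3) read the descriptor from the jar via the classloader, never from
        // the task's working directory
        XMLInputSource in = new XMLInputSource(
                ClassLoader.getSystemResourceAsStream(aeXmlDescriptor), null);
        ResourceSpecifier specifier =
                UIMAFramework.getXMLParser().parseResourceSpecifier(in);

        // 4) point a ResourceManager at the directory the job jar was unpacked
        // into (the task's "work" directory), so imports by name resolve there;
        // stripping the file name to get the directory is an assumption
        String url = ClassLoader.getSystemResource(aeXmlDescriptor).getPath();
        String str = url.substring(0, url.lastIndexOf('/'));
        ResourceManager rMng = UIMAFramework.newDefaultResourceManager();
        rMng.setExtensionClassPath(str, true);
        rMng.setDataPath(str);

        return UIMAFramework.produceAnalysisEngine(specifier, rMng, null);
    }
}
```

Usage from the mapper would then be something like `AnalysisEngine ae = EngineFactorySketch.create("descriptors/MyAnnotator.xml");`, with "descriptors/MyAnnotator.xml" being a hypothetical in-jar path.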


On Thu, Jun 12, 2008 at 12:34 AM, Marshall Schor <ms...@schor.com> wrote:

> In the Jar that is being deployed, can you unzip it (Jars can be unzipped
> by any unzip tool) and see if it has in it (among many other things):
>
> <the top level / directory>
>   |
>   + types
>        |
>        + recordCandidateType.xml
> in other words, right below the top level, a directory called "types", and
> in that directory, a file called "recordCandidateType.xml" ?
>
> -Marshall
>
>
> rohan rai wrote:
>
>> Anyways just to specify neither import by name nor import by location
>> works....import by name results in following exception . If their is some
>> other way to specify the classpath then I dont know
>>
>> org.apache.uima.resource.ResourceInitializationException: An import
>> could not be resolved.  No .xml file with name
>> "types.recordCandidateType" was found in the class path or data path.
>> (Descriptor: <unknown>)
>>        at
>> org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:121)
>>        at
>> org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:109)
>>        at
>> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:124)
>>        at
>> org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
>>        at
>> org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
>>        at
>> org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:258)
>>        at
>> org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:303)
>>        at
>> org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:383)
>>        at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:64)
>>        at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:44)
>>        at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
>>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
>>        at
>> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
>>
>>
>> On Wed, Jun 11, 2008 at 7:17 PM, rohan rai <hi...@gmail.com> wrote:
>>
>>
>>
>>> I am sorry which jar are you talking about....To run UIMA App as a
>>> standalone I do not have to create the jar
>>> Are you saying Create a jar of the APP and then run it as a standalone??
>>>
>>> Regards
>>> Rohan
>>>
>>>
>>>
>>> On Wed, Jun 11, 2008 at 7:10 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>
>>>
>>>
>>>> So when you run it in Eclipse, it should run with
>>>> just the jar in the classpath, and no special setup
>>>> for the descriptors.  I assume you tried that?
>>>>
>>>> --Thilo
>>>>
>>>>
>>>> rohan rai wrote:
>>>>
>>>>
>>>>
>>>>> All the descriptors are in the jar....The whole app is in the
>>>>> jar.....then
>>>>> only I am running the jar on hadoop
>>>>>
>>>>> Regards
>>>>> Rohan
>>>>>
>>>>> On Wed, Jun 11, 2008 at 6:54 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>>>
>>>>>  Best to put the descriptor in the jar, as I
>>>>>
>>>>>
>>>>>> said earlier...
>>>>>>
>>>>>>
>>>>>> rohan rai wrote:
>>>>>>
>>>>>>  Damn it can be run...somebody really gotcha put it in web ASAP...I
>>>>>>
>>>>>>
>>>>>>> promise
>>>>>>> if I somehow make it run in my m/c I will definitely put it up in my
>>>>>>> blog....
>>>>>>>
>>>>>>> Hey by the way to run UIMA annotator via eclipse with import name I
>>>>>>> have
>>>>>>> to
>>>>>>> add classpath in the build path(using eclipse)... Do I have to do
>>>>>>> something
>>>>>>> special to take care of that when running the same app in hadoop...
>>>>>>> Running
>>>>>>> hadoop via command line....
>>>>>>>
>>>>>>> Regards
>>>>>>> Rohan
>>>>>>>
>>>>>>> On Wed, Jun 11, 2008 at 6:47 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>>>>>
>>>>>>>  I know for a fact that UIMA applications can be run on hadoop,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> so don't give up too quickly.  In your local tests, you need
>>>>>>>> to make sure that the system is really using the descriptor
>>>>>>>> you think it's using (which is why I suggested you test on a
>>>>>>>> different machine), not something it picks up from the environment.
>>>>>>>>
>>>>>>>> --Thilo
>>>>>>>>
>>>>>>>>
>>>>>>>> rohan rai wrote:
>>>>>>>>
>>>>>>>>  Yes with name import if I run it as a standalone it works perfectly
>>>>>>>> fine
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> but
>>>>>>>>> when I try to do it over hadoop then it goes haywire.
>>>>>>>>>
>>>>>>>>> I have to assume then a simple UIMA application with does a simple
>>>>>>>>> name
>>>>>>>>> annotation will also not run in that case
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Rohan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>
>>
>
>

Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
Correctly said: an import by name is specified without the .xml extension, and
an import by location with the .xml extension, but the problem still remains:
how to make it work over Hadoop.

On Wed, Jun 11, 2008 at 11:03 PM, Jaroslaw Cwiklik <
cwiklik@us.ibm.com> wrote:

> If I remember correctly, when using import by name you dont specify .xml
> extension. Also,
> if the resource has a path, you need to specify it like this:
>
> data.resources.file
>
> not
>
> data/resources/file.xml
>
> Hopefully, this helps.
>
> JC
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Jerry Cwiklik
> UIMA Extensions
> IBM T.J. Watson Research Center
> Hawtorne, NY, 10532
> Tel: 914-784-7665, T/L: 863-7665
> Email: cwiklik@us.ibm.com
>
>
>
>
>
>     *"rohan rai" <hi...@gmail.com>*
>
>             06/11/2008 01:02 PM
>             Please respond to
>             uima-user@incubator.apache.org
>
>
> To
>
> uima-user@incubator.apache.org
> cc
>
>
> Subject
>
> Re: import location over Hadoop
>
> Anyways just to specify neither import by name nor import by location
> works....import by name results in following exception . If their is some
> other way to specify the classpath then I dont know
>
> org.apache.uima.resource.ResourceInitializationException: An import
> could not be resolved.  No .xml file with name
> "types.recordCandidateType" was found in the class path or data path.
> (Descriptor: <unknown>)
> at
> org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:121)
> at
> org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:109)
> at
> org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:124)
> at
> org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> at
> org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:258)
> at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:303)
> at
> org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:383)
> at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:64)
> at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:44)
> at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
>
>
> On Wed, Jun 11, 2008 at 7:17 PM, rohan rai <hi...@gmail.com> wrote:
>
> > I am sorry which jar are you talking about....To run UIMA App as a
> > standalone I do not have to create the jar
> > Are you saying Create a jar of the APP and then run it as a standalone??
> >
> > Regards
> > Rohan
> >
> >
> >
> > On Wed, Jun 11, 2008 at 7:10 PM, Thilo Goetz <tw...@gmx.de> wrote:
> >
> >> So when you run it in Eclipse, it should run with
> >> just the jar in the classpath, and no special setup
> >> for the descriptors.  I assume you tried that?
> >>
> >> --Thilo
> >>
> >>
> >> rohan rai wrote:
> >>
> >>> All the descriptors are in the jar....The whole app is in the
> >>> jar.....then
> >>> only I am running the jar on hadoop
> >>>
> >>> Regards
> >>> Rohan
> >>>
> >>> On Wed, Jun 11, 2008 at 6:54 PM, Thilo Goetz <tw...@gmx.de> wrote:
> >>>
> >>>  Best to put the descriptor in the jar, as I
> >>>> said earlier...
> >>>>
> >>>>
> >>>> rohan rai wrote:
> >>>>
> >>>>  Damn it can be run...somebody really gotcha put it in web ASAP...I
> >>>>> promise
> >>>>> if I somehow make it run in my m/c I will definitely put it up in my
> >>>>> blog....
> >>>>>
> >>>>> Hey by the way to run UIMA annotator via eclipse with import name I
> >>>>> have
> >>>>> to
> >>>>> add classpath in the build path(using eclipse)... Do I have to do
> >>>>> something
> >>>>> special to take care of that when running the same app in hadoop...
> >>>>> Running
> >>>>> hadoop via command line....
> >>>>>
> >>>>> Regards
> >>>>> Rohan
> >>>>>
> >>>>> On Wed, Jun 11, 2008 at 6:47 PM, Thilo Goetz <tw...@gmx.de> wrote:
> >>>>>
> >>>>>  I know for a fact that UIMA applications can be run on hadoop,
> >>>>>
> >>>>>> so don't give up too quickly.  In your local tests, you need
> >>>>>> to make sure that the system is really using the descriptor
> >>>>>> you think it's using (which is why I suggested you test on a
> >>>>>> different machine), not something it picks up from the environment.
> >>>>>>
> >>>>>> --Thilo
> >>>>>>
> >>>>>>
> >>>>>> rohan rai wrote:
> >>>>>>
> >>>>>>  Yes with name import if I run it as a standalone it works perfectly
> >>>>>> fine
> >>>>>>
> >>>>>>> but
> >>>>>>> when I try to do it over hadoop then it goes haywire.
> >>>>>>>
> >>>>>>> I have to assume then a simple UIMA application with does a simple
> >>>>>>> name
> >>>>>>> annotation will also not run in that case
> >>>>>>>
> >>>>>>> Regards
> >>>>>>> Rohan
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>
> >
>
>

Re: import location over Hadoop

Posted by Jaroslaw Cwiklik <cw...@us.ibm.com>.




If I remember correctly, when using import by name you don't specify the .xml
extension. Also, if the resource has a path, you need to specify it like this:

data.resources.file

      not

data/resources/file.xml
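A tiny sketch of the mapping this implies (this is my reading of the convention, not UIMA's actual resolution code): dots become slashes and ".xml" is appended before the name is looked up on the classpath or datapath.

```java
public class ImportNameResolver {
    // Presumed by-name convention: "data.resources.file" is looked up
    // on the classpath/datapath as "data/resources/file.xml".
    static String toResourcePath(String importName) {
        return importName.replace('.', '/') + ".xml";
    }

    public static void main(String[] args) {
        System.out.println(toResourcePath("data.resources.file"));
        System.out.println(toResourcePath("types.recordCandidateType"));
    }
}
```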

Hopefully, this helps.

JC


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Jerry Cwiklik
 UIMA Extensions
 IBM T.J.  Watson Research Center
 Hawthorne, NY, 10532
 Tel: 914-784-7665,  T/L: 863-7665
 Email: cwiklik@us.ibm.com



                                                                           
From: "rohan rai" <hirohanin@gmail.com>
To: uima-user@incubator.apache.org
Date: 06/11/2008 01:02 PM
Subject: Re: import location over Hadoop
Please respond to: uima-user@incubator.apache.org




Anyway, just to clarify: neither import by name nor import by location
works. Import by name results in the following exception. If there is some
other way to specify the classpath, I don't know it.

org.apache.uima.resource.ResourceInitializationException: An import
could not be resolved.  No .xml file with name
"types.recordCandidateType" was found in the class path or data path.
(Descriptor: <unknown>)
	at org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:121)
	at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:109)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:124)
	at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
	at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
	at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:258)
	at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:303)
	at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:383)
	at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:64)
	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:44)
	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)


On Wed, Jun 11, 2008 at 7:17 PM, rohan rai <hi...@gmail.com> wrote:

> I am sorry which jar are you talking about....To run UIMA App as a
> standalone I do not have to create the jar
> Are you saying Create a jar of the APP and then run it as a standalone??
>
> Regards
> Rohan
>
>
>
> On Wed, Jun 11, 2008 at 7:10 PM, Thilo Goetz <tw...@gmx.de> wrote:
>
>> So when you run it in Eclipse, it should run with
>> just the jar in the classpath, and no special setup
>> for the descriptors.  I assume you tried that?
>>
>> --Thilo
>>
>>
>> rohan rai wrote:
>>
>>> All the descriptors are in the jar....The whole app is in the
>>> jar.....then
>>> only I am running the jar on hadoop
>>>
>>> Regards
>>> Rohan
>>>
>>> On Wed, Jun 11, 2008 at 6:54 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>
>>>  Best to put the descriptor in the jar, as I
>>>> said earlier...
>>>>
>>>>
>>>> rohan rai wrote:
>>>>
>>>>  Damn it can be run...somebody really gotcha put it in web ASAP...I
>>>>> promise
>>>>> if I somehow make it run in my m/c I will definitely put it up in my
>>>>> blog....
>>>>>
>>>>> Hey by the way to run UIMA annotator via eclipse with import name I
>>>>> have
>>>>> to
>>>>> add classpath in the build path(using eclipse)... Do I have to do
>>>>> something
>>>>> special to take care of that when running the same app in hadoop...
>>>>> Running
>>>>> hadoop via command line....
>>>>>
>>>>> Regards
>>>>> Rohan
>>>>>
>>>>> On Wed, Jun 11, 2008 at 6:47 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>>>
>>>>>  I know for a fact that UIMA applications can be run on hadoop,
>>>>>
>>>>>> so don't give up too quickly.  In your local tests, you need
>>>>>> to make sure that the system is really using the descriptor
>>>>>> you think it's using (which is why I suggested you test on a
>>>>>> different machine), not something it picks up from the environment.
>>>>>>
>>>>>> --Thilo
>>>>>>
>>>>>>
>>>>>> rohan rai wrote:
>>>>>>
>>>>>>  Yes with name import if I run it as a standalone it works perfectly
>>>>>> fine
>>>>>>
>>>>>>> but
>>>>>>> when I try to do it over hadoop then it goes haywire.
>>>>>>>
>>>>>>> I have to assume then a simple UIMA application with does a simple
>>>>>>> name
>>>>>>> annotation will also not run in that case
>>>>>>>
>>>>>>> Regards
>>>>>>> Rohan
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>

Re: import location over Hadoop

Posted by Marshall Schor <ms...@schor.com>.
In the Jar that is being deployed, can you unzip it (Jars can be 
unzipped by any unzip tool) and see if it has in it (among many other 
things):

<the top level / directory>
    |
    + types
         |
         + recordCandidateType.xml 

in other words, right below the top level, a directory called "types", 
and in that directory, a file called "recordCandidateType.xml" ?
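One way to make that check programmatic is `java.util.jar.JarFile`, which can look up an entry by exact path. The sketch below builds a throwaway jar just so it is self-contained; in practice you would point `hasEntry` at your deployed job jar (the entry name shown follows Marshall's layout).

```java
import java.io.File;
import java.io.FileOutputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class JarCheck {
    // Returns true if the jar contains the given entry, e.g.
    // "types/recordCandidateType.xml" right below the jar root.
    static boolean hasEntry(File jar, String entryName) throws Exception {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws Exception {
        // Build a tiny jar on the fly just to demonstrate the check.
        File jar = File.createTempFile("demo", ".jar");
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
            out.putNextEntry(new JarEntry("types/recordCandidateType.xml"));
            out.write("<typeSystemDescription/>".getBytes("UTF-8"));
            out.closeEntry();
        }
        System.out.println(hasEntry(jar, "types/recordCandidateType.xml"));
        System.out.println(hasEntry(jar, "types/missing.xml"));
        jar.delete();
    }
}
```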

-Marshall

rohan rai wrote:
> Anyways just to specify neither import by name nor import by location
> works....import by name results in following exception . If their is some
> other way to specify the classpath then I dont know
>
> org.apache.uima.resource.ResourceInitializationException: An import
> could not be resolved.  No .xml file with name
> "types.recordCandidateType" was found in the class path or data path.
> (Descriptor: <unknown>)
> 	at org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:121)
> 	at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:109)
> 	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:124)
> 	at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
> 	at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
> 	at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:258)
> 	at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:303)
> 	at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:383)
> 	at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:64)
> 	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:44)
> 	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
>
>
> On Wed, Jun 11, 2008 at 7:17 PM, rohan rai <hi...@gmail.com> wrote:
>
>   
>> I am sorry which jar are you talking about....To run UIMA App as a
>> standalone I do not have to create the jar
>> Are you saying Create a jar of the APP and then run it as a standalone??
>>
>> Regards
>> Rohan
>>
>>
>>
>> On Wed, Jun 11, 2008 at 7:10 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>
>>     
>>> So when you run it in Eclipse, it should run with
>>> just the jar in the classpath, and no special setup
>>> for the descriptors.  I assume you tried that?
>>>
>>> --Thilo
>>>
>>>
>>> rohan rai wrote:
>>>
>>>       
>>>> All the descriptors are in the jar....The whole app is in the
>>>> jar.....then
>>>> only I am running the jar on hadoop
>>>>
>>>> Regards
>>>> Rohan
>>>>
>>>> On Wed, Jun 11, 2008 at 6:54 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>>
>>>>  Best to put the descriptor in the jar, as I
>>>>         
>>>>> said earlier...
>>>>>
>>>>>
>>>>> rohan rai wrote:
>>>>>
>>>>>  Damn it can be run...somebody really gotcha put it in web ASAP...I
>>>>>           
>>>>>> promise
>>>>>> if I somehow make it run in my m/c I will definitely put it up in my
>>>>>> blog....
>>>>>>
>>>>>> Hey by the way to run UIMA annotator via eclipse with import name I
>>>>>> have
>>>>>> to
>>>>>> add classpath in the build path(using eclipse)... Do I have to do
>>>>>> something
>>>>>> special to take care of that when running the same app in hadoop...
>>>>>> Running
>>>>>> hadoop via command line....
>>>>>>
>>>>>> Regards
>>>>>> Rohan
>>>>>>
>>>>>> On Wed, Jun 11, 2008 at 6:47 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>>>>
>>>>>>  I know for a fact that UIMA applications can be run on hadoop,
>>>>>>
>>>>>>             
>>>>>>> so don't give up too quickly.  In your local tests, you need
>>>>>>> to make sure that the system is really using the descriptor
>>>>>>> you think it's using (which is why I suggested you test on a
>>>>>>> different machine), not something it picks up from the environment.
>>>>>>>
>>>>>>> --Thilo
>>>>>>>
>>>>>>>
>>>>>>> rohan rai wrote:
>>>>>>>
>>>>>>>  Yes with name import if I run it as a standalone it works perfectly
>>>>>>> fine
>>>>>>>
>>>>>>>               
>>>>>>>> but
>>>>>>>> when I try to do it over hadoop then it goes haywire.
>>>>>>>>
>>>>>>>> I have to assume then a simple UIMA application with does a simple
>>>>>>>> name
>>>>>>>> annotation will also not run in that case
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Rohan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>
>   


Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
Anyway, just to clarify: neither import by name nor import by location
works. Import by name results in the following exception. If there is some
other way to specify the classpath, I don't know it.

org.apache.uima.resource.ResourceInitializationException: An import
could not be resolved.  No .xml file with name
"types.recordCandidateType" was found in the class path or data path.
(Descriptor: <unknown>)
	at org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:121)
	at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:109)
	at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:124)
	at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
	at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
	at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:258)
	at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:303)
	at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:383)
	at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:64)
	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:44)
	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)


On Wed, Jun 11, 2008 at 7:17 PM, rohan rai <hi...@gmail.com> wrote:

> I am sorry which jar are you talking about....To run UIMA App as a
> standalone I do not have to create the jar
> Are you saying Create a jar of the APP and then run it as a standalone??
>
> Regards
> Rohan
>
>
>
> On Wed, Jun 11, 2008 at 7:10 PM, Thilo Goetz <tw...@gmx.de> wrote:
>
>> So when you run it in Eclipse, it should run with
>> just the jar in the classpath, and no special setup
>> for the descriptors.  I assume you tried that?
>>
>> --Thilo
>>
>>
>> rohan rai wrote:
>>
>>> All the descriptors are in the jar....The whole app is in the
>>> jar.....then
>>> only I am running the jar on hadoop
>>>
>>> Regards
>>> Rohan
>>>
>>> On Wed, Jun 11, 2008 at 6:54 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>
>>>  Best to put the descriptor in the jar, as I
>>>> said earlier...
>>>>
>>>>
>>>> rohan rai wrote:
>>>>
>>>>  Damn it can be run...somebody really gotcha put it in web ASAP...I
>>>>> promise
>>>>> if I somehow make it run in my m/c I will definitely put it up in my
>>>>> blog....
>>>>>
>>>>> Hey by the way to run UIMA annotator via eclipse with import name I
>>>>> have
>>>>> to
>>>>> add classpath in the build path(using eclipse)... Do I have to do
>>>>> something
>>>>> special to take care of that when running the same app in hadoop...
>>>>> Running
>>>>> hadoop via command line....
>>>>>
>>>>> Regards
>>>>> Rohan
>>>>>
>>>>> On Wed, Jun 11, 2008 at 6:47 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>>>
>>>>>  I know for a fact that UIMA applications can be run on hadoop,
>>>>>
>>>>>> so don't give up too quickly.  In your local tests, you need
>>>>>> to make sure that the system is really using the descriptor
>>>>>> you think it's using (which is why I suggested you test on a
>>>>>> different machine), not something it picks up from the environment.
>>>>>>
>>>>>> --Thilo
>>>>>>
>>>>>>
>>>>>> rohan rai wrote:
>>>>>>
>>>>>>  Yes with name import if I run it as a standalone it works perfectly
>>>>>> fine
>>>>>>
>>>>>>> but
>>>>>>> when I try to do it over hadoop then it goes haywire.
>>>>>>>
>>>>>>> I have to assume then a simple UIMA application with does a simple
>>>>>>> name
>>>>>>> annotation will also not run in that case
>>>>>>>
>>>>>>> Regards
>>>>>>> Rohan
>>>>>>>
>>>>>>>
>>>>>>>
>>>
>

Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
I am sorry, which jar are you talking about? To run the UIMA app as a
standalone I do not have to create the jar.
Are you saying: create a jar of the app and then run it as a standalone?

Regards
Rohan


On Wed, Jun 11, 2008 at 7:10 PM, Thilo Goetz <tw...@gmx.de> wrote:

> So when you run it in Eclipse, it should run with
> just the jar in the classpath, and no special setup
> for the descriptors.  I assume you tried that?
>
> --Thilo
>
>
> rohan rai wrote:
>
>> All the descriptors are in the jar....The whole app is in the jar.....then
>> only I am running the jar on hadoop
>>
>> Regards
>> Rohan
>>
>> On Wed, Jun 11, 2008 at 6:54 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>
>>  Best to put the descriptor in the jar, as I
>>> said earlier...
>>>
>>>
>>> rohan rai wrote:
>>>
>>>  Damn it can be run...somebody really gotcha put it in web ASAP...I
>>>> promise
>>>> if I somehow make it run in my m/c I will definitely put it up in my
>>>> blog....
>>>>
>>>> Hey by the way to run UIMA annotator via eclipse with import name I have
>>>> to
>>>> add classpath in the build path(using eclipse)... Do I have to do
>>>> something
>>>> special to take care of that when running the same app in hadoop...
>>>> Running
>>>> hadoop via command line....
>>>>
>>>> Regards
>>>> Rohan
>>>>
>>>> On Wed, Jun 11, 2008 at 6:47 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>>
>>>>  I know for a fact that UIMA applications can be run on hadoop,
>>>>
>>>>> so don't give up too quickly.  In your local tests, you need
>>>>> to make sure that the system is really using the descriptor
>>>>> you think it's using (which is why I suggested you test on a
>>>>> different machine), not something it picks up from the environment.
>>>>>
>>>>> --Thilo
>>>>>
>>>>>
>>>>> rohan rai wrote:
>>>>>
>>>>>  Yes with name import if I run it as a standalone it works perfectly
>>>>> fine
>>>>>
>>>>>> but
>>>>>> when I try to do it over hadoop then it goes haywire.
>>>>>>
>>>>>> I have to assume then a simple UIMA application with does a simple
>>>>>> name
>>>>>> annotation will also not run in that case
>>>>>>
>>>>>> Regards
>>>>>> Rohan
>>>>>>
>>>>>>
>>>>>>
>>

Re: import location over Hadoop

Posted by Thilo Goetz <tw...@gmx.de>.
So when you run it in Eclipse, it should run with
just the jar in the classpath, and no special setup
for the descriptors.  I assume you tried that?

--Thilo

rohan rai wrote:
> All the descriptors are in the jar....The whole app is in the jar.....then
> only I am running the jar on hadoop
> 
> Regards
> Rohan
> 
> On Wed, Jun 11, 2008 at 6:54 PM, Thilo Goetz <tw...@gmx.de> wrote:
> 
>> Best to put the descriptor in the jar, as I
>> said earlier...
>>
>>
>> rohan rai wrote:
>>
>>> Damn it can be run...somebody really gotcha put it in web ASAP...I promise
>>> if I somehow make it run in my m/c I will definitely put it up in my
>>> blog....
>>>
>>> Hey by the way to run UIMA annotator via eclipse with import name I have
>>> to
>>> add classpath in the build path(using eclipse)... Do I have to do
>>> something
>>> special to take care of that when running the same app in hadoop...
>>> Running
>>> hadoop via command line....
>>>
>>> Regards
>>> Rohan
>>>
>>> On Wed, Jun 11, 2008 at 6:47 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>>
>>>  I know for a fact that UIMA applications can be run on hadoop,
>>>> so don't give up too quickly.  In your local tests, you need
>>>> to make sure that the system is really using the descriptor
>>>> you think it's using (which is why I suggested you test on a
>>>> different machine), not something it picks up from the environment.
>>>>
>>>> --Thilo
>>>>
>>>>
>>>> rohan rai wrote:
>>>>
>>>>  Yes with name import if I run it as a standalone it works perfectly fine
>>>>> but
>>>>> when I try to do it over hadoop then it goes haywire.
>>>>>
>>>>> I have to assume then a simple UIMA application with does a simple name
>>>>> annotation will also not run in that case
>>>>>
>>>>> Regards
>>>>> Rohan
>>>>>
>>>>>
> 

Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
All the descriptors are in the jar... the whole app is in the jar... only
then am I running the jar on hadoop.

Regards
Rohan

On Wed, Jun 11, 2008 at 6:54 PM, Thilo Goetz <tw...@gmx.de> wrote:

> Best to put the descriptor in the jar, as I
> said earlier...
>
>
> rohan rai wrote:
>
>> Damn it can be run...somebody really gotcha put it in web ASAP...I promise
>> if I somehow make it run in my m/c I will definitely put it up in my
>> blog....
>>
>> Hey by the way to run UIMA annotator via eclipse with import name I have
>> to
>> add classpath in the build path(using eclipse)... Do I have to do
>> something
>> special to take care of that when running the same app in hadoop...
>> Running
>> hadoop via command line....
>>
>> Regards
>> Rohan
>>
>> On Wed, Jun 11, 2008 at 6:47 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>
>>  I know for a fact that UIMA applications can be run on hadoop,
>>> so don't give up too quickly.  In your local tests, you need
>>> to make sure that the system is really using the descriptor
>>> you think it's using (which is why I suggested you test on a
>>> different machine), not something it picks up from the environment.
>>>
>>> --Thilo
>>>
>>>
>>> rohan rai wrote:
>>>
>>>  Yes with name import if I run it as a standalone it works perfectly fine
>>>> but
>>>> when I try to do it over hadoop then it goes haywire.
>>>>
>>>> I have to assume then a simple UIMA application with does a simple name
>>>> annotation will also not run in that case
>>>>
>>>> Regards
>>>> Rohan
>>>>
>>>>
>>>
>>

Re: import location over Hadoop

Posted by Thilo Goetz <tw...@gmx.de>.
Best to put the descriptor in the jar, as I
said earlier...
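When the descriptor is inside the jar, it can be opened as a classpath resource rather than a file; that stream is what gets handed to UIMA's XMLInputSource, as in the code quoted at the top of the thread. A minimal sketch of the lookup itself (the descriptor path in the comment is a hypothetical placeholder):

```java
import java.io.InputStream;

public class ClasspathLookup {
    // Returns true if the named resource can be found on the classpath.
    static boolean onClasspath(String resourcePath) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourcePath)) {
            return in != null;
        } catch (java.io.IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "java/lang/String.class" exists on every JVM's classpath and is
        // used here only to show the mechanics; a descriptor packed into
        // your jar would be looked up by a path such as
        // "descriptors/annotators/MyAnnotator.xml" (hypothetical).
        System.out.println(onClasspath("java/lang/String.class"));
        System.out.println(onClasspath("no/such/resource.xml"));
    }
}
```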

rohan rai wrote:
> Damn it can be run...somebody really gotcha put it in web ASAP...I promise
> if I somehow make it run in my m/c I will definitely put it up in my
> blog....
> 
> Hey by the way to run UIMA annotator via eclipse with import name I have to
> add classpath in the build path(using eclipse)... Do I have to do something
> special to take care of that when running the same app in hadoop... Running
> hadoop via command line....
> 
> Regards
> Rohan
> 
> On Wed, Jun 11, 2008 at 6:47 PM, Thilo Goetz <tw...@gmx.de> wrote:
> 
>> I know for a fact that UIMA applications can be run on hadoop,
>> so don't give up too quickly.  In your local tests, you need
>> to make sure that the system is really using the descriptor
>> you think it's using (which is why I suggested you test on a
>> different machine), not something it picks up from the environment.
>>
>> --Thilo
>>
>>
>> rohan rai wrote:
>>
>>> Yes with name import if I run it as a standalone it works perfectly fine
>>> but
>>> when I try to do it over hadoop then it goes haywire.
>>>
>>> I have to assume then a simple UIMA application with does a simple name
>>> annotation will also not run in that case
>>>
>>> Regards
>>> Rohan
>>>
>>
> 

Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
Damn, it can be run... somebody really has got to put it on the web ASAP... I
promise that if I somehow make it run on my machine I will definitely put it
up on my blog.

By the way, to run the UIMA annotator via Eclipse with import by name, I have
to add the classpath in the build path (using Eclipse)... Do I have to do
something special to take care of that when running the same app in Hadoop,
launching hadoop via the command line?

Regards
Rohan

On Wed, Jun 11, 2008 at 6:47 PM, Thilo Goetz <tw...@gmx.de> wrote:

> I know for a fact that UIMA applications can be run on hadoop,
> so don't give up too quickly.  In your local tests, you need
> to make sure that the system is really using the descriptor
> you think it's using (which is why I suggested you test on a
> different machine), not something it picks up from the environment.
>
> --Thilo
>
>
> rohan rai wrote:
>
>> Yes with name import if I run it as a standalone it works perfectly fine
>> but
>> when I try to do it over hadoop then it goes haywire.
>>
>> I have to assume then a simple UIMA application with does a simple name
>> annotation will also not run in that case
>>
>> Regards
>> Rohan
>>
>
>

Re: import location over Hadoop

Posted by Thilo Goetz <tw...@gmx.de>.
I know for a fact that UIMA applications can be run on hadoop,
so don't give up too quickly.  In your local tests, you need
to make sure that the system is really using the descriptor
you think it's using (which is why I suggested you test on a
different machine), not something it picks up from the environment.

--Thilo

rohan rai wrote:
> Yes with name import if I run it as a standalone it works perfectly fine but
> when I try to do it over hadoop then it goes haywire.
> 
> I have to assume then a simple UIMA application with does a simple name
> annotation will also not run in that case
> 
> Regards
> Rohan


Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
Yes, with import by name, if I run it as a standalone it works perfectly fine,
but when I try to do it over hadoop it goes haywire.

I have to assume then that a simple UIMA application which does a simple name
annotation will also not run in that case.

Regards
Rohan

On Wed, Jun 11, 2008 at 6:35 PM, Thilo Goetz <tw...@gmx.de> wrote:

> That's most likely because the XML isn't valid :-)
> Seriously, the "no content allowed in prolog" message
> is sometimes due to an incorrect text encoding.
>
> Does this run ok locally?
>
> --Thilo
>
>
> rohan rai wrote:
>
>> Thanks Thilo. Well If do that all sorts of invalid xml exception is
>> getting
>> thrown
>>
>> org.apache.uima.util.InvalidXMLException: Invalid descriptor at
>> <unknown source>.
>>        at
>> org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:193)
>>        at
>> org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:365)
>>        at
>> org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:346)
>>        at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:45)
>>        at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:37)
>>        at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
>>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
>>        at
>> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
>> Caused by: org.xml.sax.SAXParseException: Content is not allowed in
>> prolog.
>>        at
>> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
>>        at
>> com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
>>        at
>> org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:176)
>>        ... 8 more
>> org.apache.uima.util.InvalidXMLException: Invalid descriptor at
>> <unknown source>.
>>        at
>> org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:193)
>>        at
>> org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:365)
>>        at
>> org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:346)
>>        at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:45)
>>        at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:37)
>>        at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
>>        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>>        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
>>        at
>> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
>> Caused by: org.xml.sax.SAXParseException: Content is not allowed in
>> prolog.
>>        at
>> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
>>        at
>> com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
>>        at
>> org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:176)
>>
>>
>>
>> On Wed, Jun 11, 2008 at 6:08 PM, Thilo Goetz <tw...@gmx.de> wrote:
>>
>>  You need to use import by name instead of import
>>> by location in your descriptor.  Then things get
>>> loaded via the classpath and you should be ok
>>> (provided that you stick your descriptors in the
>>> jar of course).  I suggest you test this locally
>>> first by moving your application to a different
>>> machine where you don't have any descriptors
>>> lying around.  It'll be easier to debug than in
>>> hadoop.
>>>
>>> --Thilo
>>>
>>>
>>> rohan rai wrote:
>>>
>>>  Well the question is for running UIMA over hadoop? How to do that as in
>>>> UIMA
>>>> there are xml descriptors which have relative urls and location? Which
>>>> throws exception
>>>>
>>>> But I can probably do without that answer
>>>>
>>>> Simplifying the problem
>>>>
>>>> I create a jar for my application and I am trying to run a map reduce
>>>> job
>>>>
>>>> In the map I am trying to read an xml resource which gives this kind of
>>>> exceprion
>>>>
>>>> java.io.FileNotFoundException:
>>>>
>>>>
>>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806102252_0028/task_200806102252_0028_m_000000_0/./descriptors/annotators/RecordCandidateAnnotator.xml
>>>> (No such file or directory)
>>>>       at java.io.FileInputStream.open(Native Method)
>>>>       at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>>>       at java.io.FileInputStream.<init>(FileInputStream.java:66)
>>>>       at
>>>>
>>>> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
>>>>       at
>>>>
>>>> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
>>>>       at java.net.URL.openStream(URL.java:1009)
>>>>       at
>>>> org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:83)
>>>>
>>>> I think I require to pass on the content of the jar which contains the
>>>> resource xml and classes(other than the JOB class) to each and every
>>>> taskXXXXXXX getting created
>>>>
>>>> How can I do that
>>>>
>>>> REgards
>>>> Rohan
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jun 11, 2008 at 5:12 PM, Michael Baessler <
>>>> mba@michael-baessler.de>
>>>> wrote:
>>>>
>>>>  rohan rai wrote:
>>>>
>>>>> Hi
>>>>>>  A simple thing such as a name annotator which has an import location
>>>>>> of
>>>>>> type starts throwing exception when I create a jar of the application
>>>>>> I
>>>>>>
>>>>>>  am
>>>>>
>>>>>  developing and run over hadoop.
>>>>>>
>>>>>> If I have to do it a java class file then I can use XMLInputSource in
>>>>>> =
>>>>>>
>>>>>>  new
>>>>>
>>>>>
>>>>> XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);
>>>>>
>>>>>  But the relative paths in annotators, analysis engines etc starts
>>>>>>
>>>>>>  throwing
>>>>>
>>>>>  exception
>>>>>>
>>>>>> Please Help
>>>>>>
>>>>>> Regards
>>>>>> Rohan
>>>>>>
>>>>>>  I'm not sure I understand your question, but I think you need some
>>>>>> help
>>>>>>
>>>>> with the exceptions you get.
>>>>> Can you provide the exception stack trace?
>>>>>
>>>>> -- Michael
>>>>>
>>>>>
>>>>>
>>

Re: import location over Hadoop

Posted by Thilo Goetz <tw...@gmx.de>.
That's most likely because the XML isn't valid :-)
Seriously, the "Content is not allowed in prolog" message
is sometimes due to an incorrect text encoding.

Does this run ok locally?
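For what it's worth, that error is easy to reproduce with plain JAXP: any byte before the `<?xml ...?>` declaration (a mis-decoded byte-order mark, or a stream read from the wrong offset) triggers exactly the SAXParseException in the trace below. A minimal sketch:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class PrologDemo {
    // Parses the bytes as XML; returns "ok" or the parser's error message.
    static String parse(byte[] bytes) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return "ok";
        } catch (SAXParseException e) {
            return e.getMessage();
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] good = "<?xml version=\"1.0\"?><root/>".getBytes("UTF-8");
        // Junk byte ahead of the XML declaration -> "Content is not
        // allowed in prolog."
        byte[] bad = "x<?xml version=\"1.0\"?><root/>".getBytes("UTF-8");
        System.out.println(parse(good));
        System.out.println(parse(bad));
    }
}
```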

--Thilo

rohan rai wrote:
> Thanks Thilo. Well If do that all sorts of invalid xml exception is getting
> thrown
> 
> org.apache.uima.util.InvalidXMLException: Invalid descriptor at
> <unknown source>.
> 	at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:193)
> 	at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:365)
> 	at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:346)
> 	at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:45)
> 	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:37)
> 	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
> Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
> 	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
> 	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
> 	at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:176)
> 	... 8 more
> org.apache.uima.util.InvalidXMLException: Invalid descriptor at
> <unknown source>.
> 	at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:193)
> 	at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:365)
> 	at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:346)
> 	at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:45)
> 	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:37)
> 	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
> 	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
> Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
> 	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
> 	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
> 	at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:176)
> 
> 
> 
> On Wed, Jun 11, 2008 at 6:08 PM, Thilo Goetz <tw...@gmx.de> wrote:
> 
>> You need to use import by name instead of import
>> by location in your descriptor.  Then things get
>> loaded via the classpath and you should be ok
>> (provided that you stick your descriptors in the
>> jar of course).  I suggest you test this locally
>> first by moving your application to a different
>> machine where you don't have any descriptors
>> lying around.  It'll be easier to debug than in
>> hadoop.
>>
>> --Thilo
>>
>>
>> rohan rai wrote:
>>
>>> Well, the question is about running UIMA over Hadoop: how does one do
>>> that, given that in UIMA there are XML descriptors with relative URLs and
>>> locations, which throw exceptions?
>>>
>>> But I can probably do without that answer
>>>
>>> Simplifying the problem
>>>
>>> I create a jar for my application and I am trying to run a map reduce job
>>>
>>> In the map I am trying to read an XML resource, which gives this kind of
>>> exception:
>>>
>>> java.io.FileNotFoundException:
>>>
>>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806102252_0028/task_200806102252_0028_m_000000_0/./descriptors/annotators/RecordCandidateAnnotator.xml
>>> (No such file or directory)
>>>        at java.io.FileInputStream.open(Native Method)
>>>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>>        at java.io.FileInputStream.<init>(FileInputStream.java:66)
>>>        at
>>> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
>>>        at
>>> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
>>>        at java.net.URL.openStream(URL.java:1009)
>>>        at
>>> org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:83)
>>>
>>> I think I need to pass the content of the jar which contains the
>>> resource XML and classes (other than the job class) to each and every
>>> taskXXXXXXX directory getting created.
>>>
>>> How can I do that
>>>
>>> Regards
>>> Rohan
>>>
>>>
>>>
>>>
>>> On Wed, Jun 11, 2008 at 5:12 PM, Michael Baessler <mba@michael-baessler.de>
>>> wrote:
>>>
>>>> rohan rai wrote:
>>>>> Hi
>>>>>  A simple thing such as a name annotator which has an import location of
>>>>> type starts throwing exception when I create a jar of the application I am
>>>>> developing and run over hadoop.
>>>>>
>>>>> If I have to do it a java class file then I can use XMLInputSource in = new
>>>>> XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);
>>>>>
>>>>> But the relative paths in annotators, analysis engines etc starts throwing
>>>>> exception
>>>>>
>>>>> Please Help
>>>>>
>>>>> Regards
>>>>> Rohan
>>>>>
>>>> I'm not sure I understand your question, but I think you need some help
>>>> with the exceptions you get.
>>>> Can you provide the exception stack trace?
>>>>
>>>> -- Michael
>>>>
>>>>
> 

Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
Thanks Thilo. Well, if I do that, all sorts of invalid XML exceptions get
thrown:

org.apache.uima.util.InvalidXMLException: Invalid descriptor at
<unknown source>.
	at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:193)
	at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:365)
	at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:346)
	at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:45)
	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:37)
	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
	at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:176)
	... 8 more
org.apache.uima.util.InvalidXMLException: Invalid descriptor at
<unknown source>.
	at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:193)
	at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:365)
	at org.apache.uima.util.impl.XMLParser_impl.parseResourceSpecifier(XMLParser_impl.java:346)
	at org.ziva.dq.hadoop.DQHadoopMain$Map.dQFile(DQHadoopMain.java:45)
	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:37)
	at org.ziva.dq.hadoop.DQHadoopMain$Map.map(DQHadoopMain.java:1)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:208)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
Caused by: org.xml.sax.SAXParseException: Content is not allowed in prolog.
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1231)
	at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
	at org.apache.uima.util.impl.XMLParser_impl.parse(XMLParser_impl.java:176)



On Wed, Jun 11, 2008 at 6:08 PM, Thilo Goetz <tw...@gmx.de> wrote:

> You need to use import by name instead of import
> by location in your descriptor.  Then things get
> loaded via the classpath and you should be ok
> (provided that you stick your descriptors in the
> jar of course).  I suggest you test this locally
> first by moving your application to a different
> machine where you don't have any descriptors
> lying around.  It'll be easier to debug than in
> hadoop.
>
> --Thilo
>
>
> rohan rai wrote:
>
>> Well, the question is about running UIMA over Hadoop: how does one do
>> that, given that in UIMA there are XML descriptors with relative URLs and
>> locations, which throw exceptions?
>>
>> But I can probably do without that answer
>>
>> Simplifying the problem
>>
>> I create a jar for my application and I am trying to run a map reduce job
>>
>> In the map I am trying to read an XML resource, which gives this kind of
>> exception:
>>
>> java.io.FileNotFoundException:
>>
>> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806102252_0028/task_200806102252_0028_m_000000_0/./descriptors/annotators/RecordCandidateAnnotator.xml
>> (No such file or directory)
>>        at java.io.FileInputStream.open(Native Method)
>>        at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>        at java.io.FileInputStream.<init>(FileInputStream.java:66)
>>        at
>> sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
>>        at
>> sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
>>        at java.net.URL.openStream(URL.java:1009)
>>        at
>> org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:83)
>>
>> I think I need to pass the content of the jar which contains the
>> resource XML and classes (other than the job class) to each and every
>> taskXXXXXXX directory getting created.
>>
>> How can I do that
>>
>> Regards
>> Rohan
>>
>>
>>
>>
>> On Wed, Jun 11, 2008 at 5:12 PM, Michael Baessler <mba@michael-baessler.de>
>> wrote:
>>
>>> rohan rai wrote:
>>>> Hi
>>>>  A simple thing such as a name annotator which has an import location of
>>>> type starts throwing exception when I create a jar of the application I am
>>>> developing and run over hadoop.
>>>>
>>>> If I have to do it a java class file then I can use XMLInputSource in = new
>>>> XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);
>>>>
>>>> But the relative paths in annotators, analysis engines etc starts throwing
>>>> exception
>>>>
>>>> Please Help
>>>>
>>>> Regards
>>>> Rohan
>>>>
>>> I'm not sure I understand your question, but I think you need some help
>>> with the exceptions you get.
>>> Can you provide the exception stack trace?
>>>
>>> -- Michael
>>>
>>>
>>

Re: import location over Hadoop

Posted by Thilo Goetz <tw...@gmx.de>.
You need to use import by name instead of import
by location in your descriptor.  Then things get
loaded via the classpath and you should be ok
(provided that you stick your descriptors in the
jar of course).  I suggest you test this locally
first by moving your application to a different
machine where you don't have any descriptors
lying around.  It'll be easier to debug than in
hadoop.
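As an illustration, an import-by-location entry in an aggregate descriptor might be changed along these lines (a sketch only; the key, paths, and qualified name below are hypothetical, not taken from the original poster's descriptors):

```xml
<!-- Before: the location is resolved relative to the importing
     descriptor's file location, which does not survive packaging
     into a Hadoop job jar. -->
<delegateAnalysisEngine key="RecordCandidate">
  <import location="../annotators/RecordCandidateAnnotator.xml"/>
</delegateAnalysisEngine>

<!-- After: the name is resolved via the classpath (dots become
     slashes and ".xml" is appended), so it works as long as the
     descriptor is inside the jar. -->
<delegateAnalysisEngine key="RecordCandidate">
  <import name="descriptors.annotators.RecordCandidateAnnotator"/>
</delegateAnalysisEngine>
```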

--Thilo

rohan rai wrote:
> Well, the question is about running UIMA over Hadoop: how does one do
> that, given that in UIMA there are XML descriptors with relative URLs and
> locations, which throw exceptions?
> 
> But I can probably do without that answer
> 
> Simplifying the problem
> 
> I create a jar for my application and I am trying to run a map reduce job
> 
> In the map I am trying to read an XML resource, which gives this kind of
> exception:
> 
> java.io.FileNotFoundException:
> /tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806102252_0028/task_200806102252_0028_m_000000_0/./descriptors/annotators/RecordCandidateAnnotator.xml
> (No such file or directory)
> 	at java.io.FileInputStream.open(Native Method)
> 	at java.io.FileInputStream.<init>(FileInputStream.java:106)
> 	at java.io.FileInputStream.<init>(FileInputStream.java:66)
> 	at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
> 	at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
> 	at java.net.URL.openStream(URL.java:1009)
> 	at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:83)
> 
> I think I need to pass the content of the jar which contains the
> resource XML and classes (other than the job class) to each and every
> taskXXXXXXX directory getting created.
> 
> How can I do that
> 
> Regards
> Rohan
> 
> 
> 
> 
> On Wed, Jun 11, 2008 at 5:12 PM, Michael Baessler <mb...@michael-baessler.de>
> wrote:
> 
>> rohan rai wrote:
>>> Hi
>>>   A simple thing such as a name annotator which has an import location of
>>> type starts throwing exception when I create a jar of the application I am
>>> developing and run over hadoop.
>>>
>>> If I have to do it a java class file then I can use XMLInputSource in = new
>>> XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);
>>>
>>> But the relative paths in annotators, analysis engines etc starts throwing
>>> exception
>>>
>>> Please Help
>>>
>>> Regards
>>> Rohan
>>>
>> I'm not sure I understand your question, but I think you need some help
>> with the exceptions you get.
>> Can you provide the exception stack trace?
>>
>> -- Michael
>>
> 

Re: import location over Hadoop

Posted by rohan rai <hi...@gmail.com>.
Well, the question is about running UIMA over Hadoop: how does one do that,
given that in UIMA there are XML descriptors with relative URLs and
locations, which throw exceptions?

But I can probably do without that answer

Simplifying the problem

I create a jar for my application and I am trying to run a map reduce job

In the map I am trying to read an XML resource, which gives this kind of
exception:

java.io.FileNotFoundException:
/tmp/hadoop-root/mapred/local/taskTracker/jobcache/job_200806102252_0028/task_200806102252_0028_m_000000_0/./descriptors/annotators/RecordCandidateAnnotator.xml
(No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:106)
	at java.io.FileInputStream.<init>(FileInputStream.java:66)
	at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
	at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
	at java.net.URL.openStream(URL.java:1009)
	at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:83)

I think I need to pass the content of the jar which contains the
resource XML and classes (other than the job class) to each and every
taskXXXXXXX directory getting created.

How can I do that
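Part of this can be sketched without any Hadoop or UIMA specifics: once a descriptor is packed into the job jar, it can be opened by resource name through the class loader instead of as a file, and a path given relative to another descriptor (as with import-by-location) can first be turned into a classpath resource name. A hedged sketch; the resource names are hypothetical and the UIMA call is shown only in a comment:

```java
import java.io.InputStream;
import java.net.URI;

public class JarResources {
    // Resolve a path that is relative to a base resource inside the jar,
    // e.g. base "descriptors/annotators/A.xml" plus "../typesystems/B.xml"
    // becomes "descriptors/typesystems/B.xml".
    public static String resolveRelative(String baseResource, String relative) {
        return URI.create(baseResource).resolve(relative).toString();
    }

    // Open a resource that was packed into the job jar. In a mapper one
    // would call this instead of opening a java.io.File, since the task's
    // working directory need not contain the unpacked descriptors.
    public static InputStream open(String resourceName) {
        return JarResources.class.getClassLoader().getResourceAsStream(resourceName);
    }

    public static void main(String[] args) {
        // Mirrors the a.xml / b.xml layout from earlier in the thread.
        String b = resolveRelative("a/a.xml", "../b/b.xml");
        System.out.println(b); // b/b.xml
        // Hypothetical UIMA usage (requires the UIMA jars on the classpath):
        // XMLInputSource xis = new XMLInputSource(open(b), null);
    }
}
```

getResourceAsStream returns null when the resource is not in the jar, which is worth checking before wrapping the stream, since passing null into a parser produces less helpful errors than failing fast.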

Regards
Rohan




On Wed, Jun 11, 2008 at 5:12 PM, Michael Baessler <mb...@michael-baessler.de>
wrote:

> rohan rai wrote:
> > Hi
> >   A simple thing such as a name annotator which has an import location of
> > type starts throwing exception when I create a jar of the application I am
> > developing and run over hadoop.
> >
> > If I have to do it a java class file then I can use XMLInputSource in = new
> > XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);
> >
> > But the relative paths in annotators, analysis engines etc starts throwing
> > exception
> >
> > Please Help
> >
> > Regards
> > Rohan
> >
> I'm not sure I understand your question, but I think you need some help
> with the exceptions you get.
> Can you provide the exception stack trace?
>
> -- Michael
>

Re: import location over Hadoop

Posted by Michael Baessler <mb...@michael-baessler.de>.
rohan rai wrote:
> Hi
>   A simple thing such as a name annotator which has an import location of
> type starts throwing exception when I create a jar of the application I am
> developing and run over hadoop.
> 
> If I have to do it a java class file then I can use XMLInputSource in = new
> XMLInputSource(ClassLoader.getSystemResourceAsStream(aeXmlDescriptor),null);
> 
> But the relative paths in annotators, analysis engines etc starts throwing
> exception
> 
> Please Help
> 
> Regards
> Rohan
> 
I'm not sure I understand your question, but I think you need some help with the exceptions you get.
Can you provide the exception stack trace?

-- Michael