Posted to dev@uima.apache.org by Robert Spurrier <sp...@gmail.com> on 2015/08/24 20:04:58 UTC

Overriding PEAR Installation Metadata For Moving it Using Hadoop's DistributedCache

Hello,

I'm trying to use PEAR files with Hadoop's DistributedCache mechanism. 
The cache provides all of the distribution and cleanup mechanisms for 
cached files on a cluster of Hadoop datanodes, and the PEARs provide a 
convenient delivery mechanism for NLP pipelines. My problem is that the 
DistributedCache is read-only, while the PEAR installation procedure 
requires overwriting macros and creating files in the directory in which 
the PEAR will be used. So for now I install locally, compress the 
installed PEAR directory, and ship it off to the grid.
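(For reference, the local install step can be driven programmatically with 
org.apache.uima.pear.tools.PackageInstaller.installPackage(installDir, 
pearFile, verify); the compression is then plain java.util.zip work. A 
minimal sketch of the zip step follows; the class name is mine, not UIMA's:)

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Sketch of compressing an installed PEAR directory before shipping it to
// the grid. The install itself would come from UIMA's PackageInstaller
// (not shown, to keep this self-contained); the class name here is
// illustrative only.
public class InstalledPearZipper {

    public static void zipDirectory(Path installDir, Path zipFile) throws IOException {
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zos = new ZipOutputStream(out);
             Stream<Path> walk = Files.walk(installDir)) {
            List<Path> files = walk.filter(Files::isRegularFile)
                                   .collect(Collectors.toList());
            for (Path file : files) {
                // store entries relative to the install dir, with '/' separators
                String entryName = installDir.relativize(file)
                                             .toString().replace('\\', '/');
                zos.putNextEntry(new ZipEntry(entryName));
                zos.write(Files.readAllBytes(file));
                zos.closeEntry();
            }
        }
    }
}
```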

Then I use an override mechanism to load an AE from the relocated PEAR: 
I've modified the uimaj-core source, specifically ASB_impl.java and 
PearAnalysisEngineWrapper.java, to check for install-directory override 
parameters. If they are given, the ConfigurationParameterSettings and 
ExternalResourceSpecifiers in the ResourceCreationSpecifier are modified 
by replacing the local install directory with the current datanode's 
DistributedCache directory, where the PEAR now lives. It works great, but 
I'd rather not maintain a patched copy of the source, since this seems 
like a use case the PEAR mechanism was not designed for.
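(One way to keep that relocation logic out of uimaj-core might be a small 
helper applied to the parsed specifier before producing the AE. A minimal 
sketch; the class and method names are illustrative, not part of the UIMA 
API:)

```java
// Illustrative helper for the relocation step described above; the class
// and method names are assumptions, not part of UIMA.
public class PearPathRelocator {

    /**
     * Replace the local install-directory prefix in a parameter value with
     * the datanode's DistributedCache directory.
     */
    public static String relocate(String value, String localDir, String cacheDir) {
        if (value == null || !value.startsWith(localDir)) {
            return value; // leave unrelated values untouched
        }
        return cacheDir + value.substring(localDir.length());
    }

    // In the pipeline one would walk the ConfigurationParameterSettings and
    // ExternalResourceSpecifiers of the ResourceCreationSpecifier, apply
    // relocate(...) to each string value, and only then call
    // UIMAFramework.produceAnalysisEngine(specifier).
}
```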

Now that I have some more time to try to do things 'right', is there a 
preferred way to leverage the API to make a portable PEAR when you 
don't know the name of the directory in which it will ultimately live? 
DistributedCache directories for a datanode are uniquely stamped, so I 
can't change anything until the PEAR mechanisms have loaded the 
description resources into memory.


Thanks for your time and effort, using UIMA in MapReduce has been a treat 
so far!


Rob




Re: Overriding PEAR Installation Metadata For Moving it Using Hadoop's DistributedCache

Posted by Marshall Schor <ms...@schor.com>.
Hi,

It sounds like you've got it mostly solved.  I'm wondering if the "fix" might be
to run a step after your PEAR install step, before you zip things up, that
replaces absolute paths with some kind of relative path specifications that
would work when the thing is zipped up and distributed?
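Such a post-install rewrite might look roughly like this; a sketch only, 
where both the class name and the "$PEAR_ROOT" token are my assumptions, 
not UIMA macros:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch of the post-install rewrite suggested above: replace the absolute
// install path in every XML descriptor with a placeholder token. Both the
// class name and the "$PEAR_ROOT" token are assumptions, not UIMA macros.
public class PearPathNeutralizer {

    public static final String PLACEHOLDER = "$PEAR_ROOT";

    public static void neutralize(Path installDir) throws IOException {
        String absolute = installDir.toAbsolutePath().toString();
        List<Path> descriptors;
        try (Stream<Path> walk = Files.walk(installDir)) {
            descriptors = walk.filter(p -> p.toString().endsWith(".xml"))
                              .collect(Collectors.toList());
        }
        for (Path p : descriptors) {
            String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
            if (text.contains(absolute)) {
                Files.write(p, text.replace(absolute, PLACEHOLDER)
                                   .getBytes(StandardCharsets.UTF_8));
            }
        }
    }
    // At job start-up the mapper would do the reverse substitution,
    // replacing PLACEHOLDER with the datanode's DistributedCache directory,
    // before handing the descriptor to the UIMA parser.
}
```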

-Marshall
