Posted to common-user@hadoop.apache.org by javaxtreme <sa...@virginia.edu> on 2008/06/25 17:45:45 UTC

Global Variables via DFS

Hello all,
I am having a bit of trouble with a seemingly simple problem. I would like
to have a global variable (a byte array) that all of my map tasks have
access to. The best way that I currently know of to do this is to have a
file sitting on the DFS and load it into each map task (note: the global
variable is very small, ~20 kB). My problem is that I can't seem to load
any file from the Hadoop DFS into my program via the API. I know that the
DistributedFileSystem class has to come into play, but for the life of me
I can't get it to work.

I noticed there is an initialize() method in the DistributedFileSystem
class, and I thought I would need to call it; however, I'm unsure what the
URI parameter ought to be. I tried "localhost:50070", which stalled the
system and threw a connection timeout error. I then attempted to call
DistributedFileSystem.open(), but my program failed again, this time with
a NullPointerException. I'm assuming that stems from the fact that my DFS
object is not "initialized".

Does anyone have any information on how exactly one programmatically goes
about loading a file from the DFS? I would greatly appreciate any help.

Cheers,
Sean M. Arietta
-- 
View this message in context: http://www.nabble.com/Global-Variables-via-DFS-tp18115661p18115661.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: Global Variables via DFS

Posted by Sean Arietta <sa...@virginia.edu>.
Thanks very much for your help.

I ended up figuring out a solution a few hours ago. Here is what I did:

// Path to the small shared file previously written to the DFS
Path file = new Path("/user/seanarietta/testDB_candidate");
// Resolve the FileSystem (HDFS here) from the job configuration
FileSystem fs = file.getFileSystem(conf);
// Open the file with a 1392-byte buffer
FSDataInputStream data_in = fs.open(file, 1392);

That was in the configure() method of the map task, and it allowed me to
load in the static byte array. I'm not sure if this was suggested by one of
you, but thank you very much for responding. Hopefully this will help
someone with a similar problem.
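
For anyone finding this thread later, here is a minimal sketch of what that
configure() hook could look like end to end (old org.apache.hadoop.mapred API;
the mapper class name, exception handling, and everything beyond the Path above
are illustrative assumptions rather than Sean's exact code):

    import java.io.IOException;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class GlobalDataMapper extends MapReduceBase /* implements Mapper<...> */ {

        // The small (~20 kB) "global variable" shared by every map task.
        private byte[] globalData;

        @Override
        public void configure(JobConf conf) {
            // Hypothetical path; substitute the file actually staged on the DFS.
            Path file = new Path("/user/seanarietta/testDB_candidate");
            FSDataInputStream in = null;
            try {
                // Resolve the FileSystem from the job configuration (fs.default.name),
                // so the same code works against HDFS or the local file system.
                FileSystem fs = file.getFileSystem(conf);
                long len = fs.getFileStatus(file).getLen();
                globalData = new byte[(int) len];
                in = fs.open(file);
                in.readFully(globalData);
            } catch (IOException e) {
                throw new RuntimeException("Could not load global data from DFS", e);
            } finally {
                IOUtils.closeStream(in);
            }
        }
    }

Since every map task re-reads the file in configure(), this approach only makes
sense for data as small as the ~20 kB array described here.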

Cheers,
Sean


Sean Arietta wrote:
> 
> Hello all,
> I am having a bit of trouble with a seemingly simple problem. I would like
> to have a global variable (a byte array) that all of my map tasks have
> access to. The best way that I currently know of to do this is to have a
> file sitting on the DFS and load it into each map task (note: the global
> variable is very small, ~20 kB). My problem is that I can't seem to load
> any file from the Hadoop DFS into my program via the API. I know that the
> DistributedFileSystem class has to come into play, but for the life of me
> I can't get it to work.
> 
> I noticed there is an initialize() method in the DistributedFileSystem
> class, and I thought I would need to call it; however, I'm unsure what
> the URI parameter ought to be. I tried "localhost:50070", which stalled
> the system and threw a connection timeout error. I then attempted to call
> DistributedFileSystem.open(), but my program failed again, this time with
> a NullPointerException. I'm assuming that stems from the fact that my DFS
> object is not "initialized".
> 
> Does anyone have any information on how exactly one programmatically goes
> about loading a file from the DFS? I would greatly appreciate any help.
> 
> Cheers,
> Sean M. Arietta
> 

-- 
View this message in context: http://www.nabble.com/Global-Variables-via-DFS-tp18115661p18119996.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: Global Variables via DFS

Posted by Steve Loughran <st...@apache.org>.
javaxtreme wrote:
> Hello all,
> I am having a bit of trouble with a seemingly simple problem. I would like
> to have a global variable (a byte array) that all of my map tasks have
> access to. The best way that I currently know of to do this is to have a
> file sitting on the DFS and load it into each map task (note: the global
> variable is very small, ~20 kB). My problem is that I can't seem to load
> any file from the Hadoop DFS into my program via the API. I know that the
> DistributedFileSystem class has to come into play, but for the life of me
> I can't get it to work.
> 
> I noticed there is an initialize() method in the DistributedFileSystem
> class, and I thought I would need to call it; however, I'm unsure what
> the URI parameter ought to be. I tried "localhost:50070", which stalled
> the system and threw a connection timeout error. I then attempted to call
> DistributedFileSystem.open(), but my program failed again, this time with
> a NullPointerException. I'm assuming that stems from the fact that my DFS
> object is not "initialized".
> 
> Does anyone have any information on how exactly one programmatically goes
> about loading a file from the DFS? I would greatly appreciate any help.
> 

If the data changes, this sounds more like the kind of data that a
distributed hash table or tuple space should be looking after: sharing
facts between nodes.

1. what is the rate of change of the data?
2. what are your requirements for consistency?

If the data is static, then yes, a shared file works. Here is a code
fragment to work with one. You grab the URI from the configuration, then
initialise the DFS with both the URI and the configuration.

     public static DistributedFileSystem createFileSystem(ManagedConfiguration conf)
             throws SmartFrogRuntimeException {
         String filesystemURL = conf.get(HadoopConfiguration.FS_DEFAULT_NAME);
         URI uri = null;
         try {
             uri = new URI(filesystemURL);
         } catch (URISyntaxException e) {
             throw (SmartFrogRuntimeException) SmartFrogRuntimeException
                     .forward(ERROR_INVALID_FILESYSTEM_URI + filesystemURL, e);
         }
         DistributedFileSystem dfs = new DistributedFileSystem();
         try {
             dfs.initialize(uri, conf);
         } catch (IOException e) {
             throw (SmartFrogRuntimeException) SmartFrogRuntimeException
                     .forward(ERROR_FAILED_TO_INITIALISE_FILESYSTEM, e);
         }
         return dfs;
     }

As to what URLs work, try "localhost:9000"; this works on machines where
I've brought a DFS up on that port. Use netstat to verify that your chosen
port is live.
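
For completeness, a rough equivalent of the fragment above without the SmartFrog
wrapper types, using the generic FileSystem factory rather than constructing
DistributedFileSystem by hand (a sketch that assumes fs.default.name points at
something like hdfs://localhost:9000; the class name is purely illustrative):

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class FileSystemFactory {
        // Let Hadoop pick the concrete FileSystem implementation for the URI
        // configured in fs.default.name (DistributedFileSystem for hdfs:// URIs).
        public static FileSystem createFileSystem(Configuration conf) throws IOException {
            String filesystemURL = conf.get("fs.default.name"); // e.g. "hdfs://localhost:9000"
            return FileSystem.get(URI.create(filesystemURL), conf);
        }
    }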