Posted to hdfs-user@hadoop.apache.org by Keith Wiley <kw...@keithwiley.com> on 2014/01/16 23:41:36 UTC
DistributedCache is empty
My driver is implemented around Tool, so it should be wrapping GenericOptionsParser internally. Nevertheless, neither -files nor the DistributedCache methods seem to work. Command-line usage is straightforward: I simply add "-files foo.py,bar.py" right after the class name (the files are in the current directory I'm running hadoop from, i.e., the local non-HDFS filesystem). The mapper then inspects the file list via DistributedCache.getLocalCacheFiles(context.getConfiguration()) and doesn't see the files; there's nothing there. Likewise, if I attempt to run those Python scripts from the mapper using hadoop.util.Shell, the files obviously can't be found.
That should have worked, so I shouldn't need to rely on the DistributedCache methods, but I tried them anyway: in the driver I create a new Configuration, then call DistributedCache.addCacheFile(new URI("./foo.py"), conf), referencing the local non-HDFS file in the current working directory. I then pass conf to the Job constructor, which seems straightforward. Still no dice; the mapper can't see the files. They simply aren't there.
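A minimal sketch of the attempt described above (class name and file names are placeholders, not from the original post); this targets the Hadoop 2.0.0-era mapreduce API:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class BuggyDriver {
    public static void main(String[] args) throws Exception {
        // A brand-new Configuration does not carry anything that
        // GenericOptionsParser parsed into the Tool's configuration
        // (including -files), so cache entries registered against it
        // may never reach the tasks.
        Configuration conf = new Configuration();
        DistributedCache.addCacheFile(new URI("./foo.py"), conf);
        Job job = new Job(conf, "buggy-example");
        // ... set mapper class, input/output paths, submit ...
    }
}
```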
What on Earth am I doing wrong here?
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
"Luminous beings are we, not this crude matter."
-- Yoda
________________________________________________________________________________
Re: DistributedCache is empty
Posted by Keith Wiley <kw...@keithwiley.com>.
Hadoop 2.0.0.
The problem was that I was creating a new Configuration and passing it to the Job constructor (which I believe some tutorials demonstrate), whereas the correct behavior is to retrieve the preexisting Configuration and use that instead. This may be a distinction between writing a bare driver and one that extends Configured and implements Tool.
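A sketch of the corrected pattern described above, assuming the Hadoop 2.0.0-era mapreduce API; the class name is hypothetical and the cache URI should normally point at a path the cluster can see (e.g., on HDFS), which -files arranges automatically:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class CachingDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Reuse the Configuration that ToolRunner/GenericOptionsParser
        // already populated (this is where -files lands), rather than
        // constructing a fresh one.
        Configuration conf = getConf();
        DistributedCache.addCacheFile(new URI("foo.py"), conf);
        Job job = new Job(conf, "caching-example");
        // ... set mapper class, input/output paths ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new CachingDriver(), args));
    }
}
```

With this structure, `hadoop jar app.jar CachingDriver -files foo.py,bar.py ...` lets GenericOptionsParser ship the local files and register them in the same Configuration the Job is built from.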
On Jan 17, 2014, at 09:46 , Vinod Kumar Vavilapalli wrote:
> What is the version of Hadoop that you are using?
>
> +Vinod
>
> On Jan 16, 2014, at 2:41 PM, Keith Wiley <kw...@keithwiley.com> wrote:
>
>> My driver is implemented around Tool and so should be wrapping GenericOptionsParser internally. Nevertheless, neither -files nor DistributedCache methods seem to work. Usage on the command line is straight forward, I simply add "-files foo.py,bar.py" right after the class name (where those files are in the current directory I'm running hadoop from, i.e., the local nonHDFS filesystem). The mapper then inspects the file list via DistributedCache.getLocalCacheFiles(context.getConfiguration()) and doesn't see the files, there's nothing there. Likewise, if I attempt to run those python scripts from the mapper using hadoop.util.Shell, the files obviously can't be found.
>>
>> That should have worked, so I shouldn't have to rely on the DC methods, but nevertheless, I tried anyway, so in the driver I create a new Configuration, then call DistributedCache.addCacheFile(new URI("./foo.py"), conf), thus referencing the local nonHDFS file in the current working directory. I then add conf to the job ctor, seems straight forward. Still no dice, the mapper can't see the files, they simply aren't there.
>>
>> What on Earth am I doing wrong here?
>>
>> ________________________________________________________________________________
>> Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
>>
>> "Luminous beings are we, not this crude matter."
>> -- Yoda
>> ________________________________________________________________________________
>>
>
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to
> which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
________________________________________________________________________________
Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
"It's a fine line between meticulous and obsessive-compulsive and a slippery
rope between obsessive-compulsive and debilitatingly slow."
-- Keith Wiley
________________________________________________________________________________
Re: DistributedCache is empty
Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
What is the version of Hadoop that you are using?
+Vinod
On Jan 16, 2014, at 2:41 PM, Keith Wiley <kw...@keithwiley.com> wrote:
> My driver is implemented around Tool and so should be wrapping GenericOptionsParser internally. Nevertheless, neither -files nor DistributedCache methods seem to work. Usage on the command line is straight forward, I simply add "-files foo.py,bar.py" right after the class name (where those files are in the current directory I'm running hadoop from, i.e., the local nonHDFS filesystem). The mapper then inspects the file list via DistributedCache.getLocalCacheFiles(context.getConfiguration()) and doesn't see the files, there's nothing there. Likewise, if I attempt to run those python scripts from the mapper using hadoop.util.Shell, the files obviously can't be found.
>
> That should have worked, so I shouldn't have to rely on the DC methods, but nevertheless, I tried anyway, so in the driver I create a new Configuration, then call DistributedCache.addCacheFile(new URI("./foo.py"), conf), thus referencing the local nonHDFS file in the current working directory. I then add conf to the job ctor, seems straight forward. Still no dice, the mapper can't see the files, they simply aren't there.
>
> What on Earth am I doing wrong here?
>
> ________________________________________________________________________________
> Keith Wiley kwiley@keithwiley.com keithwiley.com music.keithwiley.com
>
> "Luminous beings are we, not this crude matter."
> -- Yoda
> ________________________________________________________________________________
>