Posted to dev@mahout.apache.org by Dan Filimon <da...@gmail.com> on 2013/02/12 19:05:38 UTC

Accessing the local filesystem from AbstractJob

When creating my own job driver, I'm unable to give it any inputs from
the local file system. An exception gets thrown when starting the job
(and trying to get the splits).
Apparently the files have to be on HDFS.

Is there any way around this (ideally, I'd like it to first look for
the file on the local file system and if no file is found, look at
HDFS)?
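[For reference, a minimal sketch of the fallback described above, using Hadoop's FileSystem API. This is not something AbstractJob does out of the box; the class and method names here are hypothetical, but the FileSystem calls are the standard Hadoop ones:]

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class InputResolver {
  /**
   * Resolve an input path by checking the local file system first and
   * falling back to HDFS. Returning a qualified Path keeps the scheme
   * (file:// or hdfs://) attached when the path goes into the job config.
   */
  public static Path resolveInput(String input, Configuration conf) throws IOException {
    FileSystem localFs = FileSystem.getLocal(conf);
    Path candidate = new Path(input);
    if (localFs.exists(candidate)) {
      return localFs.makeQualified(candidate);
    }
    FileSystem dfs = FileSystem.get(conf);
    if (dfs.exists(candidate)) {
      return dfs.makeQualified(candidate);
    }
    throw new IOException("Input not found locally or on HDFS: " + input);
  }
}
```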

Re: Accessing the local filesystem from AbstractJob

Posted by Dan Filimon <da...@gmail.com>.
Yes, that's right. I tried it and it worked, but I forgot to e-mail to say so.
Thanks!

On Thu, Feb 14, 2013 at 7:16 PM, Ted Dunning <te...@gmail.com> wrote:
> I think that file: is the right way to access the local file system.
>
> On Wed, Feb 13, 2013 at 4:14 AM, Sean Owen <sr...@gmail.com> wrote:
>
>> Hmm I think it will work if you use "file:///..." URIs? I haven't tried in
>> a long time though.
>>
>>
>> On Wed, Feb 13, 2013 at 12:12 PM, Dan Filimon
>> <da...@gmail.com>wrote:
>>
>> > I see. Well, my use case was wanting to run the job on one machine,
>> > being lazy and not wanting to put the files on HDFS. :)
>> >
>> > On Tue, Feb 12, 2013 at 8:27 PM, Sean Owen <sr...@gmail.com> wrote:
>> > > Yes because the input path is something processed by the jobtracker and
>> > > later the tasktrackers themselves, which won't be on your machine
>> > > (necessarily).
>> > >
>> > > Mappers can read the local file system but it's not clear what may or
>> may
>> > > not be there. Consider the distributed cache for smallish data.
>> > >
>> > >
>> > > On Tue, Feb 12, 2013 at 7:05 PM, Dan Filimon <
>> > dangeorge.filimon@gmail.com>wrote:
>> > >
>> > >> When creating my own job driver, I'm unable to give it any inputs from
>> > >> the local file system. An exception gets thrown when starting the job
>> > >> (and trying to get the splits).
>> > >> Apparently the files have to be on HDFS.
>> > >>
>> > >> Is there any way around this (ideally, I'd like it to first look for
>> > >> the file on the local file system and if no file is found, look at
>> > >> HDFS)?
>> > >>
>> >
>>

Re: Accessing the local filesystem from AbstractJob

Posted by Ted Dunning <te...@gmail.com>.
I think that file: is the right way to access the local file system.

On Wed, Feb 13, 2013 at 4:14 AM, Sean Owen <sr...@gmail.com> wrote:

> Hmm I think it will work if you use "file:///..." URIs? I haven't tried in
> a long time though.
>
>
> On Wed, Feb 13, 2013 at 12:12 PM, Dan Filimon
> <da...@gmail.com>wrote:
>
> > I see. Well, my use case was wanting to run the job on one machine,
> > being lazy and not wanting to put the files on HDFS. :)
> >
> > On Tue, Feb 12, 2013 at 8:27 PM, Sean Owen <sr...@gmail.com> wrote:
> > > Yes because the input path is something processed by the jobtracker and
> > > later the tasktrackers themselves, which won't be on your machine
> > > (necessarily).
> > >
> > > Mappers can read the local file system but it's not clear what may or
> may
> > > not be there. Consider the distributed cache for smallish data.
> > >
> > >
> > > On Tue, Feb 12, 2013 at 7:05 PM, Dan Filimon <
> > dangeorge.filimon@gmail.com>wrote:
> > >
> > >> When creating my own job driver, I'm unable to give it any inputs from
> > >> the local file system. An exception gets thrown when starting the job
> > >> (and trying to get the splits).
> > >> Apparently the files have to be on HDFS.
> > >>
> > >> Is there any way around this (ideally, I'd like it to first look for
> > >> the file on the local file system and if no file is found, look at
> > >> HDFS)?
> > >>
> >
>

Re: Accessing the local filesystem from AbstractJob

Posted by Sean Owen <sr...@gmail.com>.
Hmm I think it will work if you use "file:///..." URIs? I haven't tried in
a long time though.
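[A quick sketch of what that looks like when wiring up the job; the paths and job name are hypothetical, but a file:/// URI forces Hadoop to use the local file system regardless of fs.default.name:]

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LocalInputDriver {
  public static Job buildJob(Configuration conf) throws Exception {
    Job job = new Job(conf, "local-input-job");
    // file:/// URIs resolve against the local disk, not HDFS.
    FileInputFormat.addInputPath(job, new Path("file:///home/user/data/input.seq"));
    FileOutputFormat.setOutputPath(job, new Path("file:///tmp/job-output"));
    return job;
  }
}
```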


On Wed, Feb 13, 2013 at 12:12 PM, Dan Filimon <da...@gmail.com> wrote:

> I see. Well, my use case was wanting to run the job on one machine,
> being lazy and not wanting to put the files on HDFS. :)
>
> On Tue, Feb 12, 2013 at 8:27 PM, Sean Owen <sr...@gmail.com> wrote:
> > Yes because the input path is something processed by the jobtracker and
> > later the tasktrackers themselves, which won't be on your machine
> > (necessarily).
> >
> > Mappers can read the local file system but it's not clear what may or may
> > not be there. Consider the distributed cache for smallish data.
> >
> >
> > On Tue, Feb 12, 2013 at 7:05 PM, Dan Filimon <
> dangeorge.filimon@gmail.com>wrote:
> >
> >> When creating my own job driver, I'm unable to give it any inputs from
> >> the local file system. An exception gets thrown when starting the job
> >> (and trying to get the splits).
> >> Apparently the files have to be on HDFS.
> >>
> >> Is there any way around this (ideally, I'd like it to first look for
> >> the file on the local file system and if no file is found, look at
> >> HDFS)?
> >>
>

Re: Accessing the local filesystem from AbstractJob

Posted by Dan Filimon <da...@gmail.com>.
I see. Well, my use case was wanting to run the job on one machine,
being lazy and not wanting to put the files on HDFS. :)

On Tue, Feb 12, 2013 at 8:27 PM, Sean Owen <sr...@gmail.com> wrote:
> Yes because the input path is something processed by the jobtracker and
> later the tasktrackers themselves, which won't be on your machine
> (necessarily).
>
> Mappers can read the local file system but it's not clear what may or may
> not be there. Consider the distributed cache for smallish data.
>
>
On Tue, Feb 12, 2013 at 7:05 PM, Dan Filimon <da...@gmail.com> wrote:
>
>> When creating my own job driver, I'm unable to give it any inputs from
>> the local file system. An exception gets thrown when starting the job
>> (and trying to get the splits).
>> Apparently the files have to be on HDFS.
>>
>> Is there any way around this (ideally, I'd like it to first look for
>> the file on the local file system and if no file is found, look at
>> HDFS)?
>>

Re: Accessing the local filesystem from AbstractJob

Posted by Sean Owen <sr...@gmail.com>.
Yes because the input path is something processed by the jobtracker and
later the tasktrackers themselves, which won't be on your machine
(necessarily).

Mappers can read the local file system but it's not clear what may or may
not be there. Consider the distributed cache for smallish data.
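[A sketch of the distributed-cache approach, using the DistributedCache API current at the time; the file path is hypothetical. The driver ships the file to every task node, and each mapper reads it from the task's local disk:]

```java
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheExample {
  // Driver side: register a smallish HDFS file for distribution to all tasks.
  public static void shipLookupFile(Job job) throws Exception {
    DistributedCache.addCacheFile(new URI("/user/me/lookup.txt"),
        job.getConfiguration());
  }

  // Mapper side: the cached copies appear as local files.
  public static Path[] cachedFiles(Mapper<?, ?, ?, ?>.Context context)
      throws Exception {
    return DistributedCache.getLocalCacheFiles(context.getConfiguration());
  }
}
```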


On Tue, Feb 12, 2013 at 7:05 PM, Dan Filimon <da...@gmail.com> wrote:

> When creating my own job driver, I'm unable to give it any inputs from
> the local file system. An exception gets thrown when starting the job
> (and trying to get the splits).
> Apparently the files have to be on HDFS.
>
> Is there any way around this (ideally, I'd like it to first look for
> the file on the local file system and if no file is found, look at
> HDFS)?
>