Posted to mapreduce-user@hadoop.apache.org by ma...@nissatech.com on 2015/05/11 23:25:37 UTC

Reading a sequence file from distributed cache

Hello,

I'm new to Hadoop and I'm having a problem reading from a sequence file that I
add to the distributed cache.

I had no problems when I ran it in standalone mode, but now, in
pseudo-distributed and distributed mode, I do.

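I'm adding the file to the distributed cache like this:

    DistributedCache.addCacheFile(new URI(currentMedoids), conf);

and reading from it in the mapper's setup method:

    Configuration conf = context.getConfiguration();
    FileSystem fs = FileSystem.get(conf);

    Path[] paths = DistributedCache.getLocalCacheFiles(conf);

    List<Element> sketch = new ArrayList<Element>();

    SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);

    Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
    Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();

    while (medoidsReader.next(medoidKey, medoidValue)) {
        ElementWritable medoidWritable = (ElementWritable) medoidValue;
        sketch.add(medoidWritable.getElement());
    }

I'm getting a FileNotFoundException.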

Can anyone please explain what the problem is and how to do this properly?

Thanks

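A likely reason the code works only in standalone mode is the following. `DistributedCache.getLocalCacheFiles(conf)` returns paths on the task node's local disk. `FileSystem.get(conf)`, however, returns the default file system (`fs.default.name`, or `fs.defaultFS` in Hadoop 2), and that is the local file system in standalone mode but HDFS in pseudo-distributed and distributed mode. `SequenceFile.Reader` therefore looks for the local path on HDFS, where no such file exists. The sketch below mimics that resolution with plain `java.net.URI`. It is an analogy, not Hadoop's own code, and the `hdfs://localhost:9000/` address and the cache directory are hypothetical.

```java
import java.net.URI;

public class CachePathResolution {

    // Resolves an absolute, scheme-less path against a file system's base URI,
    // roughly the way a path without a scheme ends up on the default file system.
    static URI qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        // Hypothetical example of a path on the task node's local disk.
        String localCachePath = "/tmp/hadoop/mapred/local/archive/medoids";

        // Standalone mode: the default file system is local, so the path points at the local copy.
        System.out.println(qualify("file:///", localCachePath));

        // Pseudo-distributed or distributed mode: the default file system is HDFS,
        // so the same path is looked up on HDFS, where no such file exists.
        System.out.println(qualify("hdfs://localhost:9000/", localCachePath));
    }
}
```

Under this reading, you could keep `getLocalCacheFiles()` and open the returned path with `FileSystem.getLocal(conf)` instead of `FileSystem.get(conf)`. Switching to `getCacheFiles()` also works, presumably because the returned URIs point at the copy that already exists on HDFS.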

Re: Reading a sequence file from distributed cache

Posted by Marko Dinic <ma...@nissatech.com>.
Hello Shahab,

I'm using 1.2.1 in pseudo-distributed mode and the same code on a
cluster with 0.20.2, but I'm having the same problem in both cases. I'm
hoping that 1.2.1 code is backward-compatible with a 0.20.2 cluster?

Do you have any idea what the problem could be? And what do you mean by
"Have you seen this?"

Maybe I'm making a mistake by using the context passed to the Mapper to read
the file?

Configuration conf = context.getConfiguration();

Best regards,
Marko

On Tue 12 May 2015 12:09:52 AM CEST, Shahab Yunus wrote:
> What version are you using?
>
> Have you seen this?
>
> Regards,
> Shahab
>
> On Mon, May 11, 2015 at 5:25 PM, <marko.dinic@nissatech.com
> <ma...@nissatech.com>> wrote:
>
>     Hello,
>
>     I'm new to Hadoop and I'm having a problem reading from a sequence
>     file that I add to distributed cache.
>
>     I didn't have problems when I ran it in standalone mode, but now
>     in pseudo-distributed and distributed I do.
>
>     I'm adding file to distributed cache like this
>
>     DistributedCache.addCacheFile(new URI(currentMedoids), conf);
>
>     And reading from it in mapper's setup method
>
>              Configuration conf = context.getConfiguration();
>              FileSystem fs = FileSystem.get(conf);
>
>              Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>
>              List<Element> sketch = new ArrayList<Element>();
>
>              SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>
>              Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
>              Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>
>              while(medoidsReader.next(medoidKey, medoidValue)){
>
>                  ElementWritable medoidWritable = (ElementWritable)medoidValue;
>                  sketch.add(medoidWritable.getElement());
>              }
>
>     And I'm getting FileNotFoundException.
>
>     Can anyone please help me and explain to me what is the problem
>     and how to do this properly?
>
>     Thanks
>
>
>

Re: Reading a sequence file from distributed cache

Posted by Shahab Yunus <sh...@gmail.com>.
What version are you using?

Have you seen this?

Regards,
Shahab

On Mon, May 11, 2015 at 5:25 PM, <ma...@nissatech.com> wrote:

>  Hello,
>
> I'm new to Hadoop and I'm having a problem reading from a sequence file
> that I add to distributed cache.
>
> I didn't have problems when I ran it in standalone mode, but now in
> pseudo-distributed and distributed I do.
>
> I'm adding file to distributed cache like this
>
> DistributedCache.addCacheFile(new URI(currentMedoids), conf);
>
> And reading from it in mapper's setup method
>
>         Configuration conf = context.getConfiguration();
>         FileSystem fs = FileSystem.get(conf);
>
>         Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>
>         List<Element> sketch = new ArrayList<Element>();
>
>         SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>
>         Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
>         Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>
>         while(medoidsReader.next(medoidKey, medoidValue)){
>
>             ElementWritable medoidWritable = (ElementWritable)medoidValue;
>             sketch.add(medoidWritable.getElement());
>         }
>
>  And I'm getting FileNotFoundException.
>
> Can anyone please help me and explain to me what is the problem and how to
> do this properly?
>
> Thanks
>

Re: Reading a sequence file from distributed cache

Posted by Marko Dinic <ma...@nissatech.com>.
Dear Shahab,

Thanks, I didn't understand that. Now I get it.

Best regards,
Marko

On Tue 12 May 2015 01:38:52 PM CEST, Shahab Yunus wrote:
> getLocalCacheFiles is deprecated and can only access files that were
> downloaded locally to the node running the task.
>
> Use of getCacheFiles is encouraged now which downloads using a URI.
>
> Have you seen this?
> http://stackoverflow.com/questions/26492964/are-getcachefiles-and-getlocalcachefiles-the-same
>
> Regards,
> Shahab
>
> On Tue, May 12, 2015 at 6:58 AM, Marko Dinic
> <marko.dinic@nissatech.com <ma...@nissatech.com>> wrote:
>
>     Hello,
>
>     I have used getCacheFiles() instead of getLocalCacheFiles() and
>     now it works.
>
>     Can someone please explain the difference between the two? I'm not
>     able to find some good explanation about it to understand how it
>     works.
>
>     Thanks,
>     Marko
>
>
>     On 05/11/2015 11:25 PM, marko.dinic@nissatech.com
>     <ma...@nissatech.com> wrote:
>>
>>     Hello,
>>
>>     I'm new to Hadoop and I'm having a problem reading from a
>>     sequence file that I add to distributed cache.
>>
>>     I didn't have problems when I ran it in standalone mode, but now
>>     in pseudo-distributed and distributed I do.
>>
>>     I'm adding file to distributed cache like this
>>
>>     DistributedCache.addCacheFile(new URI(currentMedoids), conf);
>>
>>     And reading from it in mapper's setup method
>>
>>              Configuration conf = context.getConfiguration();
>>              FileSystem fs = FileSystem.get(conf);
>>
>>              Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>>
>>              List<Element> sketch = new ArrayList<Element>();
>>
>>              SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>>
>>              Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
>>              Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>>
>>              while(medoidsReader.next(medoidKey, medoidValue)){
>>
>>                  ElementWritable medoidWritable = (ElementWritable)medoidValue;
>>                  sketch.add(medoidWritable.getElement());
>>              }
>>
>>     And I'm getting FileNotFoundException.
>>
>>     Can anyone please help me and explain to me what is the problem
>>     and how to do this properly?
>>
>>     Thanks
>>
>>
>
>

Re: Reading a sequence file from distributed cache

Posted by Shahab Yunus <sh...@gmail.com>.
getLocalCacheFiles() is deprecated, and it only returns paths to the copies
that were downloaded to the local disk of the node running the task.

getCacheFiles() is now the recommended method. It returns the cache files
as URIs instead.
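With `getCacheFiles()`, the mapper receives the URIs the files were registered with, not local paths. On Hadoop 2.x (YARN), each cache file is typically also symlinked into the task's working directory. The link is named after the URI fragment (`#name`) if one was given, and after the file's own name otherwise. On 1.x, symlinks needed `DistributedCache.createSymlink(conf)` plus a fragment. The hypothetical helper below computes that expected link name with plain `java.net.URI`, so a task could open the local copy by name. The class name and example URIs are invented for illustration.

```java
import java.net.URI;

public class CacheLinkName {

    // Expected link name for a cache file in the task's working directory:
    // the URI fragment if one was given, otherwise the last path segment.
    static String linkName(URI cacheUri) {
        String fragment = cacheUri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = cacheUri.getPath();
        int slash = path.lastIndexOf('/');
        return slash >= 0 ? path.substring(slash + 1) : path;
    }

    public static void main(String[] args) {
        // Hypothetical URIs of the kind getCacheFiles() might return.
        System.out.println(linkName(URI.create("hdfs://localhost:9000/user/marko/medoids/part-r-00000")));
        System.out.println(linkName(URI.create("hdfs://localhost:9000/user/marko/medoids/part-r-00000#medoids")));
    }
}
```

Adding a fragment when registering the file, as in `new URI(currentMedoids + "#medoids")`, gives the mapper a fixed, predictable name to open. This assumes `currentMedoids` has no fragment of its own.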

Have you seen this?
http://stackoverflow.com/questions/26492964/are-getcachefiles-and-getlocalcachefiles-the-same

Regards,
Shahab

On Tue, May 12, 2015 at 6:58 AM, Marko Dinic <ma...@nissatech.com>
wrote:

>  Hello,
>
> I have used getCacheFiles() instead of getLocalCacheFiles() and now it
> works.
>
> Can someone please explain the difference between the two? I'm not able to
> find some good explanation about it to understand how it works.
>
> Thanks,
> Marko
>
>
> On 05/11/2015 11:25 PM, marko.dinic@nissatech.com wrote:
>
>   Hello,
>
> I'm new to Hadoop and I'm having a problem reading from a sequence file
> that I add to distributed cache.
>
> I didn't have problems when I ran it in standalone mode, but now in
> pseudo-distributed and distributed I do.
>
> I'm adding file to distributed cache like this
>
> DistributedCache.addCacheFile(new URI(currentMedoids), conf);
>
> And reading from it in mapper's setup method
>
>         Configuration conf = context.getConfiguration();
>         FileSystem fs = FileSystem.get(conf);
>
>         Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>
>         List<Element> sketch = new ArrayList<Element>();
>
>         SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>
>         Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
>         Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>
>         while(medoidsReader.next(medoidKey, medoidValue)){
>
>             ElementWritable medoidWritable = (ElementWritable)medoidValue;
>             sketch.add(medoidWritable.getElement());
>         }
>
>  And I'm getting FileNotFoundException.
>
> Can anyone please help me and explain to me what is the problem and how to
> do this properly?
>
> Thanks
>

Re: Reading a sequence file from distributed cache

Posted by Marko Dinic <ma...@nissatech.com>.
Hello,

I switched from getLocalCacheFiles() to getCacheFiles(), and now it works.

Can someone please explain the difference between the two? I couldn't find
a good explanation of how they work.

Thanks,
Marko

On 05/11/2015 11:25 PM, marko.dinic@nissatech.com wrote:
>
> Hello,
>
> I'm new to Hadoop and I'm having a problem reading from a sequence
> file that I add to the distributed cache.
>
> I didn't have problems when I ran it in standalone mode, but in
> pseudo-distributed and fully distributed mode I do.
>
> I'm adding the file to the distributed cache like this:
>
> DistributedCache.addCacheFile(new URI(currentMedoids), conf);
>
> And reading from it in the mapper's setup method:
>
>          Configuration conf = context.getConfiguration();
>          FileSystem fs = FileSystem.get(conf);
>
>          Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>
>          List<Element> sketch = new ArrayList<Element>();
>
>          SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>
>          Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
>          Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>
>          while(medoidsReader.next(medoidKey, medoidValue)){
>
>              ElementWritable medoidWritable = (ElementWritable)medoidValue;
>              sketch.add(medoidWritable.getElement());
>          }
>
> And I'm getting a FileNotFoundException.
>
> Can anyone please help me and explain what the problem is and
> how to do this properly?
>
> Thanks

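What follows is a likely explanation of the difference, based on how DistributedCache behaves in Hadoop 1.x. Treat it as a hedged reading rather than an authoritative answer.

`DistributedCache.getCacheFiles(conf)` returns the URIs exactly as they were passed to `addCacheFile()`, that is, the file's location on HDFS. `DistributedCache.getLocalCacheFiles(conf)` returns the paths of the copies the framework has downloaded to the task node's local disk, typically without a scheme.

The setup code quoted above opens those local paths through `FileSystem.get(conf)`, which returns the *default* filesystem:

* In standalone mode that is the local filesystem, so it works.
* In pseudo-distributed and distributed mode it is HDFS, where the local path does not exist. That would explain the `FileNotFoundException`.

Switching to `getCacheFiles()` works because the HDFS URI is valid on HDFS, but each task then reads the file from HDFS rather than from its localized copy.

The plain-Java sketch below has no Hadoop dependency. It shows how one scheme-less path names different files depending on which filesystem resolves it. The paths, the `hdfs://localhost:9000` address and the `qualify` helper are all hypothetical, and `java.net.URI.resolve` only approximates Hadoop's own path qualification.

```java
import java.net.URI;

/**
 * Plain-Java illustration (no Hadoop dependency) of why one absolute,
 * scheme-less path can name different files depending on which filesystem
 * resolves it. URI.resolve only approximates Hadoop's own path qualification.
 */
public class CachePathResolution {

    /** Hypothetical helper: resolves a scheme-less absolute path against a filesystem's base URI. */
    static URI qualify(URI fsBase, String schemelessPath) {
        return fsBase.resolve(schemelessPath);
    }

    public static void main(String[] args) {
        // Hypothetical localized path, similar in shape to what
        // getLocalCacheFiles() returns on a task node (layout is an assumption).
        String localCopy = "/tmp/hadoop/mapred/local/distcache/medoids";

        // Hypothetical default filesystem in pseudo-distributed mode.
        URI hdfs = URI.create("hdfs://localhost:9000/");
        // The local filesystem, which is the default in standalone mode.
        URI local = URI.create("file:///");

        // Qualified against HDFS: names a file that does not exist there.
        System.out.println("via default FS: " + qualify(hdfs, localCopy));
        // Qualified against the local filesystem: names the localized copy itself.
        System.out.println("via local FS:   " + qualify(local, localCopy));
    }
}
```

To read the localized copy in `setup()`, open the paths returned by `getLocalCacheFiles(conf)` with `FileSystem.getLocal(conf)` instead of `FileSystem.get(conf)`. `FileSystem.getLocal` returns the local filesystem regardless of `fs.default.name`, so the path should resolve to the downloaded copy in every mode.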

Re: Reading a sequence file from distributed cache

Posted by Shahab Yunus <sh...@gmail.com>.
What version are you using?

Have you seen this?

Regards,
Shahab

On Mon, May 11, 2015 at 5:25 PM, <ma...@nissatech.com> wrote:

> Hello,
>
> I'm new to Hadoop and I'm having a problem reading from a sequence file
> that I add to the distributed cache.
>
> I didn't have problems when I ran it in standalone mode, but in
> pseudo-distributed and fully distributed mode I do.
>
> I'm adding the file to the distributed cache like this:
>
> DistributedCache.addCacheFile(new URI(currentMedoids), conf);
>
> And reading from it in the mapper's setup method:
>
>         Configuration conf = context.getConfiguration();
>         FileSystem fs = FileSystem.get(conf);
>
>         Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>
>         List<Element> sketch = new ArrayList<Element>();
>
>         SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>
>         Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
>         Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>
>         while(medoidsReader.next(medoidKey, medoidValue)){
>
>             ElementWritable medoidWritable = (ElementWritable)medoidValue;
>             sketch.add(medoidWritable.getElement());
>         }
>
> And I'm getting a FileNotFoundException.
>
> Can anyone please help me and explain what the problem is and how to
> do this properly?
>
> Thanks
