Posted to mapreduce-user@hadoop.apache.org by ma...@nissatech.com on 2015/05/11 23:25:37 UTC
Reading a sequence file from distributed cache
Hello,
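I'm new to Hadoop and I'm having a problem reading from a sequence file that I
add to the distributed cache.
I didn't have problems when I ran it in standalone mode, but in
pseudo-distributed and fully distributed modes I do.
I'm adding the file to the distributed cache like this:
    DistributedCache.addCacheFile(new URI(currentMedoids), conf);
And reading from it in the mapper's setup method:
    Configuration conf = context.getConfiguration();
    FileSystem fs = FileSystem.get(conf);
    Path[] paths = DistributedCache.getLocalCacheFiles(conf);
    List<Element> sketch = new ArrayList<Element>();
    SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
    Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
    Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
    while (medoidsReader.next(medoidKey, medoidValue)) {
        ElementWritable medoidWritable = (ElementWritable) medoidValue;
        sketch.add(medoidWritable.getElement());
    }
And I'm getting a FileNotFoundException.
Can anyone please help me and explain what the problem is and how to do
this properly?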
Thanks
Sent with [inky](http://inky.com?kme=signature)
Re: Reading a sequence file from distributed cache
Posted by Marko Dinic <ma...@nissatech.com>.
Hello Shahab,
I'm using 1.2.1 in pseudo-distributed mode and the same code on a
cluster with 0.20.2, but I'm having the same problem in both cases. I'm
hoping that 1.2.1 code is backward-compatible with a 0.20.2 cluster?
Do you have any idea what could be the problem? And what do you mean by
"Have you seen this?"
Maybe I'm making a mistake by using the context passed to the Mapper to read
the file?
Configuration conf = context.getConfiguration();
Best regards,
Marko
On Tue 12 May 2015 12:09:52 AM CEST, Shahab Yunus wrote:
> What version are you using?
>
> Have you seen this?
>
> Regards,
> Shahab
>
> On Mon, May 11, 2015 at 5:25 PM, <marko.dinic@nissatech.com
> <ma...@nissatech.com>> wrote:
>
> Hello,
>
> I'm new to Hadoop and I'm having a problem reading from a sequence
> file that I add to distributed cache.
>
> I didn't have problems when I ran it in standalone mode, but now
> in pseudo-distributed and distributed I do.
>
> I'm adding file to distributed cache like this
>
>     DistributedCache.addCacheFile(new URI(currentMedoids), conf);
>
> And reading from it in mapper's setup method
>
>     Configuration conf = context.getConfiguration();
>     FileSystem fs = FileSystem.get(conf);
>
>     Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>
>     List<Element> sketch = new ArrayList<Element>();
>
>     SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>
>     Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
>     Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>
>     while(medoidsReader.next(medoidKey, medoidValue)){
>
>         ElementWritable medoidWritable = (ElementWritable)medoidValue;
>         sketch.add(medoidWritable.getElement());
>     }
>
> And I'm getting FileNotFoundException.
>
> Can anyone please help me and explain to me what is the problem
> and how to do this properly?
>
> Thanks
>
> Sent with inky <http://inky.com?kme=signature>
>
>
Re: Reading a sequence file from distributed cache
Posted by Shahab Yunus <sh...@gmail.com>.
What version are you using?
Have you seen this?
Regards,
Shahab
On Mon, May 11, 2015 at 5:25 PM, <ma...@nissatech.com> wrote:
> Hello,
>
>
>
> I'm new to Hadoop and I'm having a problem reading from a sequence file
> that I add to distributed cache.
>
>
>
> I didn't have problems when I ran it in standalone mode, but now in
> pseudo-distributed and distributed I do.
>
>
>
> I'm adding file to distributed cache like this
>
> DistributedCache.addCacheFile(new URI(currentMedoids), conf);
>
>
>
> And reading from it in mapper's setup method
>
> Configuration conf = context.getConfiguration();
> FileSystem fs = FileSystem.get(conf);
>
> Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>
> List<Element> sketch = new ArrayList<Element>();
>
> SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>
> Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
> Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>
> while(medoidsReader.next(medoidKey, medoidValue)){
>
> ElementWritable medoidWritable = (ElementWritable)medoidValue;
> sketch.add(medoidWritable.getElement());
> }
>
> And I'm getting FileNotFoundException.
>
>
>
> Can anyone please help me and explain to me what is the problem and how to
> do this properly?
>
>
>
> Thanks
>
>
> Sent with inky <http://inky.com?kme=signature>
>
>
Re: Reading a sequence file from distributed cache
Posted by Marko Dinic <ma...@nissatech.com>.
Dear Shahab,
Thanks, I didn't understand that. Now I get it.
Best regards,
Marko
On Tue 12 May 2015 01:38:52 PM CEST, Shahab Yunus wrote:
> getLocalCacheFiles is deprecated and can only access files that were
> downloaded locally to the node running the task.
>
> Use of getCacheFiles is encouraged now which downloads using a URI.
>
> Have you seen this?
> http://stackoverflow.com/questions/26492964/are-getcachefiles-and-getlocalcachefiles-the-same
>
> Regards,
> Shahab
>
> On Tue, May 12, 2015 at 6:58 AM, Marko Dinic
> <marko.dinic@nissatech.com <ma...@nissatech.com>> wrote:
>
> Hello,
>
> I have used getCacheFiles() instead of getLocalCacheFiles() and
> now it works.
>
> Can someone please explain the difference between the two? I'm not
> able to find some good explanation about it to understand how it
> works.
>
> Thanks,
> Marko
>
>
> On 05/11/2015 11:25 PM, marko.dinic@nissatech.com
> <ma...@nissatech.com> wrote:
>>
>> Hello,
>>
>> I'm new to Hadoop and I'm having a problem reading from a
>> sequence file that I add to distributed cache.
>>
>> I didn't have problems when I ran it in standalone mode, but now
>> in pseudo-distributed and distributed I do.
>>
>> I'm adding file to distributed cache like this
>>
>>     DistributedCache.addCacheFile(new URI(currentMedoids), conf);
>>
>> And reading from it in mapper's setup method
>>
>>     Configuration conf = context.getConfiguration();
>>     FileSystem fs = FileSystem.get(conf);
>>
>>     Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>>
>>     List<Element> sketch = new ArrayList<Element>();
>>
>>     SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>>
>>     Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
>>     Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>>
>>     while(medoidsReader.next(medoidKey, medoidValue)){
>>
>>         ElementWritable medoidWritable = (ElementWritable)medoidValue;
>>         sketch.add(medoidWritable.getElement());
>>     }
>>
>> And I'm getting FileNotFoundException.
>>
>> Can anyone please help me and explain to me what is the problem
>> and how to do this properly?
>>
>> Thanks
>>
>> Sent with inky <http://inky.com?kme=signature>
>>
>
>
Re: Reading a sequence file from distributed cache
Posted by Shahab Yunus <sh...@gmail.com>.
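getLocalCacheFiles is deprecated. It returns the paths of the copies that were
downloaded to the local disk of the node running the task, so those paths are
only valid on the local file system, not on HDFS.
Use of getCacheFiles is encouraged now; it returns the URIs of the files as
they were added to the cache.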
Have you seen this?
http://stackoverflow.com/questions/26492964/are-getcachefiles-and-getlocalcachefiles-the-same
Regards,
Shahab
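A likely reason the original code worked in standalone mode but failed with
FileNotFoundException elsewhere is that a path without a scheme is looked up
on whatever file system it is opened with. The sketch below is only an
illustration of that idea using java.net.URI; it is not Hadoop code, and the
class name, the qualify helper, the cache path, and the HDFS host and port are
all assumptions made up for the example.

```java
import java.net.URI;

public class CachePathDemo {
    // Simplified stand-in for path qualification: a path that carries no
    // scheme is interpreted against the default file system it is opened with.
    // This is NOT Hadoop code, only an illustration using java.net.URI.
    static URI qualify(URI defaultFs, String path) {
        URI p = URI.create(path);
        if (p.getScheme() != null) {
            return p; // already names its file system, e.g. hdfs://...
        }
        return defaultFs.resolve(p); // inherits scheme and authority
    }

    public static void main(String[] args) {
        // Hypothetical local cache path of the kind getLocalCacheFiles() returns.
        String localCopy = "/tmp/mapred/local/distcache/medoids.seq";

        // Standalone mode: the default file system is the local one,
        // so the local copy is found.
        System.out.println(qualify(URI.create("file:///"), localCopy));

        // Pseudo-distributed mode: the default file system is HDFS
        // (host and port are assumed values), so the same path is
        // looked up on HDFS, where the local copy does not exist.
        System.out.println(qualify(URI.create("hdfs://localhost:9000/"), localCopy));
    }
}
```

In Hadoop itself, paths returned by getLocalCacheFiles can be opened through
FileSystem.getLocal(conf), while the URIs returned by getCacheFiles can be
opened through FileSystem.get(conf) as in the original code. Note that the
second option reads the file from HDFS rather than from the local cached copy.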
On Tue, May 12, 2015 at 6:58 AM, Marko Dinic <ma...@nissatech.com>
wrote:
> Hello,
>
> I have used getCacheFiles() instead of getLocalCacheFiles() and now it
> works.
>
> Can someone please explain the difference between the two? I'm not able to
> find some good explanation about it to understand how it works.
>
> Thanks,
> Marko
>
>
> On 05/11/2015 11:25 PM, marko.dinic@nissatech.com wrote:
>
> Hello,
>
>
>
> I'm new to Hadoop and I'm having a problem reading from a sequence file
> that I add to distributed cache.
>
>
>
> I didn't have problems when I ran it in standalone mode, but now in
> pseudo-distributed and distributed I do.
>
>
>
> I'm adding file to distributed cache like this
>
> DistributedCache.addCacheFile(new URI(currentMedoids), conf);
>
>
>
> And reading from it in mapper's setup method
>
> Configuration conf = context.getConfiguration();
> FileSystem fs = FileSystem.get(conf);
>
> Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>
> List<Element> sketch = new ArrayList<Element>();
>
> SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>
> Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
> Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>
> while(medoidsReader.next(medoidKey, medoidValue)){
>
> ElementWritable medoidWritable = (ElementWritable)medoidValue;
> sketch.add(medoidWritable.getElement());
> }
>
> And I'm getting FileNotFoundException.
>
>
>
> Can anyone please help me and explain to me what is the problem and how to
> do this properly?
>
>
>
> Thanks
>
>
> Sent with inky <http://inky.com?kme=signature>
>
>
>
Re: Reading a sequence file from distributed cache
Posted by Marko Dinic <ma...@nissatech.com>.
Hello,
I have used getCacheFiles() instead of getLocalCacheFiles() and now it
works.
Can someone please explain the difference between the two? I haven't been
able to find a good explanation of how it works.
Thanks,
Marko
On 05/11/2015 11:25 PM, marko.dinic@nissatech.com wrote:
>
> Hello,
>
> I'm new to Hadoop and I'm having a problem reading from a sequence
> file that I add to distributed cache.
>
> I didn't have problems when I ran it in standalone mode, but now in
> pseudo-distributed and distributed I do.
>
> I'm adding file to distributed cache like this
>
>     DistributedCache.addCacheFile(new URI(currentMedoids), conf);
>
> And reading from it in mapper's setup method
>
>     Configuration conf = context.getConfiguration();
>     FileSystem fs = FileSystem.get(conf);
>
>     Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>
>     List<Element> sketch = new ArrayList<Element>();
>
>     SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>
>     Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
>     Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>
>     while(medoidsReader.next(medoidKey, medoidValue)){
>
>         ElementWritable medoidWritable = (ElementWritable)medoidValue;
>         sketch.add(medoidWritable.getElement());
>     }
>
> And I'm getting FileNotFoundException.
>
> Can anyone please help me and explain to me what is the problem and
> how to do this properly?
>
> Thanks
>
> Sent with inky <http://inky.com?kme=signature>
>
Re: Reading a sequence file from distributed cache
Posted by Shahab Yunus <sh...@gmail.com>.
What version are you using?
Have you seen this?
Regards,
Shahab
On Mon, May 11, 2015 at 5:25 PM, <ma...@nissatech.com> wrote:
> Hello,
>
>
>
> I'm new to Hadoop and I'm having a problem reading from a sequence file
> that I add to distributed cache.
>
>
>
> I didn't have problems when I ran it in standalone mode, but now in
> pseudo-distributed and distributed I do.
>
>
>
> I'm adding file to distributed cache like this
>
> DistributedCache.addCacheFile(new URI(currentMedoids), conf);
>
>
>
> And reading from it in mapper's setup method
>
> Configuration conf = context.getConfiguration();
> FileSystem fs = FileSystem.get(conf);
>
> Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>
> List<Element> sketch = new ArrayList<Element>();
>
> SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>
> Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
> Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>
> while(medoidsReader.next(medoidKey, medoidValue)){
>
> ElementWritable medoidWritable = (ElementWritable)medoidValue;
> sketch.add(medoidWritable.getElement());
> }
>
> And I'm getting FileNotFoundException.
>
>
>
> Can anyone please help me and explain to me what is the problem and how to
> do this properly?
>
>
>
> Thanks
>
>
> Sent with inky <http://inky.com?kme=signature>
>
>
Re: Reading a sequence file from distributed cache
Posted by Marko Dinic <ma...@nissatech.com>.
Hello,
I switched from getLocalCacheFiles() to getCacheFiles() and now it
works.
Can someone please explain the difference between the two? I haven't
been able to find a good explanation of how they work.
Thanks,
Marko
On 05/11/2015 11:25 PM, marko.dinic@nissatech.com wrote:
>
> Hello,
>
> I'm new to Hadoop and I'm having a problem reading from a sequence
> file that I add to distributed cache.
>
> I didn't have problems when I ran it in standalone mode, but now in
> pseudo-distributed and distributed I do.
>
> I'm adding file to distributed cache like this
>
> |DistributedCache.addCacheFile(new URI(currentMedoids), conf);|
>
> And reading from it in mapper's setup method
>
> | Configuration conf = context.getConfiguration();
> FileSystem fs = FileSystem.get(conf);
>
> Path[] paths = DistributedCache.getLocalCacheFiles(conf);
>
> List<Element> sketch = new ArrayList<Element>();
>
> SequenceFile.Reader medoidsReader = new SequenceFile.Reader(fs, paths[0], conf);
>
> Writable medoidKey = (Writable) medoidsReader.getKeyClass().newInstance();
> Writable medoidValue = (Writable) medoidsReader.getValueClass().newInstance();
>
> while(medoidsReader.next(medoidKey, medoidValue)){
>
> ElementWritable medoidWritable = (ElementWritable)medoidValue;
> sketch.add(medoidWritable.getElement());
> }|
>
> And I'm getting FileNotFoundException.
>
> Can anyone please help me and explain to me what is the problem and
> how to do this properly?
>
> Thanks
>
> Sent with inky <http://inky.com?kme=signature>
>
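Editor's note on the question above. A likely explanation, based on the documented behaviour of the two calls: getCacheFiles() returns the URIs exactly as they were added to the cache, while getLocalCacheFiles() returns the paths of the copies on the task node's local disk. FileSystem.get(conf) returns the default file system, which is the local one in standalone mode and HDFS in pseudo-distributed and distributed mode. A local path opened through the HDFS handle is looked up in HDFS, which would explain the FileNotFoundException. The sketch below is only a toy model of that lookup using java.net.URI; it does not call Hadoop, and the class name, host name and paths are all hypothetical.

```java
import java.net.URI;

// Toy model only: java.net.URI stands in for how a path without a scheme
// is interpreted by whichever file system is asked to open it.
// All names and paths here are hypothetical, not taken from the thread.
class CachePathDemo {

    // Resolve a path against the default file system URI.
    static URI qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(URI.create(path));
    }

    public static void main(String[] args) {
        String localCopy = "/tmp/hadoop/mapred/local/medoids.seq";
        String cachedUri = "hdfs://namenode:9000/user/marko/medoids.seq";

        // Standalone mode: the default file system is the local one,
        // so the local copy is looked up on local disk.
        System.out.println(qualify("file:///", localCopy).getScheme());

        // (Pseudo-)distributed mode: the default file system is HDFS,
        // so the same local path is looked up in HDFS, where it does not exist.
        System.out.println(qualify("hdfs://namenode:9000/", localCopy).getScheme());

        // A URI that already names its file system is left unchanged.
        System.out.println(qualify("hdfs://namenode:9000/", cachedUri));
    }
}
```

If this reading is right, there are two ways out. Using getCacheFiles() with FileSystem.get(conf), as above, reads the file straight from HDFS rather than from the localized copy. Keeping getLocalCacheFiles() and opening the paths with FileSystem.getLocal(conf) should read the local copy instead.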