Posted to user@mahout.apache.org by Darius Miliauskas <da...@gmail.com> on 2013/09/12 15:14:12 UTC

Using SparseVectorsFromSequenceFiles () in Java

Dear All,

I am trying to use SparseVectorsFromSequenceFiles() through Java code
(NetBeans 7 & Windows 7). Here is my code (API):

//inputPath is the path of my SequenceFile
Path inputPath = new Path("C:\\Users\\DARIUS\\forTest1.txt");

//outputPath is where I expect some results
Path outputPath = new Path("C:\\Users\\DARIUS\\forTest2.txt");

SparseVectorsFromSequenceFiles svfsf = new SparseVectorsFromSequenceFiles();
svfsf.run(new String[]{inputPath.toString(), outputPath.toString()});

The build is successful. However, at the end I got just an empty file where my
output was expected. Do you have any idea why the output file is empty, and
what I should change in the code to get the results?
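
For reference, seq2sparse expects its input to be a SequenceFile with Text keys
(document ids) and Text values (document content), and it writes its results into
an output directory rather than a single file. Below is a minimal sketch of writing
such an input file with the Hadoop API; the paths and keys are only placeholders:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class WriteSeq2SparseInput {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.getLocal(conf);

    // Placeholder path for the SequenceFile that will serve as seq2sparse input.
    Path input = new Path("C:\\Users\\DARIUS\\seqinput\\part-00000");

    // Keys are document ids (Text), values are the raw document text (Text).
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, input, Text.class, Text.class);
    try {
      writer.append(new Text("doc1"), new Text("first document text ..."));
      writer.append(new Text("doc2"), new Text("second document text ..."));
    } finally {
      writer.close();
    }
  }
}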


Ciao,

Darius

Re: Using SparseVectorsFromSequenceFiles () in Java

Posted by Ken Krugler <kk...@transpac.com>.
Hi Darius,

On Sep 18, 2013, at 1:10am, Gokhan Capan wrote:

> It seems you hit a "Hadoop on Windows" issue; it might have something to do
> with how Hadoop sets file permissions.

From my experience, only the (old) 0.20.2 version of Hadoop works well with Cygwin; otherwise you run into file-permission issues like the one you mentioned.

If you want to give that version a try, and can't find a download, see http://scaleunlimited.com/downloads/3nn2pq/hadoop-0.20.2.tgz
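
If it helps, a quick sanity check (a tiny sketch, nothing Mahout-specific) is to print
which Hadoop version actually ends up on the NetBeans project's classpath, since Mahout
pulls in its own hadoop-core dependency and that is the version your code runs against:

import org.apache.hadoop.util.VersionInfo;

public class HadoopVersionCheck {
  public static void main(String[] args) {
    // Prints the Hadoop version resolved on the project's classpath,
    // which can differ from whatever is installed on the machine.
    System.out.println("Hadoop on classpath: " + VersionInfo.getVersion());
  }
}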

-- Ken



--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr






Re: Using SparseVectorsFromSequenceFiles () in Java

Posted by Darius Miliauskas <da...@gmail.com>.
Hi again, Gokhan,

Yeah, I am stuck with this Windows issue, since all my attempts to write the
code run into it (I even tried changing libraries, but that did not resolve it).

Darius


2013/9/18 Gokhan Capan <gk...@gmail.com>

> Darius,
>
> It seems you hit a "Hadoop on Windows" issue; it might have something to do
> with how Hadoop sets file permissions.
>
>
> Gokhan

Re: Using SparseVectorsFromSequenceFiles () in Java

Posted by Gokhan Capan <gk...@gmail.com>.
Darius,

It seems you hit a "Hadoop on Windows" issue; it might have something to do
with how Hadoop sets file permissions.
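
For what it's worth, the failing step in your trace is the local filesystem creating
and then chmod-ing the job staging directory. A rough, minimal repro sketch of that
call path (the path below is only a placeholder):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class StagingDirPermissionRepro {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    LocalFileSystem fs = FileSystem.getLocal(conf);

    // Placeholder for the staging directory that JobClient tries to create.
    Path staging = new Path("/tmp/hadoop-placeholder/mapred/staging/.staging");

    // mkdirs with an explicit permission goes through RawLocalFileSystem.setPermission;
    // on Windows that permission change fails and surfaces as the
    // "Failed to set permissions of path" IOException from the trace.
    fs.mkdirs(staging, new FsPermission((short) 0700));
  }
}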


Gokhan



Re: Using SparseVectorsFromSequenceFiles () in Java

Posted by Darius Miliauskas <da...@gmail.com>.
That works like a charm, Gokhan, your suggestion was on point again. However,
despite the fact that the build is successful, the file is still empty, and I
got the exception I always get on Windows:

java.io.IOException: Failed to set permissions of path: \tmp\hadoop-DARIUS\mapred\staging\DARIUS331150778\.staging to 0777
at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:689)
at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:670)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:116)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:918)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:93)
at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:257)
at org.apache.mahout.mahoutnewsrecommender2.Recommender.myRecommender(Recommender.java:99)
at org.apache.mahout.mahoutnewsrecommender2.App.main(App.java:26)

BUILD SUCCESSFUL (total time: 3 seconds)


Thanks,

Darius




2013/9/12 Gokhan Capan <gk...@gmail.com>

> Although Windows is not officially supported, your
>
> svfsf.run(new String[]{inputPath.toString(), outputPath.toString()})
>
> should be
>
> svfsf.run(new String[]{"-i", inputPath.toString(), "-o", outputPath.toString()})
>
> anyway.
>
> Best
>
>
> Gokhan
>

Re: Using SparseVectorsFromSequenceFiles () in Java

Posted by Gokhan Capan <gk...@gmail.com>.
Although Windows is not officially supported, your

svfsf.run(new String[]{inputPath.toString(), outputPath.toString()})

should be

svfsf.run(new String[]{"-i", inputPath.toString(), "-o", outputPath.toString()})

anyway.
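
Put together, a minimal end-to-end sketch of that invocation could look like the
following (the paths are placeholders, the output must point at a directory for
seq2sparse to populate, and ToolRunner is just one convenient way to pass in a
Configuration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.ToolRunner;

import org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles;

public class RunSeq2Sparse {
  public static void main(String[] args) throws Exception {
    // Placeholder paths: the input is a SequenceFile (or a directory of them),
    // the output is a directory that seq2sparse fills with tokenized-documents,
    // tf-vectors and friends.
    Path inputPath = new Path("C:\\Users\\DARIUS\\seqinput");
    Path outputPath = new Path("C:\\Users\\DARIUS\\seqoutput");

    // SparseVectorsFromSequenceFiles is a Hadoop Tool, so its options are parsed
    // from command-line style flags such as -i and -o.
    int exitCode = ToolRunner.run(new Configuration(),
        new SparseVectorsFromSequenceFiles(),
        new String[] {"-i", inputPath.toString(), "-o", outputPath.toString()});
    System.out.println("seq2sparse finished with exit code " + exitCode);
  }
}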

Best


Gokhan

