Posted to user@mahout.apache.org by Stefano Bellasio <st...@gmail.com> on 2010/12/04 19:04:44 UTC

How to edit dataset for SVD recommendations with DistributedLanczos?

I need to put all the data in a matrix, I think, but how? I used Mahout's SVD command line with seqdirectory and seq2sparse, but without success :) Well, I think I finally need something like this: http://lucene.472066.n3.nabble.com/Using-SVD-with-Canopy-KMeans-td1407217.html#a1407801 but for recommendations. Can you help me with some suggestions or tutorials? I see that there is much interest in SVD and DistributedLanczos but really few suggestions and tutorials. Thank you again for your

Re: How to edit dataset for SVD recommendations with DistributedLanczos?

Posted by Stefano Bellasio <st...@gmail.com>.
No one can help me? I could open a new thread with my questions, but first of all I want to summarize what I have to do:

1) My goal is to obtain recommendations from the GroupLens data set.
2) I started a series of tests with Mahout and different recommenders, such as slope-one, user-based and item-based.
3) The second step was trying to use Hadoop with Mahout; all good with RecommenderJob (item-based) in pseudo-distributed and fully distributed mode.

4) The last step: I want to use SVD with my GroupLens data set, but here I'm completely lost and need some hints to get started. I think I need to "transform" my data set into a matrix and then use DistributedLanczosSolver. All these things seem simple, but they are not, so I'm asking if someone can give me some examples or explanations :)

Thank you, I hope that someone will answer my questions :) Stefano

On 06 Dec 2010, at 18:34, Derek O'Callaghan wrote:

> Yeah, that should work. You can pass in a different array to getSampleData() instead of DOCS, or change getSampleData() if you want to (i.e. changing the current "for (int i = 0; i < docs2.length; ..." loop body). I think that should be all you need...
> 
> [earlier quoted messages and code snipped]


Re: How to edit dataset for SVD recommendations with DistributedLanczos?

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Stefano,

I haven't gotten around to using the SVD recommender yet, so I am not sure I can
be of much help, but I think the general idea is that you need to vectorize your
data in the form of a distributed matrix (which you don't actually have to use
directly), which AFAIK is one or more Hadoop SequenceFiles with keys being
IntWritable and values being VectorWritable. VectorWritable can pack at least
three different types of vector: dense, sparse sequential and sparse random. I
think sparse sequential is a good way to start. I think there are various
utilities in Mahout to help you out with this, but I haven't used them. I am
just saying that the data prep seems to be pretty simple.
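
An untested sketch of what I mean, for GroupLens-style userID,itemID,rating
triples -- one sparse row per user. The file names, the item count and the
output path are just placeholders:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.SequentialAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class RatingsToMatrix {
  public static void main(String[] args) throws Exception {
    int numItems = 4000; // placeholder: max item ID + 1 in your data set

    // One sparse row per user: index = item ID, value = rating.
    // RandomAccessSparseVector is cheap to fill in arbitrary order.
    Map<Integer, Vector> rows = new HashMap<Integer, Vector>();
    BufferedReader in = new BufferedReader(new FileReader("ratings.csv"));
    String line;
    while ((line = in.readLine()) != null) {
      String[] tokens = line.split(",");
      int userID = Integer.parseInt(tokens[0]);
      Vector row = rows.get(userID);
      if (row == null) {
        row = new RandomAccessSparseVector(numItems);
        rows.put(userID, row);
      }
      row.set(Integer.parseInt(tokens[1]), Double.parseDouble(tokens[2]));
    }
    in.close();

    // Write a SequenceFile<IntWritable, VectorWritable>, converting each row
    // to sparse sequential form -- the layout the distributed jobs expect.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
        new Path("ratings-matrix/part-00000"), IntWritable.class, VectorWritable.class);
    for (Map.Entry<Integer, Vector> e : rows.entrySet()) {
      writer.append(new IntWritable(e.getKey()),
                    new VectorWritable(new SequentialAccessSparseVector(e.getValue())));
    }
    writer.close();
  }
}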

What's not terribly straightforward is starting to "fold in" new data (e.g.
if you have a completely new item coming in and you want to use your
existing decomposition to make a prediction for it without recomputing the
whole SVD) -- I haven't used Mahout to do that specifically, so I am not sure
what help is available for that kind of thing.
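
For reference, the standard fold-in trick from the LSI literature is a
projection: with A ~= U * S * V^T, a new item column a_new gets latent
coordinates v_new = S^-1 * U^T * a_new (and symmetrically, a new user row is
projected via V). That avoids recomputing the SVD, though the factorization
slowly drifts from optimal as more and more data is folded in this way.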

But at least preformatting the input for the SVD job seems pretty
straightforward to me, even without any special prep tools.

Thanks.
-d

On Sun, Dec 19, 2010 at 10:02 AM, Stefano Bellasio <
stefanobellasio@gmail.com> wrote:

> No one can help me with this problem? Please, it's quite
> important :) Thank you, Stefano

Re: How to edit dataset for SVD recommendations with DistributedLanczos?

Posted by tdpessem <to...@ugent.be>.
I also use the SVDRecommender. This is my piece of code; maybe it can help
you. If you see problems, please reply.

import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.svd.SVDRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;

public class SVDRecommenderExample {
  public static void main(String[] args) throws Exception {
    // Rating file: one userID,itemID,rating line per preference
    DataModel model = new FileDataModel(new File("ratings.csv"));

    int numFeatures = 5;
    int initialSteps = 1;
    Recommender recommender = new SVDRecommender(model, numFeatures, initialSteps);
    Recommender cachingRecommender = new CachingRecommender(recommender);

    // Get 10 recommendations for user ID 6
    List<RecommendedItem> recommendations = cachingRecommender.recommend(6, 10);
  }
}
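
And to dump what comes back (RecommendedItem exposes the item ID and the
estimated preference value), inside the same main:

    for (RecommendedItem item : recommendations) {
      System.out.println(item.getItemID() + " : " + item.getValue());
    }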

Re: How to edit dataset for SVD recommendations with DistributedLanczos?

Posted by Stefano Bellasio <st...@gmail.com>.
No one can help me with this problem? Please, it's quite important :) Thank you, Stefano

Re: How to edit dataset for SVD recommendations with DistributedLanczos?

Posted by Stefano Bellasio <st...@gmail.com>.
Hi Land, I'm still trying to get something working, but I have a question:
1) DOCS is an array; I need to use something like "Ratings.txt", which has the form userID, itemID, preference value... how can I get past this step? Thanks :)
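
Same idea as the data prep sketched earlier in the thread, but shaped as a
drop-in replacement for getSampleData() below (untested; the item count is a
placeholder, and the method fills the same sampleData field the test uses):

  private void getRatingData(File ratingsFile) throws IOException {
    int numItems = 4000; // placeholder: max item ID + 1
    // One sparse row per user: index = item ID, value = preference
    Map<Integer, Vector> rows = new HashMap<Integer, Vector>();
    BufferedReader in = new BufferedReader(new FileReader(ratingsFile));
    for (String line = in.readLine(); line != null; line = in.readLine()) {
      String[] tokens = line.split(",");
      int userID = Integer.parseInt(tokens[0]);
      Vector row = rows.get(userID);
      if (row == null) {
        row = new RandomAccessSparseVector(numItems);
        rows.put(userID, row);
      }
      row.set(Integer.parseInt(tokens[1]), Double.parseDouble(tokens[2]));
    }
    in.close();
    sampleData = new ArrayList<VectorWritable>();
    for (Map.Entry<Integer, Vector> e : rows.entrySet()) {
      // Name each row after its user so the SVD/cluster output stays readable
      sampleData.add(new VectorWritable(
          new NamedVector(new SequentialAccessSparseVector(e.getValue()), "u" + e.getKey())));
    }
  }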

On 06 Dec 2010, at 18:34, Derek O'Callaghan wrote:

> Yeah, that should work. You can pass in a different array to getSampleData() instead of DOCS, or change getSampleData() if you want to (i.e. changing the current "for (int i = 0; i < docs2.length; ..." loop body). I think that should be all you need...
> 
> [earlier quoted messages and code snipped]


Re: How to edit dataset for SVD recommendations with DistributedLanczos?

Posted by Derek O'Callaghan <de...@ucd.ie>.
Yeah, that should work. You can pass in a different array to 
getSampleData() instead of DOCS, or change getSampleData() if you want 
to (i.e. changing the current "for (int i = 0; i < docs2.length; ..." 
loop body). I think that should be all you need...


On 06/12/10 17:20, Stefano Bellasio wrote:
> Thanks :) Found it! Well, I think the part useful for me is this one:
>
> [code snippet and earlier quoted messages snipped]

Re: How to edit dataset for SVD recommendations with DistributedLanczos?

Posted by Stefano Bellasio <st...@gmail.com>.
Thanks :) Found it! Well, I think the part useful for me is this one:

 private List<VectorWritable> sampleData;

  private String[] termDictionary;

  @Override
  @Before
  public void setUp() throws Exception {
    super.setUp();
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Create test data
    getSampleData(DOCS);
    ClusteringTestUtils.writePointsToFile(sampleData, true, getTestTempFilePath("testdata/file1"), fs, conf);
  }

  private void getSampleData(String[] docs2) throws IOException {
    sampleData = new ArrayList<VectorWritable>();
    RAMDirectory directory = new RAMDirectory();
    IndexWriter writer = new IndexWriter(directory,
                                         new StandardAnalyzer(Version.LUCENE_30),
                                         true,
                                         IndexWriter.MaxFieldLength.UNLIMITED);
    for (int i = 0; i < docs2.length; i++) {
      Document doc = new Document();
      Fieldable id = new Field("id", "doc_" + i, Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS);
      doc.add(id);
      // Store both position and offset information
      Fieldable text = new Field("content", docs2[i], Field.Store.NO, Field.Index.ANALYZED, Field.TermVector.YES);
      doc.add(text);
      writer.addDocument(doc);
    }
    writer.close();
    IndexReader reader = IndexReader.open(directory, true);
    Weight weight = new TFIDF();
    TermInfo termInfo = new CachedTermInfo(reader, "content", 1, 100);

    int numTerms = 0;
    for (Iterator<TermEntry> it = termInfo.getAllEntries(); it.hasNext();) {
      it.next();
      numTerms++;
    }
    termDictionary = new String[numTerms];
    int i = 0;
    for (Iterator<TermEntry> it = termInfo.getAllEntries(); it.hasNext();) {
      String term = it.next().term;
      termDictionary[i] = term;
      System.out.println(i + " " + term);
      i++;
    }
    VectorMapper mapper = new TFDFMapper(reader, weight, termInfo);
    Iterable<Vector> iterable = new LuceneIterable(reader, "id", "content", mapper);

    i = 0;
    for (Vector vector : iterable) {
      assertNotNull(vector);
      NamedVector namedVector;
      if (vector instanceof NamedVector) {
        //rename it for testing purposes
        namedVector = new NamedVector(((NamedVector) vector).getDelegate(), "P(" + i + ')');

      } else {
        namedVector = new NamedVector(vector, "P(" + i + ')');
      }
      System.out.println(AbstractCluster.formatVector(namedVector, termDictionary));
      sampleData.add(new VectorWritable(namedVector));
      i++;
    }
  }


Can I pass my dataset to sampleData and then use something like public void testKmeansSVD()... am I right? Thanks
On 06 Dec 2010, at 18:04, Derek O'Callaghan wrote:

> Hi Stefano,
> 
> The class can be found in mahout-utils/src/test/java.
> 
> Derek
> 
> [earlier quoted messages snipped]


Re: How to edit dataset for SVD recommendations with DistributedLanczos?

Posted by Derek O'Callaghan <de...@ucd.ie>.
Hi Stefano,

The class can be found in mahout-utils/src/test/java.

Derek

On 06/12/10 16:54, Stefano Bellasio wrote:
> Hi Derek, thanks! I'm looking in my Mahout files and I can't find this class under org.apache.mahout.clustering.TestClusterDumper; is it there or in another package?
> On 06 Dec 2010, at 14:21, Derek O'Callaghan wrote:
>
> [earlier quoted messages snipped]

Re: How to edit dataset for SVD recommendations with DistributedLanczos?

Posted by Stefano Bellasio <st...@gmail.com>.
Hi Derek, thanks! I'm looking in my Mahout files and I can't find this class under org.apache.mahout.clustering.TestClusterDumper; is it there or in another package?
On 06 Dec 2010, at 14:21, Derek O'Callaghan wrote:

> Hi Stefano,
> 
> TestClusterDumper has a few test methods which perform SVD with clustering, e.g. testKmeansSVD(). These methods demonstrate the creation of a matrix for use with SVD, so I think they might help to give you an overview of what's required.
> 
> Regards,
> 
> Derek
> 
> On 04/12/10 18:04, Stefano Bellasio wrote:
>> [original question snipped]


Re: How to edit dataset for SVD recommendations with DistributedLanczos?

Posted by Derek O'Callaghan <de...@ucd.ie>.
Hi Stefano,

TestClusterDumper has a few test methods which perform SVD with 
clustering, e.g. testKmeansSVD(). These methods demonstrate the creation 
of a matrix for use with SVD, so I think they might help to give you an 
overview of what's required.
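
Once the test has written its vectors, a quick way to sanity-check the file
(or, later, the solver's eigenvector output) is the generic SequenceFile
reader idiom; a minimal sketch, with the path being whatever was written by
writePointsToFile:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.mahout.math.VectorWritable;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path path = new Path("testdata/file1"); // wherever the vectors were written
SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
// The key class varies between jobs, so instantiate it reflectively
Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
VectorWritable value = new VectorWritable();
while (reader.next(key, value)) {
  System.out.println(key + " -> " + value.get().getNumNondefaultElements() + " non-zeros");
}
reader.close();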

Regards,

Derek

On 04/12/10 18:04, Stefano Bellasio wrote:
> [original question snipped]

Re: How to edit dataset for SVD recommendations with DistributedLanczos?

Posted by Ted Dunning <te...@gmail.com>.
Stefano,
It sounds like you have an interesting need in mind. Can you remind
us again of a bit more context? I think that several of your messages
got dropped due to being posted via Nabble instead of directly.

On Sat, Dec 4, 2010 at 10:04 AM, Stefano Bellasio
<st...@gmail.com> wrote:
>
> [original question snipped]