Posted to user@mahout.apache.org by jamborta <ja...@gmail.com> on 2009/12/15 15:39:59 UTC

FileDataModel - taste library

Hi,

I've run into something odd and can't figure out what's wrong.
I use FileDataModel to read the dataset from disk:

DataModel model = new FileDataModel(new File("./data/all_data.data"));
int numUsers = model.getNumUsers();

On one machine it works as expected:

15-Dec-2009 14:29:32 org.slf4j.impl.JCLLoggerAdapter info
INFO: Creating FileDataModel for file .\data\all_data.data
15-Dec-2009 14:29:32 org.slf4j.impl.JCLLoggerAdapter info
INFO: Reading file info...
15-Dec-2009 14:29:32 org.slf4j.impl.JCLLoggerAdapter info
INFO: Read lines: 100000
15-Dec-2009 14:29:32 org.slf4j.impl.JCLLoggerAdapter info
INFO: Processed 943 users
15-Dec-2009 14:29:33 org.slf4j.impl.JCLLoggerAdapter info
INFO: Processed 943 users

which is correct.
On another machine it seems to read a second file as well, and gives me
this output:

15-Dec-2009 14:35:13 org.slf4j.impl.JCLLoggerAdapter info
INFO: Creating FileDataModel for file .\data\all_data.data
15-Dec-2009 14:35:13 org.slf4j.impl.JCLLoggerAdapter info
INFO: Reading file info...
15-Dec-2009 14:35:13 org.slf4j.impl.JCLLoggerAdapter info
INFO: Read lines: 100000
15-Dec-2009 14:35:13 org.slf4j.impl.JCLLoggerAdapter info
INFO: Reading file info...
15-Dec-2009 14:35:15 org.slf4j.impl.JCLLoggerAdapter info
INFO: Processed 1000000 lines
15-Dec-2009 14:35:15 org.slf4j.impl.JCLLoggerAdapter info
INFO: Read lines: 1000209
15-Dec-2009 14:35:17 org.slf4j.impl.JCLLoggerAdapter info
INFO: Processed 6040 users

I have two datasets, but for some reason the second machine reads the other
one as well, from somewhere.

thanks a lot


Re: FileDataModel - taste library

Posted by jamborta <ja...@gmail.com>.
Hi,

I found the problem in the meantime. I had renamed my bigger data file, and the
reader treated it as an update file.

thanks


Re: FileDataModel - taste library

Posted by Sean Owen <sr...@gmail.com>.
I bet you have other files named like all_data.* in the same
directory. These are also processed as 'update files'.
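
To check a data directory for this before loading, a small standalone sketch
like the one below may help. It does not call Mahout at all. It assumes the
matching rule is "same directory, same file name up to the first period",
which is how FileDataModel is documented to find update files, though the
exact rule may differ between versions. The class name UpdateFileCheck and
the method findLikelyUpdateFiles are made up for this example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UpdateFileCheck {

    /**
     * Returns the names of sibling files that share the data file's name
     * up to its first period. ASSUMPTION: this mirrors how FileDataModel
     * picks up update files; it is not the library's own code.
     */
    public static List<String> findLikelyUpdateFiles(File dataFile) {
        String dataFileName = dataFile.getName();
        int period = dataFileName.indexOf('.');
        String prefix = period < 0 ? dataFileName : dataFileName.substring(0, period);

        List<String> matches = new ArrayList<>();
        File parentDir = dataFile.getAbsoluteFile().getParentFile();
        File[] siblings = parentDir == null ? null : parentDir.listFiles();
        if (siblings == null) {
            return matches; // directory missing or unreadable
        }
        for (File sibling : siblings) {
            String name = sibling.getName();
            // Same prefix, regular file, and not the main data file itself.
            if (sibling.isFile() && name.startsWith(prefix) && !name.equals(dataFileName)) {
                matches.add(name);
            }
        }
        Collections.sort(matches);
        return matches;
    }

    public static void main(String[] args) {
        // Defaults to the path used earlier in this thread.
        File dataFile = new File(args.length > 0 ? args[0] : "./data/all_data.data");
        for (String name : findLikelyUpdateFiles(dataFile)) {
            System.out.println("Would likely be read as an update file: " + name);
        }
    }
}
```

If this prints anything, moving those files to another directory, or renaming
them so they no longer start with the same prefix, should stop them from
being merged into the model.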
