Posted to user@mahout.apache.org by Mirko <id...@googlemail.com> on 2009/11/04 18:04:17 UTC

GenericJDBCDataModel problem (getItems, getUsers)

Hi,
I have problems with the GenericJDBCDataModel. For me, the functions  
getUsers() and getItems() return nothing, although the SQL queries  
used in the functions definitely do return results. All other  
functions in the DataModel work as expected.

The results of the queries used in getUsers() and getItems() look as  
follows:

SQL result for GenericJDBCDataModel.GET_ITEMS_SQL_KEY:

item_id
------------
itemID_A
itemID_B
itemID_C
itemID_D
itemID_E
...

SQL result for GenericJDBCDataModel.GET_USERS_SQL_KEY:

item_id     callret-1   user_id
----------------------------------
itemID_A    1           userID_X
itemID_B    1           userID_Y
itemID_C    1           userID_Z
...

I think these results should be correct (Preferences are always 1 in  
my data). But the Iterables returned by both functions seem to be empty:

GenericJDBCDataModel dm = new GenericJDBCDataModel(props);
for (Item item : dm.getItems()) {
    System.out.println(item.getID()); // never reached
}
for (User user : dm.getUsers()) {
    System.out.println(user.getID()); // never reached
}


I should note that I query an OpenLink Virtuoso RDF store, which uses
SPARQL queries enclosed in SQL queries. That means that 'columns' and
'tables', and thus the constants DEFAULT_PREFERENCE_TABLE,
DEFAULT_USER_ID_COLUMN, DEFAULT_ITEM_ID_COLUMN and
DEFAULT_PREFERENCE_COLUMN of AbstractJDBCDataModel, don't have any
meaning. But as I understand the code, they are not required when
using the GenericJDBCDataModel.

Is it possible that some exception is thrown 'silently' in the  
AbstractJDBCDataModel?

I have no idea where to look for the problem, so thanks for any hints.

Regards,
Mirko




Re: Recommender with filtering

Posted by Sean Owen <sr...@gmail.com>.
The easiest way to do this by far is with a Rescorer object, passed to
the recommend() method. The implementation should simply look at the
item ID it's given, and return NaN if it's *not* from the given
category, otherwise return the original score. It'll make sense if you
look at the interface but post back with questions here.
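
To make the idea concrete, here is a minimal, self-contained sketch. It
does not use the Mahout classes: the ItemRescorer interface below is a
hypothetical stand-in for the library's Rescorer, and the item-to-category
map is made-up sample data. Check the real interface in your version for
the actual method names and ID types.

```java
import java.util.HashMap;
import java.util.Map;

public class CategoryRescorerSketch {

    /** Hypothetical stand-in for the library's Rescorer interface (assumption). */
    interface ItemRescorer {
        double rescore(long itemID, double originalScore);
    }

    /** Returns NaN for items outside the wanted category, else the original score. */
    static ItemRescorer forCategory(Map<Long, String> categoryByItem, String wanted) {
        return (itemID, originalScore) ->
            wanted.equals(categoryByItem.get(itemID)) ? originalScore : Double.NaN;
    }

    public static void main(String[] args) {
        // Made-up sample data: item ID -> category name.
        Map<Long, String> categoryByItem = new HashMap<>();
        categoryByItem.put(1L, "ale");
        categoryByItem.put(2L, "stout");

        ItemRescorer rescorer = forCategory(categoryByItem, "ale");
        System.out.println(rescorer.rescore(1L, 4.5)); // in category: score kept
        System.out.println(rescorer.rescore(2L, 3.0)); // other category: NaN
    }
}
```

An item with no known category is treated as not belonging to the wanted
category, so it is filtered out as well.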

On Wed, Nov 4, 2009 at 5:42 PM, Shannon Hicks <sh...@pintley.com> wrote:
> In my recommender, my items are grouped into categories. I'd like to end up
> getting recommendations filtered by a specific category.
>
> Are there any examples or suggestions out there on how we might go about
> this? I'm thinking the way to go would be to do a JOIN query somewhere in
> the GenericJDBCDataModel when needed.
>
> Shan
>

Recommender with filtering

Posted by Shannon Hicks <sh...@pintley.com>.
In my recommender, my items are grouped into categories. I'd like to  
end up getting recommendations filtered by a specific category.

Are there any examples or suggestions out there on how we might go  
about this? I'm thinking the way to go would be to do a JOIN query  
somewhere in the GenericJDBCDataModel when needed.

Shan

Re: GenericJDBCDataModel problem (getItems, getUsers)

Posted by Sean Owen <sr...@gmail.com>.
On Thu, Nov 5, 2009 at 4:43 PM, Mirko
<id...@googlemail.com> wrote:
> Yep, this is what I did already. I looked at the logs from the library at
> debug level. From there, I copy-pasted the SQL statements and checked their
> results. And as I said, no exceptions are logged. But never mind, I switched
> to v0.3 and MySQL anyway, so there's no point in getting v0.1 working anymore.

OK, that's weird, but as you say, problem solved anyway.

> Maybe I will go back to implementing a DataModel for my RDF store with the
> current code version later. I like the idea of not needing a special table
> for preferences, and of querying preferences from RDF directly. However, it
> doesn't make much sense anymore when long IDs are needed (I would have to
> translate and copy the data, which must use String IDs, anyway). I will
> think about that...

Look at IDMigrator which will help you translate pretty easily.

Re: GenericJDBCDataModel problem (getItems, getUsers)

Posted by Mirko <id...@googlemail.com>.
Hi,

> Yes I would encourage you to switch, since a lot has improved, but yes
> the Strings-longs issue is the biggest one. It does enable quite a bit
> of performance improvement.

OK, I switched to v0.3 and now also use a MySQL DB to store preferences.
It was easy to implement with MySQL. Performance is still bad, but I
haven't tuned the MySQL server variables yet, and I am also working
with little JVM memory on my notebook.

> One possibility is that there are no results because
> the SQL isn't quite querying how you want it, or being executed as you
> think it is, and is returning no results. For example if you don't
> have the order of the placeholders right (the '?') things wouldn't
> work.

Well, the queries for getUsers and getItems have no parameters. Also,
a wrong number of parameters does throw an exception.
>
> This is why I wonder if you can look at the log statements from the
> library (at debug level) which show the SQL statements being executed.
> Or simply attach a debugger and find out that way.


Yep, this is what I did already. I looked at the logs from the library
at debug level. From there, I copy-pasted the SQL statements and
checked their results. And as I said, no exceptions are logged. But
never mind, I switched to v0.3 and MySQL anyway, so there's no point in
getting v0.1 working anymore.

Maybe I will go back to implementing a DataModel for my RDF store with
the current code version later. I like the idea of not needing a special
table for preferences, and of querying preferences from RDF directly.
However, it doesn't make much sense anymore when long IDs are needed (I
would have to translate and copy the data, which must use String IDs,
anyway). I will think about that...

Thanks,
Mirko


Re: GenericJDBCDataModel problem (getItems, getUsers)

Posted by Sean Owen <sr...@gmail.com>.
On Thu, Nov 5, 2009 at 8:32 AM, Mirko
<id...@googlemail.com> wrote:
> Hi Sean,
> I used v0.1. I got the latest snapshot (0.3) now and saw that a lot has
> changed in the meantime; things seem a bit more complicated now. I will
> switch to 0.3, but that seems trickier to implement on my data (e.g. I
> currently use Strings, not longs, for items and users in my data). It will
> probably be easier to copy my data to an already supported DB (MySQL) and
> process it there instead of customizing the DataModel to fit my DB.

Yes I would encourage you to switch, since a lot has improved, but yes
the Strings-longs issue is the biggest one. It does enable quite a bit
of performance improvement.

Look for the "IDMigrator" class which can help create a temporary
solution. It's a helper class that maintains a mapping between
longs and Strings. This can be used with a JDBCDataModel to translate
back and forth for your database.
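
The mapping idea can be sketched roughly as follows. This is not the
library's IDMigrator: the class name StringLongMapping and its methods are
hypothetical, and hashing with MD5 is just one assumed way to derive a
long from a String. It keeps the mapping in memory only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

public class StringLongMapping {

    // Remembers which String produced each long so it can be translated back.
    private final Map<Long, String> longToString = new HashMap<>();

    /** Derives a long from the first 8 bytes of the MD5 hash and records the pair. */
    public long toLongID(String stringID) {
        long hash = hash(stringID);
        longToString.put(hash, stringID);
        return hash;
    }

    /** Returns the original String, or null if this long was never produced here. */
    public String toStringID(long longID) {
        return longToString.get(longID);
    }

    private static long hash(String value) {
        try {
            byte[] md5 = MessageDigest.getInstance("MD5")
                .digest(value.getBytes(StandardCharsets.UTF_8));
            long result = 0L;
            // Pack the first 8 bytes of the digest into one long.
            for (int i = 0; i < 8; i++) {
                result = (result << 8) | (md5[i] & 0xFFL);
            }
            return result;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        StringLongMapping mapping = new StringLongMapping();
        long id = mapping.toLongID("itemID_A");
        System.out.println(id + " -> " + mapping.toStringID(id));
    }
}
```

Two different Strings could in principle hash to the same long; the sketch
ignores that case and simply keeps the most recent String for a given long.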

Of course you'll probably want to ultimately use numbers in your
database too if possible, but this works in the short term. We can
discuss more how to do it if you go this way.


> As I understand the code, the GenericJDBCDataModel works with any JDBC
> DataSource when adjusting the SQL_KEY values. Since my Virtuoso DB is an
> ordinary JDBC javax.sql.DataSource, it should not matter whether I send SQL
> or RDF-in-SQL, as long as the SQL ResultSets have the expected structure.
> Would you agree with that? Or what else do you think I have to customize?

Yes that's right. One possibility is that there are no results because
the SQL isn't quite querying how you want it, or being executed as you
think it is, and is returning no results. For example if you don't
have the order of the placeholders right (the '?') things wouldn't
work.

This is why I wonder if you can look at the log statements from the
library (at debug level) which show the SQL statements being executed.
Or simply attach a debugger and find out that way.

The other possibility is, as I remember, the iterators can't throw an
exception when they hit a problem. They log the error and close the
iterator. This could also explain this output. Again, you'd have to
look into the log files, or a debugger, to confirm what's happening.
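
That effect can be reproduced without any database. The sketch below is
not the library's iterator: QuietIterator and the failing row source are
hypothetical. It only illustrates why a loop body is never reached when
the first fetch fails and the error is logged instead of thrown.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietIteratorSketch {

    private static final Logger log = Logger.getLogger("QuietIteratorSketch");

    /** Iterator that logs a failure and reports "no more elements" instead of throwing. */
    static final class QuietIterator<T> implements Iterator<T> {
        private final Callable<T> source; // returns the next row, or null at the end
        private T next;
        private boolean closed;

        QuietIterator(Callable<T> source) {
            this.source = source;
            advance(); // pre-fetch the first row
        }

        private void advance() {
            try {
                next = source.call();
                if (next == null) {
                    closed = true;
                }
            } catch (Exception e) {
                // The error only shows up in the log; the caller sees an empty iterator.
                log.log(Level.WARNING, "Row fetch failed; closing iterator", e);
                next = null;
                closed = true;
            }
        }

        @Override
        public boolean hasNext() {
            return !closed;
        }

        @Override
        public T next() {
            if (closed) {
                throw new NoSuchElementException();
            }
            T result = next;
            advance();
            return result;
        }
    }

    /** Counts how many elements the iterator yields. */
    static int count(Iterator<?> it) {
        int n = 0;
        while (it.hasNext()) {
            it.next();
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Iterator<String> broken = new QuietIterator<String>(() -> {
            throw new IllegalStateException("cannot read row");
        });
        System.out.println(count(broken)); // prints 0: the loop body never runs
    }
}
```

With an iterator like this, the only trace of the failure is the warning
in the log, which is why the log level and log destination matter here.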

I'm suggesting one of these two things should be investigated.

Re: GenericJDBCDataModel problem (getItems, getUsers)

Posted by Mirko <id...@googlemail.com>.
Hi Sean,
I used v0.1. I got the latest snapshot (0.3) now and saw that a lot has
changed in the meantime; things seem a bit more complicated now. I
will switch to 0.3, but that seems trickier to implement on my
data (e.g. I currently use Strings, not longs, for items and users in
my data). It will probably be easier to copy my data to an already
supported DB (MySQL) and process it there instead of customizing
the DataModel to fit my DB.

But still:

> If you are using some SPARQL-in-SQL system then you must be customizing a
> lot, at least the SQL queries.

As I understand the code, the GenericJDBCDataModel works with any JDBC
DataSource when adjusting the SQL_KEY values. Since my Virtuoso DB is an
ordinary JDBC javax.sql.DataSource, it should not matter whether I send
SQL or RDF-in-SQL, as long as the SQL ResultSets have the expected
structure. Would you agree with that? Or what else do you think I
have to customize?

Thanks,
Mirko





Re: GenericJDBCDataModel problem (getItems, getUsers)

Posted by Sean Owen <sr...@gmail.com>.
If you are using some SPARQL-in-SQL system then you must be customizing a
lot, at least the SQL queries. It is hard to help since all that is not
project code.

Are you sure the queries being executed work? Have you looked at the log
output? Which code version are you using? No, there should be no hidden
exceptions, but some are logged rather than thrown where throwing is not
possible.

I would just debug, that will show the issue quickly.
