Posted to user@mahout.apache.org by Salil Apte <sa...@offlinelabs.com> on 2011/07/12 03:21:29 UTC

Connection Pooling

So I keep getting this warning from either Mahout or the server (I'm
guessing the former):

WARNING: You are not using ConnectionPoolDataSource. Make sure your
DataSource pools connections to the database itself, or database
performance will be severely reduced.

I'm not really sure why this is happening. I have the following
resource in my webapp's context.xml file. Is there anything else I
need to do to enable connection pooling with a JNDI resource?

<Resource name="jdbc/offline-local" auth="Container"
type="javax.sql.DataSource" username="root" password=""
driverClassName="com.mysql.jdbc.Driver"
url="jdbc:mysql://localhost:3306/offlinedevel?autoReconnect=true&amp;cachePreparedStatements=true&amp;cachePrepStmts=true&amp;cacheResultSetMetadata=true&amp;alwaysSendSetIsolation=false&amp;elideSetAutoCommits=true"
validationQuery="select 1" maxActive="16" maxIdle="4"
removeAbandoned="true" logAbandoned="true" />

Thanks in advance.

-Salil

Re: Connection Pooling

Posted by Vitali Mogilevsky <vi...@playedonline.com>.

Thanks, will test that

On Wed, Jul 13, 2011 at 12:11 PM, Sean Owen <sr...@gmail.com> wrote:

> That's all correct, it reads a lot. But you can avoid a lot of it by using
> caching wrappers.
> You also don't need to dump to a file. Use ReloadFromJDBCDataModel.

Re: Connection Pooling

Posted by Marko Ciric <ci...@gmail.com>.

Actually, the GenericDataModel class works very well as a superclass of your
desired data model. This way everything is cached in memory, which boosts
performance a lot. Reloading is easy to implement with the
refresh mechanism (Taste objects implement the Refreshable interface). You can
also try RefreshHelper.
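A minimal plain-Java sketch of the "cache everything in memory, rebuild on refresh" idea. ReloadingSnapshot is a hypothetical class, not Mahout's GenericDataModel or Refreshable API. The loader stands in for whatever reads all preferences from the database, and readers keep using the old snapshot until the new one is fully built.

```java
import java.util.function.Supplier;

// Hypothetical illustration only; not Mahout's GenericDataModel/Refreshable API.
final class ReloadingSnapshot<T> {
  private final Supplier<T> loader; // e.g. reads every preference row (assumption)
  private volatile T current;       // volatile: readers always see a fully built snapshot

  ReloadingSnapshot(Supplier<T> loader) {
    this.loader = loader;
    this.current = loader.get();
  }

  T get() {
    return current;
  }

  // Build the replacement first, then publish it with a single volatile write.
  synchronized void refresh() {
    T fresh = loader.get();
    current = fresh;
  }
}
```

Something in the application still has to call refresh(). The snapshot does not reload by itself.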

On 13 July 2011 20:58, Sean Owen <sr...@gmail.com> wrote:

> I was mixing this up with another class. It doesn't reload itself. You can
> call refresh() to do so.



-- 
--
Marko Ćirić
ciric.marko@gmail.com

Re: Connection Pooling

Posted by Sean Owen <sr...@gmail.com>.

I was mixing this up with another class. It doesn't reload itself. You can
call refresh() to do so.
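Because the model only reloads when refresh() is called, one option is to call it on a timer. The sketch below uses java.util.concurrent. The Runnable is a placeholder for the real call. With the Taste API that would presumably be something like recommender.refresh(null), but check the Refreshable javadoc for your Mahout version. PeriodicRefresher itself is a hypothetical helper.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a refresh action at a fixed delay on a daemon thread.
final class PeriodicRefresher implements AutoCloseable {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "model-refresher");
        t.setDaemon(true); // don't keep the JVM alive just for refreshing
        return t;
      });

  // refreshAction might wrap e.g. recommender.refresh(null) (assumption, see above).
  PeriodicRefresher(Runnable refreshAction, long period, TimeUnit unit) {
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        refreshAction.run();
      } catch (RuntimeException e) {
        // An exception escaping the task would cancel all later runs, so log it instead.
        e.printStackTrace();
      }
    }, period, period, unit);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

scheduleWithFixedDelay waits for one reload to finish before timing the next one. This prevents overlapping reloads if loading ever takes longer than the period.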

On Wed, Jul 13, 2011 at 7:34 PM, Salil Apte <sa...@offlinelabs.com> wrote:

> Where can the interval be configured? BTW, ReloadFromJDBCDataModel
> works like a dream so far :)

Re: Connection Pooling

Posted by Salil Apte <sa...@offlinelabs.com>.

Where can the interval be configured? BTW, ReloadFromJDBCDataModel
works like a dream so far :)

On Wed, Jul 13, 2011 at 10:58 AM, Sean Owen <sr...@gmail.com> wrote:
> Yes it reloads after a configurable interval, or on demand.
>
> Clearing the cache for a user ID only means that user's data is recomputed.
> It's not bad to call this frequently per se... I suppose you want to let it
> cache as much and for as long as is valid and acceptable to your app.
>
> Your bottleneck is no longer reading from the DB if you're having it load
> into memory.

Re: Connection Pooling

Posted by Sean Owen <sr...@gmail.com>.

Yes it reloads after a configurable interval, or on demand.

Clearing the cache for a user ID only means that user's data is recomputed.
It's not bad to call this frequently per se... I suppose you want to let it
cache as much and for as long as is valid and acceptable to your app.

Your bottleneck is no longer reading from the DB if you're having it load
into memory.
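To illustrate what clearing one user's cached entry means, here is a small per-user cache sketch. PerUserCache is hypothetical and is not Mahout's CachingRecommender. The compute function stands in for the expensive recommendation call. clear(userID) drops a single entry, so only that user is recomputed on the next request.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongFunction;

// Hypothetical per-user result cache with targeted invalidation.
final class PerUserCache<V> {
  private final LongFunction<V> compute; // the expensive per-user computation
  private final Map<Long, V> cache = new ConcurrentHashMap<>();

  PerUserCache(LongFunction<V> compute) {
    this.compute = compute;
  }

  // Computes on first request, then serves the cached value.
  V get(long userID) {
    return cache.computeIfAbsent(userID, id -> compute.apply(id));
  }

  // Drops only this user's entry; every other user stays cached.
  void clear(long userID) {
    cache.remove(userID);
  }

  void clearAll() {
    cache.clear();
  }
}
```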

On Wed, Jul 13, 2011 at 6:19 PM, Salil Apte <sa...@offlinelabs.com> wrote:

> Awesome, I will give ReloadFromJDBCDataModel a try. How does this
> particular data model update itself on database changes? Does it just
> happen periodically and if so, can this rate be changed easily?
>
> Lastly, will calling clear(userId) on a recommender frequently be bad
> for performance? I'm assuming with such small data amounts that the
> actual recommendation algorithm is quite speedy and that the DB is
> really the big bottleneck?

Re: Connection Pooling

Posted by aa...@motekmobile.com.

-----Original Message-----
From: Salil Apte <sa...@offlinelabs.com>
Date: Wed, 13 Jul 2011 10:19:47 
To: <us...@mahout.apache.org>
Reply-To: user@mahout.apache.org
Subject: Re: Connection Pooling

Re: Connection Pooling

Posted by Salil Apte <sa...@offlinelabs.com>.

Awesome, I will give ReloadFromJDBCDataModel a try. How does this
particular data model update itself on database changes? Does it just
happen periodically and if so, can this rate be changed easily?

Lastly, will calling clear(userId) on a recommender frequently be bad
for performance? I'm assuming with such small data amounts that the
actual recommendation algorithm is quite speedy and that the DB is
really the big bottleneck?

On Wed, Jul 13, 2011 at 2:11 AM, Sean Owen <sr...@gmail.com> wrote:
> That's all correct, it reads a lot. But you can avoid a lot of it by using
> caching wrappers.
> You also don't need to dump to a file. Use ReloadFromJDBCDataModel.

Re: Connection Pooling

Posted by Sean Owen <sr...@gmail.com>.

That's all correct, it reads a lot. But you can avoid a lot of it by using
caching wrappers.
You also don't need to dump to a file. Use ReloadFromJDBCDataModel.
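A caching wrapper avoids recomputing the same similarity, and re-querying the database for it, over and over. The sketch below memoizes a symmetric pairwise similarity. PairSimilarity is a hypothetical stand-in interface, not Mahout's UserSimilarity. Mahout's own wrapper for this purpose is CachingUserSimilarity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a user-user similarity function.
interface PairSimilarity {
  double similarity(long userA, long userB);
}

// Memoizing wrapper: each unordered pair is computed at most once.
final class CachingPairSimilarity implements PairSimilarity {
  private final PairSimilarity delegate;
  private final Map<String, Double> cache = new ConcurrentHashMap<>();

  CachingPairSimilarity(PairSimilarity delegate) {
    this.delegate = delegate;
  }

  @Override
  public double similarity(long userA, long userB) {
    // Similarity is assumed symmetric, so (a, b) and (b, a) share one cache entry.
    long lo = Math.min(userA, userB);
    long hi = Math.max(userA, userB);
    return cache.computeIfAbsent(lo + ":" + hi, k -> delegate.similarity(lo, hi));
  }
}
```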

On Wed, Jul 13, 2011 at 9:57 AM, Vitali Mogilevsky
<vi...@playedonline.com>wrote:

> Hey,
> I've got the same slowness problem while using the MySQL data model. After a
> little research and a look at MySQL's query log, it turned out that user-user
> recommendation just floods the database with thousands and thousands of
> requests, and that's on a small database.
> For now I'm dumping the database into a file and using FileDataModel, which
> works much faster.
>
>

Re: Connection Pooling

Posted by Vitali Mogilevsky <vi...@playedonline.com>.

Hey,
I've got the same slowness problem while using the MySQL data model. After a
little research and a look at MySQL's query log, it turned out that user-user
recommendation just floods the database with thousands and thousands of
requests, and that's on a small database.
For now I'm dumping the database into a file and using FileDataModel, which
works much faster.
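The dump-to-file workaround only needs one line per preference, in the userID,itemID,value form that FileDataModel reads. The sketch below covers the writing side. Row and PreferenceDump are hypothetical, and the rows would come from a single SELECT over the preference table. As Sean notes in the reply, ReloadFromJDBCDataModel makes the dump unnecessary.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical holder for one preference read from the database.
final class Row {
  final long userID;
  final long itemID;
  final float value;

  Row(long userID, long itemID, float value) {
    this.userID = userID;
    this.itemID = itemID;
    this.value = value;
  }
}

final class PreferenceDump {
  // Writes one "userID,itemID,value" line per preference.
  static void write(Iterable<Row> rows, Writer out) throws IOException {
    for (Row r : rows) {
      out.write(r.userID + "," + r.itemID + "," + r.value + "\n");
    }
    out.flush();
  }
}
```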

On Wed, Jul 13, 2011 at 10:03 AM, Sean Owen <sr...@gmail.com> wrote:

> That's too small to be that slow. There are a bunch of reasons this could be
> slower than it should be.
>
> The DataSource implementation class may not matter. What's important is
> whether it is actually a pooling DataSource from the container. You may want
> to check whether it seems to be reusing connections.
> Indexes on user ID, on item ID, and on both together (as the primary key) are
> important.
> You may want a larger pool if you are sending concurrent requests.
> Use a caching wrapper around the UserSimilarity.
>
> But yes, loading into memory is going to be much better for you. Use
> ReloadFromJDBCDataModel.

Re: Connection Pooling

Posted by Sean Owen <sr...@gmail.com>.

That's too small to be that slow. There are a bunch of reasons this could be
slower than it should be.

The DataSource implementation class may not matter. What's important is
whether it is actually a pooling DataSource from the container. You may want
to check whether it seems to be reusing connections.
Indexes on user ID, on item ID, and on both together (as the primary key) are
important.
You may want a larger pool if you are sending concurrent requests.
Use a caching wrapper around the UserSimilarity.

But yes, loading into memory is going to be much better for you. Use
ReloadFromJDBCDataModel.
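When checking whether connections are being reused, the number that matters is how many physical connections get opened, not how many times getConnection() is called. The toy pool below counts physical opens to make that distinction visible. It is an illustration only, not a replacement for the container's pool. Against a real server, a rough equivalent is to watch MySQL's Connections status counter (SHOW GLOBAL STATUS LIKE 'Connections') while issuing recommendation requests.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Toy pool (hypothetical) that counts how many physical connections it opens.
final class CountingPool<C> {
  private final Supplier<C> opener; // opens a new physical connection
  private final Deque<C> idle = new ArrayDeque<>();
  private int physicalOpens = 0;

  CountingPool(Supplier<C> opener) {
    this.opener = opener;
  }

  // Reuses an idle connection if there is one, otherwise opens a new one.
  synchronized C borrow() {
    C c = idle.pollFirst();
    if (c == null) {
      c = opener.get();
      physicalOpens++;
    }
    return c;
  }

  synchronized void giveBack(C c) {
    idle.addFirst(c);
  }

  synchronized int physicalOpens() {
    return physicalOpens;
  }
}
```

With pooling working, a long run of sequential borrow/return cycles opens a single connection. Only concurrent borrowers force additional opens, which is why the pool size matters for concurrent requests.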

On Wed, Jul 13, 2011 at 6:09 AM, Salil Apte <sa...@offlinelabs.com> wrote:

> Oh yeah, at runtime, I'm getting back a BasicDataSource object for my
> DataSource. Is that correct?

Re: Connection Pooling

Posted by Sebastian Schelter <ss...@apache.org>.

On 13.07.2011 08:12, Salil Apte wrote:

> Is an item-based approach preferable when considering speed?

Item-based approaches are said to scale better, as in most use cases there
are fewer items than users, and item-item similarities can be precomputed.

But before you change your setup, we should do a little more
investigation. Are you using ReloadFromJDBCDataModel? If you aren't,
give it a try. It will hold your preference data in RAM, which
shouldn't be a problem with 61k preferences, and should considerably
speed up your calculations.
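A sketch of what "item-item similarities can be precomputed" means: compute every item pair once up front, then answer lookups from memory. Cosine similarity is used here only as an example measure. ItemSimilarityTable is hypothetical and is not Mahout's item-based API. The pairwise loop is quadratic in the number of items, which is one reason this pays off when there are fewer items than users.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical precomputed item-item similarity table (cosine, as an example).
final class ItemSimilarityTable {
  private final Map<Long, Map<Long, Double>> sims = new HashMap<>();

  // prefsByUser: userID -> (itemID -> preference value)
  ItemSimilarityTable(Map<Long, Map<Long, Float>> prefsByUser) {
    // Invert to itemID -> (userID -> value) so each item is a vector over users.
    Map<Long, Map<Long, Float>> byItem = new HashMap<>();
    for (Map.Entry<Long, Map<Long, Float>> u : prefsByUser.entrySet()) {
      for (Map.Entry<Long, Float> p : u.getValue().entrySet()) {
        byItem.computeIfAbsent(p.getKey(), k -> new HashMap<>()).put(u.getKey(), p.getValue());
      }
    }
    // Compute each unordered pair once and store it in both directions.
    List<Long> items = new ArrayList<>(byItem.keySet());
    for (int i = 0; i < items.size(); i++) {
      for (int j = i + 1; j < items.size(); j++) {
        double s = cosine(byItem.get(items.get(i)), byItem.get(items.get(j)));
        put(items.get(i), items.get(j), s);
        put(items.get(j), items.get(i), s);
      }
    }
  }

  private void put(long a, long b, double s) {
    sims.computeIfAbsent(a, k -> new HashMap<>()).put(b, s);
  }

  private static double cosine(Map<Long, Float> x, Map<Long, Float> y) {
    double dot = 0, nx = 0, ny = 0;
    for (Map.Entry<Long, Float> e : x.entrySet()) {
      float v = e.getValue();
      nx += v * v;
      Float w = y.get(e.getKey());
      if (w != null) {
        dot += v * w;
      }
    }
    for (float w : y.values()) {
      ny += w * w;
    }
    return (nx == 0 || ny == 0) ? 0.0 : dot / Math.sqrt(nx * ny);
  }

  // A map read at recommendation time; NaN means the pair is unknown.
  double similarity(long itemA, long itemB) {
    return sims.getOrDefault(itemA, Collections.emptyMap()).getOrDefault(itemB, Double.NaN);
  }
}
```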

--sebastian

Re: Connection Pooling

Posted by Salil Apte <sa...@offlinelabs.com>.

I'm not sure I'm out of memory per se. It just feels like I'm
incurring a huge cost going out to the DB row by row when the system
could be doing a batch SELECT from the DB and calculating/caching
locally. But really, I'm not sure.

Is a UserSimilarity approach expected to be this slow with the amount
of data I have? Is an item-based approach preferable when
considering speed?
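The "batch SELECT, then compute locally" idea amounts to a single pass over all rows into in-memory indexes by user and by item. After that pass, no per-preference queries are needed. The sketch below is hypothetical, and the query in the comment uses assumed column names. ReloadFromJDBCDataModel, suggested elsewhere in the thread, is the Mahout-provided way to hold the preference data in memory.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory preference store filled from one pass over all rows.
final class InMemoryPreferences {
  private final Map<Long, Map<Long, Float>> byUser = new HashMap<>();
  private final Map<Long, Map<Long, Float>> byItem = new HashMap<>();

  // Called once per row of e.g. "SELECT user_id, item_id, preference FROM ..." (assumed columns).
  void add(long userID, long itemID, float value) {
    byUser.computeIfAbsent(userID, k -> new HashMap<>()).put(itemID, value);
    byItem.computeIfAbsent(itemID, k -> new HashMap<>()).put(userID, value);
  }

  // Read-only views, so callers cannot modify the index.
  Map<Long, Float> preferencesFromUser(long userID) {
    return Collections.unmodifiableMap(byUser.getOrDefault(userID, Map.of()));
  }

  Map<Long, Float> preferencesForItem(long itemID) {
    return Collections.unmodifiableMap(byItem.getOrDefault(itemID, Map.of()));
  }
}
```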

On Tue, Jul 12, 2011 at 11:00 PM, Lance Norskog <go...@gmail.com> wrote:
> MySQL has a quirk about reading in batches. See this entry in the Solr
> wiki:
>
> http://wiki.apache.org/solr/DataImportHandlerFaq?highlight=%28mysql%29#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F
>
> I don't know how to set special properties in the JDBC data source.
>
> --
> Lance Norskog
> goksron@gmail.com
>

Re: Connection Pooling

Posted by Lance Norskog <go...@gmail.com>.

MySQL has a quirk about reading in batches. See this entry in the Solr
wiki:

http://wiki.apache.org/solr/DataImportHandlerFaq?highlight=%28mysql%29#I.27m_using_DataImportHandler_with_a_MySQL_database._My_table_is_huge_and_DataImportHandler_is_going_out_of_memory._Why_does_DataImportHandler_bring_everything_to_memory.3F

I don't know how to set special properties in the JDBC data source.
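The quirk appears to be the one described in the MySQL Connector/J documentation. By default, the driver reads the entire result set into client memory. It streams rows only when the statement is forward-only and read-only and has its fetch size set to Integer.MIN_VALUE. The sketch below creates such a statement with plain JDBC. MySqlStreaming is a hypothetical helper, and the thread does not say whether or how Mahout's JDBC data models configure this.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper for Connector/J's row-by-row streaming mode.
final class MySqlStreaming {
  // Forward-only + read-only + fetch size Integer.MIN_VALUE asks Connector/J to
  // stream rows instead of buffering the whole result set in client memory.
  static Statement streamingStatement(Connection conn) throws SQLException {
    Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE);
    return stmt;
  }
}
```

While a streaming result set is open, the driver does not allow other statements on the same connection until all rows have been read or the result set is closed. This mode therefore suits a one-shot bulk load better than interleaved queries.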

On Tue, Jul 12, 2011 at 10:09 PM, Salil Apte <sa...@offlinelabs.com> wrote:
> Oh yeah, at runtime, I'm getting back a BasicDataSource object for my
> DataSource. Is that correct?



-- 
Lance Norskog
goksron@gmail.com

Re: Connection Pooling

Posted by Salil Apte <sa...@offlinelabs.com>.

Oh yeah, at runtime, I'm getting back a BasicDataSource object for my
DataSource. Is that correct?

On Tue, Jul 12, 2011 at 9:59 PM, Salil Apte <sa...@offlinelabs.com> wrote:
> So I started actually looking at performance today and it is pretty
> horrendous. I've got about 61,000 rows in my database which I'm
> assuming isn't *that* many rows. But recommendations are taking more than 20
> seconds. Is there some way to ensure pooling is turned on? What else
> is a big driver of performance? My tables are set up with a
> multi-column unique index on <user_id, item_id> pairs, so that
> there cannot be two entries with the same <user_id, item_id>. I'm
> not sure where to go from here.
>
> Thanks for the help!

Re: Connection Pooling

Posted by Salil Apte <sa...@offlinelabs.com>.

So I started actually looking at performance today and it is pretty
horrendous. I've got about 61,000 rows in my database, which I'm
assuming isn't *that* many rows. But recommendations are taking more than 20
seconds. Is there some way to ensure pooling is turned on? What else
is a big driver of performance? My tables are set up with a
multi-column unique index on <user_id, item_id> pairs, so that
there cannot be two entries with the same <user_id, item_id>. I'm
not sure where to go from here.

Thanks for the help!

On Tue, Jul 12, 2011 at 12:47 AM, Sean Owen <sr...@gmail.com> wrote:
> You can ignore it. It just doesn't know for sure you have a pool.
> I believe I have even removed this in a recent refactoring.

Re: Connection Pooling

Posted by Sean Owen <sr...@gmail.com>.

You can ignore it. It just doesn't know for sure you have a pool.
I believe I have even removed this in a recent refactoring.
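Sean's explanation suggests that the warning comes from a type check rather than from observing any actual pooling. The sketch below simplifies that kind of check. All names are hypothetical, and this is not Mahout's source. A container-managed pool such as DBCP's BasicDataSource pools connections perfectly well but still triggers the warning, because it is not the specific wrapper type being checked for.

```java
// Hypothetical minimal stand-in for javax.sql.DataSource.
interface ConnectionSource {
  String connect();
}

// Plays the role of the one wrapper type the library recognizes as "pooled".
final class RecognizedPool implements ConnectionSource {
  private final ConnectionSource delegate;

  RecognizedPool(ConnectionSource delegate) {
    this.delegate = delegate;
  }

  @Override
  public String connect() {
    return delegate.connect();
  }
}

// Plays the role of a container pool such as DBCP's BasicDataSource:
// it does pool, but nothing in its type tells the check below.
final class ContainerPool implements ConnectionSource {
  @Override
  public String connect() {
    return "pooled connection";
  }
}

final class PoolingCheck {
  // Only the runtime type is inspected; real pooling behaviour is invisible here.
  static boolean wouldWarn(ConnectionSource source) {
    return !(source instanceof RecognizedPool);
  }
}
```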

On Tue, Jul 12, 2011 at 2:21 AM, Salil Apte <sa...@offlinelabs.com> wrote:

> So I keep getting this warning from either Mahout or the server (I'm
> guessing the former):
>
> WARNING: You are not using ConnectionPoolDataSource. Make sure your
> DataSource pools connections to the database itself, or database
> performance will be severely reduced.
>
> I'm not really sure why this is happening. I have the following
> resource in my webapp's context.xml file. Is there anything else I
> need to do to enable connection pooling with a JNDI resource?
>
> <Resource name="jdbc/offline-local" auth="Container"
> type="javax.sql.DataSource" username="root" password=""
> driverClassName="com.mysql.jdbc.Driver"
> url="jdbc:mysql://localhost:3306/offlinedevel?autoReconnect=true&amp;cachePreparedStatements=true&amp;cachePrepStmts=true&amp;cacheResultSetMetadata=true&amp;alwaysSendSetIsolation=false&amp;elideSetAutoCommits=true"
> validationQuery="select 1" maxActive="16" maxIdle="4"
> removeAbandoned="true" logAbandoned="true" />
>
> Thanks in advance.
>
> -Salil
>