Posted to dev@mahout.apache.org by Karl Wettin <ka...@gmail.com> on 2008/04/18 22:29:45 UTC

Thread safety

I suppose nothing has to be thread safe, right?



     karl

Re: Maven

Posted by Isabel Drost <ap...@isabel-drost.de>.

On Saturday 19 April 2008, Karl Wettin wrote:
> I.e. create a module called core and move all the code there.

+1 from me.

-- 
Strategy:	A long-range plan whose merit cannot be evaluated until sometime	
after those creating it have left the organization.
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>

Re: Maven

Posted by Sean Owen <sr...@gmail.com>.

FWIW I am feeling a +1 on this. It feels like a decent bit of
future-proofing of our directory structure.

On Fri, Apr 18, 2008 at 7:25 PM, Karl Wettin <ka...@gmail.com> wrote:
>  Nothing at all. I just want to:
>
>  svn co http://svn.../trunk trunk
>  cd trunk
>  mkdir core
>  svn mv * core
>  svn commit
>
>  I.e. create a module called core and move all the code there.
>
>  Future modules would then be placed in trunk root just as core, not as some
> sub directory of the core.
>
>  I just want to support Maveners like myself who use it to set up the
> development environment. It is very handy if, like me, you create a new
> copy of the trunk for each patch you work on.
>
>  Official build, test, etc. would still be Ant. Some root build.xml would
> have to be produced the day a second module is added.

Re: Maven

Posted by Karl Wettin <ka...@gmail.com>.

> I.e. create a module called core and move all the code there.
>

I'm glad you're all cool with this.

I'll soon execute:

> svn mkdir https://svn.apache.org/repos/asf/lucene/mahout/trunk/core -m "MAHOUT-17, Use Maven project file system"
> svn mv https://svn.apache.org/repos/asf/lucene/mahout/trunk/build.xml https://svn.apache.org/repos/asf/lucene/mahout/trunk/core -m "MAHOUT-17, Use Maven project file system"
> svn mv https://svn.apache.org/repos/asf/lucene/mahout/trunk/src https://svn.apache.org/repos/asf/lucene/mahout/trunk/core -m "MAHOUT-17, Use Maven project file system"
> svn mv https://svn.apache.org/repos/asf/lucene/mahout/trunk/lib https://svn.apache.org/repos/asf/lucene/mahout/trunk/core -m "MAHOUT-17, Use Maven project file system"




         karl

Re: Maven

Posted by Karl Wettin <ka...@gmail.com>.

Sean Owen skrev:
> (What can I do to prepare? I think my stuff is laid out very very
> standardly and I have a Maven pom.xml even, which I can bring forward.
> I always thought Maven was a small pain to maintain, but, that's
> because I wasn't using it myself and getting value.)

Nothing at all. I just want to:

svn co http://svn.../trunk trunk
cd trunk
mkdir core
svn mv * core
svn commit

I.e. create a module called core and move all the code there.

Future modules would then be placed in trunk root just as core, not as 
some sub directory of the core.

I just want to support Maveners like myself who use it to set up the
development environment. It is very handy if, like me, you create a
new copy of the trunk for each patch you work on.

Official build, test, etc. would still be Ant. Some root build.xml would
have to be produced the day a second module is added.


     karl

Re: Maven (was: Thread safety)

Posted by Sean Owen <sr...@gmail.com>.

By "Grant" I mean "Karl".

On Fri, Apr 18, 2008 at 6:45 PM, Sean Owen <sr...@gmail.com> wrote:
>  I have no preference really but give it +0.5 on the grounds that Grant
>  thinks it is a good idea.

Re: Maven (was: Thread safety)

Posted by Ozgur Yilmazel <oz...@gmail.com>.

+1 for the change.

On Sat, Apr 19, 2008 at 8:05 AM, Grant Ingersoll <gs...@apache.org> wrote:
>
>  On Apr 18, 2008, at 6:45 PM, Sean Owen wrote:
>
>
> > (What can I do to prepare? I think my stuff is laid out very very
> > standardly and I have a Maven pom.xml even, which I can bring forward.
> > I always thought Maven was a small pain to maintain, but, that's
> > because I wasn't using it myself and getting value.)
> >
> > I have no preference really but give it +0.5 on the grounds that Grant
> > thinks it is a good idea.
> >
>
>  +1 on the change, but just note it is never just up to me.  I don't have
> any more special powers than anyone else and my +/-1 weighs exactly the same
> as everyone else's.
>

Re: Maven (was: Thread safety)

Posted by Grant Ingersoll <gs...@apache.org>.

On Apr 18, 2008, at 6:45 PM, Sean Owen wrote:

> (What can I do to prepare? I think my stuff is laid out very very
> standardly and I have a Maven pom.xml even, which I can bring forward.
> I always thought Maven was a small pain to maintain, but, that's
> because I wasn't using it myself and getting value.)
>
> I have no preference really but give it +0.5 on the grounds that Grant
> thinks it is a good idea.

+1 on the change, but just note it is never just up to me.  I don't
have any more special powers than anyone else and my +/-1 weighs
exactly the same as everyone else's.

Re: Maven (was: Thread safety)

Posted by Sean Owen <sr...@gmail.com>.

(What can I do to prepare? I think my stuff is laid out very very
standardly and I have a Maven pom.xml even, which I can bring forward.
I always thought Maven was a small pain to maintain, but, that's
because I wasn't using it myself and getting value.)

I have no preference really but give it +0.5 on the grounds that Grant
thinks it is a good idea.

On Fri, Apr 18, 2008 at 6:39 PM, Karl Wettin <ka...@gmail.com> wrote:
> Grant Ingersoll skrev:
>
> > Of course, perhaps, if and when Mahout is a TLP, Taste will be a complete
> self-contained sub project.
> >
>
>  I'm taking this opportunity to lobby for moving the code base to trunk/core
> so we get a Maven project structure and not a Maven module structure.
>
>  My arguments are:
>
>  It is a very small change and nothing else has to change.
>
>  We already committed to a Maven file structure and it is silly to not do it
> all the way. It's going to be a bigger hassle to refactor it to a Maven
> project structure if we want to do it in the future, i.e. when we want to
> separate things into sub projects, modules, contrib area, etc.
>
>
>  I know I have the powers, but I'm not going to do this change when all the
> other votes have been 0- (if there was such a thing). ;)
>
>
>
>       karl
>
>

Maven (was: Thread safety)

Posted by Karl Wettin <ka...@gmail.com>.

Grant Ingersoll skrev:
> Of course, perhaps, if and when Mahout is a TLP, Taste will be a 
> complete self-contained sub project.

I'm taking this opportunity to lobby for moving the code base to
trunk/core so we get a Maven project structure and not a Maven module
structure.

My arguments are:

It is a very small change and nothing else has to change.

We already committed to a Maven file structure and it is silly to not do
it all the way. It's going to be a bigger hassle to refactor it to a
Maven project structure if we want to do it in the future, i.e. when we
want to separate things into sub projects, modules, contrib area, etc.


I know I have the powers, but I'm not going to do this change when all 
the other votes have been 0- (if there was such a thing). ;)



       karl

Re: Thread safety

Posted by Ted Dunning <te...@gmail.com>.

Given the speedups that can often be achieved by efficient disk access in a
large batch computation, this is more reasonable than it sounds on the face
of it.  The example of updating 1% of the records in a 1TB database
demonstrates how doing 100x more work (rewriting the entire database) can be
done about 100x faster (5-6 hours instead of 30 days).  If you can attain
comparable speedups and if > 0.01% of your audience comes back to view the
recommendations that you have pre-computed, then you win by building in
batch off-line.
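
To make the arithmetic above concrete, here is a rough back-of-envelope
sketch in Java. The record size, the cost of one random update and the
sequential throughput are all assumptions picked for illustration; none
of them come from the thread, and real hardware will differ.

```java
// Back-of-envelope model only. Every constant marked "assumption" is
// illustrative and was not stated in the original message.
final class BatchVersusRandomUpdate {
    static final double DB_BYTES = 1e12;                      // 1 TB database (from the message)
    static final double UPDATE_FRACTION = 0.01;               // 1% of records (from the message)
    static final double RECORD_BYTES = 100;                   // assumption
    static final double SECONDS_PER_RANDOM_UPDATE = 0.026;    // assumption: seek + read + write
    static final double SEQUENTIAL_BYTES_PER_SECOND = 100e6;  // assumption: 100 MB/s streaming

    /** Days needed to update 1% of the records in place, one random access each. */
    static double randomUpdateDays() {
        double updates = DB_BYTES / RECORD_BYTES * UPDATE_FRACTION;
        return updates * SECONDS_PER_RANDOM_UPDATE / 86_400;
    }

    /** Hours needed to stream the whole database in and write a new copy out. */
    static double fullRewriteHours() {
        double bytesMoved = 2 * DB_BYTES; // read everything once, write everything once
        return bytesMoved / SEQUENTIAL_BYTES_PER_SECOND / 3_600;
    }
}
```

Under these assumed numbers the in-place path takes about 30 days and
the full rewrite about 5.6 hours, which is the same order of magnitude
as the figures quoted above.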

Of course, if some clever spark figures out a way to do most of the work in
a batch and then just add water to the dehydrated recommendations in real
time you win even more because recommendations can change in real-time.  I
can't comment further than that.

On Sat, Apr 19, 2008 at 6:12 PM, Sean Owen <sr...@gmail.com> wrote:

> Honestly it is hard to do recommendations in real-time; most
> algorithms don't scale and don't parallelize easily. I've recommended
> to most people to just recompute recommendations offline periodically.
>
>
-- 
ted

Re: Thread safety

Posted by Sean Owen <sr...@gmail.com>.

Honestly it is hard to do recommendations in real-time; most
algorithms don't scale and don't parallelize easily. I've recommended
to most people to just recompute recommendations offline periodically.

Of course anything that's an online recommender can be used offline,
and I don't think the code is or must be designed with one or the
other in mind.

Yes I don't think one can make a real-time online system out of
Hadoop, that's not the idea. I think it can be used to crudely
parallelize offline computation of this sort... which is better than
no parallelization.

And then I think pieces of particular algorithms, like slope-one, can
be very much parallelized.

Bottom-line, I think the existing code can continue to provide on-line
recommendations which could be useful for small- to medium-sized data
sets, and can cleanly support computations in Hadoop. No redesign
should be needed. After the code is committed, let me provide some
examples of what I mean.

Sean

On Sat, Apr 19, 2008 at 6:35 PM, Ian Holsman <li...@holsman.net> wrote:
> Sean Owen wrote:
>
> > Yeah it should be easy and fine to separate the EJB, web service
> > clients further. Beyond that I think it's mostly driven by what we
> > want to achieve, and it sounds like that is Hadoop-ifying it
> > basically.
> >
> >
> >
>  So far, I was always thinking of Mahout as a backend process. It would
> produce a file (or two), and that would be sucked up into SOLR or mysql (or
> whatever) that the webapp would make use of. Obviously this is a PITA as you
> would introduce a delay in how long it took before an event gets fed back
> into the system.
>
>  Mainly because we (our developers) know how to scale solr and mysql very
> easily, and making a hadoop cluster into an OLTP thing is completely new to
> us, and I was thinking it was not really designed for 10-30ms response
> times.
>
>
>  Or am I misjudging HDFS? Could you run a webserver farm serving lots of
> static files on top of HDFS?
>  --Ian
>

Re: Thread safety

Posted by Ian Holsman <li...@holsman.net>.

Sean Owen wrote:
> Yeah it should be easy and fine to separate the EJB, web service
> clients further. Beyond that I think it's mostly driven by what we
> want to achieve, and it sounds like that is Hadoop-ifying it
> basically.
>
>   
So far, I was always thinking of Mahout as a backend process. It would
produce a file (or two), and that would be sucked up into SOLR or mysql
(or whatever) that the webapp would make use of. Obviously this is a
PITA as you would introduce a delay in how long it took before an event
gets fed back into the system.

Mainly because we (our developers) know how to scale solr and mysql very
easily, and making a hadoop cluster into an OLTP thing is completely new
to us, and I was thinking it was not really designed for 10-30ms
response times.


Or am I misjudging HDFS? Could you run a webserver farm serving lots of
static files on top of HDFS?
--Ian

Re: Thread safety

Posted by Sean Owen <sr...@gmail.com>.

Yeah it should be easy and fine to separate the EJB, web service
clients further. Beyond that I think it's mostly driven by what we
want to achieve, and it sounds like that is Hadoop-ifying it
basically.

So far I think of Hadoop as another client. We could provide a wrapper
that lets you run n instances of the same recommender, each computing
recommendations for a subset of users. This is a crude parallelization
but works for any algorithm or setup, and does provide a way to
(p)recompute all your users' recommendations in a nightly batch
process for example versus online.
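
The "n instances, each handling a subset of users" idea could be
sketched roughly as below. UserPartitioner is a hypothetical helper
written for this illustration; it is not part of Taste or Hadoop, and
the modulo split is just one simple way to get disjoint subsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not an existing Taste or Hadoop class.
final class UserPartitioner {
    /**
     * Splits user IDs into numWorkers disjoint subsets by ID modulo numWorkers.
     * Each subset could then be handed to its own recommender instance.
     */
    static List<List<Long>> partition(List<Long> userIds, int numWorkers) {
        if (numWorkers <= 0) {
            throw new IllegalArgumentException("numWorkers must be positive");
        }
        List<List<Long>> parts = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            parts.add(new ArrayList<>());
        }
        for (long id : userIds) {
            // floorMod keeps the index non-negative even for negative IDs
            int worker = (int) Math.floorMod(id, (long) numWorkers);
            parts.get(worker).add(id);
        }
        return parts;
    }
}
```

Every user lands in exactly one subset, so the workers never need to
coordinate; that is what makes this crude split work for any algorithm.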

(There is also one aspect of SlopeOneRecommender which can be readily
Hadoopified, and that is computing diffs. There the recommender
depends on Hadoop rather than the other way around. Another story.)
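
For readers unfamiliar with slope-one, the "diffs" are average rating
differences between pairs of items, taken over users who rated both.
The single-process sketch below is an illustration only; SlopeOneDiffs
and its data layout are assumptions, not the actual Taste code. Each
user contributes independent partial sums, which is why the step looks
easy to spread over map and reduce tasks.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch; class name and data layout are assumptions.
final class SlopeOneDiffs {
    /**
     * prefs maps userId -> (itemId -> rating).
     * Returns itemA -> (itemB -> average of rating(A) - rating(B)) for itemA < itemB.
     */
    static Map<Long, Map<Long, Double>> averageDiffs(Map<Long, Map<Long, Double>> prefs) {
        // Accumulator cell is {sum of differences, number of co-rating users}.
        Map<Long, Map<Long, double[]>> acc = new HashMap<>();
        for (Map<Long, Double> ratings : prefs.values()) {      // one user at a time
            for (Map.Entry<Long, Double> a : ratings.entrySet()) {
                for (Map.Entry<Long, Double> b : ratings.entrySet()) {
                    if (a.getKey() >= b.getKey()) {
                        continue; // visit each unordered pair once
                    }
                    double[] cell = acc
                        .computeIfAbsent(a.getKey(), k -> new HashMap<>())
                        .computeIfAbsent(b.getKey(), k -> new double[2]);
                    cell[0] += a.getValue() - b.getValue();
                    cell[1] += 1;
                }
            }
        }
        Map<Long, Map<Long, Double>> avg = new HashMap<>();
        acc.forEach((itemA, row) -> row.forEach((itemB, cell) ->
            avg.computeIfAbsent(itemA, k -> new HashMap<>()).put(itemB, cell[0] / cell[1])));
        return avg;
    }
}
```

In a distributed version the inner loops would emit (item pair,
difference) records per user and a later step would sum and average
them per pair.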

I suppose I would like to avoid dismantling the recommender's ability
to act as an active entity providing online recommendations, since
some people want and need that. Of course, anything that can be done
in real-time can support a batch process too, so so far no conflict
with Hadoop. If this begins to conflict with other goals I am sure we
can find ways to restructure the code to meet both needs.

For example: thread-safety comes up. My code is prepared for
multi-threaded access, but when accessed by a Hadoop job it's only
single-threaded. The thread-safeness is unnecessary but doesn't hurt
except to the extent it is overhead, but, nothing like a critical
block involves synchronization so I really don't imagine this
conflicts with any other goal so far.

Sean

On Fri, Apr 18, 2008 at 5:37 PM, Grant Ingersoll <gs...@apache.org> wrote:
> I think, for Taste, we need to start figuring out how we are going to
> distribute the workload, etc.  My gut says the webapp stuff and the core
> recommender stuff, etc. should be moved to a "contrib" area, but not sure
> just yet on that.  That is, not sure how the EJB, web, stuff fits into the
> core of Mahout.  Will know better once we have the code committed and we can
> get several eyes looking at it.
>
>  Of course, perhaps, if and when Mahout is a TLP, Taste will be a complete
> self-contained sub project.
>
>  -Grant
>
>
>
>  On Apr 18, 2008, at 5:07 PM, Sean Owen wrote:
>
>
> > Thread safe, I think, meaning I might have two users making a request
> > to a web app, which are handled in two threads, simultaneously
> > accessing a Recommender. Stuff like caches need to make sure they
> > don't blow up if two people hit it at once.
> >
> > On Fri, Apr 18, 2008 at 5:02 PM, Ted Dunning <td...@veoh.com> wrote:
> >
> > >
> > >
> > > Thread safe?  Or reentrant?
> > >
> >
>
>
>

Re: Thread safety

Posted by Grant Ingersoll <gs...@apache.org>.

I think, for Taste, we need to start figuring out how we are going to  
distribute the workload, etc.  My gut says the webapp stuff and the  
core recommender stuff, etc. should be moved to a "contrib" area, but  
not sure just yet on that.  That is, not sure how the EJB, web, stuff  
fits into the core of Mahout.  Will know better once we have the code  
committed and we can get several eyes looking at it.

Of course, perhaps, if and when Mahout is a TLP, Taste will be a  
complete self-contained sub project.

-Grant

On Apr 18, 2008, at 5:07 PM, Sean Owen wrote:

> Thread safe, I think, meaning I might have two users making a request
> to a web app, which are handled in two threads, simultaneously
> accessing a Recommender. Stuff like caches need to make sure they
> don't blow up if two people hit it at once.
>
> On Fri, Apr 18, 2008 at 5:02 PM, Ted Dunning <td...@veoh.com>  
> wrote:
>>
>>
>> Thread safe?  Or reentrant?

Re: Thread safety

Posted by Sean Owen <sr...@gmail.com>.

Thread safe, I think, meaning I might have two users making a request
to a web app, which are handled in two threads, simultaneously
accessing a Recommender. Stuff like caches need to make sure they
don't blow up if two people hit it at once.
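
As a minimal sketch of what "not blowing up" can look like for a cache
shared by request threads, assuming Java 8 or later:
RecommendationCache is a hypothetical class for illustration, not the
cache used in Taste.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical cache for illustration; not Taste's actual implementation.
final class RecommendationCache {
    private final ConcurrentMap<Long, List<Long>> byUser = new ConcurrentHashMap<>();
    private final Function<Long, List<Long>> compute;

    /** compute produces the recommended item IDs for a user ID on a cache miss. */
    RecommendationCache(Function<Long, List<Long>> compute) {
        this.compute = compute;
    }

    /** Safe to call from many request threads at once. */
    List<Long> get(long userId) {
        // ConcurrentHashMap.computeIfAbsent is atomic per key: when two threads
        // miss on the same user, the function runs once and both see its result.
        return byUser.computeIfAbsent(userId, compute);
    }
}
```

A plain HashMap in the same role could be corrupted by concurrent
writes, which is the kind of failure described above.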

On Fri, Apr 18, 2008 at 5:02 PM, Ted Dunning <td...@veoh.com> wrote:
>
>
>  Thread safe?  Or reentrant?

Re: Thread safety

Posted by Ted Dunning <td...@veoh.com>.


Thread safe?  Or reentrant?


On 4/18/08 2:01 PM, "Sean Owen" <sr...@gmail.com> wrote:

> My stuff needs to be thread safe in some instances since it's supposed
> to be serving, maybe, multiple online requests at once. Some parts
> just need some basic synchronization; much doesn't.
> 
> On Fri, Apr 18, 2008 at 4:29 PM, Karl Wettin <ka...@gmail.com> wrote:
>> I suppose nothing has to be thread safe, right?
>> 
>> 
>> 
>>     karl
>>

Re: Thread safety

Posted by Sean Owen <sr...@gmail.com>.

My stuff needs to be thread safe in some instances since it's supposed
to be serving, maybe, multiple online requests at once. Some parts
just need some basic synchronization; much doesn't.

On Fri, Apr 18, 2008 at 4:29 PM, Karl Wettin <ka...@gmail.com> wrote:
> I suppose nothing has to be thread safe, right?
>
>
>
>     karl
>

Re: Thread safety

Posted by Ted Dunning <td...@veoh.com>.

I don't think that anything more than re-entrancy is important.


On 4/18/08 1:29 PM, "Karl Wettin" <ka...@gmail.com> wrote:

> I suppose nothing has to be thread safe, right?
> 
> 
> 
>      karl