You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@mahout.apache.org by 施伟伟 <sh...@gmail.com> on 2009/12/04 11:14:14 UTC

deadlock in FastByIDMap.find() ?

Hi,

I found my application based on mahout using 99% CPU after load testing.
There is no actual requests at the moment but the CPU usage kept high.
I generated several thread dump and found one thread may be get stuck an
endless loop in FastByIDMap.find().
I am not sure what is the problem is. I am using customized data model
implemented by myself. It is more or less the same as FileDataModel but
reading data from a different data source.
Please find the detailed info below.

Thanks,
Weiwei

  private int find(long key) {
    int theHashCode = (int) key & 0x7FFFFFFF; // make sure it's positive
    long[] keys = this.keys;
    int hashSize = keys.length;
    int jump = 1 + theHashCode % (hashSize - 2);
    int index = theHashCode % hashSize;
    long currentKey = keys[index];
    while (currentKey != NULL && (currentKey == REMOVED || key !=
currentKey)) {
      if (index < jump) {
        index += hashSize - jump;
      } else {
        index -= jump;
      }
      currentKey = keys[index];
    }
    return index;
  }

Below is the thread dump.

This is the thread that got the lock in
org.apache.mahout.cf.taste.impl.common.Cache.get
"http-10.90.39.156-23869-Processor24" daemon prio=1 tid=0x085bb618
nid=0x4f5f runnable [0x825a8000..0x825a8f30]
    at org.apache.mahout.cf.taste.impl.common.FastMap.find(FastMap.java:113)
    at org.apache.mahout.cf.taste.impl.common.FastMap.get(FastMap.java:123)
    at org.apache.mahout.cf.taste.impl.common.Cache.get(Cache.java:74)
    - locked <0x96eadf98> (a org.apache.mahout.cf.taste.impl.common.FastMap)
    at
org.apache.mahout.cf.taste.impl.recommender.CachingRecommender.recommend(CachingRecommender.java:118)


This is the thread that is blocked in
org.apache.mahout.cf.taste.impl.common.Cache.get. There are hundreds of such
threads in my thread dump file.
"http-10.90.39.156-23869-Processor272" daemon prio=1 tid=0x08405d78
nid=0x5479 waiting for monitor entry [0x731ef000..0x731f0130]
    at org.apache.mahout.cf.taste.impl.common.Cache.get(Cache.java:73)
    - waiting to lock <0x96eadf98> (a
org.apache.mahout.cf.taste.impl.common.FastMap)
    at
org.apache.mahout.cf.taste.impl.recommender.CachingRecommender.recommend(CachingRecommender.java:115)

Re: deadlock in FastByIDMap.find() ?

Posted by Sean Owen <sr...@gmail.com>.
So, Weiwei, the issue was identified as relates to FastMap, not
FastByIDMap -- though the implementations are virtually the same.
Also, it does not look like it had to do with an instance you created,
but one used in a Cache. Therefore I am not sure that changing an
instance you use directly has any effect.

Did you have a chance to try the latest code? I added some checks for
bad states, temporarily, that would dump information if something goes
wrong.

No the iterator remove stuff looks OK to me:

  void iteratorRemove(int lastNext) {
    if (lastNext >= values.length) {
      throw new NoSuchElementException();
    }
    if (lastNext < 0) {
      throw new IllegalStateException();
    }
    values[lastNext] = null;
    ((Object[]) keys)[lastNext] = REMOVED;
    numEntries--;
  }


It's not a remove that would fill up the table, it's an add, that's
not accompanied by possible growth. I am still not able to find the
issue by inspection.

But again recall that the issue wasn't in an instance Weiwei used
directly, but one used indirectly in a Cache. Which is all the more
puzzling, since I've even added tests to try to reproduce any problem
in this area I can imagine, and nothing seems to be wrong yet.



On Thu, Dec 10, 2009 at 1:24 AM, Ted Dunning <te...@gmail.com> wrote:
> Sean, is this the necessary hint?
>
> Are there removal markers that don't get handled in subsequent insertions?
>
> 2009/12/9 施伟伟 <sh...@gmail.com>
>
>> BTW, I am using iterator.remove() to remove elements from FastByIDMap &
>> FastIDSet.
>> Will this be a problem?
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>

Re: deadlock in FastByIDMap.find() ?

Posted by Ted Dunning <te...@gmail.com>.
Sean, is this the necessary hint?

Are there removal markers that don't get handled in subsequent insertions?

2009/12/9 施伟伟 <sh...@gmail.com>

> BTW, I am using iterator.remove() to remove elements from FastByIDMap &
> FastIDSet.
> Will this be a problem?
>



-- 
Ted Dunning, CTO
DeepDyve

Re: deadlock in FastByIDMap.find() ?

Posted by 施伟伟 <sh...@gmail.com>.
BTW, I am using iterator.remove() to remove elements from FastByIDMap &
FastIDSet.
Will this be a problem?

I have moved the filtering logic into normal Map and Set. Will test if this
problem persists after this change.

Weiwei

2009/12/6 Sean Owen <sr...@gmail.com>

> If he's using the normal class APIs, then shouldn't matter what the
> heck is being done to the class. I can't spot a problem that would
> fill up the hash table.
>
> I don't think he's messing with the implementation or data structures
> directly?
>
> But yes I am still quite puzzled, I haven't seen this, can't reproduce
> it even with some tests that guess at pathological patterns, and can't
> see anything wrong with the logic. I'm stumped on this one.
>
> On Sat, Dec 5, 2009 at 7:54 PM, Ted Dunning <te...@gmail.com> wrote:
> > Is it possible that weiwei is filling the fastMap explicitly?
> >
> > On Sat, Dec 5, 2009 at 1:31 AM, Sean Owen <sr...@gmail.com> wrote:
> >
> >> That all sounds fine, should be no problem with this usage pattern. I
> >> don't think the issue is where you think it is -- The problem seemed
> >> to be with the FastMap used to cache recommendations. Is there
> >> anything interesting about how this is used? I just added a couple
> >> unit tests for the cache to try to reproduce it but didn't see
> >> anything.
> >>
> >> You can remove CachingRecommender to see what happens but that would
> >> be a shame since it's just avoiding the problem. If there's an issue
> >> I'm keen to track it down.
> >>
> >> I can't figure out what the issue is. Maybe I can add some asserts to
> >> the code that could trigger some signs of a problem.
> >>
> >> 2009/12/5 施伟伟 <sh...@gmail.com>:
> >> > Thanks for reply.
> >> > It does not happen all the time. I saw this twice and could not easily
> >> > reproduce it.
> >> > Next time I will try testing it with jpda enabled.
> >> >
> >> > I am not sure if I am doing something wrong in the data model.
> >> > First I fill the FastMap with data read from external data source.
> >> > Then I iterate all entries in FastMap and find a set of items that I
> >> don't
> >> > need.
> >> > Finally I iterate the entries again to remove those items from
> FastMap.
> >> If
> >> > the item set of any userID becomes empty, the userID will be removed
> from
> >> > FastMap too.
> >> >
> >> > Maybe I should do these outside mahout using ordinary Map?
> >> >
> >> > Thanks,
> >> > Weiwei
> >> >
> >>
> >
> >
> >
> > --
> > Ted Dunning, CTO
> > DeepDyve
> >
>

Re: deadlock in FastByIDMap.find() ?

Posted by Sean Owen <sr...@gmail.com>.
If he's using the normal class APIs, then shouldn't matter what the
heck is being done to the class. I can't spot a problem that would
fill up the hash table.

I don't think he's messing with the implementation or data structures directly?

But yes I am still quite puzzled, I haven't seen this, can't reproduce
it even with some tests that guess at pathological patterns, and can't
see anything wrong with the logic. I'm stumped on this one.

On Sat, Dec 5, 2009 at 7:54 PM, Ted Dunning <te...@gmail.com> wrote:
> Is it possible that weiwei is filling the fastMap explicitly?
>
> On Sat, Dec 5, 2009 at 1:31 AM, Sean Owen <sr...@gmail.com> wrote:
>
>> That all sounds fine, should be no problem with this usage pattern. I
>> don't think the issue is where you think it is -- The problem seemed
>> to be with the FastMap used to cache recommendations. Is there
>> anything interesting about how this is used? I just added a couple
>> unit tests for the cache to try to reproduce it but didn't see
>> anything.
>>
>> You can remove CachingRecommender to see what happens but that would
>> be a shame since it's just avoiding the problem. If there's an issue
>> I'm keen to track it down.
>>
>> I can't figure out what the issue is. Maybe I can add some asserts to
>> the code that could trigger some signs of a problem.
>>
>> 2009/12/5 施伟伟 <sh...@gmail.com>:
>> > Thanks for reply.
>> > It does not happen all the time. I saw this twice and could not easily
>> > reproduce it.
>> > Next time I will try testing it with jpda enabled.
>> >
>> > I am not sure if I am doing something wrong in the data model.
>> > First I fill the FastMap with data read from external data source.
>> > Then I iterate all entries in FastMap and find a set of items that I
>> don't
>> > need.
>> > Finally I iterate the entries again to remove those items from FastMap.
>> If
>> > the item set of any userID becomes empty, the userID will be removed from
>> > FastMap too.
>> >
>> > Maybe I should do these outside mahout using ordinary Map?
>> >
>> > Thanks,
>> > Weiwei
>> >
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>

Re: deadlock in FastByIDMap.find() ?

Posted by Ted Dunning <te...@gmail.com>.
Is it possible that weiwei is filling the fastMap explicitly?

On Sat, Dec 5, 2009 at 1:31 AM, Sean Owen <sr...@gmail.com> wrote:

> That all sounds fine, should be no problem with this usage pattern. I
> don't think the issue is where you think it is -- The problem seemed
> to be with the FastMap used to cache recommendations. Is there
> anything interesting about how this is used? I just added a couple
> unit tests for the cache to try to reproduce it but didn't see
> anything.
>
> You can remove CachingRecommender to see what happens but that would
> be a shame since it's just avoiding the problem. If there's an issue
> I'm keen to track it down.
>
> I can't figure out what the issue is. Maybe I can add some asserts to
> the code that could trigger some signs of a problem.
>
> 2009/12/5 施伟伟 <sh...@gmail.com>:
> > Thanks for reply.
> > It does not happen all the time. I saw this twice and could not easily
> > reproduce it.
> > Next time I will try testing it with jpda enabled.
> >
> > I am not sure if I am doing something wrong in the data model.
> > First I fill the FastMap with data read from external data source.
> > Then I iterate all entries in FastMap and find a set of items that I
> don't
> > need.
> > Finally I iterate the entries again to remove those items from FastMap.
> If
> > the item set of any userID becomes empty, the userID will be removed from
> > FastMap too.
> >
> > Maybe I should do these outside mahout using ordinary Map?
> >
> > Thanks,
> > Weiwei
> >
>



-- 
Ted Dunning, CTO
DeepDyve

Re: deadlock in FastByIDMap.find() ?

Posted by Sean Owen <sr...@gmail.com>.
That all sounds fine, should be no problem with this usage pattern. I
don't think the issue is where you think it is -- The problem seemed
to be with the FastMap used to cache recommendations. Is there
anything interesting about how this is used? I just added a couple
unit tests for the cache to try to reproduce it but didn't see
anything.

You can remove CachingRecommender to see what happens but that would
be a shame since it's just avoiding the problem. If there's an issue
I'm keen to track it down.

I can't figure out what the issue is. Maybe I can add some asserts to
the code that could trigger some signs of a problem.

2009/12/5 施伟伟 <sh...@gmail.com>:
> Thanks for reply.
> It does not happen all the time. I saw this twice and could not easily
> reproduce it.
> Next time I will try testing it with jpda enabled.
>
> I am not sure if I am doing something wrong in the data model.
> First I fill the FastMap with data read from external data source.
> Then I iterate all entries in FastMap and find a set of items that I don't
> need.
> Finally I iterate the entries again to remove those items from FastMap. If
> the item set of any userID becomes empty, the userID will be removed from
> FastMap too.
>
> Maybe I should do these outside mahout using ordinary Map?
>
> Thanks,
> Weiwei
>

Re: deadlock in FastByIDMap.find() ?

Posted by 施伟伟 <sh...@gmail.com>.
Thanks for reply.
It does not happen all the time. I saw this twice and could not easily
reproduce it.
Next time I will try testing it with jpda enabled.

I am not sure if I am doing something wrong in the data model.
First I fill the FastMap with data read from external data source.
Then I iterate all entries in FastMap and find a set of items that I don't
need.
Finally I iterate the entries again to remove those items from FastMap. If
the item set of any userID becomes empty, the userID will be removed from
FastMap too.

Maybe I should do these outside mahout using ordinary Map?

Thanks,
Weiwei

2009/12/4 Sean Owen <sr...@gmail.com>

> It is not deadlock, since that would result in the app doing nothing,
> not consuming CPU. From the threads you dumped, I only see one waiting
> on a lock; deadlock would involve two threads waiting on locks.
>
> The "infinite loop" sounds more likely, as it would explain what you
> see. It would have to be the while loop. I am having trouble seeing
> how it could fill up with all non-null entries, since it grows when it
> gets pretty full. The "jump" should also touch all entries since the
> size of the hashtable is prime, no matter what the jump is.
>
> Any chance you can debug and look at the contents of the FastMap at
> that point? that would be a clue about the nature of the problem.
>
> On Fri, Dec 4, 2009 at 10:14 AM, 施伟伟 <sh...@gmail.com> wrote:
> > Hi,
> >
> > I found my application based on mahout using 99% CPU after load testing.
> > There is no actual requests at the moment but the CPU usage kept high.
> > I generated several thread dump and found one thread may be get stuck an
> > endless loop in FastByIDMap.find().
> > I am not sure what is the problem is. I am using customized data model
> > implemented by myself. It is more or less the same as FileDataModel but
> > reading data from a different data source.
> > Please find the detailed info below.
> >
> > Thanks,
> > Weiwei
> >
> >  private int find(long key) {
> >    int theHashCode = (int) key & 0x7FFFFFFF; // make sure it's positive
> >    long[] keys = this.keys;
> >    int hashSize = keys.length;
> >    int jump = 1 + theHashCode % (hashSize - 2);
> >    int index = theHashCode % hashSize;
> >    long currentKey = keys[index];
> >    while (currentKey != NULL && (currentKey == REMOVED || key !=
> > currentKey)) {
> >      if (index < jump) {
> >        index += hashSize - jump;
> >      } else {
> >        index -= jump;
> >      }
> >      currentKey = keys[index];
> >    }
> >    return index;
> >  }
> >
> > Below is the thread dump.
> >
> > This is the thread that got the lock in
> > org.apache.mahout.cf.taste.impl.common.Cache.get
> > "http-10.90.39.156-23869-Processor24" daemon prio=1 tid=0x085bb618
> > nid=0x4f5f runnable [0x825a8000..0x825a8f30]
> >    at
> org.apache.mahout.cf.taste.impl.common.FastMap.find(FastMap.java:113)
> >    at
> org.apache.mahout.cf.taste.impl.common.FastMap.get(FastMap.java:123)
> >    at org.apache.mahout.cf.taste.impl.common.Cache.get(Cache.java:74)
> >    - locked <0x96eadf98> (a
> org.apache.mahout.cf.taste.impl.common.FastMap)
> >    at
> >
> org.apache.mahout.cf.taste.impl.recommender.CachingRecommender.recommend(CachingRecommender.java:118)
> >
> >
> > This is the thread that is blocked in
> > org.apache.mahout.cf.taste.impl.common.Cache.get. There are hundreds of
> such
> > threads in my thread dump file.
> > "http-10.90.39.156-23869-Processor272" daemon prio=1 tid=0x08405d78
> > nid=0x5479 waiting for monitor entry [0x731ef000..0x731f0130]
> >    at org.apache.mahout.cf.taste.impl.common.Cache.get(Cache.java:73)
> >    - waiting to lock <0x96eadf98> (a
> > org.apache.mahout.cf.taste.impl.common.FastMap)
> >    at
> >
> org.apache.mahout.cf.taste.impl.recommender.CachingRecommender.recommend(CachingRecommender.java:115)
> >
>

Re: deadlock in FastByIDMap.find() ?

Posted by Sean Owen <sr...@gmail.com>.
It is not deadlock, since that would result in the app doing nothing,
not consuming CPU. From the threads you dumped, I only see one waiting
on a lock; deadlock would involve two threads waiting on locks.

The "infinite loop" sounds more likely, as it would explain what you
see. It would have to be the while loop. I am having trouble seeing
how it could fill up with all non-null entries, since it grows when it
gets pretty full. The "jump" should also touch all entries since the
size of the hashtable is prime, no matter what the jump is.

Any chance you can debug and look at the contents of the FastMap at
that point? that would be a clue about the nature of the problem.

On Fri, Dec 4, 2009 at 10:14 AM, 施伟伟 <sh...@gmail.com> wrote:
> Hi,
>
> I found my application based on mahout using 99% CPU after load testing.
> There is no actual requests at the moment but the CPU usage kept high.
> I generated several thread dump and found one thread may be get stuck an
> endless loop in FastByIDMap.find().
> I am not sure what is the problem is. I am using customized data model
> implemented by myself. It is more or less the same as FileDataModel but
> reading data from a different data source.
> Please find the detailed info below.
>
> Thanks,
> Weiwei
>
>  private int find(long key) {
>    int theHashCode = (int) key & 0x7FFFFFFF; // make sure it's positive
>    long[] keys = this.keys;
>    int hashSize = keys.length;
>    int jump = 1 + theHashCode % (hashSize - 2);
>    int index = theHashCode % hashSize;
>    long currentKey = keys[index];
>    while (currentKey != NULL && (currentKey == REMOVED || key !=
> currentKey)) {
>      if (index < jump) {
>        index += hashSize - jump;
>      } else {
>        index -= jump;
>      }
>      currentKey = keys[index];
>    }
>    return index;
>  }
>
> Below is the thread dump.
>
> This is the thread that got the lock in
> org.apache.mahout.cf.taste.impl.common.Cache.get
> "http-10.90.39.156-23869-Processor24" daemon prio=1 tid=0x085bb618
> nid=0x4f5f runnable [0x825a8000..0x825a8f30]
>    at org.apache.mahout.cf.taste.impl.common.FastMap.find(FastMap.java:113)
>    at org.apache.mahout.cf.taste.impl.common.FastMap.get(FastMap.java:123)
>    at org.apache.mahout.cf.taste.impl.common.Cache.get(Cache.java:74)
>    - locked <0x96eadf98> (a org.apache.mahout.cf.taste.impl.common.FastMap)
>    at
> org.apache.mahout.cf.taste.impl.recommender.CachingRecommender.recommend(CachingRecommender.java:118)
>
>
> This is the thread that is blocked in
> org.apache.mahout.cf.taste.impl.common.Cache.get. There are hundreds of such
> threads in my thread dump file.
> "http-10.90.39.156-23869-Processor272" daemon prio=1 tid=0x08405d78
> nid=0x5479 waiting for monitor entry [0x731ef000..0x731f0130]
>    at org.apache.mahout.cf.taste.impl.common.Cache.get(Cache.java:73)
>    - waiting to lock <0x96eadf98> (a
> org.apache.mahout.cf.taste.impl.common.FastMap)
>    at
> org.apache.mahout.cf.taste.impl.recommender.CachingRecommender.recommend(CachingRecommender.java:115)
>