Posted to dev@crail.apache.org by Jeongyoon Eo <je...@gmail.com> on 2019/06/20 07:40:59 UTC

Cannot allocate memory error when using RDMA

Hi,

I'm trying to run the TeraSort example on Spark-Crail over RDMA, using the
latest incubator-crail, plus disni, crail-spark-io, and crail-spark-terasort
from https://github.com/zrlio.

I'm using two machines running Ubuntu 18.04, one for the CrailNameNode and
the other for the StorageServer.
When I run start-crail.sh, the following error appears in the StorageServer's
Crail log:

19/06/20 14:58:42 INFO crail: connected to namenode(s) /172.30.100.4:9060
Exception in thread "main" java.io.IOException: j2c::regMr: ibv_reg_mr failed: Cannot allocate memory
        at com.ibm.disni.verbs.impl.NativeDispatcher._regMr(Native Method)
        at com.ibm.disni.verbs.impl.NatRegMrCall.execute(NatRegMrCall.java:91)
        at com.ibm.disni.verbs.impl.NatRegMrCall.execute(NatRegMrCall.java:36)
        at org.apache.crail.storage.rdma.RdmaStorageServer.allocateResource(RdmaStorageServer.java:112)
        at org.apache.crail.storage.StorageServer.main(StorageServer.java:152)

When I tested RDMA with a standalone C program, ibv_reg_mr succeeded, so I
suspect a conflict between libdisni.so, which Crail uses (or some other Crail
component), and the underlying RDMA libraries.

Has anyone else run into this kind of "Cannot allocate memory" error?
If so, could you share how you resolved it?
Any other help would be appreciated!

Thank you in advance.

- Jeongyoon

Re: Cannot allocate memory error when using RDMA

Posted by Jeongyoon Eo <je...@gmail.com>.
Thank you for your help!
It works like a charm 😄

Best regards,
- Jeongyoon

On Thu, Jun 20, 2019 at 5:01 PM, Jonas Pfefferle <pe...@japf.ch> wrote:

> Hi Jeongyoon,
>
>
> This looks like a user limit problem. Can you check whether "max locked
> memory" is set to a high enough value or to unlimited (use "ulimit -l")?
> You can set memlock in /etc/security/limits.conf, e.g.:
>
> *            soft    memlock         unlimited
> *            hard    memlock         unlimited
>
> Regards,
> Jonas

Re: Cannot allocate memory error when using RDMA

Posted by Jonas Pfefferle <pe...@japf.ch>.
Hi Jeongyoon,


This looks like a user limit problem. Can you check whether "max locked
memory" is set to a high enough value or to unlimited (use "ulimit -l")?
You can set memlock in /etc/security/limits.conf, e.g.:

*            soft    memlock         unlimited
*            hard    memlock         unlimited

Regards,
Jonas
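
For anyone who wants to verify the limit from inside a process rather than
from the shell, here is a minimal Python sketch of the same check that
"ulimit -l" performs. It reads RLIMIT_MEMLOCK with the standard resource
module and compares it against a buffer size. The 1 GiB size and the helper
names format_limit and can_lock are assumptions made for illustration; they
are not taken from Crail or DiSNI, and the comparison is a simplification of
what the kernel actually accounts for (for example, privileged processes may
bypass the limit).

```python
import resource


def format_limit(value):
    """Render an rlimit value (bytes) the way `ulimit -l` reports it."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    # getrlimit reports bytes; `ulimit -l` reports 1024-byte units.
    return f"{value // 1024} KiB"


def can_lock(nbytes, soft_limit):
    """Rough check: does the soft memlock limit permit pinning nbytes?"""
    return soft_limit == resource.RLIM_INFINITY or nbytes <= soft_limit


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print("max locked memory (soft):", format_limit(soft))
    print("max locked memory (hard):", format_limit(hard))

    # Hypothetical registration size, NOT the actual Crail storage setting.
    wanted = 1 << 30
    if can_lock(wanted, soft):
        print("soft limit would allow pinning 1 GiB")
    else:
        print("soft limit is too low to pin 1 GiB; raise memlock")
```

Note that changes to /etc/security/limits.conf normally only apply to new
login sessions, so run the check again from a fresh login, as the same user
that starts the StorageServer.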
