Wednesday, July 15, 2009

Semaphore limits...

Living in a world of powerful Worker Nodes with many cores per node and many GBs of RAM, I thought that running "big" jobs wouldn't be a problem.

A strange call reached our helpdesk: a user was able to submit jobs to multiple 2-core WNs but was unable to submit a job to a single x-core node (where x > 6). After some debugging, it turned out that the errors the job was getting were due to a semaphore limit.

Specifically, the job needed many semaphore arrays (21 per core), while only 128 were available (on our SL4 WNs), and the following error kept appearing:
p4_error: semget failed for setnum: 0
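
The arithmetic behind the "x > 6" threshold can be sketched as follows. This is only a rough illustration: the function names are hypothetical, and it assumes the job is the only consumer of semaphore arrays on the node, which is not guaranteed in practice.

```shell
#!/bin/sh
# Hypothetical helper: semaphore arrays a job needs for a given core count,
# assuming 21 arrays per core (the figure observed for this job).
arrays_needed() {
    cores=$1
    per_core=${2:-21}
    echo $((cores * per_core))
}

# Hypothetical helper: largest core count that fits under a SEMMNI limit,
# assuming nothing else on the node holds semaphore arrays.
max_cores() {
    semmni=$1
    per_core=${2:-21}
    echo $((semmni / per_core))
}

arrays_needed 6    # 126, still under the limit of 128
arrays_needed 7    # 147, over the limit of 128
max_cores 128      # 6
```

So with a limit of 128, a 6-core job just fits and a 7-core job does not, which matches what the user was seeing.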

The solution came via sysctl where one can set these limits via the kernel.sem parameter:
# sysctl kernel.sem
kernel.sem = 250 32000 32 128
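
For readers unfamiliar with the four fields, here is a small sketch that labels them in order. The function name describe_sem is made up for this post; the field order (SEMMSL, SEMMNS, SEMOPM, SEMMNI) is the one the Linux kernel uses for kernel.sem.

```shell
#!/bin/sh
# Hypothetical helper: label the four kernel.sem values.
# $1 is a whitespace-separated string such as "250 32000 32 128".
describe_sem() {
    read -r semmsl semmns semopm semmni <<EOF
$1
EOF
    echo "SEMMSL (max semaphores per array): $semmsl"
    echo "SEMMNS (max semaphores system-wide): $semmns"
    echo "SEMOPM (max operations per semop call): $semopm"
    echo "SEMMNI (max semaphore arrays system-wide): $semmni"
}

# Sample values taken from the sysctl output shown in this post.
describe_sem "250 32000 32 128"

# On a live Linux node the same values can be read from /proc:
#   describe_sem "$(cat /proc/sys/kernel/sem)"
```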

The limit we were hitting is the last number, SEMMNI (the maximum number of semaphore arrays that can be allocated system-wide). Using the following command we were able to raise it. The magic number we chose was "512":
# /sbin/sysctl -w kernel.sem="250 32000 32 512"
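
Note that a value set with sysctl -w only lasts until the next reboot. A possible way to make it permanent, assuming the node reads /etc/sysctl.conf at boot (as Red Hat-based systems such as SL4 normally do), is to add a line like the one below to that file and reload it with /sbin/sysctl -p. This is a sketch of the usual approach, not necessarily how our WNs are configured:

```
# /etc/sysctl.conf fragment: SEMMSL SEMMNS SEMOPM SEMMNI
kernel.sem = 250 32000 32 512
```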
