Friday, July 17, 2009

Bye bye SL3! Bye bye gLite 3.0...

The last gLite 3.0 node was decommissioned this week with the shutdown of node001.grid.auth.gr.

For a long time this node had been the sBDII/lcg-CE for GR-01-AUTH, and it also served as the Torque server for the local PBS queues.

A new node (a Xen guest running SL4-x86_64 and gLite 3.1) was set up in April to take over this role, but the migration of the local users/queues was postponed until July. The new node has already processed tens of thousands of jobs.

With this migration we have finally reached the milestone of having ALL our Grid nodes managed by our Quattor installation.

Wednesday, July 15, 2009

Semaphore limits...

Living in a world of powerful Worker Nodes with many cores and many GBs of RAM each, I thought that running "big" jobs wouldn't be a problem.

A strange call reached our helpdesk: a user was able to submit jobs spread across multiple 2-core WNs but was unable to submit a job to a single x-core node (where x > 6). After some debugging, we found that the errors the job was getting were due to a semaphore limit.

Specifically, the job needed many semaphore arrays (21 per core, so 7 cores already need 147) while only 128 were available on our SL4 WNs, and the following error kept appearing:
p4_error: semget failed for setnum: 0
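A quick way to check the arithmetic behind the "x > 6" boundary is the small sh sketch below. It assumes the 21-arrays-per-core figure observed for this job holds for any core count and ignores arrays already held by other processes on the node, so treat it as a rough estimate. The function names are hypothetical helpers, not part of any tool.

```shell
#!/bin/sh
# Rough estimate of semaphore arrays needed by one job on an x-core node.
# ARRAYS_PER_CORE=21 is the figure observed for this particular job,
# not a general constant; arrays used by other processes are ignored.
ARRAYS_PER_CORE=21
SEMMNI=128   # SEMMNI on our SL4 WNs before the change

# hypothetical helper: arrays needed for $1 cores
required_arrays() {
    echo $(( $1 * ARRAYS_PER_CORE ))
}

# hypothetical helper: largest core count that still fits under SEMMNI=$1
max_cores() {
    echo $(( $1 / ARRAYS_PER_CORE ))
}

for cores in 2 6 7 8; do
    need=$(required_arrays "$cores")
    if [ "$need" -le "$SEMMNI" ]; then
        verdict="fits"
    else
        verdict="exceeds SEMMNI"
    fi
    echo "$cores cores -> $need arrays ($verdict)"
done
echo "max cores with SEMMNI=$SEMMNI: $(max_cores "$SEMMNI")"
```

With SEMMNI=128 the cut-off is 6 cores (126 arrays), which matches the x > 6 symptom; with 512 the same estimate allows up to 24 cores.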

The solution came via sysctl, which exposes these limits through the kernel.sem parameter:
# sysctl kernel.sem
kernel.sem = 250 32000 32 128
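The four numbers are, in order, SEMMSL, SEMMNS, SEMOPM and SEMMNI. A minimal sh sketch that labels them is below; describe_sem is a hypothetical helper name, and the fallback string is simply the value we saw on our SL4 WNs. Note that SEMMNS is a separate system-wide cap on the total number of semaphores, so raising SEMMNI only helps as long as that total stays within it.

```shell
#!/bin/sh
# Label the four kernel.sem values. The kernel fixes the order:
# SEMMSL SEMMNS SEMOPM SEMMNI.
describe_sem() {
    # $1 = the four whitespace-separated values, e.g. "250 32000 32 128"
    set -- $1   # word-split into $1..$4 (the values are plain numbers)
    echo "SEMMSL=$1 (max semaphores per array)"
    echo "SEMMNS=$2 (max semaphores system-wide)"
    echo "SEMOPM=$3 (max operations per semop call)"
    echo "SEMMNI=$4 (max number of semaphore arrays)"
}

# On Linux the live values are also readable from /proc/sys/kernel/sem.
if [ -r /proc/sys/kernel/sem ]; then
    describe_sem "$(cat /proc/sys/kernel/sem)"
else
    describe_sem "250 32000 32 128"
fi
```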

The limit we were hitting is the last number, SEMMNI (the maximum number of semaphore arrays that can be allocated system-wide). We raised it with the following command; the magic number we chose was "512":
# /sbin/sysctl -w kernel.sem="250 32000 32 512"
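sysctl -w only changes the running kernel, so the value is lost at the next reboot. A minimal sketch for persisting it is below, assuming the node applies /etc/sysctl.conf at boot (as RHEL/SL4 does) and that the file is edited by hand; on a Quattor-managed node the same setting would more likely be kept in the node profile instead. persist_sem is a hypothetical helper, and the config path is a parameter so it can be tried on a scratch copy first.

```shell
#!/bin/sh
# Persist a kernel.sem value in a sysctl config file (hypothetical helper).
# Replaces an existing kernel.sem line, or appends one if none exists.
persist_sem() {
    # $1 = sysctl config file, $2 = the four kernel.sem values
    conf=$1
    value=$2
    if grep -q '^[[:space:]]*kernel\.sem[[:space:]]*=' "$conf" 2>/dev/null; then
        tmp=$(mktemp) || return 1
        # rewrite the existing line, then copy back to keep the file's permissions
        sed "s/^[[:space:]]*kernel\.sem[[:space:]]*=.*/kernel.sem = $value/" "$conf" > "$tmp" &&
            cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        echo "kernel.sem = $value" >> "$conf"
    fi
}

# Example (as root):
#   persist_sem /etc/sysctl.conf "250 32000 32 512" && /sbin/sysctl -p
```

Afterwards, ipcs -ls shows the semaphore limits the kernel is actually using, and ipcs -s lists the arrays currently allocated.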

Tuesday, July 7, 2009

Migrating to gmail...

It's been a long time since my last post... This post is about a migration I'm trying for my mailbox, as lately I've been experiencing some issues with my primary mail account...

It's a strange situation: my mailbox was migrated to another IMAP server, which clearly puts the blame on the server, but I was not the only one who was migrated, and I was the only one who experienced issues (with the same OS and IMAP client software).

At first the issue was due to Postfix's default maximum number of connections per IP, given that Mail.app on OS X opens many connections in parallel.

After this limit was raised things got better, but still not as good as before... So I thought: why not outsource the IMAP server work to Google? A colleague is already doing this. I didn't want to move all my mail to Gmail at once, so I just added the following to the .procmailrc of my account on our mail server in order to copy all incoming messages to my Gmail account:
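# c = carbon copy: forward a copy to Gmail and keep delivering the original locally
# (no trailing ":" lock is needed, since a forwarding recipe writes no local file)
:0c
! myemail@gmail.com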

That did the trick, and I soon realized what the world would be like without mail filters...
I started adding filters and "labels" (I really love Gmail labels! MUCH MUCH better than IMAP folders) and realized that there is no way to filter my mail on non-standard headers (I had worked hard in my IMAP client to find the best "custom" mail header for categorizing my email sources, and now all that is unusable :( ).

Anyway, my impression after 2 days of using Gmail compared to my primary email is that Gmail is much, much faster, but its spam filtering is not as "educated" as the one I was used to. Of course, since the anti-SPAM software runs before procmail, I can rely on both Google's and my previous mail provider's anti-SPAM assertions.

Another very interesting observation about Gmail is that it detects mails that have arrived more than once for my account (e.g. when I get a reply on a mailing list and the sender used both my address and the mailing list as recipients) and shows me only one copy. At first I thought I was losing mail, but actually all the information is preserved: Gmail recognizes both sources, and if I have a filter that adds a label for each source, the mail gets all the labels.