Tuesday, September 15, 2009

Hex-editing your GPFS Terabytes

I remember the old days when you could find those geek guys who would use a hex editor to "patch" your favorite game so that you became bullet-proof, or had 99...9 (still counting) lives, or ... did whatever you wanted in order to win. At that time I was sure that hex editors were powerful enough to save you from a "disaster", but I couldn't think what such a disaster might be.

We use GPFS as the network file system for our clusters and, apart from plain scratch space for MPI jobs, it also hosts some local users' home directories. A local team needed to expand their GPFS filesystem, so we had to add a few disks to our array. The procedure sounded trivial: add the new disks to the array, create a new logical volume and finally add the new raw device to the GPFS filesystem.

But of course something went wrong. The new volume was about 10TB in size which, due to a GPFS limitation, we had to split into at least 2 partitions. Easy work with parted, but what happens when parted (for a reason still unknown) "modifies" the partition table of another logical volume, one that is part of the GPFS filesystem as a whole disk (without a partition table)?

Well, parted simply ruins the first sectors of a GPFS NSD, which means it ruins all the valuable information (disk IDs and NSD IDs as well as the filesystem definition) on that disk. The users report "We are receiving 'Input/Output error' when using the X file" and everything gets worse and worse.

Fortunately there IS a solution to this disaster. Although we couldn't find any official IBM documentation on this (apart from some posts in GPFS's forum), there is a way to recover from this situation. What you need is a hex editor, the famous "dd" and a lot of patience.

First, copy sector 8 from each disk within the GPFS filesystem. This sector is the File System Descriptor and it is common to all disks. Next we have to recover sector 2 and sector 1. Sector 2 holds the GPFS disk identifier (also known as the NSD-ID). Finally, sector 1 contains information about the disk, which is called the disk descriptor.
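
Before patching anything, it is worth saving the sectors mentioned above from every disk. The following is only a minimal sketch of that backup step, not the recovery procedure itself. It assumes 512-byte sectors, 0-based sector numbers (as with dd's skip=) and read access to the device; the function names and the output file naming are made up for this illustration.

```python
import os

SECTOR_SIZE = 512      # assumption: 512-byte sectors
SECTORS = (1, 2, 8)    # the sectors named in this post


def read_sector(path, index, sector_size=SECTOR_SIZE):
    """Return the raw bytes of one sector; the device is opened read-only."""
    with open(path, "rb") as dev:
        dev.seek(index * sector_size)
        data = dev.read(sector_size)
    if len(data) != sector_size:
        raise IOError("short read on sector %d of %s" % (index, path))
    return data


def backup_sectors(path, out_dir, sectors=SECTORS):
    """Save each listed sector of `path` to its own file inside `out_dir`."""
    saved = {}
    for index in sectors:
        name = "%s.sector%d.bin" % (os.path.basename(path), index)
        out_path = os.path.join(out_dir, name)
        with open(out_path, "wb") as out:
            out.write(read_sector(path, index))
        saved[index] = out_path
    return saved


def hexdump(data, width=16):
    """Format bytes as offset, hex values and printable ASCII, one row per line."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % byte for byte in chunk)
        text = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        rows.append("%08x  %s  %s" % (offset, hex_part.ljust(width * 3 - 1), text))
    return "\n".join(rows)
```

For example, backup_sectors("/dev/sdX", "/root/gpfs-backup") (both paths hypothetical) would leave three small files per disk, which can then be compared across disks with hexdump() or any hex editor.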

For legal reasons I'm not sure whether I'm allowed to reveal more about how to do this, but by carefully studying the sectors, starting from 8 and then going to 2 and 1, you are able to recover your FS.

Monday, September 14, 2009

Big brother secures our transactions?

I'm in Germany for a meeting and I found out that I was out of money....
Well, I thought, this is easy: let's visit an ATM and get some. I went to the ATM and after a few random clicks I was able to find the "English" button and requested my money.
A few more clicks to find the "OK" button on the machine and there I had my money.
A few minutes later I received a phone call. The caller introduced themselves as a member of my bank's security incident team and asked me whether I had withdrawn this money in Germany or whether my card had been stolen!
In one way I felt safe but ... do they really monitor what I do all the time???
I should also mention that all this happened at 3 a.m. on Monday morning.

Friday, July 17, 2009

Bye bye SL3! Bye bye gLite 3.0...

The last gLite 3.0 node was decommissioned this week with the shutdown of node001.grid.auth.gr.

This node had long been the sBDII/lcg-CE for GR-01-AUTH and was also serving as the Torque server for the local PBS queues.

A new node (a XEN guest with SL4-x86_64 and gLite 3.1) was set up in April to take over this task, but the migration of the local users/queues was postponed until July. The node has already processed tens of thousands of jobs.

With this migration we finally achieved the milestone where ALL our Grid nodes are controlled by our Quattor installation.

Wednesday, July 15, 2009

Semaphore limits...

Living in a world of powerful Worker Nodes with many cores per node and many GBs of RAM, I thought that running "big" jobs wouldn't be a problem.

A strange call reached our helpdesk: a user was able to submit jobs to multiple 2-core WNs but was unable to submit a job to a single x-core node (where x > 6). After some debugging, it turned out that the errors the job was getting were due to a semaphore limit.

Specifically, the job needed many semaphore arrays (21 per core) while only 128 were available (on our SL4 WNs), and the following error kept appearing:
p4_error: semget failed for setnum: 0

The solution came via sysctl where one can set these limits via the kernel.sem parameter:
# sysctl kernel.sem
kernel.sem = 250 32000 32 128

The limit we were hitting is the last number, SEMMNI (the maximum number of semaphore arrays that can be allocated system-wide). Using the following command we were able to raise it. The magic number we chose was "512":
# /sbin/sysctl -w kernel.sem="250 32000 32 512"
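
The arithmetic behind the "x > 6" limit can be checked with a few lines. This is just a sketch: it assumes that nothing else on the node is holding semaphore arrays and that the job really needs 21 arrays per core, as we observed; the function names are made up for this example.

```python
def parse_kernel_sem(value):
    """Split a kernel.sem value into its four limits, in the kernel's order."""
    semmsl, semmns, semopm, semmni = (int(field) for field in value.split())
    return {"SEMMSL": semmsl, "SEMMNS": semmns, "SEMOPM": semopm, "SEMMNI": semmni}


def max_cores(semmni, arrays_per_core=21):
    """Largest core count whose semaphore arrays still fit under SEMMNI."""
    return semmni // arrays_per_core


if __name__ == "__main__":
    for setting in ("250 32000 32 128", "250 32000 32 512"):
        limits = parse_kernel_sem(setting)
        print(setting, "->", max_cores(limits["SEMMNI"]), "cores")
```

With the original limit of 128 arrays this gives 6 cores (6 x 21 = 126, while 7 x 21 = 147 is already too many), which matches the point where the jobs started to fail. With 512 it gives 24.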

Tuesday, July 7, 2009

Migrating to gmail...

It's been a long time since my last post... This post is about a migration I'm trying for my mailbox, as lately I've been experiencing some issues with my primary mail account...

It's a strange situation: my mailbox was migrated to another imap server, which clearly puts the blame on the server, yet I was not the only one who was migrated but I was the only one who experienced issues (using the same OS and imap client software as the others).

At first the issue was due to postfix's default limit on connections per IP, given that Mail.app on OS X opens many connections in parallel.

After raising this limit things were better, but still not as they used to be.... So I thought, why not outsource the imap server work to Google? This is something that a colleague is already doing. I didn't want to migrate all my mail to gmail at once, so I just added the following to the procmailrc of my account on our mailbox server in order to copy all messages to my gmail account:
:0 c
! myemail@gmail.com

That did the trick, and I soon realized what the world would be like without mail filters...
I started to add filters and "labels" (I really loved gmail labels! MUCH MUCH better than imap folders) and I realized that there is no way to filter my mail on non-standard headers (I had worked hard in my imap client to find the best "custom" mail headers to categorize my email sources, and now this is unusable :( ).

Anyway, my feeling after 2 days of using gmail, compared to my primary email, is that gmail is much much faster but its spam filtering is not as "educated" as the one I was used to. Of course, as the anti-SPAM software runs before procmail, I can rely on both Google's and my previous mail provider's anti-SPAM assertions.

Another very interesting thing about gmail is that it detects mails that have arrived more than once for my account (e.g. when I get a reply on a mailing list and the sender has used both my address and the mailing list as recipients) and shows me only one of them. At first I thought that I was losing mails, but actually all the information is preserved: gmail recognizes both sources, and if I have a filter that adds a label for each source, the mail will carry all the labels.
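
The behavior I observed can be modeled roughly as below. This is only my guess at the visible effect, not how gmail is implemented: it assumes duplicates are recognized by their Message-ID header, and the dictionary layout and function name are invented for the example.

```python
def merge_duplicates(messages):
    """Collapse messages that share a Message-ID, keeping the union of labels.

    Each message is a dict with 'message_id', 'subject' and 'labels' keys
    (a hypothetical layout used only for this illustration).
    """
    merged = {}
    for msg in messages:
        entry = merged.setdefault(
            msg["message_id"],
            {"message_id": msg["message_id"], "subject": msg["subject"], "labels": set()},
        )
        # Labels from every copy are kept on the single surviving message.
        entry["labels"] |= set(msg["labels"])
    return list(merged.values())


if __name__ == "__main__":
    inbox = [
        {"message_id": "<1@example.org>", "subject": "Re: question", "labels": ["direct"]},
        {"message_id": "<1@example.org>", "subject": "Re: question", "labels": ["some-list"]},
    ]
    for msg in merge_duplicates(inbox):
        print(msg["subject"], sorted(msg["labels"]))
```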

Sunday, June 28, 2009

You can do the DB schema work online!

I'm working on a project with colleagues from other institutes. Within this project we decided to do some work on a client-server application with a database back-end.

We had many mail exchanges, nice figures describing workflows, phone meetings and video conference meetings, but it was time to start doing some work.

One of the first things we had to agree on was the DB schema we were going to work with, and I was surprised by a tool that one of the colleagues used to give us his SQL model. The tool is called wwwsqldesigner and it is open-source. Of course there is a demo installation to use if you don't want to install it yourself.

What I liked most is that you can get your design in XML format. You can then send the XML file to the rest of the development team, who can upload it either to their local installation or to the demo one, make their changes, publish a new version, etc... Of course you can also save your model on the server, and then others can just select it from a list in order to view and change it.

I liked it so much that I'm thinking of installing it locally and uploading our local projects' schemas.

Friday, June 26, 2009

I don't want all these mails on my iPhone!

A week ago I was writing on this blog about the iPhone 3.0 OS update, and one disadvantage I listed was that there are no mail filters yet.

I really hate it when my laptop is not connected to the imap server (in order to filter all my mails) and I get all this SPAM and mailing list mail on my iPhone. It makes it totally useless.

My first thought was to use procmail, which seems to be powerful but has one "show-stopper" for me: it requires access to your mailbox server in order to upload your procmail configuration.

Then I thought of searching for a simple client that would connect to IMAP, filter my mails and then log out. The client that I found is imapfilter, which is actually developed by a Greek guy!

It does EXACTLY what I want. You feed it an easy-to-read configuration (Lua):

-- Options --

options.timeout = 120
options.subscribe = true

-- Accounts --

-- Connects to "imap1.mail.server", as user "user1" with "secret1" as password.
account1 = {
    server = 'imap1.mail.server',
    username = 'user1',
    password = 'secret1',
    ssl = 'ssl3',
}

-- Filters --

spam = {
    'header "X-DSPAM-Result" "Spam"',
}

-- Commands --

-- Get status (messages, recent, unseen) of the mailbox.
-- check(account1, 'INBOX')

-- Move messages between mailboxes at the same account.
results = match(account1, 'INBOX', spam)
move(account1, 'INBOX', account1, 'SPAM', results)

I just set up a cron job for this and it works perfectly!

Thursday, June 25, 2009

Let's cut some (gLite) Hydra heads

You may be familiar with the Lernaean Hydra. The complexity of this beast made it the perfect name for a gLite service that is used to encrypt/decrypt data.

This service is based on the "Shamir's Secret Sharing" algorithm, where the encryption/decryption key is divided into X parts and any Y of them (where Y <= X) are needed to reconstruct the key.
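
To make the idea concrete, here is a small textbook sketch of Shamir's scheme. It is not the gLite Hydra implementation (which I have not reproduced here); the prime, the function names and the integer "key" are all choices made for this example.

```python
import secrets

PRIME = 2**127 - 1  # a Mersenne prime; the secret must be smaller than this


def split_secret(secret, parts, threshold, prime=PRIME):
    """Split `secret` into `parts` shares; any `threshold` of them recover it."""
    if not 1 <= threshold <= parts:
        raise ValueError("threshold must be between 1 and parts")
    if not 0 <= secret < prime:
        raise ValueError("secret must be in the range [0, prime)")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, parts + 1):
        y = 0
        for coeff in reversed(coeffs):  # Horner evaluation modulo the prime
            y = (y * x + coeff) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Recover the secret by Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # Division modulo a prime is multiplication by the modular inverse.
        secret = (secret + yi * num * pow(den, prime - 2, prime)) % prime
    return secret
```

With split_secret(key, 3, 2) each of three servers would hold one share, and any two shares passed to recover_secret() give back the key, which is the X = 3, Y = 2 layout described in this post.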

A requirement for data encryption was raised some years ago, and we deployed 3 gLite Hydra servers (each one holds a part of every user's key and only 2 of them are required for encryption/decryption operations) with clear geographic and administrative separation.

A software update to one of them led to a "funny" situation where no new keys could be registered and no old ones could be unregistered. (These are the only operations that require all the servers to be up and responding.) The tool that was provided to (re)configure the service had the very interesting behavior of dropping every DB table and re-creating them from the predefined schema.

A re-configuration of the updated server left us in an "everything just doesn't work" state, which we had to resolve under pressure from the user community. Note that with the service down, users might have lost lots of human/CPU hours, because all they would get is encrypted output that they cannot decrypt.

Analysis of the DB of another gLite Hydra instance gave us an idea of how this service stores its data. By luck, the actual keys had not been deleted by the configuration script; only the relation between users and keys had been deleted.

A copy of the user database and some reverse engineering of the relation tables on a working Hydra instance were enough to recover the service at (almost?) no cost.

That reminded me of the common Murphy's law that the backup you have is either unreadable at the time you need it or was last updated BEFORE your critical data was stored.

Saturday, June 20, 2009

OpenMP jobs on Grid? (The LCG-CE - PBS approach)

There was a user support request for OpenMP jobs on the Grid. OpenMP is a shared-memory programming model, which means that all the threads of a job must run on the same box.

Well, this can easily be achieved on the PBS side by using the directive:
#PBS -l nodes=1:ppn=X

Where "X" is the number of requested processes. But the main issue is HOW we can derive this requirement from what the WMS gives us on submission.

After some googling, it seems the "correct" solution can only be achieved with a CREAM CE, where users can specify a number of requirements that are not only used for job matching at the WMS but are also passed to the CE. You can find more info on this here.

LCG CEs, on the other hand, only get a poor RSL which carries almost none of the user's requirements. So let's get into the LCG CE's internals...

First a job reaches the globus-gatekeeper. At this phase the user's proxy is mapped to a pool account. The gatekeeper's task is to authenticate the user and the job and pass it to the globus job manager.

The globus job manager uses the GRAM protocol to report the job state and submits the job to the globus-job-manager-marshal, which uses a perl module to talk to the relevant queuing system.

This perl module is responsible for the creation of the job (shell script) that will be submitted to the PBS server. In this module the CpuNumber requirement is translated by default to:
#PBS -l nodes=X

So this is the part we need to change in order to create OpenMP jobs. The next issue is how to find out whether the user has asked for an OpenMP job. I've noticed that the JDL "Environment" attribute is passed on to the job executable that will be submitted, thus a definition like the following:
Environment = {"OPENMP=true"};
can do the trick.
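
The decision the modified module has to make can be sketched as follows. The real job manager module is written in perl; this Python version only illustrates the logic, and both function names are hypothetical.

```python
def parse_jdl_environment(entries):
    """Turn JDL-style ["NAME=value", ...] entries into a dictionary."""
    return dict(entry.split("=", 1) for entry in entries)


def pbs_nodes_directive(cpu_number, environment):
    """Return the '#PBS -l' line for a job asking for `cpu_number` CPUs.

    OpenMP jobs get all their CPUs on a single node; everything else keeps
    the default translation of CpuNumber into a number of nodes.
    """
    if environment.get("OPENMP", "").lower() == "true":
        return "#PBS -l nodes=1:ppn=%d" % cpu_number
    return "#PBS -l nodes=%d" % cpu_number


if __name__ == "__main__":
    env = parse_jdl_environment(["OPENMP=true"])
    print(pbs_nodes_directive(8, env))
    print(pbs_nodes_directive(8, {}))
```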

The whole approach above works, but it certainly needs a lot more work; as a proof of concept, though, it is more than OK...
In the (near) future I would like to test the CREAM CE which, as I said before, has a cleaner way to support requirements from JDLs, using the CeForwardParameters definition.

Friday, June 19, 2009

Coding on multiple SVN repositories...

As a developer I am used to using repositories (mainly SVN) for code versioning and for interacting with other developers.

Involvement in developments with other teams within a project usually requires that the (production) repository is hosted somewhere centrally. This gives us the advantage of having one code base for all developers to work on. The main disadvantage of this setup, though, is that developers usually don't commit until they have something really stable and working.

Another disadvantage is that it is hard for anyone outside your head to find out what you are working on (and usually "manager" guys need to do so).

It was proposed to me that I use a local repository for all my development, where it would be easy to make "every change" commits, and commit only stable versions to the central repositories. This would give us both frequent commits (thus a clear history view) and a way for others to see what you are working on and possibly comment on it. At first I was strongly against this... It clearly adds a lot of extra work without giving us many clear advantages.

As this was a "manager's" proposal I had to try it. The initial thought was to work in our local repository and then, when I have something stable, take a diff since the last sync of the repositories and apply it to the remote (central) repository.

But ... thinking about this again, isn't this just the "svn tagging" procedure with different servers for trunk and tags?

An implementation to test:
  1. Create a local test repository with trunk and tag trees
  2. Create a new repository to serve as "the remote central repository"
  3. Create a new tag at the first repository which will have an svn:external link to "the remote central repository"
  4. Start tagging as normal on the local test repository but always at the same tag.

iPhone 3.0 OS is here...

It took me about 8 hours of clicking "check for updates", and finally around 8pm on Wednesday it was available!

Of the 230 MB download, the first 200 MB arrived within a few seconds while the last 30 took about half an hour! (Was I one of the first downloaders?)

After about 24 hours with iPhone 3.0, I think that finally all the missing "phone" functionality is there.

Thumbs up:
  • I can write Greek!
  • iPod shake! (shake to shuffle)
  • "Search iPhone" or the iPhone spotlight. Everything is about a text box away from your screen.
  • mms (never used it before but it was a pity that iPhone was unable to do something that 30 euro mobiles do)
Thumbs down:
  • Some 3rd-party apps report "compatibility errors" (fortunately without any (visible) malfunction).
  • Still no background applications (no skype in the background)
  • No email filters (should I consider procmail?)