Monday, July 4, 2011

Developing the proper way

Recently I wanted to submit some patches to the Lustre filesystem to allow the compilation of the kernel modules (patchless client) for Fedora 14 kernels. I initially posted my patches to the discussion list, but I was pointed to the company that maintains the community-based Lustre fork (Whamcloud).

There I was amazed by the way the code is maintained, as it follows all of the current development best practices I could think of. So I wanted to share them with my undefined readers.

Git repository:
First, the code is publicly available in a Git repository. People can clone the repository locally, make their modifications, and push them to Gerrit.
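
For readers who haven't touched Gerrit before, the contributor side looks roughly like the sketch below (the repository URL and branch names are placeholders, not the real Lustre ones). The key detail is that you never push to master directly; pushing to the magic refs/for/master reference is what creates a review request in Gerrit.

    import subprocess

    def run(*cmd, cwd=None):
        # Run a command and stop if it fails.
        subprocess.run(cmd, cwd=cwd, check=True)

    # Hypothetical Gerrit-hosted repository -- substitute the project's real URL.
    REPO = "ssh://user@gerrit.example.org:29418/project.git"

    run("git", "clone", REPO, "project")                   # 1. clone locally
    run("git", "checkout", "-b", "my-fix", cwd="project")  # 2. work on a topic branch
    # ... edit files; Gerrit also expects a Change-Id footer, added by its commit-msg hook ...
    run("git", "commit", "-a", "-m", "Fix the build", cwd="project")
    run("git", "push", "origin", "HEAD:refs/for/master", cwd="project")  # 3. submit for review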

Gerrit:
Gerrit is a code-review web tool for Git repositories which allows people to submit patches via "git push". Patches are verified and reviewed; in Lustre's case, code verification is done by Jenkins, while review is done by at least two code reviewers.

Jenkins:
Jenkins monitors the execution of repeated jobs. In Lustre's case it is used for automated builds of the current master branch plus the submitted patchset.

Finally, when patches are approved they are merged into the master branch.

Tuesday, February 8, 2011

Could you please nslookup my CRL?

How do you publish your CRLs?
The common answer to this is via an HTTP(S) URL or from an LDAP server.

If you host a CA that serves the Grid community, you soon find out that your CDPs (CRL distribution points) are hit very hard. According to our logs we get more than 250 hits per minute to download a file that is, most of the time, smaller than 100 KB.

On the other hand, relying parties of these CAs often blame the CAs for outages or unavailability and want a reliable caching mechanism for the CRLs, distributed across the whole world. Currently caching depends on the CA's web server configuration AND on the clients' willingness to cache. While big sites use Squid proxies to save bandwidth, you still see many requests from the same origin for the same file, all getting the same "cache this for 1 hour" response yet still trying to pull a fresh copy from our server.
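
To put the caching complaint in concrete terms, a well-behaved client today has to do something like the following on its own: remember the Last-Modified timestamp and send If-Modified-Since, so the CDP can reply 304 Not Modified instead of shipping the whole CRL again. A minimal sketch, with a made-up CDP URL:

    import urllib.request
    import urllib.error

    CDP_URL = "http://ca.example.org/pub/ca.crl"   # hypothetical distribution point

    def fetch_crl(url, last_modified=None):
        # Download the CRL, but only if it changed since the last fetch.
        req = urllib.request.Request(url)
        if last_modified:
            req.add_header("If-Modified-Since", last_modified)
        try:
            with urllib.request.urlopen(req) as resp:
                return resp.read(), resp.headers.get("Last-Modified")
        except urllib.error.HTTPError as err:
            if err.code == 304:     # not modified -- keep using the cached copy
                return None, last_modified
            raise

    crl, stamp = fetch_crl(CDP_URL)
    # On the next run, pass the remembered timestamp:
    # crl, stamp = fetch_crl(CDP_URL, last_modified=stamp)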

So yesterday on my way home I was thinking: what if we host all the CRLs on a DNS server?
DNS resolvers have a reliable caching mechanism, usually deployed as one resolver per site; the only limitation is how to store this information. TXT records have a limit of about 1300 bytes (the maximum size recommended by a recent IETF draft so that a response fits in a single 1500-byte Ethernet packet). One way forward is to split the data: a CRL in PEM format is base64-encoded, which means 65 bytes per line (64 bytes of data plus a newline), so up to 20 lines fit in one TXT record. Will this scale?
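
Here is a minimal sketch of the publishing side under that assumption: take the base64 body of a PEM CRL, group it into chunks of at most 20 lines (~1280 bytes of data), and emit one TXT record per chunk. The file name and the crl0, crl1, ... record naming scheme are made up for illustration:

    LINES_PER_RECORD = 20   # 20 x 64 base64 bytes, just under the 1300-byte target

    with open("ca.crl.pem") as f:
        # Keep only the base64 body, dropping the BEGIN/END armor lines.
        lines = [l.strip() for l in f if l.strip() and not l.startswith("-----")]

    chunks = ["".join(lines[i:i + LINES_PER_RECORD])
              for i in range(0, len(lines), LINES_PER_RECORD)]

    for n, chunk in enumerate(chunks):
        # A single character-string inside a TXT record is capped at 255 bytes,
        # so each chunk is written as several quoted strings within one record.
        strings = " ".join('"%s"' % chunk[i:i + 255] for i in range(0, len(chunk), 255))
        print("crl%d.ca.example.org. IN TXT %s" % (n, strings))

A client would then query crl0, crl1, ... in order, concatenate the strings, and wrap the result back into PEM armor.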

Our current infrastructure (as part of EGI) supports ~100 CAs. The majority of the CAs would be covered by fewer than 10 TXT records. Fewer than 10 CAs need up to 40 TXT records, while about 5 CAs need more than 100 TXT records, with the largest reaching 1770 records!
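
As a rough sanity check on what those record counts mean in CRL size (using the 1280 data bytes per record from the chunking above; the input sizes are only illustrative):

    BYTES_PER_RECORD = 20 * 64      # base64 data bytes per TXT record

    def records_needed(pem_body_bytes):
        # Ceiling division: how many TXT records a CRL body of this size needs.
        return -(-pem_body_bytes // BYTES_PER_RECORD)

    print(records_needed(10 * 1024))        # ~10 KB body  ->  8 records
    print(records_needed(100 * 1024))       # ~100 KB body ->  80 records
    print(records_needed(2 * 1024 * 1024))  # ~2 MB body   -> 1639 records

So a CA that needs 1770 records is serving a CRL body of roughly 2.2 MB.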

So yes, this should cover the needs of the "small" CAs in terms of CRL size, but it would be unreliable for the "big" ones.

I'm planning to do some testing on this in my next stretch of spare time and report my results here (hopefully sooner than my last post).