Tuesday, June 26, 2012

Fabric management redesigned!

People who know me probably know that when it comes to fabric management I'm a fan of Quattor. Quattor is a great tool that can manage nodes from installation (using pxeboot and kickstart) all the way to fine-tuning service features. While Quattor is a strong tool that can help administer hundreds or even thousands of nodes, it has some weak spots which I'd like to get rid of:
  • Very steep learning curve
  • Some operations can be quite time consuming (e.g. applying errata updates)
  • Use of a custom programming language (PAN) which is usually unknown to even senior admins
  • Inventory of the assets is based on what you describe and not on what exists

These weak spots usually lead to misuse of Quattor, which in turn leads to unmaintained or hard-to-maintain templates.
After a very long time of using Quattor and evaluating other fabric management tools (e.g. Puppet, Chef), my conclusion is that while they help admins scale their infrastructure, they work pretty much against the way administrators are used to working. Admins are used to logging in over ssh and doing their stuff, while these management systems don't know how to interpret such changes, so they simply skip them.
So let's move one step back and redesign it!
What if you had something monitoring your systems and had the ability to upload your local changes from the node to the central repository of your configuration?
I'm thinking of a solution which will have the following features:
  • A simple PXE image to inventory your nodes (that way you only need to know the MAC address of your nodes before importing them to the system).
  • PXE and kickstart (or equivalent) configuration to bootstrap nodes
  • Ability to take over a pre-installed node (e.g. a cloud VM)
  • Components (per feature) that will run on the nodes and will:
    1. Identify current configuration
    2. Change the configuration
    3. Prepare a feedback commit if the admin wants to push the local changes on the node to the configuration system
  • A simple tool/daemon that will run on the nodes as a wrapper for the components
  • All the configuration will be based on a tagging system, so that each node will have its own tag (to override configuration) and then a sorted list of tags to include (e.g. a tag called web-server and a tag called production would result in a node that is part of the production web-servers)
  • The configuration server would store its data in a DB and timestamp every change. Changes should be easy to revert when needed.
  • The configuration server would provide an API with read-only access for other tools to consume data (e.g. a dynamic infrastructure monitoring system) and read-write access to update the data.
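
To make the tagging idea concrete, here is a minimal sketch of how a node's configuration could be resolved from its tags. Everything in it is an assumption for illustration only: the function name resolve_config, the tag name node-web01, the configuration keys, and the choice of a flat (non-recursive) merge where later tags win and the node's own tag is applied last.

```python
from typing import Any, Dict, List


def resolve_config(
    node_tag: str,
    included_tags: List[str],
    tag_store: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Merge the included tags in order, then the node's own tag last.

    Later tags override earlier ones, so the node tag always wins.
    This is a shallow merge; nested values are replaced, not combined.
    """
    merged: Dict[str, Any] = {}
    for tag in included_tags + [node_tag]:
        # Unknown tags contribute nothing instead of raising an error.
        merged.update(tag_store.get(tag, {}))
    return merged


# Hypothetical tag contents, only here to show the ordering.
TAGS = {
    "web-server": {"packages": ["httpd"], "httpd_port": 80},
    "production": {"httpd_port": 443, "monitoring": True},
    "node-web01": {"httpd_port": 8443},
}

config = resolve_config("node-web01", ["web-server", "production"], TAGS)
print(config)
```

In this sketch the production tag overrides the port set by web-server, and the node's own tag overrides both. A real implementation would probably need a deeper merge and some conflict reporting.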

A list of the first components (i.e. those needed before announcing an alpha release) that I'm thinking of is:
  1. package management
  2. file generator
  3. daemon handler
  4. user management
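
As a rough sketch of what such a component could look like, the three per-component duties described earlier (identify the current configuration, change it, prepare a feedback commit) might map to an interface like the one below. The class and method names are hypothetical, the package versions are made up, and the in-memory package component only stands in for a real one that would talk to the package manager.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Component(ABC):
    """Hypothetical base class for a per-feature component."""

    @abstractmethod
    def identify(self) -> Dict[str, str]:
        """Return the configuration currently found on the node."""

    @abstractmethod
    def apply(self, desired: Dict[str, str]) -> None:
        """Change the node so that it matches the desired configuration."""

    def feedback(self, desired: Dict[str, str]) -> Dict[str, Dict[str, Optional[str]]]:
        """Describe local changes that could be pushed back to the server."""
        current = self.identify()
        drift = {}
        for key in sorted(set(current) | set(desired)):
            if current.get(key) != desired.get(key):
                drift[key] = {"desired": desired.get(key), "found": current.get(key)}
        return drift


class InMemoryPackages(Component):
    """Stand-in for a package component; keeps name -> version in a dict."""

    def __init__(self, installed: Dict[str, str]) -> None:
        self.installed = dict(installed)

    def identify(self) -> Dict[str, str]:
        return dict(self.installed)

    def apply(self, desired: Dict[str, str]) -> None:
        self.installed = dict(desired)


# Made-up versions: the admin updated httpd by hand on the node.
node = InMemoryPackages({"httpd": "2.2.15-15", "openssh-server": "5.3p1"})
desired = {"httpd": "2.2.15-9", "openssh-server": "5.3p1"}

drift = node.feedback(desired)
print(drift)
```

The wrapper tool on the node could then ask the admin whether the reported drift should be reverted with apply() or sent to the server as a new or updated tag.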

And a demo could be the installation of a node from scratch with a couple of users and the ssh service up and running.

So in principle this is going to be yet another fabric management tool, but with the addition of feedback from the nodes to the server. Of the fabric management tools that I've used, I found that Chef is the closest to what I'm thinking of, mainly because of the "ohai" utility that feeds information back to the server. I'll probably depend on this.

PS: This is just an idea for now, so please add comments; implementation will start when time allows. The implementation will follow Red Hat conventions (and so be compatible with Red Hat Enterprise Linux and its clones) but should be modular enough to be extended to other distributions.


  1. Hi Christos,
    I don't see how this is different from chef (and ohai) or puppet (and facter); you can do the tags per host at git level if you want to.

    1. Hey Steve,
      the feedback mechanism is the difference. I may have missed something in chef/puppet (which I'd love to hear, as I'd prefer to have it done by someone else), but what I want to do is the following (let's use package management as the example).
      Assume that in your manifest/recipe you describe a web server and for some specific reason you want your production webservers to have a specific version of httpd. Both chef and puppet can handle this.

      Next, assume that an admin is staging an update to httpd. The normal chef/puppet/quattor approach for this would be to create a new tag, put the updated httpd config in that tag, and then assign the tag to the staging node.

      But the normal admin would just log in to the node and execute "yum update httpd".

      What I want is a method/tool that is able to say afterwards:
      The following changes would normally be reverted (as your recipe/manifest defines a specific version). Do you want me to create a new tag, or add these changes to an existing tag that this node belongs to?

      This would actually make fabric management easy to use for admins who may be senior at doing things right but may not have the knowledge/time to do them in the management system itself.

      While package management may be a trivial case, think for example of advanced network configuration with bonding, bridges, vlans, aliases and whatever else you can think of. Wouldn't it be much easier if you created the network config files you want and then imported them into the management tool?

      Thanks for your time/comment!

  2. Christos,

    Maybe something like the Augeas project plus some serialization libraries would help with this goal?

    Augeas doesn't work at all for the traditional toolkit approach you describe. But it's pretty good at reading existing configurations for in-memory manipulations.

    I'd be happy to collaborate on the serialization portions.