wakkerma@debian.org
Configuration management is quickly becoming a very important issue. Having programs which do cool stuff is great, but we need to store their configuration as well. More and more different configuration systems are being introduced all the time, which is not very practical. This text introduces a general configuration management system which is flexible enough to be used for all kinds of applications.
The configuration management system can be divided into a couple of modules, each with a clear function. One of the important aspects of this design is that it uses a virtual database, which consists of one or more real databases. This gives a great deal of flexibility, since it allows you to combine very different methods of data access in a single system. Each real database is implemented using a module that does the real work. Here is a short overview of the modules that implement the database:
the database library. This module defines the interface to the modules that implement the real databases, and presents the virtual database to the user
driver modules, which implement the real work
a trigger daemon, which handles triggers registered by the administrator or by users
All configuration information is stored in what I call the configuration space. This is a database with a special design which resembles the way we look at configuration information. This is done by defining a hierarchy of information. Each package receives its own space in the hierarchy. Each package is free to use a flat space, or divide its space further into sub-hierarchies. If multiple packages share a common purpose they may use a shared toplevel hierarchy, preferably with the same name as a shared (virtual) package name (for example, both mutt and elm can use mail-reader, while strn and nn could use news-reader). This shared tree can also be used as a default, i.e. a variable news-reader/nntpserver can be used by strn if strn/nntpserver does not exist.
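The fallback from a package's own space to the shared tree can be sketched as follows. This is only an illustration: the in-memory dictionary, the `shared_tree` mapping and the `lookup` function are assumptions made for the example, not part of the design.

```python
from typing import Dict, Optional

# Hypothetical in-memory configuration space; keys are hierarchical paths.
config_space: Dict[str, str] = {
    "news-reader/nntpserver": "news.example.org",
    "nn/nntpserver": "nn-news.example.org",
}

# Hypothetical mapping from a package to the shared (virtual) package tree.
shared_tree: Dict[str, str] = {
    "strn": "news-reader",
    "nn": "news-reader",
    "mutt": "mail-reader",
    "elm": "mail-reader",
}


def lookup(package: str, variable: str) -> Optional[str]:
    """Return package/variable, falling back to the shared tree if unset."""
    own_key = f"{package}/{variable}"
    if own_key in config_space:
        return config_space[own_key]
    shared = shared_tree.get(package)
    if shared is not None:
        # Use the shared tree as a default, e.g. news-reader/nntpserver.
        return config_space.get(f"{shared}/{variable}")
    return None
```

Here strn has no nntpserver of its own and picks up the news-reader value, while nn keeps its own setting.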
Multiple types of variables can be stored in the configuration space. A preliminary list of types is: strings, numbers, lists, hostnames, IP addresses. Each variable can have meta-data associated with it for special purposes. The minimum meta-data associated with a variable is: long and short description, type, default value and an "isdefault" flag. The "isdefault" flag states whether a variable has been changed from its default value. This can be used when upgrading a package to check whether the user has changed the default, or whether it is safe to change it to a new default. This gives us the same result as the md5sum-checking dpkg does for conffiles, but on a much finer-grained level (per variable instead of per file).
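A minimal sketch of how the "isdefault" flag could drive an upgrade decision is shown below. The `Variable` class and the `upgrade_default` helper are hypothetical names chosen for this example; the text only specifies which meta-data exists, not how it is represented.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    # Minimum meta-data listed in the text; field names are assumptions.
    name: str
    type: str
    short_description: str
    long_description: str
    default: str
    value: str
    isdefault: bool = True

    def set(self, new_value: str) -> None:
        """Record a user change; the variable no longer tracks the default."""
        self.value = new_value
        self.isdefault = False


def upgrade_default(var: Variable, new_default: str) -> bool:
    """Install a new default during a package upgrade.

    The value is only overwritten when the user never changed it.
    Returns True if the value was replaced.
    """
    var.default = new_default
    if var.isdefault:
        var.value = new_default
        return True
    return False
```

In this sketch any explicit `set` call counts as a user change, even if the new value happens to equal the default. That is a design choice for the example, not something the text prescribes.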
The configuration may be spread across multiple databases. We use a virtual database to represent these databases as one big database. Each part of the virtual database is handled by one or more database modules.
Occasionally it is useful to use multiple modules to handle a single part of the database and choose which module to use at runtime (for example backup databases, where a central database is used if a local copy fails). To do this it is possible to stack modules and give some criteria for choosing which modules to use. (PAM is an example of how this can be implemented.)
For utilities there are two ways to configure the modules: either programmatically, by registering them at runtime with the database library, or by loading a configuration file which specifies the information. For each module we define the real databases, their entry-points in the virtual database and other optional information. Here is an example of a configuration file:
# First define all databases we use

# Use the stack-driver to implement a search-path. In this case
# we try each driver sequentially until we find a variable
driver "stack" {                        # Some global options
    start "/config";                    # Where to `mount' the driver
    entry {
        when "always";                  # Always try this driver
        driver "Oracle" {               # Company-wide Oracle database
            instance "Config";          # Oracle-specific parameters
            user "config";
            password "config";
            root "/common/$(ARCH)/";    # Starting-point for database access
        };
    };
    entry {
        when "notfound";                # Use when nothing has been found yet
        driver "LDAP" {                 # Department-wide LDAP database
            driver "LDAP";              # Use an LDAP database
            root "/config/$(ARCH)";     # Starting-point for database access
        };
    };
};

driver "DHCP" {                         # DHCP for host-specific config
    start "/network";                   # Where to `mount' this database
};
(This format is heavily based on the bind named.conf format: it's quite easy to parse and very flexible. It can be any other kind of format as well.) We start by defining the databases to use. Each database has a driver to use and a root from which we start looking. Variables such as $(HOSTNAME) and $(ARCH) can be used here. Each driver may also add other variables, like instance, user and password for the Oracle example given here. Some databases may be mini-drivers, like the DHCP database: they can define only a couple of variables like IP address, hostname, etc. and return a not-found error on all other requests.
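The mounting and stacking described above can be sketched in a few lines. Everything here is an assumption made for illustration: `DictDriver` stands in for a real Oracle, LDAP or DHCP module, the class and method names are invented, and the rule that the first value found wins is one possible reading of the "always" and "notfound" criteria.

```python
from typing import Dict, List, Optional, Tuple


class DictDriver:
    """Stand-in for a real driver module; backed by a plain dictionary."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = data

    def get(self, path: str) -> Optional[str]:
        return self.data.get(path)  # None signals "not found"


class StackDriver:
    """Tries a list of (when, driver) entries in order."""

    def __init__(self, entries: List[Tuple[str, DictDriver]]) -> None:
        self.entries = entries

    def get(self, path: str) -> Optional[str]:
        result = None
        for when, driver in self.entries:
            # "notfound" entries are only consulted while nothing was found.
            if when == "always" or (when == "notfound" and result is None):
                value = driver.get(path)
                if result is None:
                    result = value
        return result


class VirtualDatabase:
    """Presents several mounted drivers as one database."""

    def __init__(self) -> None:
        self.mounts: List[Tuple[str, object]] = []

    def mount(self, start: str, driver: object) -> None:
        self.mounts.append((start.rstrip("/"), driver))
        # Longest mount point first, so the most specific mount wins.
        self.mounts.sort(key=lambda m: len(m[0]), reverse=True)

    def get(self, path: str) -> Optional[str]:
        for start, driver in self.mounts:
            if path == start or path.startswith(start + "/"):
                return driver.get(path[len(start):].lstrip("/"))
        return None


# Mirror the example configuration with made-up data.
oracle = DictDriver({"mail/relay": "smtp.corp.example"})
ldap = DictDriver({"mail/relay": "smtp.dept.example",
                   "print/server": "lp.dept.example"})
dhcp = DictDriver({"hostname": "ws42"})

vdb = VirtualDatabase()
vdb.mount("/config", StackDriver([("always", oracle), ("notfound", ldap)]))
vdb.mount("/network", dhcp)
```

The DHCP stand-in behaves like a mini-driver: it answers for hostname and reports not-found for everything else under /network.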
Triggers are actions that can be invoked when a certain change is made in the database. These can be very useful: for example, a running program can use one to detect when it should reload its configuration. To support transactions there is a trigger daemon with which all triggers are registered. Every time a change is made in the database the library reports the change to this daemon. The daemon will then check if a trigger has been registered for this change, check if the user who registered the trigger is allowed to see this change, and then invoke the action.
When we are dealing with a remote database, the trigger daemon simply registers the trigger with the remote trigger daemon when it is registered locally. When a trigger daemon notes an action for a remote trigger that should be run, it tries to contact the other trigger daemon to tell it to perform the action.
For more complex operations, transactions are needed. A transaction is a set of operations which the database will perform together atomically. If one of the operations fails, all actions that have been performed should be undone. To implement this each database module must be able to lock its real database to ensure no other updates are being done while the transaction is being processed.
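The local part of the trigger handling can be sketched as below. It runs in a single process and leaves out the forwarding to a remote daemon. The `TriggerDaemon` class, its method names and the prefix-based permission model are assumptions for the example only.

```python
from typing import Callable, Dict, List, Set, Tuple


class TriggerDaemon:
    def __init__(self, readable: Dict[str, Set[str]]) -> None:
        # Hypothetical permission model: user -> path prefixes they may see.
        self.readable = readable
        self.triggers: List[Tuple[str, str, Callable[[str], None]]] = []

    def register(self, user: str, path: str,
                 action: Callable[[str], None]) -> None:
        """Register an action to run when `path` changes."""
        self.triggers.append((user, path, action))

    def _may_see(self, user: str, path: str) -> bool:
        prefixes = self.readable.get(user, set())
        return any(path.startswith(prefix) for prefix in prefixes)

    def report_change(self, path: str) -> int:
        """Called by the database library after a change; returns the
        number of actions that were invoked."""
        fired = 0
        for user, watched, action in self.triggers:
            # Only fire if a trigger exists for this change and its owner
            # is allowed to see the change.
            if watched == path and self._may_see(user, path):
                action(path)
                fired += 1
        return fired
```

A program that wants to reload its configuration would register an action for the variables it cares about; the action here is just a callable that receives the changed path.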
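One simple way a database module could provide this behaviour is to lock, take a snapshot, and restore the snapshot on failure. The sketch below assumes an in-memory store and invents the `DatabaseModule` class; a real module would use whatever locking and rollback its backing database offers.

```python
import threading
from typing import Dict, List, Optional, Tuple


class DatabaseModule:
    def __init__(self, data: Optional[Dict[str, str]] = None) -> None:
        self.data: Dict[str, str] = dict(data or {})
        self.lock = threading.Lock()

    def transaction(self, operations: List[Tuple[str, Optional[str]]]) -> bool:
        """Apply (key, value) writes atomically.

        A value of None is treated as an invalid write and stands in for
        any operation that fails. Returns True if all writes were applied.
        """
        with self.lock:  # no other updates while the transaction runs
            snapshot = dict(self.data)
            try:
                for key, value in operations:
                    if value is None:
                        raise ValueError(f"invalid value for {key}")
                    self.data[key] = value
            except ValueError:
                self.data = snapshot  # undo everything done so far
                return False
            return True
```

Copying the whole store is only reasonable for a toy example; it is used here because it makes the undo step obvious.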
Now that we have a database to store information in, we need a method to see what data we have and simple methods to access it. This document describes a system to keep track of all our data.
The problem: we have a database that can store all kinds of exciting information. But now we need to find something, figure out what a variable actually is, etc. This means we need to keep track of all possible variables.
Another reason we need to keep track of variables is database cleanup: if you remove an application from your system you might want to remove its configuration from the database.
To implement all this we add two new extensions: a list with information about variables (the meta-database), and a list which maps variables onto package names. We won't store information about the actual variables, since that would mean duplicating a lot of information. What we will do is make a list of variable templates, and add the template name in the list of variables for each package.
So, what do we need to store in a variable template? Of course we need a name to identify the template. Secondly a type so we can verify data. Of course a default value is useful as well, and finally we need a description of the variable. We actually use two descriptions: a short one (limited to 50 characters or so) and an extended one.
The extended description may be word-wrapped by the FrontEnd. To make separate paragraphs in it, use . on a line by itself to separate them. Both the description and extended description may have substitutions embedded in them, e.g. ${foo}. These will be expanded when the FrontEnd displays the descriptions.
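How a FrontEnd might expand substitutions and split paragraphs can be sketched as follows. The function names, the wrap width and the choice to leave unknown substitutions untouched are assumptions for this example.

```python
import re
import textwrap
from typing import Dict


def expand(text: str, variables: Dict[str, str]) -> str:
    """Replace ${name} with variables[name]; unknown names stay as-is."""
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: variables.get(m.group(1), m.group(0)),
                  text)


def render_extended(description: str, variables: Dict[str, str],
                    width: int = 60) -> str:
    """Split on lines containing only '.', expand, and word-wrap."""
    paragraphs = []
    current = []
    for line in description.splitlines():
        if line.strip() == ".":
            paragraphs.append(" ".join(current))
            current = []
        else:
            current.append(line.strip())
    paragraphs.append(" ".join(current))
    return "\n\n".join(textwrap.fill(expand(p, variables), width)
                       for p in paragraphs if p)
```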
Using this information it is easy to create frontends with which we can examine the database, since we always know the purpose and owner of a variable. Here is an example:
Template: hostname
Type: string
Default: debian
Description: unqualified hostname for this computer
 This is the name by which this computer will be known on the network. It
 has to be a unique name in your domain. Only alphanumeric characters are
 allowed.
When we create new variables in the database we can now simply specify the package name, the template name and a location in the database.
We encounter a problem if you do things like adding slightly more complex information in the database, for example a printer configuration. Adding a bunch of separate variables each time isn't very practical, so we need a way of grouping them. To do this we use a new template type: a container.
Here is an example:
Template: printer
Type: container
Description: printer configuration
 You can configure a printer in here.

Template: printer/name
Type: String
Default: lp
Description: print name
 This is the name of the printer. If your printer has multiple names
 you can separate them with a `|'.

Template: printer/filter
Type: String
Default: postscript
Description: print filter
 This is the filter through which all print jobs are piped. You can use
 it to convert jobs into something your printer understands.
If you register a new variable in the database and specify the printer template the system will immediately see it needs to make a folder/directory/whatever with the entries name and filter.
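The following sketch parses a templates list and shows what registering a container could create. It assumes that extended description lines start with a space and that templates are separated by blank lines; the function names and the `Extended` key are invented for the example, and nested containers are not handled.

```python
from typing import Dict, Optional

TEMPLATES = """\
Template: printer
Type: container
Description: printer configuration

Template: printer/name
Type: String
Default: lp
Description: print name
 This is the name of the printer.

Template: printer/filter
Type: String
Default: postscript
Description: print filter
"""


def parse_templates(text: str) -> Dict[str, Dict[str, str]]:
    """Parse 'Field: value' stanzas into a dict keyed by template name."""
    templates: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(" "):
            # Continuation line: part of the extended description.
            if current is not None:
                previous = current.get("Extended", "")
                current["Extended"] = (previous + " " + line.strip()).strip()
            continue
        field, _, value = line.partition(":")
        field, value = field.strip(), value.strip()
        if field == "Template":
            current = {}
            templates[value] = current
        elif current is not None:
            current[field] = value
    return templates


def instantiate(templates: Dict[str, Dict[str, str]],
                template_name: str, location: str) -> Dict[str, str]:
    """Return the {path: default} entries created at `location`."""
    template = templates[template_name]
    if template.get("Type", "").lower() != "container":
        return {location: template.get("Default", "")}
    prefix = template_name + "/"
    # A container expands into one entry per member template.
    return {f"{location}/{name[len(prefix):]}": member.get("Default", "")
            for name, member in templates.items()
            if name.startswith(prefix)}
```

Registering the printer template at a hypothetical location such as /printers/office would then yield the entries /printers/office/name and /printers/office/filter with their defaults.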
Now that we have defined how to store template information we can easily keep track of all variables in the database. Life is sweet.
Of course applications can use the database and meta-database directly. But there should also be a way to interact with the user that is simple and modular enough to be used with systems ranging from shell-scripts to Fortran programs. To do this we define a general frontend that can be driven using the simplest and most common form of communication: stdin and stdout.
Using this simple form of communication gives us a great advantage: it becomes easy to change the frontend. That means the user can switch between a console, a graphical or even a web-interface at will.
Besides being able to switch between types of frontends there is another important aspect of a good user interface: user friendliness. We have to account for the fact that some users know more than others, and change the information we show to or ask from the user accordingly. We do this by giving everything a priority and giving the user control over what kind of questions he wants to see. Experts can request to see everything, while novices get the option of seeing only important questions. Finally there is an option to simply skip all questions, so it becomes possible to do automatic configuration using default values or values that are downloaded into the database from a remote location. This makes it simple, for example, to install and manage clusters or lab rooms, or to do installs for dummies.
This communication between the frontend and the application should be as simple as possible. Since most IO implementations default to line-buffered IO, we use a simple language where each command is exactly one line. A preliminary list of commands is:
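The priority filtering can be sketched as follows. The numeric priority scale, the `Question` class and the `configure` function are assumptions for the example; the text does not define the priority levels.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Question:
    name: str
    priority: int  # hypothetical scale: higher means more important
    default: str


def configure(questions: List[Question], threshold: int,
              ask: Callable[[Question], str],
              skip_all: bool = False) -> Dict[str, str]:
    """Ask only questions at or above the user's threshold.

    Questions below the threshold, or all questions when skip_all is set,
    silently receive their default value.
    """
    answers: Dict[str, str] = {}
    for question in questions:
        if skip_all or question.priority < threshold:
            answers[question.name] = question.default
        else:
            answers[question.name] = ask(question)
    return answers
```

An expert would pick a low threshold and see everything, a novice a high one, and an automated install would set `skip_all` so that only defaults (or values already loaded into the database) are used.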
[ This conversion to XML is not done yet. ]
Debian has had an excellent packaging system for a long time now. There is one thing missing though: a system to handle the configuration of packages so we don't have to stop the installation every time a package needs some data from the user or wants to show some information.
We want to make a package which does not break older dpkg's, and we want to be able to get the configuration information before the package is unpacked. To do this we add two new files, config and templates, to control.tar.gz of a .deb package. Since all installation software (apt, dselect, dpkg) downloads the package before installing it, we can extract these files before the package is unpacked. Since older dpkg's will not process the extra files, we can do two things: either create an extra assertion --assert-configmodule in dpkg which is checked in the preinst, or up the version number of the package.
The templates file lists the templates for variables that this package uses. This is done using the format as used in the example in the text on meta-data.
The config-file contains a new element, which I call the configmodule. This is a program that will determine the configuration before the package is unpacked. This means it is run before the preinst, and before the package is unpacked! Unless pre-depends are used, this will mean that the module can only assume the base-system is installed. This is done to make sure that we can use the desired configuration in the preinst if necessary.
How does the configmodule get its information? The configmodule needs a way to retrieve information from the configuration space, ask the user for information if necessary, etc. But we don't want to implement a user interface for each package. To solve this we use a separate frontend as specified in the text on frontends.
In some cases it would be nice for the user to be able to move backwards to a previous question the configmodule asked (of course, it's always possible to step back to an entirely different configmodule as well). To accomplish this, a configmodule can use the CAPB command to tell the frontend it supports moving backward (i.e. "CAPB backup"). Any script that does this should check the return value from the GO command to see if the frontend has returned a status code asking it to back up a step. If so, it should back up (i.e. jump back to where it asked the frontend to present the previous block of questions). This obviously makes the configmodule more complicated, and won't be needed in simple cases. If the configmodule indicates it has this capability, the frontend's most likely response is to add a "go back" option/button to each prompt it displays.
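The control flow a configmodule needs for this can be sketched as a small loop over blocks of questions. The `go` callable stands in for sending GO to the frontend and reading its status; the numeric values of `OK` and `BACKUP` are assumptions, since the text does not specify the status codes.

```python
from typing import Callable, List

OK = 0        # assumed status code for "continue"
BACKUP = 30   # assumed status code for "go back one step"


def run_blocks(blocks: List[str], go: Callable[[str], int]) -> List[str]:
    """Present blocks of questions in order, honouring backup requests.

    `go(block)` presents one block and returns the frontend's status.
    Returns the sequence of blocks that were presented.
    """
    shown: List[str] = []
    index = 0
    while index < len(blocks):
        status = go(blocks[index])
        shown.append(blocks[index])
        if status == BACKUP:
            if index == 0:
                # Nothing earlier in this configmodule; stop and leave it
                # to the frontend to step back to another configmodule.
                break
            index -= 1
        else:
            index += 1
    return shown
```

Keeping the blocks in a list and moving an index back and forth is what makes the configmodule more complicated than a straight-line script, which is why simple packages can skip this capability.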