Config

class snappl.config.Config(configfile=None, clone=None, files_read=None, _ok_to_call=False, _recursed=False)[source]

Bases: object

Interface for yaml config file.

Read a yaml file that might include other yaml files, and provide an interface. The top level of the yaml must be a dict. Only supports dicts, lists, and scalars.

USAGE

  1. Instantiate a config object with:

    confobj = Config.get()
    

    or:

    confobj = Config.get(filename)
    

    In the former case, it will get the default file (see below). IMPORTANT: Do NOT instantiate a config object with config=Config().

    The default file: normally, the default file is specified in the environment variable SNPIT_CONFIG. The first time you call Config.get() without any arguments, it will set the default config to be what it read from the file pointed to by $SNPIT_CONFIG, and return that config. You can subvert this by calling Config.get(filename,setdefault=True). In that case, it will read the file in filename, and set the config there to be the default config that you’ll thereafter get when calling Config.get() without any arguments.

    If the config file has a lot of levels to it, and you are only interested in a subset, you can do:

    confobj = Config.get( prefix='toplevel.midlevel' )
    

    Thereafter, if you do confobj.value('sublevel.value'), it will be equivalent to having done confobj.value('toplevel.midlevel.sublevel.value') on an object you get with just Config.get().

  2. (Optional.) You can set things up so that (almost) anything in the config can be overridden on the command line. You must be using argparse for this to work. First, instantiate your argparse.ArgumentParser object and add your own arguments. Next, call the augment_argparse() method of your Config object. Run the parse_args() method of your ArgumentParser, and then pass the return value to the parse_args() method of your Config object. For example:

    from snappl.config import Config
    import argparse
    
    cfg = Config.get()
    
    parser = argparse.ArgumentParser( 'test.py', description='Do things' )
    parser.add_argument( "-m", "--my-argument", help="My argument; there may be more" )
    cfg.augment_argparse( parser )
    args = parser.parse_args()
    cfg.parse_args( args )
    

    Config.augment_argparse will add all of the “leaf node” config options as config arguments, using the fieldspec (see (3) below), replacing “.” with “-”. Exception: if there is a list, it will not work down into the list, but will replace the whole list with a single multi-valued argument. For example, if your config is:

    scalar: val
    
    dict:
      key1: val
      key2:
        subkey1: val
        subkey2: val
      list:
        - val
        - val2
    

    Then, when you call Config.augment_argparse, you will have new arguments:

    --scalar
    --dict-key1
    --dict-key2-subkey1
    --dict-key2-subkey2
    --dict-list   ( with nargs="*" )
    

    You should ignore these; when you call Config.parse_args, it will look for all of them.

  3. Get the value of something in your config with:

    configval = confobj.value( fieldspec )
    

    where fieldspec is just the field for a scalar, or .-separated fields for lists and dicts. For lists, it must be a (0-offset) integer index. For example, if the yaml file includes:

    storage:
      images:
        format: fits
        single_file: false
        name_convention: "{inst_name}_{date}_{time}_{section_id}_{band}_{im_type}_{prov_hash:.6s}"
    

    then confobj.value("storage.images.format") will return "fits". You can also ask confobj.value for higher levels. For example, confobj.value("storage.images") will return a dictionary:

    { "format": "fits",
      "single_file": False,
      "name_convention": "{inst_name}_{date}_{time}_{section_id}_{band}_{im_type}_{prov_hash:.6s}"
    }
    
  4. Change a config value with:

    confobj.set_value( fieldspec, value )
    

    This only changes it for the running session; it does not affect the YAML files in storage. This will usually not work: to use it, you must have set static to False when calling Config.get. You should use this with great care, and if you’re using it outside of tests, make sure to carefully evaluate your life choices. The whole point of this config system is that it’s an interface to config files, so if you’re making runtime changes, then things are scary.
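The naming rule described in step 2 (leaf nodes become options, “.” becomes “-”, lists become a single multi-valued option) can be sketched in plain argparse. This is only an illustration of the rule, not the code behind augment_argparse; the helper name add_leaf_arguments and the choice of default=None are assumptions made for the example.

```python
import argparse


def add_leaf_arguments(parser, tree, path=""):
    """Hypothetical helper: add one --option per leaf of a nested dict."""
    for key, val in tree.items():
        name = f"{path}-{key}" if path else str(key)
        if isinstance(val, dict):
            # Recurse into dictionaries, joining the levels with "-".
            add_leaf_arguments(parser, val, name)
        elif isinstance(val, list):
            # Do not descend into lists; the whole list is one argument.
            parser.add_argument(f"--{name}", nargs="*", default=None)
        else:
            parser.add_argument(f"--{name}", default=None)


config = {
    "scalar": "val",
    "dict": {
        "key1": "val",
        "key2": {"subkey1": "val", "subkey2": "val"},
        "list": ["val", "val2"],
    },
}

parser = argparse.ArgumentParser("test.py", description="Do things")
add_leaf_arguments(parser, config)
args = parser.parse_args(["--dict-key1", "new", "--dict-list", "a", "b"])
# args.dict_key1 is now "new" and args.dict_list is ["a", "b"];
# options that were not given on the command line stay None.
```

Leaving unspecified options as None is one way a later parse_args step could tell "not given" apart from an explicit value; whether the real class does this is not stated here.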

CONFIG FILES

This class reads yaml files (which can have other included yaml files). The top level structure of a config file must be a dictionary.

When reading a config file, it is processed as follows:

The “current working config” starts as an empty dictionary ({}). When everything is done, it can be a big messy hierarchy. Each key of the top level dictionary can have a value that is a scalar, a list, or a dictionary. The structure is recursive; each element of each list can be a scalar, a list, or a dictionary, and the value associated with each key in each dictionary can itself be a scalar, a list, or a dictionary.

SUBSTITUTION

Any scalar value that has ${something} in it will have ${something} replaced. The replacement will first look to see if in the current tree there is a config value that matches something; if so, then that is replaced. (This is for internal references.) Failing that, it will try to find the environment variable something. If it exists, then that is replaced.

It will iterate through this repeatedly until nothing changes. (Thought required: it’s possible somebody could set up an infinite loop with the right config variables doing this…. Should perhaps put in circular reference detection, but for now there will just be a limit to the number of iterations.) That way, you can have something reference another config option which in turn references an env var, and at the end it will all work.

Note that something must only consist of characters in the range a-z, A-Z, 0-9, and _ (which is standard for environment variables), plus . (for back references). If you’ve named a config option you want to refer to with something else (e.g. using α or é), you’re SOL. Likewise if you have env vars named that way. So, for example, if you have this config file:

top:
  sub1:
    val1: ${HOME}
  thing: ${top.sub1.val1}

And your home directory is /home/you, then after config parsing is done, .value('top.sub1.val1') and .value('top.thing') will both return /home/you.

This substitution is done at the end, after all includes have been pulled in, so “forward references” are possible, though I would recommend avoiding using that as you’re just likely to confuse yourself. Keep it simple.
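The substitution pass described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the class's own code: the function names, the MAX_PASSES value, and the choice to leave unresolvable names untouched are all hypothetical.

```python
import os
import re

# Names may contain only a-z, A-Z, 0-9, "_" and "." (as described above).
_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")
MAX_PASSES = 20  # hypothetical cap; the real iteration limit is not stated


def lookup(tree, fieldspec):
    """Return the value at a dotted fieldspec, or None if it is absent."""
    node = tree
    for part in fieldspec.split("."):
        if isinstance(node, dict) and part in node:
            node = node[part]
        else:
            return None
    return node


def substitute(tree):
    """Expand ${name} in every string scalar until nothing changes."""
    def expand(node):
        if isinstance(node, dict):
            return {k: expand(v) for k, v in node.items()}
        if isinstance(node, list):
            return [expand(v) for v in node]
        if not isinstance(node, str):
            return node

        def repl(match):
            name = match.group(1)
            internal = lookup(tree, name)  # internal reference first
            if isinstance(internal, (str, int, float)):
                return str(internal)
            # Then the environment; unknown names are left as they are.
            return os.environ.get(name, match.group(0))
        return _REF.sub(repl, node)

    for _ in range(MAX_PASSES):
        new = expand(tree)
        if new == tree:
            return new
        tree = new
    raise RuntimeError("substitution did not converge")


os.environ["HOME"] = "/home/you"  # pinned only to make the example repeatable
config = {"top": {"sub1": {"val1": "${HOME}"}, "thing": "${top.sub1.val1}"}}
resolved = substitute(config)
# Both resolved["top"]["sub1"]["val1"] and resolved["top"]["thing"]
# end up as "/home/you", the second one after an extra pass.
```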

INCLUDES: SPECIAL KEYS

A config file can have several special keys:

preloads
replaceable_preloads
augments
overrides
destructive_appends
appends

The value associated with each key is a list of file paths relative to the directory where this file is found. All of this can get quite complicated, so use a lot of caution when using it. The safest thing to do is to only use preloads and augments.

HOW PARSING THE CONFIGS WORKS

To really understand the following, you have to read “DEFINITION OF TERMS” below. Repeating what is said above, and will be said again, all of this is very complicated, so to be safe you may wish to never use any of the special keys other than “preloads” and “augments”.

preloads is a list of files which are read first, in order, to make a config dictionary (called the “preload config dictionary”). Files later in the list augment the config info from files earlier in the list. This config dictionary is set aside for the time being.

replaceable_preloads is a list of files read next, in order, to make a new config dictionary (called the “working config dictionary”). Files later in the list destructive_append files earlier in the list.

The current file is parsed next. It does a destructive_append on the working config dictionary (which will just be {} if there aren’t any replaceable_preloads). Then, the working config dictionary augments the preload config dictionary, and the result is the new working config dictionary.

augments is a list of files read next, in order. Each one augments the current working dictionary.

overrides is a list of files read next, in order. Each one overrides the current working dictionary.

destructive_appends is a list of files read next, in order. Each one does a destructive_append on the current working dictionary.

appends is a list of files read last. Each one appends to the current working dictionary.

Any file that’s read can itself have the special keys indicating other files to include. If there is any circular inclusion – or, really, if any file is referred to more than once when the whole thing is being parsed – that is an error and will raise an exception. (This isn’t a perfect protection. You can do things with symbolic links to get around this and effectively have one file include another which then includes the first one again. Just don’t do that.)
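The "no file may be referred to more than once" rule can be sketched with a shared set of names already seen. This is an assumption-laden illustration: the function collect_includes and the in-memory include map are hypothetical, and the traversal order here does not try to reproduce the preload/augment ordering described above.

```python
def collect_includes(name, includes, files_read=None):
    """Hypothetical walker: visit an include graph, refusing repeats."""
    files_read = set() if files_read is None else files_read
    if name in files_read:
        # Covers true cycles and plain double inclusion alike.
        raise RuntimeError(f"{name} is referred to more than once")
    files_read.add(name)
    order = [name]
    for child in includes.get(name, []):
        order.extend(collect_includes(child, includes, files_read))
    return order


# A stand-in for files on disk: each name maps to the files it includes.
includes = {"main.yaml": ["a.yaml", "b.yaml"], "a.yaml": ["c.yaml"]}
order = collect_includes("main.yaml", includes)
```

Comparing plain names, as done here, is exactly the kind of check that symbolic links can defeat, which matches the caveat in the paragraph above.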

DEFINITION OF TERMS

Above, the words “destructive_append”, “augment”, “override”, and “append” were used to describe how to combine information from two different files. Exactly what happens is complicated; if you really want to know, see the source code of:

util/config.py::Config._merge_trees()

Here’s an attempt to define it. For all the operations below, we are trying to combine two values– call them the “left” value and “right” value. Initially, that’s the two dictionaries that are the top level things being combined, but later it might be something else. To compare two values:

augment

This is the safest one. If you try to set a config option that is already set, it will raise an exception. This is what you use if you want to protect yourself against accidentally setting the same option in more than one file and not realizing it, which can lead to all kinds of confusion. Indeed, if you’re worried about this, never use anything other than preloads and augments.

  • If the left and right values have different types (scalar vs. list vs. dict), this is an error. This will never happen at the very top level, because both left and right are dictionaries at the top level.

  • If the current item being compared is a list or a scalar, then this is an error; you’re not allowed to replace an already-existing list or scalar config option.

  • If the current item being compared is a dictionary, then merge the two dictionaries. Any keys in the right dictionary that aren’t in the left dictionary are added to the left dictionary with the value from the right dictionary. If a key is already in both dictionaries, then it recurses using the augment method.

append

Generally speaking, stuff in the right value is added to stuff in the left value, but nothing from the left value will be replaced.

  • If the items being compared have different types (scalar vs. list vs. dict), this is an error. This will never happen at the very top level, because both left and right are dictionaries at the top level.

  • If the item being compared is a list, then the right list extends the left list. (Literally using list.extend().)

  • If the item being compared is a scalar, then this is an error.

  • If the current item being compared is a dictionary, then merge the two dictionaries. Any keys in the right dictionary that aren’t in the left dictionary are added to the left dictionary with the value from the right dictionary. If a key is already in both dictionaries, then it recurses using the append method.

destructive_append

Works much like augment with the exception that if the item being compared is a scalar, then the right value replaces the left value.

override

Generally speaking, when overriding, the right value replaces the left value, but there are wrinkles.

  • If the items being compared have different types (scalar vs. list vs. dict), the new (right) value replaces the old (left) value. This will never happen at the very top level, because both left and right are dictionaries at the top level. Be warned: you can wipe out entire trees of config options here! (Imagine if the left tree had a dictionary and the right tree had a scalar.)

  • If the current item being compared is a dictionary, then the dictionaries are merged in exactly the same manner as “append”, with the modification that recursing down into the dictionary passes along the fact that we’re overriding rather than appending.

  • If the current item being compared is a list, then the right list replaces the left list. (This could potentially throw away a gigantic hierarchy of lists and dicts and scalars from the left side, which is as designed.)

  • If the item being compared is a scalar, then the right value replaces the left value.

This can be very confusing, so keep your config files simple.
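As a reading aid, the rules for augment, append, and override can be condensed into one small recursive function. This is a sketch written from the prose above, not a copy of _merge_trees(); the function names are hypothetical, destructive_append is deliberately left out, and unlike list.extend() this version returns new objects instead of modifying the left value in place.

```python
def kind(value):
    """Classify a value as 'dict', 'list', or 'scalar'."""
    if isinstance(value, dict):
        return "dict"
    if isinstance(value, list):
        return "list"
    return "scalar"


def merge(left, right, mode):
    """Combine right into left; mode is 'augment', 'append', or 'override'."""
    if kind(left) != kind(right):
        if mode == "override":
            return right  # may discard a whole subtree of left
        raise TypeError("left and right values have different types")
    if isinstance(left, dict):
        merged = dict(left)
        for key, val in right.items():
            # New keys are added; shared keys recurse with the same mode.
            merged[key] = merge(left[key], val, mode) if key in left else val
        return merged
    if mode == "override":
        return right  # lists and scalars are simply replaced
    if mode == "append" and isinstance(left, list):
        return left + right
    raise ValueError(f"{mode} may not replace an existing {kind(left)}")


left = {"a": 1, "d": {"x": [1], "y": "s"}}
augmented = merge(left, {"b": 2}, "augment")
appended = merge(left, {"d": {"x": [2]}}, "append")
overridden = merge(left, {"d": 5}, "override")
# merge(left, {"a": 2}, "augment") raises ValueError: "a" is already set.
```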

WARNINGS

  • Won’t work with any old yaml file. Top level must be a dictionary. Don’t name your dictionary fields as numbers, as the code will then detect it as a list rather than a dict.

  • The yaml parser seems to be parsing “yyyy-mm-dd” strings as a datetime.date; beware.

  • python’s yaml reading of floats is broken. It will read 1e10 as a string, not a float. Write 1.0e+10 to make it work right. There are two things here: the first is the + after e (which, I think is part of the yaml spec, even though we can freely omit that + in Python and C). The second is the decimal point; the YAML spec says it’s not necessary, but python won’t recognize it as a float without it.

Don’t directly instantiate a Config object; call the static method Config.get() instead.

Parameters:
  • configfile (str or Path, or None)

  • clone (Config object, default None) –

    If clone is not None, then build the current object as a copy of the config object passed in clone. In this case, the returned config object is modifiable.

    Otherwise, read the configfile and build the object based on that; in this case, the returned config object is not supposed to be modified, and set_value won’t work. (Of course, you can always go and muck about directly with the _data property, but don’t do that!)

Methods Summary

augment_argparse(parser[, path, _dict])

Add arguments to an ArgumentParser for all config values.

delete_field(field[, missing_ok])

Remove a field from the config.

dump_to_dict_for_params([omitkeys, keepkeys])

Dump the config to a dictionary suitable for use in a Provenance params field.

get([configfile, setdefault, prefix, ...])

Returns a Config object.

init([configfile, setdefault])

Initialize configuration globally for process.

parse_args(args[, path, _dict])

Update config options from argparse arguments.

set_value(field, value[, structpass, ...])

Set a value in the config object.

value([field, default, struct])

Get a value from the config structure.

Methods Documentation

augment_argparse(parser, path='', _dict=None)[source]

Add arguments to an ArgumentParser for all config values.

See the Config docstring for instructions on use.

Parameters:
  • parser (ArgumentParser) – The ArgumentParser to which additional arguments should be added.

  • path (str) – Used internally for recursion.

  • _dict (dict) – Used internally for recursion.

delete_field(field, missing_ok=False)[source]

Remove a field from the config.

Use this with great care.

dump_to_dict_for_params(omitkeys=['system'], keepkeys=None)[source]

Dump the config to a dictionary suitable for use in a Provenance params field.

Specify one of omitkeys or keepkeys.

Parameters:
  • omitkeys (None, or list of str) –

    This is a list of keys to delete from the config before exporting it. (The internal state of the config will not be affected, only what is exported.) Be careful not to list a subkey of a key that’s already earlier in the list, or you’ll get errors.

    By default, the top-level key “system” is deleted, as per the Roman SNPIT standard that this holds all of the (but only the) system-specific config needed to run at a particular place. (system should not include anything that changes the behavior of the code.)

    However, this default is a bit profligate, as it will keep all of the config options for all codes, not just the code you’re running right now. Use with thought.

  • keepkeys (None, or list of str) – This is a list of keys to keep in the export. Currently, only top-level keys are supported.

Returns:

This is a deep copy of the internal dictionary, so ideally you should be able to do anything you want to it without screwing up the internal config state.

Return type:

dict
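The omitkeys/keepkeys behaviour can be pictured with a small stand-alone function. This is a sketch under assumptions, not the method itself: the name dump_for_params is hypothetical, dotted subkeys in omitkeys are assumed from the warning about listing a subkey after its parent, and letting keepkeys take precedence over omitkeys is a choice made only for this example.

```python
import copy


def dump_for_params(data, omitkeys=("system",), keepkeys=None):
    """Hypothetical sketch: export a deep copy with keys omitted or kept."""
    if keepkeys is not None:
        # Only top-level keys are supported for keepkeys.
        return {k: copy.deepcopy(v) for k, v in data.items() if k in keepkeys}
    out = copy.deepcopy(data)  # the stored config is never touched
    for spec in omitkeys or ():
        *parents, leaf = spec.split(".")
        node = out
        for part in parents:
            node = node[part]
        # Raises KeyError if a parent was already removed earlier in the list.
        del node[leaf]
    return out


data = {"system": {"db": "x"}, "photometry": {"a": 1, "b": 2}}
default_dump = dump_for_params(data)
partial_dump = dump_for_params(data, omitkeys=["photometry.a"])
kept_dump = dump_for_params(data, keepkeys=["system"])
```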

static get(configfile=None, setdefault=None, prefix=None, static=True, reread=False, clone=None)[source]

Returns a Config object.

Parameters:
  • configfile (str or pathlib.Path, default None) – The config file to read (if it hasn’t been read before, or if reread is True). If None, will return the default config context for the current session (which is normally the one in the file pointed to by environment variable SNPIT_CONFIG, but see “setdefault” below). If that env var is needed but not set, then an exception will be raised.

  • setdefault (bool, default None) –

    Avoid use of this, as it is mucking about with global variables and as such can cause confusion. If True, set the Config object read by this method to be the session default config. If False, never set the Config object read by this method to be the session default config. If not specified, which is usually what you want, then if configfile is None, the configfile in SNPIT_CONFIG will be read and set to be the session default Config; if configfile is not None, read that config file, but don’t make it the session default config.

    Normal usage of Config is to make a call early on to either Config.init() or Config.get() without parameters. That will read the config file in SNPIT_CONFIG and make that the default config for the process. If, for some reason, you want to read a different config file and make that the default config file for the process, then pass a configfile here and make setdefault True. If, for some truly perverse reason, you want to read the config in SNPIT_CONFIG but not set it to the session default, then call Config.get(setdefault=False), and question your life choices.

  • prefix (string, default None) –

    If not None, then all calls to the .value() and .set_value() methods of the config object will add this string (followed by a .) to the string you actually pass. So, for instance, if your config file consists of:

    toplevel:
      midlevel1:
        sublevel1:
          val1: 1
          val2: 2
        sublevel2:
          str1: cat
          str2: kitten
      midlevel2:
        foo: bar
    

    then, if you did:

    cfg = Config.get( prefix='toplevel.midlevel1' )
    

    then cfg.value('sublevel1.val1') would return 1, and cfg.value('sublevel1.val2') would return 2. This is here as a convenience to save you from typing a bunch of extra stuff when within one function you only need part of the config hierarchy.

  • static (bool, default True) – If True (the default), then you get one of the config object singletons described below. In this case, it is not permitted to modify the config. If False, you get back a clone of the config singleton, and that clone is not stored anywhere other than the return value. In this case, you may modify the config. Call Config.get(static=False) to get a modifiable version of the default config.

  • reread (bool, default False) – If True, then the config file will be reread if it’s already been cached. If static is True and reread is True, then the singleton will be modified (meaning that thereafter, whenever you get() that singleton, you’ll get the new config values that were just reread here). If static is False and reread is True, then the returned Config object will read the config files from disk, but will not change the singleton. Ignored if clone is not None.

  • clone (Config, default None) – If given, return a clone of this Config object. The returned object, if all is working properly, is a deep copy, so it should be safe to mangle it. It’s not possible to set a cloned config object as default, nor is the cloned object ever set as a singleton, so setdefault and static are both ignored and treated as False when making a clone.

Return type:

Config object

Config objects are stored as an array of singletons (as class variables of the Config class). That is, if you pass a config file that has been passed before in the current execution context, you’ll get back exactly the same object each time (unless static is False). If you pass a config file that hasn’t been passed before, it will read the indicated configuration file, cache an object associated with it for future calls, and return that object (unless static is False, in which case the object is still cached, but you get a copy of that object rather than the object itself).

If you don’t pass a config file, then you will get back the default config object. If there is no default config object (because neither Config.get() nor Config.init() have been called previously), and if the class is not configured with a “default default”, then an exception will be raised.
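The interplay of the cache, static, and reread can be pictured with a toy class. This is an illustrative sketch only: CachedConfig, its loader argument, and the in-place update on reread are assumptions made for the example and say nothing definite about how Config stores its singletons.

```python
import copy


class CachedConfig:
    """Toy stand-in for a per-file cache of config objects."""

    _cache = {}  # one shared object per config file name

    def __init__(self, data):
        self.data = data

    @classmethod
    def get(cls, name, loader, static=True, reread=False):
        if name not in cls._cache:
            cls._cache[name] = cls(loader(name))
        if not static:
            # A private, modifiable object; the cached one is left alone.
            if reread:
                return cls(loader(name))
            return copy.deepcopy(cls._cache[name])
        if reread:
            # Refresh the shared object so every holder sees the new values.
            cls._cache[name].data = loader(name)
        return cls._cache[name]


def loader(name):
    """Hypothetical loader; a real one would parse the file called name."""
    return {"val": 1}


shared_a = CachedConfig.get("example.yaml", loader)
shared_b = CachedConfig.get("example.yaml", loader)
private = CachedConfig.get("example.yaml", loader, static=False)
private.data["val"] = 2  # does not leak into the shared object
```

Here shared_a and shared_b are the same object, while private is an independent copy.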

static init(configfile=None, setdefault=None)[source]

Initialize configuration globally for process.

Parameters:
  • configfile (str or pathlib.Path, default None) – See documentation of the configfile parameter in Config.get

  • setdefault (bool, default None) – See documentation of the setdefault parameter in Config.get

parse_args(args, path='', _dict=None)[source]

Update config options from argparse arguments.

See the docstring for the Config class for instructions on using this.

Parameters:
  • args (Namespace) – Something returned by argparse.ArgumentParser.parse_args()

  • path (string) – Used internally for recursion

  • _dict (dict) – Used internally for recursion

set_value(field, value, structpass=None, appendlists=False)[source]

Set a value in the config object.

If the config object was created with static=True (which is the case for all the singleton objects stored in the Config class), use of this method raises an exception.

Parameters:
  • field (str) – See value() for more information

  • value (str, int, float, list, or dict)

  • structpass (some object with a ".struct" field) – Used internally when the Config object is building its own _data field; don’t use externally

  • appendlists (bool, default False) – If true and if field is a pre-existing list, then value is appended to the list. Otherwise, value replaces the pre-existing field if there is one.

Does not save to disk. Follows the standard rules documented in "augment" and "override"; if appendlists is True, uses "augment", else "override". Will create the whole hierarchy if necessary.

value(field=None, default=<snappl.config.NoValue object>, struct=None)[source]

Get a value from the config structure.

Parameters:
  • field (str) –

    The field specification, relative to the top level of the config. So, to get the value of a keyword aligned to column 0 of the config file, the field is just that keyword. For trees, separate fields by periods. If there is an array somewhere in the tree, then the array index as a number is the field for that branch.

    For example, if the config yaml file is:

    scalar: value

    dict1:
      dict2:
        sub1: 2level1
        sub2: 2level2

    dict3:
      list:
        - list0
        - list1

    then you could get values with:

    configobj.value( "scalar" ) --> returns "value"
    configobj.value( "dict1.dict2.sub2" ) --> returns "2level2"
    configobj.value( "dict3.list.1" ) --> returns "list1"

    You can also specify a branch to get back the rest of the subtree; for instance configobj.value( "dict1.dict2" ) would return the dictionary { "sub1": "2level1", "sub2": "2level2" }.

    If this is None, return the entire config tree as a dictionary (or, if working with a config created with prefix=, maybe a list).

  • default (object, default NoValue instance) – Used internally, don’t muck with this.

  • struct (dict, default None) – If passed, use this dictionary in place of the object’s own config dictionary. Avoid use.

Returns:

If a list or dict, you get a deep copy of the original list or dict. As such, it’s safe to modify the return value without worrying about changing the internal config. (If you want to change the internal config, use set_value().)

Return type:

int, float, str, list, or dict
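The fieldspec lookup and the deep-copy return described for value() can be sketched in a few lines. This is a minimal illustration, not the method's own code; the name get_value is hypothetical, and the prefix and default handling of the real method are not modelled.

```python
import copy


def get_value(data, field=None):
    """Hypothetical sketch: look up a dotted fieldspec in dicts and lists."""
    node = data
    for part in ([] if field is None else field.split(".")):
        if isinstance(node, list):
            node = node[int(part)]  # list levels take a 0-offset index
        elif isinstance(node, dict):
            node = node[part]
        else:
            raise KeyError(f"cannot descend into scalar at {part!r}")
    # Copy containers so callers cannot alter the stored config.
    return copy.deepcopy(node)


data = {
    "scalar": "value",
    "dict1": {"dict2": {"sub1": "2level1", "sub2": "2level2"}},
    "dict3": {"list": ["list0", "list1"]},
}

leaf = get_value(data, "dict1.dict2.sub2")   # "2level2"
item = get_value(data, "dict3.list.1")       # "list1"
branch = get_value(data, "dict1.dict2")
branch["sub1"] = "changed"  # only the copy changes, not data
```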