Usage

Overview

snappl has a set of utilities for the Roman SNPIT, including all the classes and functions necessary for communicating with the internal snpit database.

If you’re here for the SNPIT November 2025 pipeline test, see November 2025 SNPIT Pipeline Test Primer.

Things you need to understand:

November 2025 SNPIT Pipeline Test Primer

The database connection is under heavy development, and more things are showing up every day. Right now, the following is available:

  • Find image L2 images in the database

  • Find segmentation maps in the database

  • Saving newly discovered DiaObjects to the database

  • Finding DiaObjects

  • Saving updated positions for DiaObjects

  • Saving lightcurves to the database

  • Finding and reading lightcurves from the database

  • Saving 1d transient spectra to the database

  • Finding and reading 1d transient spectra from the database

This section describes what you need to do in order to connect to the database.

WARNING: because everything is under heavy development, it’s possible that interfaces will change. We will try to avoid doing this, because it’s a pain when everybody has to adapt, but we’re still building this, so it may be inevitable.

Recipes

Read everything to understand what’s going on, but these are intended as a quick start.

Prerequisites

These recipes assume you have a working environment (see Choose a working environment), which hopefully is as simple as pip install roman-snpit-snappl, but see that section for all the details and where you eventually need to be heading.

They also assume you have set up a config file. If you’re on NERSC, not running in a container, then save the config file for running natively on NERSC. (This is the file nov2025_nersc_native_config.yaml from the top level of the environment roman snpit github archive). Note that you will need to do two small edits in this file! If you are running in a podman container, then look at the in-container config file. (This is the file nov2025_nersc_container_config.yaml from the top level of the environment roman snpit github archive); you will also need to download interactive-podman-nov2025.sh. If you are elsewhere, you will need to edit the config file to have the right paths to find things on your system.

You need to set the environment variable SNPIT_CONFIG to point to where this configuration file lives.

Finally, once at the top of your code you need to do:

from snappl.dbclient import SNPITDBClient

dbclient = SNPITDBClient.get()

You can then make futher connections to the database using this dbclient variable. Many snappl functions that connect to the database take an optional dbclient parameter, to which you can pass this. By default, most of them will, in the background, just use the first one that was created with the first call to SNPITDBClient.get(). (See the docstring for SNPITDBClient.get if you are morbidly curious.)

Variables/Arguments for IDs and Provenances

Below, you will be told you need to know a number of object ids and/or provenance-related values. These will generally be provided by orchestration. You should make them things that can be passed on the command line. I recommend using the following command-line arguments — choose the ones that you need (they are all string values):

--diaobject-id
--diaobject-provenance-tag
--diaobject-process
--diaobject-position-provenance-tag
--diaobject-position-process
--image-id
--image-provenance-tag
--image-process
--segmap-provenance-tag
--segmap-process
--ltcv-provenance-tag
--ltcv-process
--spec1d-provenance-tag
--spec1d-process

If you just plan to call functions from python, then you can make the things you need keyword arguments. I recommend using the same names as above, only replacing the dashes with underscores (and removing the two dashes at the beginning). The examples below all assume that the variables have these names.

For a list of provenance tags we will be using during the November 2025 run, see Provenance Tags We Will Use In November 2025.

Finding a DiaObject

You need to know either the diaobjectd_id of the object (which you will generally be given), or you need to know the diaobject_provenance_tag and diaobject_process, and you must have enough search criteria to find the object. If you’re doing the latter, read the docstring on snappl.diaobject.DiaObject.find_objects. For the former:

from snappl.diaobject import DiaObject

diaobject = DiaObject.get_object( diaobject_id=diaobject_id )

The returned DiaObject object has, among other things, properties .id, .ra and .dec.

Getting an updated diaobject position

You need the diaobject_position_provenance_tag and diaobject_position_process.

Do:

diaobj_pos = diaobject.get_position( provenance_tag=diaobject_position_provenance_tag,
                                     process=diaobject_position_process )

You get back a dictionary that has a number of keys including ra and dec.

Getting a Specific Image

Orchestration has given you a image_id that you are supposed to do something with. E.g.,, you are running sidecar, and you’re supposed to subtract and search this image.

from snappl.image import Image

image = Image.get_image( image_id )

You will get back an Image object (really, an object of a class that is a subclass of Image). It has a number of properties. Most important are .data, .noise, and .flags, which hold 2d numpy arrays. There is also a .get_fits_header() method that currently works, but be careful using this as this method will not work in the future when we’re using ASDF files. See the docstrings in snappl.image.Image for more details. Some of the stuff you might want is available directly as properties of and Image object.

Finding Images

You need to know the image_provenance_tag and image_process.

See the docstring on snappl.imagecollection.ImageCollection.find_images if you want to do more than what’s below.

Finding all images in a given band that include a ra and dec

Do:

from snappl.image import Image

images = Image.find_images( provenance_tag=image_provenacne_tag, process=image_process,
                            ra=ra, dec=dec, band=band )

where band=band is optional but often useful. (If you don’t specify it, you will get back images from all bands.) You will get back a list of Image objects, which have a number of properties (some of which may be None if they are unknown):

  • data : the data array, in ADU, a 2d numpy array

  • noise : the ADU uncertainty on the data, a 2d numpy array

  • flags : pixel flags, a 2d numpy array of ints [we are still working on definining these]

  • width : int, the width of the image in pixels; horizontal as viewed on ds9; corresponds to data.shape[1]

  • height : int, the height of the image in pixels; vertical as viewed on ds9; corresponds to data.shape[0]

  • image_shape : tuple of ints, (height, width)

  • pointing : int or str, the pointing; warning this property name will change when we know more

  • sca : int, the chip of the image

  • ra : float, the nominal center of the image in decimal degrees (usu. from the header)

  • dec : float, the nominal center of the image in decimal degrees (usu. from the header)

  • coord_center : tuple of (ra, dec) [I THINK], calcualted from the image WCS

  • ra_corner_00 : float, decimal degrees, the RA of pixel (0, 0)

  • ra_corner_01 : float, decimal degrees, the RA of pixel (0, height-1)

  • ra_corner_10 : float, decimal degrees, the RA of pixel (width-1, 0)

  • ra_corner_11 : float, decimal degrees, the RA of pixel (width-1, height-1)

  • dec_corner_00 : float, decimal degrees, the Dec of pixel (0, 0)

  • dec_corner_01 : float, decimal degrees, the Dec of pixel (0, height-1)

  • dec_corner_10 : float, decimal degrees, the Dec of pixel (width-1, 0)

  • dec_corner_11 : float, decimal degrees, the Dec of pixel (width-1, height-1)

  • band : str, the filter/band/whatever you call it for this image

  • mjd : float, mjd (days) when the image exposure started

  • position_angle : float, position angle in degrees north of east (CHECK THIS)

  • exptime float, exposure time in seconds

  • sky_level float; an estimate of the sky level (in ADU) if known, None otherwise

  • zeropoint : float; convert ADU to AB magnitudes with m = -2.5*log10(adu) + zeropoint

Warning: because of how numpy arrays are indexed, if you want to get the flux value in pixel (ix, iy), you would look at .data[iy, ix].

There is also a .get_fits_header() method that currently works, but be careful using this as this method will not work in the future when we’re using ASDF files. See the docstrings in snappl.image.Image for more details.

Finding Segmentation Maps

You need to know the segmap_provenance_tag and the segmap_process.

See the docstring on snappl.segmap.SegmentationMap.find_segmaps for more information on searches you can do beyond what’s below.

Finding all segmaps that include a ra and dec

Do:

from snappl.segmap import SegmentationMap

segmaps = SegmentationMap.find_segmaps( provenance_tag=segmap_provenance_tag,
                                        process=segmap_process,
                                        ra=ra, dec=dec )

You get back a list of SegmentationMap objects. These have a number of properties, most import of which is image, which holds an Image object. You can get the image data for the segmentation map for the first element of the list with segmaps[0].image.data (a 2d numpy array).

Saving a new DiaObject

You are running sidecar and you’ve found a new diaobject you want to save. You need a process (we shall assume process='sidecar' here), the major and minor version of your code, and the params that define how the code runs. The latter is just a dictionary; you can build it yourself, but see Making Provenances below. Finally, assume that images is an list that has the snappl.image.Image objects of the images that you’ve used; replace images[0] below with wherever you have your Image object:

from snappl.provenance import Provenance
from snappl.diaobject import DiaObject

imageprov = Provenance.get_by_id( images[0].provenance_id )
prov = Provenance( process='sidecar', major=major, minor=minor, params=params,
                   upstreams=[ imageprov ] )
# You only have to do this next line once for a given provenance;
#   once the provenance is in the database, you never need to save it again.
prov.save_to_db( tag=diaobject_provenance_tag )   # See note below

diaobj = DiaObject( provenance_id=prov.id, ra=ra, dec=dec, name=optional, mjd_discovery=mjd )
diaobj.save_object()

This will save the object to the database. You can then look at diaobj.id to see what UUID it was assigned. You do not need to give it a name, but you can if you want to. (The database uses the id as the unique identifier.) mjd_discovery should be the MJD of the science image that the object was found on.

Finding and reading lightcurves

You need to know the ltcv_provenance_tag and ltcv_process, and the diaobject_id of the object for which you want to get lightcurves:

from snappl.lightcurve import Lightcurve

ltcvs = Lightcurve.find_lightcurves( provenance_tag=ltcv_provenance_tag,
                                     process=ltcv_process,
                                     diaobject=diaobject_id,
                                     band=band,         # optional
                                   )

You will get back a list of Lightcurve objects. You can find the actual lightcurve data of the first lightcurve from the list with ltcvs[0].lightcurve. This is an astropy QTable. You can read the metadata from ltcvs[0].lightcurve.meta.

Coming soon: a way to read a combined lightcurve that has all of the bands mixed together. (Not implemented yet.)

Saving lightcurves

You need to make sure you’ve created a dictionary with all the necessary metadata. Also make sure you’ve created a data table with the necessary columns; this can be an astropy Table, a pandas DataFrame, or a dict of lists. We shall call these two things meta and data.

Assume that you’ve made the lightcurve for object diaobject (a DiaObject object), and that you have a list of your images in images. Adjust below for the variables where you really have things. Finally, if you used an updated DiaObject position, make sure you have set the ra and dec in meta from that.

Finally, you will need to know the ltcv_provenance_tag we’re using.

Below, process is probably either campari or phrosty. major and minor are the major and minor parts of the version, which you should parse from campari.__version__ or phrosty.__version__. params are the parameters as described below in Making Provenances.

Do:

from snappl.provenance import Provenance
from snappl.lightcurve import Lightcurve

imgprov = Provenance.get_by_id( images[0].provenance_id )
objprov = Provenance.get_by_id( diaobject.provenance_id )
objposprov = Provenance.get_by_id( diaobj_pos['provenance_id'] )

ltcvprov = Provenance( process=process, major=major, minor=minor, params=params,
                       upstreams=[imgprov, objprov, objposprov] )
# The next line only needs to be run once.  Once you've saved it to the database,
#   you never need to do this again.
ltcvprov.save_to_db( tag=ltcv_provenance_tag )

meta['provenance_id'] = ltcvprov.id
meta['diaobject_id'] = diaobject.id
meta['diaobject_position_id'] = diaobj_pos['id']
for att in [ 'ra', 'dec', 'ra_err', 'dec_err', 'ra_dec_covar' ]:
    meta[att] = diaobj_pos[att]

ltcv = Lightcurve( data=data, meta=meta )
ltcv.write()
ltcv.save_to_db()

You can look at ltcv.id to see the UUID of the lightcurve you saved, in case you are curious.

If you used the ra and dec that was in DiaObject, then meta['diaobject_position_id'] should be None. Skip everything else above that refers to diaobj_pos.

Finding and reading 1d Spectra

Right now we only have 1-d transient spectra, defined by the schema here.

If you already know the ID of the spectrum you want, you can just do:

from snappl.spectrum1d import Spectrum1d

spec = Spectrum1d.get_spectrum1d( spectrum1d_id )
That will return a Spectrum1d object. See its docstring for more information. The most important property is probably combined_data, which has dictionary with four elements; each value of the dictionary is a numpy array of the same length:
  • lamb : the wavelength (in Å)

  • flam : the flux (in what erg/s/cm²/Å)

  • func : the uncertainty on flam

  • count : an integer, the number of individual image spectra that contributed to this data point.

The data_dict property has the full dictionary of data defined in the 1d spectrum schema.

You can find a spectra for a given object with:

from snappl.spectrum1d import Spectrum1d

specs = Spectrum1d.find_spectra( provenance_tag=spec1d_provenance_tag,
                                 process=spec1d_process,
                                 diaobject_id=diaobject_id,
                                 mjd_start_min=mjd0,       # optional
                                 mjd_end_max=mjd1        # optional
                               )

The mjd... parmaeters are optional; include these if you want to provide a time window into which all the images that went into the spectra fit. See the find_spectra docstring for more optional parameters you can supply for the search. You will get back a list of Spectrum1d objects.

Saving 1d Spectra

You need to have the diaobject (a DiaObject object) for which you made the spectrum, potentially a diaobj_pos, an improved position for the object, and images, a list of Image object that held the dispersed images from which you are making the spectrum. You need to know the spec1d_provenance_tag.

You need to know the process (which is probably just the name of your code), and the major and minor versions of your code. Finally, you need to know the params that define how your code runs. The latter is just a dictionary; you can build it yourself, but see Making Provenances below.

You build a data structure that is described on the wiki; call that spec_struct. Note that if you pass all of provenance, diaobject, and diaobject_position_id to the Spectrum1d constructor, you do not necessarily need to have all those IDs in the spec_struct ahead of time; the constructor will fill out what is necessary. If you do both pass them and put them in the spec_struct, they must be consistent. You do not need to (and should not) set meta['id'] or meta['filepath'] yourself. id will be generated when you construct the Spectrum1d object, and filepath will be set automatically when you save the spectrum.

Do:

from snappl.provenance import Provenance
from snappl.spec1d import Spectrum1d

diaobj_prov = Provenance.get_by_id( diaobject.provenance_id )
imageprov = Provenance.get_by_id( images[0].provenance_id )
diaobj_pos_prov = Provenance.get_by_id( diaobj_pos['id'] )

spec1d_prov = Provenance( process=process, major=major, minor=minor, params=params,
                          upstreams=[ diaobj_prov, imageprov, diaobj_pos_prov ] )
# The next line only needs to be run once.  Once
#   you have saved a Provenance to the database you
#   never need to save it again
spec1d_prov.save_to_db( tag=spec1d_provenance_tag )

# Make sure that all the mandatory metadata is there.  In particular, it is necessary
#   for each spec_struct['individual][n]['meta']['image_id'] to be set (where n is an
#   integer).  You could do this with:
for i in range( len(images) ):
    spec_struct['individual'][i]['meta']['image_id'] = images[i].id

spec1d = Spectrum1d( spec_struct )
spec1d.save_to_db( write=True )

Note that when you create a Spectrum1d, it will keep a copy of the spec_struct object you passed in its spec_struct property. It will also modify this object, adding the id and filepath attributes, as well other things based on what you pass to the constructor. You can always get back at the internally stored structure with the data_dict property of the Spectrum1d object.


Choose a working environment

Whatever it is, you will need to pip install roman-snpit-snappl. This package is under heavy development, so you will want to update your install often. This provides the snappl modules that you are currently reading the documentation for.

We strongly recommend you think ahead towards developing your code to run in a container. The SNPIT will probably eventually need to run everything it does in containers. On your desktop or laptop, you can use Docker. On NERSC, you can use podman-hpc. On many other HPC clusters, you can use Singularity.

The SN PIT provides a containerized environment which includes the latest version of snappl at https://github.com/Roman-Supernova-PIT/environment . You can pull the docker image for this environment from one of:

  • registry.nersc.gov/m4385/rknop/roman-snpit-env:cpu

  • registry.nersc.gov/m4385/rknop/roman-snpit-env:cpu-dev

  • registry.nersc.gov/m4385/rknop/roman-snpit-env:cuda

  • registry.nersc.gov/m4385/rknop/roman-snpit-env:cuda-dev

  • rknop/roman-snpit-env:cpu

  • rknop/roman-snpit-env:cpu-dev

  • rknop/roman-snpit-env:cuda

  • rknop/roman-snpit-env:cuda-dev

We recommend you use the cpu version, unless you need CUDA, in which case try the cuda version, but you may need the cuda-dev version (which is terribly bloated).

WARNING: The snpit docker environment does not currently work on ARM architecture machines (because of issues with Galsim and fftw). This means that if you’re on a Mac, you’re SOL. If you’re on a Linux machine, do uname -a and look towards the end of the output to see if you’re on x86_64 or ARM. We hope to resolve this eventually. For now, as much as possible run on x86_64 machines. (However, reports are that pip instasll roman-snpit-snappl does work on ARM Macs, so the issue may just be we need to figure out how to get the docker images to build for ARM.)

You can, of course, create your own containerized environment for your code to run in, but you will need to support it, and eventually you will need to deliver it for the PIT to run in production. For that reason, we strongly recommend you start trying to use the standard SNPIT environment. Ideally, your code should be pip installable from PyPI, and eventually your code will be included in the environment just like snappl currently is.

Making Provenances

Before you save anything to the database, you need to make a Provenance for it. For example, consider the difference imaging lightcurve package phrosty. It will need to have a diaobject (let’s assume it’s in the variable obj), and it will need to have a list of images (let’s assume they’re in the variable images; we’ll leave aside details of template vs. science images for now). Let’s assume phrosty is using the Config system in snappl, and has put all of its configuration under photometry.phrosty. (There are details here you must be careful about; things like paths on your current system should not go under photometry.phrosty, but should go somewhere underneath system.. The current object and list of images you’re working on should not be in the configuration, but should just be passed via command-line parameters. The idea is that the configuration has all of, but only, the things that are the same for a large number of runs on a large number of input files which guarantee (as much as possible) the same output files.)

phrosty could then determine its own provenance with:

from snappl.config import Config
from snappl.provenance import Provenance

objprov = Provenance.get_by_id( obj.provenance_id )
improv = Provenance.get_by_id( images[0].provenance_id )
phrostyprov = Provenance( process='phrosty', major=MAJOR, minor=MINOR,
                          upstreams=[ objprov, improv ],
                          params=Config.get(), omitkeys=None, keepkeys=[ 'photometry.phrosty' ] )

See Provenance below for more details about what all of this means. Here, MAJOR and MINOR are the first two parts of the semantic version of phrosty.

We recommend that phrosty put in its output files, somewhere, in addition to what’s obvious:

  • The provenance_id for phrosty (obtained from phrostyprov.id).

  • The configuration parameters for phrosty (obtained from phrostprov.params — a dictionary).

(If you’re very anal, you may want to save a gigantic dictionary structure including everything from phrostyprov and everything from all of the upstream provenances, and the upstreams of the upstreams, etc.)

NOTE: provenance can also store environment and environment version, but we don’t have that fully defined yet.

Before saving anything to the database, you will need to make sure that the provenance has been saved to the database. If you are sure that you’ve saved this same Provenance before, you can skip this step, but at some point you will need to:

phrostyprov.save_to_db( tag=PROVENANCE_TAG )

where PROVENANCE_TAG is a string; see Provenance Tags We Will Use In November 2025 below for a list of what we plan to use.

Provenance Tags We Will Use In November 2025

WARNING: View this list as preliminary. It may all change at any moment.

provenance_tag

process

for

produced by

notes

ou2024_truth

load_ou2024_diaobject

diaobject

(primordial)

Truth-table objects

ou2204_truth

diaobject_position

diaobject_position

(primordial)

Truth-table positions (1)

ou2024

load_ou2024_image

direct image

(primordial)

l2 images

ou2024

load_ou2024_dispimages

prism image

(primordial)

l2 prism images

nov2025

sidecar

diaobject

sidecar

objects found by sidecar dia

nov2025

phrosty

lightcurve

phrosty

forced-phot lightcurves on sidecar objs

nov2025_ou2024

phrosty

lightcurve

phrosty

forced-phot lightcurves on ou2024 objs

nov2025

phrosty_position

diaobject_position

phrosty

improved positions of sidecar objs

nov2025

campari

lightcurve

campari

scene-modelling lightcurve

nov2025

(something)

spectrum1d

(something)

spectra of sidecar objs w/ phrosty poses

nov2025_ou2024

(something)

spectrum1d

(something)

spectra of ou2024 truth objs w/ truth poses

(1) These positions will be identical to the positions in the DiaObject object for the ou2024_truth provenance tag. However, they are still here so you can write the code to use the positions table.

Connecting to the Database

To connect to the database, you need three things. First, you have to know the url of the web API front-end to the database. You must also have a username and a password for that web API. (NOTE: the config system is likely to change in the future, so exactly how this works may change.) If you’re using The Roman SNPIT Test Environment, then the test fixture dbclient configures a user with username test and password test_password, and in that environment the url of the web API is https://webserver:8080/.

You configure all of these things by setting the system.db.url, system.db.username, and either system.db.password or system.db.password_file in the configuration yaml files. (See Config below.) For example, see the default snpit_system_config.yaml in the Roman SNPIT environment. Do not save passwords to any git archive, and do not leave them sitting about in insecure places. Of course, having to type it all the time is a pain. A reasonable compromise is to have a secrets directory under your home directory that is not world-readable (chown 700 secrets). Then you can create files in there. Put your password in a file, and set the location of that file in the system.db.password_file config. (Make system.db.password to be null so the password file will be used.) If you’re using a docker container, of course you’ll need to bind-mount your secrets directory.

Once you’ve configured these things, you should be able to connect to the database. You can get a connection object with:

from snappl.dbclient import SNPITDBClient

dbclient = SNPITDBClient.get()

Thereafter, you can pass this dbclient as an optional argument to any snappl function that accesses the database. (Lots of the examples below do not explicitly include this argument, but you could add it to them.) Most of these functions will just use the first dbclient you created (which is cached in the background) if you don’t pass one, or create one themselves if one doesn’t already exists, so usually passing one isn’t necessary.

Config

snappl includes a config system whereby configuration files can be stored in yaml files. It has the ability to include other yaml files, and to override any of the config values on the command line, if properly used.

The Default Confg

You can find an example/default config for the Roman SNPIT in two files in the environment github repo:

Notice that the first one includes the second one. In the standard Roman SNPIT docker image, these two files are present in the root directory (/).

Ideally, all config for every SNPIT application will be in this default config file, so we can all use the same config and be sure we know what we’re doing. Of course, that’s far too cumbersome for development, so during development you will want to make your own config file with just the things you need in it.

By convention, everything underneath the system top level key are the things that you might have to change when moving from one cluster to another cluster, but that don’t change the behavior of the code. This includes paths for where to find things, configurations as to where the database is, login credentials, and the like. Everything that is _not_ under system should be things that define the behavior of your code. These are the things that are the same every you run on different inputs. It should _not_ include things like the specific images or diaobjects you’re currently working on. Ideally, everything that’s _not_ in system, if it stays the same, will give the same outputs on the same inputs when run anywhere.

Using Config

To use config, you first have to set the environment variable SNIPIT_CONFIG to the location of the top-level config file. If you’re using the default config and working in the roman snpit docker image, you can do this with:

export SNPIT_CONFIG=/default_snpit_config.yaml

Then, in your code, to get access to the config, you can just run:

from snappl.config import Config

...

cfg = Config.get()
tmpdir = Config.value( 'system.paths.temp_dir` )

Config.get() gets you a config object. Then, just call that object’s value method to get the actual config values. Separate different levels of dictionaries in the config with periods, as in the example. (Look at default_snpit_config.yaml to see how the config file corresponds to the value in the example above.)

There are more complicated uses of Config (including reading different, custom config files, modifying the config at runtime, understanding how the config files and all the possible modes of including other files are composed). Read the docstring on snappl.config.Config for more information.

Overriding Parameters on the Command Line

At runtime, if you set things up properly, you can override some of the parameters from the config file with command-line arguments. To accomplish this, you must be using python’s argparse package. When you’re ready to parse your arguments, write the following code:

configparser = argarse.ArgumentParser( add_help=False )
configparser.add_argument( '-c', '--config-file', default=None,
                           help=( "Location of the .yaml config file; defaults to the value of the "
                                  "SNPIT_CONFIG environment variable." ) )
args, leftovers = configparser.parse_known_args()

try:
    cfg = Config.get( args.config_file, setdefault=True )
except RuntimeError as e:
    if str(e) == 'No default config defined yet; run Config.init(configfile)':
        sys.stderr.write( "Error, no configuration file defined.\n"
                          "Either run <your application name> with -c <configfile>\n"
                          "or set the SNPIT_CONFIG environment variable.\n" )
        sys.exit(1)
    else:
        raise

parser = argparse.ArgumentParser()
# Put in the config_file argument, even though it will never be found, so it shows up in help
parser.add_argument( '-c', '--config-file', help="Location of the .yaml config file" )

After that, put all of the parser.add_argument lines that you need for the command-line arguments to your code. Then, at the bottom, after you’re done with all of your parser.add_argument calls, put in the code:

cfg.augment_argparse( parser )
args = parser.parse_args( leftovers )
cfg.parse_args( args )

At this point in your code, you can get access to the command line arguments you specified with the args variable as usual. However, the running config (that you get with Config.get()) will _also_ have been updated with any changes made on the command line.

If you’ve set your code up like this, run it with --help. You will see the help on the arguments you defined, but you will also see optional arguments for everything that is in the config file.

TODO : make it so you can only include some of the top-level keys from the config file in what gets overridden on the command line, to avoid things getting far too cluttered with irrelevant options.

Provenance

Everything stored in the internal Roman SNPIT database has a Provenance associated with it. The purpose of Provenance is twofold:

  • It allows us to store multiple versions of the same thing in the database. (E.g., suppose you wanted to build a lightcurve for an object using two different configurations of your photometry software. If the database just stored “the lightcurve for this object”, it wouldn’t be possible to store both. However, in this case, the two lightcurves would have different provenances, so both can be stored.)

  • It keeps track of the code and the configuration used to create the thing stored in the database. Ideally, this includes all of the parameters (see below) for the code, in addition to the code and code version, as well as (optionally) information about the environment in which the code should be run, such that we could reproduce the output files by running the same code with the same configuration again.

A provenance is defined by:

  • The process : this is usually the name of the code that produced the thing saved to the database.

  • The major and minor version of the process; Roman SNPIT code should use semantic versioning.

  • params, The parameters of the process (see below)

  • Optionally: the environment, and env_major and env_minor, the major and minor versions of the environment. (By default, these three are all None.)

  • upstreams, the immediate upstream provenances (see below).

An id is generated from the provenance based on a hash of all the information in the provenance, available in the id property of a Provenance object. This id is a UUID (sort of), and will be something ugly like f76f39a2-edcf-4e31-ba6b-e3d4335cc972. Crucially, every time you create a provenance with all the same information, you will always get exactly the same id.

Provenance Tags

Provenances hold all the necessary information, and as such are cumbersome. Provenance IDs are 128-bit numbers, and are not very human readable. For this reason, we have provenance tags, which are human readable, and also allow us to collect together the provenances of a bunch of different processes into a coherent set of data products.

A provenance tag is defined by a human-readable string tag, and by the process (which is the same as the process of a Provenance.) For a given (tag, process) pair, there can only be one Provenance. That means that you can uniquely define a Provenance by its tag and its process.

We should be careful not to create tags willy-nilly. Ideally, we will have a small number of provenance tags in the database that correspond to sets of runs through the entire pipeline.

Getting Provenances from the Database

If, somehow, you got your hands on a provenance_id (the ugly 128-bit number), and you want to get the full Provenance object for it, you can accomplish that with:

from snappl.provenance import Provenance

prov = Provenance.get_by_id( provenance_id )

You will find provenance ids in the provenance_id field of things you pulled out of the database. For example, if you have a DiaObject object (call it obj) that you got with DiaObject.get_object or DiaObject.find_objects, then you can find the id of the provenance of that DiaObject in obj.provenance_id.

If, instead, you know (e.g. because the user passed this on the command line) that you want to work on the objects that we have chosen to tag with the provenance tag realtime, and the process rapid_alerts (for instance, these may be objects we learned about from the RAPID alert stream), then you could get the provenance with:

prov = Provenance.get_provs_for_tag( 'realtime', 'rapid_alerts' )

Parameters

The params field of a Provenance is a dictionary that should include everything necessary for the specified version of your code to produce the same output on the same input. It should not include things like input filenames. The idea is that the same Provenance will apply to everything that is part of a given run. Only when you are changing the configuration, or when you are getting input files from an earlier part of the pipeline, should the Provenance change.

If you are using the Config system, and you’ve put all of these parameters (but no system-specific, like base paths, and no input files) in the config yaml file, then you can get a suitable params with:

cfg = Config.get()
params = cfg.dump_to_dict_for_params( keepkeys=[ 'photometry.phrosty' ], omitkeys=None )

The list in keepkeys are the keys (including the full substructure below that key) from the config that you want to include in the dictionary. This allows you to select out the parts of the config that are relevant to your code. system and anything starting with system. should never be in keepkeys.

Upstreams

The upstream provenances are the ones that created the input files you use. For example, campari has three basic types of inputs: a diaobject, the supernova it’s running on; a diaobject_position, an updated position of the object; and images, the images it’s fitting its model to. Thus, it would have three upstream provenances, one for each of these things.

It can figure out these upstreams by just looking at the provenance_id field of the objects its using. Again, for example, campari will have (somehow) obtained a snappl.diaobject.DiaObject object; call that diaobj. It can get the diaobject provenance by just looking at diaobj.provenance_id. (To actually get the full Provenance object from the id, run snappl.provenance.Provenance.get_by_id( provenance_id ).)

Upstreams is part of the provenance because even if you run your code with all the same parameters, if you’re taking input files that were from a differently configured process earlier in the pipeline, you expect different outputs. Upstreams basically specify which sorts of input files are valid for this provenance.

Creating a Provenance

Just create a provenance with:

from snappl.provenance import Provenance

prov = Provenance( process, major, minor, params=<params>, upstreams=<upstreams> )

In this call, process is a string, major and minor are integers, params is a dictionary (see Parameters), and upstreams is a list of Provenance objects (see Upstreams).

If this is a totally new Provenance— you’ve never made it before— then save it to the database with:

prov.save_to_db( tag=<tag> )

Here, <tag> is the provenance tag that you want to tag this provenance with. If the provenance already exists in the database, or if another provenance from the same process is already tagged with this tag, you will get an error. If the provenance you’re trying to save already exists, that’s fine; it won’t resave it, it will just notice that it’s there. So, this is safe to call even if you aren’t sure if you’ve saved it before or not. If, for some reason, you really want this to be a new provenance, add exists=False to the call. In that case, if the provenance already exists, an exception will be raised.

The Roman SNPIT Test Environment

(This is currently a bit of a mess, and I haven’t figured out how to get this to work on Perlmutter. However, if you’re on a desktop or laptop with an x86_64 architecture, then you should be able to get this running on your machine using Docker. Read all the comments at the top of this file in the environment repo.)