Usage¶
Overview¶
snappl has a set of utilities for the Roman SNPIT, including all the classes and functions necessary for communicating with the internal snpit database.
If you’re here for the SNPIT November 2025 pipeline test, see November 2025 SNPIT Pipeline Test Primer.
November 2025 SNPIT Pipeline Test Primer¶
The database connection is under heavy development, and more things are showing up every day. Right now, the following is available:
Finding L2 images in the database
Finding segmentation maps in the database
Saving newly discovered DiaObjects to the database
Finding DiaObjects
Saving updated positions for DiaObjects
Saving lightcurves to the database
Finding and reading lightcurves from the database
Saving 1d transient spectra to the database
Finding and reading 1d transient spectra from the database
This section describes what you need to do in order to connect to the database.
WARNING: because everything is under heavy development, it’s possible that interfaces will change. We will try to avoid doing this, because it’s a pain when everybody has to adapt, but we’re still building this, so it may be inevitable.
Recipes¶
Read everything to understand what’s going on, but these are intended as a quick start.
Prerequisites¶
These recipes assume you have a working environment (see Choose a working environment), which hopefully is as simple as pip install roman-snpit-snappl, but see that section for all the details and where you eventually need to be heading.
They also assume you have set up a config file. If you’re on NERSC, not running in a container, then save the config file for running natively on NERSC. (This is the file nov2025_nersc_native_config.yaml from the top level of the environment roman snpit github archive). Note that you will need to do two small edits in this file! If you are running in a podman container, then look at the in-container config file. (This is the file nov2025_nersc_container_config.yaml from the top level of the environment roman snpit github archive); you will also need to download interactive-podman-nov2025.sh. If you are elsewhere, you will need to edit the config file to have the right paths to find things on your system.
You need to set the environment variable SNPIT_CONFIG to point to where this configuration file lives.
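If you want your code to fail early with a readable message when this variable is missing, a check along the following lines may help. This is only a sketch: resolve_snpit_config is a hypothetical helper, not part of snappl, and snappl may well do its own checking.

```python
import os
from pathlib import Path


def resolve_snpit_config(environ=None):
    """Return the config path named by SNPIT_CONFIG, or raise a clear error.

    Hypothetical helper for illustration; not part of snappl.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("SNPIT_CONFIG")
    if not value:
        raise RuntimeError("SNPIT_CONFIG is not set; point it at your config yaml file")
    path = Path(value).expanduser()
    if not path.is_file():
        raise RuntimeError(f"SNPIT_CONFIG points at {path}, which is not a file")
    return path
```

Calling resolve_snpit_config() with no arguments reads the real environment; passing a dictionary is only there to make the function easy to test.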
Finally, you need to do the following once, at the top of your code:
from snappl.dbclient import SNPITDBClient
dbclient = SNPITDBClient.get()
You can then make further connections to the database using this dbclient variable. Many snappl functions that connect to the database take an optional dbclient parameter, to which you can pass this. By default, most of them will, in the background, just use the first one that was created with the first call to SNPITDBClient.get(). (See the docstring for SNPITDBClient.get if you are morbidly curious.)
Variables/Arguments for IDs and Provenances¶
Below, you will be told you need to know a number of object ids and/or provenance-related values. These will generally be provided by orchestration. You should make them things that can be passed on the command line. I recommend using the following command-line arguments — choose the ones that you need (they are all string values):
--diaobject-id
--diaobject-provenance-tag
--diaobject-process
--diaobject-position-provenance-tag
--diaobject-position-process
--image-id
--image-provenance-tag
--image-process
--segmap-provenance-tag
--segmap-process
--ltcv-provenance-tag
--ltcv-process
--spec1d-provenance-tag
--spec1d-process
If you just plan to call functions from python, then you can make the things you need keyword arguments. I recommend using the same names as above, only replacing the dashes with underscores (and removing the two dashes at the beginning). The examples below all assume that the variables have these names.
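One possible way to wire up the recommended command-line arguments with argparse is sketched below. The option names are the ones listed above; the description text, the help strings, and the sample values passed to parse_args are made up for illustration.

```python
import argparse

# Option names follow the recommendations above; all values are strings.
ID_AND_PROVENANCE_OPTIONS = [
    "--diaobject-id",
    "--diaobject-provenance-tag",
    "--diaobject-process",
    "--diaobject-position-provenance-tag",
    "--diaobject-position-process",
    "--image-id",
    "--image-provenance-tag",
    "--image-process",
    "--segmap-provenance-tag",
    "--segmap-process",
    "--ltcv-provenance-tag",
    "--ltcv-process",
    "--spec1d-provenance-tag",
    "--spec1d-process",
]


def build_parser():
    """Build a parser with every recommended option; keep only the ones you need."""
    parser = argparse.ArgumentParser(description="Example SNPIT pipeline step")
    for option in ID_AND_PROVENANCE_OPTIONS:
        parser.add_argument(option, default=None, help=f"value of {option[2:]}")
    return parser


# Parse an explicit list here so the sketch runs anywhere; real code would
# call parse_args() with no arguments to read sys.argv.
args = build_parser().parse_args(["--diaobject-id", "some-uuid", "--ltcv-process", "campari"])
```

argparse converts the dashes to underscores, so the values arrive as args.diaobject_id, args.ltcv_process, and so on, which matches the variable names used in the examples below.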
For a list of provenance tags we will be using during the November 2025 run, see Provenance Tags We Will Use In November 2025.
Finding a DiaObject¶
You need to know either the diaobject_id of the object (which you will generally be given), or you need to know the diaobject_provenance_tag and diaobject_process, and you must have enough search criteria to find the object. If you’re doing the latter, read the docstring on snappl.diaobject.DiaObject.find_objects. For the former:
from snappl.diaobject import DiaObject
diaobject = DiaObject.get_object( diaobject_id=diaobject_id )
The returned DiaObject object has, among other things, properties .id, .ra and .dec.
Getting an updated diaobject position¶
You need the diaobject_position_provenance_tag and diaobject_position_process.
Do:
diaobj_pos = diaobject.get_position( provenance_tag=diaobject_position_provenance_tag,
process=diaobject_position_process )
You get back a dictionary that has a number of keys including ra and dec.
Getting a Specific Image¶
Orchestration has given you an image_id that you are supposed to do something with. E.g., you are running sidecar, and you’re supposed to subtract and search this image.
from snappl.image import Image
image = Image.get_image( image_id )
You will get back an Image object (really, an object of a class that is a subclass of Image). It has a number of properties. Most important are .data, .noise, and .flags, which hold 2d numpy arrays. There is also a .get_fits_header() method that currently works, but be careful using this as this method will not work in the future when we’re using ASDF files. See the docstrings in snappl.image.Image for more details. Some of the stuff you might want is available directly as properties of an Image object.
Finding Images¶
You need to know the image_provenance_tag and image_process.
See the docstring on snappl.imagecollection.ImageCollection.find_images if you want to do more than what’s below.
Finding all images in a given band that include a ra and dec¶
Do:
from snappl.image import Image
images = Image.find_images( provenance_tag=image_provenance_tag, process=image_process,
ra=ra, dec=dec, band=band )
where band=band is optional but often useful. (If you don’t specify it, you will get back images from all bands.) You will get back a list of Image objects, which have a number of properties (some of which may be None if they are unknown):
data: the data array, in ADU, a 2d numpy array
noise: the ADU uncertainty on the data, a 2d numpy array
flags: pixel flags, a 2d numpy array of ints [we are still working on defining these]
width: int, the width of the image in pixels; horizontal as viewed on ds9; corresponds to data.shape[1]
height: int, the height of the image in pixels; vertical as viewed on ds9; corresponds to data.shape[0]
image_shape: tuple of ints, (height, width)
pointing: int or str, the pointing; warning this property name will change when we know more
sca: int, the chip of the image
ra: float, the nominal center of the image in decimal degrees (usu. from the header)
dec: float, the nominal center of the image in decimal degrees (usu. from the header)
coord_center: tuple of (ra, dec) [I THINK], calculated from the image WCS
ra_corner_00: float, decimal degrees, the RA of pixel (0, 0)
ra_corner_01: float, decimal degrees, the RA of pixel (0, height-1)
ra_corner_10: float, decimal degrees, the RA of pixel (width-1, 0)
ra_corner_11: float, decimal degrees, the RA of pixel (width-1, height-1)
dec_corner_00: float, decimal degrees, the Dec of pixel (0, 0)
dec_corner_01: float, decimal degrees, the Dec of pixel (0, height-1)
dec_corner_10: float, decimal degrees, the Dec of pixel (width-1, 0)
dec_corner_11: float, decimal degrees, the Dec of pixel (width-1, height-1)
band: str, the filter/band/whatever you call it for this image
mjd: float, mjd (days) when the image exposure started
position_angle: float, position angle in degrees north of east (CHECK THIS)
exptime: float, exposure time in seconds
sky_level: float; an estimate of the sky level (in ADU) if known, None otherwise
zeropoint: float; convert ADU to AB magnitudes with m = -2.5*log10(adu) + zeropoint
Warning: because of how numpy arrays are indexed, if you want to get the flux value in pixel (ix, iy), you would look at .data[iy, ix].
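As a small illustration of the indexing warning and the zeropoint formula above, here is a sketch that uses a made-up numpy array in place of a real image.data. adu_to_ab_mag is a hypothetical helper written for this example, not a snappl function.

```python
import numpy as np

# Stand-in for image.data: 3 rows (height) by 5 columns (width).
height, width = 3, 5
data = np.arange(height * width, dtype=float).reshape(height, width)

# The shape is (height, width), so the x pixel coordinate indexes the second axis.
ix, iy = 4, 1
flux_adu = data[iy, ix]


def adu_to_ab_mag(adu, zeropoint):
    """Apply m = -2.5*log10(adu) + zeropoint; only meaningful for positive ADU."""
    return -2.5 * np.log10(adu) + zeropoint
```

With this array, data[ix, iy] would raise an IndexError because 4 is out of range for the first axis; on a square image the same mistake would silently return the wrong pixel.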
There is also a .get_fits_header() method that currently works, but be careful using this as this method will not work in the future when we’re using ASDF files. See the docstrings in snappl.image.Image for more details.
Finding Segmentation Maps¶
You need to know the segmap_provenance_tag and the segmap_process.
See the docstring on snappl.segmap.SegmentationMap.find_segmaps for more information on searches you can do beyond what’s below.
Finding all segmaps that include a ra and dec¶
Do:
from snappl.segmap import SegmentationMap
segmaps = SegmentationMap.find_segmaps( provenance_tag=segmap_provenance_tag,
process=segmap_process,
ra=ra, dec=dec )
You get back a list of SegmentationMap objects. These have a number of properties, the most important of which is image, which holds an Image object. You can get the image data for the segmentation map for the first element of the list with segmaps[0].image.data (a 2d numpy array).
Saving a new DiaObject¶
You are running sidecar and you’ve found a new diaobject you want to save. You need a process (we shall assume process='sidecar' here), the major and minor version of your code, and the params that define how the code runs. The latter is just a dictionary; you can build it yourself, but see Making Provenances below. Finally, assume that images is a list that has the snappl.image.Image objects of the images that you’ve used; replace images[0] below with wherever you have your Image object:
from snappl.provenance import Provenance
from snappl.diaobject import DiaObject
imageprov = Provenance.get_by_id( images[0].provenance_id )
prov = Provenance( process='sidecar', major=major, minor=minor, params=params,
upstreams=[ imageprov ] )
# You only have to do this next line once for a given provenance;
# once the provenance is in the database, you never need to save it again.
prov.save_to_db( tag=diaobject_provenance_tag ) # See note below
diaobj = DiaObject( provenance_id=prov.id, ra=ra, dec=dec, name=optional, mjd_discovery=mjd )
diaobj.save_object()
This will save the object to the database. You can then look at diaobj.id to see what UUID it was assigned. You do not need to give it a name, but you can if you want to. (The database uses the id as the unique identifier.) mjd_discovery should be the MJD of the science image that the object was found on.
Finding and reading lightcurves¶
You need to know the ltcv_provenance_tag and ltcv_process, and the diaobject_id of the object for which you want to get lightcurves:
from snappl.lightcurve import Lightcurve
ltcvs = Lightcurve.find_lightcurves( provenance_tag=ltcv_provenance_tag,
process=ltcv_process,
diaobject=diaobject_id,
band=band, # optional
)
You will get back a list of Lightcurve objects. You can find the actual lightcurve data of the first lightcurve from the list with ltcvs[0].lightcurve. This is an astropy QTable. You can read the metadata from ltcvs[0].lightcurve.meta.
Coming soon: a way to read a combined lightcurve that has all of the bands mixed together. (Not implemented yet.)
Saving lightcurves¶
You need to make sure you’ve created a dictionary with all the necessary metadata. Also make sure you’ve created a data table with the necessary columns; this can be an astropy Table, a pandas DataFrame, or a dict of lists. We shall call these two things meta and data.
Assume that you’ve made the lightcurve for object diaobject (a DiaObject object), and that you have a list of your images in images. Adjust below for the variables where you really have things. Finally, if you used an updated DiaObject position, make sure you have set the ra and dec in meta from that.
You will also need to know the ltcv_provenance_tag we’re using.
Below, process is probably either campari or phrosty. major and minor are the major and minor parts of the version, which you should parse from campari.__version__ or phrosty.__version__. params are the parameters as described below in Making Provenances.
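One simple way to get major and minor out of a version string is sketched here. parse_major_minor is a hypothetical helper, and it assumes the string starts with plain integers in the form MAJOR.MINOR (as a semantic version does); anything fancier would need a real version parser.

```python
def parse_major_minor(version):
    """Return (major, minor) as ints from a version string like '1.4.2'.

    Hypothetical helper; assumes the string starts with 'MAJOR.MINOR'.
    """
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError(f"cannot find major.minor in {version!r}")
    return int(parts[0]), int(parts[1])
```

You would call it as major, minor = parse_major_minor( phrosty.__version__ ), or the same with campari.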
Do:
from snappl.provenance import Provenance
from snappl.lightcurve import Lightcurve
imgprov = Provenance.get_by_id( images[0].provenance_id )
objprov = Provenance.get_by_id( diaobject.provenance_id )
objposprov = Provenance.get_by_id( diaobj_pos['provenance_id'] )
ltcvprov = Provenance( process=process, major=major, minor=minor, params=params,
upstreams=[imgprov, objprov, objposprov] )
# The next line only needs to be run once. Once you've saved it to the database,
# you never need to do this again.
ltcvprov.save_to_db( tag=ltcv_provenance_tag )
meta['provenance_id'] = ltcvprov.id
meta['diaobject_id'] = diaobject.id
meta['diaobject_position_id'] = diaobj_pos['id']
for att in [ 'ra', 'dec', 'ra_err', 'dec_err', 'ra_dec_covar' ]:
    meta[att] = diaobj_pos[att]
ltcv = Lightcurve( data=data, meta=meta )
ltcv.write()
ltcv.save_to_db()
You can look at ltcv.id to see the UUID of the lightcurve you saved, in case you are curious.
If you used the ra and dec that was in DiaObject, then meta['diaobject_position_id'] should be None. Skip everything else above that refers to diaobj_pos.
Finding and reading 1d Spectra¶
Right now we only have 1-d transient spectra, defined by the schema here.
If you already know the ID of the spectrum you want, you can just do:
from snappl.spectrum1d import Spectrum1d
spec = Spectrum1d.get_spectrum1d( spectrum1d_id )
That will return a Spectrum1d object. See its docstring for more information. The most important property is probably combined_data, which holds a dictionary with four elements; each value of the dictionary is a numpy array of the same length:
lamb: the wavelength (in Å)
flam: the flux (in erg/s/cm²/Å)
func: the uncertainty on flam
count: an integer, the number of individual image spectra that contributed to this data point.
The data_dict property has the full dictionary of data defined in the 1d spectrum schema.
You can find spectra for a given object with:
from snappl.spectrum1d import Spectrum1d
specs = Spectrum1d.find_spectra( provenance_tag=spec1d_provenance_tag,
process=spec1d_process,
diaobject_id=diaobject_id,
mjd_start_min=mjd0, # optional
mjd_end_max=mjd1 # optional
)
The mjd... parameters are optional; include these if you want to provide a time window into which all the images that went into the spectra fit. See the find_spectra docstring for more optional parameters you can supply for the search. You will get back a list of Spectrum1d objects.
Saving 1d Spectra¶
You need to have the diaobject (a DiaObject object) for which you made the spectrum, potentially a diaobj_pos, an improved position for the object, and images, a list of Image objects that held the dispersed images from which you are making the spectrum. You need to know the spec1d_provenance_tag.
You need to know the process (which is probably just the name of your code), and the major and minor versions of your code. Finally, you need to know the params that define how your code runs. The latter is just a dictionary; you can build it yourself, but see Making Provenances below.
You build a data structure that is described on the wiki; call that spec_struct. Note that if you pass all of provenance, diaobject, and diaobject_position_id to the Spectrum1d constructor, you do not necessarily need to have all those IDs in the spec_struct ahead of time; the constructor will fill out what is necessary. If you do both pass them and put them in the spec_struct, they must be consistent. You do not need to (and should not) set meta['id'] or meta['filepath'] yourself. id will be generated when you construct the Spectrum1d object, and filepath will be set automatically when you save the spectrum.
Do:
from snappl.provenance import Provenance
from snappl.spectrum1d import Spectrum1d
diaobj_prov = Provenance.get_by_id( diaobject.provenance_id )
imageprov = Provenance.get_by_id( images[0].provenance_id )
diaobj_pos_prov = Provenance.get_by_id( diaobj_pos['provenance_id'] )
spec1d_prov = Provenance( process=process, major=major, minor=minor, params=params,
upstreams=[ diaobj_prov, imageprov, diaobj_pos_prov ] )
# The next line only needs to be run once. Once
# you have saved a Provenance to the database you
# never need to save it again
spec1d_prov.save_to_db( tag=spec1d_provenance_tag )
# Make sure that all the mandatory metadata is there. In particular, it is necessary
# for each spec_struct['individual'][n]['meta']['image_id'] to be set (where n is an
# integer). You could do this with:
for i in range( len(images) ):
    spec_struct['individual'][i]['meta']['image_id'] = images[i].id
spec1d = Spectrum1d( spec_struct )
spec1d.save_to_db( write=True )
Note that when you create a Spectrum1d, it will keep a copy of the spec_struct object you passed in its spec_struct property. It will also modify this object, adding the id and filepath attributes, as well as other things based on what you pass to the constructor. You can always get back at the internally stored structure with the data_dict property of the Spectrum1d object.
Choose a working environment¶
Whatever it is, you will need to pip install roman-snpit-snappl. This package is under heavy development, so you will want to update your install often. This provides the snappl modules that you are currently reading the documentation for.
We strongly recommend you think ahead towards developing your code to run in a container. The SNPIT will probably eventually need to run everything it does in containers. On your desktop or laptop, you can use Docker. On NERSC, you can use podman-hpc. On many other HPC clusters, you can use Singularity.
The SN PIT provides a containerized environment which includes the latest version of snappl at https://github.com/Roman-Supernova-PIT/environment . You can pull the docker image for this environment from one of:
registry.nersc.gov/m4385/rknop/roman-snpit-env:cpu
registry.nersc.gov/m4385/rknop/roman-snpit-env:cpu-dev
registry.nersc.gov/m4385/rknop/roman-snpit-env:cuda
registry.nersc.gov/m4385/rknop/roman-snpit-env:cuda-dev
rknop/roman-snpit-env:cpu
rknop/roman-snpit-env:cpu-dev
rknop/roman-snpit-env:cuda
rknop/roman-snpit-env:cuda-dev
We recommend you use the cpu version, unless you need CUDA, in which case try the cuda version, but you may need the cuda-dev version (which is terribly bloated).
WARNING: The snpit docker environment does not currently work on ARM architecture machines (because of issues with Galsim and fftw). This means that if you’re on a Mac, you’re SOL. If you’re on a Linux machine, do uname -a and look towards the end of the output to see if you’re on x86_64 or ARM. We hope to resolve this eventually. For now, as much as possible run on x86_64 machines. (However, reports are that pip install roman-snpit-snappl does work on ARM Macs, so the issue may just be we need to figure out how to get the docker images to build for ARM.)
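If you would rather check the architecture from python than from uname, the standard library reports the same information. This is just a convenience sketch; the list of ARM names it checks is an assumption about common values, not an exhaustive list.

```python
import platform

# Typical values are "x86_64" (or "AMD64" on Windows), and "arm64" or "aarch64" on ARM.
machine = platform.machine()
is_arm = machine.lower() in ("arm64", "aarch64") or machine.lower().startswith("arm")
print(f"machine={machine!r}, ARM={is_arm}")
```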
You can, of course, create your own containerized environment for your code to run in, but you will need to support it, and eventually you will need to deliver it for the PIT to run in production. For that reason, we strongly recommend you start trying to use the standard SNPIT environment. Ideally, your code should be pip installable from PyPI, and eventually your code will be included in the environment just like snappl currently is.
Making Provenances¶
Before you save anything to the database, you need to make a Provenance for it. For example, consider the difference imaging lightcurve package phrosty. It will need to have a diaobject (let’s assume it’s in the variable obj), and it will need to have a list of images (let’s assume they’re in the variable images; we’ll leave aside details of template vs. science images for now). Let’s assume phrosty is using the Config system in snappl, and has put all of its configuration under photometry.phrosty. (There are details here you must be careful about; things like paths on your current system should not go under photometry.phrosty, but should go somewhere underneath system.. The current object and list of images you’re working on should not be in the configuration, but should just be passed via command-line parameters. The idea is that the configuration has all of, but only, the things that are the same for a large number of runs on a large number of input files which guarantee (as much as possible) the same output files.)
phrosty could then determine its own provenance with:
from snappl.config import Config
from snappl.provenance import Provenance
objprov = Provenance.get_by_id( obj.provenance_id )
improv = Provenance.get_by_id( images[0].provenance_id )
phrostyprov = Provenance( process='phrosty', major=MAJOR, minor=MINOR,
upstreams=[ objprov, improv ],
params=Config.get(), omitkeys=None, keepkeys=[ 'photometry.phrosty' ] )
See Provenance below for more details about what all of this means. Here, MAJOR and MINOR are the first two parts of the semantic version of phrosty.
We recommend that phrosty put in its output files, somewhere, in addition to what’s obvious:
The provenance_id for phrosty (obtained from phrostyprov.id).
The configuration parameters for phrosty (obtained from phrostyprov.params, a dictionary).
(If you’re very anal, you may want to save a gigantic dictionary structure including everything from phrostyprov and everything from all of the upstream provenances, and the upstreams of the upstreams, etc.)
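If you do want that gigantic dictionary, a recursive walk over the upstreams is one way to build it. The sketch below assumes (this is not confirmed here) that each provenance exposes id, process, major, minor, params, and upstreams as attributes, with upstreams being a list of provenances. provenance_tree is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the example runs without a database.

```python
from types import SimpleNamespace


def provenance_tree(prov):
    """Recursively collect a provenance and all of its upstreams into nested dicts.

    Assumes attributes id, process, major, minor, params, and upstreams.
    """
    return {
        "id": str(prov.id),
        "process": prov.process,
        "major": prov.major,
        "minor": prov.minor,
        "params": prov.params,
        "upstreams": [provenance_tree(up) for up in (prov.upstreams or [])],
    }


# Stand-in objects with made-up ids and params, in place of real Provenance objects.
improv = SimpleNamespace(id="img-id", process="image", major=1, minor=0,
                         params={}, upstreams=[])
phrostyprov = SimpleNamespace(id="ph-id", process="phrosty", major=0, minor=1,
                              params={"photometry": {"phrosty": {}}}, upstreams=[improv])
tree = provenance_tree(phrostyprov)
```

The resulting structure contains only dicts, lists, strings, and numbers (as long as params does), so it can be written out as JSON or yaml alongside your outputs.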
NOTE: provenance can also store environment and environment version, but we don’t have that fully defined yet.
Before saving anything to the database, you will need to make sure that the provenance has been saved to the database. If you are sure that you’ve saved this same Provenance before, you can skip this step, but at some point you will need to:
phrostyprov.save_to_db( tag=PROVENANCE_TAG )
where PROVENANCE_TAG is a string; see Provenance Tags We Will Use In November 2025 below for a list of what we plan to use.
Connecting to the Database¶
To connect to the database, you need three things. First, you have to know the url of the web API front-end to the database. You must also have a username and a password for that web API. (NOTE: the config system is likely to change in the future, so exactly how this works may change.) If you’re using The Roman SNPIT Test Environment, then the test fixture dbclient configures a user with username test and password test_password, and in that environment the url of the web API is https://webserver:8080/.
You configure all of these things by setting the system.db.url, system.db.username, and either system.db.password or system.db.password_file in the configuration yaml files. (See Config below.) For example, see the default snpit_system_config.yaml in the Roman SNPIT environment. Do not save passwords to any git archive, and do not leave them sitting about in insecure places. Of course, having to type it all the time is a pain. A reasonable compromise is to have a secrets directory under your home directory that is not world-readable (chmod 700 secrets). Then you can create files in there. Put your password in a file, and set the location of that file in the system.db.password_file config. (Set system.db.password to null so the password file will be used.) If you’re using a docker container, of course you’ll need to bind-mount your secrets directory.
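If you prefer to set up the secrets directory from python, something like the following would do it. write_password_file is a hypothetical helper; the directory name, file name, and password are whatever you choose, and whether snappl tolerates a trailing newline in the password file is not stated here, so this sketch writes none.

```python
import os
from pathlib import Path


def write_password_file(secrets_dir, filename, password):
    """Create an owner-only directory and password file; return the file path.

    Hypothetical helper for illustration; not part of snappl.
    """
    secrets_dir = Path(secrets_dir)
    secrets_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(secrets_dir, 0o700)  # owner may read, write, and enter; nobody else
    path = secrets_dir / filename
    # Create with mode 600 from the start so the password is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(password)
    os.chmod(path, 0o600)  # also tighten the mode if the file already existed
    return path
```

You would then put the returned path in system.db.password_file.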
Once you’ve configured these things, you should be able to connect to the database. You can get a connection object with:
from snappl.dbclient import SNPITDBClient
dbclient = SNPITDBClient.get()
Thereafter, you can pass this dbclient as an optional argument to any snappl function that accesses the database. (Lots of the examples below do not explicitly include this argument, but you could add it to them.) Most of these functions will just use the first dbclient you created (which is cached in the background) if you don’t pass one, or create one themselves if one doesn’t already exist, so usually passing one isn’t necessary.
Config¶
snappl includes a config system whereby configuration files can be stored in yaml files. It has the ability to include other yaml files, and to override any of the config values on the command line, if properly used.
The Default Config¶
You can find an example/default config for the Roman SNPIT in two files in the environment github repo:
Notice that the first one includes the second one. In the standard Roman SNPIT docker image, these two files are present in the root directory (/).
Ideally, all config for every SNPIT application will be in this default config file, so we can all use the same config and be sure we know what we’re doing. Of course, that’s far too cumbersome for development, so during development you will want to make your own config file with just the things you need in it.
By convention, everything underneath the system top level key are the things that you might have to change when moving from one cluster to another cluster, but that don’t change the behavior of the code. This includes paths for where to find things, configurations as to where the database is, login credentials, and the like. Everything that is _not_ under system should be things that define the behavior of your code. These are the things that are the same every time you run on different inputs. It should _not_ include things like the specific images or diaobjects you’re currently working on. Ideally, everything that’s _not_ in system, if it stays the same, will give the same outputs on the same inputs when run anywhere.
Using Config¶
To use config, you first have to set the environment variable SNPIT_CONFIG to the location of the top-level config file. If you’re using the default config and working in the roman snpit docker image, you can do this with:
export SNPIT_CONFIG=/default_snpit_config.yaml
Then, in your code, to get access to the config, you can just run:
from snappl.config import Config
...
cfg = Config.get()
tmpdir = cfg.value( 'system.paths.temp_dir' )
Config.get() gets you a config object. Then, just call that object’s value method to get the actual config values. Separate different levels of dictionaries in the config with periods, as in the example. (Look at default_snpit_config.yaml to see how the config file corresponds to the value in the example above.)
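To make the period-separated keys concrete, here is how such a key maps onto nested dictionaries. This sketch uses a plain dict rather than snappl's Config; lookup_dotted is a hypothetical function, and the /tmp value is made up.

```python
def lookup_dotted(config, dotted_key):
    """Walk a nested dict using a period-separated key such as 'system.paths.temp_dir'."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]  # raises KeyError if a level is missing
    return node


# Stand-in for the parsed yaml; the /tmp value is made up for illustration.
example_config = {"system": {"paths": {"temp_dir": "/tmp"}}}
tmpdir = lookup_dotted(example_config, "system.paths.temp_dir")
```

In yaml terms, this corresponds to a top-level system key containing a paths key, which in turn contains temp_dir.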
There are more complicated uses of Config (including reading different, custom config files, modifying the config at runtime, understanding how the config files and all the possible modes of including other files are composed). Read the docstring on snappl.config.Config for more information.
Overriding Parameters on the Command Line¶
At runtime, if you set things up properly, you can override some of the parameters from the config file with command-line arguments. To accomplish this, you must be using python’s argparse package. When you’re ready to parse your arguments, write the following code:
import argparse
import sys

from snappl.config import Config

configparser = argparse.ArgumentParser( add_help=False )
configparser.add_argument( '-c', '--config-file', default=None,
                           help=( "Location of the .yaml config file; defaults to the value of the "
                                  "SNPIT_CONFIG environment variable." ) )
args, leftovers = configparser.parse_known_args()
try:
    cfg = Config.get( args.config_file, setdefault=True )
except RuntimeError as e:
    if str(e) == 'No default config defined yet; run Config.init(configfile)':
        sys.stderr.write( "Error, no configuration file defined.\n"
                          "Either run <your application name> with -c <configfile>\n"
                          "or set the SNPIT_CONFIG environment variable.\n" )
        sys.exit(1)
    else:
        raise
parser = argparse.ArgumentParser()
# Put in the config_file argument, even though it will never be found, so it shows up in help
parser.add_argument( '-c', '--config-file', help="Location of the .yaml config file" )
After that, put all of the parser.add_argument lines that you need for the command-line arguments to your code. Then, at the bottom, after you’re done with all of your parser.add_argument calls, put in the code:
cfg.augment_argparse( parser )
args = parser.parse_args( leftovers )
cfg.parse_args( args )
At this point in your code, you can get access to the command line arguments you specified with the args variable as usual. However, the running config (that you get with Config.get()) will _also_ have been updated with any changes made on the command line.
If you’ve set your code up like this, run it with --help. You will see the help on the arguments you defined, but you will also see optional arguments for everything that is in the config file.
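The two-stage parsing that this pattern relies on can be seen in isolation with the standard library alone. The sketch below leaves snappl out entirely: parse_in_two_stages is a hypothetical function, and the sample argument values are made up. The first parser only picks off the config file; the second parser then sees whatever was left over.

```python
import argparse


def parse_in_two_stages(argv):
    """Pick off -c/--config-file first, then parse the remaining arguments."""
    # Stage 1: only look for the config file; everything else is left over.
    # add_help=False keeps --help from being consumed at this stage.
    configparser = argparse.ArgumentParser(add_help=False)
    configparser.add_argument("-c", "--config-file", default=None)
    config_args, leftovers = configparser.parse_known_args(argv)

    # Stage 2: the full parser sees only the leftovers.
    parser = argparse.ArgumentParser()
    parser.add_argument("-c", "--config-file", help="Location of the .yaml config file")
    parser.add_argument("--diaobject-id", default=None)
    args = parser.parse_args(leftovers)
    return config_args.config_file, args


config_file, args = parse_in_two_stages(["-c", "my.yaml", "--diaobject-id", "some-uuid"])
```

Because the first stage already consumed -c, the second parser never finds it, which is why it is added there only so that it shows up in the help output.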
TODO : make it so you can only include some of the top-level keys from the config file in what gets overridden on the command line, to avoid things getting far too cluttered with irrelevant options.
Provenance¶
Everything stored in the internal Roman SNPIT database has a Provenance associated with it. The purpose of Provenance is twofold:
It allows us to store multiple versions of the same thing in the database. (E.g., suppose you wanted to build a lightcurve for an object using two different configurations of your photometry software. If the database just stored “the lightcurve for this object”, it wouldn’t be possible to store both. However, in this case, the two lightcurves would have different provenances, so both can be stored.)
It keeps track of the code and the configuration used to create the thing stored in the database. Ideally, this includes all of the parameters (see below) for the code, in addition to the code and code version, as well as (optionally) information about the environment in which the code should be run, such that we could reproduce the output files by running the same code with the same configuration again.
A provenance is defined by:
The process: this is usually the name of the code that produced the thing saved to the database.
The major and minor version of the process; Roman SNPIT code should use semantic versioning.
params, the parameters of the process (see below).
Optionally: the environment, and env_major and env_minor, the major and minor versions of the environment. (By default, these three are all None.)
upstreams, the immediate upstream provenances (see below).
An id is generated from the provenance based on a hash of all the information in the provenance, available in the id property of a Provenance object. This id is a UUID (sort of), and will be something ugly like f76f39a2-edcf-4e31-ba6b-e3d4335cc972. Crucially, every time you create a provenance with all the same information, you will always get exactly the same id.
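To see why identical information always yields an identical id, consider the following illustration of a content-derived id. This is not snappl's actual hashing scheme, which may differ in every detail; sketch_provenance_id, the choice of SHA-256, and the JSON serialization are all assumptions made for the example.

```python
import hashlib
import json
import uuid


def sketch_provenance_id(process, major, minor, params, upstream_ids=()):
    """Derive a UUID-shaped id from a hash of the provenance contents.

    Illustration only; snappl's real scheme may differ.
    """
    payload = {
        "process": process,
        "major": major,
        "minor": minor,
        "params": params,
        "upstream_ids": sorted(str(u) for u in upstream_ids),
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    # A UUID holds 128 bits, so keep the first 16 bytes of the hash.
    return uuid.UUID(bytes=digest[:16])
```

Nothing random or time-dependent goes into the id, so rebuilding the same provenance on another machine gives the same value, while changing any parameter or upstream gives a different one.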
Getting Provenances from the Database¶
If, somehow, you got your hands on a provenance_id (the ugly 128-bit number), and you want to get the full Provenance object for it, you can accomplish that with:
from snappl.provenance import Provenance
prov = Provenance.get_by_id( provenance_id )
You will find provenance ids in the provenance_id field of things you pulled out of the database. For example, if you have a DiaObject object (call it obj) that you got with DiaObject.get_object or DiaObject.find_objects, then you can find the id of the provenance of that DiaObject in obj.provenance_id.
If, instead, you know (e.g. because the user passed this on the command line) that you want to work on the objects that we have chosen to tag with the provenance tag realtime, and the process rapid_alerts (for instance, these may be objects we learned about from the RAPID alert stream), then you could get the provenance with:
prov = Provenance.get_provs_for_tag( 'realtime', 'rapid_alerts' )
Parameters¶
The params field of a Provenance is a dictionary that should include everything necessary for the specified version of your code to produce the same output on the same input. It should not include things like input filenames. The idea is that the same Provenance will apply to everything that is part of a given run. Only when you are changing the configuration, or when you are getting input files from an earlier part of the pipeline, should the Provenance change.
If you are using the Config system, and you’ve put all of these parameters (but nothing system-specific, like base paths, and no input files) in the config yaml file, then you can get a suitable params with:
cfg = Config.get()
params = cfg.dump_to_dict_for_params( keepkeys=[ 'photometry.phrosty' ], omitkeys=None )
The list in keepkeys are the keys (including the full substructure below that key) from the config that you want to include in the dictionary. This allows you to select out the parts of the config that are relevant to your code. system and anything starting with system. should never be in keepkeys.
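The effect of keepkeys can be pictured as selecting subtrees out of a nested dictionary. The sketch below is a stand-in written for illustration, not snappl's implementation; keep_dotted_keys is a hypothetical function, the config contents are made up, and the exact shape of the dictionary that dump_to_dict_for_params returns may differ.

```python
import copy


def keep_dotted_keys(config, keepkeys):
    """Return a new nested dict holding only the subtrees named in keepkeys.

    Hypothetical stand-in for illustration; not snappl's implementation.
    """
    kept = {}
    for dotted in keepkeys:
        if dotted == "system" or dotted.startswith("system."):
            raise ValueError("system keys should never go into provenance params")
        parts = dotted.split(".")
        source = config
        for part in parts:
            source = source[part]  # KeyError if the key is not in the config
        target = kept
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = copy.deepcopy(source)
    return kept


# Made-up config contents; only the photometry.phrosty subtree should survive.
example_config = {
    "system": {"paths": {"temp_dir": "/tmp"}},
    "photometry": {
        "phrosty": {"kernel_size": 5},
        "campari": {"grid": "regular"},
    },
}
params = keep_dotted_keys(example_config, ["photometry.phrosty"])
```

Here params ends up holding the photometry.phrosty subtree and nothing else, so neither the system paths nor another code's settings leak into the provenance.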
Upstreams¶
The upstream provenances are the ones that created the input files you use. For example, campari has three basic types of inputs: a diaobject, the supernova it’s running on; a diaobject_position, an updated position of the object; and images, the images it’s fitting its model to. Thus, it would have three upstream provenances, one for each of these things.
It can figure out these upstreams by just looking at the provenance_id field of the objects it’s using. Again, for example, campari will have (somehow) obtained a snappl.diaobject.DiaObject object; call that diaobj. It can get the diaobject provenance by just looking at diaobj.provenance_id. (To actually get the full Provenance object from the id, run snappl.provenance.Provenance.get_by_id( provenance_id ).)
Upstreams is part of the provenance because even if you run your code with all the same parameters, if you’re taking input files that were from a differently configured process earlier in the pipeline, you expect different outputs. Upstreams basically specify which sorts of input files are valid for this provenance.
Creating a Provenance¶
Just create a provenance with:
from snappl.provenance import Provenance
prov = Provenance( process, major, minor, params=<params>, upstreams=<upstreams> )
In this call, process is a string, major and minor are integers, params is a dictionary (see Parameters), and upstreams is a list of Provenance objects (see Upstreams).
If this is a totally new Provenance (you’ve never made it before), then save it to the database with:
prov.save_to_db( tag=<tag> )
Here, <tag> is the provenance tag that you want to tag this provenance with. If another provenance from the same process is already tagged with this tag, you will get an error. If the provenance you’re trying to save already exists, that’s fine; it won’t resave it, it will just notice that it’s there. So, this is safe to call even if you aren’t sure if you’ve saved it before or not. If, for some reason, you really want this to be a new provenance, add exists=False to the call. In that case, if the provenance already exists, an exception will be raised.
The Roman SNPIT Test Environment¶
(This is currently a bit of a mess, and I haven’t figured out how to get this to work on Perlmutter. However, if you’re on a desktop or laptop with an x86_64 architecture, then you should be able to get this running on your machine using Docker. Read all the comments at the top of this file in the environment repo.)