Settings#

settings activitysim.core.configuration.Settings#

The overall settings for the ActivitySim model system.

The input for these settings is typically stored in one main YAML file, usually called settings.yaml.

Note that this implementation is presently used only for generating documentation, but future work may migrate the settings implementation to actually use this pydantic code to validate the settings before running the model.
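As a sketch, a minimal settings.yaml combining several of the fields documented below might look like this (all values are illustrative, not recommendations):

```yaml
# settings.yaml -- illustrative values only
inherit_settings: True
households_sample_size: 0      # 0 = simulate all households
multiprocess: False
checkpoint_format: parquet
models:
  - initialize_landuse
  - initialize_households
  - compute_accessibility
```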

Fields:
  • benchmarking (bool)

  • check_for_variability (bool)

  • checkpoint_format (Literal['hdf', 'parquet'])

  • checkpoints (Union[bool, list])

  • chunk_method (Literal['bytes', 'uss', 'hybrid_uss', 'rss', 'hybrid_rss'])

  • chunk_size (int)

  • chunk_training_mode (Literal['disabled', 'training', 'production', 'adaptive'])

  • cleanup_pipeline_after_run (bool)

  • cleanup_trace_files_on_resume (bool)

  • create_input_store (bool)

  • default_initial_rows_per_chunk (int)

  • disable_destination_sampling (bool)

  • disable_zarr (bool)

  • duplicate_step_execution (Literal['error', 'allow'])

  • fail_fast (bool)

  • hh_ids (pathlib.Path)

  • households_sample_size (int)

  • inherit_settings (Union[bool, pathlib.Path])

  • input_store (str)

  • input_table_list (list[activitysim.core.configuration.top.InputTable])

  • instrument (bool)

  • keep_chunk_logs (bool)

  • keep_mem_logs (bool)

  • log_alt_losers (bool)

  • log_settings (tuple[str])

  • memory_profile (bool)

  • min_available_chunk_ratio (float)

  • models (list[str])

  • multiprocess (bool)

  • multiprocess_steps (list[activitysim.core.configuration.top.MultiprocessStep])

  • num_processes (int)

  • offset_preprocessing (bool)

  • other_settings (dict[str, typing.Any])

  • output_tables (activitysim.core.configuration.top.OutputTables)

  • pipeline_complib (str)

  • recode_pipeline_columns (bool)

  • resume_after (str)

  • rng_base_seed (Optional[int])

  • rotate_logs (bool)

  • sharrow (Union[bool, str])

  • source_file_paths (list[pathlib.Path])

  • testing_fail_trip_destination (bool)

  • trace_hh_id (int)

  • trace_od (tuple[int, int])

  • treat_warnings_as_errors (bool)

  • use_shadow_pricing (bool)

  • want_dest_choice_presampling (bool)

  • want_dest_choice_sample_tables (bool)

  • write_raw_tables (bool)

field benchmarking: bool = False#

Flag this model run as a benchmarking run.

New in version 1.1.

This is generally a developer-only feature and not needed for regular usage of ActivitySim.

By flagging a model run as a benchmark, certain operations of the model are altered, to ensure valid benchmark readings. For example, in regular operation, data such as skims are loaded on-demand within the first model component that needs them. With benchmarking enabled, all data are always pre-loaded before any component is run, to ensure that recorded times are the runtime of the component itself, and not data I/O operations that are neither integral to that component nor necessarily stable over replication.

field check_for_variability: bool = False#

Debugging feature to find broken model specifications.

Enabling this check does not alter valid results but slows down model runs.

field checkpoint_format: Literal['hdf', 'parquet'] = 'parquet'#

Storage format to use when saving checkpoint files.

field checkpoints: Union[bool, list] = True#

When to write checkpoint (intermediate table states) to disk.

If True, checkpoints are written at each step. If False, no intermediate checkpoints will be written before the end of the run. Alternatively, provide an explicit list of model step names after which to checkpoint.
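For example, in settings.yaml (step names here are illustrative):

```yaml
# write a checkpoint after every model step (the default)
# checkpoints: True

# or, write checkpoints only after these named steps
checkpoints:
  - compute_accessibility
  - school_location
```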

field chunk_method: Literal['bytes', 'uss', 'hybrid_uss', 'rss', 'hybrid_rss'] = 'hybrid_uss'#

Memory use measure to use for chunking.

The following methods are supported to calculate memory overhead when chunking is enabled:

  • “bytes”

Expected row size based on the actual size (as reported by numpy and pandas) of explicitly allocated data. This can underestimate overhead due to the transient data requirements of operations (e.g. merge, sort, transpose).

  • “uss”

Expected row size based on the change in unique set size (USS), both as a result of explicit data allocation and from readings by a MemMonitor sniffer thread that measures transient USS during time-consuming numpy and pandas operations.

  • “hybrid_uss”

A hybrid of the bytes and uss measures, which avoids problems with pure uss, especially with small chunk sizes (e.g. initial training chunks), where numpy may recycle cached blocks and show no increase in uss even though data was allocated and logged.

  • “rss”

    like uss, but for resident set size (rss), which is the portion of memory occupied by a process that is held in RAM.

  • “hybrid_rss”

    like hybrid_uss, but for rss

RSS is reported by psutil.Process.memory_info() and USS is reported by psutil.Process.memory_full_info(). USS is the memory which is private to a process and which would be freed if the process were terminated. This is the metric that most closely matches the rather vague notion of memory “in use” (the meaning of which is difficult to pin down in operating systems with virtual memory, where memory can (but sometimes can’t) be swapped or mapped to disk). Previous testing found that hybrid_uss performs best and is most reliable, so it is the default.

For more, see Chunk.

field chunk_size: int = 0#

Approximate amount of RAM to allocate to ActivitySim for batch processing.

See Chunk for more details.

field chunk_training_mode: Literal['disabled', 'training', 'production', 'adaptive'] = 'disabled'#

The method to use for chunk training.

Valid values include {disabled, training, production, adaptive}. See Chunk for more details.
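A chunking configuration pulling these settings together might look like the following sketch (the byte count is purely illustrative; see Chunk for guidance on sizing):

```yaml
chunk_size: 80000000000        # ~80 GB of RAM available for batch processing
chunk_method: hybrid_uss       # the default and most reliable memory measure
chunk_training_mode: training  # record chunking stats for later production runs
min_available_chunk_ratio: 0.05
```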

field cleanup_pipeline_after_run: bool = False#

Cleans up pipeline after successful run.

This will clean up the pipeline only after a successful run, by creating a single-checkpoint pipeline file and deleting any subprocess pipelines.

field cleanup_trace_files_on_resume: bool = False#

Clean all trace files when restarting a model from a checkpoint.

field create_input_store: bool = False#

Write the inputs as read in back to an HDF5 store.

If enabled, this writes the store to the outputs folder to use for subsequent model runs, as reading HDF5 can be faster than reading CSV files.

field default_initial_rows_per_chunk: int = 100#

Default number of rows to use in initial chunking.

field disable_destination_sampling: bool = False#
field disable_zarr: bool = False#

Disable the use of zarr format skims.

New in version 1.2.

By default, if sharrow is enabled (any setting other than false), ActivitySim loads data from zarr format skims if a zarr location is provided and data is found there. If no data is found there, the original OMX skim data is loaded, any transformations or encodings are applied, and the result is written out to a zarr file at that location. Setting this option to True disables the use of zarr.

field duplicate_step_execution: Literal['error', 'allow'] = 'error'#

How ActivitySim should handle attempts to re-run a step with the same name.

New in version 1.3.

  • “error”

    Attempts to re-run a step that has already been run and checkpointed will raise a RuntimeError, halting model execution. This is the default if no value is given.

  • “allow”

    Attempts to re-run a step are allowed, potentially overwriting the results from the previous time that step was run.

field fail_fast: bool = False#
field hh_ids: Path = None#

Load only the household ids given in this file.

The file need only contain the desired household ids, nothing else. If given as a relative path (or just a file name), both the data and config directories are searched, in that order, for a matching file.

field households_sample_size: int = None#

Number of households to sample and simulate

If omitted or set to 0, ActivitySim will simulate all households.

field inherit_settings: Union[bool, Path] = None#

Instructions on whether and how to find other files that can provide settings.

When this value is True, all config directories are searched in order for additional files with the same filename. If other files are found, they are also loaded, but only settings values that are not already explicitly set are applied. Alternatively, set this to a different file name, in which case settings from that other file are loaded (again, backfilling unset values only). Once the settings files are loaded, this value has no other effect on the operation of the model(s).
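For example, a second configs directory can override a handful of values and backfill the rest from a base configuration (directory layout and values are illustrative):

```yaml
# configs_mp/settings.yaml -- overrides applied on top of an inherited settings.yaml
inherit_settings: True
multiprocess: True
num_processes: 8
```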

field input_store: str = None#

HDF5 inputs file

field input_table_list: list[InputTable] = None#

list of table names, indices, and column re-maps for each table in input_store
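An illustrative input_table_list entry follows; file, table, and column names are hypothetical, and the full set of per-table options is documented under InputTable:

```yaml
input_table_list:
  - tablename: households
    filename: households.csv
    index_col: household_id
    rename_columns:
      HHID: household_id     # map the CSV column onto the canonical name
  - tablename: persons
    filename: persons.csv
    index_col: person_id
```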

field instrument: bool = False#

Use pyinstrument to profile component performance.

New in version 1.2.

This is generally a developer-only feature and not needed for regular usage of ActivitySim.

Use this setting to enable statistical profiling of ActivitySim code via the pyinstrument library (an optional dependency which must also be installed). A separate profiling session is triggered for each model component. See the pyinstrument documentation for a description of how this tool works.

When activated, a “profiling–*” directory is created in the output directory of the model, tagged with the date and time of the profiling run. Profile output is always tagged like this and never overwrites previous profiling outputs, facilitating serial comparisons of runtimes in response to code or configuration changes.

field keep_chunk_logs: bool = True#

Whether to keep chunk logs when deleting other files.

field keep_mem_logs: bool = False#
field log_alt_losers: bool = False#

Write out expressions when all alternatives are unavailable.

This can be useful for model development to catch errors in specifications. Enabling this check does not alter valid results but slows down model runs.

field log_settings: tuple[str] = ('households_sample_size', 'chunk_size', 'chunk_method', 'chunk_training_mode', 'multiprocess', 'num_processes', 'resume_after', 'trace_hh_id', 'memory_profile', 'instrument')#

Setting to log on startup.

field memory_profile: bool = False#

Generate a memory profile by sampling memory usage from a secondary process.

New in version 1.2.

This is generally a developer-only feature and not needed for regular usage of ActivitySim.

Using this feature will open a secondary process whose only job is to poll memory usage for the main ActivitySim process. The usage is logged to a file with timestamps, so it can be cross-referenced against ActivitySim logs to identify which parts of the code are using RAM. The profiling is done from a separate process to prevent the profiler itself from significantly slowing the main model core, or (more importantly) from generating memory usage of its own that pollutes the collected data.

field min_available_chunk_ratio: float = 0.05#

minimum fraction of total chunk_size to reserve for adaptive chunking

field models: list[str] = None#

List of model steps to run (e.g. auto ownership, tour frequency).

See Pipeline for more details about each step.
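A sketch of a models list (step names here follow the pattern of the ActivitySim examples, but the exact list depends on the model configuration):

```yaml
models:
  - initialize_landuse
  - initialize_households
  - compute_accessibility
  - school_location
  - workplace_location
  - auto_ownership_simulate
  - write_tables
```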

field multiprocess: bool = False#

Enable multiprocessing for this model.

field multiprocess_steps: list[MultiprocessStep] = None#

A list of multiprocess steps.

field num_processes: int = None#

If running in multiprocessing mode, use this number of processes by default.

If not given or set to 0, the number of processes to use is set to half the number of available CPU cores, plus 1.
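The multiprocessing settings work together; a hedged sketch follows (step and table names are illustrative; see MultiprocessStep for the per-step options):

```yaml
multiprocess: True
num_processes: 8
multiprocess_steps:
  - name: mp_initialize          # single-process setup
    begin: initialize_landuse
  - name: mp_households          # run in parallel, sliced by household
    begin: school_location
    slice:
      tables:
        - households
        - persons
  - name: mp_summarize           # single-process wrap-up
    begin: write_tables
```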

field offset_preprocessing: bool = False#

Flag to indicate whether offset preprocessing has already been done.

New in version 1.2.

This flag is generally set automatically within ActivitySim during a run, and should not be set by a user ahead of time. The ability to do so is provided as a developer-only feature for testing and development.

field other_settings: dict[str, Any] = None#
field output_tables: OutputTables = None#

list of output tables to write to CSV or HDF5
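An illustrative output_tables block (table names and prefix are hypothetical; see OutputTables for all options):

```yaml
output_tables:
  h5_store: False        # write CSV rather than HDF5
  action: include
  prefix: final_
  tables:
    - households
    - persons
    - tours
    - trips
```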

field pipeline_complib: str = 'NOTSET'#

Compression library to use when storing pipeline tables in an HDF5 file.

New in version 1.3.

field recode_pipeline_columns: bool = False#

Apply recoding instructions on input and final output for pipeline tables.

New in version 1.2.

Recoding instructions can be provided in individual InputTable.recode_columns and OutputTable.decode_columns settings. This global setting permits disabling all recoding processes simultaneously.

Warning

Disabling recoding is fine in legacy mode but it is generally not compatible with using Settings.sharrow.

field resume_after: str = None#

Name of the checkpoint after which to resume running the data pipeline.

field rng_base_seed: Optional[int] = 0#

Base seed for pseudo-random number generator.

field rotate_logs: bool = False#
field sharrow: Union[bool, str] = False#

Set the sharrow operating mode.

New in version 1.2.

  • false - Do not use sharrow. This is the default if no value is given.

  • true - Use sharrow optimizations when possible, but fall back to legacy pandas.eval systems when any error is encountered. This is the preferred mode for running with sharrow if reliability is more important than performance.

  • require - Use sharrow optimizations, and raise an error if they fail unexpectedly. This is the preferred mode for running with sharrow if performance is a concern.

  • test - Run every relevant calculation using both sharrow and legacy systems, and compare them to ensure the results match. This is the slowest mode of operation, but useful for development and debugging.
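In settings.yaml this might look like the following sketch:

```yaml
# while validating a model: run sharrow and legacy side by side and compare
sharrow: test

# for production runs where performance matters, switch to:
# sharrow: require
```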

field source_file_paths: list[Path] = None#

A list of source files from which these settings were loaded.

This value should not be set by the user within the YAML settings files, instead it is populated as those files are loaded. It is primarily provided for debugging purposes, and does not actually affect the operation of the model.

field testing_fail_trip_destination: bool = False#
field trace_hh_id: int = None#

Trace this household id

If omitted, no tracing is written out

field trace_od: tuple[int, int] = None#

Trace origin, destination pair in accessibility calculation

If omitted, no tracing is written out.
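Both tracing settings in one illustrative snippet (the id values are made up):

```yaml
trace_hh_id: 982875      # write trace output for this household
trace_od: [5, 11]        # trace this origin-destination zone pair
```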

field treat_warnings_as_errors: bool = False#

Treat most warnings as errors.

Use of this setting is not recommended outside of rigorous testing regimes.

New in version 1.3.

field use_shadow_pricing: bool = False#

turn shadow_pricing on and off for work and school location

field want_dest_choice_presampling: bool = True#
field want_dest_choice_sample_tables: bool = False#

turn writing of sample_tables on and off for all models

field write_raw_tables: bool = False#

Dump input tables back to disk immediately after loading them.

This is generally a developer-only feature and not needed for regular usage of ActivitySim.

The data tables are written out to <output_dir>/raw_tables before any annotation steps, but after initial processing (renaming, filtering columns, recoding).