How the System Works
====================

This page describes how the software works, how multiprocessing works, and the primary example model data schema. The code snippets below may not exactly match the latest version of the software, but they are close enough to illustrate how the system works.

.. _how_the_system_works:

Execution Flow
--------------

An example model run starts by running the steps in :ref:`example_run`. The following flow chart represents the steps of an ActivitySim run; specific implementations will have different individual model components in their execution.

.. image:: images/example_flowchart.png

Initialization
~~~~~~~~~~~~~~

The first significant step of the ``run`` command is:

::

  from activitysim import abm

which loads :mod:`activitysim.abm.__init__`, which calls:

::

  import misc
  import tables
  import models

which then loads the misc, tables, and models class definitions. Loading :mod:`activitysim.abm.misc` calls:

::

  from activitysim.core import config
  from activitysim.core import inject

which loads the config and inject classes. These define injectables (functions) and helper functions for running models. For example, the Python decorator ``@inject.injectable`` overrides the function definition ``settings`` to execute this function whenever the ``settings`` object is called by the system. The :mod:`activitysim.core.inject` module manages the data pipeline.

::

  @inject.injectable(cache=True)
  def settings():
      settings_dict = read_settings_file('settings.yaml', mandatory=True)
      return settings_dict
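To make this mechanism concrete, here is a minimal, hypothetical sketch of how a decorator-based registry with lazy, cached evaluation could work. It is illustrative only, not ActivitySim's actual implementation, and every name in it is made up:

::

  # hypothetical sketch, not ActivitySim code
  _registry = {}  # name -> (function, cache flag)
  _cache = {}     # name -> cached result

  def injectable(cache=False):
      def decorator(func):
          # registering only records the function under its own name;
          # nothing is computed yet
          _registry[func.__name__] = (func, cache)
          return func
      return decorator

  def get_injectable(name):
      # return the cached object if present, otherwise call the
      # registered function on demand
      if name in _cache:
          return _cache[name]
      func, cache = _registry[name]
      result = func()
      if cache:
          _cache[name] = result
      return result

  @injectable(cache=True)
  def settings():
      # stand-in for read_settings_file('settings.yaml', mandatory=True)
      return {'models': ['initialize_households'], 'resume_after': None}

  print(get_injectable('settings')['models'])  # -> ['initialize_households']

The key idea is that decorating a function only registers it; nothing is computed until the object is first requested, which is what makes the on-demand loading of settings and tables described below possible.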
Next, the tables module executes the following import statements in :mod:`activitysim.abm.tables.__init__` to define the dynamic inject tables (households, persons, skims, etc.), but does not load them. It also defines the core dynamic injectables (functions) in those modules. The Python decorator ``@inject.table`` overrides the function definitions so the name of the function becomes the name of the table when dynamically called by the system.

::

  from . import households
  from . import persons
  #etc...

  #then in households.py
  @inject.table()
  def households(households_sample_size, override_hh_ids, trace_hh_id):

The models module then loads all the sub-models, which are registered as model steps with the ``@inject.step()`` decorator. These steps will eventually be run by the data pipeline.

::

  from . import accessibility
  from . import atwork_subtour_destination
  from . import atwork_subtour_frequency
  #etc...

  #then in accessibility.py
  @inject.step()
  def compute_accessibility(accessibility, network_los, land_use, trace_od):

Back in the main ``run`` command, the next steps are to load the tracing, configuration, setting, and pipeline classes to get the system management components up and running.

::

  from activitysim.core import tracing
  from activitysim.core import config
  from activitysim.core import pipeline

The next step in the example is to define the ``run`` method, call it if the script is being run as the program entry point, and handle the arguments passed in via the command line.

::

  def run():
      #etc...

  if __name__ == '__main__':
      run()

.. note::
   For more information on run options, type ``activitysim run -h`` on the command line.

The first key thing that happens in the ``run`` function is ``resume_after = setting('resume_after', None)``, which causes the system to go looking for ``setting``. Earlier we saw that ``settings`` was defined as an injectable, so the system gets this object if it is already in memory or, if not, calls the function that loads the ``config/settings.yaml`` file. This is called lazy loading or on-demand loading. Next, the system loads the models list and starts the pipeline:

::

  pipeline.run(models=setting('models'), resume_after=resume_after)

The :func:`activitysim.core.pipeline.run` method loops through the list of models, calls ``inject.run([step_name])``, and manages the data pipeline. The first disaggregate data processing step (or model) run is ``initialize_households``, defined in :mod:`activitysim.abm.models.initialize`. The ``initialize_households`` step is responsible for requesting reading of the raw households and persons into memory.

Initialize Households
~~~~~~~~~~~~~~~~~~~~~

The initialize households step/model is run via:

::

  @inject.step()
  def initialize_households():

      trace_label = 'initialize_households'
      model_settings = config.read_model_settings('initialize_households.yaml', mandatory=True)
      annotate_tables(model_settings, trace_label)

This step reads the ``initialize_households.yaml`` config file, which defines the :ref:`table_annotation` below. Each table annotation applies the expressions specified in the annotate spec to the relevant table. For example, the ``persons`` table is annotated with the results of the expressions in ``annotate_persons.csv``. If the table is not already in memory, then inject goes looking for it as explained below.

::

  #initialize_households.yaml
  annotate_tables:
    - tablename: persons
      annotate:
        SPEC: annotate_persons
        DF: persons
        TABLES:
          - households
    - tablename: households
      column_map:
        PERSONS: hhsize
        workers: num_workers
      annotate:
        SPEC: annotate_households
        DF: households
        TABLES:
          - persons
          - land_use
    - tablename: persons
      annotate:
        SPEC: annotate_persons_after_hh
        DF: persons
        TABLES:
          - households

  #initialize.py
  def annotate_tables(model_settings, trace_label):

      annotate_tables = model_settings.get('annotate_tables', [])

      for table_info in annotate_tables:

          tablename = table_info['tablename']
          df = inject.get_table(tablename).to_frame()

          # - annotate
          annotate = table_info.get('annotate', None)
          if annotate:
              logger.info("annotated %s SPEC %s" % (tablename, annotate['SPEC'],))
              expressions.assign_columns(
                  df=df,
                  model_settings=annotate,
                  trace_label=trace_label)

          # - write table to pipeline
          pipeline.replace_table(tablename, df)

Remember that the ``persons`` table was previously registered as an injectable table when the persons table class was imported. Now that the ``persons`` table is needed, inject calls this function, which requires the ``households`` and ``trace_hh_id`` objects as well. Since ``households`` has yet to be loaded, the system runs the households inject table operation as well. The various calls also set up logging, tracing, stable random number management, etc.

::

  #persons in persons.py requires households, trace_hh_id
  @inject.table()
  def persons(households, trace_hh_id):

      df = read_raw_persons(households)

      logger.info("loaded persons %s" % (df.shape,))

      df.index.name = 'person_id'

      # replace table function with dataframe
      inject.add_table('persons', df)

      pipeline.get_rn_generator().add_channel('persons', df)

      if trace_hh_id:
          tracing.register_traceable_table('persons', df)
          tracing.trace_df(df, "raw.persons", warn_if_empty=True)

      return df

  #households requires households_sample_size, override_hh_ids, trace_hh_id
  @inject.table()
  def households(households_sample_size, override_hh_ids, trace_hh_id):

      df_full = read_input_table("households")

The process continues until all the dependencies are resolved. It is the ``read_input_table`` function that actually reads the input tables from the input HDF5 or CSV file, using the ``input_table_list`` found in ``settings.yaml``:

::

  input_table_list:
    - tablename: households
      filename: households.csv
      index_col: household_id
      column_map:
        HHID: household_id
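Reading a table according to such an entry amounts to loading the file, renaming the raw columns per the ``column_map``, and setting the index. The following is a simplified, hypothetical sketch; the real ``read_input_table`` also handles HDF5 stores, household sampling, and other details:

::

  # simplified, hypothetical sketch of reading one input_table_list entry
  import pandas as pd

  def read_table(table_info):
      df = pd.read_csv(table_info['filename'])
      # rename raw columns to canonical names, e.g. HHID -> household_id
      df = df.rename(columns=table_info.get('column_map', {}))
      # index the table by its configured index column
      return df.set_index(table_info['index_col'])

  households = read_table({
      'tablename': 'households',
      'filename': 'households.csv',
      'index_col': 'household_id',
      'column_map': {'HHID': 'household_id'},
  })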
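Stepping back, the dependency-driven loading shown above, where asking for ``persons`` triggers loading of ``households``, can be sketched as a recursive resolver that reads each loader's signature. Again, this is an illustration with made-up names, not ActivitySim's actual code:

::

  # hypothetical sketch of recursive dependency resolution
  import inspect

  _tables = {}  # name -> registered loader function
  _loaded = {}  # name -> loaded table

  def table(func):
      _tables[func.__name__] = func
      return func

  def get_table(name):
      if name in _loaded:
          return _loaded[name]
      loader = _tables[name]
      # each parameter names another registered table or injectable,
      # so resolve those first (depth-first), then call the loader
      args = [get_table(dep) for dep in inspect.signature(loader).parameters]
      _loaded[name] = loader(*args)
      return _loaded[name]

  @table
  def households():
      return ['hh1', 'hh2']  # stand-in for read_input_table("households")

  @table
  def persons(households):
      return ['person of %s' % hh for hh in households]

  print(get_table('persons'))  # resolving persons loads households first

ActivitySim's ``inject`` machinery plays the equivalent role, layered with the caching, tracing, and stable random number management described above.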
Running Model Components
~~~~~~~~~~~~~~~~~~~~~~~~

The next steps run the model components specific to the implementation you are running, as specified in the ``settings.yaml`` file.

Finishing Up
~~~~~~~~~~~~

The last models to be run by the data pipeline are:

* ``write_data_dictionary``, which writes the table name, number of rows, number of columns, and number of bytes for each checkpointed table
* ``track_skim_usage``, which tracks skim data memory usage
* ``write_tables``, which writes pipeline tables as CSV files as specified by the ``output_tables`` setting

Back in the main ``run`` command, the final step is to:

* close the data pipeline (and attached HDF5 file)

Components
----------

Individual models and components are defined and described in the Developers Guide. Please refer to the :ref:`Components` section.