Management commands

The scraped-data app includes the following commands for scraping campaign finance data from the CAL-ACCESS website.

As with any Django app management command, these can be invoked on the command line or called within your Python code.
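A minimal sketch of the in-Python route, assuming Django is installed and your project's settings are already configured (for example, inside python manage.py shell). call_command is Django's own helper; the wrapper name run_scraper is hypothetical, and the keyword names assume each flag's destination follows the usual argparse convention (--force-download becomes force_download).

```python
def run_scraper(command="scrapecalaccess", **options):
    """Run one of the scraper management commands from Python code."""
    # Imported lazily so this module can be imported before Django is set up.
    from django.core.management import call_command

    # Assumption: option names mirror the command-line flags with dashes
    # replaced by underscores, e.g. cache_only=True for --cache-only.
    return call_command(command, **options)
```

For example, run_scraper("scrapecalaccesscandidates", cache_only=True) would correspond to passing --cache-only on the command line, if that assumption holds.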

Raw content downloaded from CAL-ACCESS is stored in .scraper_cache/, found in the directory specified by BASE_DIR in your Django project’s settings.
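To locate or inspect the cache from Python, something like the following sketch may help. The helper names are hypothetical, and the code assumes only what is stated above: that the cache is a .scraper_cache directory directly under BASE_DIR. Nothing is assumed about how files are laid out inside it.

```python
from pathlib import Path


def scraper_cache_dir(base_dir):
    """Return the expected location of the scraper's cache directory."""
    return Path(base_dir) / ".scraper_cache"


def cached_bytes(base_dir):
    """Total size in bytes of every file in the cache (0 if no cache yet)."""
    cache_dir = scraper_cache_dir(base_dir)
    if not cache_dir.is_dir():
        return 0
    # Walk the whole tree, since the internal layout is not assumed.
    return sum(p.stat().st_size for p in cache_dir.rglob("*") if p.is_file())
```

Inside a Django project you would typically pass settings.BASE_DIR as base_dir.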


The scrapecalaccess command runs the following management commands, in order:

  1. scrapecalaccesspropositions
  2. scrapecalaccesscandidates
  3. scrapecalaccessincumbents

These commands are described in more detail below.


By default, the scraper commands avoid unnecessary downloads. A CAL-ACCESS web page’s content is downloaded only if:

  1. The page’s content isn’t cached; or
  2. The byte size of the cached content differs from the size of the content on the server (as specified in the Content-Length header).
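The two conditions above can be sketched as a single check. This is an illustration of the rule as described, not the app's actual implementation; the function name and arguments are hypothetical, and the caller is assumed to have already obtained the Content-Length value (for instance from a HEAD request).

```python
import os


def should_download(cache_path, remote_content_length):
    """Apply the two default download rules to a single page.

    cache_path: where the page's content would be cached on disk.
    remote_content_length: value of the server's Content-Length header
        (an int, or the header's string value).
    """
    # Rule 1: the page's content isn't cached.
    if not os.path.exists(cache_path):
        return True
    # Rule 2: the cached byte size differs from the size on the server.
    return os.path.getsize(cache_path) != int(remote_content_length)
```

Comparing sizes is cheap but cannot detect a page whose content changed while its length stayed the same, which is one reason a forced download can be useful.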

You can override this default behavior by invoking the --force-download option:

$ python manage.py scrapecalaccess --force-download

Alternatively, you can avoid making any network requests by invoking the --cache-only option, which parses and stores data only from previously cached content:

$ python manage.py scrapecalaccess --cache-only

By default, data saved to your database from previous scrapes is preserved. To start over with empty data tables, invoke the --flush option:

$ python manage.py scrapecalaccess --flush


usage: manage.py scrapecalaccess [-h] [--version] [-v {0,1,2,3}]
                                 [--settings SETTINGS]
                                 [--pythonpath PYTHONPATH] [--traceback]
                                 [--no-color] [--flush] [--force-download]
                                 [--cache-only]

Run all scraper commands

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v {0,1,2,3}, --verbosity {0,1,2,3}
                        Verbosity level; 0=minimal output, 1=normal output,
                        2=verbose output, 3=very verbose output
  --settings SETTINGS   The Python path to a settings module, e.g.
                        "myproject.settings.main". If this isn't provided, the
                        DJANGO_SETTINGS_MODULE environment variable will be
                        used.
  --pythonpath PYTHONPATH
                        A directory to add to the Python path, e.g.
  --traceback           Raise on CommandError exceptions
  --no-color            Don't colorize the command output.
  --flush               Flush database tables
  --force-download      Force the scraper to download URLs even if they are
                        cached
  --cache-only          Skip the scraper's update checks. Use only cached


The scrapecalaccesscandidates command scrapes certified candidates for each election on the CAL-ACCESS site. It is a component of the scrapecalaccess command.

This command requests and parses content from the “certified” view of the Campaign/Candidates/list.aspx page (e.g., the 2016 General certified candidates). Data parsed from these pages are saved in the CandidateElection and Candidate models.


Here is how to run the command.

$ python manage.py scrapecalaccesscandidates


usage: manage.py scrapecalaccesscandidates [-h] [--version] [-v {0,1,2,3}]
                                           [--settings SETTINGS]
                                           [--pythonpath PYTHONPATH]
                                           [--traceback] [--no-color]
                                           [--flush] [--force-download]
                                           [--cache-only]

Scrape certified candidates for each election on the CAL-ACCESS site.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v {0,1,2,3}, --verbosity {0,1,2,3}
                        Verbosity level; 0=minimal output, 1=normal output,
                        2=verbose output, 3=very verbose output
  --settings SETTINGS   The Python path to a settings module, e.g.
                        "myproject.settings.main". If this isn't provided, the
                        DJANGO_SETTINGS_MODULE environment variable will be
                        used.
  --pythonpath PYTHONPATH
                        A directory to add to the Python path, e.g.
  --traceback           Raise on CommandError exceptions
  --no-color            Don't colorize the command output.
  --flush               Flush database tables
  --force-download      Force the scraper to download URLs even if they are
                        cached
  --cache-only          Skip the scraper's update checks. Use only cached


The scrapecalaccesscandidatecommittees command scrapes each candidate’s committees from the CAL-ACCESS site.

This command requests and parses content from the “general” view of the Campaign/Candidates/Detail.aspx page for each candidate’s most recent “session” (e.g., Edward T. Gaines’s general information leading up to the 2016 General election). Data parsed from these pages are saved in the CandidateCommittee model.


The scrapecalaccesscandidatecommittees command is not currently included in scrapecalaccess because of the large number of CAL-ACCESS web pages it scrapes. This may change in the future.
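Because this command has to be run separately, a script that wants both the main scrape and the candidate committees needs to call each one. The sketch below shows one way to do that with Django's call_command. The function name is hypothetical, running the committees command after the main scrape is an assumption rather than a documented requirement, and Django must already be configured.

```python
def run_full_scrape(include_candidate_committees=True, **options):
    """Run scrapecalaccess, then optionally the candidate committees scrape."""
    # Imported lazily so this module can be imported before Django is set up.
    from django.core.management import call_command

    commands = ["scrapecalaccess"]
    if include_candidate_committees:
        # Not part of scrapecalaccess, so it is invoked explicitly here.
        commands.append("scrapecalaccesscandidatecommittees")

    for name in commands:
        # The same options are passed to every command.
        call_command(name, **options)
    return commands
```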


Here is how to run the command.

$ python manage.py scrapecalaccesscandidatecommittees


usage: manage.py scrapecalaccesscandidatecommittees [-h] [--version]
                                                    [-v {0,1,2,3}]
                                                    [--settings SETTINGS]
                                                    [--pythonpath PYTHONPATH]
                                                    [--traceback] [--no-color]
                                                    [--flush]
                                                    [--force-download]
                                                    [--cache-only]

Scrape each candidate's committees from the CAL-ACCESS site.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v {0,1,2,3}, --verbosity {0,1,2,3}
                        Verbosity level; 0=minimal output, 1=normal output,
                        2=verbose output, 3=very verbose output
  --settings SETTINGS   The Python path to a settings module, e.g.
                        "myproject.settings.main". If this isn't provided, the
                        DJANGO_SETTINGS_MODULE environment variable will be
                        used.
  --pythonpath PYTHONPATH
                        A directory to add to the Python path, e.g.
  --traceback           Raise on CommandError exceptions
  --no-color            Don't colorize the command output.
  --flush               Flush database tables
  --force-download      Force the scraper to download URLs even if they are
                        cached
  --cache-only          Skip the scraper's update checks. Use only cached


The scrapecalaccessincumbents command scrapes the list of incumbent state officials for each election on the CAL-ACCESS site. It is a component of the scrapecalaccess command.

This command requests and parses content from the “incumbent” view of the Campaign/Candidates/list.aspx page (e.g., the 2017-2018 General incumbents). Data parsed from these pages are saved in the IncumbentElection and Incumbent models.


Here is how to run the command.

$ python manage.py scrapecalaccessincumbents


usage: manage.py scrapecalaccessincumbents [-h] [--version] [-v {0,1,2,3}]
                                           [--settings SETTINGS]
                                           [--pythonpath PYTHONPATH]
                                           [--traceback] [--no-color]
                                           [--flush] [--force-download]
                                           [--cache-only]

Scrape list of incumbent state officials for each election on CAL-ACCESS site.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v {0,1,2,3}, --verbosity {0,1,2,3}
                        Verbosity level; 0=minimal output, 1=normal output,
                        2=verbose output, 3=very verbose output
  --settings SETTINGS   The Python path to a settings module, e.g.
                        "myproject.settings.main". If this isn't provided, the
                        DJANGO_SETTINGS_MODULE environment variable will be
                        used.
  --pythonpath PYTHONPATH
                        A directory to add to the Python path, e.g.
  --traceback           Raise on CommandError exceptions
  --no-color            Don't colorize the command output.
  --flush               Flush database tables
  --force-download      Force the scraper to download URLs even if they are
                        cached
  --cache-only          Skip the scraper's update checks. Use only cached


The scrapecalaccesspropositions command scrapes links between filers and propositions from the official CAL-ACCESS site. It is a component of the scrapecalaccess command.

This command requests and parses content from the Campaign/Measures/list.aspx page (e.g., the 2015-2016 propositions and ballot measures) and the “general” view of each proposition’s Campaign/Measures/Detail.aspx page (e.g., Prop 60’s general information). Data parsed from these pages are saved in the PropositionElection, Proposition and PropositionCommittee models.


$ python manage.py scrapecalaccesspropositions


usage: manage.py scrapecalaccesspropositions [-h] [--version] [-v {0,1,2,3}]
                                             [--settings SETTINGS]
                                             [--pythonpath PYTHONPATH]
                                             [--traceback] [--no-color]
                                             [--flush] [--force-download]
                                             [--cache-only]

Scrape links between filers and propositions from the official CAL-ACCESS
site.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v {0,1,2,3}, --verbosity {0,1,2,3}
                        Verbosity level; 0=minimal output, 1=normal output,
                        2=verbose output, 3=very verbose output
  --settings SETTINGS   The Python path to a settings module, e.g.
                        "myproject.settings.main". If this isn't provided, the
                        DJANGO_SETTINGS_MODULE environment variable will be
                        used.
  --pythonpath PYTHONPATH
                        A directory to add to the Python path, e.g.
  --traceback           Raise on CommandError exceptions
  --no-color            Don't colorize the command output.
  --flush               Flush database tables
  --force-download      Force the scraper to download URLs even if they are
                        cached
  --cache-only          Skip the scraper's update checks. Use only cached