Installation Guide
This guide will walk you through the process of installing the latest official release of django-calaccess-processed-data so that you can incorporate CAL-ACCESS data into your own Django project.
If, instead, you want to install the raw source code or contribute as a developer, please refer to the “How to contribute” tutorial.
Warning
This library is intended to be plugged into a project created with the Django web framework. Before you can begin, you’ll need to have one up and running. If you don’t know how, check out the official Django documentation.
Installing the Django apps
The latest version of the application can be installed from the Python Package Index using pip.
$ pip install django-calaccess-processed-data
Like most Django applications, the app then needs to be added to INSTALLED_APPS in your settings.py configuration file. You also need to include the other Django apps it depends on:
INSTALLED_APPS = (
    # ... other apps up here ...
    'calaccess_raw',
    'calaccess_scraped',
    'calaccess_processed',
    'calaccess_processed_filings',
    'calaccess_processed_elections',
    'calaccess_processed_flatfiles',
    'calaccess_processed_campaignfinance',
    'opencivicdata.core.apps.BaseConfig',
    'opencivicdata.elections.apps.BaseConfig',
)
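If you want to confirm that nothing from the list above was left out, a quick check like the following can help. It is a minimal sketch: `REQUIRED_APPS` and `missing_apps` are hypothetical names, not part of django-calaccess-processed-data, and the list simply mirrors the entries shown above.

```python
# Hypothetical helper (not part of the library) that compares a settings
# tuple against the apps listed in this guide.
REQUIRED_APPS = (
    'calaccess_raw',
    'calaccess_scraped',
    'calaccess_processed',
    'calaccess_processed_filings',
    'calaccess_processed_elections',
    'calaccess_processed_flatfiles',
    'calaccess_processed_campaignfinance',
    'opencivicdata.core.apps.BaseConfig',
    'opencivicdata.elections.apps.BaseConfig',
)


def missing_apps(installed_apps):
    """Return the required entries absent from installed_apps, in guide order."""
    installed = set(installed_apps)
    return [app for app in REQUIRED_APPS if app not in installed]
```

From `python manage.py shell`, you could run `from django.conf import settings` and then `missing_apps(settings.INSTALLED_APPS)`; an empty list means every entry is present.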
A little more about these dependencies:
calaccess_raw
- This app downloads and extracts the raw data files exported each night from the CAL-ACCESS database. The app then loads these files into your Django project’s database with minimal transformations. For more details, see the django-calaccess-raw-data section.
calaccess_scraped
- This app scrapes the CAL-ACCESS website and loads additional data not included in the nightly exports. For more details, see the django-calaccess-scraped-data section.
opencivicdata.core
- This app includes Django models and admin panels for the core data types of the Open Civic Data specification, including Person, Organization, Post and Membership.
opencivicdata.elections
- This app includes Django models and admin panels for election-related data types that have been provisionally included in the Open Civic Data specification.
Connecting to a local database
Also in the settings.py file, you will need to configure Django so it can connect to your database.
Note
Unlike a typical Django project, this application only supports PostgreSQL database backends. This is because we enlist specialized tools to load the immense amount of source data more quickly than Django typically allows. We haven’t developed those routines for SQLite and the other Django backends yet, but we might someday.
Before you begin, make sure you have a PostgreSQL server installed. If you don’t, now is the time to hit Google and figure out how. The official PostgreSQL documentation is another good place to start.
Once that’s handled, add a database configuration like this to your settings.py file.
DATABASES = {
    'default': {
        'NAME': 'calaccess_processed',
        'ENGINE': 'django.db.backends.postgresql',
        'USER': 'your-username-here',
        'PASSWORD': 'your-password-here',
        'HOST': 'localhost',
        'PORT': '5432'
    }
}
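Rather than committing a username and password to settings.py, you may prefer to read them from environment variables. The sketch below assumes variable names of our own invention (`CALACCESS_DB_NAME`, `CALACCESS_DB_USER` and so on), falls back to the values used above when they are unset, and adds a guard reflecting the PostgreSQL-only requirement. It uses the `django.db.backends.postgresql` engine name, which is the form current Django releases accept.

```python
import os

# Connection details come from environment variables when present.
# The CALACCESS_DB_* names are hypothetical; choose whatever suits your setup.
DATABASES = {
    'default': {
        'NAME': os.environ.get('CALACCESS_DB_NAME', 'calaccess_processed'),
        'ENGINE': 'django.db.backends.postgresql',
        'USER': os.environ.get('CALACCESS_DB_USER', ''),
        'PASSWORD': os.environ.get('CALACCESS_DB_PASSWORD', ''),
        'HOST': os.environ.get('CALACCESS_DB_HOST', 'localhost'),
        'PORT': os.environ.get('CALACCESS_DB_PORT', '5432'),
    }
}

# Fail early if a non-PostgreSQL backend is swapped in, since the
# data-loading routines only support PostgreSQL.
if 'postgresql' not in DATABASES['default']['ENGINE']:
    raise ValueError('django-calaccess-processed-data requires PostgreSQL')
```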
Return to the command line and run the following to create a PostgreSQL database to store the data.
$ createdb calaccess_processed
Note
If you’d prefer to load the CAL-ACCESS data outside your default database, check out our guide to working with Django’s system for multiple databases.
Loading the data
Now you’re ready to create the database tables with Django using its manage.py utility belt.
$ python manage.py migrate
Once everything is set up, the updatecalaccessrawdata command will download the latest bulk data release from the Secretary of State’s website and load it into your local database.
$ python manage.py updatecalaccessrawdata
Warning
This will take an hour or more. Go grab some coffee.
Because the nightly raw export is incomplete, we have to scrape additional data from the CAL-ACCESS website. Use the scrapecalaccess command to kick off this process, either after updatecalaccessrawdata finishes or in a separate terminal window:
$ python manage.py scrapecalaccess
Once the raw CAL-ACCESS data is loaded and the scrape has finished, you can transform all this messy data and load it into a simpler structure with the processcalaccessdata command:
$ python manage.py processcalaccessdata
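If you plan to refresh the data regularly, for example from a cron job, the commands above can be chained in a small script. This is a sketch under a few assumptions: it runs from the directory containing manage.py, it runs the scrape after the raw load instead of in parallel, and `PIPELINE`, `build_commands` and `run_pipeline` are hypothetical names rather than anything the library provides.

```python
import subprocess
import sys

# The management commands from this guide, in the order they are run.
# Running the scrape after the raw load (not in parallel) is a
# simplification made for this sketch.
PIPELINE = (
    ("migrate",),
    ("updatecalaccessrawdata",),
    ("scrapecalaccess",),
    ("processcalaccessdata",),
)


def build_commands(manage_py="manage.py", python=sys.executable):
    """Return each step as an argument list suitable for subprocess.run."""
    return [[python, manage_py, *step] for step in PIPELINE]


def run_pipeline(manage_py="manage.py", dry_run=False):
    """Run every step in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    """
    commands = build_commands(manage_py)
    for cmd in commands:
        print(" ".join(cmd), flush=True)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run on a half-finished load.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `run_pipeline(dry_run=True)` prints the four commands without running them. Because each step uses `check=True`, a failed download stops the script before processcalaccessdata works on an incomplete load.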