Contribute to django-calaccess-downloads-website

This walkthrough is for developers who want to contribute to django-calaccess-downloads-website, a open-source archive of campaign-finance and lobbying-disclosure data from the California Secretary of State’s CAL-ACCESS database.

It will show you how to install the source code of this application to fix bugs, develop new features and deploy an archive to the Internet using Amazon Web Services.


Preparing a development environment

In order to contribute you first need to set up a local development environment by installing the source code and configuring a few settings.

While not required, we recommend that development be done within a contained virtual environment.

One way to accomplish that is with a two related Python packages: virtualenv and virtualenvwrapper. If you have both of these installed, a new project can be easily created like so:

$ mkproject django-calaccess-downloads-website

That will jump into a new folder in your code directory, where you can fork our code repository from GitHub. Don’t know what that means? Read this.

Once you’ve created a fork, you should clone it to your computer.

$ git clone https://github.com/<YOUR-USERNAME>/django-calaccess-downloads-website.git .

Next, install the other Python libraries our code depends on, like the Django web framework.

$ pip install -r requirements.txt

Configuring settings

Many of the settings in this project can vary depending on where the code is being run. For instance, your local installation of the code will likely connect to a different database than the public website.

To keep these different environments straight and avoid including sensitive passwords in public repositories we have developed a system for storing many of the configuration options in a file named .env at the project’s root.

The file is excluded from Git’s version control system and needs to be created fresh each time the code is installed.

How .env works

The .env file is expected to contain a separate section for each environment, using the structure favored by Python’s ConfigParser module. Here’s a simple example:

[DEV]
database_name=calaccess
mysecretpassword=password

[PROD]
database_name=calaccess
mysecretpassword=hotpockets

By default, the source code will draw settings from a section name DEV. To configure it to use a different set of variables (like the``PROD`` section above), you must set the CALACCESS_WEBSITE_ENV environment variable.

$ export CALACCESS_WEBSITE_ENV=PROD

If you are using virtualenv and virtualenvwrapper, you could add the above line of code to $VIRTUAL_ENV/bin/postactivate so that whenever you start the project’s virtual environment, the variable will be exported automatically.

Note

You could also add the following line to your $VIRTUAL_ENV/bin/postdeactivate script to clear the variable whenever you deactivate the virtual environment:

$ unset CALACCESS_WEBSITE_ENV

Connecting to a local database

Unlike a typical Django project, this application only supports PostgreSQL version 9.6 and above as a database backend. This is because we enlist specialized tools to load the immense amount of source data more quickly than Django typically allows.

Create the database the PostgreSQL way.

$ createdb calaccess_website -U postgres

Creating an archive on Amazon S3

Even a development project that will run only on your computer needs an account with Amazon Web Services to store archived files in its S3 file service.

If you don’t already have an AWS account, make one now and request a key pair that lets you access its services via Python.

Then create a new S3 “bucket” to store files archived by this project.


Filling in .env for the first time

The development environment can be created in the .env file by running a Fabric task that will ask you to provide a value for all of this project’s settings.

$ fab createconfig

You will prompted to provide the project’s full list of settings, though some of them are only necessary when deploying the code and site with Amazon Web Services.

Setting Required in development Definition
db_name Yes Name of your database.
db_user Yes Database user.
db_password Yes Database password.
db_host Yes Database host location.
aws_access_key_id Yes Shorter secret key for accessing Amazon Web Services.
aws_secret_access_key Yes The longer secret key for accessing Amazon Web Services.
aws_region_name Yes Amazon Web Services region where you resources are located.
s3_archived_data_bucket Yes Amazon S3 bucket where archived CAL-ACCESS data will be stored.
s3_baked_content_bucket No Amazon S3 bucket where the public-facing website will be stored.
key_name No Name of the SSH .pem file associated with Amazon Web Services. Should be found in ~/.ec2.
ec2_host No Public address of website’s Amazon EC2 instance.
email_user No Gmail account for sending error emails.
email_password No Gmail password for sending error emails.

If necessary, you can overwrite a specific setting or append a new one:

$ fab setconfig:key=<new-variable-name>,value=<some-value>

You can also print your current app environment’s configuration:

$ fab printconfig

Or everything in the Fabric environment:

$ fab printenv

Bootstrapping the project

Now that everything is configured, create the database tables.

$ python manage.py migrate

Once everything is set up, the updatedownloadswebsite command will download the latest bulk data release from the Secretary of State’s website load it into your local database and archive the files on Amazon S3.

$ python manage.py updatedownloadswebsite

Warning

This will take a while. Go grab some coffee.


Exploring the site

Finally, start the development server and visit localhost:8000/admin/ in your browser to inspect the site.

$ python manage.py runserver

Preparing a production server

This section will walk you through deploy the downloads website on the Internet via Amazon Web Services. You will need to have completed the steps above.

Change your environment

As described above, the source code will draw settings from a section of the .env file named DEV.

To switch to configuring your project for a production environment, you should set the CALACCESS_WEBSITE_ENV environment variable to PROD.

$ export CALACCESS_WEBSITE_ENV=PROD

If you are using virtualenv and virtualenvwrapper, you could add the above line of code to $VIRTUAL_ENV/bin/postactivate so that whenever you start the project’s virtual environment, this variable will be exported automatically whenever you use workon to begin work.


Creating an RDS database

You will need to create a hosted database to store the data and keep tabs on the archive over time. Our recommended method for doing this is using Amazon’s Relational Database Service.

You can spin up a PostgreSQL server there using our prepackaged Fabric commands. You’re only required to provide a name like download-website:

$ fab createrds:download-website

Then, wait several minutes while the server is provisioned.

By default, the new database server will have 100 GB of disk space allocated on a t2.large RDS class instance. If need be, you can override these settings:

$ fab createrds:download-website,block_gb_size=80,instance_type=db.m4.large

The address for the RDS host will automatically be added to the configuration for your current environment, which is stored in .env. If you already had an RDS host set for your current env, its address will be overwritten.


Create an EC2 Instance

Next you should create a new Ubuntu 14.04 server on Amazon’s Elastic Compute Cloud to host the Django project.

$ fab createec2

By default, the server will have 100 GB of disk space allocated on a c3.large class instance. If need be, you can override these settings:

$ fab createec2:block_gb_size=80,instance_type=c3.xlarge

You can also override our default Amazon Machine Image (AMI):

$ fab createec2:ami=<some-other-ami-id>

As with creating an RDS instance, the address for your new EC2 instance will automatically be added to the configuration for your current environment, which is stored in .env. If you already had an EC2 host set, its address will be overwritten.


Filling in .env for the second time

Now you’ll want to run our configuration command again, this time filling in the new details from your AWS account, database and server. You may want to create a new set of S3 buckets separate from your development buckets.

$ fab createconfig

Bootstrap the Django project

Finally, you’re ready to bootstrap the Django project on the Ubuntu server.

$ fab bootstrap

After connecting to your current EC2 instance, a framework called Chef and its dependencies, including Ruby, will be installed on the server. Chef is used to configure the server and install the downloads website’s code.

The bootstrap task also sets up a crontab job to execute run as command every six hours that will automate the collection, extraction and processing of the daily CAL-ACCESS database exports.


Wrapping up

And that’s it! You know have a live CAL-ACCESS archive running in the cloud.