Contribute to django-calaccess-scraped-data¶
This walkthrough is for developers who want to contribute to django-calaccess-scraped-data, a Django app that scrapes supplementary data from the CAL-ACCESS website that is not included in the California Secretary of State’s nightly data dumps.
It will show you how to install the source code of this application to fix bugs and develop new features.
Preparing a development environment¶
While not required, we recommend developing the library inside a contained virtual environment.
One way to accomplish that is with Python’s virtualenv tool and its helpful companion virtualenvwrapper. If you have those installed, a new project can be started with the following:
$ mkproject django-calaccess-scraped-data
That will jump into a new folder in your code directory, where you can clone our code repository from GitHub after you make a fork of your own. Don’t know what that means? Read this.
$ git clone https://github.com/<YOUR-USERNAME>/django-calaccess-scraped-data.git .
Next install the other Python libraries our code depends on.
$ pip install -r requirements.txt
Connecting to a local database¶
The calaccess_scraped app doesn’t have any specific database requirements. However, we recommend PostgreSQL 9.4 (or greater), which is a hard requirement of other apps in our tool chain.
Create the database the PostgreSQL way.
$ createdb calaccess_scraped -U postgres
Create a file at example/project/settings_local.py to save your custom database credentials. That might look something like this:
DATABASES = {
    'default': {
        'NAME': 'calaccess_scraped',
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'USER': 'username',  # <-- Change this
        'PASSWORD': 'password',  # <-- And this
        'HOST': 'localhost',
        'PORT': '5432'
    }
}
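If you’d rather not hard-code credentials in a settings file, one common alternative is to read them from environment variables. Here is a minimal sketch of that approach; the CALACCESS_DB_* variable names are our own illustration, not something the app defines:

```python
import os

# Read database credentials from the environment, falling back to
# local-development defaults. The CALACCESS_DB_* names are hypothetical
# examples, not variables the app itself reads.
DATABASES = {
    'default': {
        'NAME': os.environ.get('CALACCESS_DB_NAME', 'calaccess_scraped'),
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'USER': os.environ.get('CALACCESS_DB_USER', 'postgres'),
        'PASSWORD': os.environ.get('CALACCESS_DB_PASSWORD', ''),
        'HOST': os.environ.get('CALACCESS_DB_HOST', 'localhost'),
        'PORT': os.environ.get('CALACCESS_DB_PORT', '5432'),
    }
}
```

This keeps passwords out of version control while preserving the same structure Django expects.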
Note
If you’d prefer to load the CAL-ACCESS data outside your default database, check out our guide to working with Django’s system for multiple databases.
Once the database is configured¶
Now create the tables and get to work.
$ python example/manage.py migrate
Now you’re ready to scrape. The scrapecalaccess command will download, cache and parse content from the CAL-ACCESS website:
$ python example/manage.py scrapecalaccess
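The command’s internals live in the app’s management commands, but the download-and-cache pattern it follows can be sketched in a few lines. This is illustrative only; the function names, cache directory, and caching scheme here are our own assumptions, not the app’s actual API:

```python
import hashlib
import os
import urllib.request


def cache_path(url, cache_dir='.scraper_cache'):
    """Map a URL to a stable local file path for its cached response."""
    digest = hashlib.sha1(url.encode('utf-8')).hexdigest()
    return os.path.join(cache_dir, digest + '.html')


def fetch(url, cache_dir='.scraper_cache'):
    """Return a page's HTML, downloading only on a cache miss.

    A simplified stand-in for the scraper's real caching behavior.
    """
    path = cache_path(url, cache_dir)
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            return f.read()
    os.makedirs(cache_dir, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        html = response.read().decode('utf-8', errors='replace')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)
    return html
```

Caching each response to disk means a re-run after a crash or code change can re-parse pages without hammering the CAL-ACCESS site again.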
Welcome aboard!¶
Now that your development environment is set up, check out the GitHub issue tracker where plenty of work awaits.
As you submit your work, please pay attention to the results of our integration tests (more details here).