Contribute to django-calaccess-scraped-data¶
This walkthrough is for developers who want to contribute to django-calaccess-scraped-data, a Django app that scrapes supplementary data from the CAL-ACCESS website that is not included in the California Secretary of State’s nightly data dumps.
It will show you how to install the source code of this application to fix bugs and develop new features.
Preparing a development environment¶
While not required, we recommend developing the library inside a contained virtual environment.
One way to accomplish that is with Python’s virtualenv tool and its helpful companion virtualenvwrapper. If you have those installed, a new project can be started with the following:
$ mkproject django-calaccess-scraped-data
That will jump into a new folder in your code directory, where you can clone our code repository from GitHub after you make a fork of your own. Don’t know what that means? Read this.
$ git clone https://github.com/<YOUR-USERNAME>/django-calaccess-scraped-data.git .
Next install the other Python libraries our code depends on.
$ pip install -r requirements.txt
Connecting to a local database¶
The calaccess_scraped app doesn’t have any specific database requirements. However, we recommend PostgreSQL 9.4 (or greater), which is a hard requirement of other apps in our tool chain.
Create the database the PostgreSQL way.
$ createdb calaccess_scraped -U postgres
Create a file at example/project/settings_local.py to save your custom database credentials. That might look something like this:
DATABASES = {
    'default': {
        'NAME': 'calaccess_scraped',
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'USER': 'username',  # <-- Change this
        'PASSWORD': 'password',  # <-- And this
        'HOST': 'localhost',
        'PORT': '5432'
    }
}
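If you’d rather not hard-code credentials in a settings file, one common alternative is to read them from environment variables. Here is a minimal sketch of that approach; the CALACCESS_DB_* variable names are our own illustration, not something the app defines:

```python
import os

# Read database credentials from the environment, falling back to
# local-development defaults. The CALACCESS_DB_* names are hypothetical
# examples, not variables the app itself reads.
DATABASES = {
    'default': {
        'NAME': os.environ.get('CALACCESS_DB_NAME', 'calaccess_scraped'),
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'USER': os.environ.get('CALACCESS_DB_USER', 'postgres'),
        'PASSWORD': os.environ.get('CALACCESS_DB_PASSWORD', ''),
        'HOST': os.environ.get('CALACCESS_DB_HOST', 'localhost'),
        'PORT': os.environ.get('CALACCESS_DB_PORT', '5432'),
    }
}
```

This keeps passwords out of version control while preserving the same structure Django expects.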
Note
If you’d prefer to load the CAL-ACCESS data outside your default database, check out our guide to working with Django’s system for multiple databases.
Once the database is configured¶
Now create the tables and get to work.
$ python example/manage.py migrate
Now you’re ready to scrape. The scrapecalaccess command will download, cache and parse content from the CAL-ACCESS website:
$ python example/manage.py scrapecalaccess
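The command’s internals live in the app’s management commands, but the download-and-cache pattern it follows can be sketched in a few lines. This is illustrative only; the function names, cache directory, and caching scheme here are our own assumptions, not the app’s actual API:

```python
import hashlib
import os
import urllib.request


def cache_path(url, cache_dir='.scraper_cache'):
    """Map a URL to a stable local file path for its cached response."""
    digest = hashlib.sha1(url.encode('utf-8')).hexdigest()
    return os.path.join(cache_dir, digest + '.html')


def fetch(url, cache_dir='.scraper_cache'):
    """Return a page's HTML, downloading only on a cache miss.

    A simplified stand-in for the scraper's real caching behavior.
    """
    path = cache_path(url, cache_dir)
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            return f.read()
    os.makedirs(cache_dir, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        html = response.read().decode('utf-8', errors='replace')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)
    return html
```

Caching each response to disk means a re-run after a crash or code change can re-parse pages without hammering the CAL-ACCESS site again.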
Welcome aboard!¶
Now that your development environment is set up, check out the GitHub issue tracker where plenty of work awaits.
As you submit your work, please pay attention to the results of our integration tests (more details here).