Changelog¶
2.0.0 (August 2018)¶
- Dropped MySQL support
- Refactored clean and loading commands for readability
- Added Django 2.1 support
- Added Python 3.7 support
1.6.2 (October 2017)¶
- Renamed raw data zipfile as raw.zip to match the language on our revamped downloads page
1.6.1 (September 2017)¶
- Upgrade to v2.0.0 of django-postgres-copy and refactor export commands accordingly
- Require Django 1.10 or above
- Correct documentation for a number of fields
1.6.0 (July 2017)¶
- Deprecate
--test, --use-test-data
options fromupdatecalaccessrawdata
command. - Renamed custom Django project setting
CALACCESS_DOWNLOAD_DIR
toCALACCESS_DATA_DIR
. - Removed
CALACCESS_TEST_DOWNLOAD_DIR
project setting. - Extract and track any .TSV file regardless of location in download .ZIP directory tree.
1.5.2 (April 2017)¶
- Fix duplicate updates. Only create a new
RawDataVersion
if:Content-length
fromHEAD
differs fromexpected_size
on previous version, or,Last-modified
is at least five minutes more recent thanrelease_datetime
on previous version.
1.5.1 (April 2017)¶
- Proceed with download of ZIP file as long as last-modified datetimes in
HEAD
andGET
requests are within five minutes of each other.
1.5.0 (April 2017)¶
- Django 1.11 compatibility.
- Fix check for existing clean zipped file when resuming.
- Fix message on response status code log.
- Skip dropping/re-adding of database table constraints and indexes when loading into MySQL (transactional DDL statements are not supported).
1.4.9 (March 2017)¶
- Reset auto-increment fields after truncating database tables in postgres.
- Add prefixes on tracking model admins.
- When making requests to sos.ca.gov, log HTTP status code and reason and raise HTTP error if bad status.
1.4.8 (January 2017)¶
- Upgrade to
csvkit
version 1.0.
1.4.7 (December 2016)¶
- Fixed search field on admins for models with
ForeignKey
fields.
1.4.6 (November 2016)¶
- Upgraded to latest version of django-postgres-copy
- Small improvements to CAL-ACCESS field documentation
- Small expansion of unittests
- Clean up of migrations
1.4.5 (September 2016)¶
- Copyediting of CAL-ACCESS form documentation
1.4.2 (late-August 2016)¶
- Docstring edits
1.4.1 (late-August 2016)¶
- Increase max character length on
ReceivedFilingsCd
fields. - Prevent unnecessary download of zip when resuming
updatecalaccessrawdata
. - Include release datetimes in log when
downloadcalaccessrawdata
andupdatecalaccessrawdata
versions are incompatible.
1.4.0 (mid-August 2016)¶
Added zipping up and archiving of cleaned CSVs and error logs.
- Added
RawDataVersion.clean_zip_archive
FileField. - Renamed
RawDataVersion.zip_file_archive
toRawDataVersion.download_zip_archive
.
- Added
Smaller clean data files (removed unnecessary quote characters).
Improvements to tracking models
- Replaced
RawDataCommand
model with datetime fields and related properties- Added to
RawDataVersion
instances.update_start_datetime
and.update_finish_datetime
to store version’s most recent update start and finish datetimes..update_completed
returnsTrue
if most recent update to version started and finished..update_stalled
returnsTrue
if most recent update to version started but did not finish..download_start_datetime
and.download_finish_datetime
to store version’s most recent download start and finish datetimes..download_completed
returnsTrue
if most recent download of version started and finished..download_stalled
returnsTrue
if most recent download version started but did not finish..completed()
QuerySet method toRawDataVersion
to get all versions where the update completed.
- Added to
RawDataFile
instances.clean_start_datetime
and.clean_finish_datetime
to store raw file’s most recent clean start and finish datetimes..load_start_datetime
and.load_finish_datetime
to store raw file’s most recent load start and finish datetimes.
- Added to
- Expanded file size tracking
- Renamed
.size
to.expected_size
onRawDataVersion
instances. - Added
.download_zip_size
toRawDataVersion
instances. - Added
.clean_zip_size
toRawDataVersion
instances. - Added methods to get a pretty version (e.g.,
723M
) of each file size field- Added to
RawDataVersion
instances.pretty_expected_size()
.pretty_download_size()
.pretty_clean_size()
- Added to
RawDataFile
instances.pretty_download_file_size()
.pretty_clean_file_size()
- Added to
- Raise
CommandError
if completed download file size is not the same as expected size. - Added
RawDataVersion
properties to calculate file and record counts:.download_file_count
.download_record_count
.clean_file_count
.clean_record_count
.error_file_count
.error_count
- Renamed
- Replaced
Added
extractcalaccessrawfiles
management command for unzipping and extracting raw data files from downloaded CAL-ACCESS database export.- Start and finish times stored in
.start_extract_datetime
and.finish_extract_datetime
onRawDataVersion
instances.
- Start and finish times stored in
Bug fixes.
- In``downloadcalaccessrawdata``, skip download if the size of the local zip file is equal to or bigger than the expected zip file size.
- Because the server hosting the ZIP doesn’t always provide the most up-to-date resource (as we have documented), a
CommandError
will be raised under any of the following conditions:- If
downloadcalaccessrawdata
is not called from the command-line (presumably, then, it was called byupdatecalaccessrawdata
), and theRawDataVersion
instance of the download command doesn’t match the most recently started update. - If the
ETag
in the initial HEAD request made bydownloadcalaccessrawdata
does not match theETag
in the subsequent GET request. - If the actual size of the ZIP does not match the value of the
Content-Length
in the HEAD response.
- If
- If
downloadcalaccessrawdata
raises any of the above errors,updatecalaccessrawdata
will wait five minutes and try again. - When archiving zips and files, open in binary (
'rb'
) mode. - In
cleancalaccessrawfile
, fixed skipping of empty lines for Python 3.5.
Support for Django 1.10.
1.3.0 (July 2016)¶
- Now distributing on wheels.
- Added error_count to output
reportcalaccessrawdata
and excluded any unspecified fields. - Added model property to RawDataFile that returns the CalAccess model object.
1.2.0 (July 2016)¶
- Enhancements to tracking models
- Zero pad datetime parts of archive dir (for better sorting)
- Calculate and store load_columns_count and load_records_count in
loadcalaccessrawfile
- Added error_count and error_log_archive fields to RawDataFile in order to track bad line parses during
cleancalaccessrawfile
. - Added download_file_size and clean_file_size to RawDataFile.
- Enhancements to CalAccess models
- Added inactive models group for CAL-ACCESS tables that are empty or apparently no longer in use.
- Added a CalAccessMetaClass to automatically configure meta attributes common to all models.
- Added a custom admin for every model.
- Model verbose names are pre-fixed with model groups
- Edits to model doc strings.
- Enhancements to management commands
- Added standard logging.
- Added a logger.info to the end of the
updatecalaccessrawdata
command to allow sending of emails when finished - Edits to command doc strings.
- More tests
- Test to confirm that any field included in a model’s
UNIQUE_KEY
attribute actually exists on the model. - Test to confirm that every model has a custom admin.
- Test to confirm that any field included in a model’s
- Bug fixes
- Fixed numbers in clean_records_count for RawDataFile.
- Fixed line numbers logged in errors.csv files.
- Write output of
reportcalaccessrawdata
to data directory instead ofREPO_DIR
, which may not be in settings.
1.1.0 (late June 2016)¶
- When –noinput is invoked for
updatecalaccessrawdata
, exit if previously updated to the currently available version. - Enforce lowercase UNIQUE_KEY settings on models.
- Removed unnecessary pretty amount model methods as part of driving common.py models file test coverage up to 100%.
1.0.2 (early June 2016)¶
- Include migrations in official package.
- Fix
verbose_name
forRawDataFile.clean_file_archive
.
1.0.0 (May 2016)¶
- Enhanced resume behavior
- Allow previously interrupted updates to resume at any stage of the process: downloading, cleaning or loading.
- Users will be prompted to resume (if possible). User may decline and re-start the entire update.
- Removed
--resume-download
option fromupdatecalaccessrawdata
anddownloadcalaccessrawdata
in favor of prompting the user to resume. - Removed
--database
option from all commands. Multi-database users are encouraged to use Django’s database routers.
- Raw data file archiving
- Added
CALACCESS_STORE_ARCHIVE
setting. When enabled, management commands will save each version of the downloaded .zip file, the extracted .tsv files and cleaned .csv files to the Django project’sMEDIA_ROOT
. - Added FileFields to RawDataVersion and RawDataFile in order to link the database records with the archived files they reference.
- Added
- Completed documentation of all 80 raw data models and 1,467 fields
- Defined hundreds of choices for 182 look-up fields.
- Published expanded Django project documentation. Added re-directs from old app-specific documentation.
- Integrated references to official documents and filing forms into data models. PDFs on DocumentCloud.
- Expanded unit testing of data model documentation
- Wider scope of choice field testing.
- Verify that each model has a
UNIQUE_KEY
attribute set. - Verify that each model has a document reference.
- Verify that each choice field has a document reference.
- Verify that each model with a form_type or form_id field (with a few exceptions) is linked to filing forms.
- Introduced
reportcalaccessrawdata
command, which generates a report outlining the number / proportion of files / records cleaned and loaded.
- Model Re-modeling:
- Moved
BallotMeasuresCd
fromother.py
tocampaign.py
. Same with admin. - Moved remaining models in
other.py
tocommon.py
. Removedother.py
. Same with admins. - Re-ordered models into related groups.
- Moved
- Bug fixes
- Truncate time portions of raw datetime values (see #1457).
- Strip newlines when loading into MySQL.
0.2.0 (January 2016)¶
- Support for Python 3.5
- Support for Django 1.9
- Simplified downloadcalaccessrawdata. Now only downloads, unzips and preps
- Introduced updatecalaccessrawdata, which downloads, cleans and loads data
- Added –resume-download option in case download is interrupted
- Added –csv option to loadcalaccessrawfile so that users can load from a file other than the one specified for the given calaccess_raw model
- Added –keep-files option. Unless the option is invoked downloadcalaccessrawdata, cleancalaccessrawfile, loadcalaccessrawfile and updatecalaccessrawdata now clear out original and intermediate files
- Support for multiple databases configured in Django DATABASE settings. Users can now load into a specified database using –database option
- Fixed verifycalaccessrawfile
- Updated management command options to most recent Django style, using argparse instead of optparse
- Hundreds of unique keys, field defs and choices patched by Code Rushers
- Automatically generated table documentation page
- Expanded documentation
0.1.2 (February 2015)¶
- Substituted clint for progressbar
- Improved choices for form type fields
0.1.1 (January 2015)¶
- Datetime support for MySQL fields
- Fixed bug that didn’t allow null values in PostgreSQL datetime fields
0.1.0 (November 2014)¶
- Support for PostgreSQL database backends
- Upgraded to Django 1.7
- Prettified management command output and logging
- Improved docs, admins and configuration for some campaign-finance models
- Numerous small bug fixes and documentation corrections
0.0.7 (August 2014)¶
- Complete set of models that cover 100% of source CSV files
- Management commands that prep and load the data for MySQL backends
- Administration panels for previewing the data