Models for tracking updates
The raw-data app also keeps track of each snapshot of the CAL-ACCESS database released by the California Secretary of State, including its release date and byte size, as well as the activity of the management commands that process this data.
This tracking information is stored in the data tables outlined below.
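According to the RawDataVersion field table, a version's release_datetime is the value of the Last-Modified header and its expected_size is the value of the Content-Length header in the HTTP response for the CAL-ACCESS zip. The plain-Python sketch below shows how those two values can be derived. The `version_fields_from_headers` helper and the header values are hypothetical, and the app's download command may do this differently.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Mapping


def version_fields_from_headers(headers: Mapping[str, str]) -> dict:
    """Derive RawDataVersion-style tracking values from HTTP response headers.

    Hypothetical helper for illustration; not part of calaccess_raw.
    """
    # Last-Modified uses the HTTP date format, e.g. "Mon, 15 Aug 2016 11:20:29 GMT"
    release_datetime: datetime = parsedate_to_datetime(headers["Last-Modified"])
    # Content-Length is the size of the response body in bytes
    expected_size = int(headers["Content-Length"])
    return {"release_datetime": release_datetime, "expected_size": expected_size}


# Hypothetical header values, e.g. from a HEAD request for the CAL-ACCESS zip
example_headers = {
    "Last-Modified": "Mon, 15 Aug 2016 11:20:29 GMT",
    "Content-Length": "123456789",
}
fields = version_fields_from_headers(example_headers)
print(fields["release_datetime"])  # 2016-08-15 11:20:29+00:00
print(fields["expected_size"])  # 123456789
```

A plain dict is case-sensitive. Real HTTP header names are not, so a client library's case-insensitive header mapping is the safer input.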
Note
By default, the raw-data app does not archive previous versions of the CAL-ACCESS database. Rather, with each call to the management commands, the data files they process are overwritten.
You can configure the raw-data app to keep each copy of the zip file downloaded from the California Secretary of State, as well as the individual raw .TSV files and cleaned .CSV files, by setting CALACCESS_STORE_ARCHIVE to True in settings.py:
# in settings.py
CALACCESS_STORE_ARCHIVE = True
By default, the archived copies of these files are saved under the path specified by your Django project’s MEDIA_ROOT setting (see Django’s documentation of the MEDIA_ROOT setting for details). If you’ve implemented a custom storage system or installed a third-party app such as django-storages, archiving should work with that as well.
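Combining those settings, a settings.py fragment might look like the one below. The MEDIA_ROOT path is a hypothetical example. The django-storages lines are commented out because the correct setting depends on your Django version (DEFAULT_FILE_STORAGE in older releases, the STORAGES dictionary from Django 4.2 on) and on the backend you pick. django-storages also needs its own bucket and credential settings, which are not shown.

```python
# settings.py (fragment)

# Keep archived copies of the downloaded zip, raw files and cleaned files
CALACCESS_STORE_ARCHIVE = True

# Default location for the archives (hypothetical path)
MEDIA_ROOT = "/var/www/calaccess/media"

# Optional: store archives on Amazon S3 through django-storages instead.
# Django 4.2 and later:
# STORAGES = {
#     "default": {"BACKEND": "storages.backends.s3boto3.S3Boto3Storage"},
#     "staticfiles": {"BACKEND": "django.contrib.staticfiles.storage.StaticFilesStorage"},
# }
# Older Django releases:
# DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
```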
RawDataVersion
Versions of CAL-ACCESS raw source data, typically released every day.
Fields
Name | Type | Unique key | Definition |
---|---|---|---|
id | Integer | Yes | Auto-incrementing unique identifier of versions |
release_datetime | DateTime | No | (Unique) date and time the version of the CAL-ACCESS database was released (value of the Last-Modified field in the HTTP response header) |
expected_size | Integer | No | The expected size of the downloaded CAL-ACCESS zip, as specified in the Content-Length field of the HTTP response header |
update_start_datetime | DateTime | No | Date and time when the update to the CAL-ACCESS version started |
update_finish_datetime | DateTime | No | Date and time when the update to the CAL-ACCESS version finished |
download_start_datetime | DateTime | No | Date and time when the download of the CAL-ACCESS database export started |
download_finish_datetime | DateTime | No | Date and time when the download of the CAL-ACCESS database export finished |
extract_start_datetime | DateTime | No | Date and time when extraction of the CAL-ACCESS data files started |
extract_finish_datetime | DateTime | No | Date and time when extraction of the CAL-ACCESS data files finished |
download_zip_archive | FileField | No | An archive of the original zipped file downloaded from CAL-ACCESS |
clean_zip_archive | FileField | No | A zip archive of the cleaned files and error logs |
clean_zip_size | Integer | No | The size of the zip containing all cleaned raw data files and error logs |
download_zip_size | Integer | No | The actual size of the downloaded CAL-ACCESS zip after the download completed |
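The completed and stalled checks listed under Instance methods and properties are described in terms of these start and finish timestamps. The sketch below mirrors those descriptions in plain Python. It assumes a step is completed once its finish timestamp is set and stalled when it has a start timestamp but no finish timestamp. The `VersionTimestamps` class and its `download_duration` property are hypothetical stand-ins for a RawDataVersion row, and the app's real properties may apply extra checks.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class VersionTimestamps:
    """Hypothetical stand-in for the download_* timestamps of a RawDataVersion row."""
    download_start_datetime: Optional[datetime] = None
    download_finish_datetime: Optional[datetime] = None

    @property
    def download_completed(self) -> bool:
        # Assumption: a download is complete once its finish time is recorded
        return self.download_finish_datetime is not None

    @property
    def download_stalled(self) -> bool:
        # Assumption: a start time was recorded but no finish time
        return (
            self.download_start_datetime is not None
            and self.download_finish_datetime is None
        )

    @property
    def download_duration(self) -> Optional[timedelta]:
        # Hypothetical extra: elapsed time, only when both timestamps are set
        if self.download_start_datetime and self.download_finish_datetime:
            return self.download_finish_datetime - self.download_start_datetime
        return None


started = datetime(2016, 8, 15, 11, 30)
stalled = VersionTimestamps(download_start_datetime=started)
done = VersionTimestamps(started, started + timedelta(minutes=12))
print(stalled.download_stalled, stalled.download_completed)  # True False
print(done.download_duration)  # 0:12:00
```

The same pattern would apply to the extract_* and update_* timestamp pairs.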
Instance methods and properties
Name | Description |
---|---|
.download_completed | Checks whether the download of the version's zip file completed. Returns True or False. |
.download_stalled | Checks whether the download of the version's zip file started but did not complete. Returns True or False. |
.download_file_count | Returns the count of files included in the version's downloaded zip. |
.download_record_count | Returns the count of records in the version's downloaded files. |
.clean_file_count | Returns the count of files cleaned in the version. |
.clean_record_count | Returns the count of records in the version's cleaned files. |
.error_file_count | Returns the count of cleaned files with errors in the version. |
.error_count | Returns the count of cleaning errors in the version. |
.extract_completed | Checks whether the extraction of files from the downloaded zip completed. Returns True or False. |
.extract_stalled | Checks whether the extraction of files from the downloaded zip started but did not complete. Returns True or False. |
.update_completed | Checks whether the database update to the version completed. Returns True or False. |
.update_stalled | Checks whether the database update to the version started but did not complete. Returns True or False. |
.pretty_clean_size() | Returns a prettified version (e.g., "725M") of the size of the zip of clean data files and error logs. |
.pretty_download_size() | Returns a prettified version (e.g., "725M") of the actual size of the downloaded zip. |
.pretty_expected_size() | Returns a prettified version (e.g., "725M") of the expected size of the downloaded zip. |
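The pretty size methods turn byte counts into short strings such as "725M". The sketch below shows one way to produce that style. It assumes binary (1,024-based) units and single-letter suffixes. The `pretty_size` function is hypothetical, and the app's own methods may round or label sizes differently.

```python
from typing import Optional

UNITS = ["B", "K", "M", "G", "T"]


def pretty_size(num_bytes: Optional[int]) -> Optional[str]:
    """Format a byte count like "725M" (hypothetical helper, 1,024-based units)."""
    if num_bytes is None:
        # Size fields can be empty, e.g. before a download finishes
        return None
    size = float(num_bytes)
    unit_index = 0
    # Move to the next unit while the value is at least 1,024 of the current one
    while size >= 1024 and unit_index < len(UNITS) - 1:
        size /= 1024
        unit_index += 1
    # One decimal place, dropping a trailing ".0" (725.0 -> "725")
    text = f"{size:.1f}".rstrip("0").rstrip(".")
    return f"{text}{UNITS[unit_index]}"


print(pretty_size(760217600))  # 725M
print(pretty_size(1536))  # 1.5K
```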
Query set methods
.complete()
Filters down the QuerySet to return only versions that have a completed update.
$ python manage.py shell
>>> from calaccess_raw.models.tracking import RawDataVersion
>>> RawDataVersion.objects.complete()
<QuerySet [<RawDataVersion: 2016-08-15 11:20:29+00:00>, <RawDataVersion: 2016-08-11 11:20:24+00:00>, <RawDataVersion: 2016-08-09 11:20:49+00:00>, <RawDataVersion: 2016-08-05 11:20:27+00:00>, <RawDataVersion: 2016-08-04 11:20:28+00:00>, <RawDataVersion: 2016-07-31 11:20:29+00:00>, <RawDataVersion: 2016-07-30 11:20:42+00:00>, <RawDataVersion: 2016-07-29 11:20:30+00:00>, <RawDataVersion: 2016-07-28 11:20:30+00:00>, <RawDataVersion: 2016-07-26 11:20:28+00:00>, <RawDataVersion: 2016-07-22 11:20:30+00:00>, <RawDataVersion: 2016-07-05 11:20:30+00:00>, <RawDataVersion: 2016-07-04 11:20:30+00:00>, <RawDataVersion: 2016-06-28 11:20:28+00:00>, <RawDataVersion: 2016-06-14 11:20:49+00:00>, <RawDataVersion: 2016-06-10 11:20:26+00:00>, <RawDataVersion: 2016-06-08 11:20:29+00:00>, <RawDataVersion: 2016-05-27 11:20:28+00:00>, <RawDataVersion: 2016-05-21 15:35:11+00:00>, <RawDataVersion: 2016-05-20 13:59:57+00:00>, '...(remaining elements truncated)...']>
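A common use of .complete() is finding the most recent version whose update finished. With the ORM, that could be written as `RawDataVersion.objects.complete().latest("release_datetime")`. The plain-Python sketch below mimics the filter-then-latest logic on hypothetical records so it runs without a database. It assumes .complete() keeps only versions that have an update_finish_datetime, which matches its description but may not be the exact filter the app uses. The `Version` class and both functions are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass
class Version:
    """Hypothetical stand-in for a RawDataVersion row."""
    release_datetime: datetime
    update_finish_datetime: Optional[datetime] = None


def complete(versions: Iterable[Version]) -> List[Version]:
    # Assumption: "complete" means the update finish time has been recorded
    return [v for v in versions if v.update_finish_datetime is not None]


def latest_complete(versions: Iterable[Version]) -> Optional[Version]:
    # Newest release among complete versions, or None if there are none
    return max(complete(versions), key=lambda v: v.release_datetime, default=None)


utc = timezone.utc
versions = [  # hypothetical records
    Version(datetime(2016, 8, 11, 11, 20, 24, tzinfo=utc),
            datetime(2016, 8, 11, 14, 0, tzinfo=utc)),
    Version(datetime(2016, 8, 15, 11, 20, 29, tzinfo=utc),
            datetime(2016, 8, 15, 14, 0, tzinfo=utc)),
    Version(datetime(2016, 8, 16, 11, 20, 31, tzinfo=utc)),  # update never finished
]
print(latest_complete(versions).release_datetime)  # 2016-08-15 11:20:29+00:00
```

Django's `.latest()` raises the model's DoesNotExist exception on an empty QuerySet. This sketch returns None instead.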
RawDataFile
Data files included in the given version of the CAL-ACCESS raw source data.
Fields
Name | Type | Unique key | Definition |
---|---|---|---|
id | Integer | Yes | Auto-incrementing unique identifier of the file |
file_name | String (up to 100) | No | Name of the raw source data file without extension |
download_records_count | Integer | No | Count of records in the original file downloaded from CAL-ACCESS |
clean_records_count | Integer | No | Count of records in the cleaned file generated by calaccess_raw |
load_records_count | Integer | No | Count of records loaded from the cleaned file into calaccess_raw's data model |
download_columns_count | Integer | No | Count of columns in the original file downloaded from CAL-ACCESS |
clean_columns_count | Integer | No | Count of columns in the cleaned file generated by calaccess_raw |
load_columns_count | Integer | No | Count of columns on the loaded calaccess_raw data model |
download_file_archive | FileField | No | An archive of the original raw data file downloaded from CAL-ACCESS. |
clean_file_archive | FileField | No | An archive of the raw data file after being cleaned. |
clean_file_size | Integer | No | Size of the cleaned .CSV file |
download_file_size | Integer | No | Size of the original downloaded .TSV file |
error_log_archive | FileField | No | An archive of the error log containing lines from the original download file that could not be parsed and are excluded from the cleaned file. |
error_count | Integer | No | Count of records in the original download that could not be parsed and are excluded from the cleaned file. |
version_id | Integer | No | Foreign key referencing the version of the raw source data in which the file was included. |
clean_start_datetime | DateTime | No | Date and time when the cleaning of the file started |
clean_finish_datetime | DateTime | No | Date and time when the cleaning of the file finished |
load_start_datetime | DateTime | No | Date and time when the loading of the file started |
load_finish_datetime | DateTime | No | Date and time when the loading of the file finished |
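The count fields let you trace records through the download, clean and load steps. The sketch below is a sanity check built on two assumptions: every downloaded record either ends up in the cleaned file or is counted in error_count, and the load step brings in every cleaned record. The `FileCounts` class and `record_gaps` function are hypothetical. Header rows or other app-specific handling could make real counts differ slightly.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FileCounts:
    """Hypothetical stand-in for the count fields of a RawDataFile row."""
    file_name: str
    download_records_count: int
    clean_records_count: int
    load_records_count: int
    error_count: int


def record_gaps(f: FileCounts) -> Dict[str, int]:
    """Return record counts that the assumptions above cannot account for."""
    return {
        # Downloaded records that were neither cleaned nor logged as errors
        "unexplained_clean_loss": (
            f.download_records_count - f.clean_records_count - f.error_count
        ),
        # Cleaned records that did not make it into the database table
        "unloaded": f.clean_records_count - f.load_records_count,
    }


counts = FileCounts("EXAMPLE_CD", 1000, 997, 997, 3)  # hypothetical counts
print(record_gaps(counts))  # {'unexplained_clean_loss': 0, 'unloaded': 0}
```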
Instance methods and properties
Name | Description |
---|---|
.model() | Returns the CAL-ACCESS database model that corresponds to the RawDataFile. |
.pretty_clean_file_size | Returns a prettified version (e.g., "725M") of the cleaned file's size. |
.pretty_download_file_size | Returns a prettified version (e.g., "725M") of the downloaded file's size. |
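.model() links a RawDataFile row, identified by its file_name, to the calaccess_raw model that holds that file's records. The sketch below shows one way such a lookup could start. It assumes model class names are the CamelCase form of the file name, for example FILER_FILINGS_CD becoming FilerFilingsCd. The `model_name_for_file` helper is hypothetical, and the app may resolve models differently, for instance by matching each model's database table name. In a configured Django project, the resulting name could then be passed to `django.apps.apps.get_model("calaccess_raw", name)`, assuming the app label is calaccess_raw.

```python
def model_name_for_file(file_name: str) -> str:
    """Convert a raw file name such as "FILER_FILINGS_CD" to "FilerFilingsCd".

    Hypothetical helper; assumes models are named after their source files.
    """
    # Drop any extension, then capitalize each underscore-separated word
    stem = file_name.rsplit(".", 1)[0]
    return "".join(part.capitalize() for part in stem.split("_") if part)


print(model_name_for_file("FILER_FILINGS_CD"))  # FilerFilingsCd
print(model_name_for_file("RCPT_CD.TSV"))  # RcptCd
```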