LTARE Data Documentation

Documentation is the process of recording all aspects of project design; sampling; lab analyses; data cleaning; data analyses; data quality control and assurance procedures; and development of decision-support tools. Seem familiar? These are the steps of the data life cycle. Documentation helps to:

  • standardize procedures
  • enable reproducibility
  • establish credibility
  • ensure others (including our future selves) use and interpret data correctly
  • provide searchability

All documentation (including this document) should be updated (and versioned) as procedures change and lessons are learned.

Samples collected at WaSHI LTARE sites must follow the procedures and standards described in the documentation below.

External data must have, at minimum, the documentation outlined in the External Data section to be integrated into the WaSHI LTARE dataset.

Project Level

Project-level documentation includes all descriptive information about the SOS dataset, as well as planning decisions and process documentation. Documentation includes quality assurance project plans, standard operating procedures, and other high-level documents (e.g., requests for proposals, applications, meeting agendas/notes).

Quality assurance project plan (QAPP)

The QAPP is the highest level of project documentation. It covers the project description; personnel roles and responsibilities; project timelines; data and measurement quality objectives; study design; and overviews of field, laboratory, and quality control procedures.

Ours can be found in Y:/NRAS/soil-health-initiative/state-of-the-soils/qapp.

Standard operating procedures (SOP)

SOPs provide detailed instructions for field, lab, or data processing procedures and decision-making processes.

Ours can be found in Y:/NRAS/soil-health-initiative/state-of-the-soils/sop.

SOS Sampling

The purpose of this SOP is to detail the procedures for a typical site visit in which soil samples are collected for physical, chemical, and biological soil health indicator analyses. Procedures include equipment preparation prior to sampling; best practices for filling out field forms; the selection of sampling locations; sampling protocols; sample handling and storage; and submitting samples to the lab. Following this SOP ensures data quality by creating audit trails and reminders to check that data are present, complete, and accurate. Additionally, this SOP will be used to maintain consistent sample collection procedures throughout the state for WSDA employees and partners.

Quality control / quality assurance (QA/QC)

This SOP outlines the process for screening sample metadata and lab results for completeness, consistency, and quality. Procedures involve subject matter expertise, investigation, communication with sampling teams and labs, algorithmic quality control, and tagging sample results with quality codes (listed in the below table). Data are then integrated into the statewide database.

Dataset Level

Dataset-level documentation applies to lab results, sample locations, grower information, and management data. Readmes and changelogs document what each dataset contains, how they are related, potential issues to be aware of, and any alterations made to the data. See below for examples of what to include.

Readme

Readme files are plain text documents that contain information about the files in a folder, explanations of versioning, and instructions/metadata for data packages. These files are saved as .txt rather than as MS Word documents, which take longer to open and can only be opened on computers with Microsoft Word installed.

Describe contents of folder

The readme.txt in the _complete-dataset folder describes each file's structure, contents, and other pertinent information, such as data sources.

Explain versions

The readme.txt in the 2023_sampling > lab-data > raw folder explains why there are two different versions of the lab results and where to find additional information.

Provide instructions

Another readme.txt explains how to use the files in the ArcGIS soil sample points box.com folder. When this folder is shared with partners, the readme orients them to the contents of the folder and helps them modify the files as needed for their own projects.

Changelog

Changelogs are also simple and concise plain text documents saved in a folder alongside data files that document changes to the dataset. For more information, see keepachangelog.com/.

At the bare minimum, the changelog.txt contains:

  • date of modification
  • initials of who made the changes
  • description of the changes

See the example changelog.txt in the _complete-dataset folder.
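
As an illustration of the minimum contents listed above, the following Python sketch appends one entry (date, initials, description) to a plain-text changelog. The function names, the pipe-delimited layout, and the ISO date format are assumptions made for this example; they are not taken from the project's actual changelog.txt.

```python
from datetime import date
from pathlib import Path


def format_changelog_entry(initials, description, modified=None):
    """Return one changelog line: date | initials | description.

    The pipe-delimited layout is an assumption for this sketch.
    """
    modified = modified or date.today()
    return f"{modified.isoformat()} | {initials} | {description}"


def append_changelog_entry(path, initials, description, modified=None):
    """Append one entry to a plain-text changelog, creating the file if needed."""
    entry = format_changelog_entry(initials, description, modified)
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(entry + "\n")
    return entry
```

A call such as append_changelog_entry("changelog.txt", "AB", "Corrected units for one lab result") adds a single dated line to the file and leaves earlier entries untouched. The initials and description here are placeholders.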

Variable Level

Variable-level documentation includes data dictionaries, which are tabular collections of names, definitions, and attributes about the variables in a dataset. Data dictionaries are ideally created in the planning phase of the project before data are collected.

Data Dictionary

Each row is a different variable, and each column is a different attribute of that variable. With a data dictionary, a user should be able to properly interpret each variable in the data.

Our data-dictionary.xlsx in the _complete-dataset folder contains three tabs (lab-results, sample-locations, and qc-codes) that describe the attributes of each variable.
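
One way to keep a data dictionary and its dataset in sync is to compare the variable names in each. The Python sketch below assumes the dictionary has been read into a list of rows, one row per variable and one key per attribute. The variable names, attribute names, and the compare_to_dictionary function are hypothetical and do not reflect the contents of data-dictionary.xlsx.

```python
# Hypothetical data dictionary: one row per variable, one key per attribute.
DATA_DICTIONARY = [
    {"variable": "sample_id", "definition": "Unique sample identifier", "unit": "", "type": "text"},
    {"variable": "ph", "definition": "Soil pH", "unit": "unitless", "type": "numeric"},
    {"variable": "total_c", "definition": "Total carbon", "unit": "%", "type": "numeric"},
]


def compare_to_dictionary(data_columns, dictionary_rows):
    """Compare dataset column names against the variables in a data dictionary.

    Returns the columns that lack a dictionary entry and the dictionary
    entries that do not appear in the dataset, each sorted alphabetically.
    """
    data = set(data_columns)
    documented = {row["variable"] for row in dictionary_rows}
    return {
        "undocumented": sorted(data - documented),
        "not_in_data": sorted(documented - data),
    }


# Example: a dataset with one column that is missing from the dictionary.
report = compare_to_dictionary(["sample_id", "ph", "bulk_density"], DATA_DICTIONARY)
print(report)
```

An empty "undocumented" list suggests that every variable in the dataset can be interpreted from the dictionary.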

External Data

External data refers to any data not directly collected by WSDA or trained partners (e.g., WSU or conservation districts) that follow our SOPs. These can include other studies pre-dating WaSHI, special soil health surveys, and publicly available datasets.

On a case-by-case basis, the Senior Soil Scientist and Data Scientist consider the following questions when deciding whether to integrate an external dataset:

  • How does the study design fit into SOS goals?
  • What field procedures were used and how were they documented?
  • Who analyzed the soil samples? With which methods and QA/QC procedures?
  • Are the following required metadata and management data available along with the lab results?
    • Farm, producer, and field info
    • Sampling date
    • Sampling depth
    • Latitude and longitude
    • Production system (current crop, crop rotation, etc.)
    • Information concerning tillage, livestock grazing, irrigation, soil fertility and amendments, land use history, and/or conservation practices
  • Is there a data dictionary or codebook describing the measurements, units, missing values, etc.?

Generally, external data should 1) be well documented, 2) be collected and analyzed by well-trained scientists and labs, and 3) have adequate accompanying metadata and management data to facilitate interpretation of the results.
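
Part of the metadata checklist above can be screened automatically before the case-by-case review. The Python sketch below checks a table of external data for required columns and counts blank values in the columns that are present. The column names in REQUIRED_COLUMNS and the screen_external_data function are hypothetical. A real intake file may use different names, and the sketch assumes values arrive as text (for example, from csv.DictReader).

```python
import csv
import io

# Hypothetical column names standing in for the required metadata listed above.
REQUIRED_COLUMNS = [
    "producer_id",
    "field_id",
    "sampling_date",
    "sampling_depth",
    "latitude",
    "longitude",
    "production_system",
]


def screen_external_data(rows):
    """Return (missing_columns, blank_counts) for a list of row dicts.

    missing_columns: required columns absent from every row.
    blank_counts: for required columns that are present, the number of rows
    with an empty value (columns with no blanks are omitted).
    """
    present = set().union(*(row.keys() for row in rows)) if rows else set()
    missing = [c for c in REQUIRED_COLUMNS if c not in present]
    blanks = {}
    for column in REQUIRED_COLUMNS:
        if column in present:
            count = sum(1 for row in rows if not (row.get(column) or "").strip())
            if count:
                blanks[column] = count
    return missing, blanks


# Example with a small in-memory CSV; the values are made up for illustration.
example_csv = (
    "producer_id,field_id,sampling_date,latitude,longitude\n"
    "P1,F1,2023-05-01,47.1,-120.5\n"
    "P2,F2,,46.9,-119.8\n"
)
rows = list(csv.DictReader(io.StringIO(example_csv)))
missing, blanks = screen_external_data(rows)
print("Missing columns:", missing)
print("Blank values:", blanks)
```

A screen like this only flags gaps. Judging study design, field procedures, and lab methods still requires the review described above.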

Some publicly available datasets to consider are in Y:/NRAS/soil-health-initiative/state-of-the-soils/external-data.

Intake form

External data may be provided in the External Data Intake spreadsheet, alongside related documents such as SOPs, management surveys, raw data files, etc.


  1. Enough farm, producer and field info to distinguish unique farmers and fields for assigning unique IDs. They don’t need to include personally identifiable information.