LTARE Data Formats & Standards
Data Formats
Data generated from or integrated into WaSHI can be non-digital or digital.
Non-digital Data
Non-digital data, such as field forms, management surveys, and chain of custody forms, are recorded by hand on paper. Paper forms must be transcribed or converted to digital file formats and then stored in the WaSHI site.
Digital Data
Digital data include tabular, spatial, and binary data, such as lab results, sample locations, and field photos. Less conventional forms of digital data include code, algorithms, tools, and workflows.
- Tabular data include comma separated values (csv), tab separated values (tsv), Microsoft Excel open XML spreadsheet (xlsx), and portable document format (pdf).
- Spatial data include file geodatabases (gdb), vector shapefiles (zipped folder containing multiple file extensions), and keyhole markup language (kml or kmz). Tabular data may also contain spatial data such as longitude and latitude.
- Binary data include photos (jpeg, png, gif, tiff), videos (mp4), code (R, py, js), and object-oriented data files (RDS, RData, parquet, arrow).
- Proprietary data formats include Microsoft Excel, Word, and PowerPoint files (xlsx, docx, pptx). RDS and RData files are examples of application-specific data formats that can only be opened using the R programming language or RStudio IDE. These types of files should be saved together with a copy of the data in a non-proprietary, open-standard format, such as csv, to maintain accessibility for those who do not have Microsoft Office or do not use R.
- Written documents and presentations are in formats including Microsoft Word and PowerPoint (docx and pptx), hypertext markup language (HTML), and pdf.
- Notebooks combine text with executable code to generate written documents and presentations in docx, pptx, html, or pdf formats. These notebooks are stored in formats depending on the programming language: a few examples include R markdown (rmd), Quarto (qmd), and Jupyter notebook (ipynb).
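The recommendation above to keep an open-format copy next to an application-specific file can be sketched as follows. This is a minimal illustration, not a WaSHI tool: it uses Python's pickle format as a stand-in for an application-specific format such as RDS, and the function name `save_with_open_copy`, the file stem, and the sample record are all hypothetical.

```python
import csv
import pickle
from pathlib import Path


def save_with_open_copy(rows, stem):
    """Save rows as <stem>.pkl and as an open-format copy <stem>.csv.

    rows is a non-empty list of dicts that share the same keys.
    (Hypothetical helper; pickle stands in for a format such as RDS.)
    """
    if not rows:
        raise ValueError("rows must contain at least one record")

    stem = Path(stem)
    pkl_path = stem.with_suffix(".pkl")
    csv_path = stem.with_suffix(".csv")

    # Application-specific copy: readable only from Python.
    with pkl_path.open("wb") as f:
        pickle.dump(rows, f)

    # Open-standard copy: readable by any spreadsheet or text editor.
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)

    return pkl_path, csv_path


if __name__ == "__main__":
    # Made-up sample record with longitude and latitude columns.
    sample = [{"sample_id": "A1", "latitude": 47.1, "longitude": -120.5}]
    print(save_with_open_copy(sample, "samples"))
```

Writing both files in one step keeps the two copies from drifting apart. Note that the csv copy stores every value as text, so numeric types have to be restored when the file is read back.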
Data Standards
- Date will be expressed as YYYY-MM-DD according to ISO 8601 standard.
- Date with time will be expressed as YYYY-MM-DDTHH:MM:SSZ.
  - T separates date from time.
  - Z designates the time zone (Z or -HH:MM).
    - Z if using Coordinated Universal Time (UTC) with no offset.
    - Pacific Standard Time (PST) offset is -08:00: YYYY-MM-DDTHH:MM:SS-08:00
    - Pacific Daylight Time (PDT) offset is -07:00: YYYY-MM-DDTHH:MM:SS-07:00
- Geospatial data will be accompanied by metadata that abides by the ISO 19115 standard and follows Esri’s documentation when using ArcGIS Pro. Metadata contains information about the identification, extent, quality, spatial and temporal schema, spatial reference, and distribution of digital geographic data.
- Code will follow the Code Style Guide.
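The date and date-time standards above can be produced with Python's standard library. The sketch below is one possible approach under stated assumptions: the helper names `format_date` and `format_datetime` are hypothetical, and PST and PDT are modeled as fixed offsets rather than looked up from a time zone database. ISO 8601 writes offset hours with two digits, so the output uses -08:00 and -07:00.

```python
from datetime import date, datetime, timedelta, timezone

# Fixed offsets for Pacific time (assumption: the caller picks PST or PDT).
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")


def format_date(d: date) -> str:
    """Return a date as YYYY-MM-DD."""
    return d.isoformat()


def format_datetime(dt: datetime) -> str:
    """Return a timezone-aware datetime as YYYY-MM-DDTHH:MM:SS plus Z or an offset."""
    if dt.utcoffset() is None:
        raise ValueError("datetime must be timezone-aware")
    # Drop fractional seconds so the output matches the HH:MM:SS pattern.
    text = dt.replace(microsecond=0).isoformat()
    # isoformat() writes UTC as +00:00; the standard above uses Z instead.
    if dt.utcoffset() == timedelta(0):
        text = text[:-6] + "Z"
    return text


if __name__ == "__main__":
    print(format_date(date(2024, 1, 15)))
    print(format_datetime(datetime(2024, 1, 15, 9, 30, 0, tzinfo=PST)))
    print(format_datetime(datetime(2024, 7, 15, 9, 30, 0, tzinfo=PDT)))
    print(format_datetime(datetime(2024, 1, 15, 17, 30, 0, tzinfo=timezone.utc)))
```

Rejecting naive datetimes is a deliberate choice in this sketch: a timestamp without an offset cannot be written in the required form without guessing its time zone.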