Quality indicators: processing_level and QC_indicator

These two data quality descriptors are a common source of confusion. Below is an attempt to suggest a standard NPIOcean usage.

Caution

This section contains draft recommendations quickly put together by ØF. They are suggestions for a future NPIOcean recommendation, not agreed-upon NPIOcean guidelines (to be discussed further).


processing_level

Description

ACDD Recommended attribute, described as:

“A textual description of the processing (or quality control) level of the data.”

It can be assigned as either:

  • A global attribute (if one description fits the entire dataset), or

  • A variable attribute on each data variable (if processing/QC differs between variables)
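As a minimal sketch of this choice, assuming attributes are kept in plain Python dictionaries (with xarray, these would be `ds.attrs` for global attributes and `ds[name].attrs` for a single variable), the hypothetical helper below stores one shared description as a global attribute and otherwise writes one description per variable. The variable names `TEMP` and `PSAL` are only examples.

```python
def assign_processing_level(global_attrs, var_attrs, descriptions):
    """Store processing_level globally or per variable (hypothetical helper).

    global_attrs: dict of global attributes.
    var_attrs: dict mapping variable name -> dict of that variable's attributes.
    descriptions: dict mapping variable name -> processing description.
    """
    unique = set(descriptions.values())
    if len(unique) == 1:
        # One description fits the entire dataset: store it once, globally.
        global_attrs["processing_level"] = unique.pop()
    else:
        # Processing/QC differs between variables: one description per variable.
        for name, text in descriptions.items():
            var_attrs.setdefault(name, {})["processing_level"] = text


# Example: temperature and salinity were processed differently,
# so global_attrs stays empty and var_attrs gets one entry per variable.
global_attrs, var_attrs = {}, {}
assign_processing_level(global_attrs, var_attrs, {
    "TEMP": "Converted to physical values only. No subsequent quality control applied.",
    "PSAL": "Salinity data have been corrected against in-situ water samples.",
})
```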

OceanSITES

OceanSITES prescribes set values for processing_level (page 24 of the OceanSITES manual), such as “Raw instrument data”, “Post-recovery calibrations have been applied” and “Data interpolated”.

Full OceanSITES option list for processing_level

  • Raw instrument data

  • Instrument data that has been converted to geophysical values

  • Post-recovery calibrations have been applied

  • Data has been scaled using contextual information

  • Known bad data has been replaced with null values

  • Known bad data has been replaced with values based on surrounding data

  • Ranges applied, bad data flagged

  • Data interpolated

  • Data manually reviewed

  • Data verified against model or other contextual information

  • Other QC process applied
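If you check processing_level values in a script, the OceanSITES list can be kept as a constant. This is a minimal sketch: the names `OCEANSITES_PROCESSING_LEVELS` and `is_oceansites_processing_level` are hypothetical, and the strings are transcribed from the list above rather than loaded from an official machine-readable source.

```python
# OceanSITES set values for processing_level, transcribed from the list above.
OCEANSITES_PROCESSING_LEVELS = (
    "Raw instrument data",
    "Instrument data that has been converted to geophysical values",
    "Post-recovery calibrations have been applied",
    "Data has been scaled using contextual information",
    "Known bad data has been replaced with null values",
    "Known bad data has been replaced with values based on surrounding data",
    "Ranges applied, bad data flagged",
    "Data interpolated",
    "Data manually reviewed",
    "Data verified against model or other contextual information",
    "Other QC process applied",
)


def _normalize(text):
    # Collapse repeated whitespace and ignore case for the comparison.
    return " ".join(text.split()).lower()


def is_oceansites_processing_level(value):
    """Return True if value equals one of the OceanSITES set values."""
    return _normalize(value) in {_normalize(s) for s in OCEANSITES_PROCESSING_LEVELS}
```

A free-text summary returns False here, which is expected: the check only tells you whether a value is one of the OceanSITES strings, not whether it is a good description.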

In practice, it can be hard to pick a single value that fits the processing/quality level of the data well. The descriptions are too broad to be very useful, and in most cases more than one of the categories applies.

Since the conventions do not actually require the use of set values (in fact, “a textual description” rather suggests this should be written case by case), we may want to move away from using the OceanSITES values.

NPIOcean recommendation (tentative)

Suggested recommendation for use of processing_level

Write, in your own words, a brief summary (about 1-3 sentences) of the processing level.

Examples:

  • Salinity data have been corrected against in-situ water samples. A small number of outliers were removed after manual review.

  • Converted to physical values only. No subsequent quality control applied.

  • A scale factor of 1.03 has been applied to oxygen based on the climatological mean for the area. Missing values have been filled using linear interpolation.

Take the OceanSITES list as a guide if helpful, but do not feel obliged to use the exact categories.

Keep in mind the user who wants to quickly understand the processing level of the data.
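The length guideline (about 1-3 sentences) can be checked roughly in code. This is a hypothetical helper, not an NPIOcean tool: it splits sentences with a simple regular expression, so an abbreviation such as "e.g." followed by a space is counted as a sentence end, and a decimal number like 1.03 is not.

```python
import re


def check_processing_level(text, max_sentences=3):
    """Return a list of warnings for a free-text processing_level value.

    Sentences are split after '.', '!' or '?' followed by whitespace
    (a rough heuristic, not a real sentence tokenizer).
    """
    stripped = text.strip()
    if not stripped:
        return ["processing_level is empty"]
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", stripped) if s]
    warnings = []
    if len(sentences) > max_sentences:
        warnings.append(f"{len(sentences)} sentences; aim for about 1-{max_sentences}")
    return warnings
```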


QC_indicator

Description

An OceanSITES-specific attribute (i.e. not required by the conventions): a brief description, in a few words, of the data quality (e.g. good data, probably good data, unknown).

QC_indicator is not required by the conventions and does not come from a controlled vocabulary, so technically there is no need to stick to a fixed set of strings. It is a useful attribute to include, however!

NPIOcean recommendation (tentative)

Suggested recommendation for use of QC_indicator

  • Use the QC_indicator when you can.

  • Think of it as a quick description of the data quality (e.g. good data).

  • Assign it at the global or variable level, depending on what was done for processing_level:

    • I.e., assign it as a global attribute if processing_level is a global attribute.

  • Preferably use values from the list below.

    • (This list is based on the OceanSITES QC indicator codes, with the added category uncalibrated data.)
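Placing QC_indicator at the same level as processing_level can be sketched as follows, again assuming attributes are held in plain dictionaries (as in xarray's `ds.attrs` and `ds[name].attrs`). The helper name is hypothetical, and raising an error when a global processing_level meets differing per-variable quality values is a design choice of this sketch, not part of the recommendation.

```python
def assign_qc_indicator(global_attrs, var_attrs, indicators):
    """Set QC_indicator at the same level as processing_level (hypothetical helper).

    global_attrs: dict of global attributes.
    var_attrs: dict mapping variable name -> dict of that variable's attributes.
    indicators: dict mapping variable name -> QC_indicator value (e.g. "good data").
    """
    if "processing_level" in global_attrs:
        # processing_level is global, so QC_indicator should be global too.
        values = set(indicators.values())
        if len(values) != 1:
            raise ValueError(
                "expected exactly one QC_indicator value when processing_level is global"
            )
        global_attrs["QC_indicator"] = values.pop()
    else:
        # processing_level is set per variable: do the same for QC_indicator.
        for name, value in indicators.items():
            var_attrs.setdefault(name, {})["QC_indicator"] = value
```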

Suggested option list for QC_indicator

| Meaning | Comment |
|---|---|
| unknown | No QC was performed. |
| good data | All QC tests passed. |
| probably good data | |
| potentially correctable bad data | Not to be used without scientific correction or re-calibration. |
| bad data | Data have failed one or more tests. |
| nominal value | Data were not observed but reported (e.g., instrument target depth). |
| interpolated value | Missing data may be interpolated from neighboring data in space or time. |
| missing value | Fill value - not actual data (e.g. a placeholder for data that will arrive later). |
| uncalibrated data | Uncalibrated data (e.g. raw Chl-A data not calibrated against water samples). |


Keep in mind the user who wants to quickly understand the quality of the data.
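Since values from the suggested list are preferred but not required, a script that checks QC_indicator should warn rather than fail. This is a minimal sketch with a hypothetical helper `check_qc_indicator`; the values are transcribed from the table above and compared exactly (case-sensitive).

```python
import warnings

# Suggested QC_indicator values, transcribed from the table above.
QC_INDICATOR_OPTIONS = (
    "unknown",
    "good data",
    "probably good data",
    "potentially correctable bad data",
    "bad data",
    "nominal value",
    "interpolated value",
    "missing value",
    "uncalibrated data",
)


def check_qc_indicator(value):
    """Return True if value is a suggested option; otherwise warn and return False.

    Other values are allowed (QC_indicator is not from a controlled vocabulary),
    so this emits a warning instead of raising an error.
    """
    if value in QC_INDICATOR_OPTIONS:
        return True
    warnings.warn(f"QC_indicator {value!r} is not in the suggested option list")
    return False
```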