Resources and guidelines for NetCDF formatting

NetCDF conventions and the NPI approach

It can be hard to get an overview of the various conventions and best practices for NetCDF formatting.

Here, we collect resources that describe these conventions.

In the NPI approach, the most important sources of information are:

  1. The CF conventions (current version 1.10)

    • Describe conventions for use metadata, i.e. information about how to interpret the data.

    • This covers standardised descriptions of the physical meaning and units of data variables, as well as standard names for geographical regions, etc.

  2. The ACDD conventions (current version 1.3)

    • Describe conventions for discovery metadata, i.e. information that makes it easy to find the datasets.

    • This covers global attributes describing the dataset and its history, such as title, creator_name, and time_coverage_start.

The two are compatible: ACDD defines which attributes to include, and the CF conventions provide standardised values and usage for key variable attributes such as standard_name and units.
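As a minimal sketch of this division of labour, the example below holds the two kinds of metadata as plain Python dictionaries (an assumption for illustration; libraries such as xarray expose attributes in a similar dict-like `attrs` form). The global attributes carry ACDD discovery metadata, including a Conventions attribute naming both conventions, while the data variable carries its CF attributes. All attribute values, and the helper `declared_conventions`, are hypothetical placeholders.

```python
import re

# Hypothetical global (file-level) attributes: ACDD discovery metadata.
global_attrs = {
    "Conventions": "CF-1.10, ACDD-1.3",
    "title": "Example mooring temperature record",
    "creator_name": "Placeholder Name",
    "time_coverage_start": "2021-08-01T00:00:00Z",
}

# Hypothetical attributes of one data variable: CF use metadata.
temp_attrs = {
    "standard_name": "sea_water_temperature",
    "units": "degree_Celsius",
    "long_name": "In-situ sea water temperature",
}


def declared_conventions(attrs):
    """Return the convention names listed in the global Conventions attribute.

    Splits on commas and/or whitespace, so "CF-1.10, ACDD-1.3" and
    "CF-1.10 ACDD-1.3" give the same result.
    """
    return [name for name in re.split(r"[,\s]+", attrs.get("Conventions", "")) if name]


print(declared_conventions(global_attrs))  # ['CF-1.10', 'ACDD-1.3']
```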

In addition, the following are useful resources:

CF conventions

The CF (Climate and Forecast) conventions define the community standards for NetCDF formatting. They extend the older COARDS conventions. They were originally targeted at large gridded datasets in the climate and weather forecasting world, but have expanded to become the standard for earth science data in general, including observational data such as the ocean data we collect at NPI.

  • The standard_name attribute associated with a variable is a key part of the CF conventions. It describes precisely the physical quantity being represented. The allowed values of standard_name are strictly controlled by the CF conventions; for example, in-situ ocean temperature should have the standard name sea_water_temperature.

    The standard_name is a variable attribute, distinct from the variable name. The CF standard name table lists the allowed standard_name values and their canonical units.

    Multiple variables can share the same standard_name, e.g. if we have repeated measurements of the same quantity or dual sensors. They should then be distinguished not by standard_name but by the variable name and other variable attributes.

  • The units attribute associated with a variable describes the physical unit of the variable data, such as Pa, J m-2, or degree_north.

    • Dimensionless quantities that represent fractions, or parts of a whole, should have unit 1. This applies, for example, to practical salinity and sea ice fraction.

    • It is fine to use units other than the canonical units, such as degree_Celsius instead of K. The requirement is that the unit string is supported by the UDUNITS2 package. An overview of the units and symbols supported by UDUNITS2 is available in the UDUNITS2 documentation.

  • The long_name attribute associated with each variable is optional but recommended, and its content is not standardised by the conventions. It is described as a long descriptive name which may, for example, be used for labelling plots. Where we do not use a standard_name, as for uncalibrated or voltage data, it is very important to include a long_name describing the physical meaning of the variable.
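The attribute rules above can be illustrated with a small sketch, again assuming variable attributes are held as plain dictionaries. The variable names (TEMP, TEMP2, PSAL, CHLA_uncal) and the helper functions are hypothetical. It shows two temperature sensors sharing one standard_name and told apart by variable name and long_name, a dimensionless salinity with units "1", and an uncalibrated voltage variable that relies on long_name because it has no standard_name.

```python
# Hypothetical variable attributes for a CTD cast with dual temperature sensors.
variables = {
    "TEMP": {
        "standard_name": "sea_water_temperature",
        "units": "degree_Celsius",
        "long_name": "Sea water temperature, primary sensor",
    },
    "TEMP2": {
        "standard_name": "sea_water_temperature",  # same standard_name as TEMP
        "units": "degree_Celsius",
        "long_name": "Sea water temperature, secondary sensor",
    },
    "PSAL": {
        "standard_name": "sea_water_practical_salinity",
        "units": "1",  # dimensionless quantity
        "long_name": "Practical salinity",
    },
    "CHLA_uncal": {
        # No standard_name for uncalibrated data, so long_name is essential.
        "units": "V",
        "long_name": "Uncalibrated chlorophyll fluorescence (raw voltage)",
    },
}


def with_standard_name(variables, standard_name):
    """Names of all variables that share the given standard_name."""
    return [name for name, attrs in variables.items()
            if attrs.get("standard_name") == standard_name]


def lacking_description(variables):
    """Names of variables that have neither a standard_name nor a long_name."""
    return [name for name, attrs in variables.items()
            if "standard_name" not in attrs and "long_name" not in attrs]


print(with_standard_name(variables, "sea_water_temperature"))  # ['TEMP', 'TEMP2']
print(lacking_description(variables))  # []
```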

For example, if sea temperature on a mooring is measured by a series of 5 MicroCATs and by a profiler that produces values at 10 levels, it may be reported in a single file with 2 temperature variables and 2 depth variables (this example follows the OceanSITES data management user's manual). TEMP(TIME, DEPTH) could hold the MicroCAT data, if DEPTH is declared as a 5-element coordinate; and TEMP_prof(TIME, DEPTH_prof) could hold the profiler data, if DEPTH_prof is declared as a 10-element coordinate. Both variables would have the standard_name “sea_water_temperature”. The OceanSITES manual also lists a subset of its recommended variable names.
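The MicroCAT/profiler layout described above can be sketched as a CDL header, the text form of a netCDF file that `ncdump -h` prints and `ncgen` reads. The helper `cdl_header` is a hypothetical illustration built from plain dictionaries, not part of any library, and the TIME units string and the depth attributes are assumptions chosen for the example.

```python
def cdl_header(name, dimensions, variables):
    """Render a CDL header (dimensions, variables, attributes) from plain dicts."""
    lines = [f"netcdf {name} {{", "dimensions:"]
    for dim, size in dimensions.items():
        lines.append(f"\t{dim} = {size} ;")
    lines.append("variables:")
    for var, (dtype, dims, attrs) in variables.items():
        lines.append(f"\t{dtype} {var}({', '.join(dims)}) ;")
        for key, value in attrs.items():
            # All attribute values in this sketch are strings, so always quote them.
            lines.append(f'\t\t{var}:{key} = "{value}" ;')
    lines.append("}")
    return "\n".join(lines)


# Two depth coordinates: 5 MicroCAT depths and 10 profiler levels.
dimensions = {"TIME": "UNLIMITED", "DEPTH": 5, "DEPTH_prof": 10}

depth_attrs = {"standard_name": "depth", "units": "m", "positive": "down"}
variables = {
    "TIME": ("double", ["TIME"],
             {"standard_name": "time", "units": "days since 1950-01-01T00:00:00Z"}),
    "DEPTH": ("float", ["DEPTH"], depth_attrs),
    "DEPTH_prof": ("float", ["DEPTH_prof"], depth_attrs),
    # Same standard_name, different variable names and coordinates.
    "TEMP": ("float", ["TIME", "DEPTH"],
             {"standard_name": "sea_water_temperature", "units": "degree_Celsius",
              "long_name": "Sea water temperature from MicroCATs"}),
    "TEMP_prof": ("float", ["TIME", "DEPTH_prof"],
                  {"standard_name": "sea_water_temperature", "units": "degree_Celsius",
                   "long_name": "Sea water temperature from profiler"}),
}

cdl = cdl_header("mooring_example", dimensions, variables)
print(cdl)
```

In practice one would more likely build the file with a netCDF library and inspect it with `ncdump -h`; the plain-text rendering is used here only to make the structure visible without extra dependencies.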

Variable names

The variable name itself is not standardised by CF. The CF-documentation explicitly states that “Nothing depends on the names of variables”.

There is one notable exception: names of standard coordinate variables (TIME, DEPTH, PRES, LONGITUDE, LATITUDE) should follow Unidata conventions when possible.

Recommendations for variable names (but not strict standards!) are given by the SeaDataNet Parameter Discovery Vocabulary P02 (SDN P02). These are also used in the OceanSITES conventions.

At NPI, we will try to adhere to SDN P02 names, e.g., PRES, TEMP, PSAL, CNDC. Useful references for this are the parameter names in the OceanSITES manual and the ARGO user’s manual (p75).

OceanSITES (3.6, p25) recommends that variable names start with SDN P02 names, optionally followed by a suffix (e.g. TEMP_prof). A suggested practice (ØL) is to use such a suffix to mark preliminary or uncalibrated data (for example, CHLA_uncal).
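The "base name plus optional suffix" pattern can be checked mechanically. The sketch below is a hypothetical helper, not an OceanSITES or SeaDataNet tool, and its set of known base names contains only the names mentioned in the text above; it is not a complete or authoritative SDN P02 list.

```python
# Names mentioned in the text above; NOT a complete or authoritative SDN P02 list.
KNOWN_BASE_NAMES = {"PRES", "TEMP", "PSAL", "CNDC", "CHLA"}


def split_variable_name(name):
    """Split e.g. 'TEMP_prof' into ('TEMP', 'prof'); a name without '_' gives suffix None."""
    base, sep, suffix = name.partition("_")
    return base, (suffix if sep else None)


def follows_naming_pattern(name):
    """True if the name is a known base name, optionally followed by a non-empty suffix."""
    base, suffix = split_variable_name(name)
    return base in KNOWN_BASE_NAMES and suffix != ""


for name in ["TEMP", "TEMP_prof", "CHLA_uncal", "t090C"]:
    print(name, split_variable_name(name), follows_naming_pattern(name))
```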

ACDD conventions

The ACDD (Attribute Convention for Data Discovery) conventions define the metadata attributes that should be included in the netCDF file. ACDD operates with Highly Recommended, Recommended, and Suggested attributes. We should aim to always include the first two categories and to include Suggested attributes whenever possible.

The ACDD website provides a nicely ordered list of all attributes, with an explanation of what each field should contain. Yannick’s document also provides a nice template.
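A quick completeness check against the ACDD categories could look like the sketch below. It assumes global attributes are available as a dictionary; the helper `missing_acdd_attributes` is hypothetical, and only a small subset of the Recommended attributes is listed, so the ACDD website remains the reference for the full lists. For real files, a dedicated tool such as the IOOS Compliance Checker is a more complete option.

```python
# ACDD 1.3 "Highly Recommended" global attributes.
HIGHLY_RECOMMENDED = ["title", "summary", "keywords", "Conventions"]

# A small, non-exhaustive subset of ACDD 1.3 "Recommended" global attributes.
RECOMMENDED_SUBSET = [
    "id", "creator_name", "institution", "license",
    "date_created", "time_coverage_start", "time_coverage_end",
]


def missing_acdd_attributes(global_attrs):
    """List attributes that are absent or empty, grouped by ACDD category."""
    return {
        "highly_recommended": [a for a in HIGHLY_RECOMMENDED if not global_attrs.get(a)],
        "recommended": [a for a in RECOMMENDED_SUBSET if not global_attrs.get(a)],
    }


# Placeholder attributes for an incomplete file.
attrs = {
    "title": "Example mooring temperature record",
    "summary": "Placeholder summary text.",
    "Conventions": "CF-1.10, ACDD-1.3",
    "creator_name": "Placeholder Name",
}
print(missing_acdd_attributes(attrs))
```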

ARGO guidelines

The ARGO data management user’s manual describes the data management practices used by ARGO.

Other useful attributes not strictly required (“NPI conventions”)

  • It is good practice to include an original_name attribute containing the variable name before conversion of the data. For CTD data, for example, this could be the SBE variable name, such as t090C or flECO-AFL.
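A minimal sketch of keeping original_name while renaming is shown below. The mapping from SBE names to SDN P02-style names and the helper `rename_with_original_name` are hypothetical illustrations, and variables are again assumed to be plain dictionaries of attributes. With xarray, `Dataset.rename` followed by updating each variable's `attrs` would achieve the same.

```python
# Hypothetical mapping from SBE output names to SDN P02-style names.
SBE_TO_NPI = {
    "prDM": "PRES",
    "t090C": "TEMP",
    "c0S/m": "CNDC",
    "sal00": "PSAL",
    "flECO-AFL": "CHLA_uncal",
}


def rename_with_original_name(variables, mapping):
    """Rename variables via mapping, storing the old name in 'original_name'.

    Variables not in the mapping keep their name and attributes unchanged.
    The input dictionaries are copied, not modified.
    """
    renamed = {}
    for old_name, attrs in variables.items():
        new_name = mapping.get(old_name, old_name)
        new_attrs = dict(attrs)
        if new_name != old_name:
            new_attrs["original_name"] = old_name
        renamed[new_name] = new_attrs
    return renamed


raw = {
    "prDM": {"units": "dbar"},
    "t090C": {"units": "degree_Celsius"},
    "flECO-AFL": {"units": "mg m-3"},
}
print(rename_with_original_name(raw, SBE_TO_NPI))
```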

Unidata conventions

  • Conventions for units are defined by the UDUNITS package maintained at UCAR.

Other useful attributes