General Workflow

The general order of tasks performed during the processing of a dataset can be described briefly as follows. The workflow is divided into a number of stages. These stages mostly correspond to formerly independent programs, but they are kept to help the user track the progress and to serve as starting points for a restart of processing, either after an unforeseen interruption such as a power failure, or for a reprocess with changed settings that affect only parts of the process.

Before the actual processing, the user creates a configuration, either by writing a configuration file or by creating a project and creating or importing a configuration (in the graphical user interface). The actual processing is then started - by command or by pressing the “START” button - and runs without mandatory user interaction. Both the command-line and graphical interfaces report on the progress.

Processing Stages

EC-PeT Processing Workflow

Collect Data – start

First, all data files are scanned to determine the time period covered and to check that the files are present and accessible. Subsequently, they are read and stored inside the project file, which is a plain SQLite3 database [Hipp, 2018].

Note

This step eliminates all redundancies, i.e. input files that cover overlapping time periods are merged automatically.
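Because the project file is a standard SQLite3 database, its contents can be inspected with any compatible client. A minimal Python sketch (the file name is an example; no table names are assumed):

    import sqlite3

    # Open the EC-PeT project file (the file name is an example).
    con = sqlite3.connect("my_project.db")

    # List the tables that the collect stage created.
    for (name,) in con.execute(
            "SELECT name FROM sqlite_master WHERE type='table'"):
        print(name)

    con.close()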

Preprocessor – pre

First, the slow data (also called reference data) are processed, i.e. averaged over the output intervals.

Raw fast values from the database are converted using the gain and offset values provided in the calibration information for each sensor. Known constant delays between the different signals are compensated by shifting the values in time accordingly. Then, the quality tests for high-frequency raw data are applied, and spikes are removed according to the selected despiking method. Last, the values are calibrated using the known calibration coefficients, and mean values for each output interval are calculated.

Note

Optionally, despiked but not calibrated data in raw-data format (TOA5) may be written after this step. This is intended for users who want to use legacy, unmodified EC-PACK.
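Which despiking method is applied depends on the configuration. Purely as an illustration of the general idea (not EC-PeT's actual implementation), a simple block-wise median-absolute-deviation filter could look like this:

    import numpy as np

    def despike_mad(x, window=600, z=7.0):
        """Flag samples that deviate by more than z robust standard
        deviations from the block median (illustrative only).
        At 20 Hz, window=600 corresponds to 30-second blocks."""
        x = np.asarray(x, dtype=float)
        spikes = np.zeros(x.size, dtype=bool)
        for start in range(0, x.size, window):
            seg = x[start:start + window]
            med = np.nanmedian(seg)
            sigma = 1.4826 * np.nanmedian(np.abs(seg - med))  # MAD -> std
            if sigma > 0:
                spikes[start:start + window] = np.abs(seg - med) > z * sigma
        cleaned = x.copy()
        cleaned[spikes] = np.nan  # removed spikes become gaps
        return cleaned, spikes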

Planar Fit – planar

Optionally, the planar-fit rotation angles are calculated for each planar-fit interval (a multiple of the output interval, usually several days).

Note

Optionally, planar-fit results in EC-PACK format may be written after this step. This is intended for users who want to use legacy, unmodified EC-PACK.
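For orientation, a planar fit in the sense of Wilczak et al. (2001) amounts to a least-squares fit of a plane to the per-interval mean winds. The sketch below shows the idea, not EC-PeT's implementation:

    import numpy as np

    def planar_fit(u_mean, v_mean, w_mean):
        """Fit the plane w = b0 + b1*u + b2*v to the per-interval mean
        winds; a sketch of the method of Wilczak et al. (2001)."""
        u_mean = np.asarray(u_mean, dtype=float)
        v_mean = np.asarray(v_mean, dtype=float)
        A = np.column_stack([np.ones_like(u_mean), u_mean, v_mean])
        (b0, b1, b2), *_ = np.linalg.lstsq(A, np.asarray(w_mean), rcond=None)
        # Unit vector normal to the fitted plane: the new vertical axis.
        k = np.array([-b1, -b2, 1.0]) / np.sqrt(1.0 + b1**2 + b2**2)
        return b0, k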

Flux Calculation – flux

All device-dependent and physical corrections are applied to the fast data, and all mean covariances and their uncertainties are calculated for each output interval. The corrections are optionally iterated to account for any interdependencies.
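At the core of this stage is the covariance of the vertical wind with each scalar over an output interval. A minimal sketch of such a computation (corrections and uncertainty estimates omitted):

    import numpy as np

    def interval_covariance(w, c):
        """Covariance of vertical wind w and a scalar c over one
        output interval, ignoring gaps (illustrative sketch)."""
        w = np.asarray(w, dtype=float)
        c = np.asarray(c, dtype=float)
        ok = ~(np.isnan(w) | np.isnan(c))
        wp = w[ok] - w[ok].mean()  # fluctuations around the interval mean
        cp = c[ok] - c[ok].mean()
        return (wp * cp).mean()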

Postprocessor – post

The quality tests for the averaged values and fluxes are applied. The results of all preprocessor and postprocessor quality tests are then combined into quality flags for each flux.
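How the individual test results map onto a single flag per flux depends on the flagging scheme; taking the worst individual result is one common convention, assumed here purely for illustration:

    def combine_flags(test_flags):
        """Combine individual test flags (0 = good, larger = worse)
        into one overall flag per flux. Taking the worst individual
        result is a common convention, assumed here for illustration."""
        return max(test_flags, default=0)

For example, combine_flags([0, 2, 1]) yields 2, i.e. the flux inherits the flag of its worst failed test.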

Write Output – out

The averaged values, fluxes, quality flags and quality measures are written to one or more files in the desired output format(s).
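As an illustration of what such an output table might look like (column names and values are invented, and the layout is not necessarily EC-PeT's):

    import pandas as pd

    # Hypothetical result table, one row per output interval;
    # column names are invented for illustration.
    results = pd.DataFrame({
        "timestamp": pd.date_range("2024-06-01", periods=3, freq="30min"),
        "H": [120.3, 98.7, 110.1],  # sensible heat flux in W m-2
        "qc_H": [0, 1, 0],          # combined quality flag
    })
    results.to_csv("fluxes.csv", index=False)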

Data Flow

The data flow through EC-PeT follows this pattern:

  1. Raw Data Input

    • Input files as produced by the datalogger, e.g. TOA5 from Campbell Scientific dataloggers

    • Either as one single file per covered time period

    • Or split into
      • Fast measurements (e.g. 20 Hz: wind components, sonic temperature, gas concentrations)

      • Slow measurements (e.g. every 10 min: reference temperature, pressure, humidity)

  2. Data Storage

    • All data stored in the project file (which is an SQLite3 database file that may be viewed or edited with any compatible database browser)

    • Automatic merging of overlapping time periods and deduplication

    • Metadata preservation and data integrity checks

  3. Preprocessor

    • Preprocessor tests on raw high-frequency data

    • Spike detection and removal

    • Instrument diagnostic flag evaluation

  4. Planar Fit (optional)

    • Calculation of the uncorrected mean wind

    • Determination of the instrument tilt by a fit over a user-specified time interval

  5. Flux Calculation

    • Coordinate rotation (planar fit or double rotation)

    • Correction of the wind-dependent time delay between signals (optional)

    • Covariance computation with uncertainty estimates

    • Physical corrections (WPL, frequency response, etc.)

  6. Postprocessor

    • Quality assessment based on the computed fluxes

    • Footprint analysis (optional)

    • Stationarity checks (see the sketch after this list)

    • Final quality flag assignment

  7. Output Generation

    • Multiple output formats available

    • Quality-controlled as well as unfiltered flux files

    • Files with diagnostic test results and quality measures

    • Processing logs and metadata
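The stationarity check mentioned above is commonly performed in the style of Foken and Wichura (1996), comparing the covariance over the full interval with the mean covariance of shorter sub-intervals. A sketch of that idea (not necessarily EC-PeT's exact formulation):

    import numpy as np

    def stationarity_ratio(w, c, n_sub=6):
        """Relative non-stationarity in the style of Foken & Wichura
        (1996): compare the covariance over the full interval with the
        mean covariance of n_sub sub-intervals (illustrative sketch)."""
        def cov(a, b):
            return np.mean((a - a.mean()) * (b - b.mean()))
        w = np.asarray(w, dtype=float)
        c = np.asarray(c, dtype=float)
        full = cov(w, c)
        subs = [cov(ws, cs) for ws, cs in
                zip(np.array_split(w, n_sub), np.array_split(c, n_sub))]
        return abs((np.mean(subs) - full) / full)

Relative deviations below roughly 30 % are commonly treated as stationary in this flagging scheme.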

Stage Dependencies

Each stage builds on the results that earlier stages have stored in the project database. This dependency structure allows for:

  • Restart capability: Processing can be resumed from any completed stage

  • Selective reprocessing: Only affected stages need to be re-run when parameters change

  • Development and debugging: Individual stages can be tested independently
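A sketch of what such stage-based bookkeeping could look like (the stage names follow this manual; the logic itself is invented for illustration):

    # Sketch of stage-based resume logic; the stage names follow this
    # manual, the bookkeeping itself is invented for illustration.
    STAGES = ["start", "pre", "planar", "flux", "post", "out"]

    def stages_to_run(last_completed, first_changed):
        """Re-run everything from the earliest affected stage on."""
        resume_at = min(STAGES.index(last_completed) + 1,
                        STAGES.index(first_changed))
        return STAGES[resume_at:]

For example, stages_to_run("out", "flux") yields ["flux", "post", "out"]: after a parameter change that first affects the flux stage, only the flux calculation and everything downstream is re-run.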

Performance Considerations

Processing time and resource usage depend on:

  • Dataset size: Larger datasets require more time and memory

  • Quality control settings: More stringent QC requires additional computation time, in particular iterative methods.

  • Hardware resources: Multi-core systems can parallelize some operations (see the sketch below)
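Since output intervals are largely independent of each other, per-interval work is a natural candidate for parallelization. A hypothetical sketch (EC-PeT's actual parallelization scheme may differ):

    from multiprocessing import Pool

    def process_interval(i):
        """Placeholder for the per-interval flux computation."""
        return i  # a real implementation would return fluxes etc.

    if __name__ == "__main__":
        intervals = range(48)  # e.g. one day of 30-min output intervals
        # Output intervals are independent of each other, so the work
        # can be spread over several cores.
        with Pool(processes=8) as pool:
            results = pool.map(process_interval, intervals)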

Typical processing rate:

Processing with:

  • 20 Hz data

  • the full preprocessor test suite

  • the full postprocessor test suite

  • no footprint model

  • 8 cores (Core-I Gen12)

takes approximately 8 hours per month of data.

Disk usage scales approximately linearly with dataset size, with the project database reaching a size similar to that of the raw data files.

Error Handling and Recovery

EC-PeT includes robust error handling:

  • Automatic backup: Project state saved at each stage completion

  • Resume capability: Interrupted processing can be resumed

  • Error logging: Detailed logs help diagnose issues

  • Data validation: Input data checked for consistency and completeness

  • Graceful degradation: Processing continues even if some data is problematic