General Workflow¶
The general order of tasks performed during the processing of a dataset can be described briefly as follows. The workflow is divided into a number of stages. These mostly correspond to formerly independent programs, but they are kept to help the user track progress and to serve as starting points for restarting the processing, e.g. after an unforeseen interruption such as a power failure, or for reprocessing with changed settings that affect only part of the process.
Before the actual processing, the user creates a configuration, either by writing a configuration file or by creating a project and creating or importing a configuration in the graphical user interface. The actual processing is then started, by command or by pressing the “START” button, and runs without mandatory user interaction. Both the command-line and graphical interfaces report on the progress.
Processing Stages¶
Collect Data – start¶
First, all data files are scanned to determine the time period covered and to check that the files are present and accessible. They are then read and stored inside the project file, which is a plain SQLite3 database [Hipp, 2018].
Note
This step eliminates all redundancies, i.e. input files that cover overlapping time periods are merged automatically.
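The merging of overlapping time periods can be illustrated with a minimal sketch. The function name and the `(start, end)` tuple representation are assumptions for illustration, not EC-PeT's actual data model:

```python
# Minimal sketch of merging overlapping (or adjacent) time periods,
# represented here as (start, end) tuples. Illustrative only.
def merge_periods(periods):
    """Merge overlapping (start, end) intervals into a non-redundant list."""
    merged = []
    for start, end in sorted(periods):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of appending.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Sorting first guarantees that each new interval can only overlap the most recently merged one, so a single pass suffices.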
Preprocessor – pre¶
First, the slow data (also called reference data) are processed, i.e. averaged over the output intervals.
Raw fast values from the database are converted using the gain and offset values provided in the calibration information for each sensor. Known constant delays between the different signals are compensated by shifting the values in time accordingly. Then the quality tests for high-frequency raw data are applied, and spikes are removed according to the selected despiking method. Last, the values are calibrated using the known calibration coefficients, and mean values for each output interval are calculated.
Note
Optionally, despiked but uncalibrated data in raw-data format (TOA5) may be written after this step. This is for users who want to use legacy, unmodified EC-PACK.
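The individual preprocessing steps described above can be sketched as follows. The function names and the despiking criterion (median ± k·MAD) are illustrative assumptions, not EC-PeT's actual implementation or its selected despiking method:

```python
import numpy as np

def calibrate(raw, gain, offset):
    """Convert raw counts to physical units using gain and offset."""
    return raw * gain + offset

def shift_delay(x, delay_samples):
    """Compensate a known constant delay by shifting the series in time."""
    return np.roll(x, -delay_samples)  # edges would need proper handling

def despike(x, k=5.0):
    """Flag samples deviating more than k*MAD from the median as NaN."""
    med = np.nanmedian(x)
    mad = np.nanmedian(np.abs(x - med))
    out = np.asarray(x, dtype=float).copy()
    out[np.abs(x - med) > k * mad] = np.nan
    return out
```

A median/MAD criterion is used here only because it is robust against the very spikes it is meant to detect; the thresholds are placeholders.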
Planar Fit – planar¶
Optionally, the planar-fit rotation angles are calculated for each planar-fit interval (a multiple of output intervals, usually several days).
Note
Optionally, planar-fit results in EC-PACK format may be written after this step. This is for users who want to use legacy, unmodified EC-PACK.
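The core of a planar fit is a least-squares fit of the plane w = b0 + b1·u + b2·v to the uncorrected interval-mean wind components collected over the planar-fit interval, in the spirit of Wilczak et al. (2001). The sketch below is an illustration under that assumption, not EC-PeT's code; the sign conventions for the tilt angles are one common choice:

```python
import numpy as np

def planar_fit_angles(u_mean, v_mean, w_mean):
    """Fit w = b0 + b1*u + b2*v to interval-mean winds; return (b, pitch, roll)."""
    A = np.column_stack([np.ones_like(u_mean), u_mean, v_mean])
    b, *_ = np.linalg.lstsq(A, w_mean, rcond=None)
    # Unit normal of the fitted plane; tilt angles follow from its components.
    p = np.array([-b[1], -b[2], 1.0])
    p /= np.linalg.norm(p)
    pitch = np.arcsin(-p[0])       # rotation about the y-axis
    roll = np.arctan2(p[1], p[2])  # rotation about the x-axis
    return b, pitch, roll
```

Because the fit uses many interval means spread over several days, short-term fluctuations average out and only the persistent instrument tilt remains.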
Flux Calculation – flux¶
All device-dependent and physical corrections are applied to the fast data, and all mean covariances and their uncertainties are calculated for each output interval. The corrections are optionally iterated to account for interdependencies.
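The central quantity of this stage is the covariance of the vertical wind and a scalar over one output interval. The sketch below shows that computation together with one simple way to attach an uncertainty, the standard error over sub-interval blocks; the function name and the error estimator are illustrative assumptions, not EC-PeT's actual method:

```python
import numpy as np

def covariance_with_uncertainty(w, c, n_sub=6):
    """Return (cov, sigma) of w'c' for one output interval."""
    wp = w - np.nanmean(w)
    cp = c - np.nanmean(c)
    cov = np.nanmean(wp * cp)
    # Split the interval into sub-blocks and use the scatter of their
    # means as a rough random-error estimate.
    blocks = np.array_split(wp * cp, n_sub)
    sub = np.array([np.nanmean(b) for b in blocks])
    sigma = np.nanstd(sub, ddof=1) / np.sqrt(n_sub)
    return cov, sigma
```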
Postprocessor – post¶
The quality tests for the average values and fluxes are applied. The results of all preprocessor and postprocessor quality tests are then combined into quality flags for each flux.
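Two typical ingredients of this stage can be sketched as follows: a steady-state test that compares the full-interval covariance with the mean of sub-interval covariances (in the style of Foken and Wichura, 1996), and the combination of individual test results into one overall flag by taking the worst value. The thresholds and the 0/1/2 flag scheme are assumptions for illustration, not necessarily EC-PeT's scheme:

```python
import numpy as np

def steady_state_flag(w, c, n_sub=6):
    """Flag non-stationarity by comparing full and sub-interval covariances."""
    cov = np.mean((w - w.mean()) * (c - c.mean()))
    subs = []
    for wb, cb in zip(np.array_split(w, n_sub), np.array_split(c, n_sub)):
        subs.append(np.mean((wb - wb.mean()) * (cb - cb.mean())))
    rn = abs((np.mean(subs) - cov) / cov)  # relative non-stationarity
    return 0 if rn < 0.3 else (1 if rn < 1.0 else 2)

def combine_flags(flags):
    """Overall quality flag: the worst (highest) individual test result."""
    return max(flags)
```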
Write Output – out¶
The averaged values, fluxes, quality flags and quality measures are written to one or multiple files of the desired output format(s).
Data Flow¶
The data flow through EC-PeT follows this pattern:
Raw Data Input
Input files as output by the datalogger, e.g. TOA5 files from Campbell Scientific dataloggers
Either in one single file per covered time period
Or split into
Fast measurements (e.g. 20 Hz: wind components, sonic temperature, gas concentrations)
Slow measurements (e.g. every 10 min: reference temperature, pressure, humidity)
Data Storage
All data stored in the project file (which is a SQLite3 database file that may be viewed or edited with any compatible database browser)
Automatic merging of overlapping time periods and deduplication
Metadata preservation and data integrity checks
Preprocessor
Preprocessor tests on raw high-frequency data
Spike detection and removal
Instrument diagnostic flag evaluation
Planar Fit (optional)
Calculation of uncorrected mean wind
Determination of instrument tilt by a fit over a user-specified time interval
Flux Calculation
Coordinate rotation (planar fit or double rotation)
Correction of wind-dependent time delay between signals (optional)
Covariance computation with uncertainty estimates
Physical corrections (WPL, frequency response, etc.)
Postprocessor
Quality Assessment based on computed fluxes
Footprint analysis (optional)
Stationarity checks
Final quality flag assignment
Output Generation
Multiple output formats available
Quality-controlled as well as unfiltered flux files
Diagnostic test result and quality measure files
Processing logs and metadata
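Because the project file is a plain SQLite3 database, its contents can be inspected with any standard tool, including Python's built-in `sqlite3` module. The sketch below just lists all tables; the actual table names depend on the EC-PeT project and are not assumed here:

```python
import sqlite3

def list_tables(project_file):
    """List all table names in an EC-PeT project file (a SQLite3 database)."""
    with sqlite3.connect(project_file) as con:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```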
Stage Dependencies¶
The dependencies between the stages allow for:
Restart capability: Processing can be resumed from any completed stage
Selective reprocessing: Only affected stages need to be re-run when parameters change
Development and debugging: Individual stages can be tested independently
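The restart and selective-reprocessing behaviour can be sketched as a linear chain of stages, using the stage names from this chapter (start, pre, planar, flux, post, out). This mirrors the described behaviour; it is not EC-PeT's actual code:

```python
# The stages form a linear chain: a settings change affecting one stage
# invalidates that stage and everything after it.
STAGES = ["start", "pre", "planar", "flux", "post", "out"]

def stages_to_run(completed, changed=None):
    """Return the stages that still need to run.

    completed: list of stages already finished, in order.
    changed: earliest stage affected by a settings change, or None.
    """
    first = len(completed)
    if changed is not None:
        first = min(first, STAGES.index(changed))
    return STAGES[first:]
```

For example, after an interruption during the flux stage, processing resumes at `flux`; changing a flux-correction setting after a complete run also only reruns `flux`, `post`, and `out`.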
Performance Considerations¶
Processing time and resource usage depend on:
Dataset size: Larger datasets require more time and memory
Quality control settings: More stringent QC requires additional computation time, in particular for iterative methods
Hardware resources: Multi-core systems can parallelize some operations
Typical processing rate:
The processing of:
20 Hz data
Full preprocessor test suite
Full postprocessor test suite
No footprint model
8 cores (Core-I Gen12)
takes approximately 8 hours per month of data.
Disk usage scales approximately linearly with the dataset size, with the project database reaching a size similar to that of the raw data files.
Error Handling and Recovery¶
EC-PeT includes robust error handling:
Automatic backup: Project state saved at each stage completion
Resume capability: Interrupted processing can be resumed
Error logging: Detailed logs help diagnose issues
Data validation: Input data checked for consistency and completeness
Graceful degradation: Processing continues even if some data is problematic