Skip to content

Microcontroller friendly, efficient Savitzky-Golay Implementation with Gram Polynomials and Memoization

License

MIT, BSD-2-Clause licenses found

Licenses found

MIT
LICENSE
BSD-2-Clause
LICENSE.txt
Notifications You must be signed in to change notification settings

Tugbars/Savitzky-Golay-Filter

Repository files navigation

Savitzky-Golay Filter Implementation in C

Author: Tugbars Heptaskin
Date: 2025-10-24
Last Updated: 2025-11-05

Overview

This is a highly-optimized implementation of the Savitzky-Golay filter for data smoothing and differentiation. The implementation achieves 4-6x speedup over naive approaches through a comprehensive optimization strategy while maintaining bit-exact compatibility with MATLAB's sgolayfilt function (differences on the order of 10⁻⁶).

Key Features

  • Production-ready performance: Multiple layers of optimization for real-world applications
  • Embedded-safe: No heap allocation, no VLAs, static buffers only
  • MATLAB-validated: Output matches MATLAB's Savitzky-Golay filter to floating-point precision
  • Comprehensive documentation: Extensive inline documentation of mathematical foundations and optimization techniques
  • Configurable: Supports both non-causal (centered) and causal (past-only) filtering modes

Mathematical Foundation

The Savitzky-Golay filter performs polynomial least-squares fitting on a sliding window of data points. For each window position, it fits a polynomial of degree m through 2n+1 points (where n is the half-window size), then evaluates the fitted polynomial (or its derivative) at a target point.

Core Components

  1. Gram Polynomials F(k,d)

    • Orthogonal polynomials over discrete points [-n, ..., +n]
    • Computed using a three-term recurrence relation
    • The derivative order d allows computing smoothed derivatives
  2. Generalized Factorial GenFact(a,b)

    • Product: a × (a-1) × (a-2) × ... × (a-b+1)
    • Used for normalization factors in Gram polynomials
  3. Filter Weights

    • Each data point gets a weight determined by Gram polynomial projections
    • Weights are precomputed and reused via convolution
  4. Convolution

    • Filtered value: y[j] = Σ(i=0 to 2n) w[i] × x[j-n+i]

Optimization Strategy

This implementation employs five major optimization techniques for a cumulative 4-6x total speedup:

1. GenFact Lookup Table (1.5-2x speedup)

  • Problem: Repeated expensive factorial-like computations
  • Solution: Precompute all GenFact values into a 2D table at startup
  • Trade-off: ~4KB memory for O(1) lookups instead of O(n) computation
  • Impact: Eliminates repeated multiplications in weight calculation

2. Static Allocation (embedded-safe)

  • Problem: VLAs and heap allocation problematic for embedded systems
  • Solution: Fixed-size stack buffers with compile-time known maximum sizes
  • Benefits:
    • No stack probing overhead
    • Better cache locality
    • Safe for resource-constrained environments

3. Gram Polynomial Optimization (2-3x speedup)

Multiple sub-optimizations:

  • Rolling buffer approach: O(d) space instead of O(k×d)
  • Strength reduction: Hoist divisions (1/n) out of loops
  • Branch elimination: Separate d=0 case to avoid conditionals in hot paths
  • Optional memoization: Cache computed values when enabled

4. Edge Weight Caching (1.5x amortized speedup)

  • Problem: Edge weights recomputed for every filter application
  • Solution: Cache computed edge weights with parameter validation
  • Key insight: Leading and trailing edges use same weights (different data order)

5. ILP-Optimized Convolution (1.2-1.3x speedup)

  • Problem: Serial dependency chains prevent parallel execution
  • Solution: Four independent accumulation chains exploit CPU superscalar execution
  • Benefits:
    • Intel/AMD CPUs: 2 FMA units → 2 parallel operations
    • Apple M-series: 4 FMA units → 4 parallel operations
    • Pairwise reduction for numerical stability

Performance Validation

The implementation has been validated against MATLAB's sgolayfilt function:

  • Accuracy: Differences on the order of 10⁻⁶ (floating-point precision limits)
  • Test case: 350-point dataset, halfWindow=12, polynomialOrder=4
  • See: Included validation plots showing bit-level agreement

Usage

Basic Example

#include "savgolFilter.h"

int main() {
    // Your input data
    double dataset[] = { /* your data */ };
    size_t dataSize = sizeof(dataset) / sizeof(dataset[0]);
    
    // Allocate input and output arrays
    MqsRawDataPoint_t rawData[dataSize];
    MqsRawDataPoint_t filteredData[dataSize];
    
    // Initialize input data
    for (size_t i = 0; i < dataSize; ++i) {
        rawData[i].phaseAngle = dataset[i];
        filteredData[i].phaseAngle = 0.0f;
    }

    // Configure filter parameters
    uint8_t halfWindowSize = 12;      // Window size = 2*12+1 = 25 points
    uint8_t polynomialOrder = 4;      // 4th-order polynomial fit
    uint8_t targetPoint = 0;          // Center point (non-causal)
    uint8_t derivativeOrder = 0;      // 0 = smoothing, 1 = 1st derivative, etc.
  
    // Apply filter
    clock_t tic = clock();
    int result = mes_savgolFilter(rawData, dataSize, halfWindowSize, 
                                   filteredData, polynomialOrder, 
                                   targetPoint, derivativeOrder);
    clock_t toc = clock();

    if (result == 0) {
        printf("Elapsed: %f seconds\n", (double)(toc - tic) / CLOCKS_PER_SEC);
        // Use filteredData...
    } else {
        printf("Filter failed with error code: %d\n", result);
    }
    
    return 0;
}

Configuration Options

Non-Causal (Centered) Filtering

uint8_t targetPoint = 0;  // Uses both past and future data
  • Filter centered on current point
  • Uses symmetric window: [t-n, ..., t, ..., t+n]
  • Best for offline processing where all data is available
  • Produces smoothest results

Causal (Real-Time) Filtering

uint8_t targetPoint = halfWindowSize;  // Uses only past data
  • Filter uses only present and past data
  • Window extends backward: [t-2n, ..., t]
  • Suitable for real-time applications
  • Introduces phase delay but maintains causality

Derivative Computation

uint8_t derivativeOrder = 0;  // Smoothing
uint8_t derivativeOrder = 1;  // First derivative (velocity)
uint8_t derivativeOrder = 2;  // Second derivative (acceleration)

Compile-Time Configuration

The implementation supports optional features via preprocessor defines:

// Enable memoization (trades ~13KB memory for speed when reusing same parameters)
#define ENABLE_MEMOIZATION

// Alternative: Runtime GenFact precomputation (lower memory, specific to one filter config)
#define OPTIMIZE_GENFACT

Recommendation: Use default settings (lookup table + memoization) for best performance.

Parameter Constraints

The filter validates all parameters at runtime:

  • Window size: (2n+1) ≤ data size
  • Polynomial order: m < window size (typically m ≤ 4 for most applications)
  • Target point: 0 ≤ t ≤ 2n
  • Half-window size: n > 0
  • Maximum window: Configurable via MAX_WINDOW (default: 65 points)

Implementation Details

Edge Handling

The filter applies different strategies for three regions:

  1. Leading edge (first n points):

    • Uses asymmetric windows with shifted target points
    • Weights applied in reverse order to available data
    • Ensures smooth transition to central region
  2. Central region (points n to N-n-1):

    • Full symmetric windows
    • Optimal smoothing/differentiation
    • Most computationally efficient
  3. Trailing edge (last n points):

    • Reuses leading edge weights (mirrored)
    • Weights applied in forward order
    • Maintains consistency with leading edge

Numerical Stability

The implementation preserves numerical accuracy through:

  • Careful operation ordering in floating-point arithmetic
  • Pairwise reduction in accumulation chains
  • No premature rounding or truncation
  • Validated against MATLAB reference implementation

Memory Requirements

  • Lookup table: ~4KB (GenFact table, 65×65 floats)
  • Memoization cache: ~13KB (when enabled)
  • Stack per call: <2KB (fixed-size buffers)
  • Edge cache: <8KB (weights for edge cases)

Total: ~27KB worst-case (all features enabled)

Typical Applications

  • Signal smoothing: Remove noise while preserving signal features
  • Numerical differentiation: Compute derivatives with noise suppression
  • Feature extraction: Detect peaks, valleys, inflection points
  • Trend analysis: Extract underlying trends from noisy data
  • Real-time filtering: Causal mode for online data processing

Error Codes

 0  : Success
-1  : NULL pointer passed
-2  : Invalid parameters (constraint violation)

Branch Information

main branch (current)

  • Scalar implementation with comprehensive optimizations
  • Production-ready, extensively documented
  • Best for general-purpose use and embedded systems

Future Work

Potential extensions for specialized use cases:

  • SIMD vectorization (AVX/SSE) for massive datasets
  • GPU acceleration for real-time multi-channel processing
  • Multi-threaded batch processing

References

  • Original paper: Savitzky, A.; Golay, M.J.E. (1964). "Smoothing and Differentiation of Data by Simplified Least Squares Procedures". Analytical Chemistry 36 (8): 1627–1639.
  • MATLAB documentation: sgolayfilt

License

MIT License

Citation

If you use this implementation in your research, please cite:

@software{savgol_optimized_2025,
  author = {Heptaskin, Tugbars},
  title = {High-Performance Savitzky-Golay Filter Implementation in C},
  year = {2025},
  url = {[Your repository URL]}
}

For detailed implementation notes, see the extensive inline documentation in src/savgolFilter.c

About

Microcontroller friendly, efficient Savitzky-Golay Implementation with Gram Polynomials and Memoization

Topics

Resources

License

MIT, BSD-2-Clause licenses found

Licenses found

MIT
LICENSE
BSD-2-Clause
LICENSE.txt

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published