Common approach on computing derivative of a noisy function numerically is similar to classic central differences we considered before. We approximate “difficult” by a polynomial near point of interest and assume that is equal (with some deviance) to the derivative of that polynomial.

Only one important change is required – method of approximation. In case of noisy data there is no sense to use interpolation as we do for central differences (values are already corrupted by noise, no need to preserve them precisely). Instead smoothing least-squares approximation is used in order to remove noise from the data. Filters derived by this procedure are commonly referenced as Savitzky-Golay digital differentiators. We will consider several particular filters of this class named after Cornelius Lanczos. Detailed procedure is described below (skip to the final formulas).

Analogously to central differences, we fix some step and sample at (odd) points around :

.

Then we construct approximating polynomial for by minimizing cost function

with respect to unknown coefficients . To do this we need to solve system of linear equations:

Once polynomial is constructed we can estimate following to our assumption:

Power of the polynomial should be strictly less than otherwise least-squares approximation will be equivalent to interpolation (number of unknowns will be equal to number of freedom degrees) and we will arrive to central differences.

Methods derived by this procedure are called low-noise Lanczos differentiators for and super Lanczos low-noise differentiators for . In general

These differentiators have very simple structure and can be written in general form for any :

Interesting fact is that denominators constitute special sequence of integer numbers appeared in many fields of mathematics and physics. For example, these numbers are coincide with maximum accumulated number of electrons at energy level . Check this website for more details.

To differentiate a digital signal we need to use h=1/SamplingRate and replace by in the expressions above. Magnitude responses of low-noise differentiators for are drawn below:

As it can be clearly seen this method is much more robust to noise than plain central differences. However due to least-squares nature Lanczos differentiators cannot guarantee complete noise suppression in high frequencies. Its magnitude response doesn’t go smoothly to zero near . Instead they resemble “wavy” always-non-zero response on high frequencies.

Filters with guaranteed suppression of noisy high frequencies are presented in the next article:

### Disclaimer

You can use materials from this site in any legal activity (research, teaching, engineering, etc.) freely on the basis of respecting my authorship by giving clear statement and reference to my site or particular page on it in your work. Besides brief description of your project is very welcome in the comments below. One point though, if you use my materials in commercial project please consider supporting my site and me by donation.

## 11 Comments

#Why low-noise Lanczos differentiators do not work with noisy data?

#I suppose of wrong coefficients!

Change signs of 22 (N=7), 86 (N=9) and 300 (N=11)

#I’ve checked the coefficients and their signs – all are correct.

#Pavel,

thank you for tutorial. I am right now working on Savitzky-Golay differentiation as applied to differential spectrometry and it seems that you may know how this stuff works. I can manage the algorithm, but there is a question of estimates of error. Do you know any results in this direction? I mean, I derived these estimates but I don’t know if I invented another bicycle.

#Hi Sergey,

What do you mean under error estimation? Asymptotic using sampling step – h?

#No. Well, in my problem sampling step is given – device’s resolution is such and such and you can’t increase it. My boss says we need 4th derivative and noises are nasty enough to use Savitzky-Golay filters. Problem is, when you use too little points the noises are not filtered out and if you use too much points you can smooth the sample all the way to zero.

What I talk about are estimates like that:

let , where is the noise.

I mean, if you compute something, you must know the precision of your calculations, i.e. what is the difference between value of derivative you computed and actual value of this derivative? I made some quick calculations on my lap, because I didn’t find any results in this direction. Either my googling skills suck (and it may actually be the case) or nobody came up with such results. I wonder what it is 🙂

#Yep, this is what I was talking about.

Usually residual term depends on sampling step and higher derivatives. You can easily derive such kind of error using Taylor series instead of .

However this kind of analysis has major flaw – it is based on assumption that magnitude of the higher derivatives are going to zero. Which is not the case for practical data especially in the presence of noise – derivatives behaves wildly. Hence range of the possible difference from true derivative becomes huge and meaningless.

Besides you have to catch your own tail since you have to calculate higher derivatives in order to get accuracy of lower-order estimation.

All you can do is to study your signal (range of frequencies), can it be presented with polynomial, what degree, noise properties?

Knowing these characteristics you can build appropriate filter.

Take a look on paper about Savitzkiy-Golay properties:

http://www.cs.wright.edu/~phe/Research/DigitalSignalProcessing-05.pdf

#Thanks for paper. At very least it’s a good place to start.

Problem is, my data are not signal strength at given times. My data are transparencies of sample at given wavelengths and noises come from limited sensivity of spectrometer, so as far as I know, these derivatives have no physical meaning. Frankly speaking, for now I just hope that derivatives exist and noise is somewhat decent. Anyway mathematics cannot answer these questions without some serious chemistry or physics.

As for what my data really is, most probably it’s sum of a lot of Gaussian curves with occasional Lorenz curve. If I only knew that they are uniformly wide…

As I told before, step h is given, and I can’t change it. The only parameters in my control are approximation degree and sample width. And for given function and step there is an optimal sample width that depends on a lot of things including higher derivatives. Problem is, this optimal sample width is a tricky beast – you miss it by ten percent and your errors go through the ceiling, so taking a wild guess and hoping that it will work out isn’t going to work.

Actually, being a PDE guy I do not see a lot of problem in dependence on higher derivatives. All you have to do is to get a decent equation that your function satisfies, derive some apriori estimates and then improve them with aposteriori estimates. Sounds scary, but it’s still better than to hope blindly that everything goes well. I tried, and it doesn’t. By the way, asymptotics of Fourier image of function are in fact equivalent to estimates on high derivatives, so you must have them one way or another and in either case you can’t do it by mathematics alone. Sigh.

Anyway, thanks for helping a newbie.

#Dear Pavel,

I do Monte Carlo simulations of Self Avoiding Walk on the lattice as model of Polymers.

I use low-noise differentiator for calculating the derivative of the function Entropy(Energy) obtained from the simulations.

How I cite the low-noise differentiator method in a scientific paper?

I think it is not possible to cite a web site.

Is there an original paper where the method is presented?

Thank you,

Carlo

#If you use filters from this page – then cite Savitzky-Golay paper where they derived.

If you use filters from this page: Smooth noise-robust differentiators – then cite my website. BiBTeX record is available on that page.

#Hello, Pavel,

I am interested in your discussion of the low-noise Lanczos differentiators on http://www.holoborodko.com/pavel/numerical-methods/numerical-derivative/lanczos-low-noise-differentiators/. The idea is very natural to me. I noticed that on your page, you describe results for M=2 (regular) and M=4 (super). I can’t help but wondering why we only consider even M. Lowering M decreases the number of degrees of freedom in the approximating polynomial, so my intuition tells me that we will have a less noisy result because we’re farther from overfitting. Am I missing something?