DOA estimator
This article describes a project I developed during the course Sound Analysis, Synthesis And Processing at Politecnico di Milano.
The whole class worked on the same scenario: a Uniform Linear Array (ULA) of 16 MEMS microphones, distributed over 45 cm, records, at a sampling rate of 8 kHz, a source moving in front of the array.
The goal was sound source localization, i.e. estimating the Direction of Arrival (DOA), by implementing a delay-and-sum beamformer suitable for wide-band sources. Since this kind of spatial filtering is intrinsically narrow-band, the processing splits the spectrum into frequency bands (via the STFT) and proceeds frame by frame, so that the sound content can be tracked over time and its angle of arrival reconstructed consistently.
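As a quick sanity check on the geometry (assuming the 45 cm figure is the total aperture, i.e. 15 gaps between the 16 capsules), the spacing and the spatial-aliasing limit work out comfortably for this setup:

```matlab
c = 343;               % speed of sound [m/s]
M = 16;                % number of microphones
L = 0.45;              % total array aperture [m]
d = L / (M - 1);       % inter-microphone spacing: 0.03 m
f_alias = c / (2*d);   % spatial-aliasing limit: ~5.7 kHz
% At fs = 8 kHz the signal content stops at the 4 kHz Nyquist limit,
% below f_alias, so the whole band can be beamformed without spatial aliasing.
```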
The implementation process
Development is bottom-up: the fundamental blocks are validated first, and only then integrated into the system.
The class and function usage diagram that summarizes the operational flow is the following:

Below I describe the modules one by one, in order of implementation and testing.
- AudioData: this is the data entry point. It loads multichannel audio from file, stores it, and normalizes it.
- CustomFFT: a recursive FFT for power-of-two lengths, based on the Cooley–Tukey algorithm; a minimal sketch follows below. The output is compared against MATLAB’s fft to ensure numerical correctness.

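For illustration, a recursive radix-2 Cooley–Tukey FFT in MATLAB might look like the following (a sketch of the idea, not the project’s exact code):

```matlab
function X = customFFT(x)
% Recursive radix-2 Cooley–Tukey FFT; numel(x) must be a power of two.
    x = x(:);
    N = numel(x);
    if N == 1
        X = x;                          % base case: length-1 DFT
        return
    end
    Xe = customFFT(x(1:2:end));         % FFT of even-indexed samples
    Xo = customFFT(x(2:2:end));         % FFT of odd-indexed samples
    W  = exp(-2i*pi*(0:N/2-1).'/N);     % twiddle factors
    X  = [Xe + W.*Xo; Xe - W.*Xo];      % butterfly combination
end
```

Correctness can then be checked against the built-in: `max(abs(customFFT(x) - fft(x)))` should be on the order of machine precision for any power-of-two-length x.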
- STFTProcessor: computes the Short-Time Fourier Transform of a single channel. It takes as input the signal, fs, the window length, the overlap, and nfft; the output is the STFT matrix S, with frequency axis f and time axis t. Validation is performed against MATLAB’s spectrogram, as sketched below.

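The validation against spectrogram can be as simple as the following check (the parameter values here are placeholders, not necessarily the project’s settings):

```matlab
fs   = 8000;                   % sampling rate [Hz]
x    = randn(fs, 1);           % 1 s test signal
win  = hann(256, 'periodic');  % analysis window
hop  = 128;                    % 50% overlap
nfft = 256;

% reference STFT from MATLAB
[S_ref, f, t] = spectrogram(x, win, numel(win) - hop, nfft, fs);

% S computed by STFTProcessor with the same parameters should satisfy
% max(abs(S(:) - S_ref(:))) < 1e-10
```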
- AllChannelSTFT: to move to multichannel processing, AllChannelSTFT applies the same STFT scheme to all microphones in the array, treating channels independently but with consistent parameters. Tests check correctness and output alignment for each channel.

- GetCovMatrix: starting from the multichannel STFTs, it builds the spatial covariance matrix band by band. Verification checks that the resulting matrix is Hermitian.
- GetSteeringVector: computes the steering vector for a given angle and frequency, taking into account the spacing d, the speed of sound c, and the number of microphones. In practice, it assigns to each array element the expected phase with respect to the wavefront.
- Beamform: the core of the system, delay-and-sum. For each band and each frame, it combines the covariance matrix with the steering vectors over the angular interval of interest, generating a power map versus angle and time (p_theta_time). The quality of this map directly affects the final estimates.
- DOAEstimator: reads p_theta_time and returns the direction-of-arrival estimates (doa_estimates) by identifying, frame by frame, the angle with maximum energy. A compact sketch of these last four steps follows the list.
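To make those four steps concrete, here is a compact sketch of the per-band, per-frame delay-and-sum evaluation. The variable names, the rank-one covariance per frame, and the phase convention of the steering vector are my assumptions for illustration; the project code may differ in the details:

```matlab
% Assumed inputs:
%   S_all : M x F x T multichannel STFT tensor (AllChannelSTFT output)
%   f     : F x 1 frequency axis [Hz]
c = 343; d = 0.03;                       % speed of sound, mic spacing
thetas = -90:1:90;                       % candidate angles [deg]
[M, F, T] = size(S_all);
p_theta_time = zeros(numel(thetas), T);  % power map: angle x time

m = (0:M-1).';
for k = 1:F
    % ULA steering vectors for band k, one column per candidate angle
    A = exp(-2i*pi*f(k)*d*m*sind(thetas)/c);
    for n = 1:T
        x = squeeze(S_all(:, k, n));     % band-k snapshot at frame n
        R = x*x';                        % (rank-one) spatial covariance
        % delay-and-sum spatial spectrum: a(theta)' * R * a(theta)
        P = real(sum(conj(A) .* (R*A), 1));
        p_theta_time(:, n) = p_theta_time(:, n) + P(:);  % sum over bands
    end
end

% frame-by-frame DOA: angle with maximum accumulated power
[~, idx] = max(p_theta_time, [], 1);
doa_estimates = thetas(idx);
```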

Everything described above is orchestrated by the Main class, which loads and normalizes data (AudioData), computes STFTs on all channels (AllChannelSTFT), builds covariance matrices (GetCovMatrix), applies beamforming over the angle range (Beamform), and extracts DOA estimates (DOAEstimator).
Visualization
To interpret the results I implemented the classes visualizePseudospectrum, which shows the pseudospectrum (power as a function of angle, over time), and visualizeDOAEstimates, which plots the trend of the angular estimates. In addition, the classes getSingleFrame, framesGenerator, and videoGenerator create a sequence of frames and compose it into a video, useful for following the temporal evolution.
The visual result is shown in the figures below:
Pseudospectrum: shows the power distribution as a function of the angle of arrival (and time). Peaks mark the directions from which most energy arrives, i.e. the likely source directions.

DOA over time: DOA estimates reveal the frame-by-frame trajectory of the moving source.


The analysis and tests confirm the effectiveness of the implemented beamforming. You can find the code on GitHub.