### 1 Introduction

Benefiting by the installation of seismic stations, we can, to some extend, understand that how our planet Earth sings. These sounds released by our planet generate the vibration signals that can be recorded by the seismic receivers. Seismic ambient noise, which is generated by ocean infragravity waves, commonly referred to as “Hum” ($30 - 500 \ s$), microseisms ( the primary and secondary, $1 - 30 \ s$) and human noise (cultural noise, $0.1 - 1 \ s$) (Nakata et al., 2019). How to estimate the level of seismic ambient noise? The power spectral density (PSD) is usually used to describe the seismic noise spectra. Here, we briefly introduce the steps of estimating the seismic noise, some Python codes included. By the way, the PSD model of the seismic ambient noise has been introduced by Peterson (1993).

### 2 Basic principles

Supposing that we have a stochastic time series (randomly generated noise) $u(t)$, and the discrete form is represented as $u(n)$. The assumption that $u(t)$ is periodic is given here, we can directly estimate the its PSD as $$p(\omega) = \frac{1}{T}|U(\omega)|^2 \tag{1},$$ where $U(\omega)$ is the Fourier spectrum of $u(t)$, namely $U(\omega)=\int_{-T/2}^{T/2}u(t)e^{-i\omega t}dt$.

The discrete form of $p(\omega)$ is given by $$p(k)=\frac{1}{N}|U(k)|^2 \tag{2},$$ where, $N$ is the total data point number of $u(n)$.

However, the seismic noise is not stochastic stable and the direct estimation method is thus not proper. One may think that we can divide the noise time series into several segments to repeat the estimations of the PSD and calculate the average PSD to obtain the final result. Indeed, this data processing will improve the stability of the PSD of the seismic noise. Here we cut the data into $M$ segments and the length of every segment is $L$, the average PSD is obtained by $$\bar{p}(k)=\frac{1}{M}\sum_{m=1}^M[\frac{1}{L}|U_m(k)|^2] \tag{3}.$$ Similar to the idea of short time Fourier transform, window function can be also taken into consideration. Dividing data into several segments is the situation of rectangle window function. Sometimes we may use non-rectangle window function to conduct the data processing.

Finally, considering the data overlap can improve stability of the PSD result. The choice of the length of overlap, however, may need to tested for the perfect situation.

### 3 A Realistic Example

Here we retrieve $LHZ$ component seismic noise data recoded by station $MSEY$ of network $II$ from $2018-07-29T00:00:00$ to $2018-08-02T00:00:00$ in Seychelles, a island in the Indian Ocean.

The sampling rate is $1 \ Hz$, so the maximum frequency is $0.5 \ Hz$ according to the $Nyquist$ sampling theorem.

We will compare the PSD results from different estimation strategies in the following parts. Some pre-processing steps should be finished, e.g., removing mean, trend, doing tapering and removing the instrument response. The velocity waveform after removing instrument response is show as the below figure.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23  import numpy as np import matplotlib.pyplot as plt from scipy.signal import savgol_filter as sgolay from obspy import read from scipy.signal import welch import matplotlib matplotlib.rcParams['text.usetex'] = True tr = read('II.MSEY.00.LHZ.SAC')[0] trc = tr.copy() d = trc.data * 1e6 dt = trc.stats.delta t = np.arange(len(d)) * dt / 1e3 plt.figure(figsize=(15, 4)) plt.plot(t, d, lw=1, color='#4682B4', alpha=0.8) plt.xticks(fontsize=16); plt.yticks(fontsize=16) plt.xlabel(r'Time $\times 10^3$ (s)', fontsize=23) plt.ylabel(r'Velocity ($\mu$m/s)', fontsize=23) plt.grid(lw=1, ls=(2, (10, 5)), color='c', alpha=0.25) plt.tight_layout() plt.show() 

We implement a Python method to estimate the average PSD of discrete time series with dividing the data into several segments:

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26  def overlap_psd(tr, seg, overlap): if seg < overlap: print('Ensure segment > overlap!') return dt = tr.stats.delta; fs = 1 / dt over_t = (seg-overlap) * dt seg_t = seg * dt ti = 0 p1 = np.zeros(seg) index = 0 t0 = tr.stats.starttime t1 = t0 while t1 < tr.stats.endtime-seg_t: t2 = t1 + seg_t trc = tr.copy() trc.trim(t1, t2) trc.data -= np.mean(trc.data); trc.detrend(); trc.taper(0.01) dd = trc.data p = np.fft.fft(dd, seg) p = abs(p) ** 2 / seg p1 += p t1 += over_t index += 1 f1 = np.arange(seg) * fs / (seg-1) p1 /= index; p1 = 10 * np.log10(p1) return f1, p1 

The PSD results from several estimation strategies are computed.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46  # The whole window. d0 = trc.data n0 = len(d0) f0 = np.arange(n0) * fs / (n0-1) p0 = 10 * np.log10( abs(np.fft.fft(d0))**2/n0 ) # Overlapped windows. fs = 1 / trc.stats.delta d = trc.data seg = 2 ** 16 overlap = seg // 2 f1, p1 = overlap_psd(trc, seg, overlap) # Welch. f2, p2= welch(trc.data, fs, nperseg=seg, noverlap=overlap) p2 = 10 * np.log10(p2) # No overlaped windows. fs = 1 / trc.stats.delta d = trc.data seg = 2 ** 16 overlap = 0 f3, p3 = overlap_psd(trc, seg, overlap) # Smoothing the PSD results with the Savitzky-Golay filter. sl = 201 p0 = sgolay(p0, 10*sl+1, 2) p1 = sgolay(p1, sl, 2) p2 = sgolay(p2, sl, 2) p3 = sgolay(p3, sl, 2) # Plot the PSD results. plt.figure(figsize=(15, 8)) plt.semilogx(f0, p0, lw=1, color='g', alpha=0.8, label=r'Full') plt.semilogx(f1, p1, lw=1, color='r', alpha=0.8, label=r'50 \% overlap') plt.semilogx(f2, p2, lw=1, color='b', alpha=0.8, label=r'Welch') plt.semilogx(f3, p3, lw=1, color='k', alpha=0.8, label=r'0 \% overlap') plt.legend(fontsize=20) plt.xlim(4e-3, fs*0.48) plt.ylim(-180, -100) plt.xticks(fontsize=20); plt.yticks(fontsize=20) plt.xlabel('frequency (Hz)', fontsize=25) plt.ylabel('PSD (dB)', fontsize=25) plt.grid(lw=1, ls=(2, (10, 5)), color='c', alpha=0.25) plt.tight_layout() plt.show() 

### 4 The PSD Probability Density Function Estimation

The PSD probability density function (PSDPDF) is used to visualize the PSD of seismic noise and one can consider long-duration seismic noise data. By dividing the noise time series into many segments (overlap involved if necessary), we map every PSD curve into the PSDPDF figure. In detail, we accumulate the number of all PSD curve at every frequency value, and the probability at some power value and frequency could be obtained by dividing the cumulative number is with the sum of all numbers at this frequency.

Here we collect one-year seismic noise data recorded by a seismic station ($CASY$ in network $IU$) in Antarctica in 2020. We divide the noise time series into daily segments to compute the PSDPDF of noise at this seismic station.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72  import numpy as np import matplotlib.pyplot as plt from scipy.signal import savgol_filter as sgolay from obspy import read from glob import glob from scipy.signal import convolve2d files = sorted(glob(r'IU.CASY*.SAC')) nfft = 2 ** 17 nr = 251 hn = nfft // 2 psd = np.zeros((nr, hn)) p = np.zeros(hn) for i, sac in enumerate(files): tr = read(sac)[0] fd = 10 * np.log10(abs(np.fft.fft(tr.data, nfft)) ** 2 / nfft) p += fd[:hn] for j in range(hn): index = int(fd[j]+250) if index < 0: index = 0 if index > 250: index = 250 psd[index, j] += 1 f1 = 5e-4; f2 = 0.45 fs = 1 / tr.stats.delta f = np.arange(nfft) * fs / (nfft-1) fn1 = int(f1*nfft/fs); fn2 = int(f2*nfft/fs) nrf = 10 f = f[fn1: fn2+1]; f = f[::nrf] P = psd[:, fn1: fn2] pp = sgolay(p, 11, 2) pp = pp[fn1: fn2+1] / len(files); pp = pp[::nrf] P = P[::1, ::nrf] sl = 2 P = convolve2d(P, np.ones((sl, sl))/sl**2, 'same') db = np.arange(nr) - 250 P = P[50: 171]; db = db[50: 171] for i in range(len(P[0])): P[:, i] /= np.sum(P[:, i]) plt.figure(figsize=(15, 8)) plt.pcolormesh(f, db, P*100, cmap='CMRmap_r') cbar = plt.colorbar(shrink=0.75, aspect=30, pad=0.05, extend='both') cbar.set_label(r'Probability (%)', fontsize=20) cbar.ax.tick_params(labelsize=16) plt.semilogx(f, pp, lw=2, color='#888888') plt.text(2.5e-3, -125, 'Hum', size=25, bbox=dict(boxstyle='round', ec='#333333', fc='#87CEFA', )) plt.text(6e-2, -130, 'SF', size=20, bbox=dict(boxstyle='round', ec='#333333', fc='#87CEFA', )) plt.text(0.11, -115, 'DF', size=20, bbox=dict(boxstyle='round', ec='#333333', fc='#87CEFA', )) plt.xlabel('Freuqency (Hz)', fontsize=25) plt.ylabel('Velocity PSD (dB)', fontsize=25) plt.xticks(fontsize=20) plt.yticks(fontsize=20) plt.tight_layout() plt.show() 

The hum, SF (single-frequency, the primary) and DF (double-frequency, the secondary) microseisms are marked on the figure. The gray curve denotes the average PSD from $366$-day ambient seismic noise data segments.

### Bibliography

Nakata, Nori; Gualtieri, Lucia; Fichtner, Andreas (2019). *Seismic Ambient Noise || Noise-Based Monitoring. , 10.1017/9781108264808(9).

Peterson, J. 1993. Observations and modeling of seismic background noise. U.S. Geol. Surv. Tech. Rept., 93-322, 1–95