### Building the training database

We train PEGSNet on a database of synthetic data augmented with empirical noise. For each example in the database, three-component waveforms at each station location are obtained as follows.

#### Source parameters

The event source location is randomly selected from 1,400 potential locations extracted from the Slab2.0 model of subduction geometry^{38} at two different depths (20 and 30 km). Given latitude, longitude and depth, strike and dip angles are determined by the subduction geometry, and rake is randomly drawn from a normal distribution with mean = 90° and standard deviation of 10°. The event's final *M*_{w} is drawn from a uniform distribution with min = 5.5 and max = 10.0. We deliberately choose not to use a Gutenberg–Richter distribution for *M*_{w} to avoid a sampling bias during training, in which the model might better estimate certain magnitude values simply because they are more represented in the training database. Finally, from *M*_{w} we compute the scalar moment *M*_{0} = 10^{1.5*M*_{w} + 9.1}.
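The sampling described above can be sketched as follows; the layout of the Slab2.0-derived location table (`slab_locations`) is a hypothetical placeholder, since its construction is not described here.

```python
import numpy as np

rng = np.random.default_rng(42)

def draw_source_parameters(slab_locations):
    """Draw one synthetic event's source parameters.

    `slab_locations` is a hypothetical (n, 5) array of
    (lat, lon, depth_km, strike_deg, dip_deg) rows extracted
    from the Slab2.0 geometry.
    """
    lat, lon, depth, strike, dip = slab_locations[rng.integers(len(slab_locations))]
    rake = rng.normal(loc=90.0, scale=10.0)          # mean 90 deg, std 10 deg
    mw = rng.uniform(5.5, 10.0)                      # uniform, not Gutenberg-Richter
    m0 = 10.0 ** (1.5 * mw + 9.1)                    # scalar moment in N m
    return dict(lat=lat, lon=lon, depth=depth, strike=strike,
                dip=dip, rake=rake, mw=mw, m0=m0)
```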

#### Source time function

Given *M*_{0}, a pseudo-empirical STF is computed using the STF model described in a previous work^{5}, which includes a multiplicative error term and is valid for earthquakes with *M*_{w} > 7.0. In summary,

$${\rm{STF}}(t)={M}_{0}\frac{f(t)}{\int f(t)\,{\rm{d}}t},$$

(1)

with:

$$f\left(t\right)=t\,\exp \left\{-0.5{\left(\lambda t\right)}^{2}\right\}\left[1+N\left(t\right)\right],$$

(2a)

$$\lambda ={10}^{\left(7.24-0.41\,\log \left({M}_{0}\right)+\varepsilon \right)},$$

(2b)

$$N\left(t\right)=0.38\,\frac{n\left(t\right)}{\sigma },$$

(2c)

where *ε* is drawn from a Gaussian distribution with zero mean and standard deviation of 0.15, *n*(*t*) is the time integral of a Gaussian noise time series with zero mean, and *σ* is the standard deviation of *n*(*t*). The term *ε* accounts for variability in the STF duration for a given *M*_{0}, while *N*(*t*) models the characteristics of noise observed in real STFs^{5}. Examples of final STFs for different magnitude values are shown in Extended Data Fig. 1.
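Equations (1)–(2c) can be sketched numerically as below; the logarithm in equation (2b) is assumed to be base 10, and the 1 Hz sampling and 700 s duration follow the rest of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def pseudo_empirical_stf(m0, dt=1.0, duration=700.0):
    """Sketch of the pseudo-empirical STF of equations (1)-(2c).

    `m0` is the scalar moment in N m. The returned STF integrates
    (rectangle rule) to m0 by construction.
    """
    t = np.arange(0.0, duration, dt)
    eps = rng.normal(0.0, 0.15)                       # duration variability, eq. (2b)
    lam = 10.0 ** (7.24 - 0.41 * np.log10(m0) + eps)  # eq. (2b), log base 10 assumed
    n = np.cumsum(rng.normal(size=t.size)) * dt       # time integral of Gaussian noise
    big_n = 0.38 * n / n.std()                        # eq. (2c)
    f = t * np.exp(-0.5 * (lam * t) ** 2) * (1.0 + big_n)  # eq. (2a)
    return m0 * f / (f.sum() * dt)                    # eq. (1): normalize to m0
```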

#### Computing synthetic waveforms

With the selected source parameters and STFs, we use the normal-mode approach described in a previous work^{14} to compute three-component synthetic waveforms in a spatial domain of about 20° around the source epicentre. The resulting seismometer responses are convolved with the STF of the corresponding synthetic event and multiplied by the scalar moment to obtain synthetic traces of acceleration sampled at 1 Hz at each station location. Finally, traces are bandpass-filtered between 2.0 mHz (Butterworth, two poles, causal) and 30.0 mHz (Butterworth, six poles, causal). The final seismograms are 700 s long, centred on the event origin time.
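The band limits quoted above can be reproduced with SciPy; implementing the two corners as separate cascaded stages is an assumption here, made because the text specifies different pole counts for each corner.

```python
import numpy as np
from scipy.signal import butter, sosfilt

FS = 1.0  # Hz, sampling rate of the synthetic traces

def bandpass_2_30_mhz(trace):
    """Causal two-pole Butterworth highpass at 2.0 mHz followed by a
    causal six-pole Butterworth lowpass at 30.0 mHz, applied as two
    separate filter stages (an assumption; see lead-in).
    """
    sos_hp = butter(2, 2.0e-3, btype="highpass", fs=FS, output="sos")
    sos_lp = butter(6, 30.0e-3, btype="lowpass", fs=FS, output="sos")
    return sosfilt(sos_lp, sosfilt(sos_hp, trace))   # sosfilt is causal
```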

#### Noise database

The noise database consists of 259 days of three-component waveforms for two non-continuous time periods: between January 2011 and October 2011 (excluding March 2011) and between January 2014 and April 2014. These periods were chosen to sample variable (seasonal) noise conditions. We note that the temporal range spanned by the noise database does not overlap with any of the earthquakes used for the real-data cases (Extended Data Fig. 10a). We first divide the daily recordings into 1-h-long traces and then apply limited preprocessing, removing the instrument response, the mean and the linear trend, converting to acceleration and decimating the original traces from 20 to 1 Hz. Finally, each trace is filtered using the same bandpass filter applied to the synthetic seismograms (see previous step) and stored. Note that no a priori assumptions on the levels and characteristics of the selected noise are made. On the contrary, we include all real noise conditions found in continuous seismic recordings within the specified period range. This is because, in principle, we want the model to be able to generalize well under a broad range of noise conditions.

#### Adding empirical noise to synthetics

From the noise database described in the previous step, a realization of noise (700 s long) is extracted by randomly selecting a starting time point. In this process, we make sure to use different noise data in the training, validation and test sets. To preserve the spatial coherence of noise across the seismic network, the same time period is used for all stations for a given event. The selected noise traces are then added to the corresponding acceleration seismograms to produce the final input data for PEGSNet. If noise data are not available for one or more stations in the selected time window for a given event, we discard those stations by setting the corresponding final trace amplitudes (noise and PEGS) to zero in the input data.
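A minimal sketch of this step, assuming the continuous noise is held in a per-station array with NaN marking unavailable data (an assumed storage layout):

```python
import numpy as np

rng = np.random.default_rng(1)

def add_empirical_noise(synthetics, noise_archive, trace_len=700):
    """Add one shared-time-window noise realization to all stations.

    `synthetics` is (n_stations, trace_len); `noise_archive` is a
    hypothetical (n_stations, n_samples) array of continuous noise,
    with NaN marking stations/samples without data.
    """
    n_stations, n_samples = noise_archive.shape
    start = rng.integers(0, n_samples - trace_len)   # same window for every station
    noise = noise_archive[:, start:start + trace_len]
    out = synthetics + noise
    missing = np.isnan(noise).any(axis=1)            # stations lacking noise data
    out[missing] = 0.0                               # zero both noise and PEGS
    return out
```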

#### Preprocessing of input data

Before being fed to PEGSNet, the input waveforms for each example are first sorted by station longitude. We found this approach to be effective, but we note that the problem of concatenating station waveforms in a meaningful way in deep learning is an active area of research^{35}. Then, on the basis of the theoretical P-wave arrival time (*T*_{P}) at each station for a given event, we set the amplitude of the seismograms to zero for *t* ≥ *T*_{P}. Note that PEGSNet does not perform P-wave triggering itself. Instead, it relies on theoretical P-wave arrivals. In a real-time scenario, any existing P-wave triggering algorithm (whether based on machine learning or not) could be used to set the switch at the corresponding stations, whose data can then be passed to PEGSNet.

To limit the influence of very noisy traces and to suppress extreme amplitudes (possibly related to background regional seismicity), we further clip the resulting traces using a threshold of ±10 nm s^{−2}. This threshold is chosen according to the maximum PEGS amplitude for an *M*_{w} = 10 earthquake at 315 s, as found in the calculated database of noise-free synthetics. Amplitudes are finally scaled by 10 nm s^{−2} to facilitate convergence of the optimizer and, at the same time, to preserve information about the relative amplitudes of the PEGS radiation pattern across the seismic network. Finally, to simulate missing data and/or problematic sensors, we randomly mute 5% of the stations for each event by setting the amplitudes of the corresponding traces to zero.
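The masking, clipping, scaling and muting steps can be sketched together; the array shapes and the per-station P-arrival indices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
CLIP = 10.0  # nm s^-2, maximum PEGS amplitude for Mw = 10 at 315 s

def prepare_input(traces, tp_samples, mute_fraction=0.05):
    """Sketch of the input preprocessing: zero after the theoretical
    P arrival, clip to +/-10 nm s^-2, scale to [-1, 1] and randomly
    mute 5% of stations. `traces` is (n_stations, n_samples) in
    nm s^-2; `tp_samples` holds each station's P-arrival index.
    """
    out = traces.copy()
    for i, tp in enumerate(tp_samples):
        out[i, tp:] = 0.0                            # no post-P data reaches the model
    out = np.clip(out, -CLIP, CLIP) / CLIP           # clip, then scale by 10 nm s^-2
    n_mute = int(round(mute_fraction * len(out)))
    muted = rng.choice(len(out), size=n_mute, replace=False)
    out[muted] = 0.0                                 # simulate missing/bad sensors
    return out
```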

### Deep learning and PEGSNet

#### Previous work

Convolutional neural networks (CNNs) originated in the neocognitron^{42} and became practical once it was found that the backpropagation procedure^{43} can be used to compute the gradient of an objective function with respect to the weights of the network. CNNs are a regularized form of neural networks, that is, the function space they represent is simpler and they are more sample-efficient than fully connected neural networks^{44}. Deep CNNs have led to a revolution in computer vision and have played a role in almost every state-of-the-art approach for tasks related to recognition and detection in images^{45,46}. In geoscience, machine learning has shown strong potential for data-driven discovery of previously unknown signals and physical processes hidden in large volumes of noisy data^{47,48}.

We note, however, that our choice of a deep learning model over classical machine learning models offers an appealing framework for dealing directly with raw seismograms. As a consequence, this choice enables us to explore a larger function space that is not restricted by building an a priori set of features, which is a requirement for applying classical machine learning models to seismogram data.

Successful applications of deep learning in seismology have provided new tools for pushing the detection limit of small seismic signals^{31,32} and for the characterization of earthquake source parameters (magnitude and location)^{33,34,35} with EEWS applications^{29,36,37}. We present a deep learning model, PEGSNet, trained to estimate earthquake location and track the time-dependent magnitude, *M*_{w}(*t*), from PEGS data, before P-wave arrivals.

#### Description of the PEGSNet architecture

PEGSNet is a deep CNN that combines convolutional layers and fully connected layers in sequence (Extended Data Fig. 2a). The input of the network is a multi-channel image of size (*M*, *N*, *c*), where *M* is 315 (corresponding to 315-s-long traces sampled at 1 Hz), *N* is the number of stations (74) and *c* is the number of seismogram components used (three: east, north and vertical). The outputs of the network are three values corresponding to moment magnitude (*M*_{w}), latitude (*φ*) and longitude (*λ*), where *M*_{w} is time dependent. The training strategy used to learn *M*_{w}(*t*) from the data is described below.

The first part of the model (the CNN) consists of eight convolutional blocks. Each block is made of one convolutional layer with a rectified linear unit (ReLU) activation function followed by a dropout layer. The number of filters in each convolutional layer increases from 32 (blocks 1–5) to 64 (blocks 6–7) to 128 (block 8) to progressively extract more detailed features of the input data. A fixed kernel size of 3 × 3 is used in each convolutional layer. We use spatial dropout with a fixed rate of 4% to reduce overfitting of the training set. Max pooling layers are added starting from block 4 to reduce the overall dimension of the input features by a factor of 4. The output of the CNN is then flattened and fed to a sequence of two dense layers of size 512 and 256 with a ReLU activation function and standard dropout with a 4% rate. The fully connected layers perform the high-level reasoning and map the learned features to the desired outputs. The output layer consists of three neurons that perform regression through a hyperbolic tangent activation function (tanh). The labelling strategy for *M*_{w}(*t*), *φ* and *λ* is discussed in detail below. The total number of parameters in the network is 1,479,427.
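A PyTorch sketch of this architecture follows. Filter counts, kernel size, dropout rates, the dense-layer sizes and the tanh output follow the text; the exact pooling placement, pooling size and padding are assumptions, so the parameter count of this sketch will not match the quoted 1,479,427.

```python
import torch
import torch.nn as nn

class PEGSNetSketch(nn.Module):
    """Sketch of the described architecture (not the authors' exact
    implementation): eight conv blocks with ReLU + spatial dropout,
    max pooling from block 4 onward, two dense layers (512, 256) and
    a three-neuron tanh output for (Mw, latitude, longitude).
    """
    def __init__(self, n_filters=(32, 32, 32, 32, 32, 64, 64, 128)):
        super().__init__()
        blocks, in_ch = [], 3                        # east, north, vertical
        for i, out_ch in enumerate(n_filters):
            blocks += [nn.Conv2d(in_ch, out_ch, 3, padding=1),
                       nn.ReLU(),
                       nn.Dropout2d(0.04)]           # spatial dropout, 4%
            if i >= 3:                               # pooling from block 4 (assumed 2x2)
                blocks.append(nn.MaxPool2d(2))
            in_ch = out_ch
        self.cnn = nn.Sequential(*blocks)
        self.head = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(512), nn.ReLU(), nn.Dropout(0.04),
            nn.Linear(512, 256), nn.ReLU(), nn.Dropout(0.04),
            nn.Linear(256, 3), nn.Tanh())            # Mw, latitude, longitude

    def forward(self, x):                            # x: (batch, 3, 315, 74)
        return self.head(self.cnn(x))
```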

#### Learning strategy

The goal of the model is to track the moment released by a given earthquake as it evolves from the origin time. A specific learning strategy has been developed to handle this task (Extended Data Fig. 2).

*Labelling*. Labels are *φ*, *λ* and a time-dependent *M*_{w}. *φ* and *λ* simply correspond to the true values for each event. *M*_{w}(*t*) is the time integration of the STF for each event. As detailed in the next section, the model is trained by randomly perturbing the ending time of the input seismograms, so that for a given ending time the input data are associated with the value of *M*_{w}(*t*) at that time. To match the output range of the tanh activation function in the output layer, we further scale all the labels to fall in the [−1, 1] interval through min/max normalization.

*Learning the time-dependent moment release*. For PEGSNet to learn *M*_{w}(*t*), we randomly perturb the start time of the input data during training (Extended Data Fig. 2c). Every time an example is extracted from the dataset, a value (*T*_{1}) is drawn at random from a uniform distribution between −315 and 0 (s). *T*_{1} is the time relative to the earthquake origin time (*T*_{0}) corresponding to the start time of the selected seismograms for that example. In practice, from the 700-s-long seismograms (centred on *T*_{0}) in the database, we extract traces from *T*_{1} to *T*_{2} = *T*_{1} + 315 s: for *T*_{1} = −315 s the extracted traces end at *T*_{0}; for *T*_{1} = 0 s the traces start at *T*_{0} and end 315 s after. Once a value for *T*_{1} is chosen, the value of *M*_{w}(*T*_{2}) is assigned as the corresponding label for this example (Extended Data Fig. 2d). This allows the model to learn patterns in the data as the STF evolves with time.
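The random-window strategy can be sketched as below; the placement of the origin time within the 700-sample trace (`origin_idx`) is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_training_window(traces, mw_t, origin_idx=350, win=315):
    """Sketch of the random-window labelling. `traces` is
    (n_stations, 700), centred on the origin time (assumed at sample
    `origin_idx`); `mw_t` maps each sample to Mw(t), the time
    integration of the STF expressed as moment magnitude.
    """
    t1 = rng.integers(-win, 1)                       # T1 uniform in [-315, 0] s
    start = origin_idx + t1
    window = traces[:, start:start + win]            # traces from T1 to T2 = T1 + 315 s
    label = mw_t[start + win]                        # label is Mw at the window end, T2
    return window, label
```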

*Training*. The full database (500,000 examples of synthetic earthquakes) is split into training (350,000), validation (100,000) and test (50,000) sets, following a 70/20/10 strategy. The network is trained for 200 epochs (using batches of size 512) on the training set by minimizing the Huber loss between the true and the predicted earthquake source parameters using the Adam algorithm^{49}, with its default parameters (*β*_{1} = 0.9 and *β*_{2} = 0.999) and a learning rate of 0.001. At the end of each epoch, the model is evaluated on the validation set to assess the learning performance and avoid overfitting (Extended Data Fig. 2b). After training, the model that achieved the best performance (lowest loss value) on the validation set is selected as the final model. The final model is then evaluated against the test set (therefore with data never seen by PEGSNet during training) to assess its final performance.

#### Testing strategy

Once PEGSNet is trained, it can be used to estimate *M*_{w}(*t*) in a real-time scenario. We assess the latency performance of PEGSNet on the test set with the following procedure (Extended Data Fig. 4). For each example in the test set, we slide a 315-s-long window [*T*_{1}, *T*_{2} = *T*_{1} + 315 s] through the data with a time step of 1 s. The starting window ends at the earthquake origin time *T*_{0} (*T*_{2} = *T*_{0} and *T*_{1} = *T*_{0} − 315 s) and the final window starts at the earthquake origin time (*T*_{2} = *T*_{0} + 315 s and *T*_{1} = *T*_{0}). We let PEGSNet predict *M*_{w}(*T*_{2}) at each time step, thus progressively reconstructing the STF. Each *M*_{w}(*t*) estimate made by PEGSNet uses only information from the past 315 s. The same procedure is also applied to real data (Fig. 3 and Extended Data Fig. 10), to simulate a playback of the data as if they were fed to PEGSNet in real time.
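The sliding-window playback can be sketched as follows, with `predict` standing in for the trained model and `origin_idx` an assumed placement of *T*_{0} in the 700-sample trace:

```python
import numpy as np

def sliding_predictions(traces, predict, origin_idx=350, win=315):
    """Sketch of the latency test: slide a 315-s window through the
    data in 1-s steps, from the window ending at the origin time to
    the window starting at it. `predict` maps a (n_stations, 315)
    window to an Mw estimate.
    """
    estimates = []
    for t2 in range(origin_idx, origin_idx + win + 1):  # T2 from T0 to T0 + 315 s
        window = traces[:, t2 - win:t2]              # only the past 315 s
        estimates.append(predict(window))            # Mw(T2) at this time step
    return np.array(estimates)                       # progressively reconstructed Mw(t)
```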

#### Test on noise-free synthetics

We investigate the performance of PEGSNet by using the same database described above but without including noise in the input data. Training and testing on noise-free synthetic data provides an upper limit on PEGSNet's performance. Although this experiment represents a practically impossible scenario for real-world applications, the results can reveal inherent limitations of our model or of the input data. Extended Data Fig. 6a shows the accuracy map for the test set. As expected, the model is able to determine the final *M*_{w} of the events with high accuracy and similar performance regardless of the actual *M*_{w} of the event, except at early times. To examine the latency performance in more detail, Extended Data Fig. 6b shows the density plot of the residuals as a function of time for the whole noise-free test set. Errors are mostly confined within ±0.1 but are relatively higher in the first 10–15 s after origin. We attribute this observation to a combination of two factors: first, in the first 15 s after origin, very small PEGS amplitudes are expected at only a few stations in the seismic network, partly owing to the cancelling effect between the direct and induced terms. This can lead to a situation in which little information is present in the input images, and the model ends up predicting the mean value of the labels at these early times. Second, the seismic network geometry may not be optimal for recording PEGS amplitudes in this time window. Finally, we note that similar behaviour is observed for the results obtained on the noisy database (Fig. 2a, b) but with a higher latency (30–40 s). This highlights the role of noise in degrading the optimal performance of PEGSNet.

### Preprocessing of real data

The preprocessing steps for real data closely follow the procedure detailed in previous work^{7}. For each station and each component:

1. Select 1-h-long raw seismograms ending at the theoretical *T*_{P} calculated using the source location from the US Geological Survey (USGS) catalogue (Extended Data Fig. 10a);

2. Remove the mean;

3. Remove the instrument response and obtain acceleration signals;

4. Lowpass at 30.0 mHz (Butterworth, six poles, causal);

5. Decimate to 1 Hz;

6. Highpass at 2.0 mHz (Butterworth, two poles, causal);

7. Clip to ±10 nm s^{−2} and scale by the same value;

8. Pad with zeros for *t* ≥ *T*_{P} and select a 700-s-long trace centred on *T*_{0}.

This procedure is the same as that used to generate the synthetic database, except that here, traces need to be cut at the P-wave arrival first to avoid contamination of PEGS by the P-wave during instrument response removal.
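Steps 2 and 4–8 can be sketched with NumPy/SciPy as below (step 3, instrument response removal, needs station metadata and is omitted). The 20 Hz raw sampling rate follows the noise-database description; the P travel time `t0_to_tp` and the index bookkeeping are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def preprocess_real_trace(raw, t0_to_tp=35):
    """Sketch of the real-data preprocessing chain. `raw` is a 1-h
    trace sampled at 20 Hz ending at the theoretical P arrival;
    `t0_to_tp` is an illustrative P travel time in seconds.
    """
    x = raw - raw.mean()                                   # 2. remove the mean
    # 3. instrument response removal omitted (needs station metadata)
    sos_lp = butter(6, 30.0e-3, btype="lowpass", fs=20.0, output="sos")
    x = sosfilt(sos_lp, x)                                 # 4. causal lowpass
    x = x[::20]                                            # 5. decimate to 1 Hz
    sos_hp = butter(2, 2.0e-3, btype="highpass", fs=1.0, output="sos")
    x = sosfilt(sos_hp, x)                                 # 6. causal highpass
    x = np.clip(x, -10.0, 10.0) / 10.0                     # 7. clip and scale
    x = np.concatenate([x, np.zeros(350)])                 # 8. zero-pad past TP ...
    tp = len(x) - 350                                      #    P-arrival index
    t0 = tp - t0_to_tp                                     #    origin-time index
    return x[t0 - 350:t0 + 350]                            #    700 s centred on T0
```

The naive stride-20 decimation is safe here only because the signal is already lowpassed at 30 mHz, far below the 0.5 Hz Nyquist frequency of the decimated trace, mirroring the lowpass-before-decimation order of the listed steps.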

To speed up our testing procedure (see Methods subsection 'Testing strategy'), the data are preprocessed once and then sliced into input for PEGSNet at each time step. In an online version of our model, this is unfeasible, as all the preprocessing steps need to be applied every time new packets of data are streamed in. We simulate the conditions of a real-time workflow on the Tohoku-Oki data to assess potential discrepancies with the simplified workflow in the results: at each time step, we apply the preprocessing steps described above, using a 1-h-long trace ending at the current time step. We find that the resulting PEGSNet predictions obtained using the two workflows are essentially indistinguishable from each other (Extended Data Fig. 11).

### Predictions on additional real data

We tested PEGSNet on all the subduction earthquakes (dip-slip mechanism within 40 km of the megathrust) with *M*_{w} ≥ 7 that have occurred since January 2003, without considering aftershocks (Extended Data Fig. 10). Among them, the 2003 *M*_{w} = 8.2 Hokkaido earthquake is at the edge of PEGSNet's lower sensitivity limit of 8.3. For this event, PEGSNet estimates the final *M*_{w} after about two minutes (Extended Data Fig. 10b), in agreement with what was previously observed on the test set for events of similar magnitude (Fig. 2a). However, given the expected lower accuracy and higher errors for this event, we consider these predictions less reliable. For lower-magnitude events, PEGSNet predictions converge towards the noise baseline of 6.5 or never exceed its lower sensitivity limit, confirming that PEGS from *M*_{w} < 8.0 earthquakes are essentially indistinguishable from noise (Extended Data Fig. 10c–f). Deep-learning denoising techniques for coherent-noise removal^{50} might prove successful in improving PEGSNet's performance and will be the subject of future work.