✍️ Acoustic events to be labelled


This section gives a more detailed definition of each class.

The following classes will be labelled in the audio files:

Title                          How to label?
Whistles                       WHI
Burst Pulse Sounds             BPS
Gulps                          GUL
Grunts                         GRU
Creaks                         CRE
Squawks                        SAW
Squeaks                        SEA
Natural Unknown Event          NUE
Low-Frequency Vessel           LFV
Medium-Frequency Vessel        MFV
High-Frequency Vessel          HFV
Lloyd's Mirror Effect          LME
Ping                           PIN
Anthropogenic Unknown Event    AUE
Parasitic Acoustic Noise       PAN
Bad Quality Unknown            BQU
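
When checking exported labels, the class list above can be kept as a simple lookup. The following is a minimal sketch, assuming labels must match the codes exactly as listed (upper case); the names `LABEL_CODES` and `is_valid_label` are hypothetical and not part of any existing tool.

```python
# Label codes copied from the class list above.
# The dictionary and helper names are hypothetical, for illustration only.
LABEL_CODES = {
    "WHI": "Whistles",
    "BPS": "Burst Pulse Sounds",
    "GUL": "Gulps",
    "GRU": "Grunts",
    "CRE": "Creaks",
    "SAW": "Squawks",
    "SEA": "Squeaks",
    "NUE": "Natural Unknown Event",
    "LFV": "Low-Frequency Vessel",
    "MFV": "Medium-Frequency Vessel",
    "HFV": "High-Frequency Vessel",
    "LME": "Lloyd's Mirror Effect",
    "PIN": "Ping",
    "AUE": "Anthropogenic Unknown Event",
    "PAN": "Parasitic Acoustic Noise",
    "BQU": "Bad Quality Unknown",
}


def is_valid_label(code: str) -> bool:
    """Return True if `code` is one of the listed codes.

    Surrounding whitespace is ignored; the comparison is case-sensitive
    (an assumption, since the guide only shows upper-case codes).
    """
    return code.strip() in LABEL_CODES
```

For example, `is_valid_label("WHI")` is accepted, while a typo such as `"WHY"` is rejected.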


The following sub-sections present spectrograms of each class.
The dashed window indicates how the acoustic events should be labelled.


Whistle

Whistles are narrow-band and modulated signals, used by cetaceans and typically linked to social interactions.


In the spectrogram below, a single whistle is observed:


In the spectrogram below, several whistles are observed.



Gulps

Gulps are short, low-frequency pulses that sound like a sip or a sob. An example is shown in the following spectrogram:



Grunts

Grunts are trains of intense burst-pulses, as presented in the spectrogram:


Creaks

Creaks are long burst-pulse sounds (>0.2 s) that sound like a creaking door. Creaks and squeaks are similar, but they can be told apart in the spectrogram: in a creak, the lines are visible while the background is blurry.




Squawks

Squawks are long burst-pulse sounds (>0.2 s) with a higher repetition rate than creaks, sounding like a crying baby. They occur infrequently.


Burst-pulse sound

Burst-pulse sounds (also known as click trains) are natural, broadband sounds of very short duration, used mainly for echolocation. They often appear near gulps. Examples are presented in the spectrograms below.






Squeaks


Squeaks are short burst-pulse sounds with a harmonic structure that sound like a scream, as shown in the spectrogram below. Creaks and squeaks are similar, but they can be told apart in the spectrogram: in a squeak, only the lines are visible and the background is not blurry.



Low-frequency vessel

This class includes vessels with a fundamental frequency below 112 Hz (e.g. cargo ships, tankers).
It appears in the spectrogram as follows:

If you zoom in on the frequency axis, it appears as follows:




Medium-frequency vessel

This class covers medium and small vessels, with a fundamental frequency between 112 Hz and 2200/2500 Hz. It appears in the spectrogram as follows:





High-frequency vessel

This class includes vessels that emit sounds at frequencies above 2200/2500 Hz (e.g. speed boats, jet skis). It appears in the spectrogram as shown below.
It is important to label it exactly as shown in the picture.


Note:

Label all the pinkish/orange regions of the spectrogram, as they are caused by the vessel's noise.
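
The three vessel classes are separated by frequency. As a rough sketch of those thresholds: the guide gives the upper boundary as 2200/2500 Hz without fixing a single value, so it is left as a parameter here (defaulting to 2200 Hz purely as an assumption). How values exactly on a boundary are treated is also an assumption, and the function name `vessel_class` is hypothetical.

```python
def vessel_class(fundamental_hz: float, upper_hz: float = 2200.0) -> str:
    """Map a vessel's fundamental frequency to LFV, MFV or HFV.

    Assumptions (not stated in the guide):
      - 112 Hz itself counts as MFV;
      - `upper_hz` itself counts as MFV;
      - `upper_hz` defaults to 2200 Hz (the guide says "2200/2500 Hz").
    """
    if fundamental_hz <= 0:
        raise ValueError("frequency must be positive")
    if fundamental_hz < 112.0:
        return "LFV"  # e.g. cargo ships, tankers
    if fundamental_hz <= upper_hz:
        return "MFV"  # medium and small vessels
    return "HFV"      # e.g. speed boats, jet skis
```

This is only an aid for reasoning about the class boundaries; the label itself is still chosen from what is seen and heard in the spectrogram.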

Lloyd's Mirror Effect

The ocean-acoustic Lloyd's Mirror Effect (LME) is produced by interference, at the receiver, between the direct-path sound and its phase-reversed reflection from the sea surface. In these recordings, it appears when a vessel approaches the recorder. It appears in the spectrogram as follows:

Always label the vessel whenever an LME is present.

Ping

A ping is a short, high-pitched sound emitted by a device or system, often as a signal or notification. It is characterized by its brief duration and sharp, distinct tone, and it appears in the spectrogram as follows:







(The Ping is quite loud)


Anthropogenic Unknown Event

Any sound produced by humans that does not fit one of the other classes.

Other sounds: Parasitic noise and Bad quality

Parasitic noise:
Parasitic noise refers to disturbances that affect the performance of a system. It originates from unintended sources and can degrade signal quality, cause malfunctions, or lead to inaccurate data transmission. It often arises from components or environmental factors that are not designed to introduce noise but do so as a byproduct.



Bad quality (BQU):


If you find PAN or BQU events, please inform Carolina Ramos.


Size of the labelling window: Examples


This section presents some challenging scenarios, for example when events of the same type overlap or follow each other closely in time.


Case 1 – Two whistles



Solution: In this case, we can observe two whistles. They can be labelled together because they are separated by less than 1 second.


Case 2 – Several whistles



Solution: In this case, we can observe several overlapping whistles. They can be labelled together because they are separated by less than 1 second.



Case 3 – Ping



Solution: In this case, the events should be labelled individually because they are separated by more than 1 second.
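
The 1-second rule used in Cases 1 to 3 can be sketched as an interval-grouping step. This is only an illustration, assuming events of the same class are given as (start, end) times in seconds; the guide does not say what to do when the gap is exactly 1 second, so this sketch keeps such events separate. The name `group_events` is hypothetical.

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (start_s, end_s) of one event of a single class


def group_events(events: List[Event], max_gap_s: float = 1.0) -> List[Event]:
    """Group same-class events into labelling windows.

    Events separated by less than `max_gap_s` (or overlapping) share one
    window; otherwise a new window starts. A gap of exactly `max_gap_s`
    starts a new window (an assumption, not stated in the guide).
    """
    windows: List[Event] = []
    for start, end in sorted(events):
        if windows and start - windows[-1][1] < max_gap_s:
            # Close enough to the previous window: extend it.
            prev_start, prev_end = windows[-1]
            windows[-1] = (prev_start, max(prev_end, end))
        else:
            windows.append((start, end))
    return windows
```

With two whistles at 0.0-0.5 s and 0.9-1.4 s, the function returns a single window; with two pings at 0.0-0.5 s and 2.0-2.3 s, it returns two.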



Case 4 – Events difficult to visualize



Solution: Sometimes the events are not easily visible in the spectrogram. Adjust the zoom and playback speed to see or hear the event.


Case 5 – Different events overlapping



Solution: Label all the different events, even when they overlap. You can label them as shown in the spectrogram.

Some files do not contain any relevant sound to label. Their corresponding txt file must still be saved, even if it is empty (create a label as usual in Audacity, then delete the label box). This is important because background noise is also needed to train our AI model.
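
A quick way to confirm that no label file was forgotten is sketched below. It assumes, purely for illustration, that the audio files are `.wav` files and that each txt file sits in the same folder with the same base name; neither detail is specified in this guide, and the function name `audio_without_labels` is hypothetical.

```python
from pathlib import Path
from typing import List


def audio_without_labels(folder: str, audio_ext: str = ".wav") -> List[str]:
    """List audio files in `folder` that have no matching .txt label file.

    An empty .txt file counts as present, since files containing only
    background noise must still have a (possibly empty) label file.
    """
    root = Path(folder)
    missing = []
    for audio in sorted(root.glob(f"*{audio_ext}")):
        if not audio.with_suffix(".txt").exists():
            missing.append(audio.name)
    return missing
```

An empty list means every audio file in the folder has its txt file.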