Animal-borne acoustic data alone can provide high-accuracy classification of activity budgets

Background. Studies on animal behaviour often involve the quantification of the occurrence and duration of various activities. When direct observations are challenging (e.g. at night, in a burrow, at sea), animal-borne devices can be used to remotely record the distribution and behaviour of an animal (e.g. changing body posture and movement, geographical position) and/or its immediate surrounding environment (e.g. wet or dry, pressure, temperature, light). Changes in these recorded variables are related to different activities undertaken by the animal. Here we explored the use of animal-borne acoustic recorders to automatically infer activities in seabirds.

Results. We deployed acoustic recorders on Cape gannets and analysed sound data from 10 foraging trips. The different activities (flying, floating on water and diving) were associated with clearly distinguishable acoustic features. We developed a method to automatically identify the activities of equipped individuals, exclusively from animal-borne acoustic data. A random subset of 4 foraging trips was manually labelled and used to train a classification algorithm (k-nearest neighbour model). The algorithm correctly classified activities with a global accuracy of 98.46%. The model was then used to automatically assess the activity budgets on the remaining non-labelled data, as an illustrative example. In addition, we conducted a systematic review of studies that have previously used data from animal-borne devices to automatically classify animal behaviour (n=61 classifications from 54 articles, including our study). The majority of studies (82%) used accelerometers, and to a lesser extent other types of devices, for studying behavioural states or changes in behaviour, all potentially providing a good accuracy of classification (>90%).

Conclusion. This article demonstrates that acoustic data alone can be used to reconstruct activity budgets with very good accuracy. In addition to the animal's activity, acoustic devices record the environment of equipped animals (biophony, geophony, anthropophony), which can be essential to contextualise the behaviour of animals. They hence provide a valuable addition to the set of tools available to assess animals' behaviours and activities in the wild.


Background
Studies on animal behaviour often involve the quantification of behavioural patterns [1], from an ethogram to an activity budget [2]. Knowledge of how individuals allocate their time among different activities is important for understanding their flexibility in response to changes in the environment, such as variations in temperature [3,4], habitat [5,6], social systems [7] or prey availability [8,9].

Results

Figure 1. Illustration of (A) the sound spectrogram along with (B) the manual identification and labelling of activities and (C, D) the predictions before and after revision. Three main activities were defined and included in the budget (flying, diving and floating on the water), and two additional transition activities (entering water and taking off) were used exclusively for the revision algorithm. These transition activities were used to confirm dive and flying events, and then merged into their corresponding main activity. Isolated segments were removed and relabelled, and predictions were smoothed using a moving median over 6 segments.

Automatic identification of activities from sound data

Among the five types of supervised learning algorithms that were tested (see Methods), the k-nearest neighbour model was finally chosen because its ratio between true and false positives for the diving class (of highest interest in our study case) was higher than that of the other algorithms, while maintaining a similar global accuracy.

The classification procedure was able to correctly classify the activities of Cape gannets (the "labelled set") with a global accuracy of 98.46%. The performances, as measured by the global confusion matrix and the ROC's Area Under the Curve (AUC) for each class, varied per activity (Supplementary Figure 1). The sensitivity was lowest for the class "diving" compared to the other classes (Figure 3), meaning that over all "diving" segments, 62.3% (908/1457) were correctly detected (the others were wrongly classified as floating or flying), whereas for "flying" and "floating" segments, >98% of segments were correctly detected (Figure 3). Nonetheless, when diving was predicted, it was reliable given the high precision value (95.5%, Figure 3). The classes "floating on water" and "flying" were predicted with high accuracy, given the high values of both indicators in all instances (>97%, Figure 3).

When studied in terms of activity budget (meaning that 1.4-s segments are grouped into "events" of the same activity), it appeared that the number of predicted events was over-estimated, although events were predicted with shorter durations (Figure 3B). Nonetheless, when studied in terms of time-activity budget, the predicted time spent in each activity was very close to the observed time (between 0.3% and 1.1% of difference depending on the activity, Table 1). The predicted time-activity budgets varied between individuals (Table S1), and the number of dives estimated per individual also varied greatly, from 23 to 174 dives per trip (Figure 4).
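To make the reported per-class metrics concrete, here is a minimal sketch computing sensitivity and precision from a confusion matrix of 1.4-s segments. It is written in Python (the study itself used Matlab), and the counts are hypothetical placeholders, chosen only so that the diving row reproduces the 908/1457 detections quoted above.

```python
# Minimal sketch: per-class sensitivity and precision from a confusion
# matrix (rows = true class, columns = predicted class). All counts are
# hypothetical except the diving row total (908/1457, as quoted above).
import numpy as np

classes = ["floating", "flying", "diving"]
cm = np.array([
    [39500,   450,   21],   # true floating (hypothetical counts)
    [  500, 42850,   22],   # true flying (hypothetical counts)
    [  300,   249,  908],   # true diving: 908 of 1457 segments detected
])

sensitivity = cm.diagonal() / cm.sum(axis=1)  # detected / all true segments
precision = cm.diagonal() / cm.sum(axis=0)    # correct / all predicted segments

for name, s, p in zip(classes, sensitivity, precision):
    print(f"{name}: sensitivity = {s:.1%}, precision = {p:.1%}")
# diving: sensitivity = 62.3%, precision = 95.5% (matching the reported values)
```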
In addition, we reviewed studies that automatically classified animal activities using supervised learning algorithms and based on data from animal-borne devices (Table 2).

Table 2. Information extracted from 61 reviewed classifications (54 articles, including our study). Other devices deployed concomitantly to accelerometers: GPS (n=4), gyroscope (n=3), magnetometer (n=1), gyroscope + magnetometer (n=5), pressure (n=1), gyroscope + pressure (n=1), magnetometer + acoustic (n=1). Species categorisation: terr. = terrestrial, aqua. = aquatic, fly. = flying. Algorithms: RF = Random Forest, DT = Decision Tree, SVM = Support Vector Machine, DA = Discriminant Analysis, NN = Neural Network, KNN = K-Nearest Neighbour, MM = Markov Model, mix = combination of several algorithms. References: [26,32,52–54].
Terrestrial species were by far the most studied (n=40, Table 2, Figure 5), followed by aquatic species (n=13) and flying species (n=8). The most commonly used devices were accelerometers (82% of reviewed studies, Table 2), either alone (n=34 studies) or in association with other devices (n=16). Acoustic recorders have rarely been used in this context, as we found only three studies that met our criteria for the systematic review. The weight of devices was reported in only 48% of the studies and ranged widely for all device categories (Table 2). The different types of devices varied in terms of sampling frequency, with GPS devices being the most limited (1 Hz at the highest) while acoustic recorders provided the highest sampling frequency (>10 kHz). In comparison, accelerometers were used over a large range of sampling rates, from 0.02 Hz to 100 Hz (Table 2). Although the sampling frequency did not seem to be directly related to the global accuracy, a higher sampling frequency seemed to allow for a higher number of activities studied in the activity budget (Supplementary Figure 3).

Figure 5. Performance of automatic classifications of activity budgets as measured by the global accuracy, as a function of the type of devices used in the 61 reviewed classifications (from 54 articles, including our study). Colours indicate a categorisation of species: n=40 terrestrial species (green), n=13 aquatic species (blue), n=8 flying species (orange). GPS = global positioning systems. Accel = accelerometers. Other devices deployed concomitantly to accelerometers included GPS, gyroscopes, magnetometers, pressure sensors and acoustic recorders. References: [26,32,52–54].

The number of activities studied in a budget varied greatly among studies, from two to 19 (Table 2), with a mode at three activities (Figure 6). The highest number of activities (19, Table 2) was extracted from acoustic recorders, followed by a study based on accelerometers (12 activities). The global accuracy of classification reported in the reviewed studies varied between 65% and 100% (Table 2), and this did not seem to be related to the size of the different datasets studied (Supplementary Figure 4). The highest accuracies were obtained from accelerometer data (Figures 5, 6), even though a good accuracy (>90%) could be achieved using data collected from all types of devices (Figure 5). Among all the articles reviewed, the performance of our classification (98.46%), based exclusively on acoustic data, appeared very high and demonstrated that the activity budget of wild animals can be recorded and reconstructed exclusively from acoustic data.

Figure 6. Performance of automatic classifications of activity budgets as measured by the global accuracy, as a function of the number of activities in the budget, extracted from 61 reviewed classifications (54 articles, including our study). Symbols indicate the type of animal-borne device used to remotely record the behaviour of study animals, and the full red circle indicates the values obtained in our study. Numbers of activities are all integers, but a random horizontal offset was added for the figure display to limit overlap of points. GPS = global positioning systems. Other devices deployed concomitantly to accelerometers included GPS, gyroscopes, magnetometers, pressure sensors and acoustic recorders. References: [26,32,52–54].
Ultimately, the potentially most important difference among the different types of devices in terms of data yield might be the nature of the other types of information provided, in addition to the animal's activities themselves (Table 2). Accelerometers have been used to reconstruct the energy budget associated with different activities; GPS devices provide information on the geographical position and distribution of the animals; pressure sensors provide information on the diving profiles of aquatic species. In comparison, acoustic recorders provide information on all the sounds surrounding an animal: the biophony (including vocalisations from the equipped animal, its conspecifics, but also heterospecifics), the geophony (all natural but non-biological sounds related to the habitat), and the anthropophony (human-generated sounds).

Discussion
The different activities undertaken by our study animals were associated with distinguishable sets of acoustic features. They could then be automatically identified from sound data exclusively, with very good accuracy (98.5% global accuracy). Although the performances varied per class (i.e. the three main activities: floating on water, flying, and diving), the precision was consistently very high (95.5-99.4%, n=3 activities), showing that the activities could be predicted with high confidence, especially if studied as the percentage of time spent in each of the activities. Our results compared favourably to those of other studies using acoustic data to infer behaviour [52–54] and compared very well to all previously published studies that automatically classified activities based on animal-borne devices (Figure 6). Interestingly, our results based on acoustic data showed a higher classification performance than a previous study classifying the same activities in the same study species based on speed and turning angles derived from geographical location data (92.3% global accuracy, 91.8-94.8% precision [32]). In addition to high predictive performance, acoustic devices provide additional information on the surrounding biophony, geophony and anthropophony that can be used to contextualise the observed behaviours. They thus appear to be a valuable alternative to other devices for the monitoring of animals' behaviours. By inferring the behaviour of birds from acoustic data, we were able to estimate their time-activity budgets.

Various devices are available to remotely record an animal's behaviours and activities. Our systematic review showed that accelerometers are the devices most commonly used for this purpose, even though a good accuracy of classification can be obtained from a range of devices. The weight of devices did not appear to be the most limiting factor, since all types of devices can be found at a small size (<20 g, the smallest device being an accelerometer at 2 g). Otherwise, the sampling frequency of the different types of devices might also be an important factor, since our results suggest that a higher sampling frequency may provide access to a higher number of recorded activities, and thus a more detailed description of the animal's behaviours. In this respect, the most limiting device would be the GPS, and the device with the highest potential would be the acoustic recorder. Ultimately, if technical aspects can be overcome (e.g. deployment techniques and weight of devices, data analyses and classification algorithms using recent machine learning techniques), our systematic review suggested that the most important factor to be considered when choosing a device for recording an animal's activities should be access to additional information. Indeed, while all types of devices can provide a good accuracy of classification of the animal's activities, they all record different variables. As a consequence, they each provide additional information on different aspects related to the animal's behaviours. Accelerometers record the fine-scale movements of animals in three dimensions, and thus provide details on movement-related activities [48,129,133]. In addition to behavioural activities, accelerometers can be used to measure the energy expenditure of animals during different activities and thus allow for reconstructing dynamic energy budget models [47].
Time-depth recorders are best adapted to aquatic animals, providing detailed information on their diving behaviour [40,134,135]. In comparison, acoustic recorders do not measure the displacement or body movement of animals directly, yet our study showed that they can be used alone to reconstruct the activities of animals with very high accuracy, comparable to what is obtained using other devices such as accelerometers. In addition, acoustic recorders simultaneously record the biophony, geophony and anthropophony in the environment of equipped animals, and thus provide a large diversity of other information that can be essential to interpret the animals' behaviours in a meaningful way. The physiology (heart rate) and the breeding behaviour (hatchling sounds in a burrow) of some species can be recorded remotely using acoustic devices [136]. The surrounding environment of equipped animals is also recorded and could help contextualise specific behaviours [52]. The vocalisations of equipped animals allow the study of variations in social interactions and grouping behaviours in different contexts [137,138]. Furthermore, multi-species associations can be recorded. For example, in our dataset, we recorded dolphin whistles underwater during some of the dives performed by equipped Cape gannets (data not shown). We could imagine that interactions between seabirds and fisheries or other human marine activities could be recorded as well. Similar information on the surrounding context of animals can also be obtained using animal-borne video cameras [139–141], but in comparison, acoustic recorders are much smaller in size and weight (which can be crucial for deployments on wild animals), they can record continuously for a much longer duration, and they record sounds from all directions whereas cameras are limited by their field of view. Ultimately, combining different recorders may help reconstruct a more comprehensive understanding of animal behaviour in the natural environment [42,53,54], as long as this is done without compromising the welfare and behaviour of the study animals [142].

Conclusion
This article demonstrates the use of animal-borne acoustic data alone to automatically infer the activities of elusive wild animals with high accuracy. In addition to animal activities, acoustic recorders provide information on the surrounding environment of equipped animals (biophony, geophony, anthropophony) that can be essential to contextualise and interpret the behaviour of study animals. They therefore show promise to become a valuable and more regularly used alternative among the set of devices used to remotely record animal activities.

Methods
Study species

Our study species is the Cape gannet, a seabird endemic to southern Africa [55]. This species has recently been classified as endangered on the IUCN Red List because of a drastic loss of more than 50% of the population over three generations [143]. This has mostly been related to a massive decrease of their natural feeding resources due to fisheries [132,144,145]. Cape gannets feed mainly on small pelagic fish, sardines Sardinops sagax and anchovies Engraulis encrasicolus [146]. Their foraging effort, in terms of trip duration and time spent in different activities, directly reflects the abundance of their natural prey in the local marine environment [56–59]. The foraging trips of Cape gannets can thus be used as a proxy for local prey abundance and fish stocks [147], as in many other seabird species [148,149]. Furthermore, their foraging effort directly influences their breeding investment and success [60,61]. As a consequence, the monitoring of their foraging activities at sea is of particular interest in relation to both the local marine ecosystem and the conservation management of this threatened species.

Data collection
Fieldwork took place on Bird Island (Algoa Bay, South Africa) during December 2015. We deployed twenty devices (details below) on chick-rearing Cape gannets to record their behaviour while foraging at sea. Four individuals were randomly selected for manual identification of activities and model training. The trained model was then applied to automatically predict time-activity budgets on the data where the entire foraging trip was recorded, which comprised another six individuals (trips not recorded in full resulted from progressive water damage).

Deployment procedure. Birds on departure to sea were captured near their nest using a pole with a hook on the end. Only one parent was captured per nest and devices were attached for one foraging trip only (usually one to two days), while the partner was on the nest guarding the chick. Nests were then monitored every hour from sunrise to sunset, and the deployed birds were captured again soon after their return to the colony and the devices were retrieved. Birds were handled for eight and six minutes on average for the first and second capture, respectively. The handling procedure consisted of attaching devices (using adhesive tape, Tesa, Germany) and measuring the bird's body mass for the first capture (average 2580 g, n=10 birds, measured with a Pesola scale, Baar, Switzerland, precision 50 g), and retrieving devices and taking standard measurements (not used in this study) for the second capture. Acoustic recorders were deployed in combination with a GPS (global positioning system) device on eight birds (total mass 60 g, 2.3% of bird body mass), a GPS and a video camera on one bird (90 g, 3.4% of bird body mass), or a time-depth recorder and a video camera on eleven birds (80 g, 3.1% of bird body mass). The devices had no significant effect on the duration of foraging trips when compared between equipped and non-equipped birds (for details see [138]), so normal behaviour was assumed. Only the data from the acoustic recorders were used in this study.

Acoustic recorders. Audio recorders (Edic-mini Tiny+ B80, frequency response 100 Hz-10 kHz ± 3 dB, 65 dB dynamic range, TS-Market Ltd., Russia, fitted with a CR2450 battery, 16.2 g; autonomy estimated at ~50 h at 22 kHz in our study, and given as 190 h at 8 kHz by the manufacturer) were set up to record sound in mono at a sampling frequency of 22.05 kHz. They recorded continuously, hence collecting data during the whole foraging trip of the birds. The main challenge for collecting such acoustic data was to ensure high-quality recordings on board a flying and diving bird. To limit disturbance from the wind, we placed the audio recorder on the lower back of the bird, under feathers and facing backwards. In addition, a thin layer of foam was added after the first deployment to reduce flow and background noise. We sealed the microphones in nitrile glove material (amplitude attenuation of 6 dB SPL both in air and in water, no modification of the frequency response, as measured in the laboratory) to keep the devices sufficiently dry when immersed in sea water while still ensuring good-quality sound recordings (avoiding a thick waterproof casing).

Manual identification of activities
The activities of Cape gannets when foraging at sea were manually identified on a subset of our dataset (henceforth referred to as the "labelled dataset"). The data retrieved from four deployed Cape gannets were randomly selected, comprising ~33 h of recordings. Based on previous work with observations from bird-borne video cameras [62], we identified three main activities: floating on the water, flying, and diving. These three activities are associated with different sounds that can clearly be identified by a trained human ear (Figure 1). When the bird is flying, the wind is usually loud and the wing flapping can sometimes be heard. When the bird is on the water, the ambient noise is usually quieter, sometimes with water-splashing sounds. The take-off is distinguishable by loud flapping at a high rate. Gannets dive into the water at high speed, up to 24 m.s-1 [150], so they enter the water with a loud impact noise, often saturating the amplitude of the recording. Coming out of the water is also usually loud, with sounds of rising bubbles. To manually label these data, the spectrograms of the selected sound data were visually inspected while the sound was played concomitantly, using the software Avisoft-SASLab Pro (version 5.2.09, Avisoft Bioacoustics, Germany). A total of 318 "floating on the water" events, 391 "flying" events and 243 "diving" events were identified and labelled. These labelled data were then used to characterise the acoustic properties of each activity and to train the classification algorithm (using a cross-validation procedure, details below).

Acoustic feature extraction
In order to characterise the bird's activity from the sound recordings, an automatic feature extraction was applied. For each sound recording, the algorithm followed four steps. First, the sound data were downsampled to 12 kHz. Second, in order to remove low-frequency acoustic noise, the sound recordings were high-pass filtered (above 10 Hz) using a second-order Butterworth filter. Third, the recordings were divided into small sound segments of ~1.4 s (corresponding to 2^14 samples). This segment length was chosen to reflect the dynamics of movement of our study species. In particular, the dives last on average 20 s (minimum 6 s) and always start with an 'entering the water' phase that displays very specific sound features (Figure 2) and lasts 1-2 s. A segment length of 2^14 samples (corresponding to ~1.4 s) thus appeared most appropriate. The algorithm was also tested using segment lengths of 2^13 (0.68 s) and 2^15 (2.73 s) samples, and these led to similar results (not shown). Fourth, a set of temporal (n=21) and spectral (n=14) features was extracted from each sound segment to acoustically describe the activities. Temporal features included envelope features such as root mean square (RMS), peak-to-peak and peak-to-RMS values (means and standard deviations), as well as signal skewness, kurtosis, entropy, quantiles and zero-crossing rate. Spectral features were computed from the power spectrum (Fast Fourier Transform) and included dominant frequency features (dominant frequency value, magnitude, ratio to the total energy, bandwidth at -10 dB, spectral centroid and spectral flatness, the latter two computed as per [151]), in addition to quartiles of energy and the ratio of energy above three fixed thresholds (300, 1500 and 5000 Hz). All acoustic features were computed using custom scripts in Matlab R2019b.

The three main activities were re-defined into five categories: floating on the water, taking off (first three segments of flying when preceded by floating on the water), flying, entering water (first segment of diving when preceded by flying), and diving. The two transition classes were used for the 'revision algorithm' as described in the following section ("Classification procedure").

Classification procedure

The labelled dataset was used to train and test a classification algorithm following a 5-fold cross-validation procedure. Briefly, this procedure consisted of splitting the dataset into a training set containing 4/5 of the data to train the algorithm, and testing it on the remaining 1/5. This partitioning of the data into training and test sets was done five times, and the performances of the algorithm on the test sets were averaged over those five replications.

Five types of supervised learning algorithms were tested (decision trees, discriminant analysis, support vector machines, nearest neighbour classifiers and ensemble classifiers), with some providing high classification results (above 90%). Among them, the k-nearest neighbour model was finally chosen because its ratio between true and false positives for the diving class (of highest interest in our study case) was higher than that of the other algorithms, while maintaining a similar global accuracy. The k-nearest neighbour algorithm was implemented with five neighbours, Euclidean distance as the distance metric and equal distance weights.

In all tested models, each sound segment was considered independent of the others.
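Below is a minimal sketch of this preprocessing, feature-extraction and classification pipeline. It is written in Python, whereas the study used custom Matlab scripts; function names are hypothetical and only a handful of the 35 temporal and spectral features described above are computed.

```python
# Minimal sketch of the preprocessing / feature-extraction / k-NN steps
# described above, in Python rather than the authors' Matlab scripts.
# Names are hypothetical and only a few of the 35 features are shown.
import numpy as np
from scipy import signal, stats
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

TARGET_FS = 12_000   # Hz, after downsampling
SEG_LEN = 2 ** 14    # 16,384 samples / 12,000 Hz ~= 1.4 s per segment

def preprocess(audio, fs):
    """Steps 1-2: downsample to 12 kHz, then high-pass above 10 Hz
    with a second-order Butterworth filter."""
    audio = signal.resample_poly(audio, TARGET_FS, fs)
    sos = signal.butter(2, 10, btype="highpass", fs=TARGET_FS, output="sos")
    return signal.sosfiltfilt(sos, audio)

def segment(audio):
    """Step 3: split into non-overlapping segments of 2^14 samples."""
    n_seg = len(audio) // SEG_LEN
    return audio[: n_seg * SEG_LEN].reshape(n_seg, SEG_LEN)

def features(seg):
    """Step 4 (abridged): a few of the temporal and spectral features."""
    rms = np.sqrt(np.mean(seg ** 2))
    peak_to_peak = seg.max() - seg.min()
    zcr = np.mean(np.abs(np.diff(np.sign(seg))) > 0)       # zero-crossing rate
    power = np.abs(np.fft.rfft(seg)) ** 2                  # power spectrum
    freqs = np.fft.rfftfreq(len(seg), d=1.0 / TARGET_FS)
    dom_freq = freqs[np.argmax(power)]                     # dominant frequency
    centroid = np.sum(freqs * power) / np.sum(power)       # spectral centroid
    ratio_1500 = power[freqs > 1500].sum() / power.sum()   # energy above 1500 Hz
    return [rms, peak_to_peak, stats.kurtosis(seg), stats.skew(seg),
            zcr, dom_freq, centroid, ratio_1500]

# X: one row of features per labelled segment; y: activity labels.
# X = np.array([features(s) for s in segment(preprocess(audio, fs))])
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean", weights="uniform")
# scores = cross_val_score(knn, X, y, cv=5)   # 5-fold cross-validation
```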
As a strong dependence exists between successive segments (for instance, Cape gannets do not fly just after diving without transitioning on the water), a 'revision algorithm' was subsequently applied to the results of the classification procedure. First, 'entering water' segments were used to confirm a dive event, or were deleted if no dive segment followed the entering-water segment. A similar procedure was used with the take-off and flying segments. Then, transition segments were merged into their corresponding class (entering water was relabelled and merged with its associated diving event, and similarly take-off was merged with flying). Isolated segments (defined as segments of one class occurring within a six-segment window of segments sharing another label) were removed and relabelled so that a coherent six-segment window of a single class was kept (Figure 1C, 1D). Finally, predictions were smoothed using a moving median over a six-segment window (corresponding to ~8.4 s) to further reduce rapid changes in the predicted class over short durations and thus improve the prediction of events.

Prediction of activities on non-labelled data

The classification algorithm was applied to unlabelled acoustic data to predict the activities of Cape gannets when foraging. Only the data with full foraging trips were kept at this stage. These included six new individuals, plus one individual for which part of the data was labelled and used in the trained model. The activity of birds was then predicted on a total of ~93 h of acoustic recordings. The time-activity budgets (based on the number and duration of events) of unlabelled trips were computed by grouping successive segments (1.4 s) of similar activity into 'events' (see for example Figure 1D). For instance, a 7-second period of diving, corresponding to 5 continuous time segments labelled as diving, was considered as one diving 'event'.
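The following is a minimal sketch of this revision and event-grouping logic, continuing the Python assumptions of the previous block (hypothetical names, simplified steps). Because class labels are categorical, a windowed majority vote stands in for the paper's moving median.

```python
# Minimal sketch of the 'revision algorithm' and event grouping described
# above. The confirmation/merging of transition classes and the smoothing
# are simplified relative to the full procedure.
from itertools import groupby
import numpy as np

SEG_DUR = 1.4   # seconds per segment
TRANSITIONS = {"entering_water": "diving", "taking_off": "flying"}

def revise(labels):
    out = list(labels)
    for i, lab in enumerate(out):
        if lab in TRANSITIONS:
            target = TRANSITIONS[lab]
            # Look past any further transition segments for the announced class.
            j = i + 1
            while j < len(out) and out[j] == lab:
                j += 1
            if j < len(out) and out[j] == target:
                out[i] = target          # confirmed: merge into the main class
            else:                        # unconfirmed: revert to previous label
                out[i] = out[i - 1] if i > 0 else target
    # Smooth over a six-segment window (~8.4 s); a majority vote stands in
    # for the paper's moving median, since labels are categorical.
    classes = sorted(set(out))
    idx = np.array([classes.index(l) for l in out])
    smoothed = [np.bincount(idx[max(0, i - 3): i + 3]).argmax()
                for i in range(len(idx))]
    return [classes[i] for i in smoothed]

def events(labels):
    """Group successive same-activity segments into (activity, duration) events."""
    return [(lab, len(list(grp)) * SEG_DUR) for lab, grp in groupby(labels)]

# Example: a 7-s diving event (5 segments), as in the text.
# events(["diving"] * 5) -> [("diving", 7.0)]
```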

Systematic review
To place our study in perspective and discuss the use of acoustic recorders among the different devices available for remotely recording and inferring behaviour, we conducted a systematic review of articles that automatically classified activities from animal-borne devices. We searched for articles in a systematic, repeatable way, using the ISI Web of Science Core Collection database. Our search included articles in English from 2000 to 2021, and was based on the following keywords:

TS=(("time budget*" OR "time-budget*" OR "activity budget*" OR "activity-budget*" OR "time-activity budget*" OR "state budget*" OR "behavio*ral state*" OR "behavio*r-time budget*" OR "behavio*r* classif*" OR "behavio*r discrimination" OR "behavio*r* categor*" OR "scene-classif*") AND (recorder* OR device* OR tag* OR biologging OR bio-logging OR logger* OR datalogger* OR biologger* OR bio-logger* OR collar* OR sensor* OR "animal-borne" OR "animal borne") AND (behavio*r*) AND (classif* OR accuracy OR "machine-learning" OR "machine learning" OR "supervised learning" OR "feature learning" OR "infer* behavio*r*"))

On the 5th of April 2021, this query resulted in a list of 202 articles. These articles were first checked for relevance to our scope: use of animal-borne devices on non-human animals to record and infer