Wearable reproductive trackers: quantifying a key life history event remotely

Advancements in biologging technology allow terabytes of data to be collected that record not only the location of individuals but also their direction, speed and acceleration. These multi-stream data sets allow researchers to infer movement and behavioural patterns at high spatiotemporal resolutions and, in turn, to quantify fine-scale changes in state along with their likely ecological causes and consequences. The scope offered by such data sets is increasing and there is potential to gain unique insights into a suite of ecological and life history phenomena. We use multi-stream data from global positioning system (GPS) and accelerometer (ACC) devices to quantify breeding events remotely in an Arctic breeding goose. From a training set of known breeders we determine the movement and overall dynamic body acceleration (ODBA) patterns indicative of incubation and use these to classify breeding events in individuals with unknown reproductive status. Given that researchers are often constrained by the amount of biologging data they can collect due to device weights, we carry out a sensitivity analysis. Here we explore the relative merits of GPS vs ACC data and how varying the temporal resolution of the data affects the accuracy of classifying incubation for birds. Classifier accuracy deteriorates as the temporal resolution of GPS and ACC data is reduced, but the reduction in precision (which reflects false positives) is larger than that in recall (which reflects false negatives). Precision fell to 94.5%, whereas recall did not fall below 98% across all sampling schedules tested. Our data set could have been reduced by c. 95% while maintaining precision and recall > 98%. The GPS-only classifier generally outperformed the ACC-only classifier across all accuracy metrics, but both performed worse than the combined GPS and ACC classifier. GPS and ACC data can be used to reconstruct breeding events remotely, allowing unbiased, 24-h monitoring of individuals.
Our resampling-based sensitivity analysis of classifier accuracy has important implications with regard to both device design and sampling schedules for study systems where device size is constrained. It will allow researchers with similar aims to optimise device battery, memory usage and lifespan to maximise the ability to correctly quantify life history events.


Background
Biologging technology is revolutionising our understanding of animal movements. Recent developments have enabled the recording of high-resolution spatiotemporal data, and additional sensors provide numerous ancillary data streams that characterise individual movement (acceleration, speed and heading) and physiological state (body temperature, blood pressure and heart rate). There are numerous applications of these data streams in isolation and in combination, and we are only just beginning to explore their potential. For example, GPS data have been used to distinguish search from foraging behaviours and to understand foraging site fidelity in Northern Gannets Morus bassanus [1,2], GPS and ACC data have been used to reconstruct continuous-time movement paths via dead reckoning in numerous mid-sized mammals [3], and ACC data have been used to infer different modes of flight in White Storks Ciconia ciconia [4].
Aside from identifying movement or behavioural states, biologging can be used to explore more general questions about life history and behaviour. For instance, GPS data can be used to pinpoint exactly where and when mortality occurs within the annual cycle, especially for migratory species, e.g., Black-tailed Godwits Limosa limosa and Black Kites Milvus migrans [5,6]. These broad ecological and life history questions can also be explored via direct observations of individuals in some circumstances, but biologging has important advantages. First, it allows full annual cycle sampling at the individual level, allowing researchers to examine life history consequences and trade-offs such as the effect of migration distance on reproductive success [7]. Second, it allows us to examine an individual's movements and behaviours when they are not directly observable using conventional methods, thereby reducing sampling biases. Third, the rich data sets created reveal rare, but nevertheless important, movement or behavioural states that may otherwise have been missed, e.g., calving events in ungulates [8]. Current biologging data sets still have drawbacks, such as relatively small numbers of tracked individuals and short study durations.
For birds, a key life history event that biologging can quantify is breeding, which is represented by distinctive movement and activity patterns because of the constraints associated with egg laying and incubation. Approaches that can identify incubation initiation and failure or hatching will allow researchers to derive regular estimates of daily nest survival and productivity, and to identify potential causes of nest failure. Previous approaches to identify breeding events in birds using GPS data failed to distinguish breeding attempts lasting fewer than seven days and could not accurately quantify the total length of breeding attempts [9][10][11]. Identifying these short-lived nesting attempts is required if we are to differentiate between breeding deferral (no nest site preparation or egg laying) and early nest failure (eggs laid but the attempt failing during the first few days of incubation). This is an important distinction when trying to diagnose the cause of population decline, as different environmental variables and life history factors may drive breeding propensity [12,13] in comparison to early nest failure [14,15]. A more recent approach combining GPS and ACC data [16] was able to identify nesting attempts of greater than three days and quantify the duration of nesting attempts. This was achieved by identifying days on which movement or overall dynamic body acceleration (ODBA) fell below an arbitrary threshold and then assigning the median latitude and longitude for those days as the nest site. Birds were then classified as 'nesting' as long as 75% of GPS fixes were within 50 m of the proposed nest site.
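The nest-site rule described above for Schreven et al. [16] can be illustrated with a short sketch. This is not their code: it assumes fixes are (latitude, longitude) pairs in decimal degrees, applies the 75%/50 m test to whatever set of fixes is passed in (for example, one day), and uses a spherical haversine distance as a stand-in for an ellipsoidal one. The function names `haversine_m`, `nest_site` and `is_nesting_day` are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nest_site(low_activity_fixes):
    """Median latitude and longitude of fixes from days flagged as low movement/ODBA."""
    return (median(lat for lat, _ in low_activity_fixes),
            median(lon for _, lon in low_activity_fixes))


def is_nesting_day(day_fixes, nest, radius_m=50.0, min_fraction=0.75):
    """True if at least `min_fraction` of the fixes lie within `radius_m` of `nest`."""
    near = sum(haversine_m(lat, lon, *nest) <= radius_m for lat, lon in day_fixes)
    return near / len(day_fixes) >= min_fraction
```

Using medians rather than means for the nest site keeps a single long excursion (e.g., a recess flight) from dragging the proposed nest location away from the true one.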
These improvements on previous approaches were due to ACC data allowing brief breeding attempts to be distinguished from daily roosting, foraging routines or moult periods that can appear similar to nesting behaviour using GPS data alone.
A challenge that faces all avian biologists using biologging devices is to optimise the trade-off between data resolution, battery power and device weight. The size, body weight, morphology, migration distance and flight mode of many birds place limitations on device size and attachment methods due to the risk of deleterious effects [17]. Device design and weight affect battery power and memory capacity, which have important implications for the number of data streams recorded and their temporal resolution. Since sampling schedules can affect our ability to accurately quantify movement patterns and space use [18,19], depending on the methodological approach [20], they will also affect the ability to accurately discern life history events based on unique movement signatures. Decisions on device sampling schedules often have to be made before deployment, or at least before preliminary analysis, with little information currently available to guide choices (but see Mitchel et al. [21] and Noonan et al. [22], who explore how GPS sampling regimes influence home range estimates). Likewise, devices that are reprogrammable after deployment may not be contactable if birds are in areas without mobile phone network coverage. This makes it challenging to determine whether a given sampling schedule will allow the biological question of interest to be answered without oversampling, which risks depleting the device battery.
In the current paper, we present a method that uses GPS and ACC data to determine the duration of incubation in an Arctic nesting goose species. We also provide guidance on optimising device design and sampling schedules to accurately quantify breeding events for a range of device sampling regimes in birds. Incubation periods are classified using a set of known breeders that train the classification scheme and ground-truth its accuracy. The methodology is stress tested by calculating its accuracy as the sampling intervals of the data are varied and as GPS and ACC data are used in combination and in isolation. Compared with the approach in Schreven et al. [16], our method has a number of key differences: (1) it does not require complex statistical methodologies, (2) rule-based thresholds for classification are set automatically rather than requiring arbitrary choices, and (3) classification can be carried out using a single data stream, i.e., GPS or ACC data. Importantly, this allows the sampling interval of biologging data to be changed with no alterations to the coding of our classifier.
Ozsanlav-Harris et al. Animal Biotelemetry (2022) 10:24

Study system
The Greenland White-fronted Goose Anser albifrons flavirostris (GWfG) is an Arctic migrant that breeds in Western Greenland, spends the non-breeding period in Britain and Ireland and uses Iceland as a staging area in spring and autumn. Birds depart Iceland at the start of May, with incubation initiated in Western Greenland between mid-May and mid-June, meaning incubation extends into early July in some cases [23]. Incubation is followed by a 3-4 week moult period that mainly occurs in July but can be earlier for deferred and failed breeders; therefore, the breeding and moult periods may overlap within the population [24]. Only females incubate the clutch, taking one or two recesses daily that total up to 80 min [23]. From the point of incubation initiation, a clutch takes at least 24 days to hatch [25]. Given these aspects of breeding biology, we expected female geese to show clear differences in movement and behavioural patterns during periods of incubation compared with other relatively static behaviours during the breeding season (May-September), e.g., roosting or moult. Based on direct observation of breeding birds [23], we did not expect movement patterns and energy expenditure on a given day of incubation to differ between birds that fail before completing 24 consecutive days of incubation and those that ultimately go on to hatch a clutch. If a brood survives, the young associate with their parents for at least their first non-breeding season [26], enabling visual identification of individually marked, successfully breeding adults on the non-breeding grounds in Britain and Ireland.

Device deployment and sampling schedules
Biologging collars were deployed on geese caught at baited sites using cannon nets in Scotland (31 devices), Northern Ireland (5 devices), Ireland (19 devices) and Iceland (9 devices) during September 2017-March 2020. Geese were sexed by cloacal examination in the field during post-capture handling. In total we tracked 64 individual breeding seasons from adult females, with some individuals being tracked for up to 3 years. Ornitela GSM solar neck collar devices (model N38, Vilnius, Lithuania; 38 g) were used, allowing on-board storage of data that can be remotely downloaded via the 3G network. Devices were programmed to collect GPS fixes at an interval of 15 min and ACC in three dimensions at a frequency of 10 Hz over a 3 s burst every 6 min. We also tracked four individual breeding seasons from adult males and one individual breeding season from an immature bird, which are known not to incubate and not to breed, respectively. These served as an additional form of methodological validation (full details are provided in Additional file 3), but the following analysis uses adult female data only.

Incubation classification scheme (Fig. 1)
A set of known successful breeders was identified through direct observations of birds with devices on the non-breeding grounds. Individual tagged birds could be recognised via unique painted codes on the collars, and if an individual was associated with a consistent number of juveniles over three separate resightings then it was deemed to have bred successfully. This is possible as juveniles spend at least their first winter in close association with their parents, which has been verified with molecular genetics in the Light-bellied Brent Goose Branta bernicla hrota [27].
Due to the breeding biology of the geese, there is an expectation that during incubation females will remain largely stationary and have very low energy expenditure. Direct observation of breeding GWfG has found that during incubation birds are mostly still or preening on the nest, with only short recesses cumulatively amounting to no more than 80 min [23]. An example of the movement and energy expenditure patterns of a known successful breeder can be seen in Additional file 1 (and accessed at Ref. [28]). GPS data were used to quantify daily movement patterns, and ACC data to calculate overall dynamic body acceleration (ODBA), a proxy for energy expenditure [29,30], for each female goose when in Greenland. We used two metrics to describe daily movement patterns: (1) daily median net displacement (ND) and (2) the distance between successive median daily locations (DDIST). Net displacement is the straight-line distance between the first fix of the day and each subsequent fix on the same day [31]. For each day we calculated a median daily location expressed as median latitude and longitude values. These metrics were chosen as they cover movement both within and between days and are relatively insensitive to sampling rate compared with metrics derived by summing step lengths [18]. The metrics were calculated using the amt [32] and geosphere [33] packages in R v3.6.3 [34], and all distances (in kilometres) were calculated as the shortest path between two points on an ellipsoid.
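As an illustration of the two movement metrics, the sketch below computes daily median ND and DDIST from time-ordered fixes. It is not the amt/geosphere workflow used in the study: it assumes fixes arrive as hypothetical `(day, lat, lon)` tuples, uses a spherical haversine distance in place of the ellipsoidal shortest path, assigns ND = 0 to days with a single fix, and computes DDIST between consecutive days present in the data. The names `haversine_km` and `daily_nd_ddist` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import median

EARTH_RADIUS_KM = 6371.0  # spherical approximation, not the ellipsoid used in the paper


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def daily_nd_ddist(fixes):
    """fixes: time-ordered (day, lat, lon) tuples. Returns (nd, ddist) dicts keyed by day."""
    by_day = defaultdict(list)
    for day, lat, lon in fixes:
        by_day[day].append((lat, lon))
    days = sorted(by_day)

    nd, centre = {}, {}
    for day in days:
        pts = by_day[day]
        lat0, lon0 = pts[0]  # first fix of the day
        # Net displacement: distance from the first fix to every subsequent fix that day.
        disp = [haversine_km(lat0, lon0, lat, lon) for lat, lon in pts[1:]]
        nd[day] = median(disp) if disp else 0.0  # single-fix days: assumption
        centre[day] = (median(p[0] for p in pts), median(p[1] for p in pts))

    # DDIST: distance between successive median daily locations (undefined for the first day).
    ddist = {cur: haversine_km(*centre[prev], *centre[cur])
             for prev, cur in zip(days, days[1:])}
    return nd, ddist
```

Because both metrics are medians or point-to-point distances rather than sums of step lengths, thinning the fixes changes them far less than it would change a cumulative path length.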
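From the ACC data we calculated the ODBA for each burst, j, across the three axes (x, y and z) using the formula:

$$\mathrm{ODBA}_j = \frac{1}{n}\sum_{i=1}^{n}\left(\lvert x_i - \bar{x}\rvert + \lvert y_i - \bar{y}\rvert + \lvert z_i - \bar{z}\rvert\right)$$

where $x_i$ represents the $i$th component and $\bar{x}$ the mean of all $n$ samples within burst $j$ for the x-axis, and likewise for y and z.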
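The burst-level ODBA calculation can be sketched in a few lines of plain Python, assuming each burst is a sequence of raw `(x, y, z)` samples (30 samples for a 3 s burst at 10 Hz) and that the static component of each axis is approximated by the burst mean. Averaging over the n samples follows the formula as written above; the names `odba_burst` and `daily_mean_odba` are hypothetical.

```python
def odba_burst(samples):
    """ODBA for one burst; samples is a sequence of (x, y, z) raw accelerations.

    The static component on each axis is approximated by the burst mean,
    and the dynamic component is the absolute deviation from that mean.
    """
    n = len(samples)
    means = [sum(s[axis] for s in samples) / n for axis in range(3)]
    return sum(
        sum(abs(s[axis] - means[axis]) for axis in range(3)) for s in samples
    ) / n


def daily_mean_odba(bursts):
    """Average ODBA over all bursts recorded on one day."""
    values = [odba_burst(b) for b in bursts]
    return sum(values) / len(values)
```

A perfectly still bird gives ODBA = 0 regardless of collar orientation, because gravity contributes only to the burst mean and cancels out of the deviations.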
To quantify the variation of ND, DDIST and ODBA values during periods of incubation and non-incubation, the data from known breeders (n = 8 individual breeding seasons) were used as a training set for our classification. The incubation period for each known breeder was defined as the 24 consecutive days with the lowest total of average daily ODBA while the individual was in Greenland, as this is the likely minimum incubation period required to hatch a clutch [23]. A three-day buffer on either side of the 24-day window was applied to represent uncertainty in the exact length of incubation due to individual variation in the duration required for a clutch to hatch [35]. All remaining days while an individual was in Greenland were designated as non-incubating (Fig. 2). For the incubating and non-incubating days we pooled daily values across all known breeders and calculated the 2.5th and 97.5th quantiles for ND, DDIST and ODBA during incubation and non-incubation. These quantile values were used to classify all remaining individuals for which reproductive outcome was undetermined. The classification (Fig. 1) used the following framework to create the joint classifier (a classifier jointly using GPS and ACC data):
1. Days during the breeding season were scored with a 2, 1 or 0 for each of ND, DDIST and ODBA. A 2 was assigned when the value for the day of interest was below the 97.5th quantile of incubation days from the training set. A 1 was assigned if the value fell between the 97.5th quantile for incubation days and the 2.5th quantile for non-incubation days, regardless of which quantile value was lower. A 0 was assigned if the value was greater than the 2.5th quantile for non-incubation days, unless the 97.5th quantile for incubation days was higher, in which case a 0 was assigned if the value was above the 97.5th quantile for incubation days. A graphical representation of the scoring system is shown in Additional file 2: Fig. S2.
2. The scores for ND, DDIST and ODBA were added together to give a total score between 0 and 6. Each day was then classified as incubation if it scored a 6, indicating that it was below the 97.5th quantile for incubation days from the training set across all three metrics.
3. Days that scored 2-5 had some support for being incubation days, but there was still some uncertainty. Therefore, if nearby days had a score of 6, the bird was classified as having started incubation, which increased our confidence that days with scores of 2-5 were in fact incubation. Such days were classified as incubation if a day up to three days before or after had already been classified as incubation in step 2. This essentially allowed gap filling within candidate incubation periods where there was some support that the gap days had low levels of movement and/or energy expenditure. A three-day window was chosen as it is often used for temporal interpolation in remote sensing time series [36,37]. Longer windows have been found to reduce the accuracy of the interpolated time series [36], and shorter windows, i.e., 1 or 2 days, appeared to occasionally classify incubations shorter than 24 days in the birds that were observed to have bred successfully on the wintering grounds.

Fig. 1 Flowchart depicting the joint classifier methodology used to classify breeding events from biologging devices collecting GPS and ACC data. Training values are calculated from a set of known successful breeders and these are used to create thresholds to identify periods of incubation in unknown breeders. The selection of top candidate incubations was only used for resampled data sets, not for the validation of the joint classifier system.

Fig. 2 Average daily ODBA during the breeding season for all GWfG in the training set that were known to have bred successfully. The 24-day consecutive period with the lowest total ODBA is assigned as the incubation period, a three-day buffer is placed on either side and all remaining days are assigned as not incubating. At the top of each plot the device code and the year of the breeding season are given.
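Steps 1-3 can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes one score per consecutive calendar day with no missing days, uses Python's `statistics.quantiles` with `method="inclusive"` (linear interpolation, the same convention as R's default quantile type) for the 2.5th and 97.5th quantiles, and makes its own choice of strict versus non-strict inequalities at the thresholds. The names `thresholds`, `score_metric`, `score_day` and `classify_days` are hypothetical.

```python
from statistics import quantiles


def thresholds(incubation_values, non_incubation_values):
    """97.5th quantile of incubation days and 2.5th quantile of non-incubation days."""
    inc_upper = quantiles(incubation_values, n=40, method="inclusive")[38]    # 39/40 = 97.5%
    non_lower = quantiles(non_incubation_values, n=40, method="inclusive")[0]  # 1/40 = 2.5%
    return inc_upper, non_lower


def score_metric(value, inc_upper, non_lower):
    """Step 1: 2 = incubation-like, 1 = ambiguous, 0 = non-incubation-like."""
    if value < inc_upper:
        return 2
    if value <= non_lower:  # only reachable when inc_upper <= non_lower
        return 1
    return 0


def score_day(day, limits):
    """Step 2: sum per-metric scores; day and limits are dicts keyed by metric name."""
    return sum(score_metric(day[m], *limits[m]) for m in limits)


def classify_days(scores, top, min_fill, window=3):
    """Steps 2-3: top-scoring days are incubation; days scoring min_fill..top-1
    are gap-filled only if an originally top-scoring day lies within +/- window days."""
    core = [s == top for s in scores]
    labels = []
    for i, s in enumerate(scores):
        if core[i]:
            labels.append(True)
        elif s >= min_fill:
            labels.append(any(core[max(0, i - window): i + window + 1]))
        else:
            labels.append(False)
    return labels
```

For the joint classifier this would be called with `top=6, min_fill=2`; following the single data-stream rules, the GPS-only variant would use `top=4, min_fill=2` and the ACC-only variant `top=2, min_fill=1`. Gap filling looks only at days that scored the top value originally, so filled days cannot themselves extend the window further.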
This gap filling sometimes resulted in more than one candidate period of incubation being identified per individual. This only occurred once the data were resampled; when verifying the classifier using the full data set, only one period of incubation was identified for each individual, and these all commenced during the known incubation initiation period. The following criteria were used to select the most likely candidate incubation period for resampled data sets:
1. Any candidate incubation periods that did not contain any days with the top score (6 in this instance) were removed;
2. For the remaining candidate incubation periods, the one with the highest average daily score was chosen. This was calculated by summing the scores for all days in the candidate incubation and dividing by the number of days. The candidate incubation with the highest average daily score had, on average, the closest match in movement and acceleration patterns to the incubations in the training set.
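The two selection criteria can be sketched as below, assuming daily scores and incubation labels are aligned lists over consecutive days (as produced by a scoring step like the one described above). Ties in average score are resolved by keeping the earliest period, which is an assumption the text does not specify. The names `candidate_periods` and `top_candidate` are hypothetical.

```python
def candidate_periods(is_inc):
    """Return (start, end) index pairs (end exclusive) of consecutive incubation runs."""
    periods, start = [], None
    for i, flag in enumerate(is_inc):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            periods.append((start, i))
            start = None
    if start is not None:
        periods.append((start, len(is_inc)))
    return periods


def top_candidate(scores, is_inc, top):
    """Criterion 1: drop runs without a top-scoring day.
    Criterion 2: keep the run with the highest average daily score (earliest on ties)."""
    best, best_mean = None, float("-inf")
    for start, end in candidate_periods(is_inc):
        window = scores[start:end]
        if top not in window:
            continue
        mean_score = sum(window) / len(window)
        if mean_score > best_mean:
            best, best_mean = (start, end), mean_score
    return best
```

Returning `None` when no run contains a top-scoring day corresponds to classifying the breeding season as having no incubation.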
As part of our test to examine the effect of data streams on classifier accuracy, classifiers that used only GPS or ACC data were tested. The classifier outlined above requires GPS and ACC data so the following alterations to the classification rules were made for the single data stream classifiers:

GPS-only classifier
1. To identify the incubation period for individuals in the training set, the ND and DDIST values were both rescaled using the rescale function from the arm package [38]. The scaled ND and DDIST values were summed and the 24-day consecutive period with the lowest total value was chosen as the incubation period. Again, a three-day buffer was allocated on either side of this 24-day window and all remaining days were non-incubation;
2. The same scoring system is implemented as above, but since there are now only two metrics, each day can be assigned a score from 0 to 4 (the sum of 0-2 for ND and 0-2 for DDIST);
3. If a day has a score of 4 then it is assigned as incubation, as it was below the 97.5th quantile for incubation days from the training set for both ND and DDIST;
4. If a day has a score of 2 or 3 then it can be labelled as incubation if there is a day up to three days before or after that is already labelled as incubation from the previous step;
5. If more than one candidate incubation period was identified, then the same rules as in the joint classifier above were used to pick the period most indicative of incubation.
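Labelling training days (step 1 here, and the ODBA-based equivalent for the joint and ACC-only classifiers) can be sketched as follows. The sketch assumes that arm's `rescale` centres a continuous variable and divides it by two standard deviations, and it treats the three buffer days on each side as excluded from both the incubation and non-incubation sets. For the GPS-only classifier the daily totals would come from `gps_daily_totals`; for the other classifiers they would be the average daily ODBA values. All function names are hypothetical.

```python
from statistics import mean, stdev


def rescale(values):
    """Centre and divide by two standard deviations (assumed to mirror arm::rescale)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / (2 * s) for v in values]


def gps_daily_totals(nd_values, ddist_values):
    """GPS-only training metric: sum of rescaled ND and rescaled DDIST for each day."""
    return [a + b for a, b in zip(rescale(nd_values), rescale(ddist_values))]


def lowest_window_start(daily_totals, length=24):
    """Index of the first day of the consecutive window with the lowest total."""
    sums = [sum(daily_totals[i:i + length]) for i in range(len(daily_totals) - length + 1)]
    return min(range(len(sums)), key=sums.__getitem__)


def label_training_days(daily_totals, length=24, buffer=3):
    """Label each day 'incubation', 'buffer' (excluded) or 'non-incubation'."""
    start = lowest_window_start(daily_totals, length)
    labels = []
    for i in range(len(daily_totals)):
        if start <= i < start + length:
            labels.append("incubation")
        elif start - buffer <= i < start + length + buffer:
            labels.append("buffer")
        else:
            labels.append("non-incubation")
    return labels
```

Rescaling before summing keeps one metric from dominating the window selection simply because it is measured on a larger numerical scale.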

ACC-only classifier
1. The incubation period for known breeders is identified in the same manner as in the combined GPS and ACC approach;
2. The same scoring system is then implemented as above, but since there is now only one metric, each day can be assigned a score from 0 to 2;
3. If a day has a score of 2 then it is assigned as incubation;
4. If a day has a score of 1 then it can only be labelled as incubation if there is a day up to three days before or after that is already labelled as incubation from the previous step;
5. If more than one candidate incubation period was identified, then the same rules as in the joint classifier above were used to pick the period most indicative of incubation.

Methodological validation
We validated our classification using the full tracking data set. We dropped one individual breeding season in turn from the training data set, trained the classifier on the remaining individuals and tested it on the excluded individual breeding season. In each instance we expected the classifier to identify an incubation of 24 days or longer, as this is the minimum duration required to hatch a clutch. This served to validate our approach and also to determine whether changing the individuals in the training set influenced our classification. We performed additional validation (see Additional file 3) on five individual breeding seasons (four from adult males and one from an immature bird), neither of which incubate a clutch. Therefore, we expected our trained classifier not to identify any days as incubating for these five individual breeding seasons.
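The leave-one-out procedure can be sketched generically. Here `train` and `classify` are hypothetical callables standing in for the threshold training and day classification steps; `classify` is assumed to return one boolean per day for the selected incubation period, so its sum is the incubation length that is compared against the 24-day minimum.

```python
def leave_one_out(seasons, train, classify, min_days=24):
    """Hold out each known-breeder season in turn and check the incubation length.

    seasons: list of breeding-season records (any type the callables accept).
    train(training_seasons) -> model; classify(model, season) -> per-day booleans.
    Returns (incubation_days, reaches_min_days) for each held-out season, in order.
    """
    results = []
    for i, held_out in enumerate(seasons):
        model = train(seasons[:i] + seasons[i + 1:])  # all seasons except the held-out one
        n_days = sum(classify(model, held_out))
        results.append((n_days, n_days >= min_days))
    return results
```

Passing the training and classification steps in as functions keeps the validation loop independent of whether the joint, GPS-only or ACC-only classifier is being tested.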

Sensitivity analysis: the effect of sampling interval
To assess the effect of data resolution we varied the sampling intervals of our biologging data streams and then re-ran our three classifiers to determine what effects this would have on the classification performance. Three components of our data set could be altered: (1) the interval between GPS fixes; (2) the interval between ACC bursts; and (3) the duration of the 10 Hz ACC bursts. We varied the GPS interval between 15 and 90 min, the ACC burst interval between 6 and 144 min and the duration of the burst was 1, 2, or 3 s. Sampling intervals could only be increased by increments that were multiples of the original sampling intervals (15 min for GPS and 6 min for ACC).
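The resampling can be sketched by thinning the original records with a stride and truncating each ACC burst. This assumes a regular, gap-free schedule (one GPS fix every 15 min and one 10 Hz burst every 6 min) and keeps the first seconds of each burst; how the authors aligned thinned records in time, and which part of a burst they kept, is not stated here, so both are assumptions. The names `thin`, `truncate_burst` and `resample` are hypothetical.

```python
GPS_BASE_MIN = 15  # original GPS fix interval (min)
ACC_BASE_MIN = 6   # original ACC burst interval (min)
ACC_HZ = 10        # ACC sampling frequency within a burst


def thin(records, base_interval, new_interval):
    """Keep every k-th record so the interval grows from base_interval to new_interval."""
    if new_interval % base_interval:
        raise ValueError("new interval must be a multiple of the original interval")
    return records[::new_interval // base_interval]


def truncate_burst(samples, seconds, hz=ACC_HZ):
    """Shorten a burst to its first `seconds` of samples (assumption: keep the start)."""
    return samples[: seconds * hz]


def resample(gps_fixes, acc_bursts, gps_interval, acc_interval, burst_seconds):
    """Apply one sampling regime to one individual's regular GPS and ACC records."""
    gps = thin(gps_fixes, GPS_BASE_MIN, gps_interval)
    acc = [truncate_burst(b, burst_seconds)
           for b in thin(acc_bursts, ACC_BASE_MIN, acc_interval)]
    return gps, acc
```

Restricting new intervals to multiples of the originals means every resampled data set is a strict subset of the recorded data, so no values have to be interpolated.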
To assess the performance of classification for each sampling regime, the results were compared to those obtained from running the joint classifier with the full data set for all individuals not in the training set (reference incubations). For individuals with unknown breeding status, every day during the breeding period could be classified as a true positive (classified as incubation in both the full and the resampled data set), true negative (classified as non-incubation in both the full and the resampled data set), false negative (classified as incubation in the full data set but as non-incubation in the resampled data set) or false positive (classified as non-incubation in the full data set but as incubation in the resampled data set). We then calculated a single precision and recall value for each resampled data set (formulas below); these metrics are commonly used in machine learning [39,40] to assess classifier performance. In addition, we performed two other tests of classification performance: (1) the number of incubations identified using the resampled data set where there was no corresponding incubation for that individual breeding season in the reference incubation set; and (2) the average number of misclassified days (false positives plus false negatives) per individual breeding season for each resampled data set, with individual breeding seasons grouped according to the length of the incubation in the reference incubation set. We acknowledge that our reference incubations are not 100% accurate and may deviate slightly from the true incubation periods. Small reductions in recall and precision could be due to this deviation, but we argue that large reductions suggest that longer sampling intervals can no longer discern incubation from other behaviours.
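The day-by-day comparison against the reference incubations can be sketched as follows. The sketch assumes each breeding season is given as a pair of aligned per-day boolean lists (reference, resampled) and that counts are pooled across all seasons before computing a single precision and recall per resampled data set; returning NaN when a denominator is zero is an added convention. The names `confusion_counts` and `precision_recall` are hypothetical.

```python
def confusion_counts(reference, resampled):
    """Count TP, FP, FN, TN for aligned per-day incubation labels of one season."""
    if len(reference) != len(resampled):
        raise ValueError("day sequences must align")
    pairs = list(zip(reference, resampled))
    tp = sum(1 for r, s in pairs if r and s)          # incubation in both
    fp = sum(1 for r, s in pairs if not r and s)      # incubation only when resampled
    fn = sum(1 for r, s in pairs if r and not s)      # incubation only in the reference
    tn = sum(1 for r, s in pairs if not r and not s)  # non-incubation in both
    return tp, fp, fn, tn


def precision_recall(seasons):
    """Pool counts over (reference, resampled) season pairs; one value per data set."""
    tp = fp = fn = 0
    for reference, resampled in seasons:
        s_tp, s_fp, s_fn, _ = confusion_counts(reference, resampled)
        tp, fp, fn = tp + s_tp, fp + s_fp, fn + s_fn
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall
```

Pooling days rather than averaging per-season ratios weights each breeding season by its number of incubation days, so long incubations contribute more to the summary values.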

$$\text{Precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}}$$

$$\text{Recall} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}}$$

Methodological validation
The joint GPS and ACC classifier using the full data set identified 14 individual breeding seasons with no incubation, 33 individual breeding seasons that failed during incubation and 9 individual breeding seasons that likely hatched a clutch (as they incubated for at least 24 days). No individual breeding seasons were classified with more than one candidate incubation period. When testing the validity of the classifier on the full data set by dropping each individual breeding season sequentially from the training set, we found that 28-day incubations were classified for three individual breeding seasons, 26-day incubations for four individual breeding seasons and a 24-day incubation for one individual breeding season (for examples see Fig. 3). These incubations all fell within the known duration for successful breeders and were all initiated within the known date range for this species [23]. In addition, we found that the four tagged males and one immature did not have a single day classified as incubating, which is expected as male GWfG do not incubate the clutch [23] and immatures do not breed (Additional file 3).

Sensitivity analysis: the effect of sampling interval
As the sampling interval of GPS fixes and ACC bursts increased, the precision and recall of the joint classifier declined (Fig. 4, Additional file 2: Tables S1-S3). This decline was more pronounced in precision (Fig. 4A) than in recall (Fig. 4B), suggesting our method is more prone to classifying a day as incubation when it is actually non-incubation than vice versa. However, even at the maximum interval lengths tested for both GPS and ACC data, the combined classifier did not fall below a precision of 0.94. The decline in precision commenced when the ACC burst interval exceeded 24 min; below this break point, changing the sampling interval of GPS fixes and ACC bursts resulted in almost no change to precision. Precision appears to vary more with ACC sampling interval than with ACC burst length, although the range of values over which we could vary burst length was more limited. In comparison to the joint classifier, the GPS-only and ACC-only classifiers performed worse in terms of precision and recall (Fig. 5, Additional file 2: Tables S6-S7), although the GPS-only classifier performed similarly to the joint classifier in precision when the ACC burst interval exceeded 96 min. The GPS-only classifier clearly outperformed the ACC-only classifier in terms of precision and recall. The ACC-only classifier had a sharp drop-off in recall when the burst interval exceeded 72 min. The DDIST metric used in the GPS-only classifier was able to discriminate clearly between incubation and non-incubation days across all GPS sampling rates (Additional file 2: Fig. S2).
When assessing the performance of our joint classifier in terms of whether it failed to identify reference incubations or added extra incubations (compared with the reference incubations) at longer sampling intervals, we found that only one reference incubation (two days in duration) was missed, at ACC sampling intervals > 72 min. Three individuals assigned to breeding deferral (no incubation days) in the reference incubations were assigned incubation days under longer sampling intervals (Fig. 6, Additional file 2: Table S4). In two individuals this occurred when the ACC sampling interval was > 72 min or the GPS sampling interval was > 75 min. In the third individual, a two-day incubation period was identified across a variety of sampling rates. Both single data-stream classifiers assigned incubations that were not identified by the joint classifier with the highest-resolution data set (Fig. 6). The GPS-only classifier classified an incubation event in 6-8% of individual breeding seasons when no corresponding incubation was identified in the full data set. The ACC-only classifier assigned non-corresponding incubations in 6-8% of individual breeding seasons when the sampling interval was below 24 min, rising to 19-24% of individual breeding seasons when the sampling interval exceeded 72 min. For both single data-stream classifiers, the occurrence of multiple candidate incubations per individual increased as the sampling interval increased. For the GPS-only classifier, additional candidate incubations were often identified during the main moult period (after 1 July) but were removed by the classifier when selecting the top candidate incubation period. If the sampling interval exceeded 48 min, the ACC-only classifier had to consider multiple candidate incubations for almost all individuals. Often two candidate periods of incubation were separated by a one- to three-day gap of non-incubation. In the reference incubations, both candidate periods and the gap between them were identified as a single incubation period, so the ACC-only classifier had wrongly detected a non-incubation gap in the middle.
We also examined how the accuracy of the classifier varied with incubation length in the reference incubations (Fig. 7, Additional file 2: Table S5). We assigned each individual breeding season to one of the following categories according to its classification in the reference incubations: deferral (0 days), 1-5 days, 6-10 days, 11-24 days and hatched (> 24 days). The rates of misclassification were similar across all of the categories (Fig. 7) but were perhaps highest for the deferral category.
The quantile values for ND, DDIST and ODBA that were used in the classifier at each sampling rate can be found in Additional file 2: Fig. S2. The mean values of these metrics varied little with sampling rate, but the inter-quantile ranges for incubating and non-incubating days increased as sampling intervals increased, and at large intervals the inter-quantile ranges began to overlap.

Discussion
Our approach uses GPS and ACC data to reconstruct avian breeding events with daily resolution, and our rule-based classification model was verified using a validation set of known breeders [41]. This improves on the resolution of many previous approaches to classify breeding patterns in birds [9][10][11] and is at least comparable to the resolution achieved by Schreven et al. [16]. We then tested how the accuracy of breeding event classification is affected by collecting GPS and ACC data in combination and in isolation while varying the sampling intervals. Precision and recall of classification declined as the sampling interval increased, and GPS data in isolation seemed to outperform ACC data in isolation. Collecting ACC bursts at intervals shorter than 24 min combined with GPS fixes at intervals shorter than 60 min produced only minor improvements in precision.
Our approach and that of Schreven et al. [16] offer improvement over previous efforts [9,10] in being able to identify short breeding attempts (less than three days), which allows a distinction to be made between breeding deferral and early nest failure. We build on these recent developments by providing guidance in terms of data requirements, classifier automation and a simpler implementation. GPS and ACC data can be used in isolation with only a few minor changes to our classification. The classifier itself does not require any statistical models, making it easier for conservation practitioners to implement and interpret. Classification is accomplished via a series of thresholds that are calculated automatically from the training set. This automation allowed us to resample our biologging data and apply the classifier to numerous combinations of sampling intervals.

Fig. 4 Precision and recall of a classifier using GPS and ACC data to determine incubation lengths. The interval between each GPS fix and ACC burst was varied along with the duration of each ACC burst. The precision (a) and recall (b) of each resampled data set were compared to the output from the classifier when using the full data set (GPS fix every 15 min and a 3 s ACC burst every 6 min). For the precision plot a break point (black dashed line) is marked where the ODBA sampling interval is 24 min.
Decreasing the temporal resolution of the data resulted in non-linear decreases in the precision and recall of the classifier when identifying breeding events. Declines in precision were much larger than declines in recall, suggesting a tendency to classify non-incubating days as incubating rather than vice versa. This is likely because the stationary behaviour of females during incubation is confused with the similar lack of movement expected during roosting or low-mobility moult periods. The high resolution at which we originally collected our data is not required to achieve precision and recall > 98%. For instance, a GPS fix interval of 60 min and a 1 s ACC burst every 48 min give precision and recall above 98.5% with a data set less than 5% of the original size. This would reduce the drain on battery power, as less data would have to be created, written to memory and possibly transmitted via remote download.
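The data reduction can be checked with simple arithmetic on sampling effort per day. The sketch below compares the full schedule (GPS fix every 15 min, 3 s ACC burst every 6 min) with the reduced one (GPS fix every 60 min, 1 s burst every 48 min). How many bytes a GPS fix or one second of ACC occupies depends on the device and sampling frequency and is not given here, so `bytes_per_fix` and `bytes_per_acc_s` are hypothetical inputs.

```python
MINUTES_PER_DAY = 24 * 60


def daily_effort(gps_interval_min: float, acc_interval_min: float,
                 burst_s: float) -> tuple[float, float]:
    """Return (GPS fixes per day, seconds of ACC recorded per day)."""
    fixes = MINUTES_PER_DAY / gps_interval_min
    acc_seconds = MINUTES_PER_DAY / acc_interval_min * burst_s
    return fixes, acc_seconds


def combined_fraction(reduced: tuple[float, float], full: tuple[float, float],
                      bytes_per_fix: float, bytes_per_acc_s: float) -> float:
    """Reduced data volume as a fraction of the full volume (hypothetical record sizes)."""
    def volume(effort: tuple[float, float]) -> float:
        fixes, acc_seconds = effort
        return fixes * bytes_per_fix + acc_seconds * bytes_per_acc_s
    return volume(reduced) / volume(full)


full = daily_effort(15, 6, 3)      # (96.0, 720.0)
reduced = daily_effort(60, 48, 1)  # (24.0, 30.0)
print(reduced[0] / full[0], reduced[1] / full[1])  # GPS: 0.25, ACC: about 0.042
```

Per stream, the reduced schedule keeps 25% of the GPS fixes but only about 4.2% of the ACC seconds. Under these assumptions the combined volume falls below 5% only when one second of ACC occupies more than 3.2 times the bytes of one GPS fix, since 24g + 30a < 0.05(96g + 720a) simplifies to a > 3.2g.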
GPS-only and ACC-only classifiers have utility in identifying incubation events, but care needs to be taken when choosing sampling regimes if there are other extended periods of low movement in the annual cycle. Accuracy of the GPS classifier remained high across all sampling intervals, but periods of flightless moult began to be assigned as candidate incubations at longer sampling intervals. However, candidate incubations during the moult period generally had low average daily scores, so they were not selected as the 'top incubation', and precision and recall therefore remained comparable to the joint classifier. For the ACC-only classifiers, a burst interval greater than 48 min caused low recall and precision and large numbers of candidate incubation periods. When this sampling interval was exceeded, activities such as recesses and preening likely became disproportionately represented in the daily average ODBA values, causing false negatives and lowering recall. Periods of low movement during moult were also incorrectly classified as incubation, causing false positives and lowering precision. For many bird species, e.g., most birds of prey [42], moult does not entail an extended flightless period; in these instances daily movement would be less affected and we would expect easier separation between incubation and feather moult periods. Overall, fewer GPS fixes than ACC bursts were required to achieve comparable accuracy in our single-stream classifiers, but GPS data are much more battery-intensive to collect, whereas ACC generally produces more data, which is in turn more battery-intensive to transmit remotely. When the weight of GPS and ACC sensors is also factored in, the choice between the two data streams needs careful consideration.
Precision and recall of classifiers using a single biologging data stream to identify incubation periods. These single-stream classifiers are compared with a classifier that uses both GPS and ACC data across a variety of sampling intervals.
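The distinction between candidate incubations and the 'top incubation' can be illustrated with a short sketch. It assumes that a candidate is a run of consecutive days whose daily score reaches a hypothetical `min_score`, and that the top incubation is the candidate with the highest mean daily score (ties broken by run length). These rules and names are assumptions for illustration and may differ from the published classifier.

```python
from statistics import mean
from typing import Optional


def candidate_runs(scores: dict[int, int], min_score: int) -> list[list[int]]:
    """Group consecutive days scoring >= min_score into candidate incubations."""
    runs: list[list[int]] = []
    current: list[int] = []
    for day in sorted(scores):
        qualifies = scores[day] >= min_score
        extends = bool(current) and day == current[-1] + 1
        if qualifies and (extends or not current):
            current.append(day)
            continue
        if current:  # run broken by a low score or a gap in the days
            runs.append(current)
        current = [day] if qualifies else []
    if current:
        runs.append(current)
    return runs


def top_incubation(scores: dict[int, int], min_score: int) -> Optional[list[int]]:
    """Pick the candidate with the highest mean daily score (ties: longer run)."""
    runs = candidate_runs(scores, min_score)
    if not runs:
        return None
    return max(runs, key=lambda run: (mean(scores[d] for d in run), len(run)))


# Hypothetical season: a high-scoring incubation and a low-scoring moult period.
scores = {day: 0 for day in range(140, 220)}
scores.update({day: 6 for day in range(150, 176)})  # incubation
scores[160] = 4                                     # a day with a long recess
scores.update({day: 2 for day in range(200, 211)})  # flightless moult
best = top_incubation(scores, min_score=2)
print(best[0], best[-1])  # 150 175
```

The moult period still appears as a second candidate, but its low mean score keeps it from being selected, which is why extra candidates need not reduce precision or recall.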
Using biologging data to identify breeding events in birds and subsequently calculate daily nest survival has benefits even when direct nest observation is possible.
The devices allow for 24-h monitoring of individuals, providing a number of advantages over observational approaches: (1) individuals can be monitored in remote regions where directly finding nests is not possible, which may be particularly useful for migrants that are readily caught in non-breeding ranges and can be tracked to remote breeding grounds (e.g., geese, waders and ducks breeding in remote Arctic regions but readily caught on non-breeding grounds at lower latitudes in America, Europe and east Asia); (2) biologging provides the opportunity to identify breeding deferral, because individuals rather than nests are monitored, which can be important for understanding the drivers of breeding success in groups that regularly defer breeding, e.g., ducks [43], gulls [44] and terns [45]; (3) individuals are tracked year-round, giving detail on movements prior to the breeding season and allowing an assessment of how carry-over effects such as pre-breeding season site choice and migratory phenology affect breeding outcomes; and (4) the sample of nests monitored with biologging is unbiased in relation to habitat and location, which can influence nest detectability [46] and ultimately lead to biased estimates of nest survival or density [47]. Direct observational approaches often miss nests that fail during the first few days of incubation, and the exact failure date is often unknown and has to be approximated as the period between the final two observations. There are statistical approaches that can address these shortcomings [48-50], but they ultimately still inflate variances in daily nest survival estimates [51].
Fig. 6 Number of additional erroneous incubations that were identified at various sampling regimes but were not identified in the data set with the highest resolution. The classifier that used GPS and ACC data is compared with classifiers that used only GPS or only ACC data.
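To show how an exact failure date feeds into daily nest survival, the sketch below uses the classic Mayfield estimator (daily survival = 1 - failures / exposure days). This is a swapped-in textbook method for illustration, not necessarily the estimator used in this study. The nest records are hypothetical: the same failed nest is entered once with its exact failure day from tracking and once with the midpoint between two visits (alive on day 4, found failed on day 10), a common approximation in observational studies.

```python
def mayfield_dsr(nests: list[tuple[float, bool]]) -> float:
    """Mayfield daily survival rate: 1 - failures / total exposure days.

    Each nest is (exposure_days, failed), with exposure counted from the
    start of monitoring until the nest hatched or failed.
    """
    exposure = sum(days for days, _ in nests)
    failures = sum(1 for _, failed in nests if failed)
    return 1.0 - failures / exposure


# Hypothetical records: one successful nest and one failed nest.
tracked = [(26, False), (5, True)]   # failure day known exactly from the tag
observed = [(26, False), (7, True)]  # midpoint of visits on days 4 and 10
print(round(mayfield_dsr(tracked), 4), round(mayfield_dsr(observed), 4))
```

Survival over an incubation period of L days is then `mayfield_dsr(nests) ** L`. With tracking data, the exposure term no longer depends on visit timing, which is one source of the extra variance mentioned above.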
Continuing technological advancements to reduce device size [52] will almost certainly allow our approach to be applied to a wider range of bird species in the near future. However, the need for joint GPS and ACC data at reasonably high temporal resolutions likely prevents our approach from being applied to small and mid-sized passerines. As of 2022, 9 g devices with GPS and ACC capabilities are available, which could be attached to birds > 450 g, e.g., Eurasian Oystercatcher Haematopus ostralegus (if using a 2% of body mass rule [17]). Our approach lends itself to species with a number of life-history traits. First, it requires species that breed in relatively open habitats, e.g., grassland, savannah and scrub, although GPS signal can still be acquired in forests [53]. Burrow- or cavity-nesting species are unlikely to obtain a GPS signal when on the nest, and other devices that monitor light, e.g., geolocators, are likely more appropriate for monitoring incubation [54]. Second, it requires species with single-parent incubation strategies or regular switching between parents. Species in which the parents take multi-day foraging trips, e.g., Procellariiformes seabirds, will provide poor estimates of nest survival because of the large window within which nest failure could have occurred. Third, recess and movement patterns during incubation similar to those of the GWfG studied here are required to make full use of our sensitivity analysis. Longer and more frequent recesses will decrease the distinctiveness of incubation compared to other behaviours and will likely require higher-resolution data to distinguish, whereas fewer, shorter recesses will likely have the opposite effect. For bi-parentally incubating species where only one parent is tagged, the point of nest failure will be uncertain, as nest failure may occur while the other parent is incubating. In these instances, the nest failure point can be modelled as a window rather than a fixed point within a 'time-to-event' type nest survival model.
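A failure window can enter a nest survival likelihood directly. The sketch below assumes a constant daily survival rate s: a nest that survives t days contributes s**t, and a nest known to be active at the end of day a but failed by the end of day b contributes s**a - s**b (an exact failure on day d is the window (d - 1, d]). The maximum-likelihood s is found by grid search. This is a deliberately simplified, hypothetical illustration; practical time-to-event models typically add covariates and use a proper optimiser.

```python
import math


def log_likelihood(s: float, successes: list[int],
                   windows: list[tuple[int, int]]) -> float:
    """Log-likelihood of a constant daily survival rate s.

    successes: exposure days of nests that survived monitoring.
    windows: (a, b) pairs; nest active at end of day a, failed by end of day b.
    """
    ll = sum(t * math.log(s) for t in successes)
    for a, b in windows:
        p = s ** a - s ** b  # probability that failure fell inside the window
        if p <= 0:
            return -math.inf
        ll += math.log(p)
    return ll


def fit_dsr(successes: list[int], windows: list[tuple[int, int]],
            step: float = 1e-4) -> float:
    """Grid-search maximum-likelihood estimate of s on (0, 1)."""
    grid = (i * step for i in range(1, round(1 / step)))
    return max(grid, key=lambda s: log_likelihood(s, successes, windows))


# Hypothetical data: one nest survived 10 days; one failed exactly on day 5.
print(round(fit_dsr([10], [(4, 5)]), 3))  # close to 14/15
```

Widening a window (for example, when the tagged bird's partner was incubating at the time of failure) spreads that nest's probability mass over more days, so each such nest carries less information about s, which is the uncertainty the paragraph above describes.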
Finally, it requires species for which informed decisions on the movement and ODBA thresholds can be made. This requires species in which successful breeding can be determined after the breeding season, as in our case, or in which a small group of individuals with devices can be monitored during breeding to create a training data set. If this is not possible, studies on waterfowl could use threshold values similar to those we use here.

Conclusions
We classified incubation events with near-daily resolution and resampled the tracking data, showing that a classifier using GPS and ACC data had the highest accuracy, followed by GPS-only and ACC-only classifiers. Our study should help other researchers optimise methodologies and device duty cycles when determining breeding status in birds [10,16]. Nearly all birds actively incubate a clutch of eggs, but nest recess patterns and parental incubation strategies differ between species. We argue our approach is sufficiently generalizable to be applicable across a range of birds and requires data sets much smaller than the one we collected (a reduction of c. 95%). If a group of individuals can be tracked, the approach will provide unbiased estimates of avian nest survival and serve as a practical alternative to direct observational approaches. The derived estimates of nest survival are a vital component of demographic modelling and of assessing avian conservation strategies.
Additional file 1: Animation depicting movement and energy expenditure of a successful breeder. Figure S1. Average daily energy expenditure, measured in ODBA, and a movement animation during the breeding season of an adult female Greenland white-fronted goose that was known to have bred successfully. In the left-hand plot, a purple line represents the incubation period and an orange line represents non-incubation periods. Both the graph and the animation run over the same period of time in summer 2018. The video can also be accessed at: https://youtu.be/ZIc380VppDM.

Additional file 2:
Additional plots and tables to support the Results section. Figure S2. Quantile values from training data. Quantile values extracted from the training set of known breeders for incubation and non-incubation days during the breeding season. The daily values were pooled across all individuals, and the 97.5th and 2.5th quantiles were calculated for incubation and non-incubation days. The classifications and scores in brackets relate to how individuals with unknown breeding status were scored by the classifier. Table S1. Precision and recall values: joint classifier, 3 s burst. The precision and recall values when classifying avian incubation events using a joint classifier across a variety of sampling schedules for accelerometer data (ODBA interval) and GPS data (GPS interval). The results are shown for a 3 s accelerometer burst. Darker shading indicates larger values and, therefore, a more accurate classification. Table S2. Precision and recall values: joint classifier, 2 s burst. The precision and recall values when classifying avian incubation events using a joint classifier across a variety of sampling schedules for accelerometer data (ODBA interval) and GPS data (GPS interval). The results are shown for a 2 s accelerometer burst. Darker shading indicates larger values and, therefore, a more accurate classification. Table S3. Precision and recall values: joint classifier, 1 s burst. The precision and recall values when classifying avian incubation events using a joint classifier across a variety of sampling schedules for accelerometer data (ODBA interval) and GPS data (GPS interval). The results are shown for a 1 s accelerometer burst. Darker shading indicates larger values and, therefore, a more accurate classification.
Table S4. Additional incubations from joint classifier. The number of additional incubations classified, in comparison to the reference incubations, using a joint classifier across a variety of sampling schedules for accelerometer data (ODBA interval) and GPS data (GPS interval). The results are shown for 3, 2 and 1 s accelerometer bursts. Darker shading indicates larger values and, therefore, more additional incubations. Table S5. Average days misclassified. The average number of days misclassified per individual breeding season, in comparison to the reference incubations, using a joint classifier across a variety of sampling schedules for accelerometer data (ODBA interval) and GPS data (GPS interval). The results are grouped by the outcome of the individual breeding season in the reference sample, i.e., '1-5 days' is a 1-5 day incubation in the reference sample. Darker shading indicates larger values and, therefore, a greater misclassification rate. Table S6. Precision and recall: GPS-only classifier. The precision and recall values when classifying avian incubation events using a GPS-only classifier across a variety of sampling schedules for the GPS data (GPS interval). Darker shading indicates larger values and, therefore, a more accurate classification. Table S7. Precision and recall: ACC-only classifier. The precision and recall values when classifying avian incubation events using an accelerometer-only classifier across a variety of sampling schedules for the ODBA data (ODBA interval) and length of the accelerometer burst (Burst length). Darker shading indicates larger values and, therefore, a more accurate classification.
Additional file 3: Validating the methodology using males and immature birds. Figure S5. Breeding classifications of male and immature GWfG during the breeding season plotted against the average daily ODBA values. The dashed line represents the 95th quantile of average daily ODBA values for incubating days, determined using a set of known breeders. Below this value days are given a score of 2 (see main text). (PIFO47, UCOL35, UCOL37 and UCOL40 are males. 17812 was an immature.) Figure S6. Breeding classifications of male and immature GWfG during the breeding season plotted against the distance between successive median daily locations (DDIST). The dashed line represents the 95th quantile of DDIST values for incubating days, determined using a set of known breeders. Below this value days are given a score of 2 (see main text). (PIFO47, UCOL35, UCOL37 and UCOL40 are males. 17812 was an immature.) Figure S7. Breeding classifications of male and immature GWfG during the breeding season plotted against daily median net squared displacement (ND). The dashed line represents the 5th quantile of ND values for non-incubating days, determined using a set of known breeders. Below this value days are given a score of 2 (see main text). (PIFO47, UCOL35, UCOL37 and UCOL40 are males. 17812 was an immature.)
Additional file 4: BTO permit. Permit for catching, handling and attaching the biologging devices used in this study onto Greenland White-fronted Geese.