Maximising the value of transmitted data from PSATs tracking marine fish: a case study on Atlantic bluefin tuna

Background The use of biologging tags to answer questions in animal movement ecology has increased in recent decades. Pop-up satellite archival tags (PSATs) are often used for migratory studies on large fish taxa. For PSATs, movements are normally reconstructed from variable amounts of transmitted data (unless tags are recovered, and full data archives accessed) by coupling geolocation methods with a state-space modelling (SSM) approach. Between 2018 and 2019, we deployed Wildlife Computers PSATs (MiniPATs) from which data recovery varied considerably. This led us to examine the effect

One of the fundamental challenges in early animal tracking studies was the need to either physically recover tags, or actively follow tagged animals to obtain data and reconstruct movements [27]. Since 1978, however, the Argos data collection and location system has provided a means to relay data from transmitting tags via satellites [28], removing some of the need to recover them. Pop-up satellite archival tags (PSATs) are geolocating tags designed for use in the marine environment [29]. PSATs are attached externally to study animals for programmable deployment lengths, based on their capacity to store and transmit data [30]. They record a continuous archive of data on a range of environmental parameters, typically including light, pressure (depth) and temperature, before detaching ("popping-up") from the study animal and transmitting a summary of their logged data to Argos satellites. The continuous archive is stored in the tag's memory and can be downloaded if the tag is physically recovered, always representing the most complete data set from a PSAT deployment. For tags that are not physically recovered, the limited Argos bandwidth requires that the data archive is compressed on-board and divided into smaller, transmissible data messages. The specifics of these processes vary between manufacturers and tag models. For Wildlife Computers MiniPATs, each unique geolocation message contains tag-sensed "observations" of light and depth, and temperatures of the "sea surface" (i.e., the temperature at the shallowest depth) for a specific day, although depending on their programming, tags can also send messages containing information on behaviour (e.g., time spent in predetermined depth and temperature bins). These messages are transmitted to overpassing satellites in the Argos constellation [31].
For PSATs, movements are typically reconstructed from data at the end of the deployment using light-based geolocation [32,33], whereby light curves (identified on board the tag) are used to estimate locations generated either by threshold [34] or curve-fitting methods [35,36]. For this geolocation method, location estimates can be improved by incorporating (i) tag-sensed sea surface temperature (SST, [37-39]) or profiles of depth and temperature [40] with modelled oceanography, and (ii) tag-sensed depth with known bathymetry.
Despite being an invaluable tool to study many species, some PSAT features impact the quantity and quality of data recovered from them. First, the need to keep devices small [41] requires the use of small batteries that limit the mass of the tag, but also limit operational lifetime. As the energy required for data transmissions is considerable, this can constrain the volume of data that can be transmitted [23,42]. Second, PSATs are not infallible, and tags can fail to transmit completely [23,43,44]. Third, limitations associated with transmission of data, linked to the latitude of transmission (greater coverage at higher latitudes), biofouling, antenna breakages or even bad weather mean that the proportion of data that is successfully decoded after transmission can vary greatly [23,45]. Finally, due to their sophistication, PSATs are expensive (thousands of dollars per tag), which imposes a limitation on the number that can be purchased and deployed. These issues can be accounted for, in part, by careful study design (see [23]) but the reality in tracking studies using PSATs is that partial data sets are a common occurrence.
To deal with partial or fragmented data, as well as the complex error structures associated with geolocation [46], modern methods utilize state-space models (SSMs) to reconstruct movements from tag data [47]. SSMs combine several observation models (e.g., derived from light, temperature, and depth data) to form a data likelihood. This data likelihood model is coupled with a movement model to provide the most probable locations (and their uncertainty) at which the original observations were recorded [47]. In the case of some movement SSMs, the user inputs a prior assumption on how the tracked animal moves (e.g., movement speed) to inform the process model. For Wildlife Computers tags, the "Global Position Estimator 3" (GPE3, [48]) enables setting user-defined movement speeds: greater speeds allow for larger step-lengths between successive locations and, conversely, lower speeds limit step-lengths and may constrain movements. In a movement SSM context, the output is a discrete time series of locations between a known start (i.e., deployment location obtained with a GPS device) and endpoint location (i.e., pop-up location derived through Argos using the Doppler effect), with computed spatial probabilities. The GPE3 is one example of such a light-based geolocation SSM and other models exist (e.g., [2,38,40,49,50]). Light-based geolocation can have spatial errors of hundreds of kilometres [36,38,51,52], and it is widely appreciated that SSMs generate more accurate estimates when supplied with greater volumes of high-quality data [47]. Understanding how data volume and SSM accuracy co-vary in real tracking scenarios could aid researchers in maximising the inclusion of PSAT data sets of varying data volumes, limiting the risk of spurious results and thus increasing the ecological inference of the resulting outputs.
Atlantic bluefin tuna (Thunnus thynnus, ABT) have a broad vertical niche and dive extensively [53-55] yet routinely spend time in epipelagic waters, where twilights can be reliably detected [55,56]. They have also been tracked using PSATs for over two decades [31] and deployments of a year or more provide a broad temporal range of data [56,57]. As part of a larger tracking project (Thunnus UK, www.thunnusuk.org) we deployed PSATs (Wildlife Computers MiniPATs) on ABT between 2018 and 2019. PSATs transmitted variable amounts of data, ranging from no or low volumes of data to nearly all expected transmissions. Low data volumes can afflict tagging studies for a number of technical reasons, and our experience of this issue prompted us to investigate the effects of data quantity on movement reconstructions using ABT as a case study. A proportion of these tags were physically recovered, enabling the recovery of the full archive contained in the tag's memory. The goals of this analysis were, therefore, twofold: (i) to investigate how track reconstruction with the GPE3 varied when provided differing amounts of geolocation data, and (ii) to inform whether tags transmitting limited data should be excluded from spatial analyses. Here, we use Wildlife Computers software and hardware, specifically, but where suitable make inferences that may be applicable to other PSATs and SSMs.

Electronic tagging
PSATs were deployed on ABT caught with rod and reel using either dead baits or trolled surface lures in territorial waters of the United Kingdom in 2018 (n = 10) and 2019 (n = 26). Fish were brought to the research vessel by professional anglers, and measuring, application of tags and tissue sampling were conducted on board by licensed scientists. Whilst on board, ABT were rested on a saltwater-rinsed padded vinyl mat, eyes were covered with a cloth soaked in a fish slime replacement, and free-flowing saltwater was used to irrigate the gills via a hose. PSATs were attached externally using two intramuscular titanium darts following the methods of Wilson et al. [57]. After the tagging procedure, individuals were revived in-water by towing at 2-3 knots with the head facing the direction of travel. Upon showing visible signs of recovery (i.e., a strong tailbeat), the tagged fish were released. All research was conducted under license from the UK Home Office.

PSATs
PSATs ("tags" hereafter) were Wildlife Computers MiniPATs (model 348-F), which archived sensor data on ambient light, depth and temperature every 5 or 15 s. For transmission, the archive is summarised by the tag's microprocessor into several data message types that can be transmitted, with geolocation messages being one example. If geolocation data are enabled during programming, one geolocation data message is generated per tracking day containing data on SSTs (i.e., temperatures associated with the shallowest readings), the depth of the animal during the sampling of the light curves, and ambient light levels during twilights [58]. After popping off the fish, tags transmit a data message every 60 s in a chronological cycle, starting at the deployment date. When all messages have been transmitted once, the chronological loop restarts at the first message, and this continues until battery exhaustion. No data message type (e.g., geolocation or behavioural messages if they are enabled) has greater transmission priority than another.
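The transmission cycle described above lends itself to a short worked calculation. The sketch below is illustrative only: it assumes that exactly one message is sent every 60 s with no pauses, and it borrows the message caps of 1400 and 1800 from the "PSAT programming" section. The function names are hypothetical and are not part of any Wildlife Computers software.

```python
def loop_duration_hours(n_messages: int, interval_s: float = 60.0) -> float:
    """Hours needed to send every queued message once (one full chronological loop)."""
    return n_messages * interval_s / 3600.0


def message_index_at(elapsed_s: float, n_messages: int, interval_s: float = 60.0) -> int:
    """Zero-based chronological index of the message being sent at elapsed_s after pop-up.

    Assumes an uninterrupted cycle that restarts at the first message
    once every message has been sent.
    """
    return int(elapsed_s // interval_s) % n_messages


# Message caps used in this study (assumed here to equal the queue length).
for cap in (1400, 1800):
    print(cap, "messages:", round(loop_duration_hours(cap), 1), "h per loop")
```

Under these assumptions a single loop takes roughly a day or more, so a tag that transmits for only a few days completes only a few passes through its message queue.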

PSAT programming
Tags were programmed for deployments of between 314 and 366 days. All tags were programmed to auto-detach if they remained at a constant depth (± 2.5 m) for 3 days or if sensors indicated that the tag was at or deeper than a depth threshold (either 1400 or 1700 m). Tags were programmed to generate a geolocation data message and between 2 and 4 other messages (e.g., profiles of temperature and depth) per tracking day (geolocation data proportions ranged between 0.17 and 0.29 of total data messages). The total number of data messages generated over a complete deployment to be subsequently uploaded via the Argos system was capped at either 1400 (n = 19 tags) or 1800 messages (n = 17 tags) in line with manufacturer guidance (based on expected data recovery). Some tags (n = 12) were physically recovered after popping off, allowing download of a complete archive data set (see inset box for terminology). Recovered tags were left outside with clear view of the sky until battery exhaustion.

Terminology
Transmission period: The length of time a tag transmitted after having detached from the study animal.

Data received: The amount of data received and successfully decoded (ignores corrupt messages) by a researcher from a transmitting PSAT. This can be either total data (i.e., geolocation data and all other data) or just geolocation data. Where appropriate these differences are stated.
i. Transmitted dataset: A summary dataset received from a transmitting PSAT via Argos (i.e., pieced back together from individual data messages).
ii. Archive dataset: A dataset only available by physically downloading from a recovered tag (1-15 s resolution).

Data volume: The proportion of a tag's summarised data archive recovered (either from transmission or recovery). This can be in a total (i.e., all behavioural and geolocation messages) or geolocation-only context (i.e., only geolocation messages).
i. Partial dataset: A part of a summarised PSAT dataset of a standardised data volume, created by post-processing. A fragmented record of all data for a given deployment.
ii. Complete dataset: A dataset containing 100% of summarised data generated either from a recovered tag being processed on the Wildlife Computers portal (as in this study) or by successfully recovering all unique data messages from a transmitting tag (no tag in this study). A contiguous record of all data for a given deployment.

GPE3
The GPE3 is proprietary software of Wildlife Computers [48] and is based on the SSM described in Pedersen et al. [59,60]. It has been widely applied to fish tracking studies [56,61-65]. The authors are end-users of this software and had no additional controls over GPE3 or its outputs. For the purposes of this research additional information on the GPE3 was requested from Wildlife Computers, which is also included as a supplement. First, tag data are processed for geolocation modelling. The GPE3 requires specific file formats as inputs, so both transmitted and archive data sets are pre-processed into the same format using the Wildlife Computers online portal (https://my.wildlifecomputers.com), which is also where researchers access the GPE3 and its resultant outputs. Briefly, the GPE3 SSM is as follows: three observation models comprise the data likelihood, calculated on a time-discretised grid with a cell size of 0.25° × 0.25°. These are:
(i) Known locations, in our case from a GPS at the time of release (latitude, longitude and time in UTC) and an Argos-derived location after the tag had detached (latitude, longitude and time in UTC).
(ii) Daily twilights, annotated onboard the PSAT by selecting the daily periods when delta light is greatest. The time of the twilight (in UTC) and rate of change (i.e., shape of the light curve) are used to provide an algorithmically derived position for longitude and latitude, respectively (the "template fit" method [66]).
(iii) SST likelihoods. Tag SST observations are derived from the thermistor and depth sensor readings and are taken from the shallowest depths sampled closest in time to the start of the dawn or dusk. Provided depth is no greater than 10 m, they are compared with the NOAA OI SST V2 High Resolution Data set (https://psl.noaa.gov/data/gridded/data.noaa.oisst.v2.highres.html).
Likelihood surfaces are refined by applying a bathymetric mask using the ETOPO1 1 Arc-Minute Global Relief Model [67] to exclude cells where bathymetry was shallower than the maximum recorded depth value. The movement model is a random walk diffusion model with step lengths limited by a user-defined speed parameter. In our case, ABT movements were reconstructed using a user-specified movement speed of 2.5 m s⁻¹. Altering this parameter can dramatically change outputs and conducting prior analyses is an important precursor to final parameter selection. Whilst not a key goal of this study, varying movement speed was investigated, and 2.5 m s⁻¹ defined analytically as outlined in "Assessment of geolocation outputs". Finally, most likely locations are calculated by the hidden Markov model (HMM) using a forward-backward algorithm, coupling the observation model with the movement model. The outputs from the GPE3 used in this study were (i) most likely latitudes and longitudes and (ii) a NetCDF file containing Geo2D arrays of 0.25° × 0.25° gridded probabilities (summing to 1) for each 12-hourly location. Other components are available in the NetCDF and in a Google Earth file but were not used in this analysis.
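To make the coupling of observation and movement models concrete, the following is a toy forward-backward smoother on a one-dimensional grid. It is not the GPE3 implementation, which is proprietary. The uniform step kernel, the grid size and every name here are assumptions chosen for illustration. For scale, a speed limit of 2.5 m s⁻¹ over a 12-hour step corresponds to a maximum step of 108 km.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def normalise(values: Vector) -> Vector:
    """Scale a non-negative vector so that it sums to 1."""
    total = sum(values)
    if total <= 0.0:
        raise ValueError("observations are incompatible with the movement model")
    return [v / total for v in values]


def random_walk_transition(n_cells: int, max_step: int) -> Matrix:
    """Toy movement model: equal probability for every cell within max_step cells."""
    return [normalise([1.0 if abs(i - j) <= max_step else 0.0 for j in range(n_cells)])
            for i in range(n_cells)]


def forward_backward(obs_lik: Matrix, trans: Matrix) -> Matrix:
    """Smoothed cell probabilities for each time step.

    obs_lik[t][j] is the likelihood of the data at time t if the animal is in cell j;
    trans[i][j] is the probability of moving from cell i to cell j in one step.
    """
    n_steps, n_cells = len(obs_lik), len(trans)
    # Forward pass: filter using data up to and including each time step.
    alpha = [normalise(obs_lik[0])]  # flat prior over cells at the first step
    for t in range(1, n_steps):
        predicted = [sum(alpha[-1][i] * trans[i][j] for i in range(n_cells))
                     for j in range(n_cells)]
        alpha.append(normalise([p * o for p, o in zip(predicted, obs_lik[t])]))
    # Backward pass: weight each cell by how well it explains later data.
    beta = [[1.0] * n_cells for _ in range(n_steps)]
    for t in range(n_steps - 2, -1, -1):
        beta[t] = normalise([
            sum(trans[i][j] * obs_lik[t + 1][j] * beta[t + 1][j] for j in range(n_cells))
            for i in range(n_cells)
        ])
    return [normalise([a * b for a, b in zip(alpha[t], beta[t])]) for t in range(n_steps)]


# Known release in cell 0, no data at the middle step, known pop-up in cell 2.
obs = [[1.0, 0.0, 0.0, 0.0, 0.0], [1.0] * 5, [0.0, 0.0, 1.0, 0.0, 0.0]]
posterior = forward_backward(obs, random_walk_transition(n_cells=5, max_step=1))
most_likely = [row.index(max(row)) for row in posterior]
```

In this toy case the step with no data is pinned to the only cell reachable from both known endpoints, which illustrates why a restrictive speed parameter can constrain reconstructed movements when observations are sparse.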

PSAT data processing
For intra-tag data volume comparison, only PSATs that had transmitted until battery exhaustion and were physically recovered were used (n = 12 tags). For these tags, we calculated data percentages by dividing the total number of unique messages received (either for just geolocation or all data) by the number of messages generated and multiplying by 100. Due to variations in data transmission (e.g., one tag transmitting 49% data, and another only transmitting 41%), transmitted data sets were serially decomposed to create aliases with smaller volumes of geolocation data at uniform intervals ("partial data sets"; e.g., a tag that transmitted 46% of geolocation data was decomposed into 5%, 10%, 20%, 30% and 40% geolocation volume aliases). In a real tracking scenario, the tag transmission schedule and the orbit of Argos satellites are non-random and define which messages are transmitted and subsequently received and decoded (see [23] for an example of this). To mimic a real scenario as closely as possible (i.e., tag batteries exhausting after a certain time elapsed), we chose to create data aliases by capping the transmitted data set for a given tag by the time taken to transmit n% of data (e.g., for tag 17P0786, 5% = 0.6 days and 10% = 1.1 days; Additional file 1: Fig. S2), rather than selecting data at random from a complete data set (given the non-randomness of the process). For this reason, only transmitted data sets were used in the intra-tag data comparison. Not all tags transmitted the same volume of data, and therefore, more tags were in the 5% compared to the 40% data groups, for example (sample sizes are provided in results). The resulting data sets from this decomposition are termed "partial data sets", which were then used for comparison with the "complete data sets" (Box 1). The most probable locations derived from partial data sets were then compared with the most probable locations derived from the complete data set for each pair of partial and complete data sets for a given tag (e.g., 5% and 100%). All data sets referred to were processed as outlined in the "GPE3" section.
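The time-capping step can be sketched as follows. This is a minimal illustration rather than the authors' code (the study's analyses were run in R). The input layout, a list of (hours since pop-up, message ID) pairs for decoded messages, and both function names are assumptions.

```python
from typing import List, Optional, Tuple

Receipt = Tuple[float, int]  # (hours since pop-up, unique message ID)


def time_to_reach_fraction(receipts: List[Receipt], n_generated: int,
                           fraction: float) -> Optional[float]:
    """Earliest time at which the share of unique decoded messages reaches `fraction`.

    Returns None if the tag never delivered that share of its generated messages.
    """
    target = fraction * n_generated
    seen = set()
    for hours, message_id in sorted(receipts):
        seen.add(message_id)  # duplicates of an already decoded message add nothing
        if len(seen) >= target:
            return hours
    return None


def partial_dataset(receipts: List[Receipt], n_generated: int,
                    fraction: float) -> Optional[List[int]]:
    """Unique messages decoded up to the cutoff time, mimicking earlier battery exhaustion."""
    cutoff = time_to_reach_fraction(receipts, n_generated, fraction)
    if cutoff is None:
        return None
    return sorted({message_id for hours, message_id in receipts if hours <= cutoff})


# Hypothetical tag that generated 10 messages; message 1 was decoded twice.
receipts = [(0.5, 1), (1.0, 1), (1.5, 4), (2.0, 7), (3.0, 2), (4.0, 9)]
alias_20 = partial_dataset(receipts, n_generated=10, fraction=0.2)
alias_40 = partial_dataset(receipts, n_generated=10, fraction=0.4)
alias_80 = partial_dataset(receipts, n_generated=10, fraction=0.8)
```

Because the cutoff is a time rather than a random draw, each alias keeps the order in which messages actually arrived, and a tag simply has no alias for a fraction it never delivered.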

Assessment of geolocation outputs
To account for spatial domains varying with distance travelled (i.e., the further animals move, the larger the possible errors for SSMs), fish were analysed in two groups based on complete data sets: fish that dispersed over distances less than 1000 km from respective tagging locations (straight, geodesic line between deployment and furthest point; "short-distance fish") or fish that dispersed over distances over 1000 km ("long-distance fish"; Additional file 1: Fig. S6). GPE3 outputs were assessed by comparing movements reconstructed using complete data sets with the corresponding movements from each partial data set (i.e., 5%, 10%, 20%, 30% and 40%) in five ways: (i) spatial similarity (termed "τ"; between 0 and 1), where τ is the square root of the proportion of overlap between the total 12-hourly 99% likelihood surfaces (i.e., all likelihood polygons merged and dissolved for a data set), with 1 being a perfect match (see Additional file 1: Fig. S3 for full description); (ii) spatial uncertainty, comparisons between the area of 99% likelihood surfaces (i.e., "uncertainty"; see Additional file 1: Fig. S4 for full description); (iii) distance, calculated as the great-circle distance between time-matched locations, with differences also expressed as root mean squared distance (see Additional file 1: Fig. S4 for full description); (iv) the difference in tag-sensed SST and SST at the reconstructed location (from the NOAA Optimum Interpolation SST analysis, https://www.ncdc.noaa.gov/oisst; Additional file 1: Fig. S5); and (v) entry to the Mediterranean Sea.
Adult ABT from the eastern stock are known to visit the Mediterranean Sea between May and July [17,56]. If complete data set reconstructions placed ABT within the Mediterranean Sea (i.e., a location was present in the Mediterranean Sea basin east of 5.5°W), the data volume at which this habitat was first occupied is indicated. This approach (i.e., "v") was used to investigate the effect of altering the user-defined speed parameter on GPE3 track reconstructions and also to define the correct movement speed for all intra-tag comparisons. In this analysis complete data sets were re-processed by GPE3 at movement speeds ranging from 0.5 to 5 m s⁻¹, at 0.5 m s⁻¹ increments, and the resulting movements compared. We decided to choose the lowest speed at which ABT were all placed in the Mediterranean Sea to avoid specifying biologically unrealistic movement speeds for GPE3 runs. If ABT did not enter the Mediterranean Sea at any data volume or movement speed, this metric was not included.
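The distance metric (iii) can be illustrated with a short sketch. The haversine formula and a mean Earth radius of 6371 km are assumptions made here; the text does not state which great-circle implementation was used, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

Location = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def root_mean_square_distance(partial: List[Location], complete: List[Location]) -> float:
    """Root mean squared distance between time-matched locations of two tracks."""
    if len(partial) != len(complete) or not partial:
        raise ValueError("tracks must be non-empty and time-matched")
    squared = [great_circle_km(a[0], a[1], b[0], b[1]) ** 2
               for a, b in zip(partial, complete)]
    return math.sqrt(sum(squared) / len(squared))


# Two-step toy tracks that differ by one degree of longitude at the first step only.
one_degree = great_circle_km(0.0, 0.0, 0.0, 1.0)
rmsd = root_mean_square_distance([(0.0, 0.0), (0.0, 0.0)], [(0.0, 1.0), (0.0, 0.0)])
```

Squaring before averaging means that a few very large displacements, such as a partial track placed at an erroneous latitude, dominate the summary value.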

Data analyses
All data analyses were conducted in R [68] and mapped in QGIS [69]. Errors, where stated, are reported as ± 1 standard deviation. A Student's paired t test [70] was used to investigate differences in proportions of total data and geolocation data received between tags. A generalised linear model (GLM; Poisson family, [70]) was used to investigate the influence of tag transmission time, deployment duration and total messages generated on the proportion of data finally recovered via the Argos system. The GLM with the best fit to data was selected by removing individual parameters (predictors) using the "drop1" function in R [71] and comparing the resulting model with the null model using a Chi-squared likelihood-ratio test.
To investigate the relationships between data volume and movement group (short- or long-distance fish) on (i) spatial uncertainty, (ii) distance, and (iii) difference in SST, generalised linear mixed effects models (GLMM; Poisson family) were fitted to data using the package "lme4" in R [72] with tag ID as a random variable. The significance of fixed effects in models was estimated by first calculating the t-statistic (coefficient/standard error of coefficient) and then comparing it to critical values (corresponding to the desired significance levels) representing the thresholds beyond which the null hypothesis is rejected in favour of the alternative hypothesis.
Temporal autocorrelation was tested for by conducting a Durbin-Watson test on uniformly scaled residuals using the testTemporalAutocorrelation function in the DHARMa package in R [73]. If a significant test result was returned (as for both spatial uncertainty and distance), the data set in question was proportionally reduced (e.g., by 10%, 20% and 30%) by random subsampling and retested until a non-significant test result (at the P ≤ 0.01 level) was obtained (Additional file 1: Fig. S12). Due to the lower number of data points and non-normal data distribution, the relationship between spatial similarity and movement group and between spatial similarity and geolocation data volume were investigated using a nonparametric Kruskal-Wallis test and Pearson's product moment [74], respectively. To investigate the relationship between number of messages to transmit and time taken to transmit a data threshold, a GLM was fitted to data as previously described for Argos data. For full data volume analytic pathways see Additional file 1: Figs. S1-S6.

Data proportion
Patterson and Hartmann [23] demonstrate how sampling regime (summary period in hours) and deployment length influence the proportion of data received from a transmitting PSAT. Here, we develop this idea and investigate a variation on this principle: the time taken ($t$) to transmit proportion $x$ of geolocation data (i.e., a specific data type). This is a function of the amount of geolocation data to recover ($d_x$) and the rate of data recovery ($r$). The total amount of geolocation data is obtained by multiplying the rate of data creation ($\beta$, messages per day for specific data type) by the deployment length in days ($l$). Thus, $d_x$ is calculated as follows:
$$d_x = x \, \beta \, l \qquad (1)$$
The rate of data recovery ($r$) is a function of the rate of message recovery via Argos, which is latitude dependent ($m_{lat}$, messages received per hour and successfully decoded), the overall proportion of novel messages ($\sigma$, i.e., excluding duplicates) and the proportion of the total messages generated required ($\alpha$, i.e., geolocation messages/total messages):
$$r = m_{lat} \, \sigma \, \alpha \qquad (2)$$
Therefore, the time in hours taken to transmit data proportion $d_x$ is calculated as follows:
$$t = \frac{d_x}{r} \qquad (3)$$
Using Argos satellite pass and message data for the tags in this study, we calculate mean latitude of transmissions (calculated using radio-frequency Doppler shift methods), rate of data recovery via Argos satellites (satellite passes per hour and messages per hour) and the proportion of duplicate messages. For the purposes of this investigation, we use grand mean values where suitable to investigate how data recovery varies with both deployment length and programming regime (geolocation messages/total messages).
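The relationship between data volume, recovery rate and transmission time can be expressed as a short calculation. The sketch below reads the rate of data recovery as the product of the message recovery rate, the proportion of novel messages and the proportion of geolocation messages; that reading is an assumption made for illustration. All numeric inputs are hypothetical and are not values from this study.

```python
def messages_to_recover(x: float, beta: float, l_days: float) -> float:
    """Number of geolocation messages making up proportion x of a deployment (d_x)."""
    return x * beta * l_days


def recovery_rate(m_lat: float, sigma: float, alpha: float) -> float:
    """Novel geolocation messages recovered per hour (r).

    m_lat: messages received and decoded per hour (latitude dependent)
    sigma: proportion of received messages that are novel (not duplicates)
    alpha: geolocation messages as a proportion of all messages generated
    Assumes r is the simple product of the three terms.
    """
    return m_lat * sigma * alpha


def hours_to_transmit(x: float, beta: float, l_days: float,
                      m_lat: float, sigma: float, alpha: float) -> float:
    """Hours needed to recover proportion x of geolocation data (t = d_x / r)."""
    r = recovery_rate(m_lat, sigma, alpha)
    if r <= 0.0:
        raise ValueError("recovery rate must be positive")
    return messages_to_recover(x, beta, l_days) / r


# Hypothetical tag: 365-day deployment, one geolocation message per day,
# 20 decoded messages per hour of which half are novel.
t_mixed = hours_to_transmit(0.3, 1.0, 365, m_lat=20.0, sigma=0.5, alpha=0.25)
t_geo_only = hours_to_transmit(0.3, 1.0, 365, m_lat=20.0, sigma=0.5, alpha=1.0)
```

With these illustrative inputs, programming only geolocation messages (alpha = 1) cuts the time to reach 30% of geolocation data to a quarter of that for a tag on which geolocation messages make up a quarter of the queue.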

Tag deployments
Tags were deployed on ABT between the 4th of October 2018 and the 18th of November 2019 (n = 36, Table 1) and 29 detached and reported data (81%; 29 of 36). To obtain the full archive of data, 12 tags were physically recovered after having detached from their study animals. Seven of the 36 tags did not report (19%). Mean deployment length for reporting tags was 280 ± 98 days (range 106-366 days) with 18 tags (62%) remaining attached until their programmed pop-up date, whilst 11 tags (38%) detached prematurely. Long-distance fish were typically tracked for longer than short-distance fish (mean track length = 342 ± 24 days versus 182 ± 116 days, respectively).
Transmission of data to Argos (n = 29 tags) occurred for as little as 0.3 days and as long as 10.8 days (mean 4.2 ± 3 days), yielding between 2 and 91% (mean 29 ± 25%) of the total data set, and between 2 and 82% (mean 27 ± 22%) of total geolocation data. No significant difference was observed between total data volume and geolocation data volume for tags (paired-samples t test on log-transformed data; t = −0.32, df = 28, P = 0.8). The volume of geolocation data received was positively related to the amount of time a tag transmitted for (χ² = 183.7, P ≤ 0.001; Fig. 1), negatively related to the length of the tag deployment (χ² = 18.4, P ≤ 0.001) and negatively related to the number of messages generated (χ² = 5.3, P = 0.02).
mean maximum displacement = 2283 ± 560 km) and three were short-distance fish (n = 545 days, mean maximum displacement = 713 ± 34 km). Long-distance fish migrated from the United Kingdom as far west as the Central Atlantic (40°W; Fig. 2) and south to the Canary Islands. Seven of these fish entered the Mediterranean Sea (crossed west-east through the Strait of Gibraltar at 5.5°W), where they travelled as far east as Turkey (5°E) before returning to the tagging site. Conversely, short-distance fish spent the entirety of the tracking period between the Western English Channel and the Bay of Biscay. For both long- and short-distance fish, most time was spent in the Bay of Biscay. When movements from complete geolocation data sets were compared with partial data sets, reconstructions for short-distance migrants were more similar, less uncertain, closer and exhibited less modelled SST difference when compared to long-distance fish (Table 2). For long-distance fish only, spatial similarity (Spearman's Rank; ρ = 0.89, P < 0.001; Fig. 2), mean distance (GLMM, z = 2684, P < 0.001; Fig. 3), spatial uncertainty (t = −94.12, P < 0.001; Fig. 3) and SST differences (t = −34.2, P < 0.001; Fig. 4) were significantly correlated with proportion of geolocation data transmitted and successfully decoded, reaching the best fit (i.e., most similar with least uncertainty and SST difference) to complete data sets at 40% data volume (the maximum in this study).

Table 1
Deployment statistics for PSATs analysed in this study that transmitted data. Displacement categories were only calculated for tags that were recovered and subsequently included in data volume comparisons. Italic cells denote PSATs that were physically recovered and used for data volume comparisons. For "Deployment length", full deployments met their programmed pop-up dates, whereas premature deployments did not. Premature is abbreviated to "Prem".

What geolocation data volume constitutes a viable track?
Due to the minimal effect of geolocation data volume on spatial metrics for short-distance fish, the following analyses refer only to long-distance fish. For both spatial uncertainty and distance from corresponding locations, the smallest differences between complete data sets and partial data sets occurred in the 30% and 40% geolocation data groups (Table 2; Fig. 3). Reconstructions with only 5% of data (n = 9 fish) erroneously placed eight fish (89%) at latitudes between 60° and 70° (mean = 64 ± 7°), when complete data sets placed them at latitudes of less than 56° (mean = 45 ± 4°). These considerable distances were less frequent with 10% of data (only 1 of 7 fish) and absent at 20% of data (n = 2 fish). Eight complete data sets showed ABT entering the Mediterranean Sea. For these fish, reconstructions from partial data sets only placed fish in this region after 30% of data was used for reconstructions. For the data set analysed here (n = 29), 30% of geolocation data was received for 20% (n = 4) and 67% (n = 6) of tags for full and premature deployments, respectively. When transmitting data, these tags took between 1.2 and 5.1 days (mean = 3 ± 1 d) to transmit 30% of geolocation data, and this time was positively correlated with the number of messages a tag had to transmit (GLM; t = 2.6, P = 0.03; Fig. 5). Based on values derived from transmitting tags (Table 3), this time would have been reduced to an average of 0.7 ± 0.3 d (range 0.3-1 day) if only geolocation data had been programmed (Fig. 5b). Similarly, if only geolocation data had been programmed for all tags in this study, 93% (n = 27) of tags could have transmitted 30% of their geolocation data as opposed to 20% (n = 10).

Model speed and track reconstructions
Movements reconstructed from complete data sets suggested that eight fish entered the Mediterranean Sea (presumably to spawn; Fig. 6). This only occurred when using a movement speed of either 1.5 m s⁻¹ (only 1 of 8 fish had locations occurring within the Mediterranean Sea at some point), 2 m s⁻¹ (7 of 8 fish had locations occurring within the Mediterranean Sea at some point), or 2.5 m s⁻¹ (8 of 8 fish had locations occurring within the Mediterranean Sea at some point during the track).

Discussion
The extensive use of PSATs in aquatic biologging studies warrants careful consideration of movements reconstructed from PSAT tag data. Here, using ABT as an example study species and Wildlife Computers PSATs and software, we show that variation in the data volumes recovered had a significant effect on the estimated movements of ABT. Importantly, for long-distance fish we highlight that geolocation data volumes of 30% or more of the total result in plausible movement reconstructions, but at data volumes of less than 30%, reconstructions can differ by thousands of kilometres from where the fish are most likely to be. We also highlight that the spatial habits of tracked animals can affect modelled estimates, with greater potential differences for wider ranging animals. Whilst our analysis focussed on one PSAT and one geolocation model, variation in data volume used in analytical geolocation is a generic problem for satellite-transmitting tags [23,43]. From an end-user's perspective, here we discuss our findings and offer some steps to decrease error in movement reconstructions and reduce the risk of researchers accepting unlikely SSM outputs.

The challenge of incorporating spatial variability into research-the example of Atlantic bluefin tuna
Here we document the results of a study on Atlantic bluefin tuna, a long-distance migrant, to illustrate the effect of data quantity on movement reconstructions. The species is ideally suited to studies with PSATs as it migrates long distances [75] and spends most of its time in the photic zone throughout its range [17,55,56]. ABT are a commercially exploited species of conservation concern and have been described as "the archetype of overfishing and general mismanagement" [76].
The management of the Atlantic-wide fishery is problematic in part due to the complex spatial population structure [77-79]. Efforts to address shortcomings in the stock assessment process are incorporating movement data from PSATs into the more traditional size- and age-structured assessment approach [80]. Here, we demonstrate that low data volume and/or movement speed below 2.5 m s⁻¹ can result in reconstructed movements erroneously locating fish outside of known spawning grounds and in different stock management zones. If data volume is not checked or controlled for when tag data are used in assessments, these data could bias management and have knock-on effects for conservation.
Trend line shows linear model fitted to data with P = 0.05 confidence interval. Points coloured red denote tags that were physically recovered and used in the SSM data volume comparison.
Fig. 2 Examples of differences in GPE3 track reconstructions between partial data sets (from transmitted data, yellow) and complete data sets (physically downloaded from tags, blue) for two tags (one per row), one that completed a full deployment and one that released prematurely. Maps show the 99% likelihood surfaces and reconstructed locations for the same tag. Columns show the resultant track if only 5%, 10%, 20%, 30% or 40% of the data are received in comparison with the complete data set. Scatterplots show spatial similarity (τ) between partial and complete data sets for all sampled fish for short-range (top row) and long-range (bottom row) groups. Corresponding spatial similarity values are also stated on each map. For long-range tags that entered the Mediterranean Sea, the orange vertical line represents the minimum data volume at which ABT (all individuals pooled) were first placed in the region. "EC": the Channel; "BoB": Bay of Biscay.

Assessing geolocation accuracy for PSATs
To assess the accuracy of geolocation models, researchers must obtain a true account of an animal's movements for comparison (see [52] for a review of methods). For some fish species that frequent the surface, such as mako sharks (Isurus oxyrinchus [38]), basking sharks (Cetorhinus maximus [81,82]) or reef manta rays (Manta alfredi [51]), this could be from double tagging with real-time tracking tags (e.g., Smart Position Only Tags, SPOTs; [50]), which have errors as small as hundreds of metres [24]. However, real-time tracking tags are not suited to species that only spend very brief periods at the surface, such as ABT. Where this method is unsuitable, data-derived geolocation is the best alternative, but it is important to recognise that a track of a tagged fish reconstructed from data is a best approximation, and not an absolute truth. The accuracy or usefulness of that approximation will vary depending on the geolocation model, as well as the quality of the data that are provided to it.
Here, we use the most complete data set (i.e., data for every single day of a track, without gaps) as the 'optimal geolocated track' (i.e., the best approximation), and recognise that the only known locations (subject to errors of up to a few km; [24]) are the release and pop-up locations. Because ABT routinely occupy the photic zone at twilight, PSATs can collect high-quality light data at dawn and dusk, making ABT a good candidate species for this approach (i.e., data quality is controlled for). In contrast, species that spend more time below the photic zone during twilight, such as porbeagle sharks (Lamna nasus [83]) and broadbill swordfish (Xiphias gladius; [84]), may be less suited to this approach, as issues with data volume could be compounded by poor-quality light data and a decreased ability to detect twilight while animals reside in the dark. For these species, geolocation models such as "HMMoce" [40], which leverage water-column oceanography as well as light data [81,84,85], can improve confidence in geolocation accuracy when light data are poor. This approach is currently unavailable for the GPE3, however. For the GPE3, the accuracy of modelled geolocation estimates for animals for which real-time tracking tags are unsuitable could be improved by double-tagging with alternative tag types to provide known locations during a PSAT deployment. For example, acoustic tags [86] or a secondary PSAT such as a mark-report PSAT (mrPAT; [52]) could provide known locations (in addition to the deployment and pop-up locations) to "tie in" with the analytical geolocation. The opportunity to increase tag burden by double tagging would be dependent on the study species and any other welfare considerations. Irrespective of methodological

Data volume and model performance
Remotely obtaining data from marine study animals is an engineering challenge [42], and it is inevitable that technical issues occur that reduce data transmission [45]. In our study, a battery issue reduced data transmission, but in other cases different issues can limit the number of messages that the end-user receives (Table 4). The challenge of reduced data volume is, therefore, ubiquitous in tracking studies using PSATs, irrespective of the underlying cause. SSMs used for analytical geolocation will perform better when provided with larger volumes of high-quality data [47]. When data volumes fell below 30% (most drastically at 5%), model-estimated locations of long-distance ABT could be displaced thousands of kilometres from the most likely path based on the full data set. At decreased data volumes, the interval between days with geolocation data is increased. With every time increment without geolocation data, the extent of the area within GPE3 in which the fish could be placed grows, and with it so should the potential error and uncertainty. This issue is common to other SSMs with a Brownian (or similar) movement model. Our results show this very clearly, even at relatively high data volumes, with the uncertainty for a given location growing with temporal distance from the nearest data observation, and the greatest distances to complete locations occurring at the smallest data volumes investigated. Indeed, for animals moving long distances, at data volumes of less than 30% the spatial extent of the model becomes too large (due to large data gaps) to constrain modelled locations to a plausible area (e.g., ABT were placed north of 60°N when, in fact, high-data reconstructions suggested they were at 40°N, on average). This may have been amplified in our case study, because the movement speed of ABT is high based on their size and documented swimming performance. A further challenge in data-limited circumstances is the effect of poor light readings, due to turbidity or diving during twilight, leading to spurious latitude estimates, which we do not investigate here. Nonetheless, when more data are available, spurious measurements would have much less of an impact, which is another reason for questioning model outputs with limited data. We recommend that extra care is taken when interpreting movement reconstructions based on less than 30% of total geolocation data, and suggest that study- and species-specific sensitivity tests (for both data quantity and quality) are undertaken to understand how the chosen SSM performs at low data volumes.

Fig. 4 Relationship between SST recorded on the tag and the corresponding oceanographically modelled SST at locations estimated from differing volumes of data for long-distance fish. Black line in all plots denotes a perfect match between tag-recorded and modelled SST, whereas red and blue lines denote that the modelled SST was 5°C warmer and 5°C cooler than tag-derived SST, respectively
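The growth of the plausible area with gap length can be illustrated with simple arithmetic. The sketch below assumes, as a simplification, that the user-defined movement speed acts as a hard upper bound on displacement between observations and that the reachable region is a disc on a flat plane; the function names are hypothetical and this is not a description of how the GPE3 works internally.

```python
import math

SECONDS_PER_DAY = 86_400


def reachable_radius_km(speed_m_s: float, gap_days: float) -> float:
    """Upper bound on straight-line displacement during a data gap.

    Assumption (illustrative only): the animal can travel at most
    `speed_m_s` for the whole gap, so the bound is speed x time.
    """
    return speed_m_s * gap_days * SECONDS_PER_DAY / 1000.0


def reachable_area_km2(speed_m_s: float, gap_days: float) -> float:
    """Area of the disc the animal could occupy (flat-plane approximation)."""
    radius = reachable_radius_km(speed_m_s, gap_days)
    return math.pi * radius ** 2


if __name__ == "__main__":
    # Gap lengths below are arbitrary examples, not values from the study.
    for gap in (1, 5, 10, 20):
        radius = reachable_radius_km(2.5, gap)
        print(f"gap = {gap:>2} d, radius <= {radius:.0f} km")
```

Under these assumptions, a speed of 2.5 m/s allows up to 216 km of displacement per day without data, and the reachable area grows with the square of the gap length, which is why long gaps are so much harder to constrain than short ones.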
For periods where data are absent, locations are estimated by the movement model. In these cases, new locations are (i) influenced most heavily by the two nearest (in time) observations, and (ii) generated to minimise animal movement between the most recent and the next available observation [60]. This results in these two observations being connected by new locations in either a straight or slightly curved line. If, during this time, the animal exhibits straight, non-tortuous movements (e.g., high-speed directional migration), differences are likely to be minimal. However, the further the animal deviates from the most probable path when conducting tortuous movements, either in three-dimensional space (e.g., foraging and/or diving; [87]) or in time, the larger the potential difference. This phenomenon is almost impossible to account for in fragmented data sets, highlighting the need to treat large temporal data gaps with caution. In reality, for most migratory studies, light-based geolocation is perhaps better suited to ABT that move longer distances (i.e., daily displacements larger than mean error and uncertainty), but accounting for these intra-population differences in spatial habits in study design remains challenging.
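The "straight or slightly curved line" between two observations can be pictured as a great-circle path sampled at daily steps. The sketch below is a purely geometric stand-in under that assumption: an SSM estimates a probability surface for each time step rather than a single interpolated point, and the function names and the `(day, lat, lon)` tuple layout are hypothetical. Near-antipodal endpoints are not handled.

```python
import math


def great_circle_interpolate(lat1, lon1, lat2, lon2, fraction):
    """Point a given fraction of the way along the great circle from
    (lat1, lon1) to (lat2, lon2). Inputs and outputs are in degrees."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    # Angular separation between the two observations (haversine formula).
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    delta = 2 * math.asin(math.sqrt(min(1.0, a)))
    if delta == 0.0:
        return lat1, lon1  # identical endpoints: nothing to interpolate
    # Weights of the two endpoint vectors at the requested fraction.
    w1 = math.sin((1 - fraction) * delta) / math.sin(delta)
    w2 = math.sin(fraction * delta) / math.sin(delta)
    x = w1 * math.cos(p1) * math.cos(l1) + w2 * math.cos(p2) * math.cos(l2)
    y = w1 * math.cos(p1) * math.sin(l1) + w2 * math.cos(p2) * math.sin(l2)
    z = w1 * math.sin(p1) + w2 * math.sin(p2)
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon


def fill_gap(obs_before, obs_after):
    """Daily positions strictly between two observations (day, lat, lon)."""
    d0, lat0, lon0 = obs_before
    d1, lat1, lon1 = obs_after
    return [
        (day, *great_circle_interpolate(lat0, lon0, lat1, lon1,
                                        (day - d0) / (d1 - d0)))
        for day in range(d0 + 1, d1)
    ]
```

Any tortuous movement the animal actually made during the gap is invisible to this kind of fill, which is the source of the potential difference described above.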

The reality of PSATs transmitting at least 30% of their data
In general, the longer a tag is deployed for, the longer it will take the tag to transmit 30% (or another fixed proportion) of its geolocation data. The generation of PSATs used in this study was identified post-deployment as having a fault in the battery component, which led to shorter transmission durations. Due to this, only four of the 20 PSATs attached to long-distance migrating fish were able to transmit 30% of their data (the threshold identified in this case). In 2011, Musyl et al. [43] reviewed the performance of 731 PSATs deployed on 19 species and indicated that the quantity of geolocation data varied with species, habitat class and manufacturer, and was positively correlated with deployment duration. Specifically, for Wildlife Computers MiniPATs deployed on ABT, Musyl et al. [43] report a mean data quantity of 22.7% per tag (mean deployment length of 14 PSATs was 66 ± 22 days, with tags transmitting a mean of 15 ± 4 days of geolocation data), lower than the threshold we suggest is likely to lead to robust estimates of location for ABT in this study. However, the reliability of the technology has increased considerably since Musyl et al. [43] made their assessment, and more recent studies indicate that data transmission quantities for newer MiniPATs are well above the 30% threshold (e.g., 65% ± 29% for 15 reporting PSATs in [56]). Indeed, Wildlife Computers indicate that MiniPATs should transmit for between 12 and 16 days, which would give a good chance of the 30% threshold being met, assuming no other issues affect transmissions and researchers are realistic in programming regimes for planned deployment lengths (i.e., ensure that the message load for longer deployments remains within reasonable limits as per the goal of the study, Table 4). An emerging issue for geolocation studies as PSATs become increasingly sophisticated and capable of providing additional data products (e.g., Activity Time Series, [88]) is the risk of pushing the capabilities of the tags too far, through inclusion of non-geolocation data products or by extending the deployment duration excessively, thereby increasing the time taken to transmit a specific amount of geolocation data. Due to the similarities in programming regime of the tags used here, this was beyond the scope of this study, but it is something that we demonstrate hypothetically using parameters derived from transmitting tags. Researchers should be aware of this potential pitfall when programming PSATs. In addition to tag hardware performance, transmissions are more regular towards the poles, and both sea state and biofouling affect the number of transmissions tags send. Future research could seek to further investigate how the rate of data recovery varies spatio-temporally from real data. A repository of this information available to researchers could be of use to all PSAT models using the Argos network.

Fig. 6 Effect of user-defined movement speeds on GPE3 track reconstructions. a-e Maps showing movements for a single, long-distance tag reconstructed using user-defined movement speeds between 1 and 3 m s⁻¹. f Abacus plot of Mediterranean Sea occupation and movement speed for eight physically recovered tags that were placed within the Mediterranean Sea at the base run of 2.5 m s⁻¹. Vertical dotted line denotes the speed at which all tags were placed in the Mediterranean Sea. For 0.5 m s⁻¹, five model runs failed and these are shown as crosses

Table 4
Phase | Consideration | Details | Relationship with data recovery
… | … | … to take proportionally longer for geolocation data to be transmitted (Fig. 5) | Longer deployments are less likely to transmit all geolocation data and are associated with larger uncertainty and error (e.g., Fig. 3)
… | … | Higher ratios of geolocation data to "other" data (TAD, PDT, MixLayer etc.) should result in a higher likelihood of tag geolocation messages being recovered in a shorter timeframe. This is unless depth data are also used for geolocation (e.g., HMMoce) | …
Transmission | Pop-up location | There is greater satellite coverage at the poles, so transmitting tags will be "seen" more frequently if they pop up in more poleward locations | A higher proportion of transmissions will be received in more poleward locations
 | Tag transmission | The time a tag transmits for can vary but, in general, longer transmissions mean more data transmitted (Fig. 1). For a tag that has collected 2000 messages, 100% data recovery may take upwards of 30 transmission days, which is unlikely | Positive relationship with data recovery
 | Tag damage | If a tag is damaged, this will likely negatively affect data transmission [43,44] | Non-damaged tags transmit more data
 | Tag biofouling | Biofouling is a well-known and studied issue for slower moving species and species that inhabit warmer waters [43]. Application of antifoul could help ensure biofouling does not negatively impact tag flotation and orientation during the transmission phase (e.g., excessive listing due to globules of antifoul or biofouling on the antenna) | Non-biofouled tags transmit more data
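The data volumes discussed in this section are simple proportions, and checking them against the 30% threshold is easy to automate. The sketch below assumes data volume is defined as the share of deployment days for which geolocation data were received; the function names are hypothetical, and the example inputs are the mean values quoted above from Musyl et al. [43], used purely for illustration.

```python
def data_volume_pct(days_with_geolocation: int, deployment_days: int) -> float:
    """Percentage of deployment days for which geolocation data were received."""
    if deployment_days <= 0:
        raise ValueError("deployment_days must be positive")
    return 100.0 * days_with_geolocation / deployment_days


def meets_threshold(days_with_geolocation: int, deployment_days: int,
                    threshold_pct: float = 30.0) -> bool:
    """True if the data volume reaches the chosen threshold (30% in this study)."""
    return data_volume_pct(days_with_geolocation, deployment_days) >= threshold_pct


if __name__ == "__main__":
    # 15 days of geolocation data over a 66-day deployment (means from [43]).
    volume = data_volume_pct(15, 66)
    print(f"data volume = {volume:.1f}%, meets 30%: {meets_threshold(15, 66)}")
```

The appropriate threshold is study- and species-specific, so `threshold_pct` is left as a parameter rather than fixed at 30.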

Maximising the use of PSATs transmitting variable data
From lessons learnt, there are several steps that could be implemented to aid non-expert interpretation, maximise data sets generated from PSATs, and reduce the risk of generating unlikely movement reconstructions (Table 5). The extent to which users are able to scrutinise tag outputs, or to troubleshoot derived products, will vary. Through the course of our investigations into how data volume varies and influences data interpretation, we have identified the following data products or practices that are or would be helpful, and which would improve the level of data scrutiny and validation. Some of these may be relatively simple for manufacturers or model developers to include within software, or it may be more appropriate for the end-user to generate them.

(i) Including uncertainty estimates with model outputs: adding the semi-minor and semi-major axes of the uncertainty ellipses to the basic model output, if not already included (for the GPE3 these are available in an auxiliary file).

(ii) Data volume index: including a notification on data volume to indicate the foundation on which a spatial reconstruction is built. The percentage volume that will impact spatial reconstructions in other studies will vary, but increasing awareness of the need to consider this aspect will improve the reporting of results.

(iii) Gap warning: identifying large data gaps (which lead to inflated uncertainty, which is unavoidable). The size of the gap of concern would vary with the spatial habits of the study animal, but a warning would aid researchers in understanding this. The latter point is already implemented as part of the SSM detailed by [2,50], where these periods are identified and used to constrain the temporal bounds of the model, thus reducing uncertainty.

(iv) Movement speed: this is widely understood as a potentially limiting factor for geolocation models [52,89,90], further highlighted for the GPE3 and ABT in our results. If the user-defined movement speed is too low, the model bounding box can become restrictive (less of an issue for fish moving short distances), resulting in derived locations being erroneously placed away from distant regions that may have higher likelihoods (and where the fish may actually have been). However, increasing the permitted movement speed in track reconstructions is a trade-off between increasing the available area for the SSM to place the animal, and the concurrent increase in uncertainty [89]. We show that it can influence the ability of SSMs to geolocate animals in enclosed bodies of water, such as the Mediterranean Sea basin. [89] and [90] adopt a similar approach to derive the optimum movement speed for dolphinfish and silky sharks. However, other methods are available, including arbitrary selection (e.g., [91]), selection based on published rates of movement of the study animal (e.g., 1.2 m s⁻¹ for reef manta rays, [92]), or the use of algorithms as part of the model [93]. Our case study shows (in addition to [89,90]) that selecting movement speed through an iterative approach, considering a range of speeds and comparing the resultant estimated movements, can help to inform and improve interpretation. In the case of the GPE3, an improvement could be to have the model estimate diffusion through such an iterative approach to select the most suitable movement speed with the best fit to the data, although we recognise this comes at an increased processing cost. Ultimately, the selection of this parameter needs to be given due consideration in studies, with the method and results reported along with the overall results.

(v) Temperature matching threshold: location certainty can be increased using tag-derived measurements of near-surface, or surface, temperature [38,39], vertical temperature profiles [40], and bathymetry with high-resolution spatio-temporal ocean models, such as the HYbrid Coordinate Ocean Model (HYCOM, [94]) and the ETOPO1-Bedrock bathymetric model [67]. Whilst researchers should assume SSMs are not perfect [95], as the resolution and accuracy of oceanographic models increases, SSMs for fish could give a higher importance to matching tag-sensed temperatures and depths to modelled temperatures and depths at a specific location at a specific time. In all cases this has been shown to improve model fits [37,38,40], yet we demonstrate here that observed and modelled outputs differed by more than 9°C in some cases. If the model returned an error whenever a threshold was exceeded (e.g., if tag-derived SST differed from remotely sensed SST by more than 5°C), this could reduce location uncertainty. A caveat to this would obviously be the accuracy of the modelled values themselves, which varies between offshore and near-shore environments.

Finally, in addition to the classic methods of light-based geolocation for wide-ranging fish species, there have been recent efforts to include both vertical water profiles [40] and geomagnetic fields in geolocation [96], which will further refine results with the addition of novel data sources.

Table 5 Recommendations for potential refinements to the programming, modelling and output phases to reduce the risk of overinterpretation for researchers working on surface-orientated pelagic fish using MiniPATs and the GPE3. For completeness, we have included steps here that are not the subject of this study, but that are broadly relevant and useful
Phase | Recommendation | Details | Benefit to researcher
Programming | Message transmission summary | Once programmed, tag software could provide the researcher with estimated scenarios for data recovery based on latitude and longitude of estimated pop-up location. This is covered along with a method in Patterson and Hartmann [23]. Work to understand the effect of temperature on transmissions could also benefit this | It would allow researchers to make realistic assumptions on data recovery
Programming | Transmission schedule | An option for PSATs to assign priority to geolocation data messages could be included at the programming stage. For instance, two geolocation data messages could be transmitted for every auxiliary message, enhancing the likelihood of enough geolocation data being recovered to generate plausible tracks | More geolocation data whilst also having some data on other variables (e.g., temperature and depth)
Modelling | Speed parameter | Recommendation for researchers to conduct a sensitivity analysis on the impact of the speed parameter on movement reconstructions | Setting speeds too low can result in erroneous track reconstructions and too high can cause overfitting
Modelling | Temperature matching threshold | For observation-driven locations, do not allow progression of the SSM if tag-derived SST at a location differs from remotely sensed SST by more than a specified threshold | For occasions where SST is likely to be more important for geolocating (e.g., around the equinoxes), this would prevent extreme outliers
Output | Including uncertainty estimates with model outputs | Semi-major and semi-minor axes of the 99% likelihood surface should be included as an output of the model alongside most likely locations | Researchers would be able to assess error quickly and without the need for opening separate files, which can be computationally costly. The files would still be there if researchers decided this warranted a closer look
Output | Data volume index | A notification or statement outlining the volume of geolocation data used to generate the track. Ultimately different species/PSATs/SSMs will require differing volumes of data but this would increase awareness of the importance of data volume | Reduce the risk of including analysis of erroneous and uncertain data, and ultimately making Type I or Type II errors when hypothesis testing
Output | Gap warning | If there are gaps longer than 5% of the data set length (i.e., 15 days for a 300 day data set), warn that these gaps could inflate error | Reduce the risk of including analysis of erroneous and uncertain data, and ultimately making Type I or Type II errors when hypothesis testing
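Two of the checks recommended above, the gap warning and the temperature matching threshold, are straightforward for an end-user to generate from model inputs and outputs. The sketch below is one possible implementation under stated assumptions: days are integer indices since release, the release day (0) and pop-up day are treated as known locations, and the defaults (5% of data set length, 5°C) are the example values given in Table 5 and the text. All function and parameter names are hypothetical and none of this is part of the GPE3.

```python
def long_gaps(observed_days, deployment_days, max_gap_fraction=0.05):
    """Return (start_day, end_day, missing_days) for each run of days without
    geolocation data that exceeds `max_gap_fraction` of the deployment.

    Assumption: day 0 (release) and day `deployment_days` (pop-up) are known,
    so they bound the first and last gaps.
    """
    limit = max_gap_fraction * deployment_days
    anchors = sorted({0, deployment_days, *observed_days})
    gaps = []
    for start, end in zip(anchors, anchors[1:]):
        missing = end - start - 1  # days strictly between two known days
        if missing > limit:
            gaps.append((start, end, missing))
    return gaps


def sst_mismatch(tag_sst_c, model_sst_c, threshold_c=5.0):
    """True if tag-derived and modelled SST differ by more than the threshold."""
    return abs(tag_sst_c - model_sst_c) > threshold_c


if __name__ == "__main__":
    # Illustrative inputs only: a 300-day deployment with four data days.
    for start, end, missing in long_gaps([10, 20, 40, 100], 300):
        print(f"warning: {missing} days without data between day {start} and day {end}")
    print("SST flagged:", sst_mismatch(18.0, 27.5))
```

Both thresholds are left as parameters because, as noted above, the gap size of concern depends on the spatial habits of the study animal and the reliability of modelled SST differs between offshore and near-shore environments.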

Conclusions and future works
SSMs for reconstructing animal movements from PSAT data, whilst complex, have become a mainstay for scientists tracking fish and are now readily applied without prior knowledge of underlying processes and principles.
Here, the common challenge of PSATs transmitting variable amounts of data (in this case due to a battery issue) prompted us to question how data volume affects SSM performance, with the objective of identifying issues with geolocation as a result of data recovery. Our case study on ABT highlights that movements reconstructed using the GPE3 from ≥ 30% of geolocation data result in locations similar to those reconstructed with 100% of data, and that reconstructions at 5% or 10% of data result in large potential errors. However, this result is specific to animals that move long distances, with much smaller differences observed for short-ranging conspecifics. The challenge of using partial data sets is ubiquitous when using data from transmitting tags for geolocation and requires consideration from researchers irrespective of the tag types and geolocation models used. Our results highlight the importance both of understanding the sensitivities of chosen SSMs to inevitable variation in input data volume and of choosing inputs that result in the best fit of the models to data. The recommendations for modellers and researchers that we make based on our findings have parallels with other PSATs and SSMs and provide valuable insights into steps researchers can take to refine PSAT experiments. Future work should aim to build on this principle by studying other species, including other PSAT models, and using additional SSMs and analytical methods (e.g., in randomisation experiments), with a view to increasing overall data return and the reliability of results from these invaluable tools.

Fig. 1 Relationship between geolocation data recovered via ARGOS and time transmitting for all transmitting MiniPATs in this study. Trend line shows linear model fitted to data with P = 0.05 confidence interval. Points coloured red denote tags that were physically recovered and used in the SSM data volume comparison

Fig. 3 Distance from complete data location and uncertainty associated with differing volumes of geolocation data received for tags detailing long-distance movements. a Solid lines show weekly mean distances plotted with a 7-day running average over days since deployment. Raw values are plotted as points. b, c Boxplots showing the relationship between data volume groups and distance from corresponding complete data location (total data set n = 4681) and 12-hourly uncertainty (total data set n = 6468). Widths of boxplots are proportional to the number of observations in each group. The approximate spawning season for ABT is highlighted to indicate differences occurring during the Mediterranean Sea spawning period. Colour scheme in "a" corresponds to data categories shown in "b" and "c"

Fig. 5 Time taken for PATs to transmit the 30% of geolocation data required for plausible track reconstructions and its relationship with deployment length and programming. a Line represents a fitted generalized linear model using PATs that transmitted 30% of geolocation data (n = 10) including a 95% confidence interval (dotted lines). Trend line is extended to axes to represent the range of common programming regimes. b Contour plot demonstrating the trade-offs between programming regime (geolocation data proportion) and deployment length in the context of time taken to transmit 30% of geolocation data. Points from "a" are overplotted

Table 2 Comparative statistics for comparisons between partial and complete GPE3 data sets for differing data volumes and movement characteristics. Values provided are grand means calculated first at the data set level and second at the data volume level. Data in the 5% group are highlighted as this is the group where the largest errors occurred. Test statistics are provided for Kruskal-Wallis (spatial similarity) and generalised linear mixed models (other variables). For distance to complete data location, values are summarised both as mean great circle distance and root mean squared distance (RMSE) in parentheses. Bold values identify grand means for individual metrics

Table 4 Considerations for researchers using Wildlife Computers MiniPAT deployments to maximise data recovery. For completeness, we have included steps here that are not the subject of this study, but that will influence data recovery from PSATs. Relevant citations are included where necessary