Vitality models found useful in modeling tag-failure times in acoustic-tag survival studies

Acoustic telemetry studies often rely on the assumption that premature tag failure does not affect the validity of inferences. However, in some cases this assumption is possibly or likely invalid and it is necessary to apply a correction to estimation procedures. The question of which approaches and specific models are best suited to modeling acoustic tag failures has received little research attention. In this short communication, we present a meta-analysis of 42 acoustic tag-life studies, originally used to correct survival studies involving outmigrating juvenile salmonids in the Columbia/Snake river basin. We compare the performance of nine alternative parametric models including common failure–time/survival models and the vitality models of Li and Anderson (Theor Popul Biol 76:118–131, 2009; Demogr Res 28:341–372, 2013). The tag-life studies used acoustic tags from three different tag manufacturers, had expected lifetimes between 12 and 61 days, and had dry weights ranging from 0.22 to 1.65 g. In 57% of the cases, the vitality models of Li and Anderson (2009, 2013) fit the tag-failure times best. The vitality models were also the second-best choices in 17% of the cases. Together, the vitality models, log-logistic (19%), and gamma models (14%) accounted for 90% of the models selected. Unlike more traditional failure–time models (e.g., Weibull, Gompertz, gamma, and log-logistic), the vitality models are capable of characterizing both the early onset of tag failure due to manufacturing errors and the anticipated battery life. We provide further guidance on appropriate sample sizes (50–100 tags) and procedures to be considered when applying precise tag-life corrections in release–recapture survival studies.


Background
Acoustic telemetry is a powerful tool for studying fish movement and survival [9,16]. While many studies reasonably assume that tags do not fail during the study period [24,26], there are other studies with design limitations related to the size of the organism, duration of the study, and detection capability that make a degree of tag failure within a study unavoidable [2,33]. Under these circumstances, it becomes necessary to correct for tag life expectancies in order to make reliable inferences [8,35].
Correcting for premature acoustic tag failures is particularly critical in estimating the survival of outmigrating juvenile salmonids at dams in major rivers [12,30,31]. Often in these studies, investigators apply single [7] or paired release-recapture [5] models to estimate perceived survival, the joint probability of the fish and tag being alive from one detection point to another over time. These perceived estimates of survival are negatively biased in the presence of post-release tag failure [1], unless information on tag life or failure times is available for correction.
The degree of severity of the bias from post-release tag failure is dictated by the amount of temporal overlap between the distribution of detection times at the interrogation sites and the tag-life distribution. However, even minor tag failure may be consequential when survival estimates are required to meet specific standards. For example, over the last decade, federal hydroproject operators had to comply with survival threshold and minimum precision criteria (e.g., survival ≥ 0.96 average dam passage survival for juvenile spring Chinook salmon and steelhead with standard error ≤ 0.015) [32]. Even a small degree of bias can be consequential in regulatory studies. It is advisable to conduct concurrent tag-life studies, in which a sample of tags is activated alongside active tags used in the survival study because tag-failure rates are known to vary with manufacturing lot and ambient water temperature [3]. The tags selected for the tag-life studies need to be representative of the tags used in survival studies. If distinct tag lots are to be used, it may be prudent to have tag-lot specific tag-life studies. These sampled tags are monitored by a hydrophone to measure the time until failure, and a model is then fit that characterizes the failure time curve, which in turn is used to correct survival estimates [35].
Some studies have modeled tag failure using nonparametric approaches [8,14], while Townsend et al. [35] recommended a parametric approach to modeling the failure-time data, because if a parametric model is found that fits the empirical data, the precision of the tag-life corrected survival estimates is improved. There is a suite of traditional failure-time distributions to select from when performing tag-life corrections including gamma, Gompertz [11], log-logistic, log-normal [10], and Weibull [36]. Alternative models vary in flexibility and how well they fit failure-time data based on the number of parameters and the assumption of how risk of failure changes through time.
A seemingly unlikely source for further model consideration comes from the study of population demographics and animal survival. Li and Anderson [21,22] modeled death times as a survival process that depends on two components, a vitality-dependent process intrinsic to the individual and a vitality-independent process associated with accidental death. These two processes are analogous to the propensity of battery failure and accidental failure in modeling tag life. Some of these accidental failures have been traced to water intrusion, electric leakage, and manufacturing errors. Because tag lots often have a mixture of these two sources of failure, the 4-parameter versions of these models ("Vitality (2009)" and "Vitality (2013)" hereafter) have the potential to better model tag-failure times where simpler models cannot capture the complexity of the survival process.
In this short communication, we evaluated the fit of nine failure-time models to 42 different acoustic-tag life studies, all conducted using the same protocol between 2002 and 2018. Our purpose was to thoroughly examine the relative performance of these models so as to provide guidance to investigators on the best candidate models and strategies for incorporating tag-life corrections into release-recapture survival studies of fish.

Methods
We first describe the nine models, then our procedure for evaluating goodness-of-fit (GOF) and ranking the performance of models for each study. We have limited our model descriptions to their general characteristics and relationships. Additional details on the conventional failure-time/survival models that we evaluated may be found in Lee and Wang [20]. The structure and motivation of the two 4-parameter Vitality models are described in Li and Anderson [21,22].

Tag-failure models
The survival function begins with a value of 1 (i.e., 100%) at time t = 0 and declines as a function of time. A survival function S(t) can be formed from any positive continuous probability distribution via its cumulative distribution function
$$S(t) = 1 - F(t), \tag{1}$$
where F(t) is the cumulative distribution function,
$$F(t) = \int_0^t f(x)\,dx, \tag{2}$$
and where f(t) is the density function. For reference, the hazard function is defined as
$$h(t) = \frac{f(t)}{S(t)} \tag{3}$$
and is also known as the instantaneous failure rate; it characterizes the risk of failure over time [20]. The shape of the hazard function is often useful in matching a failure-time model to a specific failure-time process (Table 1).
Perhaps the simplest parametric failure-time model is the exponential model, with survival function
$$S(t) = e^{-\lambda t}, \tag{4}$$
where the hazard rate is constant and defined by λ. Acoustic tag-failure rate is not uniform over time; thus, the exponential model is generally a poor choice for this application. We excluded the exponential model from our analysis for this reason. Nonetheless, the exponential model is an appropriate starting point as it forms the basis of more complex failure-time models. The exponential distribution is a special case of the 2-parameter Weibull distribution, with survival function
$$S(t) = e^{-(t/\lambda)^{\beta}}, \tag{5}$$
which in turn is a special case of the 3-parameter Weibull distribution with survival function [10,36]:
$$S(t) = e^{-\left((t-\gamma)/\lambda\right)^{\beta}}, \tag{6}$$
with shape (β), scale (λ), and shift (γ) parameters. The shift parameter describes the endpoint of an initial "failure free" portion of the curve.
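The nesting described above can be sketched in a few lines of Python (the study itself used R; the function names below are illustrative and do not come from any package). The sketch assumes the common scale parameterization S(t) = exp{−[(t − γ)/λ]^β} for the 3-parameter Weibull. Under that assumption, setting γ = 0 gives the 2-parameter Weibull, and additionally setting β = 1 gives an exponential model whose constant hazard equals 1/λ. All parameter values are hypothetical.

```python
import math

def surv_exponential(t, rate):
    """Exponential survival: constant hazard equal to `rate`."""
    return math.exp(-rate * t)

def surv_weibull3(t, shape, scale, shift=0.0):
    """3-parameter Weibull survival (assumed scale parameterization).

    Survival stays at 1 during the failure-free period t <= shift.
    With shift = 0 this is the 2-parameter Weibull.
    """
    if t <= shift:
        return 1.0
    return math.exp(-(((t - shift) / scale) ** shape))

def hazard_weibull3(t, shape, scale, shift=0.0):
    """Hazard h(t) = f(t) / S(t) for the same parameterization."""
    if t <= shift:
        return 0.0
    return (shape / scale) * ((t - shift) / scale) ** (shape - 1.0)

# Hypothetical values: a 20-day scale and a 5-day failure-free period.
days = [0, 5, 10, 20, 30]
curve = [round(surv_weibull3(t, shape=4.0, scale=20.0, shift=5.0), 3) for t in days]
```

A shape parameter above 1 yields a hazard that rises with time, which is the behavior expected from battery depletion; a shape of exactly 1 collapses to the constant-hazard exponential case.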
Other common survival models that we considered were the 2-parameter gamma, with shape (γ) and scale parameters, and the 3-parameter generalized gamma [18,34], which includes an intercept parameter, α. The hazard function of the 2-parameter gamma decreases or increases to 1, whereas that of the generalized gamma approaches the value of α. The exponential, gamma, and Weibull distributions are special cases of the generalized gamma distribution.
The fifth distribution that we evaluated was the 2-parameter Gompertz distribution [11], which is an extension of the exponential model that assumes the hazard rate increases exponentially with time or age. The survival function for the Gompertz model is
$$S(t) = \exp\left[-\frac{e^{\lambda}}{\gamma}\left(e^{\gamma t} - 1\right)\right], \tag{7}$$
where parameters λ and γ describe the intercept and slope of a log-linear regression equation for the hazard rate, respectively. We also considered the 2-parameter log-normal survival model, which has a dome-shaped hazard function with shape (σ) and scale (μ) parameters. The 2-parameter log-logistic model, with shape (γ) and scale parameters, has a similarly shaped hazard function to the log-normal but allows for steeper declines from the apex.
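As a hedged illustration of the Gompertz description above, the sketch below assumes the log-linear hazard h(t) = exp(intercept + slope · t), which implies the survival function S(t) = exp{−(e^intercept / slope)(e^(slope · t) − 1)}. The helper `numeric_hazard` is a hypothetical check, not part of the original analysis: it recovers the hazard from any survival function as the negative derivative of log S(t), which is equivalent to f(t)/S(t).

```python
import math

def gompertz_hazard(t, intercept, slope):
    """Assumed log-linear hazard: log h(t) = intercept + slope * t."""
    return math.exp(intercept + slope * t)

def gompertz_survival(t, intercept, slope):
    """Survival implied by integrating the hazard above from 0 to t."""
    return math.exp(-(math.exp(intercept) / slope) * (math.exp(slope * t) - 1.0))

def numeric_hazard(surv_fn, t, eps=1e-5):
    """Central-difference estimate of h(t) = -d/dt log S(t)."""
    return -(math.log(surv_fn(t + eps)) - math.log(surv_fn(t - eps))) / (2.0 * eps)

# Hypothetical parameter values chosen only for illustration.
intercept, slope = -8.0, 0.3
h_exact = gompertz_hazard(20.0, intercept, slope)
h_check = numeric_hazard(lambda t: gompertz_survival(t, intercept, slope), 20.0)
```

The same numerical check can be applied to any candidate survival function to inspect the shape of its hazard before choosing among models.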
The final two survival functions we examined were the 4-parameter vitality models. The Vitality (2009) model assumes a normal distribution of initial vitality across a batch of tags and a stochastic decline toward zero vitality. The survival function of the Vitality (2009) model [21] is defined in terms of Φ = cumulative normal distribution, r = wear rate, s = standard deviation in wear rate, k = rate of accidental failure, and u = standard deviation in accidental failure.
The Vitality (2013) model [22] has a slightly different parameterization that assumes the same stochastic decline in tag vitality but combined with a Poisson process of extrinsic challenges, where r = rate of vitality loss (intrinsic), s = spread of initial and evolving vitalities (intrinsic), λ = frequency of challenges during life (extrinsic), and β = magnitude of challenges (extrinsic). We hypothesized that one or both vitality models would tend to fit acoustic-tag failure times well, as they allow for early onset of random tag failure due to accidental failure as well as systemic battery failure later on. The accidental-failure component, which like battery discharge is a stochastic process, gives the vitality models flexibility to fit data that the other models lack.

Tag-life studies
The 42 tag-life studies were all performed with the same study procedures. Tags were systematically sampled from the tag lots used in salmon smolt survival studies conducted within the Columbia River Basin, 2002-2018. Within each test, tags were activated and monitored with hydrophones continuously until complete failure of all tags. The tags were submerged in ambient water at the same temperature as that encountered by the tagged fish during the survival studies. Failure times were recorded to the minute. The failure-time analyses used the time-to-failure measured in days and fractions of days.
The various acoustic tags analyzed were manufactured by Advanced Telemetry Systems, Hydroacoustic Technology Incorporated, and Lotek, with 16, 25, and 5 separate tag-life evaluations, respectively. Mean tag lives ranged from 12 to 61 days and sample sizes ranged from 38 to 125 tags per study. Tag sizes ranged from 0.22 to 1.65 g dry weight. Tags were set to emit acoustic pulses between 20 and 60 times per minute, depending on the specific needs of the study.

Model fitting and comparison
The failure-time data from the different tag-life studies conducted between 2002 and 2018 were fit to the nine alternative failure-time models within the R programming language and free software environment (https://www.r-project.org). For the more conventional survival analysis models (1-7), we used model-fitting routines in the "FAdist" and "flexsurv" R packages [4,15]. We fitted the two vitality models using routines available in the "Vitality" R package [25].
Because of the diversity of models that we examined and the fact that many of the distributions involved were non-nested, we had to devise new metrics for assessing GOF and ranking model performance. The 2- and 3-parameter Weibull models, and the gamma and generalized gamma models, are nested and as such can be compared using likelihood ratio tests [19]. However, the Gompertz, log-normal, log-logistic, and vitality models are not nested among themselves or the others. Also, in this situation, the Akaike information criterion [6] cannot be used because the approach requires that the alternative models share the same distribution.
Instead, we compared the various model fits to the empirical survival function estimated using the nonparametric Kaplan and Meier [17] product-limit method. The Kaplan-Meier (K-M) method estimates the survival function as
$$\hat{S}(t) = \frac{n - i}{n}, \tag{10}$$
where n = sample size and i = number of failures before time t.
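A minimal Python sketch of the product-limit estimator is given here for orientation. It is written for the general case with optional right-censoring flags, and when every tag is observed to fail (as in tag-life studies run to complete failure) it reduces to the simple proportion of tags still operating, (n − i)/n. The function name and the example failure times are hypothetical.

```python
def kaplan_meier(times, observed=None):
    """Product-limit survival estimate.

    times    : failure or censoring times
    observed : True where the failure was observed (default: all observed)
    Returns a list of (time, S_hat) pairs, one per distinct failure time,
    where S_hat is the estimated survival just after that time.
    """
    if observed is None:
        observed = [True] * len(times)
    data = sorted(zip(times, observed))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        leaving = 0
        # Group tied times so they are handled in a single step.
        while i < len(data) and data[i][0] == t:
            failures += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if failures > 0:
            surv *= (at_risk - failures) / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

# Hypothetical failure times in days: one early failure, then battery failures.
km = kaplan_meier([3.0, 10.0, 10.0, 14.0, 15.0])
```

With no censoring, each step of the product telescopes, so the estimate after the i-th failure is exactly (n − i)/n.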
Relative GOF of the alternative parametric models was measured by the average squared deviation between the empirical K-M and the fitted model values (Fig. 1):
$$\mathrm{GOF} = \frac{\sum_{i=1}^{n}\left[\hat{S}(t_i) - S(t_i)\right]^2}{n - p}. \tag{11}$$
The number of parameters (p) serves as a penalty for the number of estimated model parameters. The GOF was modeled after the mean square error for regression. The tag-failure model with the smallest GOF value was selected as the most appropriate.
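The GOF measure described above can be sketched as follows, assuming it is the sum of squared deviations between the empirical and fitted survival values at the observed failure times, divided by n − p. The helper names are hypothetical, the failure times are assumed to be distinct (they were recorded to the minute), and the fitted survival function is passed in as an ordinary Python callable.

```python
def empirical_survival(failure_times):
    """Empirical survival (n - i) / n just after the i-th ordered failure.

    Assumes all tags failed (no censoring) and failure times are distinct.
    """
    times = sorted(failure_times)
    n = len(times)
    return [(t, (n - (i + 1)) / n) for i, t in enumerate(times)]

def gof(failure_times, surv_fn, n_params):
    """Penalized mean squared deviation between empirical and fitted survival."""
    pairs = empirical_survival(failure_times)
    n = len(pairs)
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    sse = sum((s_hat - surv_fn(t)) ** 2 for t, s_hat in pairs)
    return sse / (n - n_params)

def rank_models(failure_times, candidates):
    """Rank candidates, given as {name: (surv_fn, n_params)}, by ascending GOF."""
    scores = {name: gof(failure_times, fn, p) for name, (fn, p) in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Because the divisor shrinks as p grows, a 4-parameter model must reduce the squared deviations by more than a 2-parameter model does to be ranked ahead of it.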
We also performed lack-of-fit tests based on the K-M nonparametric curve (Eq. 10). The test statistic for the Kolmogorov-Smirnov (K-S) test is the absolute value of the largest discrepancy between $\hat{S}(t_i)$ and $S(t_i)$ anywhere along the fitted curve, i.e.,
$$D = \max_i \left|\hat{S}(t_i) - S(t_i)\right|. \tag{12}$$
Whereas the traditional K-S test assumes the theoretical distribution being tested and its parameters are specified a priori, in our case, they were estimated from the data. Therefore, we used the approach of Lilliefors [23], where the test distribution under the null hypothesis was simulated from the fitted model via parametric bootstrap. For each replicate test performed, a random sample of size n was drawn from the fitted parametric survival function and the value D calculated. This simulation process was replicated 50,000 times to create a distribution ($D_{sim}$) under the null hypothesis to which the actual observed statistic ($D_0$) was compared. This number of simulations was selected to guarantee a precision of ± 0.004 in the estimated P-values ($\sqrt{z_{0.975}^2 \cdot 0.5 \cdot 0.5/50000}$). Estimated P-values for the Lilliefors tests are reported in Additional file 1, based on an α = 0.05 rejection criterion. Whereas GOF provided a measure of relative goodness-of-fit to compare alternative models, the K-S test assessed whether there was a significant lack-of-fit of the selected model (i.e., $H_0$: model fits vs. $H_a$: model does not fit). By construction, the GOF and the $D_0$ of the K-S test are positively correlated.
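The bootstrap test can be sketched in Python under several loudly stated assumptions. First, the sketch uses the exponential model purely because its maximum likelihood estimate has a closed form (rate = n / sum of failure times); the study excluded that model, and any fitted model with a sampler could be substituted. Second, the model is refit to each simulated sample before D is computed, which is our reading of a Lilliefors-type procedure rather than a detail stated in the text. Function names and data are hypothetical, and far fewer replicates than 50,000 are used to keep the example fast.

```python
import math
import random
from statistics import NormalDist

def ks_stat(times, surv_fn):
    """Largest absolute gap between empirical and fitted survival at failure times."""
    ts = sorted(times)
    n = len(ts)
    return max(abs((n - (i + 1)) / n - surv_fn(t)) for i, t in enumerate(ts))

def fit_exponential(times):
    """Closed-form MLE of the exponential rate (stand-in for a real model fit)."""
    return len(times) / sum(times)

def bootstrap_ks_pvalue(times, n_sim=1000, seed=1):
    """Parametric-bootstrap P-value for the K-S statistic of the fitted model."""
    rng = random.Random(seed)
    n = len(times)
    rate = fit_exponential(times)
    d_obs = ks_stat(times, lambda t: math.exp(-rate * t))
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.expovariate(rate) for _ in range(n)]
        rate_sim = fit_exponential(sim)  # refit on each replicate (assumption)
        d_sim = ks_stat(sim, lambda t, r=rate_sim: math.exp(-r * t))
        if d_sim >= d_obs:
            exceed += 1
    return d_obs, exceed / n_sim

def mc_half_width(n_sim, level=0.95):
    """Worst-case (p = 0.5) half-width of a simulated P-value's confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * math.sqrt(0.5 * 0.5 / n_sim)

# Hypothetical battery-type failures clustered between 14 and 18 days.
tag_days = [14.0 + 0.1 * i for i in range(40)]
d_obs, p_value = bootstrap_ks_pvalue(tag_days)
```

Failure times tightly clustered around the battery life are far from exponential, so the observed D is large and the simulated P-value is small. The helper `mc_half_width(50000)` evaluates the precision expression quoted in the text.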

Results
Two types of tag failure were observed in our meta-analyses. The first is premature tag failure occurring within hours or just a few days after tag initiation. This tag failure is presumably the result of manufacturing error or mechanical failure of the tag per se. Of the 42 data sets we examined, at least 26 had obvious signs of premature mechanical failure. The second failure type was the anticipated battery failure at the end of the tag life. This battery failure produces the cascade of failure times seen in the right tail of the failure-time curves (Fig. 2). Although our set of 42 tag-life studies was ill constructed for the direct purposes of determining factors affecting tag-life, a few patterns were apparent. Manufacturing quality improved over time as indicated by fewer and less-frequent occurrences of premature tag failure, tag size (i.e., weight) decreased, and the tag-life to tag-weight ratio increased.
In 24 of the 42 cases (57%), a vitality model (2009 or 2013) was selected as the best fit among the nine alternative parametric failure-time models evaluated. The log-logistic model was the second most common (19%) choice, followed by the gamma or generalized gamma (14%), Gompertz (5%), 3-parameter Weibull (2%), and log-normal (2%). In numerous cases there was little difference in GOF between first, second, or even third choices of survival models. The two versions of the vitality model (2009 and 2013) were found to be top-ranking with equal frequency (12 data sets each), suggesting that neither version was clearly superior from the standpoint of model fit. The two vitality models were ranked second best in an additional 17% of the cases. Both versions of the gamma model also performed equally well (3 data sets each). We encourage readers to examine the supplemental data, model fits, and the impact of premature tag failure on the tag-life curves.
The vitality models often outperformed other candidates because they could account for both early failures defining the shoulder of the function and the later precipitous decline due to battery failure. The log-logistic model fit these initial failures better than the remaining candidate models, although their survival functions were almost always positioned above those of the vitality models in the shoulder of the curve.
In all cases, the top-ranking survival model according to GOF was not rejected by the K-S test of lack-of-fit (P > 0.05). However, we found the K-S test to be insensitive to lack-of-fit. The K-S test rejected a fitted model only 60 times out of 378 (15.9%; 42 data sets × 9 candidate models) despite visually obvious cases of lack-of-fit. Therefore, non-rejection of a K-S test should not be the sole criterion for model selection. Nevertheless, we found a strong inverse relationship between the natural log of the GOF value and the P-value of the K-S tests (r = −0.79, P < 0.001). With P-values ranging from 0 to 1, P-values near 1 indicated smaller discrepancy between observed and fitted values of the failure-time data. Using the K-S maximum P-value as a criterion for model selection, the vitality models were again selected in 57% of the cases studied, followed again by the log-logistic model at 19%.
In addition to being most frequently top-ranking, the vitality models also demonstrated considerable flexibility in the shape of the survival curves. We found that many of the tag-life data sets could be categorized as having a particular shape to which one of the conventional failure models was best suited. For example, gamma models tended to be top-ranking for data sets with survival curves resembling a half-normal distribution. Although vitality models were not always top-ranked for these cases, they consistently provided a fit that was competitive with the other top-ranked models because they could emulate the shape of their survival functions (Fig. 2).

Discussion
Tagging studies with the objectives of describing fish movement and life history often do not include tag-life studies as part of the investigation. Such studies are designed based on the anticipated life expectancy of the tags and the temporal requirements of the investigation [2]. On the other hand, fish survival studies based on regulatory requirements with mandated survival thresholds will generally need to include formal tag-life studies [32]. Without the ancillary tag-life information, perceived survival estimates calculated by classic release-recapture models will be negatively biased by the presence of tag failure [13,35]. The size of the potential bias increases as the expected tag-life decreases and the expected travel time to detection sites increases. At the point where the travel times begin exceeding maximum tag life, bias correction becomes incomplete and the negative bias of the survival estimates increases. When actual fish survival is close to the regulatory thresholds, even small bias corrections can be consequential. For example, the compliance threshold for yearling Chinook salmon (Oncorhynchus tshawytscha) and steelhead smolt survival through a hydroelectric project (i.e., reservoir plus dam) in the mid-Columbia River is typically ≥ 93%, with an estimated standard error of ≤ 0.025 [28,29,32]. At federally operated hydroprojects in the lower Snake River and mainstem Columbia River, dam passage survival has a threshold of 0.96 for yearling Chinook salmon and steelhead smolts or 0.93 for subyearling Chinook salmon with a precision requirement of SE ≤ 0.015. Here even small tag-life corrections of less than a percentage point can be important.
Rarely if ever do acoustic-tag manufacturers provide the results of a tag-life study as part of a tag-lot purchase. At best, manufacturers may provide a life expectancy for their products. But the meaning of, say, a 30-day tag is at best unclear. The average tag investigator may wrongly interpret a 30-day tag as guaranteeing all tags will have a minimum tag-life of 30 days. Instead, a 30-day tag-life expectancy actually guarantees some tags will indeed fail before 30 days. For example, the gamma-fitted tag-failure time data of Fig. 2 had a tag-life expectancy of 15.4 days, with minimum and maximum failure times of 8.5 and 18.0 days, respectively. In that data set, 44% of the tags failed before the expected tag life of 15.4 days. For the log-logistic fitted tag-failure time data of Fig. 2, 51% of the tags failed before the expected tag-life of 15.3 days. Consequently, for investigators designing their studies based on tag-life expectancy, corrections for tag failure may be essential. To avoid possible effects of tag failure and the need to provide tag-life corrections to survival studies, investigators would need to use tag lots with life expectations several times longer than expected maximum travel times. Among our 42 data sets, 62% had tag-failure times greater than 3 standard deviations to the left of the mean, and 93% had tag failures 2 standard deviations to the left of the mean. We recommend that, at a minimum, all tagged fish arrival times occur within the upper shoulder of the failure-time curve in order for tag-life corrections to be small and tractable.
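The point that a stated life expectancy implies many earlier failures can be illustrated with a short Python sketch. It uses a log-logistic distribution with hypothetical shape and scale values (not the fitted values from Fig. 2) and standard closed forms: the mean is scale · (π/shape) / sin(π/shape) for shape > 1, and the cumulative distribution function is 1 / (1 + (t/scale)^(−shape)).

```python
import math

def loglogistic_mean(shape, scale):
    """Mean of the log-logistic distribution; finite only when shape > 1."""
    if shape <= 1.0:
        raise ValueError("mean is undefined for shape <= 1")
    x = math.pi / shape
    return scale * x / math.sin(x)

def loglogistic_cdf(t, shape, scale):
    """Probability that a tag has failed by time t (t > 0)."""
    return 1.0 / (1.0 + (t / scale) ** (-shape))

# Hypothetical parameters: median tag life of 15 days with a steep decline.
shape, scale = 30.0, 15.0
mean_life = loglogistic_mean(shape, scale)
frac_failed_before_mean = loglogistic_cdf(mean_life, shape, scale)
```

For these illustrative values, slightly more than half of the tags are expected to fail before the mean tag life, because the mean of a log-logistic distribution lies above its median.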
We found clear evidence to support the use of the vitality models for tag-life correction on the basis that these models were top-ranking in terms of GOF for the majority of data sets and exhibited a variety of survival function shapes that matched empirical tag-life data sets (Fig. 2). We do not recommend that model selection be based solely on the non-rejection of the K-S lack-of-fit test, as the test is rather conservative in the range of sample sizes (38 to 125) we evaluated. We instead recommend that investigators evaluate the GOF of their tag-life data to a suite of alternative survivorship models using both ocular and numerical evaluations of model fit. Among these, the alternative models should include vitality, log-logistic, and the gamma family of models.
We found the Vitality (2009) model to be preferable to the Vitality (2013) model. The GOF measure did not suggest clear dominance of one version of the vitality model over another. However, we found the tag-failure process to be more analogous to the Vitality (2009) model, which assumes early failures as a result of a variability in initial vitalities in the population followed by a stochastic decline. While the Vitality (2013) has some similar properties, it further assumes that individuals encounter challenges of varying magnitude over a lifetime, which is not particularly representative of the process that acoustic tags undergo. Our second reason for favoring the Vitality (2009) model was that the survival curve for this model was less frequently above the K-M estimates in the shoulder of the curve than its counterpart.
In our experience, the shoulder of the survivorship curve is where most of the tag-life correction occurs and therefore should be estimated with greatest accuracy. A common reason for the poor fit of many models was that the curve descended too early, "cutting off" the shoulder present in the empirical data. Proceeding with a model misspecified in this manner would result in an overcorrection of survival estimates. Poor fit in the shoulder of some of the tag-life data was partly what motivated our experimentation with the vitality class of models. This shortcoming was common to all models that we compared, with the exception of the vitality models and, to a lesser extent, the log-logistic model. In fact, Weibull, log-normal, and gamma models only properly fit tag-life data sets without any early outlying failures. The Gompertz model was somewhat of an exception in that it was competitive with the vitality models in 6 out of 42 cases (14%), where the initial decline in tag-life was relatively steep.
While providing tag-life-corrected survival estimates is within the reach of all investigators, it remains the responsibility of individual studies to determine the appropriateness of collecting this expensive auxiliary information. It must be acknowledged that tag-life studies are costly and there are important tradeoffs involved in conducting them. Cost considerations occur at two levels. First, there is a question of whether an independent tag-life study is warranted for a particular survival study. Second, there is a question of the number of tags that should be used. There are situations in which a single tag-life study may be applied to multiple release groups. However, it may be necessary to adjust the tag-life corrections for dissimilar release schedules. With respect to the second consideration, acoustic tags cost approximately $200-$250 each, resulting in a tag-life study costing $10,000-$25,000 if 50-100 tags are used. In our experience with juvenile salmonid acoustic-tag studies, sample sizes for tag-life studies should range between 50 and 100 tags. With 50 tags, the standard errors of the survival estimates are typically increased at the second and third decimal place. With 100 tags, the standard errors are changed at the third or fourth decimal place with the incorporation of the variability in tag-life data. Admittedly, not all studies warrant the same degree of precision as the survival estimates in our case studies. However, it is worth noting that the lower the sample size, the greater the chance that none of the tags sampled for the tag-life study will possess defects that are actually present in the tag population, in which case the early-failure process will not be incorporated into the correction.
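The cost range and the sample-size caution above lend themselves to a brief worked example in Python. The cost function simply multiplies tag count by unit price. The second function is an assumption of ours, not a formula from the text: it treats the sampled tags as independent draws from a large lot in which a fixed proportion is defective, so the chance that a sample contains no defective tag is (1 − proportion)^n. The 2% defect proportion is purely hypothetical.

```python
def study_cost(n_tags, unit_cost):
    """Tag cost of a tag-life study in dollars (tags only, no labor or equipment)."""
    return n_tags * unit_cost

def prob_no_defective(n_tags, defect_proportion):
    """Chance that none of n independently sampled tags is defective.

    Assumes a lot large enough that sampling without replacement is well
    approximated by independent draws.
    """
    return (1.0 - defect_proportion) ** n_tags

low_cost = study_cost(50, 200)    # smallest sample at the lower unit price
high_cost = study_cost(100, 250)  # largest sample at the higher unit price

# Hypothetical lot in which 2% of tags carry a defect causing early failure.
miss_with_50 = prob_no_defective(50, 0.02)
miss_with_100 = prob_no_defective(100, 0.02)
```

Under these illustrative assumptions, doubling the sample from 50 to 100 tags squares the probability of missing the early-failure process entirely, which is the tradeoff against the added tag cost.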
Another important consideration when applying tag-life corrections is whether it is appropriate to perform a censored analysis of the tag-life data. There are at least two scenarios where a right-censored tag-life analysis may be useful and appropriate. The first scenario occurs when the tag-life study is stopped/truncated before the last tag failure. In this case, a right-truncated failure-time analysis is essential. Let T be the time of truncation; then the maximum likelihood estimates of the truncated model are based on the likelihood
$$L = \left[\prod_{i=1}^{r} f(t_i)\right]\left[S(T)\right]^{n-r}, \tag{13}$$
where n is the number of tags in the tag-life study and r is the number of tags that failed on or before the truncation time T. A second truncation scenario can occur when the observed fish travel times are relatively short compared to the observed tag-failure times and it is more accurate and easier to model tag-failure times to some truncation point beyond the longest travel time. This truncation strategy is useful when failure-time distributions have difficulty fitting both the shoulder and tail of the failure-time curve. When inferences near the tail of the failure-time distributions are unnecessary, a truncated right-tailed analysis may do a better job fitting the shoulder of the survivorship curve where travel times are likely more relevant.
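A hedged Python sketch of the likelihood for a study stopped at time T follows. It assumes the standard right-censored form, in which each of the r observed failures contributes its density f(t_i) and each of the n − r tags still operating at T contributes the survival probability S(T), and it uses a 2-parameter Weibull in the scale parameterization as the example model. The function name and data are hypothetical.

```python
import math

def censored_loglik_weibull(failed_times, n_total, t_stop, shape, scale):
    """Log-likelihood when the study stops at t_stop with some tags still alive.

    failed_times : failure times observed on or before t_stop
    n_total      : number of tags placed in the tag-life study
    """
    r = len(failed_times)
    if r > n_total or any(t > t_stop for t in failed_times):
        raise ValueError("inconsistent failure data")
    loglik = 0.0
    for t in failed_times:
        z = t / scale
        # log f(t) for the Weibull density
        loglik += math.log(shape / scale) + (shape - 1.0) * math.log(z) - z ** shape
    # log S(t_stop) for each tag still operating when monitoring ended
    loglik += (n_total - r) * (-((t_stop / scale) ** shape))
    return loglik

# Hypothetical study: 5 tags, monitoring stopped at day 15 with 2 still active.
failed = [5.0, 8.0, 12.0]
# With shape fixed at 1, the censored MLE of scale is total exposure / failures.
scale_hat = (sum(failed) + 2 * 15.0) / len(failed)
ll_hat = censored_loglik_weibull(failed, 5, 15.0, 1.0, scale_hat)
```

In practice this function would be handed to a numerical optimizer to estimate shape and scale jointly; the closed-form check with shape fixed at 1 is included only to confirm the bookkeeping.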
Ideally, the duration of the survival studies should be timed to be completed while still in the left-hand shoulder of the tag-life curves. Should the duration of the survival study coincide with the right-hand cascade of tag failures, tag-life corrections will be greater and consequences to precision more profound. In the case where the duration of the survival study exceeds the tag-life curve, tag-life corrections will be underestimated, and the survival estimates will remain negatively biased to an unknown extent. Consequently, despite the mathematical ability to account for tag failure, it remains important to coordinate the duration of the field study with tag selection and function. Harnish et al. [12] discussed an issue of tag-life correction unforeseen by Townsend et al. [35]. In their case, tag failures were so severe that they also caused an apparent negative bias in the distribution of arrival times.
The arrival times of acoustic-tagged fish are also a reflection of the tag-failure process. Properly, it is a mixture of distributions from both the travel time and tag-failure process. As a result, the tag-life corrections described in Townsend et al. [35] and Cowen and Schwartz [8] are more correctly termed bias adjustments than bias corrections. Harnish et al. [12] identified this second source of bias by having independent travel time data from acoustic-tagged fish that were dual-tagged with PIT-tags [27] not subject to tag failure. For investigators without the luxury of using dual-tagged fish, the prospect of residual bias after tag-life correction may exist. The prospect of this residual bias increases with steepness of the failure-time curve and the discrepancy between actual travel times and observed range of failure times in the tag-life study.
This paper describes a meta-analysis of the performance of various models in fitting tag-life data sets and draws on extensive experience related to the application of tag-life correction to juvenile salmonid survival studies. We direct investigators to the freeware Program ATLAS (Active Tag Life Adjusted Survival), which can be used to interactively examine alternative tag-life models (e.g., vitality, Weibull), perform truncated tag-life analyses, and obtain tag-life corrected fish survival estimates (http://www.cbr.washington.edu/analysis/apps/atlas). Other software available to analyze a range of