Processing and visualising association data from animal-borne proximity loggers

With increasing interest in animal social networks, field biologists have started exploring the use of advanced tracking technologies for mapping social encounters in free-ranging subjects. Proximity logging, which involves the use of animal-borne tags with the capacity for two-way communication, has attracted particular attention in recent years. While the basic rationale of proximity logging is straightforward, systems generate very large datasets which pose considerable challenges in terms of processing and visualisation. Technical aspects of data handling are crucial for the success of proximity-logging studies, yet are only rarely reported in full detail. Here, we describe the procedures we employed for mining the data generated by a recent deployment of a novel proximity-logging system, “Encounternet”, to study social-network dynamics in tool-using New Caledonian crows. Our field deployment of an Encounternet system produced some 240,000 encounter logs for 33 crows over a 19-day study period. Using this dataset, we illustrate a range of procedures, including: examination of tag reciprocity (i.e. whether both tags participating in an encounter detected the encounter and, if so, whether their records differed); filtering of data according to a predetermined signal-strength criterion (to enable analyses that focus on encounters within a particular distance range); amalgamation of temporally clustered encounter logs (to remove data artefacts and to enable robust analysis of biological patterns); and visualisation of dynamic network data as timeline plots (which can be used, among other things, to visualise the simulated diffusion of information). Researchers wishing to study animal social networks with proximity-logging systems should be aware of the complexities involved. Successful data analysis requires not only a sound understanding of hardware and software operation, but also bioinformatics expertise. 
Our paper aims to facilitate future projects by explaining in detail some of the subtleties that are easily overlooked in first-pass analyses, but are key for reaching valid biological conclusions. We hope that this work will prove useful to other researchers, especially when read in conjunction with three recently published companion papers that report aspects of system calibration and key results.


Background
Animal social networks (ASN) are usually constructed from data on the spatiotemporal co-occurrence of identifiable subjects (reviews: [1][2][3]). Whenever two animals come within a pre-defined distance of each other, an 'association' (sometimes also called an 'encounter' or 'contact') is recorded for the dyad, which can be represented graphically as an 'edge' in a social network. Directly observing wild animals is often challenging, and in most study systems produces datasets that are biased (some subjects are easier to observe than others) and may be too sparse for robust statistical analyses (focal subjects are usually observed in the order of once per month, week, or day). With increasing interest in the dynamics and drivers of ASN topology [4][5][6][7], research areas that require particularly large amounts of high-quality data, field biologists have started exploring opportunities for automated data collection (review: [8]).
Two types of encounter mapping technology can be distinguished (see schematic in Fig. 1; [8]). With 'indirect encounter mapping', the spatiotemporal movements of tagged animals are tracked individually, and co-occurrence patterns are inferred post hoc at the data analysis stage. This includes, for example, the use of VHF (very high frequency) radio-telemetry [9] or GPS (global positioning system) logging [10] to locate animals (yielding time-stamped X and Y coordinates), or more recently, of PIT/RFID (passive integrated transponder/radiofrequency identification) tags [11] that are detected by a grid of stationary reading stations (yielding time-stamped visitation data). In contrast, 'direct encounter mapping' involves the use of animal-mounted tags, so-called proximity loggers (or 'business card' tags; [12]), that communicate with each other to produce reciprocated records of social contacts (in the form of time-stamped encounter logs; Fig. 2). Direct encounter mapping can thus occur when animals associate away from fixed reading stations and in habitats where movement tracking would be challenging (e.g. because forest cover limits the use of GPS). Proximity loggers are 'transceiver' tags that both transmit and receive radio signals (acoustic versions for aquatic habitats are available; [12,13]) and exploit the fact that radio signals attenuate predictably with distance. The technology can therefore be used to make inferences about the 'proximity' of associating individuals (see below, and for a detailed discussion, [14]), but data on the physical locations of encounters are usually lacking (but see [15,16]). Georeferencing the data collected by proximity loggers remains a major challenge [16], but promises unprecedented insights into the spatiotemporal dynamics of a wide range of biological processes.
We have recently conducted the first full-scale deployment of a novel proximity-logging system, "Encounternet" (Encounternet LLC, Seattle, Washington, USA), to investigate the social networks of tool-using New Caledonian crows Corvus moneduloides. As explained in detail below, Encounternet is a fully digital proximity-logging technology, which unlike other commercially available terrestrial systems [17][18][19][20][21][22] enables tag-to-tag communication over distances well in excess of 10 m (other systems usually transmit over a few metres) and records raw signal-strength data for encounters [other systems record detections as binary (yes/no) data]. In earlier papers, we have described how we calibrated our system for field deployment [14] and reported the analysis of both time-aggregated [23] and dynamic network data [15]. Here, we explain the basic procedures for processing and visualising proximity-logger data, focussing specifically on Encounternet-unique features (for an earlier study on tags developed by Sirtrack Ltd., see [24]) and on some subtleties that may be easily overlooked by first-time users. Taken together, our four papers [14, 15, 23, this study] provide a comprehensive description of how to use Encounternet and similar wireless sensor network (WSN) technology [25,26] to study the social dynamics of free-ranging animals.
Fig. 1 [...] usually, data are recorded in binary form (encounter yes/no), but some systems, like "Encounternet", store raw signal-strength data that can later be converted into estimates of tag-to-tag (and hence animal-to-animal) distance; for details, see main text and [14]
Fig. 2 [...] Table 1). c At the analysis stage, the encounter between A and B can be reconstructed from their respective log files. The upper plot shows the encounter according to what tag A received, and the lower plot according to what tag B received. The disparity in start and end times for the encounter, as recorded by A and B, arises from the difference in times at which tags A and B transmit radio pulses

Proximity-logging technology
The Encounternet system consists of animal-mounted loggers (henceforth 'tags' for simplicity) and a grid of fixed receiver stations ('basestations'), which are used for downloading data remotely from tags (for photos of hardware, see [14]). Each tag emits uniquely ID-coded radio pulses at regular, user-defined time intervals (here 20 s; see below) and continually 'listens' for the signals of other tags. When two tags come within reception range of each other, each tag opens a log file which records data about the encounter: the received ID code, the start and end times of the encounter and a measure of signal strength (for sample data, see Table 1). These data comprise a 'reciprocated encounter'. An example of the timings of pulses transmitted and received by two tags during an encounter is shown schematically in Fig. 2a; Fig. 2b illustrates how data would be logged by each tag. Without independent knowledge of the timings of pulses, the encounter would be reconstructed from the log files as shown in Fig. 2c. Figure 2 demonstrates that a phase offset between the transmission times of the two tags can cause a disparity in the start and end times of the encounter recorded by each tag (but this should be less than the programmed pulse interval).
During an encounter, signal strength is recorded as a 'received signal-strength indicator' (RSSI) value, which is a measure of the power ratio (in dB) of the received signal and an arbitrary reference (for details, see [14]); the RSSI value is converted to an integer for recording and is henceforth treated as unitless. For each encounter log, which consists of (up to) a pre-programmed number of consecutively received radio pulses, the minimum, maximum and mean RSSI (RSSI min, RSSI max and RSSI mean) values of the pulse sequence are recorded (Table 1). The proximity of the tags can later be estimated from RSSI values using an appropriate calibration curve [14,27].
In the present study, we programmed tags to emit pulses every 20 s, which is significantly less than the timescales over which crows' fission-fusion dynamics are expected to occur (minutes to tens of minutes; see [23]). Tags are unable to receive signals during the brief periods (several milliseconds) when they are transmitting, so although slight differences in on-board clock times (generated by tag-specific drift rates) ensured that phase synchrony was unlikely, the exact transmission times were jittered by multiples of 1/3 s up to ± 4/3 s to minimise this possibility.
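The effect of the phase offset described above can be illustrated with a minimal sketch. It assumes an idealised encounter in which two tags are within range for a fixed time window, no pulses are missed and transmission jitter is ignored; the function name `received_window` and the example phases are hypothetical and are not part of the Encounternet software.

```python
import math

PULSE_INTERVAL = 20.0  # seconds, the pulse interval programmed in this study


def received_window(tx_phase, in_range_start, in_range_end,
                    interval=PULSE_INTERVAL):
    """Return (first, last) reception times for pulses emitted at
    tx_phase + k * interval that fall inside the in-range window.

    Idealised assumption: every pulse emitted while the tags are in
    range is received. Returns None if no pulse falls in the window.
    """
    k_first = math.ceil((in_range_start - tx_phase) / interval)
    k_last = math.floor((in_range_end - tx_phase) / interval)
    if k_last < k_first:
        return None
    return (tx_phase + k_first * interval, tx_phase + k_last * interval)


# Hypothetical example: tag A transmits at t = 3, 23, 43, ... s and
# tag B at t = 11, 31, 51, ... s; the birds are in range from 100 to 400 s.
log_at_b = received_window(3.0, 100.0, 400.0)   # pulses from A, logged by B
log_at_a = received_window(11.0, 100.0, 400.0)  # pulses from B, logged by A

start_disparity = abs(log_at_a[0] - log_at_b[0])
end_disparity = abs(log_at_a[1] - log_at_b[1])
print(log_at_b, log_at_a, start_disparity, end_disparity)
```

In this idealised setting, the two tags record start and end times that differ by the phase offset (here 8 s), which is always smaller than the pulse interval.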

Field deployment
In October 2011, we deployed Encounternet tags on 41 wild New Caledonian crows in one of our long-term study populations (for biological rationale of the study, see [23], and for background on the study species, see [28]); four tags failed after 4-11 days of transmission and a further four yielded no data, leaving 33 birds for analysis. Tags were attached to crows using weak-link harnesses which were designed to degrade over time, to release devices after the study. The data were collected via 45 basestations deployed in the study area. We have provided a full description of our field procedures elsewhere [15,23].

Preliminary data processing and analysis
Data were recorded for 19 days, amassing ca. 240,000 encounter logs, with all 33 crows participating in at least one association. The encounters analysed (both here and in [15]) were restricted to those recorded between sunrise and sunset only, which constituted a sample of ca. 177,000 logs. Recorded RSSI values ranged from −61 to +60, corresponding to distances of over 50 m to within 1 m (for calibration results, see [14]). The distribution of RSSI mean values for all encounter logs is shown in Fig. 3a; the sharp peak at RSSI mean = 0 was caused by a bug in the tags' firmware [23] and is not due to the behaviour of the tagged animals, as suggested by another study [29].

Table 1 Sample encounter logs recorded by crow-mounted "Encounternet" proximity loggers
'this.ID' and 'enc.ID' are the identities of the receiving and transmitting tags, respectively; 'first.time' and 'last.time' are the start and end time of an encounter (recorded in tag-clock units of 1/64 s); the following three 'RSSI' columns give signal-strength statistics for the pulse sequence making up the encounter; and 'type' codes distinguish, among other things, tag-to-tag logs (type = 1) from error messages and master node commands (not applicable in this example).
The distribution of encounter log durations is shown in Fig. 3b. The peaks at multiples of 20 s are a result of the tags' programmed pulse rate (see above and Fig. 2). Tags created a single log for each encounter up to a maximum of 15 received pulses, giving a peak in recorded log durations at 300 s. Because pulses could occasionally be missed (for example, because of a temporary obstruction between the birds), tags did not 'close' encounter logs until no pulse had been received from the other tag for six consecutive pulse intervals (6 × 20 s = 120 s); when this occurred, the end time was recorded as the time of the last received pulse. There is thus a second peak at 320 s (one missed pulse during the encounter), a smaller one at 340 s (two missed pulses) and so on. If more than 15 pulses were received during an encounter, successive log files were created. Grouping encounter log durations by 10-point RSSI mean bins reveals that long-distance encounters are much shorter than close-range ones (Fig. 3c).
Figure 4 contains a simple visualisation of a day's worth of encounter logs for two different pairs of crows. It can be seen that there is considerable variation in signal strength from one encounter log to the next, and that reciprocated encounter logs do not match exactly either in timing or in signal strength. The majority of encounter logs appear to have roughly the same duration (ca. 300 s, as per our pre-programmed 15-pulse limit), and successive encounter logs are separated by a small gap of around 20 s (more easily visible in Fig. 5), which is another consequence of tags emitting a pulse every 20 s.
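The log-closing rules described above can be sketched as a simple segmentation of received pulse times. This is an illustrative model only, not the tags' firmware: the function name `segment_pulses`, the tuple layout and the choice to close a log when the silence strictly exceeds 120 s are assumptions. How the firmware time-stamps the start of a log is not specified here, so the first-to-last spans produced by this sketch need not equal the recorded log durations.

```python
def segment_pulses(pulse_times, max_pulses=15, interval=20.0,
                   timeout_intervals=6):
    """Split received pulse times (in seconds) into encounter logs.

    A log is closed when it already holds `max_pulses` pulses, or when
    the gap since the last received pulse exceeds the timeout
    (assumed here to be timeout_intervals * interval = 120 s).
    Each log is returned as (first_time, last_time, n_pulses); the end
    time is the time of the last received pulse.
    """
    timeout = timeout_intervals * interval
    logs = []
    current = []
    for t in sorted(pulse_times):
        log_full = len(current) >= max_pulses
        timed_out = bool(current) and (t - current[-1] > timeout)
        if log_full or timed_out:
            logs.append((current[0], current[-1], len(current)))
            current = []
        current.append(t)
    if current:
        logs.append((current[0], current[-1], len(current)))
    return logs


# Hypothetical example: 20 pulses received without interruption are split
# into a full 15-pulse log and a successive 5-pulse log.
uninterrupted = [20.0 * k for k in range(20)]
print(segment_pulses(uninterrupted))
```

In this model, successive logs of a continuing encounter are separated by one pulse interval, and a single missed pulse lengthens a log by 20 s rather than closing it.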

Filtering and amalgamation of reciprocated encounter logs
Spatial proximity is a symmetric proxy for association; if crow A is 10 m from crow B, then crow B is also 10 m from crow A. The logs recorded by the tags, however, are not perfectly symmetrical; for example, there will be variation in the transmitting and receiving strength of tags. Details of the factors influencing signal strength can be found in [14]. Here, we concentrate on the steps taken to clean the data, whatever the cause of the discrepancies.
In Fig. 6, we illustrate the recorded RSSI mean values of reciprocated encounter logs between five different pairs of crows on chosen days. Each plot shows the signals received by each tag of a pair plotted in red or blue. The five examples illustrate a range of ways in which reciprocated signals can differ. The first type of discrepancy is that one tag in a pair can consistently record a higher signal strength than the other (Fig. 6a, e). All five examples show that the start and end times of encounter logs can differ. In some instances, it was actually impossible to match up pairs of encounter logs between the tags. Differences in encounter log duration can be seen most easily in Fig. 6e, between 9:00 and 10:00 hours, where tag #74 (blue) records encounter logs of much shorter duration than tag #81 (red). Lastly, Fig. 6b, c shows encounter logs for two crow dyads, both involving crow #72 (blue in both plots), which did not contribute any data during the latter half of the morning.
To construct a symmetric set of encounters from the data, reciprocated signals must be amalgamated to produce a single timeline of encounters between each pair of crows. Since there were no calibration experiments performed to assess variation in tag performance (including output power and reception sensitivity; see [30,31]), there is no way of reliably determining the 'correct' signal strength for encounters. The lack of tag-specific calibration also makes it impossible to know which tags are more accurately recording start and end times of encounters. In addition to these issues, nothing is known about the height of tags above the ground, the relative orientation of the two tags (and their antennae), or the habitat where the encounter took place, all of which affect RSSI (for details, see [14,23]). We have therefore used a simple method of reconciling reciprocated encounter logs, which does not require any independent information on these factors.
Fig. 3 Properties of encounter logs recorded for a population of wild New Caledonian crows. a Distribution of RSSI mean values for all encounter logs (the peak at RSSI mean = 0 is due to a software error; see [23]). b Distribution of durations for all encounter logs over all 19 study days. c Durations of encounter logs within different RSSI mean ranges. Boxes show 25th and 75th percentiles, whiskers show 2.5th and 97.5th percentiles and medians are indicated by red lines. The distribution of durations is very similar for RSSI mean values between −10 and +50, while encounter logs at RSSI mean <−10 tend to be much shorter. Data are from [15]
The first step in amalgamating reciprocated encounter logs is to apply a filter criterion (FC), so that only logs that are likely to result from encounters of biological interest are retained for further analyses. In our study of social dynamics in New Caledonian crows, we were primarily interested in close-range encounters of birds [23], and after system calibration, settled on an FC of RSSI mean ≥15; for single radio pulses, we estimated through simulation that 50 % of pulses of an RSSI ≥15 will result from an inter-tag distance of 4.74 m or less, while 95 % of pulses will originate from within 11.29 m (for details, see [14]). Over distances of a few metres, we would expect crows to be able to observe, and socially learn from, each other, which is key for the biological process we hoped to elucidate: the possible diffusion of foraging inventions across crow networks.
Fig. 4 Examples of encounter logs for two crow dyads during daylight hours. The two examples illustrate patterns for pairs of crows that associated (a) frequently [encounter data on day 15, between crows #74 and #81, as recorded by tag #74 (blue) and tag #81 (red)]; and (b) only sporadically [encounter data on day 2, between crows #84 and #85, as recorded by tag #84 (blue) and tag #85 (red)]. Each encounter log is shown as a shaded bar, extending horizontally from the start to end time of the log, and vertically from the minimum to the maximum RSSI values recorded during the encounter; between RSSI min and RSSI mean, the bars are shaded in light blue or red, and from RSSI mean to RSSI max the bars are shaded in a darker blue or red. Data are from [15]
The steps taken to amalgamate reciprocated encounter logs are shown in Fig. 5 for real Encounternet data from tags #74 and #81, collected between 5:15 and 7:15 hours on day 14. In this example, we have amalgamated the RSSI mean values of signals transmitted by tag #74 and received by tag #81 (shown in blue) with signals transmitted by tag #81 and received by tag #74 (shown in red) (Fig. 5a). After discarding all encounter logs which do not meet the chosen FC, this leaves eight periods of association, six received by tag #81 and two by tag #74 (Fig. 5b). The first two shortly after 5:30 hours are an example of two bouts separated by a brief gap (Fig. 5b). As mentioned in the previous section, this is a result of the programmed limit of log files, to close after a maximum of 15 consecutively received 20-s pulses (=300 s). To be able to analyse the total length of time in which crows remain within range, we have concatenated consecutive encounter logs which are separated by a gap of less than 23 s (to account for the 20-s gap between pulses and give an extra 3-s 'leeway' to ensure that consecutive logs will be concatenated). Data processing resulted in four encounters (meeting the FC) between crows #74 and #81, as illustrated in the 'timeline' plot in Fig. 5c. In such plots, the timeline of a crow is represented by a black horizontal line, and green shading between two timelines indicates a period in which the two crows are engaged in an encounter (cf. Fig. 7). We note that, by defining an 'encounter' as a period when at least one tag in a dyad logs a signal strength above our FC, we retain some encounters in which one of the tags logs below the FC. This is justified, since there are many ways that environmental conditions can cause a radio signal to weaken [14,26], but few ways a signal can be boosted; false positives are therefore highly unlikely, while false negatives will occur frequently.
Figure 8 shows the effect of amalgamating encounter logs on the distribution of durations for our Encounternet deployment. Whilst the majority of encounters are between 5 and 6 min long, amalgamation at an FC of RSSI mean ≥15 revealed that crows spent up to ca. 11 min in close proximity of each other. The median 5-min encounter duration corresponds to the programmed 15-received pulse limit of log files. In many of these encounters, crows will have been close to each other for more than 5 min, but the logs recorded before and after this log will have failed to meet the FC, because the mean RSSI was 'dragged down' by pulses received when the birds were first approaching each other, and then, having closely associated, separating from each other.
Fig. 5 Schematic illustrating the filtering and amalgamation of proximity-logging data. a RSSI mean values for all encounter logs between crows #81 and #74 between 5:15 and 7:15 hours on day 14, as recorded by tag #81 (blue) and tag #74 (red). Amalgamation is performed at a given filter criterion (FC) (here, RSSI mean ≥15), as indicated by the dashed horizontal line. The first step is to discard all encounter logs which do not fulfil the FC, which produces the data shown in b. Using these data, the two crows are defined to be engaged in an encounter at any time when either tag is receiving a signal from the other tag (cf. Fig. 2). c A timeline plot indicating with green shading the times at which there is an encounter between the two crows. Consecutive encounter logs separated by a gap of less than 23 s have been concatenated to form a single encounter (see main text)
Fig. 6 Examples of reciprocated encounter logs for five crow dyads during daylight hours. Each plot shows the RSSI mean values of all encounter logs between a pair of crows during a single recording day. a Encounters logged between crow #72 and crow #75 on day 19. In general, the signal strength (RSSI) recorded by #72 was greater than that recorded by #75, suggesting variation in tag performance. b, c Two sets of reciprocated encounter data on day 5, both involving crow #72; data are missing for this tag during the latter half of the morning, which may be due to temporary tag failure or problems with uploading data to basestations. d, e Examples on days 18 and 19, respectively. Again, most of the time one of the tags consistently recorded a higher signal strength than the other. e Note the disparity in start and end times of reciprocated encounter logs, particularly between 9:00 and 10:00 hours. Data are from [15]
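The filtering and amalgamation steps described in this section can be summarised in a short sketch. It is a minimal illustration under stated assumptions rather than the code used for our analyses: encounter logs are assumed to be available as (start, end, RSSI mean) tuples with times in seconds, and the function name `amalgamate` and the example values are hypothetical. The FC of RSSI mean ≥15 and the 23-s concatenation threshold are taken from the text.

```python
def amalgamate(logs_a, logs_b, fc=15, max_gap=23.0):
    """Combine the reciprocated encounter logs of one dyad.

    logs_a, logs_b: lists of (start_s, end_s, rssi_mean) tuples, as
    received by each of the two tags (assumed layout).
    Step 1: discard logs that do not meet the filter criterion (FC).
    Step 2: treat the dyad as being in an encounter whenever either
            tag has a retained log.
    Step 3: concatenate retained periods that overlap or are separated
            by a gap of less than max_gap seconds.
    Returns a list of (start_s, end_s) encounters, sorted by start.
    """
    kept = sorted((start, end)
                  for start, end, rssi_mean in list(logs_a) + list(logs_b)
                  if rssi_mean >= fc)
    encounters = []
    for start, end in kept:
        if encounters and start - encounters[-1][1] < max_gap:
            # Overlapping or nearly adjacent: extend the current encounter.
            encounters[-1][1] = max(encounters[-1][1], end)
        else:
            encounters.append([start, end])
    return [tuple(e) for e in encounters]


# Hypothetical logs for one dyad (times in seconds since the start of day).
received_by_a = [(0, 300, 20), (320, 620, 18), (1000, 1300, 5)]
received_by_b = [(10, 310, 12), (630, 900, 16), (2000, 2100, 30)]
print(amalgamate(received_by_a, received_by_b))
```

Because a period counts as an encounter when at least one tag meets the FC, the sketch retains periods in which the other tag logged a weaker signal, in line with the rationale given above.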

Temporal network visualisation
The complete temporal dataset of amalgamated encounters can be displayed on timeline plots for all crows (cf. [32]). Figure 7 shows such a plot for 1 day's worth of encounters. Ordering crows according to ascending tag ID is not visually appealing, as many encounters (green shading) overlap with each other (Fig. 7a). One way to improve data visualisation is to place the timelines of frequently associating crows close together. An optimal ordering of the crows can be found by minimising the total area of green shading on each plot, as we have illustrated here for the first 7 days of our deployment (Fig. 7b; during which the population was not subjected to experimental manipulations; see [15]). It is easy to see that this layout makes the structure of the data much more apparent; for example, there are several pairs or triplets of crows (e.g. adults #81 and #68, and immature #74) which engage in close-range encounters with each other throughout the course of the day, suggesting that these crows have strong social bonds.
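The ordering criterion can be made concrete with a small sketch. It assumes evenly spaced timelines, so that the shaded area contributed by a dyad is proportional to its total encounter duration multiplied by the number of rows separating the two timelines. The exhaustive search shown here is only practical for a handful of animals and is not necessarily the optimisation method used for Fig. 7b; the function names and the durations in the example are hypothetical.

```python
from itertools import permutations


def shaded_area(order, durations):
    """Total shaded area for a given top-to-bottom ordering of tags.

    order: sequence of tag IDs.
    durations: dict mapping (tag_i, tag_j) to total encounter time.
    Assumes unit spacing between adjacent timelines.
    """
    position = {tag: row for row, tag in enumerate(order)}
    return sum(total * abs(position[i] - position[j])
               for (i, j), total in durations.items())


def best_order(tags, durations):
    """Exhaustive search over all orderings (feasible for small n only)."""
    return min(permutations(tags),
               key=lambda order: shaded_area(order, durations))


# Hypothetical total encounter durations (in minutes) for four crows.
durations = {(68, 81): 100, (74, 81): 80, (68, 74): 50, (72, 74): 1}
tags = [68, 72, 74, 81]
print(shaded_area(tags, durations))                         # ascending tag ID
print(shaded_area(best_order(tags, durations), durations))  # optimised
```

For a deployment of 33 tags, the number of possible orderings is far too large for exhaustive search, so a heuristic optimisation would be needed in practice.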

Discussion
Research projects using proximity-logging systems proceed through three major stages: system preparation and calibration; field deployment and data collection; and data processing and analysis. Prospective users of this technology need to be aware that each of these steps will remain a major undertaking, until hardware, field procedures and analysis techniques have become more established. In this paper, we have offered some guidance on aspects of data processing and visualisation. Once deployed, proximity-logging systems can quickly generate vast amounts of data, which may take some users by surprise (especially those that have no prior experience with biologging technologies). It is essential that research teams possess sufficient bioinformatics expertise as well as adequate infrastructure for data storage and handling.
Fig. 7 [...] The timeline of each crow is represented by a horizontal line, with green shading between two timelines indicating the period during which the two individuals were engaged in an encounter (cf. Fig. 5c). Each timeline is labelled with tag ID, age (J juvenile; I immature; A adult) and sex (F female; M male), and the labels are coloured according to community membership (for details of community assignment, see [15]). In a, crows are ordered according to ascending tag ID, while in b, the ordering has been calculated to minimise the total area of green shading (for the first 7 days of data collection). Data are from [15]
While aspects of data cleaning and processing have been described previously (e.g. [18,24,30,31]), these studies were concerned with proximity-logging systems that record encounters as binary detection data (such as the proximity tags by Sirtrack Ltd., New Zealand). In contrast, we provide the first description of techniques for a system that records raw signal-strength (i.e. RSSI) values and, therefore, enables post hoc data filtering by signal strength (and hence animal-to-animal distance) at the analysis stage. To allow further refinements of filtering procedures, we recommend that future studies quantify each tag's transmission power before deployment [30], as variation in transmission power could cause animals to appear more or less sociable than they really are [31]. Alternatively, field-recorded data could be used to assess the difference in RSSI values recorded by pairs of tags; comparison of RSSI frequency distributions may reveal differences in tag performance that could be taken into account in subsequent analyses. Our study also illustrated how certain data properties, such as encounter durations, are influenced by tag settings (such as pulse intervals; Fig. 3) and processing procedures (such as amalgamation and concatenation criteria; Fig. 8). When embarking on a proximity-logging project, it is important to recognise how these choices can affect the biological conclusions that are drawn from the data. Where possible, we encourage: pilot testing of parameter settings before field deployment, to ensure that they are suitable for mapping the biological processes of interest (e.g. [23]), and detailed sensitivity analyses at the data-mining stage, to confirm that key results are robust (e.g. [15]).
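As a rough illustration of how field-recorded data might be used to compare the RSSI values recorded by the two tags of a dyad, the sketch below computes the difference between the median RSSI mean values each tag received from the other. It assumes that logs are available as (this.ID, enc.ID, RSSI mean) tuples, following the column names of Table 1; the function name `rssi_offset` and the example values are hypothetical. Such an offset reflects the combined effect of one tag's transmission power and the other tag's reception sensitivity, which this simple comparison cannot separate.

```python
from statistics import median


def rssi_offset(logs, tag_a, tag_b):
    """Median RSSI mean received by tag_a from tag_b, minus the median
    received by tag_b from tag_a.

    logs: iterable of (this_id, enc_id, rssi_mean) tuples, where this_id
    is the receiving tag and enc_id the transmitting tag (assumed layout).
    Returns None if either direction has no logs.
    """
    a_from_b = [rssi for this_id, enc_id, rssi in logs
                if this_id == tag_a and enc_id == tag_b]
    b_from_a = [rssi for this_id, enc_id, rssi in logs
                if this_id == tag_b and enc_id == tag_a]
    if not a_from_b or not b_from_a:
        return None
    return median(a_from_b) - median(b_from_a)


# Hypothetical logs: tag 72 tends to record stronger signals than tag 75.
example_logs = [
    (72, 75, 30), (72, 75, 28), (72, 75, 32),
    (75, 72, 20), (75, 72, 22), (75, 72, 24),
    (72, 81, 10),
]
print(rssi_offset(example_logs, 72, 75))
```

A consistently positive or negative offset across many encounters of a dyad would be one indication of between-tag variation in performance.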
In many study contexts, well-established, indirect encounter mapping technologies (see "Background"; Fig. 1) will remain the method of choice; for example, for species living in open habitats, conventional GPS tracking systems can provide high-resolution datasets that are straightforward to analyse. Where proximity logging is the best option, however, its strengths should be recognised and fully exploited. First, because these systems are WSNs, data can be harvested remotely from roaming 'nodes' (animal-mounted tags) using fixed nodes (basestations) [25,26], which creates opportunities for near real-time analyses. In our study on New Caledonian crows, we used this feature to assess network parameters on a daily basis, to ascertain that a stable equilibrium state had been reached [23], before conducting experimental manipulations that were designed to disturb the network topology [15]. Achieving this level of experimental control would be impossible with most other data collection techniques, but requires careful preparation of data-handling protocols and computer hardware and software resources, to enable ad hoc analyses under field conditions. Another strength of proximity-logging systems is the high temporal data resolution they can achieve. With encounter 'checks' several times per minute for all tagged subjects, sampling rates exceed those possible with unaided field observation by several orders of magnitude. This increase in data quality creates exciting opportunities for investigating social-network dynamics [4,6,7,8,15], but brings with it new challenges in terms of data visualisation. We have provided examples of a timeline procedure (cf. [4,32]), which we have found useful in our own work, as it enabled us to examine our full dataset in an intuitive way and plan more elaborate diffusion simulations ([15]; James et al. unpubl. manuscript).
Fig. 8 The effect of amalgamation on encounter durations recorded for a population of wild New Caledonian crows. a Distribution of durations for all encounter logs which satisfy the filter criterion of RSSI mean ≥15. b Distribution of encounter durations after amalgamating logs, following procedures illustrated in Fig. 5. Data are from [15]