Title: | Canonical Data Structure for Behavioural Data |
---|---|
Description: | Implements an S3 class based on 'data.table' to efficiently store and process ethomics (high-throughput behavioural) data. |
Authors: | Quentin Geissmann [aut, cre] |
Maintainer: | Quentin Geissmann |
License: | GPL-3 |
Version: | 0.3.2 |
Built: | 2025-02-09 04:29:51 UTC |
Source: | https://github.com/rethomics/behavr |
In modern behavioural biology, it is common to record long time series of several variables (such as position, angle, fluorescence and many others) on multiple individuals. In addition to large multivariate time series, each individual is associated with a set of metavariables (e.g. sex, genotype, treatment and lifespan), which together form the metadata. Metavariables are crucial insofar as they generally "contain" the biological question. During analysis, it is therefore important to be able to access, alter and compute interactions between both variables and metavariables.
`behavr` is a class that facilitates manipulation and storage of metadata and data in the same object. It is designed to be both memory-efficient and user-friendly. For instance, it abstracts joins between data and metavariables.
```r
behavr(x, metadata)
setbehavr(x, metadata)
is.behavr(x)
```
Argument | Description |
---|---|
x | data.table containing all measurements |
metadata | data.table containing the metadata |
A `behavr` table is a `data.table`. Therefore, it can be used by any function that would work on a `data.frame` or a `data.table`. Most operations, such as variable creation, subsetting and joins, are inherited from the `data.table` `[]` operator, following the convention `DT[i, j, by]` (see the data.table package for details). These operations are applied to the data. Metadata can be accessed using `meta = TRUE`, i.e. `DT[i, j, by, meta = TRUE]`, which allows extraction of subsets, creation of metavariables, etc.

Both `x` and `metadata` should have a column set as key with the same name (typically named `id`). `behavr()` copies `x`, whilst `setbehavr()` converts `x` by reference, without copying it. `metadata` is always copied.
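As a minimal sketch of the key requirement, assuming plain `data.frame`s rather than keyed `data.table`s, the hypothetical helper `check_shared_key()` below verifies that data and metadata share a key column, that each id appears only once in the metadata, and that every id in the data has a metadata row. The last two rules are assumptions made for illustration; this is not behavr's own validation code.

```r
# Hypothetical helper (not part of behavr): checks the data/metadata
# contract described above on plain data.frames.
check_shared_key <- function(data, metadata, key = "id") {
  # Both tables must contain the key column
  if (!key %in% names(data) || !key %in% names(metadata))
    stop("both tables need a column named '", key, "'")
  # Assumption: one metadata row per individual
  if (anyDuplicated(metadata[[key]]) > 0)
    stop("duplicated ids in metadata")
  # Assumption: every individual in the data is described in the metadata
  missing_ids <- setdiff(unique(data[[key]]), metadata[[key]])
  if (length(missing_ids) > 0)
    stop("ids without metadata: ", paste(missing_ids, collapse = ", "))
  invisible(TRUE)
}

met <- data.frame(id = 1:3, sex = c("M", "F", "F"))
dat <- data.frame(id = rep(1:3, each = 4), t = rep(1:4, 3), x = rnorm(12))
check_shared_key(dat, met)  # passes silently
```

Keeping one metadata row per individual is what makes the design memory-efficient: metavariables are stored once per id rather than once per time point.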
- The relevant rethomics tutorial section – about metavariables and variables in this context
- `data.table` – on which `behavr` is based
- `xmv` – to join metavariables
- `rejoin` – to join all metadata
- `bind_behavr_list` – to merge several `behavr` tables
```r
# We generate some metadata and data
set.seed(1)
met <- data.table::data.table(id = 1:5,
                              condition = letters[1:5],
                              sex = c("M", "M", "M", "F", "F"),
                              key = "id")
data <- met[, list(t = 1L:100L,
                   x = rnorm(100),
                   y = rnorm(100),
                   eating = runif(100) > .5),
            by = "id"]

# we store them together in a behavr object d
# d is a copy of the data
d <- behavr(data, met)
print(d)
summary(d)

# we can also convert data to a behavr table without copy:
setbehavr(data, met)
print(data)
summary(data)

### Operations are just like in data.table
# row subsetting:
d[t < 10]
# column subsetting:
d[, .(id, t, x)]
# making new columns inline:
d[, x2 := 1 - x]

### Using `meta = TRUE` applies the operation on the metadata
# making new metavariables:
d[, treatment := interaction(condition, sex), meta = TRUE]
d[meta = TRUE]
```
This function is typically used to summarise (i.e. compute an aggregate of) a variable (`y`) over bins of another variable (`x`, typically time).
```r
bin_apply(data, y, x = "t", x_bin_length = mins(30),
          wrap_x_by = NULL, FUN = mean, ...)
bin_apply_all(data, ...)
```
Argument | Description |
---|---|
data | data.table or behavr table (see details) |
y | variable or expression to be aggregated |
x | variable or expression to be binned |
x_bin_length | length of the bins (same unit as …) |
wrap_x_by | numeric value defining the wrapping period |
FUN | function used to aggregate (e.g. `mean`, `median`, `sum` and so on) |
... | additional arguments to be passed to … |
`bin_apply` expects data from a single individual, whilst `bin_apply_all` works on multiple individuals identified by a unique key.

Wrapping is typically used to compute averages across several periods. For instance, `wrap_x_by = days(1)` means that bins will aggregate values across several days. In this case, the resulting `x` can be interpreted as "time relative to the onset of the day" (i.e. Zeitgeber Time).
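The binning and wrapping logic can be illustrated in base R. This is a minimal sketch, assuming bins are left-aligned (each value of `x` is floored to a multiple of `x_bin_length`) and that wrapping is done with the modulo operator; the helper `bin_sketch()` is hypothetical and is not behavr's implementation.

```r
# Hypothetical sketch (not behavr's code) of binning y over x,
# optionally wrapping x by a period before binning.
bin_sketch <- function(x, y, x_bin_length, wrap_x_by = NULL, FUN = mean) {
  # Wrapping: fold x onto a single period, e.g. time of day in seconds
  if (!is.null(wrap_x_by)) x <- x %% wrap_x_by
  # Assumed left-aligned bins: floor each x to a multiple of the bin length
  x_bin <- floor(x / x_bin_length) * x_bin_length
  # Aggregate y within each bin
  out <- aggregate(y, by = list(x = x_bin), FUN = FUN)
  names(out) <- c("x", "y")
  out
}

# Two days sampled every 10 min; "activity" is 1 during the first 12 h of each day
time_s <- seq(0, 2 * 86400 - 600, by = 600)
activity <- as.numeric((time_s %% 86400) < 43200)

# 30 min bins over the full two days: 96 bins
binned <- bin_sketch(time_s, activity, x_bin_length = 1800)
# Same bins wrapped by one day: 48 bins, each pooling both days
wrapped <- bin_sketch(time_s, activity, x_bin_length = 1800, wrap_x_by = 86400)
```

With wrapping, each of the 48 bins pools the matching half hour of both days, which is why the resulting `x` reads as time of day.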
behavr – the documentation of the `behavr` object
```r
metadata <- data.frame(id = paste0("toy_experiment|", 1:5))
dt <- toy_activity_data(metadata, duration = days(2))

# average by 30min time bins, default
dt_binned <- bin_apply_all(dt, moving)
# equivalent to
dt_binned <- dt[, bin_apply(.SD, moving), by = "id"]
# if we want the opposite of moving:
dt_binned <- bin_apply_all(dt, !moving)

# More advanced usage
dt <- toy_dam_data(metadata, duration = days(2))
# sum activity per 60 minutes
dt_binned <- bin_apply_all(dt, activity, x = t,
                           x_bin_length = mins(60), FUN = sum)
# average activity. Time in ZT
dt_binned <- bin_apply_all(dt, activity, x = t, wrap_x_by = days(1))
```
Bind all rows of both data and metadata from a list of `behavr` tables into a single one. It checks that keys, as well as the number and names of columns, are the same across all data. In addition, it refuses to bind metadata that would result in duplicates (i.e. the same id in two different metadata tables).
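The rules above can be sketched in base R, assuming each list element is a pair of plain `data.frame`s (`data`, `metadata`) rather than a `behavr` table. `bind_pairs()` and `make_pair()` are hypothetical helpers that only illustrate the column and duplicate-id checks; they are not how behavr implements binding.

```r
# Hypothetical sketch of the checks described above, on plain data.frames.
# Each element of `l` is assumed to be list(data = ..., metadata = ...).
bind_pairs <- function(l, key = "id") {
  ref_cols <- names(l[[1]]$data)
  for (p in l) {
    # Same number and names of data columns everywhere
    if (!identical(names(p$data), ref_cols))
      stop("data columns differ between elements")
  }
  data <- do.call(rbind, lapply(l, `[[`, "data"))
  metadata <- do.call(rbind, lapply(l, `[[`, "metadata"))
  # The same id must not appear in two different metadata tables
  if (anyDuplicated(metadata[[key]]) > 0)
    stop("the same id appears in several metadata tables")
  list(data = data, metadata = metadata)
}

# Hypothetical helper building a small data/metadata pair for given ids
make_pair <- function(ids) {
  list(data = data.frame(id = rep(ids, each = 3), t = rep(1:3, length(ids))),
       metadata = data.frame(id = ids, sex = "M"))
}
all_pairs <- bind_pairs(list(make_pair(1:2), make_pair(3:5)))
```

Binding `make_pair(1:2)` with `make_pair(2:3)` would fail, since id 2 would then be described by two metadata tables.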
```r
bind_behavr_list(l)
```
Argument | Description |
---|---|
l | list of `behavr` tables |

Returns a single `behavr` object.
behavr – the documentation of the `behavr` object
```r
met <- data.table::data.table(id = 1:5,
                              condition = letters[1:5],
                              sex = c("M", "M", "M", "F", "F"),
                              key = "id")
data <- met[, list(t = 1L:100L,
                   x = rnorm(100),
                   y = rnorm(100),
                   eating = runif(100) > .5),
            by = "id"]
d1 <- behavr(data, met)

# shift the ids so that the second table describes different individuals,
# then restore the keys (modifying a key column drops the key)
met[, id := id + 5]
data[, id := id + 5]
data.table::setkeyv(met, "id")
data.table::setkeyv(data, "id")
d2 <- behavr(data, met)

d_all <- bind_behavr_list(list(d1, d2))
print(d_all)
```
`meta()` returns the metadata of a `behavr` table; `setmeta()` replaces it, in place, with a new metadata table.
```r
meta(x)
setmeta(x, new)
```
Argument | Description |
---|---|
x | `behavr` object |
new | a new metadata table |

`meta()` returns a `data.table` representing the metadata in `x`.
```r
set.seed(1)
met <- data.table::data.table(id = 1:5,
                              condition = letters[1:5],
                              sex = c("M", "M", "M", "F", "F"),
                              key = "id")
data <- met[, list(t = 1L:100L,
                   x = rnorm(100),
                   y = rnorm(100),
                   eating = runif(100) > .5),
            by = "id"]
d <- behavr(data, met)

## show metadata
meta(d)
# same as:
d[meta = TRUE]

## set metadata
m <- d[meta = TRUE]
# only id < 3 is kept
setmeta(d, m[id < 3])
meta(d)
```
Print and summarise a behavr table
```r
## S3 method for class 'behavr'
print(x, ...)

## S3 method for class 'behavr'
summary(object, detailed = F, ...)
```
Argument | Description |
---|---|
x, object | `behavr` table |
... | arguments passed on to further methods |
detailed | whether the summary should be exhaustive |
behavr – to generate x
This function joins the data of a `behavr` table to its own metadata. When dealing with large data sets, it is preferable to keep metadata and data separate until a summary of the data has been computed. Indeed, joining many metavariables to very long time series may result in an unnecessarily, and sometimes prohibitively, large memory footprint.
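To make the memory argument concrete, here is a base-R sketch that uses `merge()` in place of behavr's own join and assumes plain `data.frame`s: summarising first and joining afterwards attaches each metavariable once per individual rather than once per time point.

```r
# Base-R illustration (not behavr's implementation) of "summarise first, join later".
set.seed(1)
met <- data.frame(id = 1:5,
                  condition = letters[1:5],
                  sex = c("M", "M", "M", "F", "F"))
dat <- data.frame(id = rep(1:5, each = 100),
                  t = rep(1:100, times = 5),
                  x = rnorm(500))

# Joining before summarising copies every metavariable onto all 500 rows
joined_early <- merge(dat, met, by = "id")

# Summarising first leaves one row per individual, so the join is cheap
summary_dat <- aggregate(x ~ id, data = dat, FUN = mean)
joined_late <- merge(summary_dat, met, by = "id")
```

With real ethomics data (days of recordings at sub-minute resolution and many metavariables), the gap between the two row counts grows accordingly.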
```r
rejoin(x)
```
Argument | Description |
---|---|
x | `behavr` object |
behavr – to formally create a behavr object
```r
set.seed(1)
met <- data.table::data.table(id = 1:5,
                              condition = letters[1:5],
                              sex = c("M", "M", "M", "F", "F"),
                              key = "id")
data <- met[, list(t = 1L:100L,
                   x = rnorm(100),
                   y = rnorm(100),
                   eating = runif(100) > .5),
            by = "id"]
d <- behavr(data, met)
summary_d <- d[, .(test = mean(x)), by = id]
rejoin(summary_d)
```
This function can merge rows of data from the same individual that were recorded over multiple experiments. A typical scenario in which `stitch_on` can be used is when an experiment is interrupted and a new recording is started on the same biological subjects. Stitching assumes that the user has defined, in the metadata, a unique id that refers to a specific individual. Then, any data that come from the same unique id are merged.
```r
stitch_on(x, on, time_ref = "datetime", use_time = F, time_variable = "t")
```
Argument | Description |
---|---|
x | `behavr` object |
on | name of a metavariable serving as a unique id (per individual) |
time_ref | name of a metavariable used to align time (e.g. …) |
use_time | whether to use time as well as date |
time_variable | name of the variable describing time |
When several rows of the metadata match a unique id (several experiments), the first (in time) experiment is used as the reference id. The data from the following one(s) will be added with a time lag equal to the difference between the values of `time_ref`.

When data are not aligned to circadian time, it makes sense to set `use_time = TRUE`. Otherwise, the assumption is that time is already aligned to a circadian reference, so only the date is used.
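The time-lag rule can be sketched in base R. This is a minimal illustration, assuming `time_ref` holds `POSIXct` values and that, with `use_time = FALSE`, only the calendar date (taken in UTC) is used; `stitch_lag()` is a hypothetical helper, not behavr's implementation.

```r
# Hypothetical sketch: lag (in seconds) to add to each experiment's time
# variable so that all experiments of one individual share a time origin.
stitch_lag <- function(time_ref, use_time = FALSE) {
  # Without use_time, drop the time of day and keep only the date (UTC assumed)
  if (!use_time) time_ref <- as.POSIXct(as.Date(time_ref, tz = "UTC"), tz = "UTC")
  # The earliest experiment is the reference, hence a lag of 0
  as.numeric(difftime(time_ref, min(time_ref), units = "secs"))
}

# One individual recorded in two experiments, the second started a day later
refs <- as.POSIXct(c("2015-01-02 09:30:00", "2015-01-03 10:15:00"), tz = "UTC")
stitch_lag(refs)                   # 0 86400 (date only)
stitch_lag(refs, use_time = TRUE)  # 0 89100 (24 h 45 min)
```

Each experiment's time variable would then be shifted by its lag (e.g. `t + lag`) before the rows are merged under the shared unique id.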
Returns a `behavr` table.
behavr – to formally create a behavr object
```r
set.seed(1)
met1 <- data.table::data.table(uid = 1:5, id = 1:5,
                               condition = letters[1:5],
                               sex = c("M", "M", "M", "F", "F"),
                               key = "id")
met2 <- data.table::data.table(uid = 1:4, id = 6:9,
                               condition = letters[1:4],
                               sex = c("M", "M", "M", "F"),
                               key = "id")
met1[, datetime := as.POSIXct("2015-01-02")]
met2[, datetime := as.POSIXct("2015-01-03")]
met <- rbind(met1, met2)
data.table::setkeyv(met, "id")
t <- 1L:100L
data <- met[, list(t = t,
                   x = rnorm(100),
                   y = rnorm(100),
                   eating = runif(100) > .5),
            by = "id"]
d <- behavr(data, met)
summary(d)
d2 <- stitch_on(d, on = "uid")
summary(d2)
```
Trivial functions to convert time to seconds, since `behavr` uses the second as its conventional unit of time.
```r
days(x)
hours(x)
mins(x)
```
Argument | Description |
---|---|
x | numeric vector to be converted to seconds |

Most functions in the rethomics framework use seconds as the unit of time. It is always preferable to call a function like `my_function(days(1.5))` rather than `my_function(60 * 60 * 24 * 1.5)`.
Returns the number of seconds corresponding to `x` (1 d = 86400 s, 1 h = 3600 s and 1 min = 60 s).
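The conversion factors above can be checked with plain arithmetic. The helpers below are hypothetical stand-ins written for illustration (prefixed `my_` so they do not mask behavr's own `days()`, `hours()` and `mins()`); they assume those functions simply multiply by the stated factors.

```r
# Hypothetical stand-ins reproducing the stated conversion factors
my_mins  <- function(x) x * 60
my_hours <- function(x) x * 60 * 60
my_days  <- function(x) x * 60 * 60 * 24

my_days(1.5)        # 129600 seconds, i.e. 60 * 60 * 24 * 1.5
my_hours(c(1, 12))  # 3600 43200
my_mins(30)         # 1800, the default x_bin_length of bin_apply()
```

Because the conversions are vectorised, they can be applied to whole columns as well as to single arguments.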
This function generates random data that emulates some of the features of fruit fly activity and sleep. This is designed exclusively to provide material for examples and tests as it generates "realistic" datasets of arbitrary length.
```r
toy_activity_data(metadata = NULL, seed = 1, rate_range = 1/c(60, 10),
                  duration = days(5), sampling_period = 10, ...)
toy_ethoscope_data(...)
toy_dam_data(...)
```
Argument | Description |
---|---|
metadata | `data.frame` where every row defines an individual. Typically … |
seed | random seed used (see `set.seed`) |
rate_range | parameter defining the boundaries of the rate at which animals wake up. It is uniformly distributed between animals, but fixed within each animal. |
duration | length (in seconds) of the data to generate |
sampling_period | sampling period (in seconds) of the resulting data |
... | additional arguments to be passed to … |
Returns a `behavr` table with the metadata columns as metavariables. In addition to the `id` and `t` columns, the different functions output different variables:

- `toy_activity_data`: `asleep` and `moving` (1/10 s)
- `toy_dam_data`: `activity` (1/60 s)
- `toy_ethoscope_data`: `xy_dist_log10x1000`, `has_interacted` and `x` (2/1 s)
The relevant rethomics tutorial section – explaining how to work with toy data.
behavr – to formally create a behavr object
```r
# just one animal, no metadata needed
dt <- toy_ethoscope_data(duration = days(1))

# advanced, using metadata
metadata <- data.frame(id = paste0("toy_experiment|", 1:9),
                       condition = c("A", "B", "C"))
metadata

# Data that could come from the scopr package:
dt <- toy_ethoscope_data(metadata, duration = days(1))
print(dt)

# Some DAM-like data
dt <- toy_dam_data(metadata, seed = 2, duration = days(1))
print(dt)

# data where behaviour is annotated, e.g. by a classifier
dt <- toy_activity_data(metadata, 1.5)
print(dt)
```
This function eXpands a MetaVariable from a parent behavr object. That is, it matches this variable (from metadata) to the data by id.
```r
xmv(var)
```
Argument | Description |
---|---|
var | the name of the variable to be extracted |
This function can only be called within the `[]` of a parent `behavr` object. It is intended to facilitate operations between data and metadata, for instance when one wants to modify a variable according to a metavariable.
Returns a vector of the same type as `var`, but with as many elements as there are rows in the parent data. Each row of the data is matched against the metadata for this specific variable.
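Conceptually, expanding a metavariable amounts to looking up, for each row of the data, the metadata row with the same id. The base-R sketch below does this with `match()`, assuming plain `data.frame`s and values similar to the example that follows. `xmv_sketch()` is hypothetical; the real `xmv()` takes only `var` and works inside `[]` of a `behavr` table.

```r
# Hypothetical sketch of what xmv() computes: one metadata value per data row.
xmv_sketch <- function(var, data, metadata, key = "id") {
  # Row index of the matching metadata entry for every row of the data
  idx <- match(data[[key]], metadata[[key]])
  metadata[[var]][idx]
}

dat <- data.frame(id = rep(c("A", "B"), times = c(10, 26)),
                  t = c(1:10, 5:30))
met <- data.frame(id = c("A", "B"),
                  treatment = c("w", "z"),
                  lifespan = c(19, 32),
                  ref_x = c(1, 0))

# One lifespan value per row of the data (36 values)
lifespan_per_row <- xmv_sketch("lifespan", dat, met)
# Subsetting the data on a metavariable, analogous to dt[xmv(treatment) == "w"]
subset_w <- dat[xmv_sketch("treatment", dat, met) == "w", ]
```

The expanded vector exists only for the duration of the expression, so no metavariable column has to be stored permanently in the data.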
```r
#### First, we create some data
library(data.table)
set.seed(1)
data <- data.table(
  id = rep(c("A", "B"), times = c(10, 26)),
  t = c(1:10, 5:30),
  x = rnorm(36),
  key = "id"
)
metadata <- data.table(id = c("A", "B"),
                       treatment = c("w", "z"),
                       lifespan = c(19, 32),
                       ref_x = c(1, 0),
                       key = "id")
dt <- behavr(data, metadata)
summary(dt)

#### Subsetting using metadata
dt[xmv(treatment) == "w"]
dt[xmv(lifespan) < 30]

#### Allocating new columns using metavariables
# Just joining lifespan (not necessary)
dt[, lif := xmv(lifespan)]
print(dt)
# Using a metavariable directly in an expression (more useful)
dt[, x2 := x - xmv(ref_x)]
print(dt)
```