From an old lab:
Write a function that takes two vectors of the same length and returns the total number of instances where the value is NA for both vectors. For example, given the following two vectors
c(1, NA, NA, 3, 3, 9, NA)
c(NA, 3, NA, 4, NA, NA, NA)
The function should return a value of 2, because the vectors are both NA at the third and seventh locations. Provide at least one additional test that the function works as expected.
Start with writing a function
Solve it on a test case, then generalize!
a <- c(1, NA, NA, 3, 3, 9, NA)
b <- c(NA, 3, NA, 4, NA, NA, NA)
You try first. See if you can use these vectors to find how many elements are NA in both (should be 2).
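One possible way to work through the test case step by step, before wrapping anything in a function. This is a sketch of one approach, not the only one; the intermediate name both_missing is made up for illustration.

```r
a <- c(1, NA, NA, 3, 3, 9, NA)
b <- c(NA, 3, NA, 4, NA, NA, NA)

# Logical vectors flagging the missing positions in each input
is.na(a)
is.na(b)

# TRUE only where both inputs are missing
# (both_missing is a hypothetical name, used here for illustration)
both_missing <- is.na(a) & is.na(b)
which(both_missing)  # positions 3 and 7

# TRUE counts as 1 when summed, so this is the number of shared NAs
sum(both_missing)    # 2
```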
data.frame(nums = 1:4, lets = c("a", "b"))
##   nums lets
## 1    1    a
## 2    2    b
## 3    3    a
## 4    4    b
data.frame(nums = 1:3, lets = c("a", "b"))
## Error in data.frame(nums = 1:3, lets = c("a", "b")): arguments imply differing number of rows: 3, 2
State the lengths of each
both_na <- function(x, y) {
  if (length(x) != length(y)) {
    # Report both lengths so the caller can see what went wrong
    v_lngths <- paste0("x = ", length(x), ", y = ", length(y))
    stop("Vectors are of different lengths: ", v_lngths)
  }
  sum(is.na(x) & is.na(y))
}
both_na(a, c(b, b))
## Error in both_na(a, c(b, b)): Vectors are of different lengths: x = 7, y = 14
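The lab also asks for at least one additional test. A minimal sketch of what such tests might look like, using a version of both_na() with the length check; the test vectors are invented for illustration.

```r
both_na <- function(x, y) {
  if (length(x) != length(y)) {
    v_lngths <- paste0("x = ", length(x), ", y = ", length(y))
    stop("Vectors are of different lengths: ", v_lngths)
  }
  sum(is.na(x) & is.na(y))
}

# NAs present, but never at the same position
both_na(c(1, NA, 3), c(NA, 2, 3))          # 0

# Every position is NA in both vectors
both_na(c(NA, NA), c(NA, NA))              # 2

# Works for character vectors too
both_na(c("a", NA), c(NA_character_, NA))  # 1

# Mismatched lengths should produce an error rather than a count
res <- tryCatch(both_na(1:3, 1:2), error = function(e) "error raised")
res
```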
For quick checks, use stopifnot (the error messages are usually less than optimal)
Often useful if the function is just for you
z_score <- function(x) {
  stopifnot(is.numeric(x))
  x <- x[!is.na(x)]
  (x - mean(x)) / sd(x)
}
z_score(c("a", "b", "c"))
## Error in z_score(c("a", "b", "c")): is.numeric(x) is not TRUE
z_score(c(100, 115, 112))
## [1] -1.1338934 0.7559289 0.3779645
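If the default stopifnot message is too terse, one option is to name the condition: in R 4.0.0 and later, stopifnot uses the argument name as the error message. A sketch, assuming a recent R version; the message text is made up for illustration.

```r
z_score <- function(x) {
  # The argument name is used as the error message (requires R >= 4.0.0)
  stopifnot("x must be numeric" = is.numeric(x))
  x <- x[!is.na(x)]
  (x - mean(x)) / sd(x)
}

# Capture the message instead of stopping, so it can be inspected
msg <- tryCatch(
  z_score(c("a", "b", "c")),
  error = function(e) conditionMessage(e)
)
msg
```

This keeps the check to one line while still telling the user what was expected.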
Modify your prior code so it runs, but returns a warning, if the vectors are recyclable, and returns a meaningful error message if they're different lengths and not recyclable.
Hint 1: You'll need two conditions
Hint 2: Check if the result of a division is fractional with %%, which returns the remainder in a division problem. So 8 %% 2 and 8 %% 4 both return zero (because there is no remainder), while 7 %% 2 returns 1 and 7 %% 4 returns 3.
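The arithmetic in the hint, written out, followed by a sketch of how it could be applied to two vector lengths. The helper name recyclable is hypothetical and only here for illustration.

```r
8 %% 2  # 0
8 %% 4  # 0
7 %% 2  # 1
7 %% 4  # 3

# Hypothetical helper: TRUE when one length is a whole multiple of the other
recyclable <- function(lx, ly) {
  lx %% ly == 0 || ly %% lx == 0
}

recyclable(7, 14)  # TRUE: 14 %% 7 is 0
recyclable(7, 3)   # FALSE: neither division is exact
```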
both_na <- function(x, y) {
  if (length(x) != length(y)) {
    lx <- length(x)
    ly <- length(y)
    v_lngths <- paste0("x = ", lx, ", y = ", ly)
    # Recyclable: one length is a whole multiple of the other
    if (lx %% ly == 0 || ly %% lx == 0) {
      warning("Vectors were recycled (", v_lngths, ")")
    } else {
      stop("Vectors are of different lengths and are not recyclable: ", v_lngths)
    }
  }
  sum(is.na(x) & is.na(y))
}
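A short usage sketch showing the three paths through this function: equal lengths, recyclable lengths (warning), and non-recyclable lengths (error). The function is repeated so the example runs on its own; tryCatch is used here only to capture the condition messages for inspection.

```r
both_na <- function(x, y) {
  if (length(x) != length(y)) {
    lx <- length(x)
    ly <- length(y)
    v_lngths <- paste0("x = ", lx, ", y = ", ly)
    if (lx %% ly == 0 || ly %% lx == 0) {
      warning("Vectors were recycled (", v_lngths, ")")
    } else {
      stop("Vectors are of different lengths and are not recyclable: ", v_lngths)
    }
  }
  sum(is.na(x) & is.na(y))
}

a <- c(1, NA, NA, 3, 3, 9, NA)
b <- c(NA, 3, NA, 4, NA, NA, NA)

# Same length: no warning or error
both_na(a, b)  # 2

# Lengths 7 and 14: a is recycled, so a warning is raised
w <- tryCatch(both_na(a, c(b, b)), warning = function(w) conditionMessage(w))
w

# Lengths 7 and 3: not recyclable, so an error is raised
e <- tryCatch(both_na(a, b[1:3]), error = function(e) conditionMessage(e))
e
```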
Which of these is most intuitive?
f <- function(x) {
  x <- sort(x)
  data.frame(value = x, p = ecdf(x)(x))
}

ptile <- function(x) {
  x <- sort(x)
  data.frame(value = x, ptile = ecdf(x)(x))
}

percentile_df <- function(x) {
  x <- sort(x)
  data.frame(value = x, percentile = ecdf(x)(x))
}
random_vector <- rnorm(100)
tail(percentile_df(random_vector))
##     random_vector percentile
## 95       1.826218       0.95
## 96       1.828779       0.96
## 97       1.909633       0.97
## 98       1.924716       0.98
## 99       2.127457       0.99
## 100      2.737141       1.00
head(percentile_df(rnorm(50)))
##    rnorm_50 percentile
## 1 -2.080872       0.02
## 2 -1.792119       0.04
## 3 -1.748559       0.06
## 4 -1.314279       0.08
## 5 -1.246780       0.10
## 6 -1.243942       0.12
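The first column in the printed output is named after the argument that was passed (random_vector, rnorm_50), which a definition using value = x would not produce. The code that does this is not shown here; the following is only a guess at one way to get that behavior, assuming deparse(substitute(x)) and a simple clean-up of parentheses.

```r
percentile_df <- function(x) {
  # Capture the expression supplied for x before x is used or modified
  arg <- deparse(substitute(x))
  # Assumed clean-up: turn e.g. "rnorm(50)" into "rnorm_50"
  arg <- gsub("(", "_", arg, fixed = TRUE)
  arg <- gsub(")", "", arg, fixed = TRUE)

  sorted <- sort(x)
  d <- data.frame(sorted, ecdf(sorted)(sorted))
  names(d) <- c(arg, "percentile")
  d
}

random_vector <- rnorm(100)
names(percentile_df(random_vector))
names(percentile_df(rnorm(50)))
```

This sketch only handles simple expressions; a long or nested call would need more careful handling of the deparsed text.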
What's the purpose of the function?
Just your use? Never needed again? Don't worry about it at all.
Mass scale? Worry a fair bit, but make informed decisions.
What's the likelihood of needing to reproduce the results in the future?
Consider using name spacing (::)
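As a sketch of what name spacing looks like in practice, here is the earlier z_score() with the call to sd() written as stats::sd(). Choosing sd() as the call to qualify is just an illustration.

```r
z_score <- function(x) {
  stopifnot(is.numeric(x))
  x <- x[!is.na(x)]
  # stats::sd() always refers to the version in the stats package, even if
  # another function named sd has been defined or attached elsewhere
  (x - mean(x)) / stats::sd(x)
}

z_score(c(100, 115, 112))
```

The result is the same as before; the difference is that the function no longer depends on which sd() happens to be found first on the search path.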