class: center, middle, inverse, title-slide

# Functions: Part 2
### Daniel Anderson
### Week 6, Class 2

---
layout: true

<script>
feather.replace()
</script>

<div class="slides-footer">
<span>

<a class = "footer-icon-link" href = "https://github.com/uo-datasci-specialization/c3-fp-2021/raw/main/static/slides/w6p2.pdf">
  <i class = "footer-icon" data-feather="download"></i>
</a>

<a class = "footer-icon-link" href = "https://fp-2021.netlify.app/slides/w6p2.html">
  <i class = "footer-icon" data-feather="link"></i>
</a>

<a class = "footer-icon-link" href = "https://fp-2021.netlify.app/">
  <i class = "footer-icon" data-feather="globe"></i>
</a>

<a class = "footer-icon-link" href = "https://github.com/uo-datasci-specialization/c3-fp-2021">
  <i class = "footer-icon" data-feather="github"></i>
</a>

</span>
</div>

---
# Agenda

* Review take-home midterm (quickly)
* Purity (quickly)
* Function conditionals
  + `if (condition) {}`
  + embedding warnings, messages, and errors
* Return values

---
# Learning objectives

* Understand the concept of purity, and why it is often desirable
  + And be able to define a side effect
* Be able to change the behavior of a function based on the input
* Be able to embed warnings/messages/errors

---
class: inverse-red middle
# Take-home midterm review

---
# Purity

A function is pure if

1. Its output depends *only* on its inputs
2. It makes no changes to the state of the world

--

Any behavior that changes the state of the world is referred to as a *side effect*

--

Note: "state of the world" is not a technical term, just the way I think of it

---
# Common side effect functions

* We've talked about a few... what are they?
--

### A couple examples

* `print`
* `plot`
* `write.csv`
* `read.csv`
* `Sys.time`
* `options`
* `library`
* `install.packages`

---
class: inverse-blue middle
# Conditionals

---
# Example

From an old lab:

> Write a function that takes two vectors of the same length and returns the total number of instances where the value is `NA` for both vectors. For example, given the following two vectors

```r
c(1, NA, NA, 3, 3, 9, NA)
c(NA, 3, NA, 4, NA, NA, NA)
```

> The function should return a value of `2`, because the vectors are both `NA` at the third and seventh locations. Provide at least one additional test that the function works as expected.

---
# How do you *start* to solve this problem?

--

<span style = "text-decoration: line-through"> Start with writing a function </span>

--

Solve it on a test case, then generalize!

--

### Use the vectors to solve!

```r
a <- c(1, NA, NA, 3, 3, 9, NA)
b <- c(NA, 3, NA, 4, NA, NA, NA)
```

You try first. See if you can use these vectors to find how many elements are `NA` in both (should be 2).
---
# One approach

```r
is.na(a)
```

```
## [1] FALSE  TRUE  TRUE FALSE FALSE FALSE  TRUE
```

```r
is.na(b)
```

```
## [1]  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE
```

```r
is.na(a) & is.na(b)
```

```
## [1] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
```

```r
sum(is.na(a) & is.na(b))
```

```
## [1] 2
```

---
# Generalize to function

```r
both_na <- function(x, y) {
  sum(is.na(x) & is.na(y))
}
```

--

### What happens if not same length?

---
# Test it

```r
both_na(a, b)
```

```
## [1] 2
```

```r
both_na(c(a, a), c(b, b))
```

```
## [1] 4
```

```r
both_na(a, c(b, b)) # ???
```

```
## [1] 4
```

--

### What's going on here?

---
# Recycling

* R will *recycle* the shorter vector to match the longer one when the lengths divide evenly

```r
data.frame(nums = 1:4, lets = c("a", "b"))
```

```
##   nums lets
## 1    1    a
## 2    2    b
## 3    3    a
## 4    4    b
```

--

* `data.frame()` refuses if the lengths are not divisible (vector operations like `&` still recycle, but with a warning)

```r
data.frame(nums = 1:3, lets = c("a", "b"))
```

```
## Error in data.frame(nums = 1:3, lets = c("a", "b")): arguments imply differing number of rows: 3, 2
```

---
# Unexpected results

* In the `both_na` function, recycling can lead to unexpected results, as we saw
* What should we do?

--

* Check that they are the same length, return an error if not

---
# Check lengths

* Stop the evaluation of a function and return an error message with `stop`, but only if a condition has been met.

--

### Basic structure

```r
both_na <- function(x, y) {
  if(condition) {
    stop("message")
  }
  sum(is.na(x) & is.na(y))
}
```

---
# Challenge
Modify the code below to check that the vectors are of the same length. Return a *meaningful* error message if not. Test it out to make sure it works!

```r
both_na <- function(x, y) {
  if(condition) {
    stop("message")
  }
  sum(is.na(x) & is.na(y))
}
```

---
# Attempt 1

* Did yours look something like this?

```r
both_na <- function(x, y) {
  if(length(x) != length(y)) {
    stop("Vectors are of different lengths")
  }
  sum(is.na(x) & is.na(y))
}
both_na(a, b)
```

```
## [1] 2
```

```r
both_na(a, c(b, b))
```

```
## Error in both_na(a, c(b, b)): Vectors are of different lengths
```

---
# More meaningful error message?

### What would make it more meaningful?

--

State the lengths of each

--

```r
both_na <- function(x, y) {
  if(length(x) != length(y)) {
    # report both lengths so the user can see what went wrong
    v_lngths <- paste0("x = ", length(x), ", y = ", length(y))
    stop("Vectors are of different lengths: ", v_lngths)
  }
  sum(is.na(x) & is.na(y))
}
both_na(a, c(b, b))
```

```
## Error in both_na(a, c(b, b)): Vectors are of different lengths: x = 7, y = 14
```

---
# Quick error messages

* For quick checks, with usually less than optimal messages, use `stopifnot`
* Often useful if the function is just for you

--

```r
z_score <- function(x) {
  stopifnot(is.numeric(x))
  x <- x[!is.na(x)]
  (x - mean(x)) / sd(x)
}
z_score(c("a", "b", "c"))
```

```
## Error in z_score(c("a", "b", "c")): is.numeric(x) is not TRUE
```

```r
z_score(c(100, 115, 112))
```

```
## [1] -1.1338934  0.7559289  0.3779645
```

---
# Warnings

To embed a warning, just swap out `stop` for `warning`. Unlike `stop`, `warning` lets the function keep running after the warning is signaled.

---
# Challenge

### This is a tricky one
Modify your prior code so it runs, but returns a warning, if the vectors are recyclable, and returns a meaningful error message if they're different lengths and *not* recyclable.

Hint 1: You'll need two conditions

Hint 2: Check whether one length divides evenly into the other with `%%`, which returns the remainder in a division problem. So `8 %% 2` and `8 %% 4` both return zero (because there is no remainder), while `7 %% 2` returns 1 and `7 %% 4` returns 3.

---
# One approach

```r
both_na <- function(x, y) {
  if(length(x) != length(y)) {
    lx <- length(x)
    ly <- length(y)
    v_lngths <- paste0("x = ", lx, ", y = ", ly)
    # recyclable if one length is an exact multiple of the other
    if(lx %% ly == 0 || ly %% lx == 0) {
      warning("Vectors were recycled (", v_lngths, ")")
    } else {
      stop("Vectors are of different lengths and are not recyclable: ", v_lngths)
    }
  }
  sum(is.na(x) & is.na(y))
}
```

---
# Test it

```r
both_na(a, c(b, b))
```

```
## Warning in both_na(a, c(b, b)): Vectors were recycled (x = 7, y = 14)
```

```
## [1] 4
```

```r
both_na(a, c(b, b)[-1])
```

```
## Error in both_na(a, c(b, b)[-1]): Vectors are of different lengths and are not recyclable: x = 7, y = 13
```

---
class: inverse-red bottom
background-image: url(https://i.gifer.com/Bbo5.gif)
background-size: contain

# Step back from the ledge

---
# How important is this?

* For most of the work you do? Not very
* Develop a package? Very!
* Develop functions that others use, even if not through a package? Sort of.

---
class: inverse-blue middle
# Return values

---
# Thinking more about return values

* By default the function will return the last thing that is evaluated
* Override this behavior with `return`
* This allows the return of your function to be conditional
* Generally the last thing evaluated should be the "default", or most common return value

---
# Pop quiz

* What will the following return?

```r
add_two <- function(x) {
  result <- x + 2
}
```

--

### Answer: Nothing! Why?
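
The last expression evaluated is the assignment `result <- x + 2`, and in R an assignment returns its value *invisibly*: `add_two(7)` really does return 9, the console just doesn't print it. A minimal sketch using base R's `print()` and `withVisible()` to expose the hidden value (`val` is just an illustrative name):

```r
add_two <- function(x) {
  result <- x + 2 # the assignment is the last expression evaluated
}

print(add_two(7))               # print() forces display: [1] 9
withVisible(add_two(7))$visible # [1] FALSE: the value exists but is flagged invisible
val <- add_two(5)               # the invisible value can still be captured
val                             # [1] 7
```
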
```r
add_two(7)
add_two(5)
```

---
# Specify the return value

The functions below are all equivalent, and all result in the same behavior

.pull-left[
```r
add_two.1 <- function(x) {
  result <- x + 2
  result
}

add_two.2 <- function(x) {
  x + 2
}
```
]

.pull-right[
```r
add_two.3 <- function(x) {
  result <- x + 2
  return(result)
}
```
]

---
# When to use `return`?

Generally, reserve `return` for when you're returning a value before the function has been fully evaluated. Otherwise, use the `.1` or `.2` approach from the prior slide.

---
# Thinking about function names

Which of these is most intuitive?

```r
f <- function(x) {
  x <- sort(x)
  data.frame(value = x, p = ecdf(x)(x))
}

ptile <- function(x) {
  x <- sort(x)
  data.frame(value = x, ptile = ecdf(x)(x))
}

percentile_df <- function(x) {
  x <- sort(x)
  data.frame(value = x, percentile = ecdf(x)(x))
}
```

---
# Output

* The descriptive nature of the output can also help
* Maybe a little too tricky but...

```r
percentile_df <- function(x) {
  arg <- as.list(match.call()) # the call as a list; arg$x is the unevaluated argument
  x <- sort(x)
  d <- data.frame(value = x, percentile = ecdf(x)(x))
  # name the first column after the expression that was passed in
  names(d)[1] <- paste0(as.character(arg$x), collapse = "_")
  d
}
```

---

```r
random_vector <- rnorm(100)
tail(percentile_df(random_vector))
```

```
##     random_vector percentile
## 95       1.826218       0.95
## 96       1.828779       0.96
## 97       1.909633       0.97
## 98       1.924716       0.98
## 99       2.127457       0.99
## 100      2.737141       1.00
```

```r
head(percentile_df(rnorm(50)))
```

```
##    rnorm_50 percentile
## 1 -2.080872       0.02
## 2 -1.792119       0.04
## 3 -1.748559       0.06
## 4 -1.314279       0.08
## 5 -1.246780       0.10
## 6 -1.243942       0.12
```

---
# How do we do this?

* I often debug functions and/or figure out how to do something within the function by changing the return value & re-running the function multiple times

.b[[demo]]

---
# Thinking about dependencies

* What's the purpose of the function?
  + Just your use? Never needed again? Don't worry about it at all.
  + Mass scale? Worry a fair bit, but make informed decisions.
* What's the likelihood of needing to reproduce the results in the future?
  + If high, worry more.
* Consider using namespacing (`::`)

---
class: inverse-green middle
# Next time

### Lab 3