
Developing Your First R Package

Daniel Anderson

Week 10

1 / 55

Agenda

  • Basics of package development

  • An example from my first CRAN package

  • Creating a package (we'll actually do it!)

2 / 55

Want to follow along?

If you'd like to follow along, please make sure you have the following packages installed

install.packages(c("tidyverse", "devtools", "esvis",
                   "roxygen2", "usethis"))
3 / 55

Bundle your functions

Once you've written more than one function, you may want to bundle them. There are two general ways to do this:

source?

Write a package

4 / 55

Why avoid sourcing

  • Documentation is generally more sparse

  • Directory issues

    • Which leads to reproducibility issues

    • This is also less of an issue if you're using RStudio Projects and {here}

5 / 55

More importantly

Bundling functions into a package is not that hard!

6 / 55

My journey with {esvis}

My first CRAN package

7 / 55

Background

Effect sizes

Standardized mean differences

  • Assume reasonably normal distributions (so the mean is a good indicator of central tendency)

  • Differences in means may not reflect differences at all points on the scale if variances differ

  • Substantive interest may also lie with differences at other points in the distribution

8 / 55

Varying differences

Quick simulated example

library(tidyverse)
common_var <- tibble(low  = rnorm(1000, 10, 1),
                     high = rnorm(1000, 12, 1),
                     var  = "common")
diff_var <- tibble(low  = rnorm(1000, 10, 1),
                   high = rnorm(1000, 12, 2),
                   var  = "diff")
d <- bind_rows(common_var, diff_var)
head(d)
## # A tibble: 6 x 3
##         low     high var
##       <dbl>    <dbl> <chr>
## 1  7.855059 10.69834 common
## 2 10.40831  11.51090 common
## 3  9.980279 10.84525 common
## 4 10.76777  13.45303 common
## 5  9.934628 11.16377 common
## 6  9.520182 10.47681 common
9 / 55

Restructure for plotting

d <- d %>%
  pivot_longer(
    -var,
    names_to = "group",
    values_to = "value"
  )
d
## # A tibble: 4,000 x 3
##    var    group     value
##    <chr>  <chr>     <dbl>
##  1 common low    7.855059
##  2 common high  10.69834
##  3 common low   10.40831
##  4 common high  11.51090
##  5 common low    9.980279
##  6 common high  10.84525
##  7 common low   10.76777
##  8 common high  13.45303
##  9 common low    9.934628
## 10 common high  11.16377
## # … with 3,990 more rows
10 / 55

Plot the distributions

ggplot(d, aes(value, fill = group)) +
  geom_density(alpha = 0.7,
               color = "gray40") +
  facet_wrap(~var) +
  scale_fill_brewer(palette = "Set3")

11 / 55

Binned effect sizes

  1. Cut the distributions into n bins (based on percentiles)

  2. Calculate the mean difference between paired bins

  3. Divide each mean difference by the overall pooled standard deviation

$$d_{[i]} = \frac{\bar{X}_{foc[i]} - \bar{X}_{ref[i]}}{\sqrt{\frac{(n_{foc} - 1)Var_{foc} + (n_{ref} - 1)Var_{ref}}{n_{foc} + n_{ref} - 2}}}$$
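
The three steps can be sketched by hand in base R. This is only an illustration under stated assumptions, not the {esvis} implementation: binned_d() and its arguments (foc, ref, n_bins) are hypothetical names, and each group is cut into bins at its own percentiles.

```r
# Hypothetical helper (not part of {esvis}): binned effect sizes by hand
binned_d <- function(foc, ref, n_bins = 3) {
  probs <- seq(0, 1, length.out = n_bins + 1)

  # 1. Cut each distribution into n bins based on its own percentiles
  bin_foc <- cut(foc, quantile(foc, probs), include.lowest = TRUE, labels = FALSE)
  bin_ref <- cut(ref, quantile(ref, probs), include.lowest = TRUE, labels = FALSE)

  # 2. Mean difference between paired bins
  mean_diff <- as.numeric(tapply(foc, bin_foc, mean) - tapply(ref, bin_ref, mean))

  # 3. Divide by the overall pooled standard deviation
  psd <- sqrt(((length(foc) - 1) * var(foc) + (length(ref) - 1) * var(ref)) /
                (length(foc) + length(ref) - 2))

  data.frame(bin = seq_len(n_bins), mean_diff = mean_diff, es = mean_diff / psd)
}

# Simulated groups with different variances, as in the earlier example
set.seed(1)
low  <- rnorm(1000, 10, 1)
high <- rnorm(1000, 12, 2)
res  <- binned_d(foc = high, ref = low)
res
```

With unequal variances, the effect size should grow from the lowest bin to the highest, which a single standardized mean difference would hide.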

visualize it!

12 / 55

Back to the simulated example

common <- filter(d, var == "common")
diff <- filter(d, var == "diff")
13 / 55
library(esvis)
binned_es(common, value ~ group)
## # A tibble: 6 x 11
##       q  qtile_lb  qtile_ub group_ref group_foc mean_diff length length1
##   <dbl>     <dbl>     <dbl> <chr>     <chr>         <dbl>  <int>   <int>
## 1     1 0         0.3333333 high      low       -2.035098   1000    1000
## 2     2 0.3333333 0.6666667 high      low       -1.930967   1000    1000
## 3     3 0.6666667 1         high      low       -1.957844   1000    1000
## 4     1 0         0.3333333 low       high       2.035098   1000    1000
## 5     2 0.3333333 0.6666667 low       high       1.930967   1000    1000
## 6     3 0.6666667 1         low       high       1.957844   1000    1000
## # … with 3 more variables: psd <dbl>, es <dbl>, es_se <dbl>
binned_es(diff, value ~ group)
## # A tibble: 6 x 11
##       q  qtile_lb  qtile_ub group_ref group_foc  mean_diff length length1
##   <dbl>     <dbl>     <dbl> <chr>     <chr>          <dbl>  <int>   <int>
## 1     1 0         0.3333333 high      low       -0.9691199   1000    1000
## 2     2 0.3333333 0.6666667 high      low       -1.922010    1000    1000
## 3     3 0.6666667 1         high      low       -2.981083    1000    1000
## 4     1 0         0.3333333 low       high       0.9691199   1000    1000
## 5     2 0.3333333 0.6666667 low       high       1.922010    1000    1000
## 6     3 0.6666667 1         low       high       2.981083    1000    1000
## # … with 3 more variables: psd <dbl>, es <dbl>, es_se <dbl>
14 / 55

Visualize it

Common Variance

binned_plot(common, value ~ group)

15 / 55

Visualize it

Different Variance

binned_plot(diff, value ~ group)

16 / 55

Wait a minute...

  • The esvis package will (among other things) calculate and visually display binned effect sizes.
  • But how did we get from an idea, to functions, to a package?

17 / 55

Taking a step back

18 / 55

Package Creation

The (or rather a) recipe

  1. Come up with a brilliant idea
    • can be boring and mundane, just something you do a lot
  2. Write a function (or, more likely, a set of functions)
  3. Create the package skeleton
  4. Document your functions
  5. Install/fiddle/install
  6. Write tests for your functions
  7. Host your package somewhere public (GitHub is probably best) and promote it. Leverage the power of open source!

Use tools to automate

19 / 55

A really good point


And some further recommendations/good advice

20 / 55

Some resources

We surely won't get through everything. In my mind, the best resources are:

Advanced R

R Packages

21 / 55

Our package

We're going to write a package today! Let's keep it really simple...

  1. Idea (which we've actually used before): Report basic descriptive statistics for a vector, x: N, n-valid, n-missing, mean, and sd.
22 / 55

Our function

  • Let's have it return a data frame

  • What will be the formal arguments?

  • What will the body look like?

Want to give it a go?

23 / 55

The approach I took...

describe <- function(data, column_name) {
  x <- data[[column_name]]        # Extract just the vector to summarize
  nval  <- length(na.omit(x))     # Count non-missing
  nmiss <- sum(is.na(x))          # Count missing
  mn    <- mean(x, na.rm = TRUE)  # Compute mean
  stdev <- sd(x, na.rm = TRUE)    # Compute SD
  # Compile into a df
  out <- tibble::tibble(N = nval + nmiss,
                        n_valid = nval,
                        n_missing = nmiss,
                        mean = mn,
                        sd = stdev)
  out                             # Return the table
}
28 / 55

Informal testing

set.seed(8675309)
df1 <- tibble(x = rnorm(100))
df2 <- tibble(var_miss = c(rnorm(1000, 10, 4), rep(NA, 27)))
describe(df1, "x")
## # A tibble: 1 x 5
##       N n_valid n_missing       mean        sd
##   <int>   <int>     <int>      <dbl>     <dbl>
## 1   100     100         0 0.05230278 0.9291437
describe(df2, "var_miss")
## # A tibble: 1 x 5
##       N n_valid n_missing     mean       sd
##   <int>   <int>     <int>    <dbl>    <dbl>
## 1  1027    1000        27 9.881107 4.090208
29 / 55

Demo

Package skeleton:

  • usethis::create_package()
  • usethis::use_r()
  • Use roxygen2 special comments for documentation
  • Run devtools::document()
  • Install and restart, play around
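
As a rough sketch, the demo steps map onto the calls below. They are wrapped in a hypothetical helper, scaffold_practice(), so nothing is created until you call it with a path of your choosing; in the live demo each line is simply typed at the console. It assumes {usethis} and {devtools} are installed.

```r
# Hypothetical helper: walks through the skeleton steps for a package
# stored at `path` (e.g., "~/practice"). Nothing runs until it is called.
scaffold_practice <- function(path) {
  # Create DESCRIPTION, NAMESPACE, and the R/ directory
  usethis::create_package(path, open = FALSE)

  # Make the new package the active usethis project
  usethis::proj_set(path)

  # Create R/describe.R, where the function and its roxygen2 comments go
  usethis::use_r("describe")

  # After writing the function: turn roxygen2 comments into man/ and NAMESPACE
  devtools::document(path)

  # Install the package so it can be loaded with library()
  devtools::install(path)
}
```

In practice you would pause after use_r() to paste in the function and its documentation before running document() and install().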
30 / 55

roxygen2 comments

Typical arguments

  • @param: Describe the formal arguments. State the argument name and then describe it.

#' @param x Vector to describe

  • @return: What does the function return

#' @return A tibble with descriptive data

  • @example or more commonly @examples: Provide examples of the use of your function.
31 / 55
  • @export: Export your function

If you don't include @export, your function will be internal, meaning others can't access it easily.
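
Putting the tags together, a documented version of our function might look like the sketch below. The title, description, and wording of each tag are my own illustrative choices, and the example assumes {tibble} is installed.

```r
#' Describe a column
#'
#' Reports N, the number of valid and missing values, the mean, and the
#' standard deviation for one column of a data frame.
#'
#' @param data A data frame.
#' @param column_name Name of the column to describe, as a string.
#' @return A tibble with one row of descriptive statistics.
#' @examples
#' describe(mtcars, "mpg")
#' @export
describe <- function(data, column_name) {
  x <- data[[column_name]]
  nval  <- length(na.omit(x))
  nmiss <- sum(is.na(x))
  tibble::tibble(N = nval + nmiss,
                 n_valid = nval,
                 n_missing = nmiss,
                 mean = mean(x, na.rm = TRUE),
                 sd = sd(x, na.rm = TRUE))
}
```

Running devtools::document() turns these comments into a help page under man/ and adds an export line to NAMESPACE.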

32 / 55

Other docs

  • NAMESPACE: Created by {roxygen2}. Don't edit it by hand. If something goes wrong, delete it and it will be regenerated the next time you run devtools::document().

  • DESCRIPTION: Describes your package (more on next slide)

  • man/: The documentation files. Created by {roxygen2}. Don't edit.

33 / 55

DESCRIPTION

Metadata about the package. Default fields for our package are

Package: practice
Version: 0.0.0.9000
Title: What the Package Does (One Line, Title Case)
Description: What the package does (one paragraph).
Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))
License: What license is it under?
Encoding: UTF-8
LazyData: true
ByteCompile: true
RoxygenNote: 6.0.1

This is where the information for citation(package = "practice") will come from.

Some advice: edit this file within RStudio or a good text editor like Sublime Text or VS Code. "Fancy" (curly) quotes and similar characters can screw this up.

34 / 55

Description File Fields

The ‘Package’, ‘Version’, ‘License’, ‘Description’, ‘Title’, ‘Author’, and ‘Maintainer’ fields are mandatory, all other fields are optional. - Writing R Extensions

Some optional fields include

  • Imports and Suggests (we'll do this in a minute).
  • URL
  • BugReports
  • LazyData

License is mandatory, but we'll have {usethis} create it for us.
35 / 55

DESCRIPTION for {esvis}

Package: esvis
Type: Package
Title: Visualization and Estimation of Effect Sizes
Version: 0.3.1
Authors@R: person("Daniel", "Anderson", email = "daniela@uoregon.edu",
    role = c("aut", "cre"))
Description: A variety of methods are provided to estimate and visualize
    distributional differences in terms of effect sizes. Particular emphasis
    is upon evaluating differences between two or more distributions across
    the entire scale, rather than at a single point (e.g., differences in
    means). For example, Probability-Probability (PP) plots display the
    difference between two or more distributions, matched by their empirical
    CDFs (see Ho and Reardon, 2012; <doi:10.3102/1076998611411918>), allowing
    for examinations of where on the scale distributional differences are
    largest or smallest. The area under the PP curve (AUC) is an effect-size
    metric, corresponding to the probability that a randomly selected
    observation from the x-axis distribution will have a higher value
    than a randomly selected observation from the y-axis distribution.
    Binned effect size plots are also available, in which the distributions
    are split into bins (set by the user) and separate effect sizes (Cohen's
    d) are produced for each bin - again providing a means to evaluate the
    consistency (or lack thereof) of the difference between two or more
    distributions at different points on the scale. Evaluation of empirical
    CDFs is also provided, with built-in arguments for providing annotations
    to help evaluate distributional differences at specific points (e.g.,
    semi-transparent shading). All function take a consistent argument
    structure. Calculation of specific effect sizes is also possible. The
    following effect sizes are estimable: (a) Cohen's d, (b) Hedges' g,
    (c) percentage above a cut, (d) transformed (normalized) percentage above
    a cut, (e) area under the PP curve, and (f) the V statistic (see Ho,
    2009; <doi:10.3102/1076998609332755>), which essentially transforms the
    area under the curve to standard deviation units. By default, effect sizes
    are calculated for all possible pairwise comparisons, but a reference
    group (distribution) can be specified.
36 / 55

DESCRIPTION for {esvis} (continued)

Depends:
    R (>= 3.1)
Imports:
    sfsmisc,
    ggplot2,
    magrittr,
    dplyr,
    rlang,
    tidyr (>= 1.0.0),
    purrr,
    Hmisc,
    tibble
URL: https://github.com/datalorax/esvis
BugReports: https://github.com/datalorax/esvis/issues
License: MIT + file LICENSE
LazyData: true
RoxygenNote: 7.0.2
Suggests:
    testthat,
    viridisLite
37 / 55

Demo

  • Change the author name.
    • Add a contributor just for fun.
  • Add a license. We'll go for MIT license using usethis::use_mit_license("First and Last Name")
  • Install and reload.
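
For the author step, one way to list an author plus a contributor is to combine person() objects, which is the same R code that goes in the Authors@R field. The names and email below are placeholders, and "ctb" is the standard role code for a contributor.

```r
# Placeholder people: swap in real names and emails
authors <- c(
  person("First", "Last",
         email = "first.last@example.com",
         role = c("aut", "cre")),   # author and maintainer
  person("Another", "Contributor",
         role = "ctb")              # contributor
)

# Preview how the names and roles will be rendered
format(authors, include = c("given", "family", "role"))
```

In DESCRIPTION, everything inside the c(...) call goes after "Authors@R:", with continuation lines indented.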
38 / 55

Declare dependencies

  • The function depends on the tibble function within the {tibble} package.

  • We have to declare this dependency

39 / 55

My preferred approach

  • Declare package dependencies: usethis::use_package()

  • Create a package documentation page: usethis::use_package_doc()

    • Declare all dependencies for your package there

    • Only import the functions you need - not the entire package

      • Use #' @importFrom pkg fun_name
  • Generally won't have to worry about namespacing. The likelihood of conflicts is also reduced, so long as you don't import the full package.
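
A minimal sketch of what the package documentation file could contain for our package. The file name R/practice-package.R is an assumption based on the package being called practice; the import line is the part we add by hand.

```r
# R/practice-package.R (assumed file name)
# Declare the dependency in DESCRIPTION first with: usethis::use_package("tibble")

#' @keywords internal
"_PACKAGE"

# Import only the function we need, not the whole package
#' @importFrom tibble tibble
NULL
```

After devtools::document(), NAMESPACE should contain importFrom(tibble,tibble), and functions in the package can call tibble() without the tibble:: prefix.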

40 / 55

Demo

41 / 55

Write tests!

  • What does it mean to write tests?

    • ensure your package does what you expect it to
  • Why write tests?

    • If you write a new function, and it breaks an old one, that's good to know!
    • Reduces bugs, makes your package code more robust
42 / 55

How

  • usethis::use_testthat() sets up the infrastructure

  • Make assertions, e.g.: testthat::expect_equal(), testthat::expect_warning(), testthat::expect_error()
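
A small sketch of what a test for describe() could look like. In a package this would live in a file such as tests/testthat/test-describe.R (file name assumed) and would not redefine the function; it is repeated here only so the example runs on its own. It assumes {testthat} and {tibble} are installed.

```r
library(testthat)

# Copy of the function under test, included only to make the sketch standalone
describe <- function(data, column_name) {
  x <- data[[column_name]]
  nval  <- length(na.omit(x))
  nmiss <- sum(is.na(x))
  tibble::tibble(N = nval + nmiss,
                 n_valid = nval,
                 n_missing = nmiss,
                 mean = mean(x, na.rm = TRUE),
                 sd = sd(x, na.rm = TRUE))
}

test_that("describe() counts valid and missing values", {
  df <- data.frame(x = c(1, 2, NA, 4))
  out <- describe(df, "x")

  expect_equal(out$N, 4L)
  expect_equal(out$n_valid, 3L)
  expect_equal(out$n_missing, 1L)
  expect_equal(out$mean, 7 / 3)
})
```

Once tests like this exist, devtools::test() runs all of them, and they also run as part of devtools::check().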

43 / 55

Testing

We'll skip over testing for today, because we just don't have time to cover everything. A few good resources:

44 / 55

Check your R package

  • Use devtools::check() to run the same checks CRAN will run on your R package.

    • Use devtools::check_rhub() to test your package on https://builder.r-hub.io/ (several platforms and R versions)

    • Use devtools::build_win() (renamed check_win_devel() in newer versions of {devtools}) to run the checks on CRAN's Windows build servers.

I would not run the latter two until you're getting close to being ready to submit to CRAN.

45 / 55

Patience

The first time, you'll likely get errors. It will probably be frustrating, but ultimately worth the effort.

46 / 55

Let's check now!

47 / 55

🎉 Hooray! 🎉

You have a package!

48 / 55

A few other best practices

  • Create a README with usethis::use_readme_rmd().

  • Try to get your code coverage up above 80%.

  • Automate wherever possible ({devtools} and {usethis} help a lot with this)

  • Use the {goodpractice} package to help make your package code more robust, specifically with goodpractice::gp(). It will give you lots of good ideas.

49 / 55

A few other best practices

  • Host on GitHub, and capitalize on integration with other systems (all free, but require registering for an account)

50 / 55

Any time left?

51 / 55

Create a README

  • Use standard R Markdown. Set up the infrastructure with usethis::use_readme_rmd().
  • Write it just like a normal R Markdown doc and it should all flow into the README.

52 / 55

Use GitHub Actions

  • Run usethis::use_github_actions() to get started.

    • Go to the Actions tab on your repo

    • Copy and paste the code to the badge into your README.

  • Now all your code will be automatically tested each time you push!

53 / 55

codecov

You can test your code coverage each time you push a new commit by using codecov. Initialize with usethis::use_coverage(). Overall setup process is pretty similar to Travis CI/Appveyor.

Easily see what is/is not covered by tests!

54 / 55

That's all

Thanks so much!

55 / 55
