Developing Your First R Package

Daniel Anderson

Week 10

Agenda

Basics of package development
An example from my first CRAN package
Creating a package (we'll actually do it!)


Want to follow along?

If you'd like to follow along, please make sure you have the following packages installed:

install.packages(c("tidyverse", "devtools", "esvis",
                   "roxygen2", "usethis"))

Bundle your functions

Once you've written more than one function, you may want to bundle them. There are two general ways to do this:

`source()` a script of functions?

Write a package


Why avoid `source`ing

Documentation is generally more sparse
Directory issues
- Which leads to reproducibility issues
- This is also less of an issue if you're using RStudio Projects and {here}
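
For comparison, here is a minimal sketch of the `source()` approach. The script name and the `add_one()` helper are hypothetical, and a temporary directory stands in for a project folder so the snippet runs anywhere.

```r
# Write a tiny "functions script" to a temporary location
# (hypothetical file name; in a real project it might live in R/ or scripts/)
script <- file.path(tempdir(), "my-functions.R")
writeLines("add_one <- function(x) x + 1", script)

# source() evaluates the file and defines add_one() in the global environment
source(script)
add_one(1)

# A relative path such as source("my-functions.R") only works when the
# working directory happens to contain that file. That is the
# "directory issue" that hurts reproducibility.
```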


More importantly

Bundling functions into a package is not that hard!


My journey with {esvis}

My first CRAN package

Background

Effect sizes

Standardized mean differences

Assume reasonably normal distributions (so the mean is a good indicator of central tendency)
Differences in means may not reflect differences at all points on the scale if the variances differ
Substantive interest may also lie with differences at other points in the distribution.


Varying differences

Quick simulated example

library(tidyverse)
common_var <- tibble(low  = rnorm(1000, 10, 1),
                     high = rnorm(1000, 12, 1),
                     var  = "common")
diff_var <- tibble(low  = rnorm(1000, 10, 1),
                   high = rnorm(1000, 12, 2),
                   var  = "diff")
d <- bind_rows(common_var, diff_var)
head(d)

## # A tibble: 6 x 3
##         low     high var   
##       <dbl>    <dbl> <chr> 
## 1  7.855059 10.69834 common
## 2 10.40831  11.51090 common
## 3  9.980279 10.84525 common
## 4 10.76777  13.45303 common
## 5  9.934628 11.16377 common
## 6  9.520182 10.47681 common


Restructure for plotting

d <- d %>% 
  pivot_longer(
    -var,
    names_to = "group", 
    values_to = "value"
  ) 
d

## # A tibble: 4,000 x 3
##    var    group     value
##    <chr>  <chr>     <dbl>
##  1 common low    7.855059
##  2 common high  10.69834 
##  3 common low   10.40831 
##  4 common high  11.51090 
##  5 common low    9.980279
##  6 common high  10.84525 
##  7 common low   10.76777 
##  8 common high  13.45303 
##  9 common low    9.934628
## 10 common high  11.16377 
## # … with 3,990 more rows


Plot the distributions

ggplot(d, aes(value, fill = group)) +
  geom_density(alpha = 0.7,
               color = "gray40") +
  facet_wrap(~var) +
  scale_fill_brewer(palette = "Set3")

Binned effect sizes

Cut the distributions into $n$ bins (based on percentiles)
Calculate the mean difference between paired bins
Divide each mean difference by the overall pooled standard deviation

$d_{[i]} = \frac{\bar{X}_{foc_{[i]}} - \bar{X}_{ref_{[i]}}}{\sqrt{\frac{(n_{foc} - 1) Var_{foc} + (n_{ref} - 1) Var_{ref}}{n_{foc} + n_{ref} - 2}}}$
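
The three steps above can be sketched in base R. This is only an illustration of the formula, not the implementation used by {esvis}; the simulated `ref` and `foc` vectors, the `bin_of()` helper, and the choice of three bins are assumptions made for the example.

```r
set.seed(42)
ref <- rnorm(1000, 10, 1)  # reference group
foc <- rnorm(1000, 12, 2)  # focal group (larger variance)

# Overall pooled standard deviation (the denominator of the formula)
pooled_sd <- sqrt(
  ((length(foc) - 1) * var(foc) + (length(ref) - 1) * var(ref)) /
    (length(foc) + length(ref) - 2)
)

# Hypothetical helper: assign each value to one of n percentile-based bins
bin_of <- function(x, n) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1))
  cut(x, breaks, include.lowest = TRUE, labels = FALSE)
}

n_bins    <- 3
foc_means <- tapply(foc, bin_of(foc, n_bins), mean)
ref_means <- tapply(ref, bin_of(ref, n_bins), mean)

# One effect size per pair of bins
binned_d <- (foc_means - ref_means) / pooled_sd
binned_d
```

Because the focal group has the larger variance, the effect size grows from the lowest bin to the highest, which a single standardized mean difference would hide.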

visualize it!


Back to the simulated example

common <- filter(d, var == "common")
diff   <- filter(d, var == "diff")


library(esvis)
binned_es(common, value ~ group)

## # A tibble: 6 x 11
##       q  qtile_lb  qtile_ub group_ref group_foc mean_diff length length1
##   <dbl>     <dbl>     <dbl> <chr>     <chr>         <dbl>  <int>   <int>
## 1     1 0         0.3333333 high      low       -2.035098   1000    1000
## 2     2 0.3333333 0.6666667 high      low       -1.930967   1000    1000
## 3     3 0.6666667 1         high      low       -1.957844   1000    1000
## 4     1 0         0.3333333 low       high       2.035098   1000    1000
## 5     2 0.3333333 0.6666667 low       high       1.930967   1000    1000
## 6     3 0.6666667 1         low       high       1.957844   1000    1000
## # … with 3 more variables: psd <dbl>, es <dbl>, es_se <dbl>

binned_es(diff, value ~ group)

## # A tibble: 6 x 11
##       q  qtile_lb  qtile_ub group_ref group_foc  mean_diff length length1
##   <dbl>     <dbl>     <dbl> <chr>     <chr>          <dbl>  <int>   <int>
## 1     1 0         0.3333333 high      low       -0.9691199   1000    1000
## 2     2 0.3333333 0.6666667 high      low       -1.922010    1000    1000
## 3     3 0.6666667 1         high      low       -2.981083    1000    1000
## 4     1 0         0.3333333 low       high       0.9691199   1000    1000
## 5     2 0.3333333 0.6666667 low       high       1.922010    1000    1000
## 6     3 0.6666667 1         low       high       2.981083    1000    1000
## # … with 3 more variables: psd <dbl>, es <dbl>, es_se <dbl>


Visualize it

Common Variance

binned_plot(common, value ~ group)


Visualize it

Different Variance

binned_plot(diff, value ~ group)


Wait a minute...

The esvis package will (among other things) calculate and visually display binned effect sizes.
But how did we get from an idea, to functions, to a package?


Taking a step back

Package Creation

The (or rather a) recipe

Come up with ~~a brilliant~~ an idea
- can be boring and mundane but just something you do a lot
Write a function! Or, more likely, a set of functions
Create package skeleton
Document your function
Install/fiddle/install
Write tests for your functions
Host your package somewhere public (GitHub is probably best) and promote it - leverage the power of open source!

Use tools to automate


A really good point

1a) check that no one had the same idea 😇
— Maëlle Salmon 🐟 (@ma_salmon) April 10, 2018

And some further recommendations/good advice


Some resources

We surely won't get through everything. In my mind, the best resources are:

Advanced R

R Packages


Our package

We're going to write a package today! Let's keep it really simple...

Idea (which we've actually used before): Report basic descriptive statistics for a vector, x: N, n-valid, n-missing, mean, and sd.

Our function

Let's have it return a data frame
What will be the formal arguments?
What will the body look like?

Want to give it a go?


The approach I took...

describe <- function(data, column_name) {
  x <- data[[column_name]]         # Extract just the vector to summarize
  nval  <- length(na.omit(x))      # Count non-missing
  nmiss <- sum(is.na(x))           # Count missing
  mn    <- mean(x, na.rm = TRUE)   # Compute mean
  stdev <- sd(x, na.rm = TRUE)     # Compute SD
  # Compile into a df
  out <- tibble::tibble(N         = nval + nmiss,
                        n_valid   = nval,
                        n_missing = nmiss,
                        mean      = mn,
                        sd        = stdev)
  out # Return the table
}


Informal testing

set.seed(8675309)
df1 <- tibble(x = rnorm(100))
df2 <- tibble(var_miss = c(rnorm(1000, 10, 4), rep(NA, 27)))
describe(df1, "x")

## # A tibble: 1 x 5
##       N n_valid n_missing       mean        sd
##   <int>   <int>     <int>      <dbl>     <dbl>
## 1   100     100         0 0.05230278 0.9291437

describe(df2, "var_miss")

## # A tibble: 1 x 5
##       N n_valid n_missing     mean       sd
##   <int>   <int>     <int>    <dbl>    <dbl>
## 1  1027    1000        27 9.881107 4.090208


Demo

Package skeleton:

usethis::create_package()
usethis::use_r()
Use roxygen2 special comments for documentation
Run devtools::document()
Install and restart, play around
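
The demo steps might look roughly like this as a script. It is a sketch under assumptions: the package is named practice (as in the DESCRIPTION shown later), it is created in a temporary directory rather than a real project folder, and `open = FALSE` plus `usethis::proj_set()` are used so that everything stays in the current R session.

```r
# Hypothetical location; a temporary directory keeps the sketch self-contained
pkg_path <- file.path(tempdir(), "practice")

# 1. Create the skeleton (DESCRIPTION, NAMESPACE, R/) without opening a new session
usethis::create_package(pkg_path, open = FALSE)

# 2. Point {usethis} at the new package, then create R/describe.R
usethis::proj_set(pkg_path)
usethis::use_r("describe")

# 3. After pasting describe() and its roxygen2 comments into R/describe.R,
#    run these interactively:
# devtools::document(pkg_path)  # writes man/ and NAMESPACE
# devtools::install(pkg_path)   # installs the package so library(practice) works
```

In an interactive session you would normally call `usethis::create_package()` with a path of your choosing and let it open the new project for you.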


roxygen2 comments

Typical tags

@param: Describe the formal arguments. State the argument name and then describe it.

#' @param x Vector to describe

@return: What the function returns.

#' @return A tibble with descriptive data

@example or more commonly @examples: Provide examples of the use of your function.


@export: Export your function

If you don't include @export, your function will be internal, meaning others can only access it with the triple-colon operator (pkg:::fun()).
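
Putting these tags together, a documented version of describe() could look like the sketch below. The title, description, and wording of each tag are illustrative choices, not the text used in the live demo.

```r
#' Describe a column of a data frame
#'
#' Reports N, the number of valid and missing values, the mean, and the
#' standard deviation of one column.
#'
#' @param data A data frame.
#' @param column_name Name of the column to describe, as a string.
#' @return A tibble with one row of descriptive statistics.
#' @examples
#' describe(data.frame(x = c(1, 2, NA)), "x")
#' @export
describe <- function(data, column_name) {
  x     <- data[[column_name]]
  nval  <- length(na.omit(x))
  nmiss <- sum(is.na(x))
  tibble::tibble(N         = nval + nmiss,
                 n_valid   = nval,
                 n_missing = nmiss,
                 mean      = mean(x, na.rm = TRUE),
                 sd        = sd(x, na.rm = TRUE))
}
```

Running devtools::document() turns the comment block into a help page under man/ and adds an export line for describe() to NAMESPACE.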


Other docs

NAMESPACE: Created by {roxygen2}. Don't edit it by hand. If it gets into a bad state, delete it and devtools::document() will regenerate it.
DESCRIPTION: Describes your package (more on next slide)
man/: The documentation files. Created by {roxygen2}. Don't edit.

`DESCRIPTION`

Metadata about the package. Default fields for our package are

Package: practice
Version: 0.0.0.9000
Title: What the Package Does (One Line, Title Case)
Description: What the package does (one paragraph).
Authors@R: person("First", "Last", email = "first.last@example.com", role = c("aut", "cre"))
License: What license is it under?
Encoding: UTF-8
LazyData: true
ByteCompile: true
RoxygenNote: 6.0.1

This is where the information for citation(package = "practice") will come from.

Some advice: edit within RStudio, or a good text editor like Sublime Text or VS Code. "Fancy" (curly) quotes and similar characters can break the file.


Description File Fields

The ‘Package’, ‘Version’, ‘License’, ‘Description’, ‘Title’, ‘Author’, and ‘Maintainer’ fields are mandatory, all other fields are optional. - Writing R Extensions

Some optional fields include

Imports and Suggests (we'll do this in a minute).
URL
BugReports
LazyData

License is mandatory; we'll have {usethis} create it for us.
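
As a quick illustration, DESCRIPTION is a plain "Field: value" file that base R can read with read.dcf(). The sketch below writes a few made-up fields to a temporary file and checks for the mandatory ones. Author and Maintainer are left out of the check because they are normally generated from Authors@R when the package is built.

```r
# Hypothetical DESCRIPTION contents, written to a temporary file
desc_lines <- c(
  "Package: practice",
  "Version: 0.0.0.9000",
  "Title: What the Package Does (One Line, Title Case)",
  "Description: What the package does (one paragraph).",
  "License: MIT + file LICENSE",
  "Encoding: UTF-8"
)
desc_path <- tempfile("DESCRIPTION")
writeLines(desc_lines, desc_path)

# read.dcf() returns a character matrix with one column per field
fields <- read.dcf(desc_path)

# Which mandatory fields are missing?
mandatory      <- c("Package", "Version", "License", "Description", "Title")
missing_fields <- setdiff(mandatory, colnames(fields))
missing_fields
```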


`DESCRIPTION` for {esvis}

Package: esvis
Type: Package
Title: Visualization and Estimation of Effect Sizes
Version: 0.3.1
Authors@R: person("Daniel", "Anderson", email = "daniela@uoregon.edu", 
       role = c("aut", "cre"))
Description: A variety of methods are provided to estimate and visualize
    distributional differences in terms of effect sizes. Particular emphasis
    is upon evaluating differences between two or more distributions across
    the entire scale, rather than at a single point (e.g., differences in
    means). For example, Probability-Probability (PP) plots display the
    difference between two or more distributions, matched by their empirical
    CDFs (see Ho and Reardon, 2012; <doi:10.3102/1076998611411918>), allowing
    for examinations of where on the scale distributional differences are
    largest or smallest. The area under the PP curve (AUC) is an effect-size
    metric, corresponding to the probability that a randomly selected
    observation from the x-axis distribution will have a higher value
    than a randomly selected observation from the y-axis distribution. 
    Binned effect size plots are also available, in which the distributions
    are split into bins (set by the user) and separate effect sizes (Cohen's
    d) are produced for each bin - again providing a means to evaluate the
    consistency (or lack thereof) of the difference between two or more 
    distributions at different points on the scale. Evaluation of empirical 
    CDFs is also provided, with  built-in arguments for providing annotations 
    to help evaluate distributional differences at specific points (e.g., 
    semi-transparent shading). All function take a consistent argument 
    structure. Calculation of specific effect sizes is also possible. The
    following effect sizes are estimable: (a) Cohen's d, (b) Hedges' g, 
    (c) percentage above a cut, (d) transformed (normalized) percentage above 
    a cut, (e)  area under the PP curve, and (f) the V statistic (see Ho, 
    2009; <doi:10.3102/1076998609332755>), which essentially transforms the 
    area under the curve to standard deviation units. By default, effect sizes 
    are calculated for all possible pairwise comparisons, but a reference 
    group (distribution) can be specified.


`DESCRIPTION` for {esvis} (continued)

Depends:
    R (>= 3.1)
Imports:
    sfsmisc,
    ggplot2,
    magrittr,
    dplyr,
    rlang,
    tidyr (>= 1.0.0),
    purrr,
    Hmisc,
    tibble
URL: https://github.com/datalorax/esvis
BugReports: https://github.com/datalorax/esvis/issues
License: MIT + file LICENSE
LazyData: true
RoxygenNote: 7.0.2
Suggests:
    testthat, 
    viridisLite


Demo

Change the author name.
- Add a contributor just for fun.
Add a license. We'll go for MIT license using usethis::use_mit_license("First and Last Name")
Install and reload.
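
To add a contributor, the Authors@R field takes several person() entries combined with c(). The sketch below builds that value in an R session so you can see how it prints; both names are placeholders, and the second person is a made-up contributor.

```r
# person() comes from {utils}, which is attached by default
authors <- c(
  person("First", "Last",
         email = "first.last@example.com",
         role  = c("aut", "cre")),  # author and maintainer
  person("Another", "Person",
         role  = "ctb")             # hypothetical contributor
)

# How the entries will be rendered
format(authors, include = c("given", "family", "role"))
```

In DESCRIPTION, the same c(person(...), person(...)) expression goes after "Authors@R:".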


Declare dependencies

Our describe() function depends on the tibble() function from the {tibble} package.
We have to declare this dependency.


My preferred approach

Declare package dependencies: usethis::use_package()
Create a package documentation page: usethis::use_package_doc()
- Declare all dependencies for your package there
- Only import the functions you need - not the entire package
  - Use #' @importFrom pkg fun_name
You then generally won't have to worry about namespacing (pkg::fun()) in your own code. The likelihood of conflicts is also reduced, so long as you don't import the full package.
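
For our package, the approach above might look like the sketch below. The file name assumes the package is called practice; usethis::use_package("tibble") adds tibble to Imports in DESCRIPTION, and the roxygen2 tag produces an importFrom(tibble,tibble) line in NAMESPACE the next time devtools::document() runs.

```r
# R/practice-package.R
# (created by usethis::use_package_doc(); "practice" is the package name
# assumed throughout this deck)

#' @keywords internal
"_PACKAGE"

# Import only the function we need, not the whole package
#' @importFrom tibble tibble
NULL
```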


Demo

Write tests!

What does it mean to write tests?
- ensure your package does what you expect it to
Why write tests?
- If you write a new function, and it breaks an old one, that's good to know!
- Reduces bugs, makes your package code more robust


How

usethis::use_testthat() sets up the infrastructure
Make assertions, e.g.: testthat::expect_equal(), testthat::expect_warning(), testthat::expect_error()
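
A minimal sketch of what tests for describe() could look like. In a package these would live in tests/testthat/test-describe.R (usethis::use_test("describe") creates that file) and describe() would come from the package itself; it is redefined here only so the snippet runs on its own. The specific expectations are illustrative choices.

```r
library(testthat)

# Copy of the function under test, so the sketch is self-contained
describe <- function(data, column_name) {
  x     <- data[[column_name]]
  nval  <- length(na.omit(x))
  nmiss <- sum(is.na(x))
  tibble::tibble(N         = nval + nmiss,
                 n_valid   = nval,
                 n_missing = nmiss,
                 mean      = mean(x, na.rm = TRUE),
                 sd        = sd(x, na.rm = TRUE))
}

test_that("describe() counts valid and missing values", {
  df  <- data.frame(x = c(1, 2, 3, NA))
  out <- describe(df, "x")
  expect_equal(out$N, 4)
  expect_equal(out$n_valid, 3)
  expect_equal(out$n_missing, 1)
})

test_that("describe() ignores missing values in the mean and sd", {
  out <- describe(data.frame(x = c(1, 2, 3, NA)), "x")
  expect_equal(out$mean, 2)
  expect_equal(out$sd, 1)
})

test_that("describe() errors when no column is named", {
  expect_error(describe(data.frame(x = 1:3)))
})
```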


Testing

We'll skip over testing for today, because we just don't have time to cover everything. A few good resources:

Richie Cotton's book

r-pkgs Chapter

Karl Broman Blog Post

Check your R package

Use devtools::check() to run the same checks CRAN will run on your R package.
- Use devtools::check_rhub() to test your package on https://builder.r-hub.io/ (several platforms and R versions)
- Use devtools::build_win() to run the checks on CRAN computers.

I would not run the latter two until you're getting close to being ready to submit to CRAN.


Patience

The first time, you'll likely get errors. It will probably be frustrating, but ultimately worth the effort.


Let's check now!


🎉 Hooray! 🎉

You have a package!

A few other best practices

Create a README with usethis::use_readme_rmd().
Try to get your code coverage up above 80%.
Automate wherever possible ({devtools} and {usethis} help a lot with this)
Use the {goodpractice} package to help make your package code more robust, specifically with goodpractice::gp(). It will give you lots of good ideas.


A few other best practices

Host on GitHub, and capitalize on integration with other systems (all free, but require registering for an account)
- GitHub Actions
- codecov


Any time left?


Create a `README`

Use standard R Markdown. Set up the infrastructure with usethis::use_readme_rmd().
Write README.Rmd just like a normal R Markdown doc, then knit it to produce README.md.

Use GitHub Actions

Run usethis::use_github_actions() to get started.
- Go to the Actions tab on your repo
- Copy and paste the code to the badge into your README.

Now all your code will be automatically tested each time you push!


codecov

You can test your code coverage each time you push a new commit by using codecov. Initialize with usethis::use_coverage(). The overall setup process is similar to that of other continuous integration services (e.g., Travis CI or AppVeyor).

Easily see what is/is not covered by tests!


That's all

Thanks so much!
