Understand the fundamental difference between lists and atomic vectors
Understand how atomic vectors are coerced, implicitly or explicitly
Understand various ways to subset vectors, and how subsetting differs for lists
Understand what an attribute is, and how to set and modify attributes
One of you share your screen:
Create four atomic vectors, one for each of the fundamental types
Combine two or more of the vectors. Predict the implicit coercion of each.
Apply explicit coercions, and predict the output for each.
(basically quiz each other)
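One possible way to work through the activity is sketched below. The vector names (`lgl`, `int`, `dbl`, `chr`) are made up for illustration; any values would do.

```r
# Four atomic vectors, one for each fundamental type
lgl <- c(TRUE, FALSE, TRUE)
int <- c(1L, 5L, 9L)
dbl <- c(1.5, 2.25, 10)
chr <- c("a", "b", "c")

# Implicit coercion: c() promotes to the most flexible type present
# (logical < integer < double < character)
typeof(c(lgl, int))   # "integer"
typeof(c(int, dbl))   # "double"
typeof(c(dbl, chr))   # "character"

# Explicit coercion with the as.*() family
as.integer(dbl)       # drops the fractional part: 1 2 10
as.logical(int)       # nonzero values become TRUE
as.character(lgl)     # "TRUE" "FALSE" "TRUE"
```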
Atomic vectors by themselves make up only a small fraction of the total number of data types in R
Remember, atomic vectors are the atoms of R. Many other data structures are built from atomic vectors.
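As a rough illustration of "built from atomic vectors", the sketch below peels the attributes off a factor and a date. The objects `f` and `d` are hypothetical examples, not part of the slides.

```r
# A factor is an integer vector with levels and class attributes on top
f <- factor(c("low", "high", "low"))
typeof(f)              # "integer"
names(attributes(f))   # "levels" "class"
unclass(f)             # the integer codes, still carrying the levels

# A Date is a double vector (days since 1970-01-01) with a class attribute
d <- as.Date("2021-01-15")
typeof(d)              # "double"
class(d)               # "Date"
```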
See all attributes of an object with attributes
library(palmerpenguins)
attributes(penguins[1:50, ]) # limiting rows just for slides
## $names
## [1] "species"           "island"            "bill_length_mm"    "bill_depth_mm"
## [5] "flipper_length_mm" "body_mass_g"       "sex"               "year"
##
## $row.names
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
## [29] 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
##
## $class
## [1] "tbl_df"     "tbl"        "data.frame"
head(penguins)
## # A tibble: 6 x 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int> <fct>
## 1 Big one Torgersen           39.1          18.7               181        3750 male
## 2 Big one Torgersen           39.5          17.4               186        3800 female
## 3 Big one Torgersen           40.3          18                 195        3250 female
## 4 Big one Torgersen           NA            NA                  NA          NA <NA>
## 5 Big one Torgersen           36.7          19.3               193        3450 female
## 6 Big one Torgersen           39.3          20.6               190        3650 male
## # … with 1 more variable: year <int>
Get a single attribute with attr
attr(penguins, "class")
## [1] "tbl_df" "tbl" "data.frame"
attr(penguins, "names")
## [1] "species"           "island"            "bill_length_mm"    "bill_depth_mm"
## [5] "flipper_length_mm" "body_mass_g"       "sex"               "year"
Note - this is not generally how you would pull these attributes. Rather, you would use class() and names().
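A small sketch of why the accessor functions are usually preferred, using the built-in `mtcars` data (an assumption, chosen so the example runs without extra packages): for a data frame the two routes agree, but `attr()` only sees attributes that are explicitly stored.

```r
# For a data frame, attr() and the accessor functions return the same thing
identical(attr(mtcars, "names"), names(mtcars))   # TRUE
identical(attr(mtcars, "class"), class(mtcars))   # TRUE

# A plain integer vector has no stored class attribute,
# yet class() still reports its implicit class
attr(1:3, "class")   # NULL
class(1:3)           # "integer"
```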
Note in the prior slides, I'm asking for attributes on the entire data frame.
Is that what I want?... maybe. But the individual vectors may have attributes as well
attributes(penguins$species)
## $levels
## [1] "Big one"    "Little one" "Funny one"
##
## $class
## [1] "factor"
attributes(penguins$bill_length_mm)
## NULL
Set or modify an attribute with attr
attr(penguins$species, "levels") <- c("Big one", "Little one", "Funny one")
head(penguins)
## # A tibble: 6 x 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int> <fct>
## 1 Big one Torgersen           39.1          18.7               181        3750 male
## 2 Big one Torgersen           39.5          17.4               186        3800 female
## 3 Big one Torgersen           40.3          18                 195        3250 female
## 4 Big one Torgersen           NA            NA                  NA          NA <NA>
## 5 Big one Torgersen           36.7          19.3               193        3450 female
## 6 Big one Torgersen           39.3          20.6               190        3650 male
## # … with 1 more variable: year <int>
Note - you would generally not define levels this way either, but it is a general method for modifying attributes.
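For comparison, here is a sketch of the more conventional route, the replacement form of `levels()`. The small `species` factor is a stand-in made up for this example rather than the penguins column itself.

```r
species <- factor(c("Adelie", "Gentoo", "Chinstrap", "Adelie"))
levels(species)   # "Adelie" "Chinstrap" "Gentoo" (alphabetical by default)

# New labels are matched to the existing levels by position
levels(species) <- c("Big one", "Little one", "Funny one")
as.character(species)   # "Big one" "Funny one" "Little one" "Big one"
```

Because relabelling is positional, it is worth printing the existing levels first to confirm their order.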
matrix(6:13, ncol = 2, byrow = TRUE)
##      [,1] [,2]
## [1,]    6    7
## [2,]    8    9
## [3,]   10   11
## [4,]   12   13
vect <- 6:13
dim(vect) <- c(2, 4)
vect
##      [,1] [,2] [,3] [,4]
## [1,]    6    8   10   12
## [2,]    7    9   11   13
t(vect)
##      [,1] [,2]
## [1,]    6    7
## [2,]    8    9
## [3,]   10   11
## [4,]   12   13
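The point of the three calls above seems to be that a matrix is just an atomic vector with a `dim` attribute. A short sketch that checks this directly:

```r
vect <- 6:13
attributes(vect)        # NULL: a plain integer vector
dim(vect) <- c(2, 4)
attributes(vect)        # $dim is now 2 4
is.matrix(vect)         # TRUE

# Values fill column by column, so the transpose of the 2 x 4 version
# matches the 4 x 2 matrix that was filled by row
identical(t(vect), matrix(6:13, ncol = 2, byrow = TRUE))   # TRUE
```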
attr(v, "matrix_mean") <- mean(v)
v
##           index value
## the first     1     4
## second        2     5
## III           3     6
## attr(,"matrix_mean")
## [1] 3.5
attr(v, "matrix_mean")
## [1] 3.5
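The definition of `v` does not appear in this section. A reconstruction consistent with the printed output (an assumption, not necessarily the original code) would be:

```r
# Assumed definition: a 3 x 2 integer matrix with row and column names
v <- matrix(
  1:6,
  ncol = 2,
  dimnames = list(c("the first", "second", "III"), c("index", "value"))
)

# Attach an arbitrary, user-defined attribute and read it back
attr(v, "matrix_mean") <- mean(v)
attr(v, "matrix_mean")   # 3.5
```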
Fit a multilevel model and pull the variance-covariance matrix
m <- lme4::lmer(Reaction ~ 1 + Days + (1 + Days|Subject), data = lme4::sleepstudy)
lme4::VarCorr(m)$Subject
##             (Intercept)      Days
## (Intercept)  612.100158  9.604409
## Days           9.604409 35.071714
## attr(,"stddev")
## (Intercept)        Days
##   24.740658    5.922138
## attr(,"correlation")
##             (Intercept)       Days
## (Intercept)  1.00000000 0.06555124
## Days         0.06555124 1.00000000
Usually we want to work with data frames because they represent our data better.
Sometimes a matrix is more efficient because you can operate on the entire matrix at once.
set.seed(42)
m <- matrix(rnorm(100, 200, 10), ncol = 10)
m
##           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]     [,9]
##  [1,] 213.7096 213.0487 196.9336 204.5545 202.0600 203.2193 196.3277 189.5688 215.1271
##  [2,] 194.3530 222.8665 182.1869 207.0484 196.3894 192.1616 201.8523 199.0981 202.5792
##  [3,] 203.6313 186.1114 198.2808 210.3510 207.5816 215.7573 205.8182 206.2352 200.8844
##  [4,] 206.3286 197.2121 212.1467 193.9107 192.7330 206.4290 213.9974 190.4648 198.7910
##  [5,] 204.0427 198.6668 218.9519 205.0496 186.3172 200.8976 192.7271 194.5717 188.0567
##  [6,] 198.9388 206.3595 195.6953 182.8299 204.3282 202.7655 213.0254 205.8100 206.1200
##  [7,] 215.1152 197.1575 197.4273 192.1554 191.8861 206.7929 203.3585 207.6818 197.8286
##  [8,] 199.0534 173.4354 182.3684 191.4909 214.4410 200.8983 210.3851 204.6377 198.1724
##  [9,] 220.1842 175.5953 204.6010 175.8579 195.6855 170.0691 209.2073 191.1422 209.3335
## [10,] 199.3729 213.2011 193.6001 200.3612 206.5565 202.8488 207.2088 189.0022 208.2177
##          [,10]
##  [1,] 213.9212
##  [2,] 195.2383
##  [3,] 206.5035
##  [4,] 213.9111
##  [5,] 188.8921
##  [6,] 191.3921
##  [7,] 188.6826
##  [8,] 185.4079
##  [9,] 200.7998
## [10,] 206.5320
sum(m)
## [1] 20032.51
mean(m)
## [1] 200.3251
rowSums(m)
##  [1] 2048.470 1993.774 2041.155 2025.924 1978.173 2007.265 1998.086 1960.291 1952.476
## [10] 2026.901
colSums(m)
##  [1] 2054.730 1983.654 1982.192 1963.610 1997.978 2001.839 2053.908 1978.212 2025.111
## [10] 1991.281
# standardize the matrix
z <- (m - mean(m)) / sd(m)
z
##              [,1]       [,2]       [,3]        [,4]       [,5]        [,6]       [,7]
##  [1,]  1.28528802  1.2218239 -0.3256841  0.40613865  0.1665940  0.27791666 -0.3838736
##  [2,] -0.57349498  2.1646089 -1.7417882  0.64562157 -0.3779416 -0.78393268  0.1466507
##  [3,]  0.31748345 -1.3649263 -0.1963133  0.96277141  0.6968297  1.48192480  0.5274934
##  [4,]  0.57650528 -0.2989403  1.1352110 -0.61596668 -0.7290676  0.58614338  1.3129235
##  [5,]  0.35698951 -0.1592501  1.7887033  0.45367758 -1.3451640  0.05497234 -0.7296315
##  [6,] -0.13313334  0.5794704 -0.4445968 -1.68004206  0.3844054  0.23434417  1.2195893
##  [7,]  1.42026916 -0.3041875 -0.2782756 -0.78452812 -0.8103926  0.62108770  0.2912866
##  [8,] -0.12212321 -2.5821792 -1.7243635 -0.84833774  1.3555260  0.05504171  0.9660389
##  [9,]  1.90703954 -2.3747685  0.4106013 -2.34955213 -0.4455350 -2.90544454  0.8529388
## [10,] -0.09144695  1.2364622 -0.6458013  0.00346451  0.5983857  0.24234547  0.6610253
##             [,8]        [,9]       [,10]
##  [1,] -1.0329155  1.42140711  1.30560568
##  [2,] -0.1178282  0.21645471 -0.48848642
##  [3,]  0.5675319  0.05370436  0.59329679
##  [4,] -0.9468782 -0.14731870  1.30463971
##  [5,] -0.5524942 -1.17812024 -1.09789797
##  [6,]  0.5266990  0.55646825 -0.85783015
##  [7,]  0.7064474 -0.23973975 -1.11801576
##  [8,]  0.4141258 -0.20672212 -1.43248556
##  [9,] -0.8818216  0.86505545  0.04558258
## [10,] -1.0873272  0.75791330  0.59603916
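A quick sanity check on the standardization, sketched with the same seed as above: because `mean()` and `sd()` treat the matrix as one long vector, the result is centered and scaled over all 100 values at once, not column by column.

```r
set.seed(42)
m <- matrix(rnorm(100, 200, 10), ncol = 10)
z <- (m - mean(m)) / sd(m)

dim(z)                   # 10 10: arithmetic keeps the dim attribute
abs(mean(z)) < 1e-12     # TRUE: overall mean is zero up to rounding
all.equal(sd(z), 1)      # TRUE: overall standard deviation is one
```

If per-column standardization is wanted instead, base R's `scale(m)` does that, so the two results will generally differ.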
v
##           index value
## the first     1     4
## second        2     5
## III           3     6
## attr(,"matrix_mean")
## [1] 3.5
rowSums(v)
## the first    second       III
##         5         7         9
attributes(rowSums(v))
## $names
## [1] "the first" "second"    "III"
Generally names are maintained
Sometimes dim is maintained, sometimes not
All else is stripped
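These rules of thumb can be seen with ordinary subsetting. The sketch below uses a made-up named vector `x` with a custom `"note"` attribute and a small matrix `mat`; both are illustrative only.

```r
x <- c(a = 3, b = 5, c = 7)
attr(x, "note") <- "custom attribute"

names(attributes(x))        # "names" "note"
names(attributes(x[1:2]))   # "names": the custom attribute is gone

mat <- matrix(1:6, nrow = 2)
dim(mat[, 1:2])             # 2 2: still a matrix, dim is kept
dim(mat[1, ])               # NULL: a single row drops to a plain vector

attributes(sum(x))          # NULL: summaries return a bare value
```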
v3b <- c(5, 7, 12)
names(v3b) <- c("a", "b", "c")
v3b
##  a  b  c
##  5  7 12
v3c <- setNames(c(5, 7, 12), c("a", "b", "c"))
v3c
##  a  b  c
##  5  7 12
names is not the same thing as colnames, but, somewhat confusingly, both work to rename the variables (columns) of a data frame. We'll talk more about why this is momentarily.
Technically, each element of the list is a vector, possibly atomic
The prior example included all scalars, which are vectors of length 1.
Lists do not require all elements to be the same length
l <- list(
  c("a", "b", "c"),
  rnorm(5),
  c(7L, 2L),
  c(TRUE, TRUE, FALSE, TRUE)
)
l
## [[1]]
## [1] "a" "b" "c"
##
## [[2]]
## [1]  1.2009654  1.0447511 -1.0032086  1.8484819 -0.6667734
##
## [[3]]
## [1] 7 2
##
## [[4]]
## [1]  TRUE  TRUE FALSE  TRUE
l_df <- list(
  a = c("red", "blue"),
  b = rnorm(2),
  c = c(7L, 2L),
  d = c(TRUE, FALSE)
)
l_df
## $a
## [1] "red"  "blue"
##
## $b
## [1]  0.1055138 -0.4222559
##
## $c
## [1] 7 2
##
## $d
## [1]  TRUE FALSE
data.frame(l_df)
##      a          b c     d
## 1  red  0.1055138 7  TRUE
## 2 blue -0.4222559 2 FALSE
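Since a data frame is a list whose elements are equal-length columns, this may explain why `names` and `colnames` agree on data frames. A brief sketch with a made-up data frame `df` and matrix `mat`:

```r
df <- data.frame(a = c("red", "blue"), c = c(7L, 2L))

is.list(df)                          # TRUE: a data frame is a list underneath
typeof(df)                           # "list"
identical(names(df), colnames(df))   # TRUE: list names are the column names

# On a matrix the two are not interchangeable
mat <- matrix(1:4, ncol = 2, dimnames = list(NULL, c("a", "c")))
names(mat)      # NULL
colnames(mat)   # "a" "c"
```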
You can always use logical subsetting
Indexing by position works too
l[c(TRUE, FALSE, TRUE)]
## $x
## a b c
## 3 5 7
##
## $x3
## $x3$vect
## a b c
## 3 5 7
##
## $x3$squared
##  a  b  c
##  9 25 49
##
## $x3$cubed
##   a   b   c
##  27 125 343
l[c(1, 3)]
## $x
## a b c
## 3 5 7
##
## $x3
## $x3$vect
## a b c
## 3 5 7
##
## $x3$squared
##  a  b  c
##  9 25 49
##
## $x3$cubed
##   a   b   c
##  27 125 343
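The list `l` printed here is a named, nested list whose definition is not shown in this section. The sketch below rebuilds something consistent with the output; the middle element `x2` never appears, so its name and value are purely hypothetical placeholders.

```r
vect <- c(a = 3, b = 5, c = 7)
l <- list(
  x  = vect,
  x2 = "placeholder",   # hypothetical: this element is never printed
  x3 = list(vect = vect, squared = vect^2, cubed = vect^3)
)

# Logical, positional, and name-based subsetting select the same elements
identical(l[c(TRUE, FALSE, TRUE)], l[c(1, 3)])   # TRUE
identical(l[c("x", "x3")], l[c(1, 3)])           # TRUE

# [ always returns a list; [[ and $ pull out the element itself
class(l["x"])            # "list"
class(l[["x"]])          # "numeric"
l$x3$squared[["b"]]      # 25
```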
Generally we deal with 2d data frames
If there are two dimensions, we separate the [ subsetting with a comma
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
mtcars[3, 4]
## [1] 93
mtcars[ ,4]
##  [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52  65  97
## [22] 150 150 245 175  66  91 113 264 175 335 109
mtcars[4, ]
##                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Often, you don't want the vector returned, but rather the modified data frame.
Specify drop = FALSE
mtcars[ ,4]
##  [1] 110 110  93 110 175 105 245  62  95 123 123 180 180 180 205 215 230  66  52  65  97
## [22] 150 150 245 175  66  91 113 264 175 335 109
mtcars[ ,4, drop = FALSE]
##                      hp
## Mazda RX4           110
## Mazda RX4 Wag       110
## Datsun 710           93
## Hornet 4 Drive      110
## Hornet Sportabout   175
## Valiant             105
## Duster 360          245
## Merc 240D            62
## Merc 230             95
## Merc 280            123
## Merc 280C           123
## Merc 450SE          180
## Merc 450SL          180
## Merc 450SLC         180
## Cadillac Fleetwood  205
## Lincoln Continental 215
## Chrysler Imperial   230
## Fiat 128             66
## Honda Civic          52
## Toyota Corolla       65
## Toyota Corona        97
## Dodge Challenger    150
## AMC Javelin         150
## Camaro Z28          245
## Pontiac Firebird    175
## Fiat X1-9            66
## Porsche 914-2        91
## Lotus Europa        113
## Ford Pantera L      264
## Ferrari Dino        175
## Maserati Bora       335
## Volvo 142E          109
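The difference between the two calls is easiest to see by checking what kind of object comes back. A small sketch (the names `hp_vec` and `hp_df` are made up here):

```r
hp_vec <- mtcars[, 4]                 # default drop = TRUE
hp_df  <- mtcars[, 4, drop = FALSE]

is.data.frame(hp_vec)   # FALSE: simplified to a numeric vector
is.data.frame(hp_df)    # TRUE
dim(hp_df)              # 32 1
names(hp_df)            # "hp"

# List-style subsetting with a single index also keeps the data frame
is.data.frame(mtcars["hp"])   # TRUE
```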
Fairly obviously, lists are much more flexible than atomic vectors
Often returned by functions, for example, lm
m <- lm(mpg ~ hp, mtcars)
str(m)
## List of 12
##  $ coefficients : Named num [1:2] 30.0989 -0.0682
##   ..- attr(*, "names")= chr [1:2] "(Intercept)" "hp"
##  $ residuals    : Named num [1:32] -1.594 -1.594 -0.954 -1.194 0.541 ...
##   ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
##  $ effects      : Named num [1:32] -113.65 -26.046 -0.556 -0.852 0.67 ...
##   ..- attr(*, "names")= chr [1:32] "(Intercept)" "hp" "" "" ...
##  $ rank         : int 2
##  $ fitted.values: Named num [1:32] 22.6 22.6 23.8 22.6 18.2 ...
##   ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
##  $ assign       : int [1:2] 0 1
##  $ qr           :List of 5
##   ..$ qr   : num [1:32, 1:2] -5.657 0.177 0.177 0.177 0.177 ...
##   .. ..- attr(*, "dimnames")=List of 2
##   .. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
##   .. .. ..$ : chr [1:2] "(Intercept)" "hp"
##   .. ..- attr(*, "assign")= int [1:2] 0 1
##   ..$ qraux: num [1:2] 1.18 1.08
##   ..$ pivot: int [1:2] 1 2
##   ..$ tol  : num 1e-07
##   ..$ rank : int 2
##   ..- attr(*, "class")= chr "qr"
##  $ df.residual  : int 30
##  $ xlevels      : Named list()
##  $ call         : language lm(formula = mpg ~ hp, data = mtcars)
##  $ terms        :Classes 'terms', 'formula' language mpg ~ hp
##   .. ..- attr(*, "variables")= language list(mpg, hp)
##   .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. ..$ : chr [1:2] "mpg" "hp"
##   .. .. .. ..$ : chr "hp"
##   .. ..- attr(*, "term.labels")= chr "hp"
##   .. ..- attr(*, "order")= int 1
##   .. ..- attr(*, "intercept")= int 1
##   .. ..- attr(*, "response")= int 1
##   .. ..- attr(*, ".Environment")=<environment: 0x7fd864e4b5c0>
##   .. ..- attr(*, "predvars")= language list(mpg, hp)
##   .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. ..- attr(*, "names")= chr [1:2] "mpg" "hp"
##  $ model        :'data.frame': 32 obs. of 2 variables:
##   ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##   ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
##   ..- attr(*, "terms")=Classes 'terms', 'formula' language mpg ~ hp
##   .. .. ..- attr(*, "variables")= language list(mpg, hp)
##   .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
##   .. .. .. ..- attr(*, "dimnames")=List of 2
##   .. .. .. .. ..$ : chr [1:2] "mpg" "hp"
##   .. .. .. .. ..$ : chr "hp"
##   .. .. ..- attr(*, "term.labels")= chr "hp"
##   .. .. ..- attr(*, "order")= int 1
##   .. .. ..- attr(*, "intercept")= int 1
##   .. .. ..- attr(*, "response")= int 1
##   .. .. ..- attr(*, ".Environment")=<environment: 0x7fd864e4b5c0>
##   .. .. ..- attr(*, "predvars")= language list(mpg, hp)
##   .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
##   .. .. .. ..- attr(*, "names")= chr [1:2] "mpg" "hp"
##  - attr(*, "class")= chr "lm"
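Because the fitted model is a list, the usual list tools apply to it. A minimal sketch of pulling a few pieces out of the structure shown above:

```r
m <- lm(mpg ~ hp, data = mtcars)

is.list(m)             # TRUE
length(m)              # 12, matching "List of 12"
m$coefficients         # named numeric vector: (Intercept) and hp
m[["df.residual"]]     # 30
m$qr$rank              # 2: nested elements chain naturally

# Extractor functions such as coef() are the more conventional route
identical(coef(m), m$coefficients)   # TRUE
```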
All elements of an atomic vector must be the same type
Lists are also vectors, but not atomic vectors
Each element can be of a different type and length
Incredibly flexible, but often a little more difficult to get the hang of, particularly with subsetting
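The summary points can be checked directly. In this sketch `a` and `l` are throwaway examples:

```r
a <- c(1, 2, 3)
l <- list(1, "two", c(TRUE, FALSE))

is.vector(a); is.atomic(a)   # TRUE, TRUE
is.vector(l); is.atomic(l)   # TRUE, FALSE: a list is a vector, but not atomic

# Each list element keeps its own type and length
vapply(l, typeof, character(1))   # "double" "character" "logical"
lengths(l)                        # 1 1 2
```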