Exploring and understanding the individual experience from longitudinal data, or…

How to make better spaghetti (plots)

Nicholas Tierney, Monash University

WOMBAT

Thursday 28th November, 2019

bit.ly/njt-wombat

nj_tierney


bit.ly/njt-wombat • @nj_tierney
A Bit About Me

Background: Undergraduate

Undergraduate in Psychology

Statistics
Experiment Design
Cognitive Theory
Neurology
Humans


Background: PhD

"Ah, statistics, everything is black and white!"
"There's always an answer"
"data in, answer out"


Background: PhD

Data is really messy
Missing values are frustrating
How do I explore data?



EDA: Why it's worth it

-- From "Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing"



(My personal) motivation

A lot of research in new statistical methods, e.g., imputation, inference, prediction

Not much research on how we explore data


(My personal) motivation

Research focuses on building a bridge across a river, with less focus on how it is built and the tools used to build it.


I became very interested in how we explore our data: exploratory data analysis.

My research:

Design and improve tools for (exploratory) data analysis


visdat.njtierney.com

naniar.njtierney.com


Current work:

How to explore longitudinal data effectively


What is longitudinal data?

Something observed sequentially over time


What is longitudinal data?
 

country	year	height_cm
Australia	1910	173
Australia	1920	173
Australia	1960	176
Australia	1970	178


All of Australia


...And New Zealand


And the rest?


Problems:

Overplotting
We don't see the individuals
We could look at 144 individual plots, but this doesn't help.



Answers: Transparency?


Answers: Transparency + a model?


This helps reduce the overplotting
It's not wrong, and it is useful, but we lose the individuals
We only get the overall average; we don't get the rest of the information
How do we even get started?

But we forget about the individuals


The model might make some good overall predictions
But it can be really ill-suited for some individuals
Exploring this is somewhat clumsy: we need another way to explore


Problem #1: How do I look at some of the data?

Problem #2: How do I find interesting observations?


Introducing `brolgar`: brolgar.njtierney.com

browsing
over
longitudinal data
graphically, and
analytically, in
r


It's a crane, it fishes, and it's a native Australian bird



What is longitudinal data?

Something observed sequentially over time


What is longitudinal data? Longitudinal data is a time series.

~~Something~~ Anything that is observed sequentially over time is a time series

-- Rob Hyndman and George Athanasopoulos, Forecasting: Principles and Practice


Longitudinal data as a time series

library(tsibble)

heights <- as_tsibble(heights,
                      index = year,
                      key = country,
                      regular = FALSE)

index: Your time variable
key: Variable(s) defining individual groups (or series)

index + key together determine distinct rows in a tsibble.

(From Earo Wang's talk: Melt the clock)
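To make the "index + key determine distinct rows" rule concrete, here is a rough base R sketch. It assumes nothing about tsibble's internals: the helper `has_distinct_rows()` is hypothetical, the toy data reuses the four Australian heights shown earlier, and the duplicate 1910 row (172 cm) is made up purely to trigger the check.

```r
# Toy data: the four Australian heights shown earlier
heights_toy <- data.frame(
  country   = c("Australia", "Australia", "Australia", "Australia"),
  year      = c(1910, 1920, 1960, 1970),
  height_cm = c(173, 173, 176, 178)
)

# Hypothetical helper: TRUE when every (key, index) pair appears at most once
has_distinct_rows <- function(data, key, index) {
  !anyDuplicated(data[, c(key, index)])
}

has_distinct_rows(heights_toy, key = "country", index = "year")  # returns TRUE

# A made-up second Australia 1910 record breaks the rule
heights_dup <- rbind(
  heights_toy,
  data.frame(country = "Australia", year = 1910, height_cm = 172)
)
has_distinct_rows(heights_dup, key = "country", index = "year")  # returns FALSE
```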


Longitudinal data as a time series

Key Concepts:

Record important time series information once, and use it many times in other places

We add information about index + key:
- Index = Year
- Key = Country


## # A tsibble: 1,490 x 3 [!]
## # Key:       country [144]
##   country      year height_cm
##   <chr>       <dbl>     <dbl>
## 1 Afghanistan  1870      168.
## 2 Afghanistan  1880      166.
## 3 Afghanistan  1930      167.
## 4 Afghanistan  1990      167.
## 5 Afghanistan  2000      161.
## 6 Albania      1880      170.
## # … with 1,484 more rows

Remember:

key = variable(s) defining individual groups (or series)



Problem #1: How do I look at some of the data?

Look at only a sample of the data:



Sample `n` rows with `sample_n()`

heights %>% sample_n(5)

## # A tsibble: 5 x 3 [!]
## # Key:       country [5]
##   country           year height_cm
##   <chr>            <dbl>     <dbl>
## 1 Cambodia          1860      165.
## 2 Bolivia           1890      164.
## 3 Macedonia         1930      169.
## 4 United States     1920      173.
## 5 Papua New Guinea  1880      152.


... sampling needs to select keys (the countries), not random rows of the data.
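A minimal base R sketch of sampling keys rather than rows, under stated assumptions: `sample_keys_base()` is a hypothetical helper (not brolgar's `sample_n_keys()`), and the three-country data frame holds made-up values. Every row of each chosen key is kept, so each sampled country keeps its whole series.

```r
set.seed(2019)

# Toy long-format data: 3 countries observed in different years (made-up values)
toy <- data.frame(
  country   = rep(c("A", "B", "C"), times = c(3, 2, 4)),
  year      = c(1900, 1910, 1920, 1900, 1950, 1880, 1900, 1950, 2000),
  height_cm = c(165, 166, 168, 160, 162, 170, 171, 174, 178)
)

# Hypothetical helper: sample whole keys, keeping every row of each chosen key
sample_keys_base <- function(data, key, size) {
  keys   <- unique(data[[key]])
  chosen <- sample(keys, size)
  data[data[[key]] %in% chosen, , drop = FALSE]
}

sampled <- sample_keys_base(toy, key = "country", size = 2)
table(sampled$country)  # two countries, each with all of its rows
```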


`sample_n_keys()` to sample ... keys

sample_n_keys(heights, 5)

## # A tsibble: 32 x 3 [!]
## # Key:       country [5]
##   country     year height_cm
##   <chr>      <dbl>     <dbl>
## 1 Congo, DRC  1810      163.
## 2 Congo, DRC  1870      166.
## 3 Congo, DRC  1880      163.
## 4 Congo, DRC  1890      163.
## 5 Congo, DRC  1910      165.
## 6 Congo, DRC  1920      163.
## # … with 26 more rows



Problem #1: How do I look at some of the data?

~~Look at subsamples~~

Sample keys with sample_n_keys()

Look at many subsamples


Portion out your spaghetti! 🍝 🍝 🍝 🍝


Look at one set of subsamples 🍝


Look at many subsamples 🍝 🍝


How to look at many subsamples

How many facets to look at? (2, 4, ... 16?)

How many keys per facet? 144 keys into 16 facets = 9 each

Randomly pick 16 groups of size 9.

This might not look like much extra work, but it hits the
distraction threshold quite quickly.
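The bookkeeping in that last step can be sketched in a few lines of base R, assuming the 144 keys are simply labelled `country_1` to `country_144` (hypothetical stand-ins). Recycling the facet labels with `rep(..., length.out = n_keys)` and shuffling gives each facet the same number of keys; if the keys did not divide evenly, facet sizes would differ by at most one.

```r
set.seed(2019)

n_keys   <- 144  # number of countries (keys) in heights
n_facets <- 16

# Facet labels 1..16 recycled to one label per key, then shuffled,
# so each facet receives exactly 144 / 16 = 9 keys
facet_id <- sample(rep(seq_len(n_facets), length.out = n_keys))

keys   <- paste0("country_", seq_len(n_keys))  # hypothetical key names
facets <- split(keys, facet_id)
lengths(facets)  # 9 keys in each of the 16 facets
```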


Distraction threshold (time to rabbit hole)

(Something I made up)

If solving a problem requires solving 3+ smaller problems

Your focus shifts from the current goal to something else.

You are distracted.


Task one
Task one being overshadowed slightly by minor task 1
Task one being overshadowed slightly by minor task 2
Task one being overshadowed slightly by minor task 3


Distraction threshold (time to rabbit hole)

I want to look at many subsamples of the data

How many keys are there?

How many facets do I want to look at?

How many keys per facet should I look at?

How do I ensure there are the same number of keys per plot?

What is rep, rep.int, and rep_len?

Do I want length.out or times?


Avoiding the rabbit hole

When we get distracted, we can blame ourselves for not being better.

It's not that we should be better; rather, with better tools we could be more efficient.

We need to make things as easy as is reasonable, with the least amount of distraction.


Remove distraction by asking relevant questions

How many keys per facet?

How many plots do I want to look at?

facet_sample(
    n_per_facet = 3,
    n_facets = 9
  )
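One way to picture what those two arguments decide is to sample `n_per_facet * n_facets` distinct keys and deal them into equal-sized facets. This is a hedged base R sketch with made-up key names, not brolgar's implementation of `facet_sample()`.

```r
set.seed(2019)

keys        <- paste0("country_", 1:144)  # hypothetical stand-ins for 144 countries
n_per_facet <- 3
n_facets    <- 9

# Draw 3 * 9 = 27 distinct keys (needs n_per_facet * n_facets <= length(keys))
chosen <- sample(keys, n_per_facet * n_facets)

# Deal the chosen keys into facets of equal size
facet_id <- rep(seq_len(n_facets), each = n_per_facet)
facets   <- split(chosen, facet_id)
facets[1:2]  # first two facets, 3 keys each
```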


`facet_sample()`: See more individuals

ggplot(heights, aes(x = year, 
                    y = height_cm, 
                    group = country)) + 
  geom_line()


`facet_sample()`: See more individuals

ggplot(heights,
       aes(x = year,
           y = height_cm,
           group = country)) + 
  geom_line() + 
  facet_sample()


How to see all individuals?

`facet_strata()`

ggplot(heights,
       aes(x = year,
           y = height_cm,
           group = country)) + 
  geom_line() + 
  facet_strata()


`facet_strata()`: See all individuals


In asking these questions, we can solve something else interesting

`facet_strata(along = -year)`: see all individuals along some variable

ggplot(heights,
       aes(x = year,
           y = height_cm,
           group = country)) + 
  geom_line() + 
  facet_strata(along = -year)
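A rough sketch of the idea behind stratifying along a variable, assuming that `along = -year` works by summarising each key with the mean of `year`, arranging keys by that mean in descending order, and cutting the ordered keys into contiguous strata of near-equal size. The four-country data frame is made up, and this is base R, not brolgar's code.

```r
# Toy data (made-up values): keys observed over different year ranges
toy <- data.frame(
  country = rep(c("A", "B", "C", "D"), each = 2),
  year    = c(1810, 1850, 1900, 1950, 1950, 2000, 1860, 1880)
)

n_strata <- 2

# One summary per key (mean year), then keys in descending order of it
key_means    <- tapply(toy$year, toy$country, mean)
ordered_keys <- names(key_means)[order(key_means, decreasing = TRUE)]

# Contiguous, near-equal groups along the ordering
strata <- setNames(
  ceiling(seq_along(ordered_keys) * n_strata / length(ordered_keys)),
  ordered_keys
)
split(names(strata), strata)  # stratum 1: latest keys, stratum 2: earliest
```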


Focus on answering relevant questions instead of the minutiae:

"How many lines per facet?"

"How many facets?"

facet_sample(
  n_per_facet = 10,
  n_facets = 12
)

"How many facets to put all the data in?"

"How to arrange plots along?"

facet_strata(
  n_strata = 10,
  along = -year
)


`facet_strata()` & `facet_sample()` Under the hood

Built using sample_n_keys() & stratify_keys()

You can still use these functions directly to get at the data and manipulate it


Problem #1: How do I look at some of the data?

as_tsibble(): Store useful information

sample_n_keys(): View subsamples of data

facet_sample(): View many subsamples

facet_strata(): View all subsamples


Problem #2: How do I find interesting observations?


A workflow

Define what is interesting

maximum height


Identify features: one observation per key


Identify important features and decide how to filter


Join this feature back to the data


🎉 Countries with smallest and largest max height


Let's see that one more time, but with the data


Identify features: one observation per key

## # A tsibble: 1,490 x 3 [!]
## # Key:       country [144]
##    country      year height_cm
##    <chr>       <dbl>     <dbl>
##  1 Afghanistan  1870      168.
##  2 Afghanistan  1880      166.
##  3 Afghanistan  1930      167.
##  4 Afghanistan  1990      167.
##  5 Afghanistan  2000      161.
##  6 Albania      1880      170.
##  7 Albania      1890      170.
##  8 Albania      1900      169.
##  9 Albania      2000      168.
## 10 Algeria      1910      169.
## # … with 1,480 more rows


Identify features: one observation per key

## # A tibble: 144 x 6
##    country       min   q25   med   q75   max
##    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Afghanistan  161.  164.  167.  168.  168.
##  2 Albania      168.  168.  170.  170.  170.
##  3 Algeria      166.  168.  169   170.  171.
##  4 Angola       159.  160.  167.  168.  169.
##  5 Argentina    167.  168.  168.  170.  174.
##  6 Armenia      164.  166.  169.  172.  172.
##  7 Australia    170   171.  172.  173.  178.
##  8 Austria      162.  164.  167.  169.  179.
##  9 Azerbaijan   170.  171.  172.  172.  172.
## 10 Bahrain      161.  161.  164.  164.  164 
## # … with 134 more rows


Identify important features and decide how to filter

heights_five <- heights %>%
  features(height_cm, feat_five_num)

heights_five %>% 
  filter(max == max(max) | max == min(max))

## # A tibble: 2 x 6
##   country            min   q25   med   q75   max
##   <chr>            <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Denmark           165.  168.  170.  178.  183.
## 2 Papua New Guinea  152.  152.  156.  160.  161.


Join summaries back to data

heights_five %>% 
  filter(max == max(max) | max == min(max)) %>% 
  left_join(heights, by = "country")

## # A tibble: 21 x 8
##    country   min   q25   med   q75   max  year height_cm
##    <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
##  1 Denmark  165.  168.  170.  178.  183.  1820      167.
##  2 Denmark  165.  168.  170.  178.  183.  1830      165.
##  3 Denmark  165.  168.  170.  178.  183.  1850      167.
##  4 Denmark  165.  168.  170.  178.  183.  1860      168.
##  5 Denmark  165.  168.  170.  178.  183.  1870      168.
##  6 Denmark  165.  168.  170.  178.  183.  1880      170.
##  7 Denmark  165.  168.  170.  178.  183.  1890      169.
##  8 Denmark  165.  168.  170.  178.  183.  1900      170.
##  9 Denmark  165.  168.  170.  178.  183.  1910      170 
## 10 Denmark  165.  168.  170.  178.  183.  1920      174.
## # … with 11 more rows


Identify features: one per key

heights %>%
  features(height_cm,
           feat_five_num)

## # A tibble: 144 x 6
##   country       min   q25   med   q75   max
##   <chr>       <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Afghanistan  161.  164.  167.  168.  168.
## 2 Albania      168.  168.  170.  170.  170.
## 3 Algeria      166.  168.  169   170.  171.
## 4 Angola       159.  160.  167.  168.  169.
## 5 Argentina    167.  168.  168.  170.  174.
## 6 Armenia      164.  166.  169.  172.  172.
## # … with 138 more rows
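For readers who want to see what "one observation per key" computes, here is a base R sketch of a per-key five-number summary. The toy values are illustrative, and it uses `quantile()` with R's default (type 7) interpolation; brolgar's `feat_five_num` may use different quantile settings.

```r
# Toy long-format data with illustrative values (not the real heights)
toy <- data.frame(
  country   = c("A", "A", "A", "B", "B", "B", "B"),
  height_cm = c(160, 165, 170, 150, 152, 154, 156)
)

# Split the measurements by key, then reduce each key to one row
groups <- split(toy$height_cm, toy$country)
rows <- lapply(groups, function(x) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
  data.frame(min = q[1], q25 = q[2], med = q[3], q75 = q[4], max = q[5])
})
five_num <- data.frame(country = names(groups), do.call(rbind, rows),
                       row.names = NULL)
five_num
```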


What is the range of the data? `feat_ranges`

heights %>%
  features(height_cm, feat_ranges)

## # A tibble: 144 x 5
##    country       min   max range_diff   iqr
##    <chr>       <dbl> <dbl>      <dbl> <dbl>
##  1 Afghanistan  161.  168.       7     3.27
##  2 Albania      168.  170.       2.20  1.53
##  3 Algeria      166.  171.       5.06  2.15
##  4 Angola       159.  169.      10.5   7.87
##  5 Argentina    167.  174.       7     2.21
##  6 Armenia      164.  172.       8.82  5.30
##  7 Australia    170   178.       8.4   2.58
##  8 Austria      162.  179.      17.2   5.35
##  9 Azerbaijan   170.  172.       1.97  1.12
## 10 Bahrain      161.  164        3.3   2.75
## # … with 134 more rows
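A base R sketch of the same four columns for toy data, assuming `range_diff` is `max - min` (consistent with the rows above) and `iqr` is R's `IQR()`. The helper name `range_summary()` and the data are hypothetical.

```r
# Hypothetical helper returning the four columns that feat_ranges reports
range_summary <- function(x) {
  c(min = min(x), max = max(x), range_diff = max(x) - min(x), iqr = IQR(x))
}

# Toy data with illustrative values; one row of output per key
toy <- data.frame(
  country   = c("A", "A", "A", "A", "A", "B", "B", "B"),
  height_cm = c(168, 166, 167, 167, 161, 150, 155, 151)
)
ranges <- t(sapply(split(toy$height_cm, toy$country), range_summary))
ranges
```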


Does it only increase or decrease? `feat_monotonic`

heights %>%
  features(height_cm, feat_monotonic)

## # A tibble: 144 x 5
##    country     increase decrease unvary monotonic
##    <chr>       <lgl>    <lgl>    <lgl>  <lgl>    
##  1 Afghanistan FALSE    FALSE    FALSE  FALSE    
##  2 Albania     FALSE    TRUE     FALSE  TRUE     
##  3 Algeria     FALSE    FALSE    FALSE  FALSE    
##  4 Angola      FALSE    FALSE    FALSE  FALSE    
##  5 Argentina   FALSE    FALSE    FALSE  FALSE    
##  6 Armenia     FALSE    FALSE    FALSE  FALSE    
##  7 Australia   FALSE    FALSE    FALSE  FALSE    
##  8 Austria     FALSE    FALSE    FALSE  FALSE    
##  9 Azerbaijan  FALSE    FALSE    FALSE  FALSE    
## 10 Bahrain     TRUE     FALSE    FALSE  TRUE     
## # … with 134 more rows
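A base R sketch of how these four flags could be computed from successive differences. It assumes strict monotonicity (a repeated value breaks an increasing or decreasing run) and treats series with fewer than two observations as having no trend; both are choices made here, and `monotonic_summary()` is a hypothetical helper.

```r
monotonic_summary <- function(x) {
  # Assumption: fewer than two observations cannot show a trend
  if (length(x) < 2) {
    return(c(increase = FALSE, decrease = FALSE,
             unvary = FALSE, monotonic = FALSE))
  }
  d <- diff(x)                 # successive differences
  increase <- all(d > 0)       # strictly increasing
  decrease <- all(d < 0)       # strictly decreasing
  unvary   <- all(d == 0)      # never changes
  c(increase = increase, decrease = decrease, unvary = unvary,
    monotonic = increase || decrease)
}

monotonic_summary(c(161, 162, 164))  # increase and monotonic TRUE
monotonic_summary(c(170, 170, 169))  # a tie: decrease and monotonic FALSE
monotonic_summary(c(165, 165, 165))  # flat: only unvary TRUE
```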


What is the spread of my data? `feat_spread`

heights %>%
  features(height_cm, feat_spread)

## # A tibble: 144 x 5
##    country        var    sd   mad   iqr
##    <chr>        <dbl> <dbl> <dbl> <dbl>
##  1 Afghanistan  7.20  2.68  1.65   3.27
##  2 Albania      0.950 0.975 0.667  1.53
##  3 Algeria      3.30  1.82  0.741  2.15
##  4 Angola      16.9   4.12  3.11   7.87
##  5 Argentina    2.89  1.70  1.36   2.21
##  6 Armenia     10.6   3.26  3.60   5.30
##  7 Australia    7.63  2.76  1.66   2.58
##  8 Austria     26.6   5.16  3.93   5.35
##  9 Azerbaijan   0.516 0.718 0.621  1.12
## 10 Bahrain      3.42  1.85  0.297  2.75
## # … with 134 more rows
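The spread columns map onto familiar base R functions; this sketch assumes `feat_spread` reports `var()`, `sd()`, `mad()`, and `IQR()`. Base R's `mad()` scales by 1.4826 by default, and whether brolgar applies the same constant is an assumption. The helper `spread_summary()` and the values are hypothetical.

```r
# Hypothetical helper returning the four columns that feat_spread reports
spread_summary <- function(x) {
  c(var = var(x), sd = sd(x), mad = mad(x), iqr = IQR(x))
}

heights_one_key <- c(161, 166, 167, 167, 168)  # illustrative values for one key
s <- spread_summary(heights_one_key)
s  # var 7.7, sd = sqrt(var), mad 1.4826, iqr 1
```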


features: MANY more features in `feasts`

Such as:

feat_acf: autocorrelation-based features
feat_stl: STL (Seasonal, Trend, and Remainder by LOESS) decomposition
Create your own features


Take homes

Problem #1: How do I look at some of the data?

Longitudinal data is a time series
Specify structure once, get a free lunch.
Look at as much of the raw data as possible 
Use facet_sample() / facet_strata()

Take homes

Problem #2: How do I find interesting observations?

Decide what features are interesting
Summarise down to one observation
Decide how to filter
Join this feature back to the data
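The four take-home steps can be sketched end to end in base R. This is an illustrative translation, not brolgar's workflow: `aggregate()` stands in for `features()`, `merge()` for `left_join()`, and the three-country data frame uses made-up values.

```r
# Toy long-format data (made-up values)
toy <- data.frame(
  country   = c("A", "A", "B", "B", "C", "C"),
  year      = c(1900, 1950, 1900, 1950, 1900, 1950),
  height_cm = c(165, 170, 160, 162, 172, 180)
)

# 1. Decide what is interesting: each country's maximum height
# 2. Summarise down to one observation per key
key_max <- aggregate(height_cm ~ country, data = toy, FUN = max)
names(key_max)[2] <- "max"

# 3. Decide how to filter: keys with the largest and smallest maximum
extremes <- key_max[key_max$max == max(key_max$max) |
                      key_max$max == min(key_max$max), ]

# 4. Join the feature back to the full data
result <- merge(extremes, toy, by = "country")
result
```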

Future Directions

More features (summaries)
Generalise beyond time series
Explore stratification process

Thanks

Di Cook
Tania Prvan
Stuart Lee
Mitchell O'Hara-Wild

Earo Wang
Rob Hyndman
Miles McBain
Hadley Wickham
Monash University


Resources


Colophon

Slides made using xaringan
Extended with xaringanthemer
Colours taken + modified from lorikeet theme from ochRe
Header font is Josefin Sans
Body text font is Montserrat
Code font is Fira Mono


Learning more

brolgar.njtierney.com

bit.ly/njt-wombat

nj_tierney

njtierney

nicholas.tierney@gmail.com


End.
