Data Visualization and Design

class: center, middle, inverse, title-slide

# Data Visualization and Design
## Using ggplot2 
### Omni Analytics Group

---

class: center, middle

# PERCEPTION

---

# Why are some plots easier to read than others?

What makes bad figures bad?

- issues can be (1) aesthetic, (2) substantive, and/or (3) perceptual

---

# Why are some plots easier to read than others?

What makes bad figures bad?

- issues can be (1) **aesthetic**, (2) substantive, and/or (3) perceptual

- Edward R. Tufte is one of the best-known critics of aesthetically cluttered visualization

- Graphical excellence is the well-designed presentation of interesting data. It:

    - communicates complex ideas with clarity, precision, and efficiency

    - maximizes the “data-ink” ratio

    - is nearly always multivariate

    - requires telling the truth about the data

- Tufte defines "chartjunk" as superfluous detail

???

Chartjunk is not entirely devoid of merit, since embellished charts can be easier to recall. Bear in mind, though, that ease of recall is only one virtue among many for graphics.

---

# Why are some plots easier to read than others?

What makes bad figures bad?

- issues can be (1) aesthetic, (2) **substantive**, and/or (3) perceptual

- bad data
---

# Why are some plots easier to read than others?

What makes bad figures bad?

- issues can be (1) aesthetic, (2) substantive, and/or (3) **perceptual**

- Looking at pictures of data means looking at lines, shapes, and colors

- Our visual system works in a way that makes some things easier for us to see than others

- “Preattentive” features

- Gestalt Principles

- color and contrast

???

---
# Good Graphics

Graphics consist of:

- **Structure**: boxplot, scatterplot, etc.

- **Aesthetics**: features such as color, shape, and size that map other characteristics to structural features

Both the structure and aesthetics should help viewers interpret the information.
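
The distinction between structure and aesthetics can be sketched in `ggplot2`, where the geom supplies the structure and `aes()` supplies the aesthetic mappings. This is a minimal illustration using the `mpg` data that ships with `ggplot2`; the choice of dataset and variables is an assumption made for the example, not part of the original slides.

```r
library(ggplot2)

# Structure: a scatterplot (geom_point) of two continuous variables.
# Aesthetics: colour carries a third variable, drv (drive type).
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point()

# Same data, different structure: a boxplot of hwy for each drive type.
# Here fill (not colour) is the aesthetic that carries drv.
p_box <- ggplot(mpg, aes(x = drv, y = hwy, fill = drv)) +
  geom_boxplot()

p_scatter
p_box
```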

---
class: center, middle

# Gestalt Principles
### What sorts of relationships are inferred, and under what circumstances? 
---

# Which has more structure?

![](perception_files/figure-html/rand-1.png)
???

We look for structure all the time. We are so good at it that we will find it in random data, given time. (This is one of the reasons that data visualization can hardly be a replacement for statistical modeling.) The strong inferences we make about relationships between visual elements from relatively sparse visual information are called “gestalt rules”. They are not pure perceptual effects, like the checkerboard illusions. Rather, they describe our tendency to infer relationships between the objects we are looking at in a way that goes beyond what is strictly visible.

---
# Gestalt principles

![](images/gestalt3.jpg)

???

---

# Gestalt principles

What sorts of relationships are inferred, and under what circumstances?

- **Proximity**: Things that are spatially near to one another are related.

- **Similarity**: Things that look alike are related.

- **Enclosure**: Elements surrounded by a common visual boundary are seen as a group.

- **Symmetry**: Symmetrical elements are perceived as belonging together. If an object is asymmetrical, the viewer will waste time trying to find the problem instead of concentrating on the message.

- **Closure**: Incomplete shapes are perceived as complete.

- **Continuity**: Partially hidden objects are completed into familiar shapes.

- **Connection**: Things that are visually tied to one another are related.

- **Figure/Ground**: Visual elements are either in the foreground or the background.

---
class: center, middle

# Pre-Attentive Features

---

# Pre-Attentive Features

Pre-Attentive Features are things that "jump out" in less than 250 ms

- Color, form, movement, spatial localization

There is a hierarchy of features

- Color is stronger than shape

- Combinations of pre-attentive features are usually not pre-attentive due to **interference**
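
One way to see the hierarchy for yourself is to hide a single target among distractors, once by colour and once by shape. The sketch below uses simulated points; the data frame `pts`, the seed, and the specific colours and shapes are all assumptions made for illustration.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the layout is reproducible
n <- 60
pts <- data.frame(
  x = runif(n),
  y = runif(n),
  target = c("target", rep("distractor", n - 1))
)

# Target differs from the distractors in colour only.
p_colour <- ggplot(pts, aes(x, y, colour = target)) +
  geom_point(size = 3) +
  scale_colour_manual(values = c(distractor = "grey40", target = "red")) +
  theme(legend.position = "none")

# Target differs from the distractors in shape only.
p_shape <- ggplot(pts, aes(x, y, shape = target)) +
  geom_point(size = 3, colour = "grey40") +
  scale_shape_manual(values = c(distractor = 16, target = 17)) +
  theme(legend.position = "none")

p_colour
p_shape
```

Most viewers are expected to find the red point faster than the triangle, but this is a demonstration, not a controlled experiment.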

---


## Color

- **Hue**: shade of color (red, orange, yellow...)

- **Intensity**: amount of color

- Both intensity and hue are pre-attentive. Bigger contrast corresponds to faster detection.

- Use color to your advantage

- When choosing color schemes, we will want mappings from data to color that are not just numerically but also ***perceptually*** uniform

- Distinguish between sequential scales and categorical scales

---

## Color

Color is context-sensitive: A and B are the same intensity and hue, but appear to be different.

![Edward Adelson’s checkershadow illusion](images/shadow-illusion3.jpg)

---

## Ordering Variables

Which is bigger?

- Position: higher is bigger (y), items to the right are bigger (x)
- Size, Area
- Color: not always ordered. More contrast = bigger.
- Shape: Unordered.

![](perception_files/figure-html/unnamed-chunk-6-1.png)

---
class: center, middle

# Aesthetics in `ggplot2`
## scales

---

## Aesthetics in `ggplot2`

**Aesthetics**: features such as color, shape, and size that map other characteristics to structural features

**Scales** map data values to the visual values of an aesthetic

- to change a mapping, add a new scale

![](images/scales1.png)
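
As a rough sketch of "add a new scale", the example below starts from a plot with the default colour scale and then overrides it. The `mpg` data and the particular colours are assumptions chosen for illustration.

```r
library(ggplot2)

# With no scale specified, ggplot2 picks a default discrete colour scale for drv.
p <- ggplot(mpg, aes(x = displ, y = hwy, colour = drv)) +
  geom_point()

# Adding a scale replaces the default mapping from drv values to colours.
p_custom <- p +
  scale_colour_manual(
    name = "Drive",
    values = c("4" = "darkorange", f = "steelblue", r = "forestgreen")
  )

p
p_custom
```

The data and the aesthetic mapping (`colour = drv`) are unchanged; only the translation from data values to visual values differs.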
---
## Scales

.pull-left[
![](images/scales2.png)
]

.pull-right[
![](images/scales3.png)
]

---
## Gradients

Qualitative schemes: no more than 7 colors

![](perception_files/figure-html/unnamed-chunk-7-1.png)

Use `colorRampPalette()` (from base R's grDevices package) together with `brewer.pal()` from the RColorBrewer package to produce larger palettes by interpolating existing ones

![](perception_files/figure-html/unnamed-chunk-8-1.png)
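
A minimal sketch of palette interpolation follows. The palette name `"Set2"`, the target size of 12, and the variable names are assumptions made for illustration.

```r
library(RColorBrewer)

# Start from a 7-colour qualitative Brewer palette.
base_cols <- brewer.pal(7, "Set2")

# colorRampPalette() lives in grDevices (base R). It returns a function
# that interpolates between the colours it was given.
expand_palette <- colorRampPalette(base_cols)

# Ask that function for 12 colours instead of 7.
big_cols <- expand_palette(12)

length(big_cols)
```

Interpolated colours sit between the originals, so neighbouring entries in the larger palette are harder to tell apart than the original seven.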

Quantitative schemes: use color gradient with only one hue for positive values

![](perception_files/figure-html/unnamed-chunk-9-1.png)

---

## More Gradients

Quantitative schemes: use color gradient with two hues for positive and negative values. Gradient should go through a light, neutral color (white)

![](perception_files/figure-html/unnamed-chunk-10-1.png)
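
A two-hue gradient through white can be sketched with `scale_fill_gradient2()`. The grid of values below is simulated, and the hues are arbitrary choices for illustration.

```r
library(ggplot2)

set.seed(2)  # arbitrary seed for reproducibility
grid <- expand.grid(x = 1:10, y = 1:10)
grid$z <- rnorm(nrow(grid))  # made-up values, both positive and negative

# Negative values shade toward blue, positive toward red,
# and values near the midpoint (0) stay close to white.
p_div <- ggplot(grid, aes(x, y, fill = z)) +
  geom_tile() +
  scale_fill_gradient2(
    low = "steelblue", mid = "white", high = "firebrick",
    midpoint = 0
  )

p_div
```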

Small objects or thin lines need more contrast than larger areas

---
## Factors vs. Continuous variables

- Factor variable:
    - `scale_colour_discrete`
    - `scale_colour_brewer(palette = ...)`
- Continuous variable:
    - `scale_colour_gradient` (define low, high values)
    - `scale_colour_gradient2` (define low, mid, and high values)
- Equivalents for fill: `scale_fill_...`
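
The scale functions listed above might be used as follows. The datasets (`diamonds`, `mpg`), palettes, and colours are assumptions chosen for illustration.

```r
library(ggplot2)

# Factor variable (clarity): colours come from a Brewer palette.
p_factor <- ggplot(diamonds, aes(carat, price, colour = clarity)) +
  geom_point(alpha = 0.3) +
  scale_colour_brewer(palette = "Blues")

# Continuous variable (cty): a single-hue gradient between two end colours.
p_cont <- ggplot(mpg, aes(displ, hwy, colour = cty)) +
  geom_point() +
  scale_colour_gradient(low = "lightblue", high = "darkblue")

# Fill equivalent: bars are filled rather than outlined, so use scale_fill_*.
p_fill <- ggplot(mpg, aes(class, fill = drv)) +
  geom_bar() +
  scale_fill_brewer(palette = "Set2")

p_factor
p_cont
p_fill
```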

---
## Color in ggplot2

There are packages available for use that have color scheme options.

Some Examples:

- RColorBrewer
- ggsci
- viridis
- wesanderson
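
As a small sketch, the viridis palettes can be tried without installing anything extra, because `ggplot2` ships its own `scale_*_viridis_d()` (discrete) and `scale_*_viridis_c()` (continuous) scales. The `mpg` and `faithfuld` datasets used here come with `ggplot2` and are chosen only for illustration; the other packages listed provide their own scale or palette functions, which are not shown.

```r
library(ggplot2)

# Discrete viridis colours for a categorical variable (vehicle class).
p_viridis_d <- ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point() +
  scale_colour_viridis_d()

# Continuous viridis fill for a numeric variable (estimated density).
p_viridis_c <- ggplot(faithfuld, aes(waiting, eruptions, fill = density)) +
  geom_raster() +
  scale_fill_viridis_c()

p_viridis_d
p_viridis_c
```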

---
## Color in ggplot2

- There are packages available for use that have color scheme options.

![](perception_files/figure-html/unnamed-chunk-12-1.png)

---

## Your Turn

- In the diamonds data, clarity and cut are ordinal, while price and carat are continuous
- Find a graphic that gives an overview of these four variables while respecting their types

---

## Answers

```r
library(ggplot2)  # provides ggplot() and the diamonds data; cut is already an ordered factor
ggplot(diamonds, aes(x = carat, y = price, colour = clarity)) + geom_point() + facet_wrap(~cut)
```

![](perception_files/figure-html/unnamed-chunk-13-1.png)