Plotting in R: ggplot2


Introduction to ggplot2

What is ggplot2?

ggplot2 is meant to be an implementation of the Grammar of Graphics, hence gg-plot. The basic notion is that there is a grammar to the composition of graphical components in statistical graphics, and by directly controlling that grammar, you can generate a large set of carefully constructed graphics tailored to your particular needs. Each component is added to the plot as a layer.
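
The layering idea can be made concrete with a minimal sketch on the `mpg` data set that ships with ggplot2. It assumes the plot object exposes its layers through `$layers`; treat that accessor as an inspection aid rather than a guaranteed interface. The variable names are illustrative.

```r
library(ggplot2)

# Data and aesthetic mappings only: no layers have been added yet.
base_plot <- ggplot(mpg, aes(x = displ, y = hwy))
n_base <- length(base_plot$layers)

# Each `+ geom_*()` call appends one layer to the plot.
with_points <- base_plot + geom_point()
with_smooth <- with_points + geom_smooth(method = "lm")

n_points <- length(with_points$layers)
n_smooth <- length(with_smooth$layers)
```

Printing `with_smooth` draws the scatterplot with the fitted line on top, because layers are drawn in the order they were added.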

Components of a ggplot2 plot

Plots convey information through various aspects of their aesthetics, such as position along the x and y axes, color, size, and shape.

The elements in a plot are geometric shapes, such as points, lines, bars, and text. Some of these geometries have their own particular aesthetics.
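
The geometry-specific aesthetics mentioned above can be sketched as follows. The `toy` data frame is made up for illustration, and the choice of which aesthetic to show for each geometry (shape, line type, fill, label) is an assumption rather than a list taken from the original slides.

```r
library(ggplot2)

# Hypothetical toy data, invented for this example only.
toy <- data.frame(
  x     = c(1, 2, 3, 1, 2, 3),
  y     = c(2, 4, 3, 1, 2, 5),
  group = rep(c("a", "b"), each = 3)
)

# Points: shape is mapped to the group, size is set to a constant.
p_points <- ggplot(toy, aes(x, y)) + geom_point(aes(shape = group), size = 4)

# Lines: linetype distinguishes the two groups.
p_lines <- ggplot(toy, aes(x, y)) + geom_line(aes(linetype = group))

# Bars: fill colors the inside, color sets the outline.
p_bars <- ggplot(toy, aes(x, y)) +
  geom_col(aes(fill = group), color = "black", position = "dodge")

# Text: the label aesthetic supplies the string drawn at each position.
p_text <- ggplot(toy, aes(x, y)) + geom_text(aes(label = group))
```

Aesthetics placed inside `aes()` are mapped from the data; those given outside it, like `size = 4`, are fixed values applied to every element.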

R code

library(ggplot2)

summary(mpg)
##  manufacturer          model               displ            year     
##  Length:234         Length:234         Min.   :1.600   Min.   :1999  
##  Class :character   Class :character   1st Qu.:2.400   1st Qu.:1999  
##  Mode  :character   Mode  :character   Median :3.300   Median :2004  
##                                        Mean   :3.472   Mean   :2004  
##                                        3rd Qu.:4.600   3rd Qu.:2008  
##                                        Max.   :7.000   Max.   :2008  
##       cyl           trans               drv                 cty       
##  Min.   :4.000   Length:234         Length:234         Min.   : 9.00  
##  1st Qu.:4.000   Class :character   Class :character   1st Qu.:14.00  
##  Median :6.000   Mode  :character   Mode  :character   Median :17.00  
##  Mean   :5.889                                         Mean   :16.86  
##  3rd Qu.:8.000                                         3rd Qu.:19.00  
##  Max.   :8.000                                         Max.   :35.00  
##       hwy             fl               class          
##  Min.   :12.00   Length:234         Length:234        
##  1st Qu.:18.00   Class :character   Class :character  
##  Median :24.00   Mode  :character   Mode  :character  
##  Mean   :23.44                                        
##  3rd Qu.:27.00                                        
##  Max.   :44.00
g <- ggplot(mpg, aes(displ, hwy))

g + geom_point()


g + geom_point() + geom_smooth(method="lm")


g + geom_point() + geom_smooth(method="lm") + facet_grid(drv~.)


g + geom_point(color="steelblue", size=4, alpha=1/2)


g + geom_point(aes(color=drv)) +
  labs(title="Fuel economy...",
       x="Engine displ", y="Highway mileage") + theme_bw(base_family = "Times")


ggplot(mpg, aes(displ)) + geom_histogram(aes(color=drv)) + theme_bw()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
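
The message above asks for an explicit bin width. A minimal sketch of one way to respond is given here: `binwidth = 0.5` (in litres of displacement) is an arbitrary illustrative value, and mapping `drv` to `fill` instead of `color` is a suggested variation, since `color` only affects the bar outlines.

```r
library(ggplot2)

# binwidth = 0.5 is an illustrative choice, not a recommendation.
# fill colors the bars themselves; color would only change their outlines.
hist_plot <- ggplot(mpg, aes(displ)) +
  geom_histogram(aes(fill = drv), binwidth = 0.5) +
  theme_bw()

# Every observation lands in exactly one bin, so the counts sum to nrow(mpg).
total_count <- sum(layer_data(hist_plot)$count)
```

With `binwidth` set explicitly, the `stat_bin()` message is no longer emitted when the plot is printed.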


library(reshape2)
library(plyr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
mt <- select(mtcars, c(1,3,4,5,6,7))
mt2 <- cor(mt)
mt3 <- melt(mt2)
ggplot(mt3, aes(x=Var1, y=Var2, fill=value)) + geom_tile()
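
To see why `melt()` is needed before plotting, the sketch below inspects the correlation matrix before and after reshaping. The names `cor_wide`, `cor_long`, and `heat` are illustrative, and the diverging color scale with limits fixed at -1 and 1 is an optional addition that is not part of the original code.

```r
library(reshape2)
library(ggplot2)

# Same columns as above: mpg, disp, hp, drat, wt, qsec.
cor_wide <- cor(mtcars[, c(1, 3, 4, 5, 6, 7)])

# melt() turns the 6 x 6 matrix into one row per (Var1, Var2) pair,
# which is the long format that ggplot2 expects.
cor_long <- melt(cor_wide)

# Optional: a diverging scale centered at 0 suits correlations.
heat <- ggplot(cor_long, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(limits = c(-1, 1))
```

Each tile in the heatmap corresponds to one row of `cor_long`, with `Var1` and `Var2` giving its position and `value` its fill.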


mtcars %>%
  select(c(1,3,4,5,6,7)) %>%
  cor() %>%
  melt() %>%
  ggplot(aes(x=Var1, y=Var2, color=value)) + geom_point()


Python-ized version (courtesy of @QuLogic)