Exploring Art Data 5

Let’s look at some institutional data. We can scrape the Tate Galleries attendance figures from here and make a csv file of them. The first few lines of attendance.csv look like this:

```
"Year","Tate Britain","Tate Modern","Tate Liverpool","Tate St Ives","BHM","Total"
2009,1595000,4788000,523000,219000,N/A,7125000
2008,1587655,4647881,1035958,203700,N/A,7475194
2007,1533217,5236702,694228,243993,N/A,7708140
2006,1597359,4895073,556976,193700,46220,7289328
2005,1729692,3902017,666258,180771,43502,6522240
```

Now we can load the data into R and start working with it:

```
## Read the csv file, N/A values and all, allowing spaces in column names
attendance<-read.csv("attendance.csv", na.strings="N/A", check.names=FALSE)
## Give the BHM a more descriptive name
names(attendance)[names(attendance) %in% c("BHM")]<-"Barbara Hepworth Museum"
## Get the years
years<-attendance[,1]
## Get the individual site counts (last column is total)
sites<-attendance[,2:(length(attendance) -1)]
```
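
Before plotting, it is worth checking that the file loaded as expected. The sketch below is a minimal, self-contained check: it inlines the five sample rows shown above rather than reading attendance.csv, and confirms that the per-site counts add up to the Total column. The names `sample.csv` and `site.sums` are introduced here purely for illustration.

```r
## The five sample rows from above, inlined so the check runs on its own
sample.csv<-'"Year","Tate Britain","Tate Modern","Tate Liverpool","Tate St Ives","BHM","Total"
2009,1595000,4788000,523000,219000,N/A,7125000
2008,1587655,4647881,1035958,203700,N/A,7475194
2007,1533217,5236702,694228,243993,N/A,7708140
2006,1597359,4895073,556976,193700,46220,7289328
2005,1729692,3902017,666258,180771,43502,6522240'
## Treat "N/A" as missing and keep the spaces in the column names
attendance<-read.csv(text=sample.csv, na.strings="N/A", check.names=FALSE)
## Site counts are every column except the first (Year) and last (Total)
sites<-attendance[,2:(length(attendance) -1)]
## Sum each row across sites, skipping missing counts
site.sums<-rowSums(sites, na.rm=TRUE)
## TRUE when every row's site counts match the published total
print(all(site.sums == attendance$Total))
```

If this prints `FALSE` on the full file, the likely culprits are a mis-scraped row or an N/A marker spelled differently from the one passed to `na.strings`.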

We can draw a multiple line graph of the attendance figures:

```
## Create lists of line properties so we can use them in the graph and legend
line.types<-c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")
line.colours<-c("cyan", "blue", "purple", "red", "orange", "green")
## Suppress the y axis so we can draw one that doesn't use scientific notation
matplot(years, sites, type = "l", yaxt="n",
        xlab="Year", ylab="Attendance",
        col=line.colours, lty=line.types)
## Draw the y axis using full numbers rather than scientific notation
axis(2, axTicks(2), format(axTicks(2), scientific = FALSE))
## Add a key to the lines
legend("topleft", names(sites), col=line.colours, lty=line.types)
## Title the graph
title(main="Tate Galleries Attendance 1980-2010")
```
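
The axis relabelling relies on `format()` with scientific notation switched off. As a small illustration, assuming a few made-up tick positions (the `ticks` vector below is hypothetical, not the output of `axTicks(2)`), this is roughly what the labels look like:

```r
## Hypothetical tick positions, of the size this data might produce
ticks<-c(1000000, 3000000, 5000000)
## Full digits instead of scientific notation
plain.labels<-format(ticks, scientific = FALSE)
## Optionally add thousands separators for readability
comma.labels<-format(ticks, big.mark = ",", scientific = FALSE)
print(plain.labels)
print(comma.labels)
```

Either vector can be passed as the labels argument of `axis()`; the comma-separated form is easier to read at a glance.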

And we can use an area chart to show the combined attendance. It’s not the best way of examining information, but in this case it shows how the attendance figures stack up, literally:

```
## Import the ggplot2 library so we can use ggplot
library("ggplot2")
## To get an area plot, we need to flatten the data to year/museum/attendance
attendance.expanded<-data.frame(Year=rep(years, ncol(sites)),
                                Museum=rep(names(sites), each=length(years)),
                                Attendance=unlist(sites, use.names=FALSE))
## We use the levels of the Museum factor to order the areas and legend labels
## We do this by calculating the range of attendance at each museum and ordering
## the factor names based on that
attendance.expanded$Museum<-
  factor(attendance.expanded$Museum,
         levels=names(sites)[order(sapply(names(sites),
           function(x){max(sites[x], na.rm=TRUE) -
                       min(sites[x], na.rm=TRUE)}))])
## A utility function to format numbers in English non-scientific format
nonscientific<-function(x, ...)
  format(x, big.mark = ',', scientific = FALSE, ...)
## Plot the areas
ggplot(attendance.expanded, aes(x=Year, y=Attendance)) +
  geom_area(aes(fill=Museum)) +
  ## Label the y axis with full numbers rather than scientific notation
  ## (older ggplot2 releases used formatter= here instead of labels=)
  scale_y_continuous(labels=nonscientific) +
  ## Specifying the breaks orders the legend properly
  scale_fill_brewer(palette=2, breaks=rev(levels(attendance.expanded$Museum))) +
  ## Title the legend "Site" and set a nice title for the plot
  ## (labs and ggtitle replace the older legend.title aesthetic and opts())
  labs(fill="Site") +
  ggtitle("Tate Galleries Attendance 1980-2010")
```
```
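
The flattening and factor-ordering steps above are easier to follow on a small input. Here is a minimal sketch that runs them on the five sample rows from the top of the post, without needing ggplot2. The names `sample.csv`, `attendance.long`, `site.ranges` and `museum.order` are introduced only for this illustration, and the sites keep their original column names (BHM is not renamed).

```r
## The five sample rows, inlined so the example runs on its own
sample.csv<-'"Year","Tate Britain","Tate Modern","Tate Liverpool","Tate St Ives","BHM","Total"
2009,1595000,4788000,523000,219000,N/A,7125000
2008,1587655,4647881,1035958,203700,N/A,7475194
2007,1533217,5236702,694228,243993,N/A,7708140
2006,1597359,4895073,556976,193700,46220,7289328
2005,1729692,3902017,666258,180771,43502,6522240'
attendance<-read.csv(text=sample.csv, na.strings="N/A", check.names=FALSE)
years<-attendance[,1]
sites<-attendance[,2:(length(attendance) -1)]
## Flatten to one row per year/museum pair: the years repeat once per site,
## and each site name repeats once per year, matching the column-by-column
## order in which unlist() concatenates the counts
attendance.long<-data.frame(Year=rep(years, ncol(sites)),
                            Museum=rep(names(sites), each=length(years)),
                            Attendance=unlist(sites, use.names=FALSE))
## Range (max minus min) of attendance at each site, ignoring missing values
site.ranges<-sapply(sites, function(x) diff(range(x, na.rm=TRUE)))
## Site names sorted from smallest to largest range
museum.order<-names(sites)[order(site.ranges)]
attendance.long$Museum<-factor(attendance.long$Museum, levels=museum.order)
## 5 years x 5 sites gives 25 rows
print(nrow(attendance.long))
print(levels(attendance.long$Museum))
```

On these five rows the sites with the steadiest attendance come first in the factor levels and Tate Modern, with the widest swing, comes last. The level order is what ggplot uses when stacking the areas and building the legend.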