Art Open Data links


Jonathan Gray’s slides on “Open Data in the Arts and Humanities”:

Ben Werdmuller von Elgg’s blog post “Open data in the arts: an introduction”:

Culture Grid Hack day (now delayed until early next year):

Art Computing Art Open Data Art History 2

More art catalogues and price lists available for download on

Reference works mentioned by the V&A

Le trésor de la curiosité, tiré des catalogues de ventes de tableaux, etc … avec diverses notes & notices historiques &

Les ventes de tableaux, dessins estampes et objets d’art aux XVIIe et XVIIIe siècles (1611-1800). Essai de bibliographie.

Dictionnaire des ventes d’art faites en France et a l’étranger pendant les XVIIIe et XIXe siècles.

Treasures of Art in Great Britain: being an account of the chief collections of paintings, sculptures, illuminated mss. …

Galleries and cabinets of art in Great Britain: being an account of more than forty collections of paintings, drawings, sculptures …

Catalogues by Algernon Graves

The Royal Academy of Arts; a complete dictionary of contributors and their work from its foundation in 1769 to 1904

The Society of artists of Great Britain, 1760-1791; the Free society of artists, 1761-1783 ; a complete dictionary of contributors and their work from the foundation of the societies to 1791

A dictionary of artists who have exhibited works in the principal London

Other Catalogues, some from Wikipedia’s article on the history of art auction sales

Painters and Their Works: A Dictionary of Great Artists who are Not Now Alive

Memorials of Christie’s; a record of art sales from 1766 to 1896

Provisional Catalogue of the Oil Paintings and Water Colours in the Wallace Collection

Memoirs Of Painting

The Year’s Art

The Connoisseur


Exploring Art Data 5

Let’s look at some institutional data. We can scrape the Tate Galleries attendance figures from here and make a csv file of them. The first few lines of attendance.csv look like this:

"Year","Tate Britain","Tate Modern","Tate Liverpool","Tate St Ives","BHM","Total"

Now we can load the data into R and start working with the data:

## Read the csv file, N/A values and all, allowing spaces in column names
attendance<-read.csv("attendance.csv", na.strings="N/A", check.names=FALSE)
## Give the BHM a more descriptive name
names(attendance)[names(attendance) %in% c("BHM")]<-"Barbara Hepworth Museum"
## Get the years
years<-attendance[,1]
## Get the individual site counts (last column is total)
sites<-attendance[,2:(length(attendance) -1)]
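For readers who would rather follow along in Python, the same loading step can be sketched with the standard csv module. This is only a sketch and not part of the original workflow: the function name load_attendance is hypothetical, and the sample rows are invented placeholders, not real Tate figures.

```python
import csv
import io

# Placeholder data in the same shape as attendance.csv.
# The numbers are invented for illustration; they are NOT real Tate figures.
SAMPLE_CSV = '''"Year","Tate Britain","Tate Modern","Tate Liverpool","Tate St Ives","BHM","Total"
"1999","100","N/A","50","20","5","175"
"2000","110","400","55","N/A","6","571"
'''


def load_attendance(handle):
    """Return (years, sites), where sites maps a site name to a list of counts."""
    reader = csv.DictReader(handle)
    years = []
    sites = {}
    for row in reader:
        years.append(int(row["Year"]))
        for name, value in row.items():
            # Skip the year and the precomputed total column
            if name in ("Year", "Total"):
                continue
            # Give the BHM a more descriptive name
            if name == "BHM":
                name = "Barbara Hepworth Museum"
            # Treat "N/A" as a missing value
            count = None if value == "N/A" else int(value)
            sites.setdefault(name, []).append(count)
    return years, sites


years, sites = load_attendance(io.StringIO(SAMPLE_CSV))
print(years)
print(sorted(sites))
```

Missing values are kept as None rather than dropped, so every site list stays the same length as the list of years.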

We can draw a multiple line graph of the attendance figures:

## Create lists of line properties so we can use them in the graph and legend
line.types<-c("solid", "dashed", "dotted", "dotdash", "longdash", "twodash")
line.colours<-c("cyan", "blue", "purple", "red", "orange", "green")
## Suppress the y axis so we can draw one that doesn't use scientific notation
matplot(years, sites, type = "l", yaxt="n",
col=line.colours, lty=line.types)
## Draw the y axis using full numbers rather than scientific notation
axis(2, axTicks(2), format(axTicks(2), scientific = F))
## Add a key to the lines
legend("topleft", names(sites), col=line.colours, lty=line.types)
## Title the graph
title(main="Tate Galleries Attendance 1980-2010")

And we can use an area chart to show the combined attendance. It’s not the best way of examining information, but in this case it shows how the attendance figures stack up, literally:

## Import the ggplot2 library so we can use ggplot
library("ggplot2")
## To get an area plot, we need to flatten the data to year/museum/attendance
attendance.expanded<-data.frame(Year=rep(years, ncol(sites)),
                                Museum=rep(names(sites), each=length(years)),
                                Attendance=unlist(sapply(names(sites),
                                  function(col) {sites[col]}, simplify=TRUE)))
## We use the levels of the Museum factor to order the areas and legend labels
## We do this by calculating the range of attendance at each museum and ordering
## the factor names based on that
attendance.expanded$Museum<-
  factor(attendance.expanded$Museum,
         levels=names(sites)[order(sapply(names(sites),
           function(x){max(sites[x], na.rm=TRUE) - min(sites[x], na.rm=TRUE)}))])
## A utility function to format numbers in English non-scientific format
nonscientific<-function(x, ...) format(x, big.mark = ',', scientific = FALSE, ...)
## Plot the areas
ggplot(attendance.expanded, aes(x=Year, y=Attendance)) +
  geom_area(aes(legend.title="Site", fill=Museum)) +
  ## Label the y axis in millions rather than scientific notation
  scale_y_continuous(formatter=nonscientific) +
  ## Specifying the breaks orders the legend properly
  scale_fill_brewer(palette=2, breaks=rev(levels(attendance.expanded$Museum))) +
  ## Set a nice title
  opts(title="Tate Galleries Attendance 1980-2010")
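The two data-shaping steps in that code, flattening the wide table to year/museum/attendance rows and ordering the museums by the range of their attendance, may be easier to see in isolation. Here is a rough Python sketch of both. The function names and the example values are invented for illustration and are not taken from the Tate data.

```python
def flatten(years, sites):
    """Turn {site: [count per year]} into a list of (year, site, count) rows."""
    rows = []
    for site, counts in sites.items():
        for year, count in zip(years, counts):
            rows.append((year, site, count))
    return rows


def order_by_range(sites):
    """Order site names by the spread (max - min) of their known counts."""
    def spread(site):
        # Ignore missing values, like na.rm=TRUE in the R code
        known = [c for c in sites[site] if c is not None]
        return max(known) - min(known)
    return sorted(sites, key=spread)


# Invented placeholder values, only here to exercise the functions
example_years = [2000, 2001]
example_sites = {"Site A": [10, 40], "Site B": [5, None], "Site C": [7, 9]}

long_rows = flatten(example_years, example_sites)
site_order = order_by_range(example_sites)
print(long_rows)
print(site_order)
```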



Art Magazines, Journals and Catalogues at

Scans of old (19th and early 20th century) art magazines, journals, and catalogues can be found in the archive, along with text extracted from them. These are a very useful resource for study of the history of art.

Google Books is better for searching for them, but the archive is better for downloading them.

Be wary of later editions as these may only be out of copyright in the US.

The Yellow Book

The Magazine Of Art

The Illustrated Magazine Of Art

The Burlington Magazine

ArtNews Annual

Art In America

Studio International

Special Numbers 1897-8

Royal Academy Illustrated and Catalogues

The Print Connoisseur

Art Prices Current

Various Exhibition Catalogues

The Armory Show Catalogue

If anyone can suggest other items in the archive, names to search for, more avant-garde publications, or other kinds of periodicals that might have information relevant to art (particularly show listings, sale information) let me know in the comments!


Art Freedom Of Information Requests

WhatDoTheyKnow is an excellent website that allows you to make, check on and search Freedom of Information (FoI) requests in the UK.

Some of those FoI requests concern art.

Art organizations:

The National Gallery:

The NPG:

And of course The Tate:

It’s interesting to see not just the answers but what kinds of things people are asking which organizations about (and whether they’re answering).


Exploring Art Data 4

Let’s draw some more graphs.

Here’s the matrix of form and genre rendered graphically:

## Load the tab separated values for the table of artworks
artwork<-read.delim("./visual_art/artwork.tsv")
# Get rows with both genre and form
## This loses most of the data :-/
art<-artwork[artwork$art_genre != "" & artwork$art_form != "",
c("art_genre", "art_form")]
## Drop unused factors
## Get table
art.table<-table(art) ##as.table(ftable(art))
## Strip rows and columns where max < tolerance
art.table.cropped<-art.table[rowSums(art.table) >= tolerance,
colSums(art.table) >=tolerance]
## Print levelplot
## Levelplot is in the "lattice" library
library("lattice")
## Rotate x labels, and set colour scale to white/blue to improve readability
levelplot(art.table.cropped, xlab="Genre", ylab="Form",
col.regions=colorRampPalette(c("white", "blue")))

The highest frequencies leap out of the graph. We should do a version without painting to look for subtleties in the rest of the data.

And here’s some of the basic frequencies from the data:

## Load the tab separated values for the table of artworks
artwork<-read.delim("./visual_art/artwork.tsv")
## Function to plot a summary of the most frequent values
topValuePlot<-function(values, numValues){
  ## Get a count of the number of times each value name appears in the list
  values.summary<-summary(values)
  ## Draw a graph, allowing enough room for the rotated labels
  par(mar=c(10,4,1,1))
  barplot(values.summary[1:numValues], las=2)
}
## Artists
topValuePlot(artwork$artist[artwork$artist != ""], 20)
## Subject
topValuePlot(artwork$art_subject[artwork$art_subject != ""], 20)


The dataset is clearly dominated by Western art.
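The counting behind those bar charts is simple enough to sketch without R. The following Python version is only an illustration: top_values is a hypothetical helper name, and the example list stands in for a column such as the artist names.

```python
from collections import Counter


def top_values(values, num_values):
    """Return the num_values most frequent non-empty values with their counts."""
    counts = Counter(v for v in values if v != "")
    return counts.most_common(num_values)


# Invented stand-in for a column of artist names
example_artists = ["A", "B", "A", "", "C", "A", "B"]
top = top_values(example_artists, 2)
print(top)
```

Empty strings are dropped before counting, matching the filter applied before each call to topValuePlot.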

Exploring Art Data 3

Let’s look at how much the “Grants For The Arts” programme of Arts Council England (ACE) gives to each region.

First of all we’ll need the data. That’s available from under the new CC-BY compatible Crown Copyright here. It’s in XLS format, which R doesn’t load on GNU/Linux, but we can convert that to comma-separated values using Calc.

Next we’ll need a map to plot the data on. Ideally we’d use a Shapefile of the English regions, which R would be able to load and render easily, but there isn’t a freely available one. There’s a public domain SVG map of the English regions here, but R doesn’t load SVG either. We can convert the SVG to a table of co-ordinates that we can plot from R using a Python script:

from BeautifulSoup import BeautifulStoneSoup
import re

# We know that the file consists of a single top-level g
# containing a flat list of path elements.
# Each path consists of subpaths only using M/L/z
# So use this knowledge to extract the polylines

# Convert svg class names to gfta region names
names = {"east-midlands":"East Midlands", "east-england":"East of England",
         "london":"London", "north-east":"North East",
         "north-west":"North West", "south-east":"South East",
         "south-west":"South West", "west-midlands":"West Midlands",
         "yorkshire-and-humber":"Yorkshire and The Humber"}

svg = open("map/England_Regions_-_Blank.svg")
soup = BeautifulStoneSoup(svg)
# Get the canvas size, to use for flipping the y co-ordinate
height = float(soup.svg["height"])
# Get the containing g
g = soup.find("g")
# Get the translate in the transform
transform = re.match(r"translate\((.+), (.+)\)", g["transform"])
transform_x = float(transform.group(1))
transform_y = float(transform.group(2))
# Get the paths in the g
paths = g.findAll("path")
for path in paths:
    # Get the id and convert to region name
    region_name = names[path["id"]]
    # Get the path data to process
    path_d = path["d"]
    # Split around M commands to get subpaths
    path_d_subpaths = path_d.split("M")
    # Keep a count of the subpaths within the id so we can identify them
    subpath_count = 0
    for subpath in path_d_subpaths:
        # The split will result in a leading empty string
        if subpath == "":
            continue
        subpath_count = subpath_count + 1
        # Split around the L commands to get a list of points
        # The first M point already has its command letter removed
        points = subpath.split("L")
        for point in points:
            # Remove trailing z if present
            cleaned_point = point.split()[0]
            # Split out the point components and translate them
            (x, y) = cleaned_point.split(",")
            transformed_x = float(x) + transform_x
            flipped_y = height + (height - float(y))
            transformed_y = flipped_y + transform_y
            # Write a line in the csv
            print "%s,%s,%s,%s" % (region_name, subpath_count, transformed_x,
                                   transformed_y)
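The heart of that script is the splitting of path data around the M and L commands. As a self-contained illustration in Python 3 with no third-party libraries, the same idea can be sketched as below. The function name parse_path and the example path string are made up, and the sketch applies only the translation, not the y flip.

```python
def parse_path(path_d, dx=0.0, dy=0.0):
    """Split SVG path data that only uses M/L/z into lists of (x, y) points.

    dx and dy are added to every point, like the translate in the script.
    Assumes each point is written as "x,y" with no space around the comma.
    """
    subpaths = []
    for subpath in path_d.split("M"):
        subpath = subpath.strip()
        # The split will result in a leading empty string
        if subpath == "":
            continue
        points = []
        for point in subpath.split("L"):
            # Drop a trailing z (close path) and surrounding whitespace
            cleaned = point.replace("z", "").strip()
            if cleaned == "":
                continue
            x, y = cleaned.split(",")
            points.append((float(x) + dx, float(y) + dy))
        subpaths.append(points)
    return subpaths


# A made-up path with two closed subpaths
example_d = "M 0,0 L 10,0 L 10,10 z M 20,20 L 30,20 z"
print(parse_path(example_d))
print(parse_path(example_d, dx=1.0, dy=2.0))
```

Paths that use curves, relative commands or whitespace-separated co-ordinates would need a real SVG path parser. This shortcut only works because the map is known to contain nothing but M, L and z.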

Now we can load the grants data and the map into R, calculate the total value of grants for each region, and colour each region of the map accordingly.

Here’s the R code:

## The data used to plot a map of the English regions
colClasses=c("factor", "integer", "numeric", "numeric"))
## Plot the English regions in the given colours
## See levels(england$region) for the region names
## colours is a list of region="#FF00FF" colours for regions
## range.min and range.max are for the key values
## main.title is the main label for the plot
## key.title is the title for the key
plotEnglandRegions<-function(colours, range.min, range.max, main.title,
## Reasonable values for the window size
plot.window(c(0, 600),
c(0, 600))
## For each regionname
if (region %in% levels(england$region)){
## For each subpath of each region
lapply(1:max(england$subpath[england$region == region]),
## Get the points of that subpath
subpath.points<-england[england$region == region &
england$subpath == subpath,]
## And colour it the region's colour
polygon(subpath.points$x, subpath.points$y,
## Colour Scale
## Turn off scientific notation (for less than 10 digits)
## Sort the colours so they match the values
## The by is set to fit the number of colours and the value range
legend("topright", legend=seq(from=range.min, to=range.max,
by=((range.max - range.min) / (length(colours) - 1))),
## Load the region award data
colClasses=c("integer", "character", "character", "character",
"character", "factor", "factor", "factor",
"factor", "factor"))
## region$Award.amount contains commas
region$Award.amount<-gsub(",", "", region$Award.amount)
## And we want it as a number
region$Award.amount<-as.numeric(region$Award.amount)
## Get the totals by region
region.totals<-tapply(region$Award.amount, list(region$Region), sum)
## But we don't want the "Other" region
region.totals<-region.totals[names(region.totals) != "Other"]
## Calculate the range of colours
## The minimum value, to the nearest lowest million
## The highest value, to the nearest highest million
## The darkest colour (in a range of 0.0 to 1.0)
## How to get the range of colours between that and 1.0
colour.multiplier<-(1.0 - colour.base) / (value.max - value.min)
## Make the colour levels
colour.base + (i - value.min) * colour.multiplier})
colours<-rgb(levels, 0, 0)
## Add the region names to the colours
## Plot each region in the given colour
plotEnglandRegions(colours, value.min, value.max, "Grants For The Arts 2009/10",
"Total awards in £")
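The colour calculation in that code is a linear mapping from award totals to red levels. Here is a rough Python sketch of the arithmetic. The regional totals are invented, the name red_levels is hypothetical, and the default colour_base of 0.2 is an assumption, since the value used for the map is not shown.

```python
import math


def red_levels(totals, colour_base=0.2, unit=1000000):
    """Map each region's total to a red hex colour between colour_base and 1.0.

    colour_base=0.2 is an assumed value for the darkest colour.
    """
    # The minimum value, rounded down to the nearest million
    value_min = math.floor(min(totals.values()) / unit) * unit
    # The maximum value, rounded up to the nearest million
    value_max = math.ceil(max(totals.values()) / unit) * unit
    # How much colour each pound above the minimum adds
    # (this divides by zero if every total is the same exact multiple of unit)
    multiplier = (1.0 - colour_base) / (value_max - value_min)
    colours = {}
    for region, total in totals.items():
        level = colour_base + (total - value_min) * multiplier
        colours[region] = "#%02X0000" % round(level * 255)
    return colours


# Invented totals, not real Grants For The Arts figures
example_totals = {"North": 2500000, "South": 4000000}
colours = red_levels(example_totals)
print(colours)
```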

And here’s the resulting map:

Who can point out the methodological flaw in this visualisation? 😉

Exploring Art History Data 2

Let’s see how art form and genre relate in the Freebase “Visual Art” dataset of artworks.

# read the artwork data
artwork<-read.delim("./visual_art/artwork.tsv")
# Get rows with both genre and form
# This loses most of the data :-/
art<-artwork[artwork$art_genre != "" & artwork$art_form != "", c("art_genre", "art_form")]
# Drop unused factors
# Get table
art.table<-table(art) ##as.table(ftable(art))
# Strip rows and columns where max < tolerance
art.table.cropped<-art.table[rowSums(art.table) >= tolerance,colSums(art.table) >=tolerance]
# Print wide table (make sure you resize your terminal window)

art_genre                          Drawing Fresco Installation art Metalworking Painting Photography Relief Sculpture Tapestry
Abstract art                           2      0                6            0       36           0      0         5        0
Allegory                               0      0                0            0        7           0      0         0        0
Animal Painting                        0      0                0            0       14           0      0         0        0
Christian art                          0      0                0            0        1           0      0         1        0
Christian art,History painting         0      0                0            0        2           0      0         0        0
Decorative art                         0      0                0            6        0           0      3         0        4
Fantastic art                          0      0                0            0        4           0      0         0        0
Genre painting                         0      0                0            0      120           0      0         0        0
Genre painting,Landscape art           0      0                0            0        4           0      0         0        0
History painting                       0     10                0            0      207           0      0         0        0
History painting,Landscape art         0      0                0            0        3           0      0         0        0
History painting,Religious image       0      0                0            0        3           0      0         0        0
Landscape art                          0      0                0            0      169           1      0         0        0
Landscape art,Genre painting           0      0                0            0        7           0      0         0        0
Landscape art,Marine art               0      0                0            0        3           0      0         0        0
Marine art                             0      0                0            0       34           1      0         0        0
Marine art,History painting            0      0                0            0        4           0      0         0        0
Marine art,Landscape art               0      0                0            0        3           0      0         0        0
Monument                               0      0                0            0        0           0      0         8        0
Portrait                               2      1                0            0      230           5      0         0        0
Religious image                        0      0                0            0        4           0      0         0        0
Religious image,History painting       0      0                0            0        4           0      0         0        0
Still life                             0      0                0            0       35           0      0         0        0

This time painting, rather than photography, has suspiciously more entries than any other medium, because more paintings than works in any other medium have genre information in the dataset.
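The cross-tabulation and cropping step can also be sketched outside R. The Python below is a minimal illustration. The name cross_tabulate, the example records and the tolerance of 2 are all invented, and the tolerance used for the table above is not shown.

```python
from collections import Counter


def cross_tabulate(records, tolerance):
    """Count (genre, form) pairs, keeping only rows and columns whose
    totals are at least tolerance."""
    # Keep only records with both a genre and a form
    pairs = Counter((g, f) for g, f in records if g != "" and f != "")
    row_totals = Counter()
    col_totals = Counter()
    for (g, f), n in pairs.items():
        row_totals[g] += n
        col_totals[f] += n
    return {(g, f): n for (g, f), n in pairs.items()
            if row_totals[g] >= tolerance and col_totals[f] >= tolerance}


# Invented records, only here to exercise the function
example_records = [("Portrait", "Painting"), ("Portrait", "Painting"),
                   ("Portrait", "Drawing"), ("Monument", "Sculpture"),
                   ("", "Painting"), ("Still life", "")]
table = cross_tabulate(example_records, 2)
print(table)
```

As in the R code, row and column totals are both computed on the full table before anything is cropped.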


Exploring Art History Data 1

Freebase have a section of visual art data: here.

You can download an archive of the data: here.

Expanding the archive gives you the data as tab-separated files:

$ ls visual_art
art_acquisition_method.tsv artwork.tsv
art_owner.tsv color.tsv
art_period_movement.tsv visual_art_form.tsv
art_series.tsv visual_art_genre.tsv
art_subject.tsv visual_artist.tsv
artwork_location_relationship.tsv visual_art_medium.tsv

Loading up R, we can parse the files and check some of the features of the data:

$ R --quiet
> artwork<-read.delim("./visual_art/artwork.tsv")
> names(artwork)
 [1] "name"               "id"                 "artist"
 [4] "date_begun"         "date_completed"     "art_form"
 [7] "media"              "period_or_movement" "art_genre"
[10] "dimensions_meters"  "art_subject"        "edition_of"
[13] "editions"           "locations"          "owners"
[16] "belongs_to_series"
> artists<-artwork$artist[artwork$artist != ""]
> summary(artists)[1:20]
         Henri Matisse          John Gutmann         Pablo Picasso
                    72                    66                    66
    Ferdinando Ongania      Vincent van Gogh            Caravaggio
                    57                    57                    49
               Raphael          Claude Monet Dr. William J. Pierce
                    48                    44                    42
      Alexander Girard          Tina Modotti   Martin Kippenberger
                    37                    37                    36
  Alvin Langdon Coburn          Thomas Annan          Robert Adams
                    31                    31                    30
          Paul Cézanne         Edward Weston        Martin Venezky
                    29                    28                    28
             Paul Klee            Willi Kunz
                    28                    28
> media<-artwork$media[artwork$media != ""]
> summary(media)[1:20]
    Gelatin silver print               Oil paint        Canvas,Oil paint
                    1110                     897                     429
        Oil paint,Canvas       offset lithograph           Albumen print
                     429                     221                     185
                  Bronze            Photogravure       chromogenic print
                     138                     127                     104
           Acrylic paint Synthetic polymer paint                     Ink
                      82                      69                      67
                Graphite         Screen-printing                    Wood
                      61                      57                      55
           Daguerreotype             Mixed Media         Oil paint,Panel
                      39                      39                      37
         Panel,Oil paint                  Marble
                      35                      30
> gelatin_silver_print_artworks<-artwork[artwork$media == "Gelatin silver print" & artwork$artist != "",]
> summary(gelatin_silver_print_artworks$artist)[1:20]
                       Dr. William J. Pierce          John Gutmann
                    78                    41                    34
          Robert Adams             Ilse Bing         Edward Weston
                    30                    27                    26
          Walker Evans          Tina Modotti        Dorothea Lange
                    20                    19                    18
       Lee Friedlander            Lewis Hine       Garry Winogrand
                    16                    16                    14
          Henry Wessel        Nicholas Nixon           Ansel Adams
                    13                    13                    12
        Harry Callahan          Pirkle Jones         Arnold Genthe
                    11                    11                    10
           Bill Brandt           Lewis Baltz
                    10                    10

A couple of quick checks of the data show that it has some biases relative to mainstream art history, with more photography and photographers than you might expect. There are also several different entries for oil painting, which skews the numbers. For the moment this is interesting data about the dataset rather than about art more generally. Perhaps art history data will be as useful for institutional critique as for historical research.
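One way to reduce the duplicate media entries (for example "Canvas,Oil paint" and "Oil paint,Canvas" in the summary above) would be to normalise the order of the comma-separated components before counting. A minimal Python sketch of the idea follows. The function names and example values are hypothetical, and this step was not applied to the figures above.

```python
from collections import Counter


def normalise_media(media):
    """Sort the comma-separated components so their order no longer matters."""
    parts = [p.strip() for p in media.split(",") if p.strip() != ""]
    return ",".join(sorted(parts))


def count_media(values):
    """Count non-empty media strings after normalising component order."""
    return Counter(normalise_media(v) for v in values if v != "")


# Invented stand-in for the media column
example_media = ["Canvas,Oil paint", "Oil paint,Canvas", "Oil paint", "", "Bronze"]
media_counts = count_media(example_media)
print(media_counts.most_common())
```

This only merges entries that list the same components in a different order. Deciding whether "Oil paint" alone should be merged with "Canvas,Oil paint" is an art-historical judgement rather than a string operation.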