
Exploring Art Data 8

Let’s explore the text of Vasari’s Lives of The Artists.

The full text of an English translation can be found on Project Gutenberg.

We can use a shell script to download the files to a local folder:

BOOKS="25326 25759 26860 28420 28421 28422 31845 31938 32362 33203"
# The local folder to save the texts in (an example name)
DESTDIR="vasari"
mkdir -p "${DESTDIR}"
pushd "${DESTDIR}"
for ebook in ${BOOKS}
do
    wget "${ebook}.txt"
done
popd

And then we can use R’s “tm” library to load the files:

## For "dissimilar" in tm
## install.packages("proxy")
## For "plot" on dtm in tm
## source("")
## biocLite("Rgraphviz")
## install.packages("tm")
library(tm)
## These are the Project Gutenberg book numbers for Lives Of The Artists
## Skip volume 10 (33203), which doesn't follow the same format
books<-c(25326, 25759, 26860, 28420, 28421, 28422, 31845, 31938, 32362)
## Make a file path for a book
## dir should be the folder the texts were downloaded to
bookPath<-function(id, dir=".", prefix="", extension=".txt"){
  paste(dir, "/", prefix, id, extension, sep="")
}
## Load the file as a single string
loadFile<-function(filename){
  paste(readLines(filename, warn=FALSE), collapse="\n")
}
## Load the files
loadFiles<-function(filenames){
  sapply(filenames, loadFile, USE.NAMES=FALSE)
}
## Load the texts
texts<-loadFiles(sapply(books, bookPath))

We can then extract the entry for each artist, clean up the data, create a corpus, and then clean up the corpus:

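## Extract entries on each artist
artistEntries<-function(text){
  ## Split each file into blocks between "LIFE OF .*\n"
  artists<-unlist(strsplit(text, "\nLIFE OF"))
  ## Discard the first block, which is the introduction
  ## Last block will be discarded by the article cleaning function
  artists[-1]
}
## Clean footnotes, etc. from article text
cleanArticle<-function(text){
  ## Truncate at \nFOOTNOTES:
  text<-unlist(strsplit(text, split="\nFOOTNOTES:"))[1]
  ## Remove [Text in square brackets]
  text<-gsub("\\[[^]]+\\]", "", text)
  ## Remove punctuation
  text<-gsub("[[:punct:]]", "", text)
  ## Lowercase words
  tolower(text)
}
## Get the first line of a string
firstLine<-function(text){unlist(strsplit(text, "\n"))[1]}
## Extract the entries and name each one after its first line
entries<-unlist(lapply(texts, artistEntries))
names(entries)<-trimws(sapply(entries, firstLine, USE.NAMES=FALSE))
## Create a corpus from the cleaned entries, then clean up the corpus
corpus<-Corpus(VectorSource(sapply(entries, cleanArticle)))
corpus<-tm_map(corpus, removeWords, stopwords("english"))
corpus<-tm_map(corpus, stripWhitespace)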
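
To see what each cleaning step does, here is a small sketch that runs the same splitting and regular expressions over a made-up scrap of text laid out like a Project Gutenberg volume. The sample string is invented for illustration; the real volumes are far longer and messier.

```r
## A made-up scrap of text laid out like a Project Gutenberg volume
volume <- paste("PREFACE", "Introductory text.",
                "LIFE OF GIOTTO",
                "Giotto [1276-1337] painted, in Florence, many works.",
                "FOOTNOTES:", "[1] A note.", sep = "\n")
## Split into blocks at each "LIFE OF" heading and drop the introduction
blocks <- unlist(strsplit(volume, "\nLIFE OF"))[-1]
## Truncate the entry at the footnotes
entry <- unlist(strsplit(blocks[1], split = "\nFOOTNOTES:"))[1]
## Remove [text in square brackets], then punctuation, then lowercase
entry <- gsub("\\[[^]]+\\]", "", entry)
entry <- gsub("[[:punct:]]", "", entry)
entry <- tolower(entry)
cat(entry, "\n")
```

The first line of the entry is the artist's name, which is why it is useful for naming the documents in the corpus.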

We can then create a term/document matrix (and remove infrequently used terms) to explore the corpus:

## Term/document matrix
dtm<-TermDocumentMatrix(corpus)
## Remove infrequent terms to save memory
dtm<-removeSparseTerms(dtm, 0.4)

We can find frequently used terms:

## Frequent terms in the matrix
findFreqTerms(dtm, 60)

 [1] "lorenzo"    "painted"    "pietro"     "life"       "andrea"
[6] "francesco"  "giovanni"   "beautiful"  "executed"   "antonio"
[11] "domenico"   "duke"       "marble"     "jacopo"     "church"
[16] "hand"       "little"     "able"       "afterwards" "age"
[21] "art"        "beauty"     "caused"     "chapel"     "christ"
[26] "city"       "day"        "death"      "del"        "design"
[31] "excellent"  "figure"     "figures"    "finished"   "florence"
[36] "friend"     "head"       "held"       "house"      "judgment"
[41] "left"       "likewise"   "manner"     "master"     "messer"
[46] "painter"    "painting"   "palace"     "pictures"   "placed"
[51] "pope"       "reason"     "rome"       "seen"       "sent"
[56] "set"        "time"       "various"    "whereupon" 
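
The tm functions hide what a term/document matrix actually holds. As a rough sketch, assuming three tiny made-up entries that have already been cleaned, base R can build one with table(), drop sparse terms and pick out frequent ones. This only approximates what TermDocumentMatrix(), removeSparseTerms() and findFreqTerms() do; it skips tm's tokenising and weighting options.

```r
## Three made-up entries, already cleaned and with stopwords removed
entries <- c(giotto  = "painted chapel florence painted",
             cimabue = "painted florence church",
             arnolfo = "church florence marble")
words <- strsplit(entries, " ")
names(words) <- names(entries)
terms <- sort(unique(unlist(words)))
## Rows are terms, columns are documents, cells are word counts
tdm <- sapply(words, function(w) table(factor(w, levels = terms)))
## A term's sparsity is the share of documents it does not appear in;
## keeping terms below 0.4 is roughly what removeSparseTerms(tdm, 0.4) does
sparsity <- rowMeans(tdm == 0)
dense <- tdm[sparsity < 0.4, , drop = FALSE]
## Terms used at least 3 times in total, like findFreqTerms(tdm, 3)
frequent <- names(which(rowSums(dense) >= 3))
print(dense)
print(frequent)
```

Here "chapel" and "marble" each appear in only one of the three entries, so they are dropped as too sparse.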

We can see which words are strongly associated with “painting”:

findAssocs(dtm, "painting", 0.8)


  painting    painter   painters       hand     little    painted   pictures
1.00       0.90       0.90       0.89       0.89       0.89       0.89
beautiful    figures      grace       lady       save     beauty   executed
0.87       0.87       0.86       0.86       0.86       0.85       0.85
manner   portrait       time     worthy  excellent      hands   likewise
0.85       0.85       0.85       0.85       0.84       0.84       0.84
particular       seen       sent      truth       able        art  craftsmen
0.84       0.84       0.84       0.84       0.83       0.83       0.83
friend      house       left      lived     living     return        age
0.83       0.83       0.83       0.83       0.83       0.83       0.82
besides     christ        day    finally    mention   received      study
0.82       0.82       0.82       0.82       0.82       0.82       0.82
chapel       city  diligence excellence       head     honour     master
0.81       0.81       0.81       0.81       0.81       0.81       0.81
nature       rome       true       held  wherefore
0.81       0.81       0.81       0.80       0.80 
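
findAssocs() reports terms whose counts across the documents correlate with the given term's counts. A minimal base-R sketch of that idea, using made-up counts for three terms over four documents; the assocs() helper is hypothetical, not part of tm.

```r
## Made-up counts: rows are terms, columns are four documents
tdm <- rbind(painting = c(5, 3, 0, 1),
             painter  = c(4, 2, 0, 1),
             marble   = c(0, 1, 6, 2))
## Hypothetical helper: correlate every other term's counts with the
## chosen term's counts and keep those at or above the limit
assocs <- function(tdm, term, corlimit) {
  r <- apply(tdm, 1, function(counts) cor(counts, tdm[term, ]))
  r <- r[names(r) != term]
  sort(round(r[r >= corlimit], 2), decreasing = TRUE)
}
result <- assocs(tdm, "painting", 0.8)
print(result)
```

"painter" rises and falls with "painting" across the documents, while "marble" moves the other way and is filtered out.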

And we can plot those associations:

## Plot associations between terms
plot(dtm, findFreqTerms(dtm, 120), attrs=list(graph=list(),
node=list(shape="rectangle",fontsize="72", fixedsize="false")))

Which looks like this:


Exploring Art Data 7

We’ve looked at brightness and contrast; now let’s look at colours.

The images we’ve downloaded are stored in the traditional computer graphics style, as red, green and blue (RGB) values. We can extract the RGB values from an image and create a palette for it using a standard “clustering” function. We can then sort the palette’s colours by brightness to make the palette easier to read when we plot it.

library(EBImage)
## Get the r,g,b colour values for all the pixels in the image as a table
bitmapToRgbs<-function(bitmap){
  ## Get flat lists of red, green and blue pixel values
  red<-as.vector(imageData(channel(bitmap, "red")))
  green<-as.vector(imageData(channel(bitmap, "green")))
  blue<-as.vector(imageData(channel(bitmap, "blue")))
  ## Combine these lists into a table of pixel r,g,b values
  data.frame(red=red, green=green, blue=blue)
}
## Sort a palette's colours in rough order of brightness
sortPaletteColours<-function(palette){
  colourValues<-apply(palette, 1, sum)
  palette[order(colourValues), , drop=FALSE]
}
## Quantize the colours (extract a colour palette)
quantizeColours<-function(bitmap, count){
  ## Cluster r,g,b values as points in RGB space
  clusters<-kmeans(bitmapToRgbs(bitmap), count)
  ## The centre of each cluster is its average colour
  ## Return the colours in brightness order
  sortPaletteColours(clusters$centers)
}
## The number of colours in each palette (an example value)
colourCount<-8
## Get palettes for each painting
palettes<-lapply(artworksThumbnails,
                 function(bitmap){quantizeColours(bitmap, colourCount)})
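
The clustering step can be tried without any downloaded images. This sketch builds a made-up table of pixels from two colour groups, clusters them with kmeans() in the same way as quantizeColours(), and sorts the cluster centres by the sum of their values. The random pixel values are stand-ins for real image data.

```r
set.seed(1)
## Made-up pixels: 50 dark brownish and 50 pale bluish colours, 0-1 range
n <- 50
rgbs <- data.frame(red   = c(runif(n, 0.3, 0.4), runif(n, 0.6, 0.7)),
                   green = c(runif(n, 0.2, 0.3), runif(n, 0.8, 0.9)),
                   blue  = c(runif(n, 0.1, 0.2), runif(n, 0.9, 1.0)))
## Cluster the pixels as points in RGB space; each centre is an average colour
clusters <- kmeans(rgbs, centers = 2, nstart = 5)
palette <- clusters$centers
## Sort the palette's colours by the sum of their r, g, b values
palette <- palette[order(rowSums(palette)), , drop = FALSE]
## Convert to R colour strings for plotting
colours <- rgb(palette[, "red"], palette[, "green"], palette[, "blue"])
print(colours)
```

Setting nstart runs kmeans() from several random starting points and keeps the best result, which makes the palette less dependent on luck.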

Having got the palettes, we can sort them in order of total brightness.

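## Get the palettes in order of brightness
## Average each palette's colour values (sum them and divide by their count)
paletteBrightnesses<-sapply(palettes,
                            function(palette){sum(palette) / length(palette)})
## Sort the palettes in order of brightness
sortedPalettes<-palettes[order(paletteBrightnesses)]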

Finally, we can convert the colours to R’s hexadecimal colour strings and plot the palettes.

## Convert the palette colours to R colours
paletteToColours<-function(palette){
  apply(palette, 1, function(colour){rgb(colour[1], colour[2], colour[3])})
}
## Plot palettes
## Get a flat list of colours
palettesColours<-sapply(sortedPalettes, paletteToColours, USE.NAMES=FALSE)
## Plot the colours for each palette
par(mar=c(4, 20, 4, 4))
image(matrix(1:(length(sortedPalettes) * colourCount), colourCount,
             length(sortedPalettes)),
      col=palettesColours, axes=FALSE)
axis(2, at=seq(0.0, 1.0, 1.0 / (length(sortedPalettes) - 1)),
     labels=names(sortedPalettes), las=2, tick=FALSE)

Which looks like this:

Better palette extraction and more perceptually accurate brightness sorting are left as exercises for the reader. 🙂
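
As one possible starting point for the second exercise, this sketch sorts a palette by an approximate luminance, weighting green most and blue least using the Rec. 709 coefficients, rather than by a plain sum. The palette is made up, and applying the weights directly to gamma-encoded RGB values is itself only an approximation.

```r
## A made-up palette: one colour per row, r, g, b values in the 0-1 range
palette <- rbind(c(0.0, 0.0, 1.0),   # pure blue
                 c(0.0, 0.6, 0.0),   # mid green
                 c(0.9, 0.1, 0.1))   # strong red
colnames(palette) <- c("red", "green", "blue")
## Approximate luminance: weight the channels by the Rec. 709 coefficients
luma <- function(palette) {
  as.vector(palette %*% c(0.2126, 0.7152, 0.0722))
}
## Sort darkest first by luminance, and by a plain sum for comparison
byLuma <- palette[order(luma(palette)), , drop = FALSE]
bySum <- palette[order(rowSums(palette)), , drop = FALSE]
print(byLuma)
print(bySum)
```

The plain sum ranks the strong red as the brightest colour and the mid green as the darkest, while the luminance ordering puts the green at the bright end, which is closer to how the colours look.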


Art Data Analysis: Unconcealed


Unconcealed is an exemplary presentation of previously undisclosed data on the exhibition and collection of Conceptual Art in Europe in the 1960s and 1970s, which it uses to study the social and economic networks in the history of conceptualism. It would be fascinating to see this kind of project for other artistic movements.

See here for more information.


Exploring Art Data 6

Let’s access an API and start analysing images.

We’ll use R to get information about a series of works (Monet’s “Haystacks”) and images of them from freebase.

In order to do this we’ll need to install some new libraries:
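
install.packages(c("RJSONIO", "RCurl"))
## EBImage is distributed through Bioconductor
install.packages("BiocManager")
BiocManager::install("EBImage")
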
Then load the libraries:

library(RJSONIO)
library(RCurl)
library(EBImage)

And patch one of them to work with freebase:

## Monkeypatch RJSONIO so list() -> []

oldListMethod<-getMethod("toJSON", "list")
setMethod("toJSON", "list", function(x, ...){
  if(length(x) == 0){
    return("[]")
  } else {
    return(oldListMethod(x, ...))
  }
})

We can then write code to access the freebase web API:

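## Query the freebase API, taking and returning R objects
## (freebaseQueryURL is a placeholder for the query service URL)
queryFreebase<-function(query){
  queryJSON<-toJSON(list(query=query))
  responseJSON<-fromJSON(getURL(paste(freebaseQueryURL,
                                      curlEscape(queryJSON), sep="")))
  stopifnot(responseJSON$status == "200 OK")
  responseJSON$result
}
## Get the series description and list of works from freebase
## Get the artwork description from freebase
## Get the image description from freebase
## The maximum height or width of a thumbnail (an example value)
thumbSize<-256
## Use the freebase thumbnail to try and get a thumbnail for the image
## Returns NULL if image couldn't be found
## (freebaseThumbnailURL is a placeholder for the thumbnail service URL)
getThumbnail<-function(image, thumbSize){
  if(length(image) == 0){return(NULL)}
  # On fail, redirect to a url that's guaranteed not to be an image,
  # we use the api root here
  # Use http as EBImage's use of curl doesn't like https
  thumbnailURL<-paste(freebaseThumbnailURL, image[[1]]$id, '?maxwidth=',
                      thumbSize, '&maxheight=', thumbSize,
                      '&mode=fit&onfail=/', sep="")
  tryCatch(readImage(thumbnailURL), error=function(e){NULL})
}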

We can fetch data about Monet’s “Haystacks”, and images where those are available:

## Fetch the series entry
## Fetch the entries for individual artworks in the series
artworks<-lapply(series$artworks, getArtwork)
## Get the names of the retrieved artwork data in order
artworksNames<-sapply(artworks, function(artwork){artwork[["name"]]})
## Get the image resource information for the artworks
artworksImages<-lapply(artworks, function(artwork){getImage(artwork[["id"]])})
## Label each artwork's image information with the artwork's name
names(artworksImages)<-artworksNames
## Fetch a thumbnail bitmap where available, and clear out NULLs
artworksThumbnails<-lapply(artworksImages,
                           function(image){getThumbnail(image, thumbSize)})
artworksThumbnails<-Filter(Negate(is.null), artworksThumbnails)

Having fetched the images, we can convert them to greyscale and produce a box plot of their brightness:

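grayscaleArtworksThumbnails<-lapply(artworksThumbnails, function(bitmap){as.vector(imageData(channel(bitmap, "gray")))})
## Draw a box plot of the brightness, allowing enough room for rotated labels
par(mar=c(20, 4, 4, 4))
boxplot(grayscaleArtworksThumbnails, las=2)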

Which looks like this:


It’s interesting to compare the brightness ranges of the paintings, and to see the outliers.
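
The box plot summarises each painting's brightness as quartiles plus outlying values. As a sketch, with made-up greyscale values standing in for two thumbnails, the same numbers can be read off directly: quantile() gives the spread, and boxplot.stats() returns the values a box plot would draw as outliers.

```r
set.seed(2)
## Made-up greyscale pixel values (0-1) standing in for two thumbnails;
## the first has one very dark pixel
brightness <- list(first  = c(runif(200, 0.6, 0.8), 0.05),
                   second = runif(200, 0.2, 0.5))
## Lower quartile, median and upper quartile of each painting's brightness
ranges <- sapply(brightness, quantile, probs = c(0.25, 0.5, 0.75))
## The values boxplot() would draw as outliers
outliers <- lapply(brightness, function(values) boxplot.stats(values)$out)
print(ranges)
print(outliers)
```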

Art History 3

Some searches that give good results:

Victoria & Albert Museum

Painting Catalogue

Art History

Art Exhibition

Art Gallery


Modern Art


Google Books Art History

Don’t buy DRM-encumbered ebooks from Google.

But do download public domain ePub and PDF versions of old books on art and art history from them.

Modern art and living artists

The Art journal

The art of painting

Works of art and artists in England

The art of drawing in perspective

Colour, as a means of art

Precepts and observations on the art of colouring in landscape painting

British galleries of art

Catalogue / American Art Association, Anderson Galleries, inc., New York

The Italian schools of painting

The Picture Collector’s Manual: Alphabetical arrangement of scholars and masters and classification of subjects

A biographical history of the fine arts

And do look in the “Related Books” recommendations at the bottom of each page.


A Much Better Arts Council Funding Visualisation

Arts Council Funding By Constituency

Sadly it’s made using proprietary software, but this dynamic visualisation of Arts Council England funding shows much more data than my quick R script.


Art Open Data 2

How To Use Art Open Data

Interface With APIs

Web APIs provide read access to data, and some also allow data to be written back and shared.

This allows data to be accessed and published more quickly than with downloadable datasets, often instantaneously.

Load Datasets

Datasets can be loaded into applications and programming environments directly.

This makes social network analysis and statistical analysis much more efficient.
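
For example, R’s read.csv() loads a comma-separated dataset into a data frame in one call. The small dataset here is invented and inlined as text so that the sketch runs on its own; read.csv() accepts a file name or URL in the same way.

```r
## A made-up dataset, inlined as CSV text so the sketch runs on its own
csv <- "artist,work,year
Artist A,Work 1,1891
Artist A,Work 2,1895
Artist B,Work 3,1902"
works <- read.csv(text = csv, stringsAsFactors = FALSE)
## Once loaded, the data can be queried directly, e.g. works per artist
worksPerArtist <- table(works$artist)
print(worksPerArtist)
```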

Perform Statistical Analysis

Datasets can be analysed to find statistical features such as averages and outliers. This can direct further analysis or suggest subjects for critical consideration.
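
A sketch of the simplest case, using ten made-up sale prices: the mean and median summarise the values, and the common rule of flagging anything more than 1.5 interquartile ranges outside the quartiles picks out the one price far from the rest.

```r
## Ten made-up sale prices
prices <- c(120, 135, 150, 110, 140, 125, 130, 145, 115, 900)
average <- mean(prices)
middle <- median(prices)
## Flag values more than 1.5 interquartile ranges outside the quartiles
quartiles <- quantile(prices, c(0.25, 0.75))
fence <- 1.5 * IQR(prices)
outliers <- prices[prices < quartiles[[1]] - fence | prices > quartiles[[2]] + fence]
cat("mean:", average, "median:", middle, "outliers:", outliers, "\n")
```

The single outlier pulls the mean (207) well above the median (132.5), which is one reason to check both.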

Perform Knowledge Discovery

Text and images can be processed to discover patterns, similarities between different works, relationships between subjects, and even limited kinds of aesthetic and affective qualities.
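
One simple way to measure the similarity between texts is the cosine similarity of their word counts. This sketch assumes made-up counts for three texts over four terms; real counts would come from a term/document matrix.

```r
## Made-up word counts for three texts over the same four terms
counts <- rbind(textA = c(painted = 4, marble = 0, chapel = 2, duke = 1),
                textB = c(painted = 3, marble = 0, chapel = 1, duke = 1),
                textC = c(painted = 0, marble = 5, chapel = 0, duke = 2))
## Cosine similarity: the dot product of two count vectors divided by the
## product of their lengths; 1 means the same proportions of terms
norms <- sqrt(rowSums(counts^2))
similarity <- (counts %*% t(counts)) / (norms %o% norms)
print(round(similarity, 2))
```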

Perform Social Network Analysis

The interactions of individuals in the artworld over time can be analysed to model relationships and the relative importance or position of individuals within their social cliques.
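
A minimal example of social network analysis is degree centrality: counting how many relationships each person takes part in. This sketch assumes a made-up list of relationships between five people labelled A to E; dedicated tools such as SocNetV offer many more measures.

```r
## A made-up list of relationships between five people
edges <- data.frame(from = c("A", "A", "B", "C", "D"),
                    to   = c("B", "C", "C", "D", "E"),
                    stringsAsFactors = FALSE)
## Degree: the number of relationships each person takes part in
degree <- table(c(edges$from, edges$to))
## Normalise by the largest possible degree (links to everyone else)
centrality <- sort(degree / (length(degree) - 1), decreasing = TRUE)
print(centrality)
```

C links the two groups in this network, and has the highest degree.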

Create Data Visualisations

Static or interactive graphical presentations data can be useful for finding interesting properties of a dataset or for better understanding the features or relationships within a dataset. It can also be art in its own right.

Where To Find Data Sources And Tools


CKAN is a directory of datasets. is an online media repository. It contains scans of many important and useful art historical primary documents. is an online library of electronic texts. It includes books and lectures by John Ruskin, William Morris, and many others.

freebase is an online database that extracts information from Wikipedia and makes it available for download. It has datasets on artworks, artists, and other art-related subjects.


Culture24 provides an API to access data about exhibitions and other current events at UK galleries and museums.

The Culture Grid API provides access to aggregated information from UK museum websites.

flickr commons provides images from museum collections tagged by volunteers, searchable and taggable through an API.

Wikimedia Commons provides images uploaded by volunteers, searchable through an API.


Wordle is an online service that creates “word clouds” from text. This can be useful for visually getting the feel for a text quickly.

Many Eyes is an online data repository and graphing service. It can be a convenient way of sharing data and visualisations.

SocNetV is a social network analysis application. It allows social network data to be analysed and visualised in various useful ways.

Gnuplot is a data graphing utility. It supports many different kinds of graphs and can be a useful tool for plotting data.

Programming Environments

R is a statistical analysis programming language. It is useful for statistical analysis, machine learning, and for drawing high-quality graphs of the results.

Python is a general-purpose programming language. It is useful for accessing APIs, text processing, machine learning, and data visualisation.

Processing is a simple data visualisation programming language. It can easily be extended to use more advanced facilities and is useful for creating interactive information graphics.

PD is a graphical programming language popular with digital artists. Using it with art open data helps to include artists in the analysis and visualisation of that data.

How To Proceed

Locate And Index Data

Find new APIs and new sources of data, and explore existing sources to find new datasets, then add them to CKAN.

Digitise Primary Sources

If you have physical access to an out-of-copyright primary source, photograph or scan it and upload the results to

Extract Data From Primary Sources

Once primary sources have been scanned, more structured data can be extracted from them. Text scans can be cleaned up and converted to machine readable formats using Optical Character Recognition (OCR). Artwork scans can be cleaned up, be tagged or categorized or otherwise have metadata added, or be processed algorithmically to find features or extract aesthetic information such as palettes.

Produce Interfaces To APIs

Web APIs are no good if people can’t use them. Creating libraries of code in a programming language you use to access an art open data API opens that API up to all the users of that language.

Combine Datasets

Combining multiple datasets can add information that is missing from a main dataset or extend its coverage of dates or regions.
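
In R, merge() joins two data frames on a shared column. This sketch uses two made-up datasets: a main table of works and a second table that adds a city for some of the artists. Setting all.x = TRUE keeps the works with no match, with NA marking the missing information.

```r
## A made-up main dataset of works
works <- data.frame(artist = c("Artist A", "Artist B", "Artist C"),
                    work   = c("Work 1", "Work 2", "Work 3"),
                    stringsAsFactors = FALSE)
## A made-up second dataset that adds a city for some of the artists
places <- data.frame(artist = c("Artist A", "Artist B"),
                     city   = c("Florence", "Rome"),
                     stringsAsFactors = FALSE)
## Join on the shared column; all.x = TRUE keeps works with no match
combined <- merge(works, places, by = "artist", all.x = TRUE)
print(combined)
```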

Add Non-Art Open Data

Using geodata from OpenStreetMap, bibliographic data from the British Museum, economic data from OpenEconomics, and other sources of Open Data can complement Art Open Data. Combining data from diverse fields can provide context and reveal or explain unexpected features of the original dataset.

Analyse Data

Having got all this data, it’s time to explore it and to find interesting things that are hidden in it. Theories can be suggested, supported or undermined by the data, and it’s here that the traditional skills of art history or art theory can come into play.

Visualise Data

Data visualisation of art data is where art and data truly join together. Whether a simple chart or a complex interactive animation, making data about art visual can provide inspiration to both the study and production of art.


Art Open Data 1

Art Open Data

Open Data is data that you have the freedom to use. Data that you are not free to use is called proprietary data.

Art Open Data is Open Data that concerns art institutions, art history, the art market, and artworks.

It is useful because it allows us to examine and think about those subjects in new ways.

This doesn’t displace the social history of art, art theory, or actually looking at art. Rather it allows us to find new ways of contextualising art and of gathering evidence for theories.

Art Institutions

Historical and contemporary art institutions have collection catalogues, show details, attendance information, accounts, and organizational information.

Contemporary art institutions may make their data available through an API that they provide or that they use from a service.

Art History

Writings by artists, critics, theorists and historians are all primary texts for art history. Out of copyright texts are being digitised and uploaded to the internet. These can be processed to provide institutional and market data, to discover factual information, or for affective and aesthetic analysis.

Biographical information about artworld figures is part of art history. Books such as “Lives of the Artists” and websites such as Wikipedia can be sources of biographical data.

The Art Market

Records of art auction prices have been kept for hundreds of years.

Older, historical records are available freely but recent information is usually proprietary.

Artworks

Reproductions of out-of-copyright artworks can be scanned, and out-of-copyright artworks held by institutions that don’t restrict photography can be photographed.

Institutional, historical and market data about the artwork can build up a picture of an artwork’s production, reception, and provenance.

Next we’ll look at how to find and use Art Open Data.


Culture Hack Days

I’d like to organize an Art Open Data Hack Day next year.

I’ve been looking for similar events, and there are good precedents, but nothing exactly like what I have in mind: