Art History Art Open Data Projects

Work In Progress: Tate Collection Data Movements

Here’s a sneak peek at one of the visualizations from my analysis of the Tate Collection data. It’s a graph of Movements linked by their artists:


More to come soon…

Art History Art Open Data

Exploring the Tate Collection Metadata

The Tate have released their collection metadata in an exemplary way here:

Let’s explore it using MongoDB, which you can find installation instructions for here.

First fetch and upload the JSON data:

git clone https://github.com/tate/collection.git
cd collection
find artists -name '*.json' -exec perl -p -e 's/\n/ /' '{}' \; -exec echo \; | mongoimport --db tate --collection artists
find artworks -name '*.json' -exec perl -p -e 's/\n/ /' '{}' \; -exec echo \; | mongoimport --db tate --collection artworks

Then in the mongo shell we can explore the artists and artworks:

// Artists

// List artist movements

db.artists.aggregate([
    {$unwind: "$movements"},
    {$project: {name: "$movements.name"}},
    {$group: {_id: "movements", items: {$addToSet: "$name"}}}])

// List artist eras

db.artists.aggregate([
    {$unwind: "$movements"},
    {$project: {name: "$movements.era.name"}},
    {$group: {_id: "eras", items: {$addToSet: "$name"}}}])

// Find artists by movement

db.artists.find({"movements.name": "Pop Art"})

// Find artists by era

db.artists.find({"movements.era.name": "20th century post-1945"})

// Find artists by birth year

db.artists.aggregate([
    {$group: {_id: "$birth.time.startYear", artists: {$addToSet: "$fc"}}}])

// Find artists by death year

db.artists.aggregate([
    {$group: {_id: "$death.time.startYear", artists: {$addToSet: "$fc"}}}])

// count artists by gender

db.artists.aggregate([
    {$group: {_id: "$gender", number: {$sum: 1}}},
    {$sort: {number: -1}}])

// Count artists by birthplace

db.artists.aggregate([
    {$group: {_id: "$birth.place.name", number: {$sum: 1}}},
    {$sort: {number: -1}}])

// Artworks

// List artwork subject categories

db.artworks.aggregate([
    {$unwind: "$subjects.children"},
    {$unwind: "$subjects.children.children"},
    {$group: {_id: "categories",
              categories: {$addToSet: "$subjects.children.children.name"}}}])

// List artwork subjects

db.artworks.aggregate([
    {$unwind: "$subjects.children"},
    {$unwind: "$subjects.children.children"},
    {$unwind: "$subjects.children.children.children"},
    {$group: {_id: "subjects",
              subjects: {$addToSet: "$subjects.children.children.children.name"}}}])

// List artwork categories and subjects

db.artworks.aggregate([
    {$unwind: "$subjects.children"},
    {$unwind: "$subjects.children.children"},
    {$unwind: "$subjects.children.children.children"},
    {$group: {_id: "category-subjects",
              subjects: {$addToSet: {category: "$subjects.children.children.name",
                                     subject: "$subjects.children.children.children.name"}}}}])

// List artwork movements

db.artworks.aggregate([
    {$unwind: "$movements"},
    {$group: {_id: "artwork-movements",
              movements: {$addToSet: "$movements.name"}}}])

// Find artwork by category/subject group

db.artworks.find({"subjects.children.children.name": "UK counties"})

// Find artwork by subcategory/subject

db.artworks.find({"subjects.children.children.children.name": "Yorkshire"})
// Find artwork by artist name

db.artworks.find({"contributors.fc":"Andy Warhol", "contributors.role":"artist"})

// Find artwork by movement. Will exclude works with no movement.

db.artworks.find({"movements.name": "Pop Art"})
// Find artworks without movements

db.artworks.find({"movements.name": {$exists: false}})
// Find artwork by date. Will exclude works with unknown date.

db.artworks.find({"dateRange.startYear": {$gte: 1900, $lt: 1910}})

// Find artworks without dates

db.artworks.find({dateRange: null})
Exploring the data, it becomes clear that the structure of the metadata is wonderfully regular but some of the content is less so. For example, entries in the “artists” data may be attributions to movements rather than individuals, and both movements and individuals may have null gender. Locations in birth and death data can be a town or country name in any language, or a town and country separated by a comma. Not every artwork has a creation date, movements, or subjects.
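As a sketch of the sort of regularisation needed for those locations, here is a Python example; the function name and the comma convention are my assumptions rather than anything in the Tate schema, and a single bare name is left ambiguous:

```python
def split_place(place):
    """Split a Tate-style place string into (town, country).

    Handles "town, country" pairs and bare single names; returns
    None components for missing values.
    """
    if not place:
        return (None, None)
    parts = [part.strip() for part in place.split(",")]
    if len(parts) >= 2:
        # Take the first and last segments as town and country.
        return (parts[0], parts[-1])
    # A single name could be a town or a country in any language;
    # leave that ambiguity for a later gazetteer lookup.
    return (parts[0], None)

print(split_place("Paris, France"))  # ('Paris', 'France')
print(split_place("Deutschland"))    # ('Deutschland', None)
```

A gazetteer lookup pass would then be needed to decide whether a lone name like “Deutschland” is a town or a country, and to normalise languages.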

But this is standard for real-world data, and easy enough to regularise. The community can do this and submit a pull request. What’s important is that this is a high-quality metadata dataset from a world-class art institution. People are already starting to explore and visualise it. See here for a great example:

Art History Art Open Data Free Culture

Importing Tate Collection Data Into MongoDB

You have to feed records into Mongo one per line, like this:

find artists -name '*.json' -exec perl -p -e 's/\n/ /' '{}' \; -exec echo \; | mongoimport --db tate --collection artists
find artworks -name '*.json' -exec perl -p -e 's/\n/ /' '{}' \; -exec echo \; | mongoimport --db tate --collection artworks
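If perl isn’t to hand, the same one-record-per-line flattening can be sketched in Python; the script name in the usage comment is illustrative, and the output is piped to mongoimport exactly as above:

```python
import json
import sys

def flatten_json_file(path):
    """Read a (possibly pretty-printed) JSON file and return it as a
    single line, suitable for piping to mongoimport."""
    with open(path) as f:
        return json.dumps(json.load(f))

# Usage sketch, e.g.:
#   python flatten.py artists/a/*.json | mongoimport --db tate --collection artists
if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(flatten_json_file(path))
```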

Aesthetics Art Art History

Allographic, Fake, Information, Materiality

In “Languages Of Art” Nelson Goodman describes two types of art, allographic and autographic. Allographic art has a notational score and is distributed by reproduction, like a novel or a DVD. Autographic art is a unique original artwork, like a painting or sculpture.

A copy of an allographic artwork is a print, a copy of an autographic artwork is a fake. Goodman argues (giving the example of Vermeer scholarship) that even if a fake is indistinguishable from the original today we cannot know that it will never be possible, with developments in technology or knowledge, to distinguish it in the future. Autographic art could be copied using atomic-level 3D scanning and printing, at which point the history and provenance of the artwork become the only current ways of distinguishing the original from a copy. But some as yet unknown fact or technique might still be developed to tell them apart.

When printing an allographic work, the materiality of the print is irrelevant to the extent that it does not interfere with the successful communication of the content of the work. The materiality of the print is noise in the sense of Shannon’s information theory. But noise can become ironized into signal by history. For example the hiss and crackle of vinyl records sampled in trip-hop or the deliberate digital image corruption of glitch art.

When producing a fake, another concern of Goodman’s, material differences from the original are noise. Where they become identifiable, these differences can become a signal indicating the work of celebrated or infamous fakers. Or they can become the signature of inauthenticity.

We cannot assume that every material fact about an autographic artwork or a particular print of an allographic artwork is intended to be part of the signal of the work; this would be the intentional fallacy. But every material fact about an artwork may affect its reception and interpretation. This is obvious for autographic work, where control of the medium is a sign of artistic competence, but it is also true for allographic work.

Bits require atoms to hold them, and prints require a substrate. The medium modulates the message, and the materiality of text has been something that authors have played with since at least “Tristram Shandy”. But the materiality of text that criticizes or historicizes art is not a product of authorial intent, rather it is an imposition by editors and designers. It is contingent. But this is the intentional fallacy, and the material qualities of a text affect its perception and reception whether the author cares or not.

The design of an art history, theory or criticism journal is not intended to confound the signal of the texts it contains. It is designed to lend them an air of neutrality and authority. If the authors of the texts they present do not intend this, they at least consent to it.

At art school in the early 1990s I was struck by the fact that the general posture of criticality of the cultural studies department towards other media didn’t extend to the particularities and peculiarities of their own. Media can at most appear neutral in the culture that exploits them. Much historic conceptual art and concrete poetry now speaks more immediately of mid-twentieth century bureaucracy’s office technology than of its artistic written content. But historical distance can be replaced with critical distance. We can find our own media strange. This includes the media of critical texts and of art history.

Which is why I think Charlotte Frost’s “What Is Art History Made Of” is such an important essay. Frost both recognizes the materiality of art historical media and seeks to broaden it. The Digital Humanities are already expanding the range of methods and materials available to art history, but Frost describes a broader self-critical programme for such experiments to pursue. This is a superset of a “critical digital humanities” that is much more than the call to order that label usually covers, bringing in Maker Culture and art practice as well (Art & Language are a useful precedent here). It is a self critical expansion of art history into its own objects that promises increased expressive range and communicative bandwidth for the field.

Aesthetics Art Art History Art Open Data

What Is An Artist (On Wikipedia)?

Wikipedia is the free online encyclopedia. It features articles on many thousands of artists. In the paper “Art History on Wikipedia, a Macroscopic Observation”, Doron Goldfarb et al. use the Getty Union List Of Artist Names, via the Virtual International Authority File, as a name authority to find artists on Wikipedia. This approach has the advantage of authority, ULAN is used as the name authority by many projects including the Europeana open metadata project. But it has the disadvantage of imposing an external concept of who an artist is onto Wikipedia. If a way could be found of identifying artists using the information contained in Wikipedia’s articles, this would mean that we can use Wikipedia’s own concept of what an artist is to identify artists on Wikipedia rather than using an external authority.

What, then, is an artist on Wikipedia?

It is not an article tagged with a category containing the word “Artist”, as that also includes singers and other recording or performing artist(e)s.

It is not an article with an “Artist” InfoBox, as although that is specific to artists not every artist or artist group has one.

If we use the concept “Visual Artist” rather than “Artist”, this excludes performance artists.

The Wikipedia-derived “semantic web” database Freebase provides a performative definition of a “Visual Artist” on its wiki: anyone (or anything) who has made a work of visual art. But this definition isn’t used by the actual database, which classifies performance artists as artists.

An article’s membership of the Category “Artists” (or a sub-category of it) cannot be used to identify artists. This Category includes articles about works about artists, Artisans, and Nineteenth Century Composers.

The best approach I have found for identifying what I regard as artists is to use DBPedia, another Wikipedia-derived semantic web database, to find articles that are tagged with sub-categories of the Category “Artists” and to filter out categories that I don’t think belong. But this is not using Wikipedia’s concept of what an artist is.
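For illustration, a DBPedia SPARQL query along these lines might be built as follows. The excluded category names are placeholders for my own judgement calls rather than a definitive list, and `skos:broader` as used here only reaches one level of sub-category:

```python
# Sketch of a DBPedia SPARQL query for articles in sub-categories of
# Category:Artists, minus sub-categories judged not to belong.
EXCLUDED = ["Artisans", "19th-century_composers"]

def artists_query(excluded):
    # One FILTER line per excluded sub-category.
    filters = "\n  ".join(
        "FILTER (?subcat != <http://dbpedia.org/resource/Category:%s>)" % name
        for name in excluded)
    return """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT DISTINCT ?artist WHERE {
  ?subcat skos:broader <http://dbpedia.org/resource/Category:Artists> .
  ?artist dcterms:subject ?subcat .
  %s
}""" % filters

print(artists_query(EXCLUDED))
```

Walking the whole category tree would need a transitive query (or repeated queries), and each level multiplies the miscategorisation problem described above.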

So I have edited Wikipedia in order to exclude those sub-categories of “Artists” that I don’t think belong, given Wikipedia’s own definitions of the terms used to describe each sub-category. If these edits are not removed, then articles tagged with sub-categories of “Artists” will be a good definition of what an artist is given my interpretation of Wikipedia’s terms.

This isn’t a disinterested discovery of knowledge on my part though. In trying to identify knowledge I have had to intervene to create it in a system of knowledge where it is difficult for words to mean more than one thing or have more than one context. The former is postmodern, the latter modern. Wikipedia is a site of tension between these approaches, and this is reflected in its ontology, in both the computer science and the philosophical sense.

Art History

The Proletarianization Of Art Criticism

Book and film reviews have been reduced to individual data points on aggregator web sites as print journalism collapses. It’s only a matter of time until such big data/collective intelligence approaches expand to cover art criticism.

The aggregation and statistical analysis of art reviews will complete the proletarianization of art criticism. Art critics will be alienated from the products of their labour, and their value will accrue to the ruling class of the walled gardens of the Internet under their identity and reputation. The institutions that legitimate and monetize art criticism will be outside of academia and inside the market.

Art critics can react to this in two ways.

The first is to try to restore their status by becoming part of that ruling class, going meta with their own aggregation and big data efforts. This is like humanities computing (or digital humanities as it’s been rebranded) in much the same way that Google is like grep.

The second is to attempt to resist proletarianization. Gonzo art criticism, free-and-open-source-criticism, engagement with Maker culture and the production of objects, any means of producing art criticism that is resistant to or that benefits from being the subject of monetization-through-aggregation.

In either case, this will be the end of the crisis of art criticism, as it will be the end of art criticism in its current autonomous form.

3D Printing Art Art History Projects

What is Art History Made of?


(#arthistory hashtag held in front of a man walking down a street in New York describing the work of Taryn Simon, 2013, Charlotte Frost)

“I wanted to draw attention to the physicality of art historical statements whether they are made in print or online. I wanted to look at art historical writing as an object.”

I was very flattered to be asked by Dr. Charlotte Frost to become involved in the 3D printing side of her “Art History Hashtag” project. My “shareable readymades” project was in part a reaction to the treatment of artisans by post-conceptual artists such as Jeff Koons, so reversing the artist/artisan relationship from that project and becoming the person modelling the artwork appealed to me. Charlotte’s writing about the physicality of art history media touched on something I have thought since I was at art school. And I love typography and hashtags, with varying degrees of irony.

Charlotte has now written about her inspiration for the project, providing a context not just for her immediate work but for any classical or digital humanities that wish to cross over with Maker Culture and/or to engage productively in a critique of the ways that their own medium specificity and physicality are implicated in their production. It’s an informative and valuable insight into the production of art and art history. I highly recommend it.

Art Computing Art History Art Open Data Projects

Exploring Art Data 24

(This post uses new features from the R Cultural Analytics Library version 1.0.6.)

We can divide an image into sections, analyse the R, G and B values of each of those sections and plot the results.

## Load the libraries and the image
library(CulturalAnalytics)
library(biOps)
imgdir<-paste(system.file(package = "CulturalAnalytics"), "images", sep = "/")
imgs<-paste(imgdir, dir(path = imgdir, pattern = ".jpg"), sep = "/")
img<-readJpeg(imgs[1])
## Divide it into sections and get a table of the median RGB values
sections<-divideImage(img, 8, 8)
## Get the median rgb values for each section
rgbs<-sapply(sections, function(img){ coords(medianRgb(imageToRgb(img))) })
## The list needs transposing so we have columns of r,g,b values
rgbs<-t(rgbs)
## Give the columns useful names
colnames(rgbs)<-c("r", "g", "b")
## Bubble Chart
plot(rgbs[,"r"], rgbs[,"g"], type="n", xlim=c(0,1), ylim=c(0,1),
     main="Section Bubble Chart of \"Bonjour, Monsieur Courbet\"",
     sub="(size is blue)", xlab="Red", ylab="Green")
images(rgbs[,"r"], rgbs[,"g"], sections, cex=rgbs[,"b"])

(Courbet bubble chart)

This shows no areas of pure, saturated colour.

Next we can cluster the sections of the image and show the resulting clusters.

## Cluster the tiles by colour
## (ordispider and ordihull are from the vegan package)
library(vegan)
## 5 is arbitrary
clusters<-kmeans(rgbs, 5)
## distance matrix
d<-dist(rgbs)
## Multidimensional scaling
cms<-cmdscale(d)
## Plot the clusters
plot(cms, type="n", xlim=range(cms[,1]), ylim=range(cms[,2]),
     main="Section Clustering of \"Bonjour, Monsieur Courbet\"")
images(cms[,1], cms[,2], sections, thumbnailWidth=20)
ordispider(cms, factor(clusters$cluster), label=TRUE)
ordihull(cms, factor(clusters$cluster), lty="dotted")

(Courbet section clustering)
If the plots didn’t have helpful titles, would you be able to recognize the image?

Despite the arbitrary number of clusters chosen, the groupings make some visual sense. Improving on the number of clusters is left as an exercise for the reader.
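One standard improvement is the “elbow” method: compute the within-cluster sum of squares for increasing k and pick the k after which it stops falling sharply. Here is a minimal sketch in Python rather than R, with random stand-in data in place of the section colours:

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(cluster):
    """Centroid of a non-empty list of points."""
    n = len(cluster)
    return tuple(sum(xs) / n for xs in zip(*cluster))

def kmeans(points, k, iterations=20, seed=0):
    """Minimal Lloyd's algorithm: returns (centroids, within-cluster
    sum of squared distances)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster empties out.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    wcss = sum(min(dist2(p, c) for c in centroids) for p in points)
    return centroids, wcss

# Stand-in for the 64 section colours: random RGB triples.
rng = random.Random(1)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(64)]
# WCSS falls as k rises; look for the k where the fall levels off.
for k in (2, 4, 8):
    _, wcss = kmeans(points, k)
    print(k, round(wcss, 3))
```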

Art Art History Art Open Data Free Culture Free Software Projects

Art Open Data – Government Art Collection Dataset

I have written a script to download a dataset containing collection information from the UK Government Art Collection site and save it in tab-separated-value files and an sqlite database for easy access. As the data is from a UK government agency it’s under the OGL.

You don’t need to run the script; a downloaded dataset is included in the project archive:

The dataset doesn’t feature as many records as the GAC website claims to feature, but the script does omit many duplicates. This project was inspired by Kasabi’s scraper, adding the ability to download code and data in an easy-to-use format.

Aesthetics Art Art Computing Art History Free Culture Free Software Generative Art Howto Projects Satire

Psychogeodata (3/3)

(cemetery random walk)

The examples of Psychogeodata given so far have used properties of the geodata graph and of street names to guide the generation of dérives. There are many more ways that Psychogeodata can be processed, some as simple as those already discussed, some much more complex.

General Strategies

There are some general strategies that most of the following techniques can be used as part of.

  • Joining the two highest or lowest examples of a particular measure.

  • Joining the longest run of the highest or lowest examples of a particular measure.

  • Joining a series of destination waypoints chosen using a particular measure.

The paths constructed using these strategies can also be forced to be non-intersecting, and/or the waypoints re-ordered to find the shortest journey between them.
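As a minimal sketch of the waypoint-joining strategy, here is a toy Python example; the street graph and waypoints are invented, and a real implementation would read them from OpenStreetMap geodata:

```python
from heapq import heappush, heappop

def shortest_path(graph, start, goal):
    """Dijkstra over an adjacency dict {node: {neighbour: distance}}."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heappop(queue)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for neighbour, d in graph[node].items():
            if neighbour not in seen:
                heappush(queue, (cost + d, neighbour, path + [neighbour]))
    return None

def join_waypoints(graph, waypoints):
    """Concatenate shortest paths between successive waypoints into one walk."""
    walk = [waypoints[0]]
    for a, b in zip(waypoints, waypoints[1:]):
        walk += shortest_path(graph, a, b)[1:]
    return walk

# A toy street graph: nodes are junctions, edge weights are street lengths.
streets = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "c": 1, "d": 5},
    "c": {"a": 4, "b": 1, "d": 1},
    "d": {"b": 5, "c": 1},
}
print(join_waypoints(streets, ["a", "d", "b"]))  # ['a', 'b', 'c', 'd', 'c', 'b']
```

Re-ordering the waypoints for the shortest total journey is the travelling salesman problem, which for the handful of waypoints in a walk can simply be brute-forced.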


Other mathematical properties of graphs can produce interesting walks. The length of edges or ways can be used to find sequences of long or short distances.

Machine learning techniques, such as clustering, can arrange nodes spatially or semantically.

Simple left/right choices and fixed or varying degrees can create zig-zag or spiral paths for set distances or until the path self-intersects.

Map Properties

Find long or short street names, or street names with the most or fewest words or syllables, and find runs of them or use them as waypoints.

Find all the street names on a particular theme (colours, saints’ names, trees) and use them as waypoints to be joined in a walk.

Streets that are particularly straight or crooked can be joined to create rough or smooth paths to follow.
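A crude Python sketch of these name measures, with an invented street list standing in for OpenStreetMap way names:

```python
import re

def rough_syllables(name):
    """Very rough syllable estimate: count groups of vowels."""
    return len(re.findall(r"[aeiouy]+", name.lower()))

streets = ["Rose Street", "Saint Augustine Road",
           "Elm Row", "Chrysanthemum Boulevard"]

# Streets with the fewest and most syllables become a walk's endpoints.
by_syllables = sorted(streets, key=rough_syllables)
print("fewest syllables:", by_syllables[0])   # Elm Row
print("most syllables:", by_syllables[-1])    # Chrysanthemum Boulevard

# Street names on a theme (here, plants) become waypoints.
plants = {"rose", "elm", "chrysanthemum"}
themed = [s for s in streets if set(s.lower().split()) & plants]
print("themed waypoints:", themed)
```

A dictionary with real syllable counts, or Wordnet for the thematic matching, would make both measures much less crude.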

If height information can be added to the geodata graph, node elevation can be used as a property for routing. Join high and low points, flow downhill like water, or find the longest runs of valleys or ridges.

Information about named entities can be extracted from street, location and district names using services such as DBPedia or Freebase and used to connect them. Dates, historical locations, historical facts, biographical or scientific information and other properties are available from such services in a machine-readable form.

Routing between peaks and troughs in sociological information such as population, demographics, crime occurrence, political affiliation and property prices can produce a journey through the social landscape.

Locations of Interest

Points of interest in OpenStreetMap’s data are represented by nodes tagged as “historic”, “amenity”, “leisure”, etc. It is trivial to find these nodes to use as destinations for walks across the geodata graph. They can then be grouped and used as waypoints in a route that will visit every coffee shop in a town, or one of each kind of amenity in alphabetical order, in an open or closed path, for example. Making a journey joining each location with a central base will produce a star shape.

Alignments of places of worship (or former Woolworths stores) can be found using linear regression or the techniques discussed below in “Geometry and Computer Graphics”.


Semantics

The words of poems or song lyrics (less stopwords), matched either directly or through hypernyms using Wordnet, can be searched for in street and location names to use as waypoints in a path. Likewise named entities extracted from stories, news items and historical accounts.

More abstract narratives can be constructed using concepts from The Hero’s Journey.

Nodes found using any other technique can be grouped or sequenced semantically as waypoints using Wordnet hypernym matching.


Isomorphism

Renamed Tube maps, and journeys through one city navigated using a map of another, are examples of using isomorphism in Psychogeography.

Entire city graphs are very unlikely to be isomorphic, and the routes between famous locations will contain only a few streets anyway, so sub-graphs are both easier and more useful for matching. Better geographic correlations between locations can be made by scoring possible matches using the lengths of ways and the angles of junctions. Match accuracy can be varied by changing the tolerances used when scoring.

Simple isomorphism checking can be performed using the NetworkX library’s functions. Approximate matching can be performed by projecting points from a subgraph onto a target graph, then brute-force searching for matches by varying the matrix used in the projection and scoring each attempt based on how closely the points match. Or isomorphisms can be bred using genetic algorithms, with degree of isomorphism as the fitness function and proposed subgraphs as the population.
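For patterns as small as a route between a few locations, sub-graph matching can even be sketched by brute force in Python without a graph library; the “pattern” and “city” graphs here are invented:

```python
from itertools import permutations

def subgraph_isomorphism(pattern_edges, target_edges):
    """Return a node mapping embedding the pattern graph in the target
    graph, or None. Brute force, so only suitable for small patterns."""
    pattern_nodes = sorted({n for e in pattern_edges for n in e})
    target_nodes = sorted({n for e in target_edges for n in e})
    target_set = {frozenset(e) for e in target_edges}
    for candidate in permutations(target_nodes, len(pattern_nodes)):
        mapping = dict(zip(pattern_nodes, candidate))
        if all(frozenset((mapping[a], mapping[b])) in target_set
               for a, b in pattern_edges):
            return mapping
    return None

# A triangular route to find inside a small street graph.
triangle = [("x", "y"), ("y", "z"), ("z", "x")]
city = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
print(subgraph_isomorphism(triangle, city))  # {'x': 1, 'y': 2, 'z': 3}
```

NetworkX’s `GraphMatcher` does the same job with far better algorithms once the graphs grow beyond toy size.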

The Social Graph

Another key contemporary application of graph theory is Social Network Analysis. The techniques and tools of both social science and Web 2.0 can be applied directly to geodata graphs.

Or the graphs of people’s social relationships from Facebook, Twitter and other services can be mapped onto their local geodata graph using the techniques from “Isomorphism” above, projecting their social space onto their geographic space for them to explore and experience anew.

Geometry and Computer Graphics

Computer geometry and computer graphics or computer vision techniques can be used on the nodes and edges of geodata to find forms.

Shapes can be matched by using them to cull nodes using an insideness test or to find the nearest points to the lines of the shape. Or line/edge intersection can be used. Such matching can be made fuzzy or accurate using the matching techniques in “Isomorphism”.

Simple geometric forms can be found – triangles, squares and quadrilaterals, stars. Cycle bases may be a good source of these. Simple shapes can be found – smiley faces, house shapes, arrows, magical symbols. Sequences of such forms can be joined based on their mathematical properties or on semantics.

For more complex forms, face recognition, object recognition, or OCR algorithms can be used on nodes or edges to find shapes and sequences of shapes.

Classic computer graphics methods such as L-systems, turtle graphics, Conway’s Game of Life, or Voronoi diagrams can be applied to the geodata graph in order to produce paths to follow.

Geometric animations or tweens created on or mapped onto the geodata graph can be walked on successive days.

Lived Experience

GPS traces generated by an individual or group can be used to create new journeys relating to personal or shared history and experience. So can individual or shared checkins from social networking services. Passenger level information for mass transport services is the equivalent for stations or airports.

Data streams of personal behaviour such as scrobbles, purchase histories, and tweets can be fetched and processed semantically in order to map them onto geodata. This overlaps with “Isomorphism”, “Semantics”, and “The Social Graph” above.

Sensor Data

Temperature, brightness, sound level, radio wave, radiation, gravity and entropy levels can all be measured or logged and used as weights for pathfinding. This brings Psychogeodata into the realm of Psychogeophysics.
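A toy Python sketch of such sensor-weighted routing, finding the “quietest” route by weighting each street with a sound-level reading; the junctions and readings are invented:

```python
def simple_paths(graph, start, goal, path=None):
    """Yield every simple path from start to goal in a small graph."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for neighbour in graph[start]:
        if neighbour not in path:
            yield from simple_paths(graph, neighbour, goal, path)

def path_weight(weights, path):
    """Total sensor weight along a path of junctions."""
    return sum(weights[frozenset(pair)] for pair in zip(path, path[1:]))

# Junction graph and per-street sound-level readings (arbitrary units).
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
sound = {frozenset(("a", "b")): 70, frozenset(("b", "d")): 40,
         frozenset(("a", "c")): 30, frozenset(("c", "d")): 35}

quietest = min(simple_paths(graph, "a", "d"),
               key=lambda p: path_weight(sound, p))
print(quietest)  # ['a', 'c', 'd']
```

For real street networks, the same weights would simply be fed to Dijkstra’s algorithm rather than enumerating every path.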


This series of posts has made the case for the concept, practicality, and future potential of Psychogeodata. The existing code produces interesting results, and there’s much more that can be added and experienced.

(Part one of this series can be found here, part two here. The source code for the Psychogeodata library can be found here.)