The R Cultural Analytics Library

I have gathered together much of the code from my series of posts on Exploring Art Data into a library for the R programming language, which is now available as a package on R-Forge:

I will be adding more code to the library over time. It’s very easy to install: just enter the following into an R session:

install.packages("CulturalAnalytics", repos="")

The library includes code for ImagePlot-style image scatter plots, colour histograms, colour clouds, and other useful functions. The examples in the documentation should help new users get started quickly.

R is the lingua franca of statistical computing, and I believe that it’s important for art and digital humanities computing to avail themselves of its power.

The Aesthetic Seance

Aesthetics in the visual arts is ultimately the evaluation of qualia under a given theory of value. Qualia are irreducible aesthetic experiences or feelings that we have no power of introspection over. They cannot be studied materialistically through scientific naturalism. [1]

Qualia are therefore supernatural.

Supernatural entities can be summoned and given expression in the material world through seances. [2] Therefore it should be possible to summon a quale through a seance and give it a means of expression. Rather than theorising about it, we can ask it directly and see what it has to say for itself.

Summon “red”, for example, and give it access to a ouija board or pencil. Ask it what it contains, how it relates to other colours, what its favourite use by an artist is, whether it really is opposed to green. Summon “line” and ask it if it’s infinite. Extend the criteria used from “supernatural” to “abstract” and summon “portraiture”, “chiaroscuro” or “pop art”.

Qualia are clearly both real and supernatural so belief or disbelief in the supernatural is not an issue. Belief in the efficacy of seance is also not an issue. [3] This is different from summoning up dead artists to create new works as every individual has direct personal experience of the reality of qualia even if they do not have direct personal experience of ghosts.

There are therefore no methodological or metaphysical barriers to the exciting possibilities of the parapsychological investigation of aesthetics.

(Individuals of a nervous disposition, with mental health problems, or on medication should not take part in seances. Local law regarding the holding and recording of seances should be investigated before organizing one. This article makes no claims for the reality of the supernatural beyond the definition of qualia given.)



[3] See for example “Conjuring Up Philip”, M. Owen with Margaret Sparrow, 1976, Harper & Row, New York. Adapted in “Parapsychology, The Controversial Science”, Richard Broughton, 1992, Rider, London.

Werewolf Fiction. You’re doing it wrong.

Werewolf fiction lacks the confidence of vampire fiction. Vampire fiction is novel, reflexive, indexical, and complete. It is novel because vampirism is unprecedented in both the social and personal realities of its characters. It is reflexive because it takes the condition of vampirism qua vampirism as its subject. It is indexical because the condition of vampirism is used allegorically or metaphorically to animate contemporary concerns and to illustrate the human condition. And it is complete because no other themes or macguffins are used to make up for the perceived deficiencies of the central conceit.
“Dracula” and “Interview With The Vampire” are the two high points of popular vampire fiction. The former takes the stuff of penny dreadfuls and distant folk superstition as the absent core of a clash between modernity and superstition that animates the hypocrisy and shear of Victorian society. The latter ironises the displaced Catholic theatrics of an exhausted cinematic form into a tale of the betrayal of promise and an illustration of the price of the impossible, one that does not require its readers to have an immortal soul to lose in order to terrify them.

There is very little werewolf fiction that is novel, reflexive, indexical, and complete. I do not know why this is. Werewolfery can be an ironic symbol of many key elements of the human condition and of its postmodern situation. Take out the witches and faeries, the police procedural and the pack dynamics, the hunters and the soap opera, and lycanthropy can be a prism rather than ballast.

Art Text Data Analysis 2 – Themes And Topics

Discovering Themes

Topic Models

Topic Modelling Toolbox

MALLET (and a good example of using it)

Art Text Data Analysis 1

Network Analysis and the Art Market: Goupil 1880 – 1895 [PDF]

Wyndham Lewis’s Art Criticism in The Listener, 1946-51

Tools For Exploring Text

Auto Converting Project Gutenberg Text to TEI

Open Art Data – Datasets Update

Here’s a new OGL-licenced list of works in the UK government’s art collection, scraped for a Culture Hack Day –

The JISC OpenART Project is making good progress and considering which ODC licence to use. It should be both a great resource and a great case study –

I’ve mentioned it before but this Seattle government list of public art with geolocation information is really good –

And Freebase keep adding new information about visual art –

Europeana are ensuring that all the metadata they provide is CC0 –

Their API isn’t publicly available yet, though! 🙁 –

Finally, for now, some of the National Gallery’s data now seems to be under an attempt at a BSD-style licence. The OGL would be even better… –

Exploring Art Data 23

Having written a command-line interface (CLI), we will now write a graphical user interface (GUI). GUIs can be an effective way of managing the complexity of software, but they have the disadvantage that they usually cannot be scripted as effectively as CLI applications, and that they usually cannot be extended or modified as simply or as deeply as code run from a REPL.

That said, if software is intended as a stand-alone tool for performing tasks that will not be repeated and do not require much setup, a GUI can be very useful. So we will write one for the code in image-properties.r.

As with the CLI version, we will run this code using Rscript. The script can be run from the command line, or an icon for it can be created in the operating system’s applications menu or dock.

#!/usr/bin/env Rscript
## -*- mode: R -*-

The GUI framework that we will use is the cross-platform gWidgets library. I have set it up to use Gtk here, but Qt and Tk versions are available as well. You can find out more about gWidgets at

## install.packages("gWidgetsRGtk2", dep = TRUE)

We source properties-plot.r to load the code that we will use to plot the image once we have gathered all the configuration information we need using the GUI.
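That amounts to a single call (assuming properties-plot.r is in the current directory):

```r
source("properties-plot.r")
```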


The first part of the GUI that we define is the top level window and layout. The layout of the top level window is a tabbed pane of the kind used by preferences dialogs and web browsers. We use this to organise the large number of configuration options for the code and to present them to the user in easily understood groupings.
Notice the use of “layout” objects as matrices to arrange interface widgets such as buttons within the window and later within each page of the “notebook” tabbed view.

win<-gwindow("gui", visible=FALSE)
layout<-glayout(container=win)
## The tabbed "notebook" that the pages defined below are added to
layout[1,1]<-(nb<-gnotebook())
layout[2,1]<-gbutton("Render Image", handler=renderImage)

The first tab contains code to create and handle input from user interface elements for selecting the kind of plot, the data file and folder of images to use, and the file to save the plot as if required. It also allows the user to specify which properties from the data file to plot.

table<-glayout(container=nb, label="Files And Columns")
table[1,2]<-gcombobox(c("Display","PDF", "PNG"), handler=updateSaveFile)
table[2,1]<-glabel("Data File:")
table[2,3]<-gbutton("Set Data File...", handler=setDataFile)
table[3,1]<-glabel("Image Folder:")
table[3,3]<-gbutton("Set Image Folder...", handler=setImageFolder)
saveImageLabel<-glabel("Save Image:", enable=FALSE)
saveImageGedit<-gedit("", enable=FALSE)
saveImageButton<-gbutton("Set Image File...", enable=FALSE, handler=setSaveFile)
table[5,1]<-glabel("X Value Column:")
table[6,1]<-glabel("Y Value Column:")
table[7,1]<-glabel("Y Value Column:")
table[8,1]<-glabel("Filename Column:")
table[9,1]<-glabel("Icon Label Column:")

We use functions to allow the user to choose the data file, image folder, and save file. Using the GUI framework's built-in support for file choosing makes this code remarkably compact.

setDataFile<-function(button, ...){
  choice<-gfile(type="open", text="Select the data file...",
                filter=list("All files"=list(patterns=c("*")),
                            "TSV files"=list(patterns=c("*.txt"))))
  ## No choice == NA; dataFileGedit is the text field from the first tab (name assumed)
  if(! is.na(choice)) svalue(dataFileGedit)<-choice
}

setImageFolder<-function(button, ...){
  choice<-gfile(type="selectdir", text="Select the image folder...")
  ## No choice == NA
  if(! is.na(choice)) svalue(imageFolderGedit)<-choice
}

setSaveFile<-function(button, ...){
  choice<-gfile(type="save", text="Select the file to save to...",
                filter=c(list("All files"=list(patterns=c("*"))), filterSaveFile))
  ## No choice == NA
  if(! is.na(choice)) svalue(saveImageGedit)<-choice
}

Often one part of the GUI must be updated, enabled or disabled in response to changes in another part. When the user selects a "Display" plot we do not need them to select a file to save the plot in, as the plot will be displayed in a window on the screen. The next functions implement this logic.

updateSaveFile<-function(combo, ...){
  enableSaveFile(svalue(combo$obj) != "Display")
  if(svalue(combo$obj) == "PDF"){
    filterSaveFile<<-list("PDF files"=list(patterns=c("*.pdf")))
  } else if(svalue(combo$obj) == "PNG"){
    filterSaveFile<<-list("PNG files"=list(patterns=c("*.png")))
  }
}
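The enableSaveFile function called above is not shown in this excerpt. A minimal sketch, assuming it simply toggles the three save-file widgets created in the first tab:

```r
## Enable or disable the save-file widgets from the first tab
enableSaveFile<-function(enable){
  enabled(saveImageLabel)<-enable
  enabled(saveImageGedit)<-enable
  enabled(saveImageButton)<-enable
}
```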

The second tab contains fields to allow the user to configure the basic visual properties of the plot, its height, width, and background colour.

table<-glayout(container=nb, label="Image")
table[3,1]<-glabel("Background Colour:")

The third tab allows the user to control the plotting of images, labels, points and lines.

table<-glayout(container=nb, label="Plotting")
table[1,1]<-gcheckbox(text="Draw Images", checked=TRUE)
table[3,1]<-gcheckbox(text="Draw Labels", checked=TRUE)
table[4,1]<-glabel("Label Scale:")
table[5,1]<-glabel("Label Colour:")
table[6,1]<-gcheckbox(text="Draw Points", checked=TRUE)
table[7,1]<-glabel("Point Scale:")
table[8,1]<-glabel("Point Style:")
table[9,1]<-glabel("Point Colour:")
table[10,1]<-gcheckbox(text="Draw Lines", checked=TRUE)
table[11,1]<-glabel("Line Width:")
table[12,1]<-glabel("Line Colour:")

The fourth (and final) tab allows the user to manage how the axes are plotted.

table4<-glayout(container=nb, label="Axes")
table4[1,1]<-gcheckbox(text="Draw Axes", checked=TRUE)
table4[2,1]<-glabel("Axis Round Digit Precision:")
table4[2,2]<-gslider(from=0, to=10, by=1, value=axisRoundDigits)
table4[3,1]<-glabel("X Axis Label:")
table4[4,1]<-glabel("Y Axis Label:")
table4[5,1]<-glabel("Axis Colour:")

Having created the contents of each tab, we set the initial tab that will be shown to the user and display the window on the screen.
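With the nb and win objects defined above, that can be as simple as:

```r
svalue(nb)<-1       ## show the first tab initially
visible(win)<-TRUE  ## display the window on the screen
```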


Next we will write code to set the values of the global variables from the GUI, and perform a render. Until then, we can define a do-nothing renderImage function to allow us to run and test the GUI code.
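A do-nothing stub is enough for now; it will be replaced when we wire up the plotting code:

```r
## Placeholder render handler: does nothing until the plotting code is wired up
renderImage<-function(button, ...){
  NULL
}
```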


If we save this code in a file called propgui and make it executable using the shell command:

chmod +x propgui

We can call the script from the command line like this:

./propgui
We can enter values into the fields of the GUI, choose files, and press buttons (although pressing the Render button will of course have no effect yet).

Digital Evaluation Of The Humanities

Humanities Computing dates back to the use of mainframe computers with museum catalogues in the 1950s. The first essays on Humanities Computing appeared in academic journals in the 1960s, the first conventions on the subject (and the Icon programming language) emerged in the 1970s, and ChArt was founded in the 1980s. But it wasn’t until the advent of Big Data in the 2000s and the rebranding of Humanities Computing as the “Digital Humanities” that it became the subject of moral panic in the broader humanities.

The literature of this moral panic is an interesting cultural phenomenon that deserves closer study. The claims that critics from the broader humanities make against the Digital Humanities fall into two categories. The first is material and political: the Digital Humanities require and receive more resources than the broader humanities, and these resources are often provided by corporate interests that may have a corrupting influence. The second is effectual and categorical: it’s all well and good making pretty pictures with computers or coming up with some numbers free of any social context, but the value of the broader humanities is in the narratives and theories that they produce.

We can use the methods of the Digital Humanities to characterise and evaluate this literature. Doing so will create a test of the Digital Humanities that has bearing on the very claims against them by critics from the broader humanities that this literature contains. I propose a very specific approach to this evaluation. Rather than using the Digital Humanities to evaluate the broader humanities claims against it, we should use these claims to identify key features of the broader humanities self-image that they use to contrast themselves with the Digital Humanities and then evaluate the extent to which the literature of the broader humanities actually embody these features.

This project has five stages:

1. Determine the broader humanities’ claims of properties that they possess in contrast to the Digital Humanities.
2. Identify models or procedures that can be used to evaluate each of these claims.
3. Identify a corpus or canon of broader humanities texts to evaluate.
4. Evaluate the corpus or canon using the models or procedures.
5. Use the results of these evaluations as direct constraints on a theory of the broader humanities.

Notes on each stage:

Stage 1

I outlined above some of the broader humanities’ claims against the Digital Humanities that I am familiar with. We can perform a Digital Humanities analysis of texts critical of the Digital Humanities in order to test the centrality of these claims to the case against the Digital Humanities and to identify further claims for evaluation.

Stage 2

There are well defined computational and non-computational models of narrative, for example. There are also models of theories, and of knowledge. To the extent that the broader humanities find these insufficient to describe what they do and regard their use in a Digital critique as inadequate they will have to explain why they feel this is so. This will help both to improve such models and to advance the terms of the debate within the humanities.

One characteristic of broader humanities writing that is outside the scope of this project’s stated aims, but that I believe is worth investigating, is the extent to which humanities writing is simply social grooming and ideological normativity within an educational institutional bureaucracy. This can be evaluated using measures of similarity, referentiality and distinctiveness.

Stage 3

It is the broader humanities’ current self-image (in contrast to its image of the Digital Humanities) that concerns us, so we should identify a defensible set of texts for analysis.

There are well established methods for establishing a corpus or canon. We can take the most read, most cited, most awarded or most recommended articles established by a particular service or institution from a given date range (for example 2000-2009 inclusive, or the academic year for 2010). We can take a reading list from a leading course on the subject. Or we can try to locate every article published online within a given period. Whichever criterion we choose, we will need to identify and defend it explicitly.

Stage 4

Evaluating the corpus or canon will require an iterative process of preparing data and running software then correcting for flaws in the software, data, and models or processes. This process should be recorded publicly online in order to engender trust and gain input. To support this and to allow recreation of results the software used to evaluate the corpus or canon, and the resulting data, must be published in a free and open source manner and maintained in a publicly readable version control repository.

Stage 5

Stage five is a deceptive moment of jouissance for the broader humanities. It percolates number and model into narrative and theory, but in doing so it provides a test of the broader humanities’ self-image.

For the broader humanities to criticise the results of the project will require its critics to understand more of the Digital Humanities and of their own position than they currently do. Therefore even if the project fails to demonstrate or persuade it will succeed in advancing the terms of the debate.

Exploring Art Data 22

So far we have used the R REPL to run code. Let’s write a script that provides a command-line interface for the plotting code we have just written.
A command-line interface allows the code to be called via the terminal, and to be called from shell scripts. This is useful for exploratory coding and for creating pipelines and workflows of different programs. It also allows code to be called from network programming systems such as Hadoop without having to convert the code.
To allow the code to be called from the command line we use a “shebang” (pound bang) line that tells the shell to use the Rscript interpreter rather than the interactive R system.

#!/usr/bin/env Rscript
## -*- mode: R -*-

Next we import the “getopt” library that we will use to parse arguments passed to the script from the command line.
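That is just:

```r
library("getopt")
```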


And we import the properties-plot.r code that we will use to perform the actual work.
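As in the GUI script, this is a single source call (assuming the file is in the current directory):

```r
source("properties-plot.r")
```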


The first data and functions we write will be used to parse the arguments passed to the script by its caller. The arguments are defined in a standard format used by the getopt library.

args<-matrix(ncol=4, byrow=TRUE,
             c("datafile",    "d", 1, "character",
               "imagedir",    "i", 1, "character",
               "xcolumn",     "x", 1, "character",
               "ycolumn",     "y", 1, "character",
               "labelcolumn", "c", 1, "character",
               "xmin",        "m", 1, "double",
               "xmax",        "M", 1, "double",
               "ymin",        "n", 1, "double",
               "ymax",        "N", 1, "double",
               "plotwidth",   "p", 1, "double",
               "plotheight",  "P", 1, "double",
               "plotBorder",  "r", 1, "double",
               "thumbwidth",  "t", 1, "double",
               "linewidth",   "l", 1, "double",
               "linecol",     "L", 1, "character",
               "pointsize",   "O", 1, "double",
               "pointcol",    "C", 1, "character",
               "labelsize",   "b", 1, "double",
               "labelcol",    "B", 1, "character",
               "axislabelx",  "e", 1, "character",
               "axislabely",  "E", 1, "character",
               "axiscol",     "z", 1, "character",
               "axisround",   "Z", 1, "integer",
               "outfile",     "o", 1, "character",
               "help",        "h", 0, "logical",
               "png",         "G", 0, "logical",
               "pdf",         "D", 0, "logical",
               "display",     "X", 0, "logical",
               "no-axes",     "S", 0, "logical",
               "no-images",   "I", 0, "logical",
               "no-labels",   "A", 0, "logical",
               "no-points",   "T", 0, "logical",
               "no-lines",    "V", 0, "logical"))
opt<-getopt(args)

Now that we have the arguments we can process them. We check whether the user has provided an argument by testing whether its value is null.
It's traditional to handle the help flag first.

if(! is.null(opt$help)){
  self<-commandArgs()[1]
  cat(paste(getopt(args, usage=TRUE)))
  q(status=0)
}

Next we check for required arguments, those arguments that the user must have provided in order for the code to run. Rather than checking each argument individually we list the required arguments in a vector and then check for their presence using set difference. If the resulting set isn't empty, we build a string describing the missing arguments and use it to print an error message before exiting the script.

required<-c("datafile", "imagedir", "xcolumn", "ycolumn")
checkRequiredArgs<-function(opts, required){
  missing<-setdiff(required, names(opts))
  if(length(missing) != 0){
    args<-paste("", missing, sep=" --")
    cat("Missing parameters:", args, "\n")
    q(status=1)
  }
}

Then we set the global variables from properties-plot.r to the command line arguments that have been provided for them. We map each argument name to the corresponding variable name and, where the argument is present, use the assign function to set the variable.

value.mappings<-matrix(ncol=2, byrow=TRUE,
                       c("xmin",       "minXValue",
                         "plotwidth",  "plotWidth",
                         "plotheight", "plotHeight",
                         "plotBorder", "plotBorder",
                         "thumbwidth", "thumbnailWidth",
                         "linewidth",  "lineWidth",
                         "linecol",    "lineCol",
                         "pointsize",  "pointSize",
                         "pointcol",   "pointCol",
                         "labelsize",  "labelSize",
                         "labelcol",   "labelCol",
                         "axislabelx", "axisLabelX",
                         "axislabely", "axisLabelY",
                         "axiscol",    "axisCol",
                         "axisround",  "axisRoundDigits"))
valueOpts<-function(opts, mappings){
  for(i in 1:dim(mappings)[1]){
    mapping<-mappings[i,]
    if(! is.null(opts[[mapping[1]]])){
      assign(mapping[2], opts[[mapping[1]]], inherits=TRUE)
    }
  }
}

Some variables need to be set to a boolean value according to whether a particular argument is present as a flag or not. We use a similar technique for this, but the matrix containing the mapping from argument to variable also holds a boolean value that is used to set the variable, rather than an argument value being fetched.

boolean.mappings<-matrix(ncol=3, byrow=TRUE,
                         c("no-images", "shouldDrawImages", FALSE,
                           "no-points", "shouldDrawPoints", FALSE,
                           "no-lines",  "shouldDrawLines",  FALSE,
                           "no-labels", "shouldDrawLabels", FALSE,
                           "no-axes",   "shouldDrawAxes",   FALSE))
booleanOpts<-function(opts, mappings){
  for(i in 1:dim(mappings)[1]){
    mapping<-mappings[i,]
    if(! is.null(opts[[mapping[1]]])){
      ## The matrix stores its entries as strings, so convert back to logical
      assign(mapping[2], as.logical(mapping[3]), inherits=TRUE)
    }
  }
}

The render type is specified through the arguments passed to the script, but we only want to perform one kind of render. We check that only one kind of render was specified or else we quit with an informative error message.

renderTypeCount<-sum(! is.null(opt$pdf), ! is.null(opt$png), ! is.null(opt$display))
if(renderTypeCount > 1){
  cat("Please specify only one of png, pdf or display to render\n")
  q(status=1)
}

We get the file name to save the render as, if needed.

## The file to save the render to (only needed for pdf or png renders)
getOutfile<-function(opt){
  opt$outfile
}
if((! is.null(opt$pdf) || ! is.null(opt$png)) && is.null(opt$outfile)){
  cat("Missing parameter: --outfile|-o \n")
  q(status=1)
}

The last bit of configuration we get is the column to use for filenames in the data file; if it isn't provided we default to "filename".

getFilenameColumn<-function(opt){
  if(! is.null(opt$labelcolumn)){
    opt$labelcolumn
  } else {
    "filename"
  }
}

The last function we define in the script performs the render specified in the arguments to the script.

## Perform whichever kind of render the arguments requested
renderPlot<-function(opt){
  if(! is.null(opt$pdf)){
    ## render pdf
    pdfPlot(getOutfile(opt), opt$datafile, opt$imagedir, opt$xcolumn,
            opt$ycolumn, getFilenameColumn(opt))
  } else if (! is.null(opt$png)){
    ## render png
    pngPlot(getOutfile(opt), opt$datafile, opt$imagedir, opt$xcolumn,
            opt$ycolumn, getFilenameColumn(opt))
  } else {
    ## render display
    X11Plot(opt$datafile, opt$imagedir, opt$xcolumn, opt$ycolumn,
            getFilenameColumn(opt))
    ## Stop R exiting and closing the window straight away...
    invisible(readLines(file("stdin"), n=1))
  }
}

Finally, outside of any function, we call the functions we have defined in order to do the work of processing the parameters and calling the code.

checkRequiredArgs(opt, required)
valueOpts(opt, value.mappings)
booleanOpts(opt, boolean.mappings)
renderPlot(opt)

If we save this code in a file called propcli and make it executable using the shell command:

chmod +x propcli

We can call the script from the command line like this:

./propcli --datafile images.txt --imagedir images --xcolumn saturation_median --ycolumn hue_median

Digital Parapsychology

The (quasi-)scientific investigation of paranormal phenomena is a category error. Paranormality is qualitative affect, not quantitative effect. To the extent that it has physical effects these are not physically caused.

Seeking to reduce the numinosity of the paranormal to number is a mistake. It should be experienced, it should be retold, it should resonate.

But decades of research and reporting of the paranormal have amassed large quantities of data. In the age of Big Data, digitizing, analysing and relating this data with other data sources (news, geodata, parish records, government statistical information) can find evidence that has previously been missed and suggest new theories and new lines of investigation.

I propose Digital Parapsychology.

Do you believe in UFOs, astral projections, mental telepathy, ESP, clairvoyance, spirit photography, telekinetic movement, full trance mediums, the Loch Ness monster and the theory of Atlantis?

Are you troubled by strange noises in the middle of the night?

Do you experience feelings of dread in your basement or attic?

Have you or your family ever seen a spook, spectre or ghost?

Get yourself a Hadoop cluster and start feeding your EMF meter and IR sensor readings into map-reduce jobs to correlate them with historical and live feed data…