Categories
Art Computing Art Open Data

Exploring Art Data 22

So far we have used the R REPL to run code. Let’s write a script that provides a command-line interface for the plotting code we have just written.
A command-line interface allows the code to be called via the terminal, and to be called from shell scripts. This is useful for exploratory coding and for creating pipelines and workflows of different programs. It also allows code to be called from network programming systems such as Hadoop without having to convert the code.
To allow the code to be called from the command line we use a “pound bang line” that tells the shell to use the Rscript interpreter rather than the interactive R system.

#!/usr/bin/env Rscript
## -*- mode: R -*-


Next we import the “getopt” library that we will use to parse arguments passed to the script from the command line.

library('getopt')


And we import the properties-plot.r code that we will use to perform the actual work.

source('properties-plot.r')


The first data and functions we write will be used to parse the arguments passed to the script by its caller. The arguments are defined in a standard format used by the getopt library.

args


Now that we have the arguments we can process them. To see whether the user has provided an argument, we check whether its value is not null.
It's traditional to handle the help flag first.

if(! is.null(opt$help)) {
self = commandArgs()[1];
cat(paste(getopt(args, usage=TRUE)))
q(status=1)
}


Next we check for required arguments, those arguments that the user must have provided in order for the code to run. Rather than checking each argument individually we list the required arguments in a vector and then check for their presence using set intersection. If the resulting set isn't empty, we build a string describing the missing arguments and use it to print an error message before exiting the script.

required
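The idea of checking required arguments as a set is easy to sketch outside R. Here is a minimal illustration in Python, assuming the four options used in the example invocation at the end of this post are the required ones, and using set difference to find the missing names. REQUIRED, missing_required and check_required are names made up for this sketch; it is not the code from propcli.

```python
# Hypothetical names throughout: an illustration, not the propcli code.
# The list of required options is an assumption based on the example call.
REQUIRED = ["datafile", "imagedir", "xcolumn", "ycolumn"]


def missing_required(provided, required=REQUIRED):
    """Return the required argument names absent from `provided`, sorted."""
    return sorted(set(required) - set(provided))


def check_required(provided, required=REQUIRED):
    """Return an error message if arguments are missing, otherwise None."""
    missing = missing_required(provided, required)
    if missing:
        return "Missing required arguments: " + ", ".join(missing)
    return None


# Only two of the four required arguments were provided
print(check_required({"datafile": "images.txt", "xcolumn": "hue_median"}))
```

A real script would print the message and exit with a non-zero status rather than returning it.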


Then we set the global variables from properties-plot.r to the command line arguments that have been provided for them. We map the argument name to the variable name and then where it is present we use the assign function to set the variable.

value.mappings


Some variables need to be set to a boolean value depending on whether a particular flag is present. We use a similar technique for this, but the matrix containing the mapping from argument to variable also holds the boolean value used to set the variable, rather than fetching an argument value.

boolean.mappings


The render type is specified through the arguments passed to the script, but we only want to perform one kind of render. We check that only one kind of render was specified or else we quit with an informative error message.

renderTypeCount 1){
cat("Please specify only one of png, pdf or display to render\n")
q(status=1)
}
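The "only one render type" rule can be sketched in a few lines of Python. This assumes the parsed options behave like a dictionary in which absent flags are missing or None. RENDER_TYPES, chosen_render_types and validate_render_choice are hypothetical names for illustration, not functions from propcli.

```python
# Hypothetical sketch: the option names come from the error message above,
# everything else is an assumption.
RENDER_TYPES = ("png", "pdf", "display")


def chosen_render_types(opt):
    """Return the render flags that are present in the parsed options."""
    return [name for name in RENDER_TYPES if opt.get(name) is not None]


def validate_render_choice(opt):
    """Return an error message if more than one render type was chosen."""
    if len(chosen_render_types(opt)) > 1:
        return "Please specify only one of png, pdf or display to render"
    return None


print(validate_render_choice({"png": True, "pdf": True}))
print(validate_render_choice({"display": True}))
```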


We get the file name to save the render as, if needed.

getOutfile\n")
q(status=1)
}
opt$outfile
}


The last bit of configuration we get is the column to use for filenames in the data file. If it isn't provided, we default to "filename".

getFilenameColumn


The last function we define in the script performs the render specified in the arguments to the script.

render


Finally, outside of any function, we call the functions we have defined in order to do the work of processing the parameters and calling the code.

checkRequiredArgs(opt, required)
valueOpts(opt, value.mappings)
booleanOpts(opt, boolean.mappings)
render(opt)


If we save this code in a file called propcli and make it executable using the shell command:

chmod +x propcli

we can call the script from the command line like this:

./propcli --datafile images.txt --imagedir images --xcolumn saturation_median --ycolumn hue_median

Categories
Art Computing Art History Art Open Data Projects

Exploring Art Data 21

Now that we have a file of statistical information about the folder of images that we are examining, we can plot this using the images themselves.

First we need to install and load the library we will use to load the images to plot. You may need to install ImageMagick’s libmagick before EBImage will install; it doesn’t seem to like GraphicsMagick.

In Fedora run:

sudo yum install libmagick-devel

In Ubuntu run:

sudo aptitude install libmagick-dev

We can then install and load EBImage as follows:

## source("http://bioconductor.org/biocLite.R")
## biocLite("EBImage")
library("EBImage") 

Next we declare constants to control various aspects of the plot. This includes the size of the image, the graphical properties of the elements that we are plotting, and which elements to plot.
 

## The plot
#inches
plotWidth<-8
plotHeight<-6
plotBorder<-1
innerWidth<-plotWidth - (plotBorder * 2)
innerHeight<-plotHeight - (plotBorder * 2)
plotBackgroundCol<-rgb(0.4, 0.4, 0.4, 1.0)
## Thumbnail images
thumbnailWidth<-0.3
## Lines
lineWidth<-1
lineCol<-rgb(0.8, 0.8, 0.8, 1.0)
## Points
## This is the point scale factor (cex)
pointSize<-2
pointStyle<-19
pointCol<-rgb(0.8, 0.8, 0.8, 1.0)
## Labels
## The label scale factor (cex)
labelSize<-0.25
labelCol<-rgb(1.0, 1.0, 1.0, 1.0)
## Axes
axisLabelX<-""
axisLabelY<-""
axisCol<-rgb(1.0, 1.0, 1.0, 1.0)
## Number of significant digits to round fractional part of each tick value to
axisRoundDigits<-3
## What to draw
shouldDrawImages<-TRUE
shouldDrawPoints<-TRUE
shouldDrawLines<-TRUE
shouldDrawLabels<-TRUE
shouldDrawAxes<-TRUE 

Then we declare variables and functions that will be used to process the data in order to fit its values into the plot in a visually appealing way.

minXValue<-NULL
maxXValue<-NULL
minYValue<-NULL
maxYValue<-NULL
scaleX<-NULL
scaleY<-NULL
## Update the scaling factor for positioning images
updateXYScale<-function(){
rangeX<<-maxXValue - minXValue
scaleX<<-innerWidth / rangeX
rangeY<<-maxYValue - minYValue
scaleY<<-innerHeight / rangeY
}
scaleXValue<-function(x){
plotBorder + ((x - minXValue) * scaleX)
}
scaleYValue<-function(y){
plotBorder + ((y - minYValue) * scaleY)
}
## Set the range of the X and Y axes for positioning images
setMinMaxXYValues<-function(xMin, yMin, xMax, yMax){
minXValue<<-xMin
maxXValue<<-xMax
minYValue<<-yMin
maxYValue<<-yMax
updateXYScale()
}
## Calculate the range of the X and Y axes for positioning images
discoverMinMaxXYValues<-function(xValues, yValues){
xRange<-range(xValues)
yRange<-range(yValues)
## Handle 0..1 or a..b
if(xRange[2] - xRange[1] > 1){
xRange<-c(floor(xRange[1]), ceiling(xRange[2]))
} else {
xRange<-c(floor(xRange[1] * 1000) / 1000, ceiling(xRange[2] * 1000) / 1000)
}
if(yRange[2] - yRange[1] > 1){
yRange<-c(floor(yRange[1]), ceiling(yRange[2]))
} else {
yRange<-c(floor(yRange[1] * 1000) / 1000, ceiling(yRange[2] * 1000) / 1000)
}
## Floor and ceiling the values to round them to the nearest integers
## and make the values on the plot nicer
setMinMaxXYValues(xRange[1], yRange[1], xRange[2], yRange[2])
}
## Left X value for image
## image parameter accepted to give these calls a regular signature
imageXLeft<-function(image, valueX){
valueX
}
## Right X value for image
## image parameter accepted to give these calls a regular signature
imageXRight<-function(image, valueX){
valueX + thumbnailWidth
}
## Get the height of the image scaled to the new width
imageHeightScaled<-function(image, scaledWidth){
scale<-dim(image)[1] / scaledWidth
dim(image)[2] / scale
}
## Bottom Y value for image
imageYBottom<-function(image, valueY){
valueY - imageHeightScaled(image, thumbnailWidth)
}
## Top Y value for image
imageYTop<-function(image, valueY){
valueY
} 
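To make the scaling arithmetic concrete, here is a small worked example in Python. It mirrors the range rounding in discoverMinMaxXYValues and the formula in scaleXValue for the x axis only, with the same 8 inch width and 1 inch border as the constants above. The function names nice_range and scale_value are made up for this sketch.

```python
import math

PLOT_WIDTH = 8      # inches, matching plotWidth above
PLOT_BORDER = 1     # inches, matching plotBorder above
INNER_WIDTH = PLOT_WIDTH - PLOT_BORDER * 2


def nice_range(values):
    """Widen (min, max) outwards: to whole numbers if the span is
    greater than 1, otherwise to three decimal places."""
    low, high = min(values), max(values)
    if high - low > 1:
        return math.floor(low), math.ceil(high)
    return math.floor(low * 1000) / 1000, math.ceil(high * 1000) / 1000


def scale_value(value, low, high, inner=INNER_WIDTH, border=PLOT_BORDER):
    """Map a data value in [low, high] to a position in inches."""
    return border + (value - low) * (inner / (high - low))


low, high = nice_range([0.4, 1.5, 2.7])   # span > 1, so this becomes (0, 3)
print(low, high)
print(scale_value(1.5, low, high))        # 1 + 1.5 * (6 / 3) = 4.0
```

The smallest value lands on the inner edge of the border (1 inch) and the largest on the opposite inner edge (7 inches). Note that a column whose values are all identical would give a range of zero and a division by zero here.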

The labels for each image, the points marking the image’s position, the lines connecting each image, and the top left of each image are positioned on the x, y co-ordinates for the image’s properties being plotted.

Centering the image on the x, y co-ordinates might be more natural but it would obscure the position of the point and the connecting lines if they were also drawn.

plotLabels<-function(labelValues, xValues, yValues){
## Position the labels above the points (pos=3)
text(xValues, yValues, labelValues, col=labelCol, cex=labelSize, pos=3)
}

plotImages<-function(imageFilePaths, xValues, yValues){
for(i in 1:length(imageFilePaths)){
image<-readImage(imageFilePaths[i])
x<-xValues[i]
y<-yValues[i]
## Does the image really have to be rotated???
rasterImage(rotate(image), imageXLeft(image, x), imageYTop(image, y),
imageXRight(image, x), imageYBottom(image, y))
}
} 

When we plot the axes their tick values are auto-generated from the value ranges, so they may look weird.
 

plotAxes<-function(){
xat<-round(seq(minXValue, maxXValue,
(maxXValue - minXValue) / plotWidth),
axisRoundDigits)
axis(1, 0:plotWidth, xat, col=axisCol, col.ticks=axisCol, col.axis=axisCol)
yat<-round(seq(minYValue, maxYValue,
(maxYValue - minYValue) / plotHeight),
axisRoundDigits)
axis(2, 0:plotHeight, yat, col=axisCol, col.ticks=axisCol, col.axis=axisCol)
} 

Having written functions to plot each element, we declare an all-in-one function to plot everything that is enabled in the configuration constants above.
 

plotElements<-function(imageFilePaths, xValues, yValues, labelValues){
if(shouldDrawLines){
lines(xValues, yValues, col=lineCol, lwd=lineWidth)
}
if(shouldDrawPoints){
points(xValues, yValues, pch=pointStyle, col=pointCol, cex=pointSize)
}
if(shouldDrawImages){
plotImages(imageFilePaths, xValues, yValues)
}
if(shouldDrawLabels){
plotLabels(labelValues, xValues, yValues)
}
if(shouldDrawAxes){
plotAxes()
}
}


Then we declare a function to get the values from the data frame and call the plot-everything function.
 

setValuesAndPlot<-function(data, imageFilepaths, xColumn, yColumn,
labelColumn="filename", discoverRange=TRUE){
## Get the lists for the data columns, get the doubles from them,
## and scale to the plot
xValues<-data[xColumn][,1]
yValues<-data[yColumn][,1]
if(discoverRange){
discoverMinMaxXYValues(xValues, yValues)
}
scaledXValues<-sapply(xValues, scaleXValue)
scaledYValues<-sapply(yValues, scaleYValue)
axisLabelX<<-xColumn
axisLabelY<<-yColumn
plotElements(imageFilepaths, scaledXValues, scaledYValues, data[,labelColumn])
title(xlab=xColumn, ylab=yColumn, col.lab=axisCol)
} 

You’ll notice each function is combining and building on earlier functions. Functions should be short, readable, organizing units. The next one that we declare reads the data file and the image files, and then plots the values.


readAndPlot<-function(dataFile, imageFolder, xColumn, yColumn,
labelColumn="filename", discoverRange=TRUE){
data<-read.delim(dataFile, stringsAsFactors=FALSE)
imageFilepaths<-sapply(data["filename"],
function(filename) file.path(imageFolder, filename))
setValuesAndPlot(data, imageFilepaths, xColumn, yColumn, labelColumn,
discoverRange)
}


The next function makes a new R plot with the proper graphics parameters. Notably, this sets the bounds and background colour.
 

newPlot<-function(dataFile, imageFolder, xColumn, yColumn,
labelColumn="filename", discoverRange=TRUE){
## Call before plot.new()
par(bg=plotBackgroundCol)
plot.new()
## Use co-ordinates relative to the bounds
par(usr=c(0, plotWidth, 0, plotHeight))
par(bty="n")
readAndPlot(dataFile, imageFolder, xColumn, yColumn, labelColumn,
discoverRange)
}


Finally we can declare functions to plot to different kinds of R devices: X11 for screen display and testing, PNG for embedding in web pages and documents, and PDF for high-quality output. Note that the PDF will include all the images plotted, so it will become very large very quickly. A high-resolution PNG will be more practical for very large image sets.
 

## Make a new X11 plot
X11Plot<-function(dataFile, imageFolder, xColumn, yColumn,
labelColumn="filename", discoverRange=TRUE){
X11(width=plotWidth, height=plotHeight)
newPlot(dataFile, imageFolder, xColumn, yColumn, labelColumn, discoverRange)
}
## Make a new PNG plot
pngPlot<-function(outFile, dataFile, imageFolder, xColumn, yColumn,
labelColumn="filename", discoverRange=TRUE, dpi=600){
png(filename=outFile, width=plotWidth, height=plotHeight, units="in",
res=dpi)
newPlot(dataFile, imageFolder, xColumn, yColumn, labelColumn, discoverRange)
dev.off()
}
## Make a new PDF plot
pdfPlot<-function(outFile, dataFile, imageFolder, xColumn, yColumn,
labelColumn="filename", discoverRange=TRUE){
pdf(file=outFile, width=plotWidth, height=plotHeight)
newPlot(dataFile, imageFolder, xColumn, yColumn, labelColumn, discoverRange)
dev.off()
} 

Calling these plotting functions from the REPL (in Emacs or on the command line) lets us see the output straight away. We can then adjust the constants declared at the start, and the parameters passed to the plotting functions, to improve the results interactively.

Running:
 

X11Plot("images.txt", "images", "brightness_median", "saturation_stdev") 

Gives us:

[Image: Mondrian Visualization]

Next we can wrap the functions we have written in command-line and GUI interfaces and explore the strengths and weaknesses of each.

Categories
Art Computing Art Open Data Projects

Exploring Art Data 20

[Exploring Art Data 18 and 19 concern parsing and charting the Graves Art Sales data covering Constable. They will be published later.]

Let’s reproduce the functionality of ImagePlot in R. We’ll do this in several stages. In this post we’ll write code to produce statistical information about collections of image files. In the next post we’ll write code to visualize that information. Then we’ll write command line and graphical user interface code for the visualization. Finally we’ll use the command line code to look at how to perform image analysis distributed over the network.

For the statistical analysis code (image-properties.r), first we’ll need to install and load some libraries:

## source("http://bioconductor.org/biocLite.R")
## biocLite("EBImage")
## You may need to install libmagick for EBImage
library("EBImage")
##install.packages("colorspace")
library("colorspace")

Then we will need to write code to convert from the computer-friendly RGB colourspace to the human-friendly HSB colourspace.


## Get the r,g,b colour values for all the pixels in the image as a list
imageRgbs<-function(bitmap){
## Get flat lists of red, green and blue pixel values
red<-imageData(channel(bitmap, "red"))
dim(red)<-NULL
green<-imageData(channel(bitmap, "green"))
dim(green)<-NULL
blue<-imageData(channel(bitmap, "blue"))
dim(blue)<-NULL
## Combine these lists into a table of pixel r,g,b values
data.frame(red=red, green=green,blue=blue)
}
## Convert the RGB data.frame to a collection of HSV colours
rgbToHsv<-function(rgbs){
as(RGB(rgbs$red, rgbs$green, rgbs$blue), "HSV")
}
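To see what the conversion does to individual pixels, here is a small sketch using Python's standard colorsys module rather than the R colorspace package. It assumes hue is reported in degrees (0 to 360), as the sample output later in this post suggests, so colorsys's 0 to 1 hue is multiplied by 360. rgb_to_hsv_degrees is a name made up for the example.

```python
import colorsys
from statistics import median


def rgb_to_hsv_degrees(red, green, blue):
    """Convert r,g,b values in 0..1 to (hue in degrees, saturation, value)."""
    hue, saturation, value = colorsys.rgb_to_hsv(red, green, blue)
    return hue * 360.0, saturation, value


# Three example pixels: pure red, orange and pure blue
pixels = [(1.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.0, 0.0, 1.0)]
hsvs = [rgb_to_hsv_degrees(*pixel) for pixel in pixels]
for hsv in hsvs:
    print(hsv)

# The median hue need not belong to the "typical" colour of the image
print("median hue:", median(h for h, _, _ in hsvs))
```

Red sits at 0 degrees, orange at about 30 and blue at about 240. Hue is an angle, so statistics computed over it are only rough guides to the colours in an image.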

Next we write the code to produce the statistics for each image. R makes this very easy.


## Calculate the median values for the HSV coordinates
## The colour returned is not a colour in the image,
## it just contains the median values
medianHsv<-function(hsvcoords){
HSV(median(hsvcoords[,"H"]), median(hsvcoords[,"S"]), median(hsvcoords[,"V"]))
}
## Calculate the minimum and maximum values for the HSV coordinates
## Returns a vector of colours, the first containing low values,
## the second containing high values
## These are not colours that appear in the image, they just contain the values
rangeHsv<-function(hsvcoords){
hrange<-range(hsvcoords[,"H"])
srange<-range(hsvcoords[,"S"])
vrange<-range(hsvcoords[,"V"])
c(min=HSV(hrange[1], srange[1], vrange[1]),
max=HSV(hrange[2], srange[2], vrange[2]))
}
## Calculate the standard deviation for the HSV coordinates
## The colour returned is not a colour in the image,
## it just contains the sd for each value
sdHsv<-function(hsvcoords){
hsd<-sd(hsvcoords[,"H"])
ssd<-sd(hsvcoords[,"S"])
vsd<-sd(hsvcoords[,"V"])
HSV(hsd[1], ssd[1], vsd[1])
}
## A good way of getting the min, max, median and other useful values
summaryHsv<-function(hsvcoords){
list(H=summary(hsvcoords[,"H"]),
S=summary(hsvcoords[,"S"]),
V=summary(hsvcoords[,"V"]))
}

Now we write the code to output those statistics. It is slightly more complex than the simplest possible structure would be, which makes the code more robust if we need to change it later.


## HSV == HSB
## We use brightness for compatibility
## Some columns and column names are also for compatibility
## Load the file and return a vector of named interesting statistics
fileRow<-function(filename){
cat(filename, sep="\n")
img<-readImage(filename)
rgbs<-imageRgbs(img)
hsvs<-rgbToHsv(rgbs)
hsvcoords<-coords(hsvs)
summaryhsv<-summaryHsv(hsvcoords)
sdhsv<-sdHsv(hsvcoords)
sdhsvcoords<-coords(sdhsv)
## NaN for year for now
c("year"=NaN,
## Get the values manually so that we don't rely on position in case
## that ever changes
"hue_min"=summaryhsv$H[["Min."]],
"hue_1st_qu"=summaryhsv$H[["1st Qu."]],
"hue_median"=summaryhsv$H[["Median"]],
"hue_mean"=summaryhsv$H[["Mean"]],
"hue_3rd_qu"=summaryhsv$H[["3rd Qu."]],
"hue_max"=summaryhsv$H[["Max."]],
"hue_stdev"=sdhsvcoords[,"H"][[1]], ## There must be a better way than this
"saturation_min"=summaryhsv$S[["Min."]],
"saturation_1st_qu"=summaryhsv$S[["1st Qu."]],
"saturation_median"=summaryhsv$S[["Median"]],
"saturation_mean"=summaryhsv$S[["Mean"]],
"saturation_3rd_qu"=summaryhsv$S[["3rd Qu."]],
"saturation_max"=summaryhsv$S[["Max."]],
"saturation_stdev"=sdhsvcoords[,"S"][[1]],
"brightness_min"=summaryhsv$V[["Min."]],
"brightness_1st_qu"=summaryhsv$V[["1st Qu."]],
"brightness_median"=summaryhsv$V[["Median"]],
"brightness_mean"=summaryhsv$V[["Mean"]],
"brightness_3rd_qu"=summaryhsv$V[["3rd Qu."]],
"brightness_max"=summaryhsv$V[["Max."]],
"brightness_stdev"=sdhsvcoords[,"V"][[1]])
}
## Create a frame containing interesting information about the images
filesSummaries<-function(filenames, folder){
cat("Processing: ")
cat(filenames, sep=", ")
filepaths<-sapply(filenames,
function(filename) file.path(folder, filename))
## data.frame columns can be different types, so we add the filenames here
## We don't have the strings as factors as if we paste() them as factors
## they are pasted as numbers (levels)
data.frame(filename=filenames,
imageID=1:length(filenames),
t(sapply(filepaths, fileRow)), stringsAsFactors=FALSE)
}
## Print the fileDetails frame to a tab-separated-values file
## This can easily be loaded back into R
printFilesSummaries<-function(fileDetails, outfile=""){
## Write the column names as a header row,
## then one tab-separated row of values per image
cat(paste(names(fileDetails), collapse="\t"), file=outfile)
cat("\n", file=outfile)
for(row in 1:dim(fileDetails)[1]){
cat(paste(fileDetails[row,], collapse="\t"), file=outfile)
cat("\n", file=outfile)
}
cat("Done.", sep="\n")
}

This code can be run from an interactive R session. For scripting and distribution it can be convenient to have a command-line interface to the code, so in another file (imgstats) we write a simple one.
First of all we load the image statistics code:


source("image-properties.r")

Then we parse the command line arguments and make sure that the user has provided reasonable values to the script, quitting with an advisory message if they have not.


################################################################################
## Parse the command line
################################################################################
args<-commandArgs(TRUE)
if(length(args) != 1){
stop(paste("usage: imgstats [foldername]"))
}
folder<-args[1]
if(folder == "."){
stop("Please pass the name of the folder to process, not '.' .")
}

Next we call the code from image-properties.r to process each of the files in the folder the user named as an argument to the script.


################################################################################
## Process the files
################################################################################
## Add more formats to taste. We need to only load image files though
## The pattern is a regular expression, not a shell glob
files<-list.files(folder, pattern="\\.(jpg|jpeg|tif|tiff|png|gif|bmp)$",
ignore.case=TRUE)
stats<-filesSummaries(files, folder)

Finally we call the code from image-properties.r to save the data to a file (named after the folder that has been processed by the script).


################################################################################
## Write the data
################################################################################
outfile<-file(paste(folder, ".txt", sep=""), "w")
printFilesSummaries(stats, outfile)
close(outfile)

To run the code from the command line we have to set its execute file permission:


chmod +x imgstats

And now we can generate statistical data about a folder of images ready to visualize.

Here’s an example of the output from the script:

filename	imageID	year	hue_min	hue_1st_qu	hue_median	hue_mean	hue_3rd_qu	hue_max	hue_stdev	saturation_min	saturation_1st_qu	saturation_median	saturation_mean	saturation_3rd_qu	saturation_max	saturation_stdev	brightness_min	brightness_1st_qu	brightness_median	brightness_mean	brightness_3rd_qu	brightness_max	brightness_stdev
1905.5_a_mondrian.jpg	1	NaN	0	51.43	55.29	60.23	65.71	330	16.5393230900438	0	0.232	0.2925	0.3066	0.3537	1	0.123994722897117	0.01176	0.2863	0.4549	0.4818	0.702	1	0.246702823821300
1905.5_b_mondrian.jpg	2	NaN	0	36	40.47	94.86	192	360	90.020405312462	0	0.1333	0.2155	0.1979	0.2581	0.7143	0.0935518755834357	0.04314	0.2	0.6118	0.5523	0.8549	0.9725	0.312092338505255
1905.5_c_mondrian.jpg	3	NaN	0	41.54	264	200.3	330	360	139.634180648185	0	0.141	0.2	0.2078	0.2644	1	0.0978851449446736	0.03137	0.08235	0.1608	0.2296	0.2667	1	0.216606346291597
1905.5_d_mondrian.jpg	4	NaN	0	32	42	76.1	67.5	360	90.4207404627883	0	0.2308	0.306	0.3256	0.4	1	0.139508165859475	0.01569	0.1176	0.2627	0.3912	0.698	1	0.296420604122761
1905.5_e_mondrian.jpg	5	NaN	0	72	124.3	135.3	217.2	345	65.5574296333988	0	0.1646	0.25	0.2706	0.3925	1	0.144880395319209	0.01569	0.2	0.3176	0.4501	0.8314	1	0.304302318390967
1905.5_f_mondrian.jpg	6	NaN	0	48	56.84	85.65	70	360	74.8812889958008	0	0.05	0.1452	0.1937	0.3438	0.8333	0.153920183168041	0.01569	0.2863	0.4353	0.4846	0.7059	0.9725	0.242587145239624
1905.5_g_mondrian.jpg	7	NaN	0	38.46	43.08	42.42	46.32	100	5.77930853730472	0	0.324	0.4706	0.4646	0.5972	1	0.164745255193285	0.01176	0.4196	0.5922	0.6044	0.8588	1	0.243507248302950
1905.5_h_mondrian.jpg	8	NaN	0	40	47.14	54.34	60	360	25.19346977218	0	0.07547	0.1946	0.2287	0.3524	1	0.174754958215903	0.003922	0.2902	0.5882	0.5644	0.8157	0.9961	0.300666257269201
Categories
Art Computing Art Open Data Projects

ImagePlot and properties-plot.r

ImagePlot is out:

http://lab.softwarestudies.com/2011/09/introducing-imageplot-software-explore.html

It’s an ImageJ macro that plots visualizations of image statistics using the images themselves. It’s very cool, do take a look. As well as the complete software under the GPL (but version 2 only???) it has sample data, and essays explaining the project. There are some great examples of visualizations created using the system at the link above.

Over the weekend I’ve implemented a simpler version of ImagePlot in R, which will be the basis of the next posts in my “Exploring Art Data” series. The code to extract image data has been wrapped up as a command-line tool; the code to produce the final visualizations hasn’t yet.

You can get my code here in the image-analysis folder (it’s GPLv3 or later):

https://gitorious.org/robmyers/art-data/

Here’s an example plot using the Mondrian images from ImagePlot’s sample folder:

[Image: Mondrian Visualization]

You can create PDF and PNG files as well as view the results onscreen. I want to tweak the display parameters and tidy up a few lines in the code then make a command-line interface for it. And maybe even a GUI interface…

Categories
Aesthetics Art Computing Art Open Data Free Software Howto Projects

Logging Colours To ThingSpeak

ThingSpeak is a Free Software-based web service for publishing (geolocated) data. This makes it better than proprietary services for publishing data.

Using it is very easy, as this tutorial demonstrates. Here’s some code I’ve written in the Python programming language to grab a palette of 8 colours from a webcam image of my studio every 10 minutes and publish it to a ThingSpeak “Channel”:

http://OFFLINEZIP.wpsho/git/?p=thingspeak.git

And here’s the resulting data, in JSON format:

http://api.thingspeak.com/channels/357/feed.json

Update [17th April 2011]

Thanks to ThingSpeak suggesting it, here’s a jQuery display of the colours:

http://OFFLINEZIP.wpsho/git/?p=thingspeak.git;a=blob_plain;f=studio_colours.html

It starts with the 100 most recent palettes, and adds new ones every 10 minutes as they are uploaded.

Categories
Art Computing Art History Art Open Data Projects

Exploring Art Data – The Plan

I’ve been very, very busy recently and I haven’t had time to work on the “Exploring Art Data” series of blog posts.
I will get back to them. First I will finish the Graves Art Sales exploration. Then I will use Joy Garnett’s images of her paintings as an example of processing a (small) large dataset. Then I will analyse the Netbehaviour mailing list archive as an example of a social network.
And that’s the plan. Unless anyone has anything else they’d like to see.

Categories
Art Computing Art History Art Open Data

Exploring Art Data 17

Let’s clean up the Constable data from Graves Art Sales.

This extract shows most of the issues with the scanned and OCR-ed data:

1839 April 13 Christie's Samuel Archbutt. 114. Salisbury Cathedral from Meadows Theobald
I839 April 13 Samuel Archbutt. 115. Embarltation of George IV, Waterloo
Bridge Bought in 43 1 o
1345 May 16 Mr. Taunton. 41. Salisbury Cathedral Bought in 441 o o
1846 May 16 Mr. Taunton. 42. Dedham Bought in 357 o 0
1345 June 4 ,, Edward Higginson. 77. Waggon passing through a River Rought 378 o o
1848 May 3 Phillips Ralph Thomas. 176. Salisbury Cathedral - -
I848 June 2 Christie's Sir Thos. Baring. 21. Opening Waterloo Bridge Barton 33 2 o
1349 May 17 Taunton. 11o. Salisbury Cathedral from the Meadows. The
celebrated picture Rought 43o 1o o
1349 M87 I7 Taunton. 111. Dedham, with Towing Path Bought in 157 1o o
1351 June 13 H03lrlh- 46. Hadleigh Castle Winter 32o 5 0
1353 M111 1 R. Morris. 131. A Lock on the Stour Wass 105 o o

Some lines run on, some lines end with dashes or no numbers, some numbers are mistaken for letters, some words are corrupted. There are also blank lines between each scanned page of the book.

First we can fix the run-on lines and blank lines in a text editor, deleting the newline to make them into single lines or combining run-ons.

Then we can write a shell script to fix other issues. It’s important to make sure that the script doesn’t introduce more problems, so each substitution should be small, well-defined and carefully tested.

Here’s such a script (cleanup.sh):

#!/bin/sh
INFILE=constable-ocr.txt
OUTFILE=constable-processed.txt
EDITFILE=constable.txt
# Send the source file to sed
cat "${INFILE}" | sed --regexp-extended '
# Fix numbers, where o=0 and I=1
s/I([123456789]+)/1\1/g
s/([123456789]+)I/\11/g
s/([123456789]+)o/\10/g
# Fix years, 15,13 = 18
s/^1[35]/18/
# Fix trailing zeroes mistaken for o
s/ o o$/ 0 0/
s/ o$/ 0/
# Fix mistaken characters
s/I-I/H/g
s/,,/"/
# Make sure john constable is properly OCR-ed
s/(Iohn|john)/John/g
s/(C0nstable|Oonstable)/Constable/g
s/JohnCon.stable/John Constable/
# Fix frequently mistaken words and acronyms
s/ILA./R.A./
# Fix spacing
s/,R.A./, R.A./
s/Constable R.A./Constable, R.A./
' > "${OUTFILE}"
# Make a copy of the processed file ready to be edited by hand
cp "${OUTFILE}" "${EDITFILE}"

In the shell script we use GNU sed, a command-line utility that allows us to use regular expressions to modify the contents of files. If you don’t know regular expressions they can look quite mysterious, but in fact they are a simple and expressive language that greatly increases what you can do on a computer if you learn how to use them.
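If sed's syntax is unfamiliar, the same kind of substitution can be tried out interactively in Python. This sketch ports only the number and year rules from cleanup.sh to Python's re module and runs them on two lines from the OCR extract above. SUBSTITUTIONS and clean_line are names invented for the example. One difference to note: where sed writes \11 for "group 1 followed by the digit 1", Python needs \g<1>1.

```python
import re

# A subset of the substitutions in cleanup.sh, applied in the same order
SUBSTITUTIONS = [
    (r"I([1-9]+)", r"1\1"),       # I mistaken for 1 before digits
    (r"([1-9]+)I", r"\g<1>1"),    # I mistaken for 1 after digits
    (r"([1-9]+)o", r"\g<1>0"),    # o mistaken for 0 after digits
    (r"^1[35]", "18"),            # years starting 13 or 15 should start 18
    (r" o o$", " 0 0"),           # trailing zeroes mistaken for o
    (r" o$", " 0"),
]


def clean_line(line):
    """Apply each substitution in turn to a single line of text."""
    for pattern, replacement in SUBSTITUTIONS:
        line = re.sub(pattern, replacement, line)
    return line


samples = [
    "I848 June 2 Christie's Sir Thos. Baring. 21. Opening Waterloo Bridge Barton 33 2 o",
    "1345 May 16 Mr. Taunton. 41. Salisbury Cathedral Bought in 441 o o",
]
for sample in samples:
    print(clean_line(sample))
```

Testing each rule on a handful of real lines like this is a cheap way to check that a substitution does what you expect before running it over the whole file.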

Why on earth go to all this trouble just to change a few typos? Well, if we manage to improve the initial scanning or OCR, we don’t have to fix any remaining problems by hand. The script may record techniques that are useful elsewhere. And it’s more controllable to re-run a script rather than undo or search and replace mistakes made in a document editor.

Once you’ve replaced the obvious patterns of typos, it’s time to edit the text by hand. Compare each column of each line to the scan or to the original text, first all the dates, and auction houses, then all the names, then all the descriptions, then all the names and prices. When comparing numbers, check each digit as the year or the day of the month may be out by as little as one. Tesseract doesn’t seem to capture double quotes very often in the scans I used, so I had to add these in by hand. It also doesn’t capture em dashes, which I’ve represented as a hyphen to make parsing easier later.

It only seems like processing each page will take forever for the first few pages. You’ll quickly learn how to break down the task into manageable chunks.

Here is an example of cleaned up data:

1834 June 7 Christie's - 67. Landscape with Figures - -
1838 May 15 Foster's John Constable, R.A. 3. Stonehenge, etc. Smith 4 14 6
1838 May 15 " John Constable, R.A. 10. Glebe Farm, etc. Williams 3 15 0
1838 May 15 " John Constable, R.A. 12. Salisbury Cathedral and Helmingham Park Allnutt 3 9 0
1838 May 15 " John Constable, R.A. 13. Salisbury Cathedral and Glebe Farm Carpenter 24 10 6
1838 May 15 " John Constable, R.A. 14. Comlield. Study for N.G. picture Radford 9 19 6
1838 May 15 " John Constable, R.A. 23. Salisbury Cathedral, etc. Leslie 11 11 0
1838 May 15 " John Constable, R.A. 26. Dedham Rulton 8 8 0
1838 May 15 " John Constable, R.A. 29. View in Helmingham Park Swaby 16 5 6
1838 May 15 " John Constable, R.A. 30. Salisbury Cathedral from Bishop's Garden Archbutt 16 16 0
1838 May 15 " John Constable, R.A. 31. Hadleigh Castle. Sketch Smith 3 13 6
1838 May 15 " John Constable, R.A. 33. Two Views of East Bergholt Archbutt 24 3 0
1838 May 15 " John Constable, R.A. 35. River Scene and Horse Jumping Archbutt 52 10 0
1838 May 15 " John Constable, R.A. 37. Salisbury Cathedral from Meadows Williams 6 10 0
1838 May 15 " John Constable, R.A. 39. Mill on the Stour. Sketch Hilditch 7 17 6
1838 May 15 " John Constable, R.A. 40. Opening Waterloo Bridge. Sketch Joy 2 10 0
1838 May 15 " John Constable, R.A. 41. Weymouth Bay. Sketch. Swaby 4 4 0
1838 May 15 " John Constable, R.A. 42. Waterloo Bridge and Brighton Archbutt 5 0 0
1838 May 15 " John Constable, R.A. 43. Chain Pier and Dedham Church Stuart 5 5 0
1838 May 15 " John Constable, R.A. 44. Hampstead Heath and Waterloo Bridge Morton 4 14 0
1838 May 15 " John Constable, R.A. 45. Weymouth Bay, Waterloo Bridge, and two others Burton 7 7 0
1838 May 15 " John Constable, R.A. 46. East Bergholt, Dedham, etc. Nursey 4 14 6
1838 May 15 " John Constable, R.A. 47. Weymouth Bay and four others Williams 1 13 0
1838 May 15 " John Constable, R.A. 48. Moonlight and Landscape with Rainbow Leslie 5 5 0
1838 May 15 " John Constable, R.A. 49. Three Landscapes Archbutt 31 10 0
1838 May 15 " John Constable, R.A. 50. Salisbury Madows Sheepshanks 35 14 0
1838 May 15 " John Constable, R.A. 51. Study of Trees and Fern with Donkies Sheepshanks 23 2 0
1838 May 15 " John Constable, R.A. 52. Cottage in a Cornfield Burton 27 6 0
1838 May 15 " John Constable, R.A. 53. Hampstead Hath-at the Ponds Sheepshanks 37 5 6

It’s possible to use scripts to check that the cleaned up data makes sense. We can check the dates are sequential, for example (check_dates.py):

#!/usr/bin/python
# Usage: check_dates.py FILENAME
# Assumes file with each line starting in Graves date format: YYYY month (D)D
# Ensure that dates are sequential
# Won't catch minor errors, will catch major errors

import datetime
import sys

MONTHS = {'jan':1, 'feb':2, 'mar':3, 'apr':4, 'may':5, 'jun':6, 'jul':7, 'aug':8,
          'sep':9, 'oct':10, 'nov':11, 'dec':12}

def main():
    if len(sys.argv) != 2:
        print("Usage: %s FILENAME" % sys.argv[0])
        sys.exit(1)
    last_date = datetime.date(1779, 1, 1)
    for line in open(sys.argv[1]):
        try:
            components = line.split()
            year = int(components[0])
            month = MONTHS[components[1][:3].lower()]
            day = int(components[2])
            line_date = datetime.date(year, month, day)
        except (ValueError, KeyError, IndexError):
            # Non-numeric year or day, unknown month, or too few fields
            print("Bad date component: %s" % line)
            continue
        if line_date < last_date:
            print("Date not successive: %s" % line)
        last_date = line_date

if __name__ == "__main__":
    main()
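
Other sanity checks can be written in the same spirit. As a rough sketch (not one of the original scripts), the following checks prices given in pounds, shillings and pence: there were 20 shillings to the pound and 12 pence to the shilling, so a shillings value of 20 or more, or a pence value of 12 or more, suggests an OCR or transcription error. The names PRICE_RE and bad_price are made up for this illustration, and the check assumes the price is the last thing on the line, so it would also fire on any line that merely happens to end in three numbers.

```python
import re

# Three trailing integers: pounds, shillings, pence
PRICE_RE = re.compile(r'(\d+) (\d+) (\d+)\s*$')

def bad_price(line):
    """Return a description of the problem with the line's price, or None."""
    match = PRICE_RE.search(line)
    if match is None:
        # No price in pounds, shillings and pence to check
        return None
    pounds, shillings, pence = (int(n) for n in match.groups())
    if shillings >= 20:
        return "shillings out of range: %d" % shillings
    if pence >= 12:
        return "pence out of range: %d" % pence
    return None
```

Calling bad_price on each line of the cleaned file and printing any line for which it returns a message gives a list of prices to inspect by hand.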

Once we have cleaned up the text, we can convert it to a machine-readable format using another script. Writing such a script is an iterative process: read the text, write a script that should be able to extract information from it, run the script, and then correct either the text (if the script fails because of typos in the text) or the script (if it fails to extract all the information from the text). Here’s the script for the Constable data (lines_to_tsv.py):

#!/usr/bin/python
import re
import sys
################################################################################
# Assemble the regular expression to process each line
################################################################################
# Date of sale
# YYYY MONTH(.) D(D)
DATE = r'(.{4} \w+.? \d{1,2})'
# Auction house
# " or name. Names are arbitrary, so use a list of names
AUCTIONEER_LIST = ["B. Moulton", "Christie's", "Foster's", "Morrison & Co.",
"Paris", "Phillips", "Robinson & F.", ]
AUCTIONEER = ' ("|%s)' % '|'.join(AUCTIONEER_LIST)
# Owner of work
# Selling owner follows auction house and is followed by lot number
# It may be absent, in which case it is a hyphen
# Otherwise it's arbitrary but contains no numbers
# This is weak, we rely on the strength of the auctioneer & lot groups to fix it
OWNER = r' (-|\D+)'
# Lot number
# Possibly something in brackets, then one or more digits, with an optional
# single letter or punctuation character, then a full stop
LOT = r' (\([^)]+\) \d+.?\.|\d+.?\.)'
# Description
# Again this is arbitrary, so we rely on the strength of the adjacent groups
DESCRIPTION = r' (.+?)'
# Buyer
# This is complex because Description is arbitrary
# Buyer may be -, surname, initials, title and surname, and many others
BUYER_TITLES = [r'Captain .+', r'Col\. .+', r"D'\w+", r'De .+', r'Dr\. .+',
                r'Earl .+', r'La .+', r'Lord .+', r'Major .+',
                # Miss .+ fails???
                r'Miss \w+', r'Mr\. .+', r'Sir .+',]
BUYER_INSTITUTIONS = [r'Fine Art Society', r'National.+Gallery', r'New York',
                      r'New York Museum',]
BUYER_INITIALS = [r'[A-Z]\. [A-Z.]', ]
BUYER_INDIRECT = [r'\(.+\)',]
BUYER_SPECIAL = [r'Bought in', r'-',]
BUYER_NAME = [r'[A-Z]\. \w+', r'[A-Z]\. [A-Z]\. \w+', r'\w+',]
BUYER = r' (%s)?' % r'|'.join(BUYER_TITLES + BUYER_INSTITUTIONS + \
                              BUYER_INITIALS + BUYER_INDIRECT + \
                              BUYER_SPECIAL + BUYER_NAME)
# Sale price
# This may be absent entirely to indicate a group purchase, in which case we
# cannot check for a leading space, so make both the leading space and the
# other choices optional to handle that case
# It may be a hyphen to indicate absent data
# It may be pounds, shillings and pence, including zeros
# Or it may be Withdrawn
# Or it may be a quantity of French francs
PRICE = r' ?(\d+ francs|\d+ \d+ \d+|Withdrawn|-)?'
# The assembled regex for a line
LINE = r'^'+ DATE + AUCTIONEER + OWNER + LOT + DESCRIPTION + BUYER + PRICE +r'$'
LINE_REGEX = re.compile(LINE)
################################################################################
# Convert lines to tab separated values
################################################################################
# The column containing the auctioneer value
AUCTIONEER_COLUMN = 1
def process(infile):
    """Print each matching line as tab-delimited fields; return the failures"""
    # Lines that fail to match
    fails = []
    auctioneer = ""
    for line in infile:
        matches = LINE_REGEX.match(line)
        if matches is None:
            fails.append("FAIL: %s" % line)
            continue
        # Convert the tuple to a list in case we need to assign to it
        # Optional groups that did not match (buyer, price) become ''
        columns = list(matches.groups(''))
        # Get or cache the auctioneer, so we replace " with the actual one
        if columns[AUCTIONEER_COLUMN] != '"':
            auctioneer = columns[AUCTIONEER_COLUMN]
        else:
            columns[AUCTIONEER_COLUMN] = auctioneer
        print('\t'.join(columns))
    return fails

def print_header():
    """Print the column headers"""
    print("date\tauctioneer\towner\tlot\tdescription\tbuyer\tprice")

################################################################################
# Main flow of execution
################################################################################

def main():
    if len(sys.argv) != 2:
        print("Usage: %s FILENAME" % sys.argv[0])
        sys.exit(1)
    infile = open(sys.argv[1])
    print_header()
    fails = process(infile)
    infile.close()
    # Report unmatched lines on stderr so they stay out of the TSV output
    sys.stderr.write(''.join(fails))

if __name__ == "__main__":
    main()

And here’s some of the output in tab separated value format, complete with header:

date	auctioneer	owner	lot	description	buyer	price
1834 June 7	Christie's	-	67.	Landscape with Figures	-	-
1838 May 15	Foster's	John Constable, R.A.	3.	Stonehenge, etc.	Smith	4 14 6
1838 May 15	Foster's	John Constable, R.A.	10.	Glebe Farm, etc.	Williams	3 15 0
1838 May 15	Foster's	John Constable, R.A.	12.	Salisbury Cathedral and Helmingham Park	Allnutt	3 9 0
1838 May 15	Foster's	John Constable, R.A.	13.	Salisbury Cathedral and Glebe Farm	Carpenter	24 10 6
1838 May 15	Foster's	John Constable, R.A.	14.	Comlield. Study for N.G. picture	Radford	9 19 6
1838 May 15	Foster's	John Constable, R.A.	23.	Salisbury Cathedral, etc.	Leslie	11 11 0
1838 May 15	Foster's	John Constable, R.A.	26.	Dedham	Rulton	8 8 0
1838 May 15	Foster's	John Constable, R.A.	29.	View in Helmingham Park	Swaby	16 5 6
1838 May 15	Foster's	John Constable, R.A.	30.	Salisbury Cathedral from Bishop's Garden	Archbutt	16 16 0
1838 May 15	Foster's	John Constable, R.A.	31.	Hadleigh Castle. Sketch	Smith	3 13 6
1838 May 15	Foster's	John Constable, R.A.	33.	Two Views of East Bergholt	Archbutt	24 3 0
1838 May 15	Foster's	John Constable, R.A.	35.	River Scene and Horse Jumping	Archbutt	52 10 0
1838 May 15	Foster's	John Constable, R.A.	37.	Salisbury Cathedral from Meadows	Williams	6 10 0
1838 May 15	Foster's	John Constable, R.A.	39.	Mill on the Stour. Sketch	Hilditch	7 17 6
1838 May 15	Foster's	John Constable, R.A.	40.	Opening Waterloo Bridge. Sketch	Joy	2 10 0
1838 May 15	Foster's	John Constable, R.A.	41.	Weymouth Bay. Sketch.	Swaby	4 4 0
1838 May 15	Foster's	John Constable, R.A.	42.	Waterloo Bridge and Brighton	Archbutt	5 0 0
1838 May 15	Foster's	John Constable, R.A.	43.	Chain Pier and Dedham Church	Stuart	5 5 0
1838 May 15	Foster's	John Constable, R.A.	44.	Hampstead Heath and Waterloo Bridge	Morton	4 14 0
1838 May 15	Foster's	John Constable, R.A.	45.	Weymouth Bay, Waterloo Bridge, and two others	Burton	7 7 0
1838 May 15	Foster's	John Constable, R.A.	46.	East Bergholt, Dedham, etc.	Nursey	4 14 6
1838 May 15	Foster's	John Constable, R.A.	47.	Weymouth Bay and four others	Williams	1 13 0
1838 May 15	Foster's	John Constable, R.A.	48.	Moonlight and Landscape with Rainbow	Leslie	5 5 0
1838 May 15	Foster's	John Constable, R.A.	49.	Three Landscapes	Archbutt	31 10 0
1838 May 15	Foster's	John Constable, R.A.	50.	Salisbury Madows	Sheepshanks	35 14 0
1838 May 15	Foster's	John Constable, R.A.	51.	Study of Trees and Fern with Donkies	Sheepshanks	23 2 0
1838 May 15	Foster's	John Constable, R.A.	52.	Cottage in a Cornfield	Burton	27 6 0
1838 May 15	Foster's	John Constable, R.A.	53.	Hampstead Hath-at the Ponds	Sheepshanks	37 5 6

We use tabs rather than commas as the separator because the values themselves include commas. Alternatively, we could wrap each value in quotation marks and separate them with commas.
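
To illustrate the quoted alternative, here is a minimal Python 3 sketch using the standard csv module. It is not part of lines_to_tsv.py; the row is one line of the output above split into its fields by hand, and the variable names are only for illustration.

```python
import csv
import io

# One row from the output above, as a list of field values
row = ["1838 May 15", "Foster's", "John Constable, R.A.", "3.",
       "Stonehenge, etc.", "Smith", "4 14 6"]

# Write the row with every value wrapped in double quotes
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(row)
quoted_line = buffer.getvalue()
print(quoted_line, end="")

# Reading the line back recovers the original fields, commas included
recovered = next(csv.reader(io.StringIO(quoted_line)))
```

Because the commas in "John Constable, R.A." and "Stonehenge, etc." sit inside quotation marks, a CSV reader treats them as part of the value rather than as separators.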

Next we can load the data into R and examine it.


Art Data Analysis: Sparse Coding Analysis

Bruegel

Sparse Coding


Recently, statistical techniques have been used to assist art historians in the analysis of works of art. We present a novel technique for the quantification of artistic style that utilizes a sparse coding model. Originally developed in vision research, sparse coding models can be trained to represent any image space by maximizing the kurtosis of a representation of an arbitrarily selected image from that space. We apply such an analysis to successfully distinguish a set of authentic drawings by Pieter Bruegel the Elder from another set of well-known Bruegel imitations. We show that our approach, which involves a direct comparison based on a single relevant statistic, offers a natural and potentially more germane alternative to wavelet-based classification techniques that rely on more complicated statistical frameworks. Specifically, we show that our model provides a method capable of discriminating between authentic and imitation Bruegel drawings that numerically outperforms well-known existing approaches. Finally, we discuss the applications and constraints of our technique.

http://www.pnas.org/content/107/4/1279

You can download the pdf here.


Exploring Art Data 16

The scanned and OCRed text from Graves’ Art Sales is very noisy. Let’s start cleaning it up.

Firstly we’ll improve the source images.

In Scan Tailor, after fixing the orientation and letting the program Split Pages and Deskew, we can set the Content Box to “Manual” in Select Content, and crop out the header on each page and any entries that do not refer to Constable on the first and last pages of the series of scans. In Output we can then set the Output Resolution to 600dpi Black and White, Thickness to 30 (selecting Apply To… All Pages), and Despeckle to maximum (selecting Apply To… All Pages).

The resulting images are not ideal for human beings to read but give better results when processed with Tesseract.

We can create a config file telling Tesseract which characters to expect to find in a file. This should help remove some of the stranger characters from the output file.

We can save the following:

tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,."&-'

in a config file for Tesseract. The location of config files may be /usr/share/tesseract/tessdata/configs/ or /usr/local/share/tessdata/configs/, and you may need to be root to access the directory in either case. e.g.:

su -c 'echo tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.\"\&-'"\'"' > /usr/local/share/tessdata/configs/gravesartsales'

(Ignore the silly quoting we have to use to include a single quote in a single quoted string…)
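
One way to sidestep the quoting altogether is to generate the file from a short script. This is only a sketch in Python 3, not what was used here: the function names are made up, and it builds the same whitelist line as above from the standard string module. The resulting file still has to be written or copied into the Tesseract configs directory, which may need root access.

```python
import string

# Letters, digits and the few punctuation marks we expect in Graves' tables
# (the same set of characters as the whitelist line shown above)
WHITELIST = (string.ascii_lowercase + string.ascii_uppercase +
             string.digits + ",.\"&-'")

def config_line():
    """Return the single line that makes up the Tesseract config file."""
    return "tessedit_char_whitelist " + WHITELIST

def write_config(path):
    """Write the config line to the file at the given path."""
    with open(path, "w") as config_file:
        config_file.write(config_line() + "\n")
```

Calling write_config("gravesartsales") writes the file to the current directory, from where it can be copied into place.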

We can then use the processed files (in the “out” directory of the Scan Tailor project) and this config file along with a shell script to create an improved version of the extracted text. Why a shell script? Writing a script allows us to iteratively improve our approach to the task, and it allows us and others to reproduce the task later. Shell scripts are sketchbooks and notebooks as well as useful tools in themselves.

Here’s the script:

#!/bin/bash

# The cleaned up pages in order
PAGES="ALIM0008 ALIM0005 ALIM0009 ALIM0006 ALIM0010 ALIM0007 ALIM0011"
# The directory the input images and output text are in
DIRECTORY="./out"
# The combined output text
RESULT="constable.txt"

# Create empty output file
echo > ${RESULT}

for PAGE in ${PAGES}
do
    # Perform OCR
    tesseract -psm 6 "${DIRECTORY}/${PAGE}.tif" "${DIRECTORY}/${PAGE}" \
    gravesartsales
    # Append results to output file
    cat "${DIRECTORY}/${PAGE}.txt" >> ${RESULT}
done

If we save the script as ocr.sh next to the out folder of the Scan Tailor project and run it:

bash ocr.sh

Then the results in constable.txt can be seen to be much improved on the original version:

$ cat constable.txt 

1834 June 7 Christie's - 67. Landscape with Figures - -
1838 May 15 Foster's John Constable, R.A. 3. Stonehenge, etc. Smith 4 14 6
1838 May 15 john Constable, R.A. 1o. Glebe Farm, etc. Williams 3 15 o
1838 May 15 JohnConstable,R.A. 12. SalisburyCathedraland l-lelminghamPark Allnutt 3 9 o
1838 May 15 John Constable, R.A. 13. Salisbury Cathedral and Glebe Farm Carpenter 24 1o 6
1838 May 15 john Constable, R.A. 14. Cornlield. Study for N.G. picture Radford 9 19 6
1838 May 15 John Constable, R.A. 23. Salisbury Cathedral, etc. Leslie 11 11 o
1838 May 15 john Constable, R.A. 26. Dedham Rulton 8 8 o
1838 May 15 John Constable, R.A. 29. View in I-lelmingharn Park Swaby 16 5 6
1838 May 15 John Constable, R.A. 30. SalisburyCathedral from Bishop'sGarden Archbutt 16 16 o
1838 May 15 john Constable, R.A. 31. Hadleigh Castle. Sketch Smith 3 13 6
1838 May 15 john Constable, R.A. 33. Two Views of Eat Bergholt Archbutt 24 3 o
1838 May 15 john Constable, R.A. 35. River Scene and Horse jumping Archbutt 52 lo o
1838 May 15 John Constable, R.A. 37. Salisbury Cathedral from Meadows Williams 6 1o o
1838 May 15 john Constable, R.A. 39. Mill on the Stour. Sketch I-lilditch 7 17 6
1838 May 15 john Constable, R.A. 4o. Opening Waterloo Bridge. Sketch Joy 2 1o o
1838 May 15 john Constable, R.A. 41. Weymouth Bay. Sketch. Swaby 4 4 o
1838 May 15 john Constable, R.A. 42. Waterloo Bridge and Brighton Archbutt 5 o o
1838 May 15 john Constable, R.A. 43. Chain Pier and Dedham Church Stuart 5 5 o
1838 May 15 john Constable, R.A. 44. l-lampstead Heath and Waterloo Bridge Morton 4 14 o
1838 May 15 john Constable, R.A. 45. Weymouth Bay, Waterloo Bridge,
and two others Burton
1838 May 15 john Constable, R.A. 46. East Bergholt, Dedham, etc. Nursey
1838 May 15 john Constable, R.A. 47. Weymouth Bay and four others Williams
1838 May 15 john Constable, R.A. 48. Moonlight and Landscape with
Rainbow Leslie
I338 May 15 John Constable, R.A. 49. Three Landscapes Archbutt
I338 May 15 john Constable, R.A. 5o. Salisbury Meadows Sheepshanks
1838 May 15 John Constable, R.A. 51. Study of Trees and Fem with
Donkies Sheepshanlts 23 a o
I838 May 15 John Constable, R.A. 52. Cottage in a Comtield Burton a7 6 o
1333 May 15 ,, john Constable, R.A. 53. Hampstead Heath-at t.l1e Ponds Sheepshanks 37 5 6

1838 May 15 Foster's John Constable, R.A. 54. Flatford Mill-Horse and Barge Leslie
1838 May 15 john Constable, R.A. 55. View near Flatford Mill Rochard
1838 May 15 John Constable R.A. 56. Hampstead Heath Burton
1838 May 15 john Constable, R.A. 57. Gillingharn Mill Leslie
1838 May 15 John Constable, R.A. 58. East Bergholt Nursey
1838 May 15 john Constable, R.A. 59. Flatford-Barge Building Sheepshanlrs
1838 May 15 john Constable, R.A. 6o. Two Views near Petworth Swaby
1838 May 15 john Constable, R.A. 61. Hampstead Heath -London in distance Archbutt
1838 May 15 John Constable, R.A. 65. Dedham Vale-Long Valley Norton
1838 May 15 John Constable, R.A. 66. London from l-lampstead Burton
1838 May 15 John Constable, R.A. 67. Flatford Mill-Dark Allnutt
1838 May 15 john Constable, R.A. 68. Brighton and Chain Pier. Ex. 1827 Tiflin
1838 May 15 john Constable, R.A. 69. The Lock near Flatford Mill Archbutt
1838 May 15 John Constable, R.A. 7o. The Glebe Farm. R.A. 1835 Miss Constable
1838 May 15 John Constable, R.A. 71. The Cenotaph, etc. R.A. 1836 Miss Constable
1838 May 15 john Constable, R.A. 72. Salisbury Cathedral from Bishop's
Garden. 1823 Tiflin
1838 May 15 john Constable, R.A. 73. View in Helmingham Park. R.A. 1830 Allnutt
1838 May 15 john Constable, R.A. 74. Opening of Waterloo Bridge. R.A. 1832 Mosley
1838 May 15 Iohn Constable, R.A. 75. View of Dedham-Gipsies. R.A. 1828 M. Bone
1838 May 15 john Constable, R.A. 76. The Loclr. R.A. 1824. Sold at
Foster's, February 15th, 1855, for 6903 Birch
1838 May 15 john Constable, R.A. 77. On the River Stour-Horse on a
Barge. R.A. 1819 Morton
1838 May 15 John Constable, R.A. 78. I-Iadleigh Cutie. R.A. 1819 Miss Constable
1838 May 15 John Constable, R.A. 79. Salisbury Cathedral from Meadows. 1831 Ellis
1333 May I5 John Constable, R.A. 8o. Dedham Mill and Church Brown
I838 May 15 ,, john Constable, R.A. 81. Arundel Castle and Mill. 1837 1. Constable
1839 April 13 Christie's Samuel Archbutt. 114. Salisbury Cathedral from Meadows Theobald
I839 April 13 Samuel Archbutt. 115. Embarltation of George IV, Waterloo
Bridge Bought in 43 1 o
1345 May 16 Mr. Taunton. 41. Salisbury Cathedral Bought in 441 o o
1846 May 16 Mr. Taunton. 42. Dedham Bought in 357 o 0
1345 June 4 ,, Edward Higginson. 77. Waggon passing through a River Rought 378 o o
1848 May 3 Phillips Ralph Thomas. 176. Salisbury Cathedral - -
I848 June 2 Christie's Sir Thos. Baring. 21. Opening Waterloo Bridge Barton 33 2 o
13 49 May 17 Taunton. 11o. Salisbury Cathedral from the Meadows. The
celebrated picture Rought 43o 1o o
1349 M87 I7 Taunton. 111. Dedham, with Towing Path Bought in 157 1o o
1351 June 13 H03lrlh- 46. Hadleigh Castle Winter 32o 5 0
1353 M111 1 R. Morris. 131. A Lock on the Stour Wass 105 o o
1353 I 1111' 1 Charles Birch. 41. jumping Horse on the Stour Gambart 393 15 0
1353 I1llY 7 .1 Charles Birch. 42. Opening of London Bridge Bought in 252 o o
1855 Feb. 15 Foster's Charles Birch. 18. The Lock. 55 x 48 1-lolmgg 368 a o

1855 Mar. 31 B. Archer QBurtonJ. 99. The Wl1ite Horse. f.Cl1efd'auvreJ Horlgson 630 0 0
1858 Feb. 3 Henry Wallis. 104. Opening of Waterloo Bridge. 51 x 86 - -
1858 May 21 John Miller. 162. Salisbury Cathedral. Sketch - 49 0 0
1858 May 22 John Miller. 227. Comlield-Reapers. A Plough in Foreground - 63 1 0
1859 June 13 ,, Potts. 240. Dedham. From Constable's sale Wallis 197 8 0
1860 April 25 Foster's C. R. Leslie, R.A. 87. House with Hatchment and Trees - -
1860 April 25 C. R. Leslie, R.A. 90. Willy Lott's House - -
1860 April 25 C. R. Leslie, R.A. 92. Sketch in Suffolk, with inscription - -
1860 April 25 C. R. Leslie, R.A. 93. Mill at Arundcl - -
1860 April 25 C. R. Leslie, R.A. 94. Lock on the Stour -
1860 April 25 C. R. Leslie, R.A. 95. Stonehenge. Engraved - -
1860 April 25 C. R. Leslie, R.A. 96. A Running Brook -
1860 April 25 C. R. Leslie, R.A. 97. The Glebe Farm. Presented to Leslie -
1860 April 25 C. R. Leslie, R.A. 98. Hampstead Heath, with Surrey Hills
1860 April 27 C. R. Leslie, R A. 386. Burning of Houses of Parliament. Drawing -
1860 April 27 C. R. Leslie, R.A. 387. Jacques and Wounded Deer. Drawing
1860 April 27 C. R. Leslie, R.A. 388. Mill at Colchester. Drawing -
1860 April 27 C. R. Leslie, R.A. 390. Brighton Fishing Boats. Drawing - -
1860 April 27 C. R. Leslie, R.A. 392. South Stoke. Drawing - -
1860 April 27 C. R. Leslie, R.A. 393. Dover-Two French Luggers. Drawing - -
1860 April 27 C. R. Leslie, R.A. 394. Studies of Trees. Chalk -
1860 May 17 J. Constable, R.A. 60. Colchester Church -
1860 May 17 J. Constable, R.A. 61. Hampstead towards Harrow -
1860 May 17 J. Constable, R.A. 62. Hadlow Castle -
1860 May 17 J. Constable, R.A. 63. Flatford
1860 May 17 J. Constable, R.A. 64. A Mill - -
1860 May 17 J. Constable, R.A. 65. Cattle on Hampstmd Heath - -
1861 Feb. 6 Henry Wallis. 86. Opening of Waterloo Bridge. 86 x 52 Davenport 464 o 0
1861 May 3 E. Gambart. 294. The Lock. 475-x 55. The original picture Leatham 231 0 o
1863 May 16 ,, Gentleman. 160. The Glebe Farm - -
1863 June 17 Foster's Charles Pemberton. 47. Near Dedham-River and Boats -
1863 June 17 Charles Pemberton. 69. The Leaping Horse. 72 x 54 -
1865 May 6 i 44. Cathedral-Salisbury. 19Q x 199
1865 May 6 ,, i 45. The Mill Stream. Engraved. 33 x 38 -
1866 Mar. 28 B. Moulton Thomas Churchyard. 64. Willy Lott's House. 24 x 20 Cox
1866 Mar. 28 Thomas Churchyard. 65. Flatford Mill. 16 11 13 -
1856 Mar. 28 Thomas Churchyard. 66. Bergholl Heath. 19 x 12 Cox
1866 Mar. 28 Thomas Churchyard. 67. View at Dedham. 25 x 18 Pearce
1866 May 19 Christie's George Young. 25. The Hay Wain Cox
1867 June 22 ,, -i 91. Landscape -
1867 Dec. 11 Foster's i 121. The Leaping Horse, Dedham Lock. E. Pemberton coll.
1870 May 21 Christie's Edwin Bullock. 86. Weymouth Bay
1870 May 21 Edwin Bullock. 109. Hampstead Heath
1870 May 21 ,, Edwin Bullock. 1 15. Heath Scene-Three Peasants in Cart

1872 Mar. I6 Christie's G. R. Burnett. I2o. On the Stour, near Canterbury Agnew
I872 Mar. I6 G. R. Burnett. I2I. Opening of Waterloo Bridge Agnew
1872 April 26 Joseph Gillott. I93. Approach to London from Hampstmd Agnew
I872 April 26 Joseph Gillott. I95. Landscape with Cottage New York
1872 April 26 Joseph Gillott. I96. On the Stour-Dedham Church New York
X872 April 26 Joseph Gillott. I97. On the Stour, with Cow New York
I872 April 26 Joseph Gillott. I98. Weymouth Bay New York
x873 June 5 John Hargreaves. 292. Heath Scene--Three Peasants in a Cart Agnew
1874 June I3 A. Wood. 5I. Hampstead Heath. Bullock 81 Hargreave coll. Bought in
1875 April 23 Sam Mendel. 3I5. On Suffolk River-Watermill Ashton
1875 June I2 T. Woolner, R.A. I34. On the Stour -
I875 June I2 T. Woolner, R.A. I35. View nun Highgate. Young
1875 July 3 Jesse Watts Russell. 26. Harwich Lighthouse Smith
1876 May 6 Wynn Ellis. 36. The Glebe Farm. I8 x 235 Agnew
1878 April 6 Munro of Novar. I2. Stralford St. Mary, Suffolk. I2 x I95 Martinmu
I878 April 6 Munro of Novar. I3. Hampstcad Heath. I2 x I95 Bentley
I878 April 6 Munro of Novar. I4. Ploughing-Windmill. Ioix I4 Agnew
1879 May 3 Jonathan Nield. I2. Landscape and Watermill Currie
1879 May 3 Jonathan Nield. I3. Stoke by Neyland Permain
1879 May 3 Jonathan Nield. I4. Thames-Westminster Agnew
1879 May 5 Joseph Fenton. I49. Embarkation of George IV from Whitehall Agnew
1879 May Io W. Fuller Maitland. 7I. Vale of Dedham. I8II. 29521 445 Daniel
1879 May Io W. Fuller Maitland. 72. Weymouth Bay. Sketch. 2I x 298 Daniel
I879 May 3o James Hughes Anderdon. Io8. A Brook Scene. C. R. Leslie coll. Agnew
1879 May 3o James Hughes Anderdon. III. Malvern Hall. -R.A. I878 Salting
I881 July 9 William Sharp. 1Io6 N.J 72. Hampstead Heath Agnew
1882 Mar. I8 G. R. Burnett. 1158 N.J I03. Opening Wstminster Bridge Permain
I883 May 5 J. M. Dunlop. 1285 P.J 6I. View on the Stour-Children Angling Martin
1883 May 5 Henry Woods. 15I3 P.J I46. Salisbury Cathedral. Heugli coll. Brooks
1883 June 8 J. Scovell. 158 SJ 243. Helmingham Park. Engraved Fielder
1883 Dec. 8 Edward Fitzgerald. 1573 SJ 27. The Edge of a Wood-Cows
Watering W. C. Quilter
I334 May 3 S. Dunning. 1869 S.J I37. River Scene-Two Children Fishing Agnew
I335 Fell il Mrs. George Vaughan. 194 V and I 34 V.J 86. The Lock Lesser
1886 Man 11 H. McConnell. 125-, w.y 65. Flatford Mill. ,5 X 55 Brooks
1886 Mar. 27 H. McConnell. 66. Dell in Helmington Park. 44 x 5I S. White
1335 MW I5 Henry Barton. 1294 W.J I68. Landscape, C-mvel Cart, etc. M. Colnaghi
I986 May H 5- Addinsm 1541 W9 s5. Windmill and Landscape. ll 295.
GillOll collection Permgin 141 I5 Q
I357 May 7 Malcolm Onne. 36. Mudow Scene and Sheep. 6 x 9 Permain 55 I 3 o
1337 July II Constable V. Blundell. 68. Hampstead Heath. I830 Stewart 1050 o o
1'37 July ll Constable V. Blundell. 72. Salisbury from Fields Agnew 94 Io o
3337 II-IIY II Constable V. Blundell. 8I. West End Fields, Hampstead Agnew 294 o o
I888 Mar. 24 ,, Frederick Fish. 280. The Mill Stream Laser 346 Io o

1888 April 14 Christie's A. Andrews. 154. The Lock. 35 x 2911- Fraser
1888 April 28 -i 44. Flntford Mill Withdrawn
1890 April 26 John Hunt. 104. Carrying Hay. 35 x 47 Lesser
1890 June 23 Captain Constable. 75. Salisbury. 1821. Dowdeswell
1890 June 23 Captain Constable. 77. Coast Scene. Colnaghi
1890 June 23 Captain Constable. 78. Stormy Sunset-Brighton. 1824 Agnew
And many others under L100.
1891 April 25 Marquis de Santurce. 16. Windmill--Peasant Ploughing. 15 x 20 Norman 210 0 0
1891 May 28 Miss Isabel Constable. 109. Abram Constable Gooden 45 3 0
1891 May 28 Miss Isabel Constable. 142. Lock on the Stour Gooden 94 10 0
1891 May 28 Miss Isabel Constable. 148. The Stour, Flatford Mill Gooden 105 0 0
1891 May 28 Miss Isabel Constable. 149. landscape with Cottages Colnaghi 151 4 0
1891 May 28 Miss lsabel Constable. 150. Dedham Vale Colquhoun 514 10 0
1891 June 27 Sir William R. Drake. 15. Willy Lot's House Dowdeswell 105 0 0
1892 Feb. 13 Charles L. Collard. 133. Noon. Sketch Gooden 262 10 0
1892 Mar. 19 i 738. Dedham Vale - 131 5 0
1892 April 30 Messrs. Murriela. 64. Cattle under Trees. 10 x 13 Agnew 161 15 0
1892 April 30 Messrs. Murrieta. 65. Cottages and Trees. 10 x 13 Agnew 105 0 0
1892 April 30 Messrs. Murrieta. 66. Hampstead Heath. 6Q x 11 Hardy 115 10 0
1892 June 17 Miss Isabel Constable. 253. Hadleigh Boussod 110 5 0
1892 June 17 Miss Isabel Constable. 261. Brighton Dowdeswell 309 15 0
1892 June 17 Miss Isabel Constable. 262. Hampstead Heath Boussod 472 10 0
1893 Feb. 18 -Z 69. Hampstead Heath Wallis 160 13 0
1893 April 15 Edwin Webster. 20. Waterloo Bridge. Sketch Laurie 209 15 0
1893 April 29 Ralph Brocklebank. 94. Landscape, with Church. 17 x 14 Earle 31 10 0
1893 June 3 J. Stewart Hodgson. 27. Hampstcad Heath. 1830. 26 x 39 Wallis 2625 0 0
1893 June 19 Vicat Cole, R.A. 188. Landscape-Sheep and Cottage Wallis 178 10 0
1894 April 21 Henry Hibbert. 126. Hampstead Heath. 1827. 241 x 31f Tooth 1835 10 o
1894 April 28 Richard Hemming. 84. On the River Stour. 51 x 73 Agnew 6510 0 0
1894 April 30 Dr. Barford. 126. Dedham Mill Tooth 117 12 0
1894 May 5 John Graham. 44. The Dell, Helmi Gooden 241 10 0
1894 May 26 ,, Jol1n Gibbon. 6. Yarmouth Jetty. Gooden 514 10 0
1894 Nov. 15 Robinson 11' F. -- 134. On the Stour Tooth 309 10 0
1895 April 27 Christie's James Orrock. 276. Gravel Pits RadclylTe 99 15 0
1895 April 27 James Orrock. 288. A Lock on the Stour Simmons 105 0 0
1895 April 27 James Orrock. 296. Brighton Beach Silber 325 10 0
1895 April 27 James Orrock. 297. Near Bergholt Wilson 346 10 0
1895 May 18 Thos. Woolner, R.A. 112. Near Highgate. 12 11 19 Salting 189 0 0
1895 June 8 J. Clark. 87. Barges on the Stour. 40911 539 Lawrie 472 10 0
1895 June 15 James Price. 26. The Mill Tail. 5111 81 Agnew 378 0 0
1895 July 6 Charles Frederick Huth. 6. Windmill and Cottages. 6 x 8 Vokins 110 5 0
1895 July 6 Charles Frederick Huth. 12. Cottage, Angler, Dog. 8 x 91 Colnaghi 183 15 0
1895 July 6 Charles Frederick Huth. 77. Stratford Mill. 1820. 50 x 72 Agnew 8925 0 0
1395 April 16 ,, Eustace Constable. 34. Chesil Beach Agnew 246 15 0

I896 June 6 Christie's Z 32. On the Stour Anson 199 10 0
1896 June 13 Sir Julian Goldsmid. 52. Embarkation of George IV Tooth 2100 o 0
I896 July 14 Lord Leighton. 290. The Hay Wain. Study. 13511 11 Wallis 157 10 o
1896 July 14 Lord Leighton. 291. The Shower. 9511 12 Agnew 21o 0 0
1897 May 22 ,, Z 66. Salisbury Cathedral. 18 x 23 D. Nathan 141 15 o
1897 May 28 Robinson 8t F. Col. Unthank. 229. The Lock. 40 x 49 - 116 0 O
I898 Feb. 5 Christie's Z 11. View on the Stour Wigzell 420 o o
1898 May 7 Z 29. View from Hampstead Heath. 1824. 195 x 3o Marayos 493 10 o
1898 May 21 Joseph Ruston. 13. View on Hampstead Heath. 13 11 16f McLean 252 0 o
1593 July 2 Z 65. Watermill, Figures crossing Bridge. 17 x 235 Black 157 1o o
I899 Mar. 11 Sir John Kelk. 6. Salisbury Cathedral. 289 x 35f Agnew 1365 o o
I899 Mar. 27 Z 73. A Cottage in a Wood Gribble 120 15 o
1899 May 6 Sir John Fowler. 51. Ploughing-Windmill. 10911 13Q Radley 241 10 0
1399 July 15 Z 79 A. Lock, and Horse on Towing Path. 11 x 155 Wigzell 126 0 9
1990 Mgy 5 Mrs. Bloomfield Moore. 371. Gipsy Encampment, Dedham.
16511 27 Dunlhome 178 10 o
I901 Feb. 23 Z 101. View on the Stour. 24 x 30. Gillott collection Tooth 388 1o o
I901 Mar. 2 Hubert Martincau. 78. Stratford St. Mary's. 12 x 195 Colnaghi 756 o 0
I901 Mar. 30 Z 11. Ploughman, Bergholt. 8 x 21 Wallis 231 15 o
I901 May 18 ,, E. A. Lcatham. 121. The Lock. 55 X 47 Vicars 1995 0 o
1901 June 27 Robinson 81 F. Z 79. On the Stour Micholls 420 0 o
1902 Feb. 17 Christie's Z 116. View ol' Dedham Mill. 24 x 30 Leggatt 304 1o 0
1902 Mar. 15 Z 146. Hampstead Heath. 18 x 23 Ruten 157 1o o
1902 April 28 Z 65. Timber Waggon. 18 It 26 Arthur 189 0 o
I902 May 3 C. A. Barton. 5. Gillingham Mill. 19x 23 Fallte 1207 1o o
1902 May 3 C. A. Barton. 6. Brighton Beach. 12 x 165 Agnew 441 o o
ipa May 3 C. A. Barton. 7. Hampstead Heath. 9 x 12 Dubbs 231 o o
1-9115 May 3 Z 84. From Hampstead Heath. 135- x 175 Dubbs 105 o o
T902 July 7 -- 79. landscape, with Woodman. 19 x 30 - 105 0 0
1963 Feb. 28 Z 123. A House at Hampstead. 23.111 19Q Agnew 524 0 0
I903 May 16 R. T. H. Bruce. 28. Jumping Horse. Sketch. 19411 25 Wigzell 199 1o 0
I903 May 23 Reginald Vaile. 7. Dredgers on the Medway. 9121 131 Sedelmeyer 231 0 o
I903 May 23 Reginald Vaile. 8. Stonehenge. Engraved. 7 x 104 Colnaghi 75 2 o
1903 June 27 ,, Z 140. Hudleigh Castle. 169 I 22J Graham 157 10 o
1903 Nov. 6 Morrison 81 Co. D. McCorltindale. 77. The House on the Hill. Illustrated. 2o x 15 - -
1904 Mar. 19 Christie's C. F. Huth. 45. Mill at Gillingham. 10 x 125 Permain 178 0 0
I904 April 3o Z 79. Mill Stream, Flatford. 9 I 12 Amor 152 5 o
I904 April 30 Z 133. West End Fields. 9421 14 Agnew 598 1o o
1904 June 4 J. Orrock. 9. View near Bentley. Drawing. Boswell 210 o 0
1904 June 4 J. Orrock. 64. East Bergholt Mill. 34 x 44 Agnew 1050 o o
1904 June 4 J. Orroclt. 65. Hampstead Heath. 26 x 40 Low 546 0 0
1904 June 4 J. Orrock. 67. Lake-Figures on Road. 24511 295 A. Smith 420 o o
I904 June 4 J. Orrock. 68. The Glebe Farm. 20 x 28 Agnew 273 o o
1904 June 4 ,, J. Orroclt. 69. East Bergholt. 19 x 29 McLean 105 0 o

1904 June 4 Christie's I. Orrock. 70. A Glebe Farm. 27511 35 Richardson 199 ro 0
1904 June 6 I. Orrock. 246. Landscape and Figure. 7 at 105 Agnew 262 10 0
1904 Nov. 19 i 24. Helmingham Dell. 281136 Clare 262 10 0
1905 April 29 john Gabbitas. 82. Peasant Woman on Road. 8 x 115 Mchan 110 5 0
1905 April 29 John Gabbitas. 84. Cottage at Langham. 125x 145 Ogaton 294 0 0
1995 May 13 Charles Neck. 22. River-Road over Bridge. 34 x 45 Sulley 378 0 0
1905 May 20 Louis Hutli. 38. Salisbury Cathedral. 28 x 36 Colnaghi 1785 0 0
1905 May 2o Louis Huth. 39. Dedham Watermill. 21 x 30 Agnew 525 0 0
1906 Mar. 31 E. M. Denny. 5. Salisbury Bridge. Illustmted. 21 x 295 Knoedler 2835 0 0
1906 Mar. 31 E. M. Denny. 6. Strand on the Green. 11 x 155 Wallis 483 0 0
1907 April 2o ,, i 104. Salisbury Cathedral. 335 x 43 Gribble 1575 0 0
1907 May 16 Paris Sedelrneyer. 27. Hastings - 2200 francs
1907 May 16 ,, Sedelmeyer. 43. Stratford Church - 1600 francs
1907 June 14 Christi ' Lord Falkland. 23. The Canal Boat. 47 x 38 McLean 399 0 0
1907 June 28 i 49. Vale of Health, Hampstead. 10511 15 Gooden 283 10 0
1908 jan. 18 ,, Thos. McLean. r8. Heimingham Dell. 28 11 365 Bone 157 10 0
1908 May 6 Paris i 5. The Glebe Farm - 6000 francs
1908 May 23 Christie's Humphrey Roberts. 8. Opening ofWaterl0o Bridge. Illus. 175 x 32 Reid 1155 0 0
1908 May 23 Humphrey Roberts. 9. Brighton Beach. 125 x 195 Clark 556 10 0
1908 'May 23 Humphrey Roberts. 10. A Farm. 1 15 x 155 Gooden 336 0 0
1908 June 25 StephenG. Holland. 12. Salisbury Cathedral. 1826. Illus. 34 x 435 Knoedler 8190 0 0
1908 June 25 Stephen G. Holland. 14. Arundel Mill and Castle. 1 1 x 155 Colnaghi 336 0 0
1908 July 3 i 26. The Valley Farm. 50 x 40 Lane 651 0 0
1909 Mar. 27 Richard Hobson. 103. Hampstead Heath. 18511 255 Agnew 378 0 0
1909 April 24 Professor B. Bertrand. 33. Yarmouth jetty. 27 11 35 Holt 1449 o 0
1909 May 7 R. G. Behrens. 18. Nur Dedham. 105 x 14 Evelyn 1 15 1o 0
1909 May 21 E. H. Cuthberlson. 13. River Stour-Barges. 25 11 40 Vicars 714 0 0
1909 May 21 E. H. Cuthbertson. 14. ln Helmingham Park. 295 x 245 Gooden 441 0 0
1909 May 21 E. H. Cutbbertson. 15. Salisbury. 275 x 355 Leggatt 404 5 0
1909 May 21 E. H. Cuthbertson. 20. A Cornield, Brighton. 12511 195 Leggatt 126 0 0
1909 June 24 Holbrook Gaskell. 8. Arundel Mill and Castle. 27 x 37 Knoedler 8820 0 0
1909 July 9 Sir C. Quilter. 5. Brighton Beach. Drawing. 45 x 75 Wallis 162 15 0
1909 July 9 Sir C. Quilter. 50. Wat End Fields. Oil. 125 x 201- A. Gibson 630 0 0
1910 May 6 O. E. Coope. 8. The Vicarage. Oil. 185 x 235 Colnaghi 735 o 0
1910 June 17 Sir F. T. Mappin. 14. Stoke by Neyland. Oil. 49l x 65 Sulley 9240 0 o
1910 june 24 Armstrong Heirlooms. 47. Glebe Farm, Dedham. OiL 18 x 235 Tooth 2047 10 0
1910 June 24 ,, Armstrong Heirlooms. 48. Hampstead Heath. 15511 195 Gooden 131 5 0

There are still confusions between upper- and lower-case letters, between 0 and o, and between ” and ,, and the fractions cannot be recognized at all, but these can easily be found and fixed using a combination of scripting and manual editing.

Which is what we will do next.
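
As a rough illustration of the scripted part, here is a sketch in Python of one such fix: replacing letters that have been mistaken for digits, but only inside tokens that are evidently numbers. The names and the rule itself are assumptions made for this example rather than the approach actually used, and the rule is deliberately conservative. A token is only changed if it contains at least one real digit and nothing but digits, o, O, l and I (plus an optional trailing full stop).

```python
import re

# Characters Tesseract tends to confuse with digits in this text
LOOKALIKES = {'o': '0', 'O': '0', 'l': '1', 'I': '1'}

# A token made only of digits and look-alikes, containing at least one real
# digit, optionally followed by a full stop (as in lot numbers such as "4o.")
NUMERIC_TOKEN = re.compile(r'^(?=.*\d)[0-9oOlI]+\.?$')

def fix_token(token):
    """Replace digit look-alikes in a token that is evidently a number."""
    if NUMERIC_TOKEN.match(token):
        return ''.join(LOOKALIKES.get(char, char) for char in token)
    return token

def fix_line(line):
    """Apply fix_token to each space-separated token of a line."""
    return ' '.join(fix_token(token) for token in line.split(' '))
```

Tokens with no real digit in them, such as a lone o or the lo in “52 lo o”, are left alone, because the script cannot tell them from ordinary letters. Those are the cases that still need manual editing.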

Categories
Art History Art Open Data

Exploring Art Data 15

Let’s find an art historical data source that hasn’t already been digitised and made freely available.

Graves’ Art Sales is often referred to in economic studies of art history. It is in the public domain but isn’t (at the time of writing) available to download from any of the public domain text repository projects. Fortunately, copies of early-1970s facsimiles (also out of copyright) are available through online booksellers.

While waiting for the first volume to be delivered, I made a very simple homebrew book scanner. It’s a cardboard box cut in half, a bright lamp, a sheet of glass and a cheap digital camera on a tripod. The design is from http://www.diybookscanner.org/ , which also has more sophisticated designs available.

The scanner with volume one of Graves in place ready for scanning:

IMG_20110126_224748.jpg

Here’s the book:

IMG_20110126_224707.jpg

We’ll scan the pages containing the data of Constable’s sale prices. This is a simple (but slow) matter of first photographing all the front sides of those pages in turn, then turning the book around and photographing all the back sides. Working this way is faster than turning the book over for every page, but it does mean that the photographs are out of page order. Since we are only using a few pages here we can rename them manually, but there are scripts to help do this for entire books.
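A script for this might look something like the following sketch. It assumes (this is not taken from the scanning session itself) that the photographs sort by file name in the order they were taken, that the first half are front sides and the second half are back sides, and that both halves run in ascending page order. The function name reorder_pages and the page-0001.JPG naming scheme are made up for illustration. It copies rather than renames, so the original photographs are left alone.

```shell
#!/bin/sh
# Sketch: restore reading order for photos taken fronts-first, backs-second.
# Assumptions: *.JPG files sort in capture order; the first half are front
# sides (pages 1, 3, 5, ...) and the second half are back sides (2, 4, 6, ...).
reorder_pages() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir"

    # Load the sorted photo list into the positional parameters.
    set -- "$src_dir"/*.JPG

    # With an odd number of photos the extra one is taken to be a front side.
    half=$(( ($# + 1) / 2 ))

    i=0
    for photo in "$@"; do
        if [ "$i" -lt "$half" ]; then
            page=$(( 2 * i + 1 ))            # front sides: odd pages
        else
            page=$(( 2 * (i - half) + 2 ))   # back sides: even pages
        fi
        cp "$photo" "$dest_dir/$(printf 'page-%04d.JPG' "$page")"
        i=$(( i + 1 ))
    done
}
```

Calling reorder_pages scans pages would then fill pages/ with copies named in reading order. If the back sides were photographed from the last page to the first instead, the second branch would need to count down rather than up.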

A scanned page:

ALIM0005.JPG

To rotate and clean up the pages we will use a piece of software called Scan Tailor (http://scantailor.sourceforge.net/). After processing in Scan Tailor, the above page looks like this:

ALIM005.png

We can extract the text from this page using the Tesseract Optical Character Recognition program:

$ tesseract ALIM0005.tif ALIM0005
$ cat ALIM0005.txt

·=~· 
CONSTABLE, john, R.A.-—-ranlinued
1838 May IS Foste1's john Constable, R.A. 54. Flatford Mill-Horse and Barge Leslie 52 IO O
1838 May IS ,, john Constable, R.A. 55. View near Flatford Mill Rochard II 0 6
1838 May I5 ,, john Constable, R.A. 56. Hampstead Heath Burton I7 6 6
1838 May IS ,, john Constable, R.A. 57. Gillingham Mill Leslie 37 16 6
1838 May I5 ,, john Constable, R.A. 58. East Bergholt Nursey 5 I5 6
1838 May IS ,, john Constable, R.A. 59. Flatl`ord·-Barge Building Sheepshanlrs SI 9 O
1838 May IS ,, john Constable, R.A. 60. Two Views near Pétworth Swaby 7 7 0
1838 May IS ,. john Constable, R.A. 61. Hampstead Heath—London in distance Archbutt Jl IO 0
1838 May I5 ,, john Constable, R.A. 65. Dedham Vale—Long Valley Norton 25 4 6
1838 May IS ., john Constable, R.A. 66. London from Hampstead Burton 63 0 0
1838 May I5 ,, john Constable, R.A. 67. Flatford Mill—Dark Allnutt 34 I3 0
1838 May IS ,, john Constable, R.A. 68. Brighton and Chain Pier. Ex. 1827 Tiffin 45 3 0
1838 May I5 ., john Constable, R.A. 69. The Lock near Flatford Mill Archbutt 44 2 0
1838 May IS ,, john Constable, R.A. 70. The Glebe Farm. R.A. 1835 Miss Constable 74 ll 0
1838 May IS ,, john Constable, R.A. 71. The Cenotaph, etc. R.A. 1836 Miss Constable 42 0 0
1838 May IS ,, john Constable, R.A. 72. Salisbury Cathedral from Bishop’s
Garden. 1823 Tiflin 64 1 0
1838 May I5 john Constable, R.A. 73. View in Helmingham Park. R./\. 1830 Allnutt 56 I4 0
1838 May I5 john Constable, R./\. 74. Opening of \Vaterloo Bridge. R.A. 1832 Mosley 63 0 0
1838 May IS john Constable, R.A. 75. View of Dedham—Gipsies. R.A. 1828 M. Bone 105 0 0
1838 May IS john Constable, R.A. 76. The Lock. R.A. 1824. Sold at
Foster's, February lslh, 1855, for {903 Birch 131 0 0
1838 May I5 john Constable, R.A. 77. On the River Stour—Ho1se on a
Barge. R.A. 1819 Morton 157 I0 0
1838 May IS ,, john Constable, R.A. 78. Hadleigh Castle. R.A. 18:9 Miss Constable 105 0 °
1838 May IS ,, john Constable, R.A. 79. Salisbury Cathedral from Meadows. 1831 Ellis 110 5 0
1838 May IS ,, john Constable, R.A. 80. Dedham Mill and Church Brown 45 3 0
1838 May IS ,, john Constable, R.A. 81. Arundel Castle and Mill. 1837 j. Constable 7S I5 0
1839 April 13 Christie's Samuel Archbutt. 114. Salisbury Cathedral from Meadows Theobald 3l I0 0
1839 April 13 ,, Samuel Archbutt. 115. Embarlcation of George IV, Waterloo
Bridge Bought in 43 1 0
1846 May 16 ,, Mr. 'I`aunton. 41. Salisbury Cathedral Bought in 441 0 0
1846 May 16 ,, Mr. Taunton. 42. Dedham Bought in 357 0 0
1846 june 4 ,, Edward Higginson. 77. Waggon passing through a River Rought 378 0 0
1848 May 3 Phillips Ralph Thomas. 176. Salisbury Cathedral — —-
1848 june 2 Christie's Sir Thos. Baring. 21. Opening \Vaterloo Bridge Barton 33 2 0
1849 May 27 ,, Taunton. 110. Salisbury Cathedral from the Meadows. The
celebrated picture Rought 430 I0 0
1849 May 27 ,, Taunton. ll 1. Dedham, with Towing Path Bought in 157 I0 0
ISS! june 13 ,, Hogarth. 46. Hadleigh Castle Winter 320 5 0
1853 Mar. 7 ,, R. Morris. 131. A Lock on the Stour Wass 105 0 0
1853 july 7 ,, Charles Birch. 41. jumping Horse on the Stour Gambart 393 I5 0
1853 .l'·'lY 7 1. Charles Birch. 42. Opening of London Bridge Bought in 252 0 0
185; Feb. I5 F0ster's Charles Birch. 18. The Lock. 55 11 48 Holmes 860 0 0

The data is quite noisy. It’s possible to clean up a few pages by hand, but an entire volume would be more practical to clean up through Internet-based collaboration. Project Gutenberg’s Distributed Proofreaders project is a good example of this.
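Whoever does the proofreading, every page of the volume first has to go through Tesseract. Here is a minimal sketch of running the command shown earlier over a whole directory of processed pages, assuming they are all saved as .tif files in one directory. The function name ocr_pages is made up for illustration.

```shell
#!/bin/sh
# Sketch: run tesseract over every .tif page image in a directory.
# Assumption: the cleaned-up pages all sit in one directory as .tif files.
ocr_pages() {
    page_dir=$1
    for image in "$page_dir"/*.tif; do
        # Skip the unexpanded pattern when the directory has no .tif files.
        [ -e "$image" ] || continue
        # The second argument is the output base name; tesseract adds .txt.
        tesseract "$image" "${image%.tif}"
    done
}
```

Running ocr_pages on the directory of pages would leave a .txt file next to each image, ready to be cleaned up.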

Next time we’ll clean up the data and load it into R.