Categories
Art Art Computing Art History Free Culture Free Software

Source Code

The part of my review of “White Heat Cold Logic” that seems to have
caught people’s attention is:

“for preservation, criticism and artistic progress (and I do mean
progress) it is vital that as much code as possible is found and
published under a Free Software licence (the GPL). Students of art
computing can learn a lot from the history of their medium despite the
rate at which the hardware and software used to create it may change,
and code is an important part of that.”

http://www.furtherfield.org/features/reviews/white-heat-cold-logic

I have very specific reasons for saying this, informed by personal
experience.

When I was an art student at Kingston Polytechnic, I was given an
assignment to make a new artwork by combining two previous artworks: a
Jackson Pollock drip painting and a Boccioni cyclist. I could not “read”
the Boccioni cyclist: the forms did not make sense to me, and so I was
worried I would not be able to competently complete the assignment. As
luck would have it there was a book of Boccioni’s drawings in the
college library that included the preparatory sketches for the painting.
Studying them allowed me to understand the finished painting and to
re-render it in an action painting style.

When I was a child, a book on computers that I bought from my school
book club had a picture of Harold Cohen with a drawing by his program
AARON. The art of AARON has fascinated me ever since, but despite my
proficiency as a programmer and as an artist, my ability to “read”
AARON’s drawings and to build on Cohen’s work artistically is limited
by the fact that I do not have access to their “preparatory work”:
their source code.

I have been told repeatedly that access to source code is less important
than understanding the concepts behind the work or experiencing the work
itself. But the concepts are expressed through the code, and the work
itself is a product of it. I can see a critical case being made for the
idea that “computer art” fails to the extent that the code rather than
the resultant artwork is of interest. But as an artist and critic I want
to understand as much of the work and its history as possible.

So my call for source code to be recovered (for historical work) and
released (for contemporary work) under a licence that allows everyone
to copy and modify it comes from personal experience: on the one hand,
understanding and remaking an artwork thanks to access to its
preparatory materials; on the other, the frustration of not having
access to such materials. And I think that awareness of and access to
source code for prior art (in both senses of the term) will enable
artists who use computers to stop re-inventing the wheel.

If you are making software art, please make the source code publicly
available under the GPL3+; and if you are making software-based net
art, please make it available under the AGPL3+.

Categories
Art History Art Open Data

Art Text Data Analysis 1

Network Analysis and the Art Market: Goupil 1880 – 1895 [PDF]

Wyndham Lewis’s Art Criticism in The Listener, 1946-51

Tools For Exploring Text

Auto Converting Project Gutenberg Text to TEI

Categories
Art Art History Art Open Data

Open Art Data – Datasets Update

Here’s a new OGL-licenced list of works in the UK government’s art collection, scraped for a Culture Hack Day –

http://kasabi.com/dataset/government-art-collection/

http://blog.kasabi.com/2011/09/22/hacking-on-culture-data/

The JISC OpenART Project is making good progress and considering which ODC licence to use. It should be both a great resource and a great case study –

https://dlibwiki.york.ac.uk/confluence/display/openart/Home

I’ve mentioned it before but this Seattle government list of public art with geolocation information is really good –

http://data.seattle.gov/dataset/Public-Art/7ckr-2zz9

And Freebase keep adding new information about visual art –

http://www.freebase.com/view/visual_art

Europeana are ensuring that all the metadata they provide is CC0 –

https://creativecommons.org/weblog/entry/29133

Their API isn’t publicly available yet, though! 🙁 –

http://www.europeana-libraries.eu/web/api/

Finally for now, some of the National Gallery’s data seems to be under an attempt at a BSD-style licence. The OGL would be even better… –

http://rdf.ng-london.org.uk/scientific/rdf.php

Categories
Art Computing Art History Art Open Data Projects

Exploring Art Data 21

Now that we have a file of statistical information about the folder of images that we are examining, we can plot this using the images themselves.

First we need to install and load the library that we will use to load the images for plotting. You may need to install ImageMagick’s libmagick development package for EBImage to install; it doesn’t seem to like GraphicsMagick.

In Fedora run:

sudo yum install libmagick-devel

In Ubuntu run:

sudo aptitude install libmagick-dev

We can then install and load EBImage as follows:

## source("http://bioconductor.org/biocLite.R")
## biocLite("EBImage")
library("EBImage") 

Next we declare constants to control various aspects of the plot. These include the size of the plot, the graphical properties of the elements that we are plotting, and which elements to plot.
 

## The plot
#inches
plotWidth<-8
plotHeight<-6
plotBorder<-1
innerWidth<-plotWidth - (plotBorder * 2)
innerHeight<-plotHeight - (plotBorder * 2)
plotBackgroundCol<-rgb(0.4, 0.4, 0.4, 1.0)
## Thumbnail images
thumbnailWidth<-0.3
## Lines
lineWidth<-1
lineCol<-rgb(0.8, 0.8, 0.8, 1.0)
## Points
## This is the point scale factor (cex)
pointSize<-2
pointStyle<-19
pointCol<-rgb(0.8, 0.8, 0.8, 1.0)
## Labels
## The label scale factor (cex)
labelSize<-0.25
labelCol<-rgb(1.0, 1.0, 1.0, 1.0)
## Axes
axisLabelX<-""
axisLabelY<-""
axisCol<-rgb(1.0, 1.0, 1.0, 1.0)
## Number of significant digits to round fractional part of each tick value to
axisRoundDigits<-3
## What to draw
shouldDrawImages<-TRUE
shouldDrawPoints<-TRUE
shouldDrawLines<-TRUE
shouldDrawLabels<-TRUE
shouldDrawAxes<-TRUE 

Then we declare variables and functions that will be used to process the data in order to fit its values into the plot in a visually appealing way.

minXValue<-NULL
maxXValue<-NULL
minYValue<-NULL
maxYValue<-NULL
scaleX<-NULL
scaleY<-NULL
## Update the scaling factor for positioning images
updateXYScale<-function(){
    rangeX<<-maxXValue - minXValue
    scaleX<<-innerWidth / rangeX
    rangeY<<-maxYValue - minYValue
    scaleY<<-innerHeight / rangeY
}

scaleXValue<-function(x){
    plotBorder + ((x - minXValue) * scaleX)
}

scaleYValue<-function(y){
    plotBorder + ((y - minYValue) * scaleY)
}

## Set the range of the X and Y axes for positioning images
setMinMaxXYValues<-function(xMin, yMin, xMax, yMax){
    minXValue<<-xMin
    maxXValue<<-xMax
    minYValue<<-yMin
    maxYValue<<-yMax
    updateXYScale()
}

## Calculate the range of the X and Y axes for positioning images
discoverMinMaxXYValues<-function(xValues, yValues){
    xRange<-range(xValues)
    yRange<-range(yValues)
    ## Handle 0..1 or a..b
    if(xRange[2] - xRange[1] > 1){
        xRange<-c(floor(xRange[1]), ceiling(xRange[2]))
    } else {
        xRange<-c(floor(xRange[1] * 1000) / 1000,
                  ceiling(xRange[2] * 1000) / 1000)
    }
    if(yRange[2] - yRange[1] > 1){
        yRange<-c(floor(yRange[1]), ceiling(yRange[2]))
    } else {
        yRange<-c(floor(yRange[1] * 1000) / 1000,
                  ceiling(yRange[2] * 1000) / 1000)
    }
    ## Floor and ceiling the values to round them to the nearest integers
    ## and make the values on the plot nicer
    setMinMaxXYValues(xRange[1], yRange[1], xRange[2], yRange[2])
}

## Left X value for image
## image parameter accepted to give these calls a regular signature
imageXLeft<-function(image, valueX){
    valueX
}

## Right X value for image
## image parameter accepted to give these calls a regular signature
imageXRight<-function(image, valueX){
    valueX + thumbnailWidth
}

## Get the height of the image scaled to the new width
imageHeightScaled<-function(image, scaledWidth){
    scale<-dim(image)[1] / scaledWidth
    dim(image)[2] / scale
}

## Bottom Y value for image
imageYBottom<-function(image, valueY){
    valueY - imageHeightScaled(image, thumbnailWidth)
}

## Top Y value for image
imageYTop<-function(image, valueY){
    valueY
}

The labels for each image, the points marking the image’s position, the lines connecting each image, and the top left of each image are positioned on the x, y co-ordinates for the image’s properties being plotted.

Centering the image on the x, y co-ordinates might be more natural but it would obscure the position of the point and the connecting lines if they were also drawn.

plotLabels<-function(labelValues, xValues, yValues){
    ## Position the labels underneath the images
    text(xValues, yValues, labelValues, col=labelCol, cex=labelSize, pos=3)
}

plotImages<-function(imageFilePaths, xValues, yValues){
    for(i in 1:length(imageFilePaths)){
        image<-readImage(imageFilePaths[i])
        x<-xValues[i]
        y<-yValues[i]
        ## Does the image really have to be rotated???
        rasterImage(rotate(image), imageXLeft(image, x), imageYTop(image, y),
                    imageXRight(image, x), imageYBottom(image, y))
    }
}

When we plot the axes their tick values are auto-generated from the value ranges, so they may look weird.
 

plotAxes<-function(){
    xat<-round(seq(minXValue, maxXValue,
                   (maxXValue - minXValue) / plotWidth),
               axisRoundDigits)
    axis(1, 0:plotWidth, xat, col=axisCol, col.ticks=axisCol, col.axis=axisCol)
    yat<-round(seq(minYValue, maxYValue,
                   (maxYValue - minYValue) / plotHeight),
               axisRoundDigits)
    axis(2, 0:plotHeight, yat, col=axisCol, col.ticks=axisCol, col.axis=axisCol)
}

Having written functions to plot each element, we declare an all-in-one function to plot everything that is enabled in the configuration constants above.
 

plotElements<-function(imageFilePaths, xValues, yValues, labelValues){
    if(shouldDrawLines){
        lines(xValues, yValues, col=lineCol, lwd=lineWidth)
    }
    if(shouldDrawPoints){
        points(xValues, yValues, pch=pointStyle, col=pointCol)
    }
    if(shouldDrawImages){
        plotImages(imageFilePaths, xValues, yValues)
    }
    if(shouldDrawLabels){
        plotLabels(labelValues, xValues, yValues)
    }
    if(shouldDrawAxes){
        plotAxes()
    }
}


Then we declare a function to get the values from the data frame and call the plot-everything function.
 

setValuesAndPlot<-function(data, imageFilepaths, xColumn, yColumn,
                           labelColumn="filename", discoverRange=TRUE){
    ## Get the lists for the data columns, get the doubles from them,
    ## and scale to the plot
    xValues<-data[xColumn][,1]
    yValues<-data[yColumn][,1]
    if(discoverRange){
        discoverMinMaxXYValues(xValues, yValues)
    }
    scaledXValues<-sapply(xValues, scaleXValue)
    scaledYValues<-sapply(yValues, scaleYValue)
    axisLabelX<<-xColumn
    axisLabelY<<-yColumn
    plotElements(imageFilepaths, scaledXValues, scaledYValues, data[,labelColumn])
    title(xlab=xColumn, ylab=yColumn, col.lab=axisCol)
}

You’ll notice that each function combines and builds on earlier functions; functions should be short, readable organizing units. The next one that we declare reads the data file and the image files, and then plots the values.


readAndPlot<-function(dataFile, imageFolder, xColumn, yColumn,
                      labelColumn="filename", discoverRange=TRUE){
    data<-read.delim(dataFile, stringsAsFactors=FALSE)
    imageFilepaths<-sapply(data["filename"],
                           function(filename) file.path(imageFolder, filename))
    setValuesAndPlot(data, imageFilepaths, xColumn, yColumn, labelColumn,
                     discoverRange)
}


The next function makes a new R plot with the proper graphics parameters. Notably, this sets the bounds and the background colour.
 

newPlot<-function(dataFile, imageFolder, xColumn, yColumn,
                  labelColumn="filename", discoverRange=TRUE){
    ## Call before plot.new()
    par(bg=plotBackgroundCol)
    plot.new()
    ## Use co-ordinates relative to the bounds
    par(usr=c(0, plotWidth, 0, plotHeight))
    par(bty="n")
    readAndPlot(dataFile, imageFolder, xColumn, yColumn, labelColumn,
                discoverRange)
}


Finally we can declare functions to plot to several different kinds of R device: X11 for screen display and testing, PNG for embedding in web pages and documents, and PDF for high-quality output. Note that the PDF will include all of the images plotted, so it will become very large very quickly. A high-resolution PNG will be more practical for very large image sets.
 

## Make a new X11 plot
X11Plot<-function(dataFile, imageFolder, xColumn, yColumn,
                  labelColumn="filename", discoverRange=TRUE){
    X11(width=plotWidth, height=plotHeight)
    newPlot(dataFile, imageFolder, xColumn, yColumn, labelColumn, discoverRange)
}

## Make a new PNG plot
pngPlot<-function(outFile, dataFile, imageFolder, xColumn, yColumn,
                  labelColumn="filename", discoverRange=TRUE, dpi=600){
    png(filename=outFile, width=plotWidth, height=plotHeight, units="in",
        res=dpi)
    newPlot(dataFile, imageFolder, xColumn, yColumn, labelColumn, discoverRange)
    dev.off()
}

## Make a new PDF plot
pdfPlot<-function(outFile, dataFile, imageFolder, xColumn, yColumn,
                  labelColumn="filename", discoverRange=TRUE){
    pdf(file=outFile, width=plotWidth, height=plotHeight)
    newPlot(dataFile, imageFolder, xColumn, yColumn, labelColumn, discoverRange)
    dev.off()
}

Calling these image-generating functions from the REPL in Emacs or on the command line means that we can see the output, and then interactively modify the constants we declared at the start and the parameters we pass to the plotting functions in order to improve the results.
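
For example, writing the same plot to files might look like this (a sketch assuming the images.txt data file and images folder used in the example below; the output filenames are hypothetical):

## Write the plot to a 600dpi PNG and to a PDF (hypothetical output filenames)
pngPlot("brightness-saturation.png", "images.txt", "images",
        "brightness_median", "saturation_stdev")
pdfPlot("brightness-saturation.pdf", "images.txt", "images",
        "brightness_median", "saturation_stdev")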

Running:
 

X11Plot("images.txt", "images", "brightness_median", "saturation_stdev") 

Gives us:

Mondrian Visualization

Next we can wrap the functions we have written in command-line and GUI interfaces and explore the strengths and weaknesses of each.

Categories
Art Computing Art History Evie Matthieson Generative Art

[Evie Matthieson] Parallel Space

Parallel Space
Saturday July 5th 1997

Talk outlines and Biographies

Tracey Matthieson

“A number of preoccupations surround my practice within VR, such as:

How ideas and concepts might be described through the medium of VR.
The medium’s strengths and weaknesses.
Are there features that belong only to the VR medium? If so, what are
they?
The participation and encouragement of the viewer and the control (or
lack of it) of the artist.
Blurring the edges of real and unreal.
The illusory nature of the medium, and how to respect and yet also
challenge the way we interpret space, place and the objects within them.
Where the translation of elements of literal ‘reality’ is useful and
where it should not be considered.
Recoding ideas about real places into virtual ‘landscapes’.
To provide an alternative.

My projects are a series of experiments looking at the use of VR as
an art medium: relationships in VR of space and place, the illusory
nature of VR, and the revelation of spaces according to the movement
of the user. The work focuses on methods to encourage user
participation within worlds through visual means. My most recent works
centre around the creation of a virtual system that supports and
encourages the viewer through an as yet unmade virtual landscape. I
designed a system to generate on-the-fly virtual ‘zones’ around the
viewer. The randomness of the zones is affected only by the viewer’s
choice of movement through the work. The viewer creates their own
piece of the work. The system then records the 3D virtual map
generated behind the viewer to leave a sense of constancy and an
individual path they can retrace if they wish. It is expected that
each map will be different.

I am using this work as a vehicle to explore the notions of the
permanent, temporary and transient ‘structures’ within the medium of VR
and also within computer and viewer memory.”

Tracey Matthieson has just finished her part-time research MA
at the Centre for Electronic Arts at Middlesex University. She chose to
research existing Virtual Environments and create her own experimental
spaces. Her intention through her experimental work is to offer an
alternative to existing interpretations of the uses of VR. Programming
support for the “catalyst map” system was from Rob Myers.

Categories
Art Computing Art History Art Open Data Projects

Exploring Art Data – The Plan

I’ve been very, very busy recently and I haven’t had time to work on the “Exploring Art Data” series of blog posts.
I will get back to them. First I will finish the Graves Art Sales exploration. Then I will use Joy Garnett’s images of her paintings as an example of processing a (small) large dataset. Then I will analyse the Netbehaviour mailing list archive as an example of a social network.
And that’s the plan. Unless anyone has anything else they’d like to see.

Categories
Art Art History Free Culture Projects

Shapeways Urinal Print

My print of the Urinal has just arrived from Shapeways.

Here it is in its packaging with an SD Card for scale:

IMG_20110215_135052.jpg

And here’s a tasteful installation shot of it:

IMG_20110215_135403.jpg

The original blog post about this project, with links to the source files, is here.

Categories
Art Computing Art History Art Open Data

Exploring Art Data 17

Let’s clean up the Constable data from Graves Art Sales.

This extract shows most of the issues with the scanned and OCR-ed data:

1839 April 13 Christie's Samuel Archbutt. 114. Salisbury Cathedral from Meadows Theobald
I839 April 13 Samuel Archbutt. 115. Embarltation of George IV, Waterloo
Bridge Bought in 43 1 o
1345 May 16 Mr. Taunton. 41. Salisbury Cathedral Bought in 441 o o
1846 May 16 Mr. Taunton. 42. Dedham Bought in 357 o 0
1345 June 4 ,, Edward Higginson. 77. Waggon passing through a River Rought 378 o o
1848 May 3 Phillips Ralph Thomas. 176. Salisbury Cathedral - -
I848 June 2 Christie's Sir Thos. Baring. 21. Opening Waterloo Bridge Barton 33 2 o
1349 May 17 Taunton. 11o. Salisbury Cathedral from the Meadows. The
celebrated picture Rought 43o 1o o
1349 M87 I7 Taunton. 111. Dedham, with Towing Path Bought in 157 1o o
1351 June 13 H03lrlh- 46. Hadleigh Castle Winter 32o 5 0
1353 M111 1 R. Morris. 131. A Lock on the Stour Wass 105 o o

Some lines run on, some lines end with dashes or no numbers, some numbers are mistaken for letters, some words are corrupted. There are also blank lines between each scanned page of the book.

First we can fix the run-on lines and the blank lines in a text editor, joining each run-on back into a single line and deleting the blank lines.

Then we can write a shell script to fix other issues. It’s important to make sure that the script doesn’t introduce more problems, so each substitution should be small, well-defined and carefully tested.

Here’s such a script (cleanup.sh):

#!/bin/sh
INFILE=constable-ocr.txt
OUTFILE=constable-processed.txt
EDITFILE=constable.txt
# Send the source file to sed
cat "${INFILE}" | sed --regexp-extended '
# Fix numbers, where o=0 and I=1
s/I([123456789]+)/1\1/g
s/([123456789]+)I/\11/g
s/([123456789]+)o/\10/g
# Fix years, 15,13 = 18
s/^1[35]/18/
# Fix trailing zeroes mistaken for o
s/ o o$/ 0 0/
s/ o$/ 0/
# Fix mistaken characters
s/I-I/H/g
s/,,/"/
# Make sure john constable is properly OCR-ed
s/(Iohn|john)/John/g
s/(C0nstable|Oonstable)/Constable/g
s/JohnCon.stable/John Constable/
# Fix frequently mistaken words and acronyms
s/ILA./R.A./
# Fix spacing
s/,R.A./, R.A./
s/Constable R.A./Constable, R.A./
' > "${OUTFILE}"
# Make a copy of the processed file ready to be edited by hand
cp "${OUTFILE}" "${EDITFILE}"

In the shell script we use GNU sed, a command-line utility that lets us use regular expressions to modify the contents of files. If you don’t know regular expressions they can look quite mysterious, but in fact they are a simple and expressive language that greatly increases what you can do on a computer if you learn how to use it.
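
To see what one of these substitutions does, we can try the same pattern out interactively. Here it is expressed with R’s gsub rather than sed, purely for illustration (the cleanup itself uses the sed script above):

## The first substitution from cleanup.sh: a capital I followed by digits
## is a misread 1, so "I839" becomes "1839"
gsub("I([123456789]+)", "1\\1", "I839 April 13 Samuel Archbutt.")
## [1] "1839 April 13 Samuel Archbutt."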

Why on earth go to all this trouble just to change a few typos? Well, if we later improve the initial scanning or OCR, we can simply re-run the script rather than fixing the same problems by hand all over again. The script also records techniques that may be useful elsewhere. And it’s more controllable to re-run a script than to undo or search-and-replace mistakes made in a document editor.

Once you’ve replaced the obvious patterns of typos, it’s time to edit the text by hand. Compare each column of each line to the scan or to the original text: first all the dates and auction houses, then all the owners’ names, then all the descriptions, then all the buyers’ names and prices. When comparing numbers, check each digit, as the year or the day of the month may be out by as little as one. Tesseract doesn’t seem to capture double quotes very often in the scans I used, so I had to add these in by hand. It also doesn’t capture em dashes, which I’ve represented as a hyphen to make parsing easier later.

Processing each page only seems like it will take forever for the first few pages. You’ll quickly learn how to break the task down into manageable chunks.

Here is an example of cleaned up data:

1834 June 7 Christie's - 67. Landscape with Figures - -
1838 May 15 Foster's John Constable, R.A. 3. Stonehenge, etc. Smith 4 14 6
1838 May 15 " John Constable, R.A. 10. Glebe Farm, etc. Williams 3 15 0
1838 May 15 " John Constable, R.A. 12. Salisbury Cathedral and Helmingham Park Allnutt 3 9 0
1838 May 15 " John Constable, R.A. 13. Salisbury Cathedral and Glebe Farm Carpenter 24 10 6
1838 May 15 " John Constable, R.A. 14. Comlield. Study for N.G. picture Radford 9 19 6
1838 May 15 " John Constable, R.A. 23. Salisbury Cathedral, etc. Leslie 11 11 0
1838 May 15 " John Constable, R.A. 26. Dedham Rulton 8 8 0
1838 May 15 " John Constable, R.A. 29. View in Helmingham Park Swaby 16 5 6
1838 May 15 " John Constable, R.A. 30. Salisbury Cathedral from Bishop's Garden Archbutt 16 16 0
1838 May 15 " John Constable, R.A. 31. Hadleigh Castle. Sketch Smith 3 13 6
1838 May 15 " John Constable, R.A. 33. Two Views of East Bergholt Archbutt 24 3 0
1838 May 15 " John Constable, R.A. 35. River Scene and Horse Jumping Archbutt 52 10 0
1838 May 15 " John Constable, R.A. 37. Salisbury Cathedral from Meadows Williams 6 10 0
1838 May 15 " John Constable, R.A. 39. Mill on the Stour. Sketch Hilditch 7 17 6
1838 May 15 " John Constable, R.A. 40. Opening Waterloo Bridge. Sketch Joy 2 10 0
1838 May 15 " John Constable, R.A. 41. Weymouth Bay. Sketch. Swaby 4 4 0
1838 May 15 " John Constable, R.A. 42. Waterloo Bridge and Brighton Archbutt 5 0 0
1838 May 15 " John Constable, R.A. 43. Chain Pier and Dedham Church Stuart 5 5 0
1838 May 15 " John Constable, R.A. 44. Hampstead Heath and Waterloo Bridge Morton 4 14 0
1838 May 15 " John Constable, R.A. 45. Weymouth Bay, Waterloo Bridge, and two others Burton 7 7 0
1838 May 15 " John Constable, R.A. 46. East Bergholt, Dedham, etc. Nursey 4 14 6
1838 May 15 " John Constable, R.A. 47. Weymouth Bay and four others Williams 1 13 0
1838 May 15 " John Constable, R.A. 48. Moonlight and Landscape with Rainbow Leslie 5 5 0
1838 May 15 " John Constable, R.A. 49. Three Landscapes Archbutt 31 10 0
1838 May 15 " John Constable, R.A. 50. Salisbury Madows Sheepshanks 35 14 0
1838 May 15 " John Constable, R.A. 51. Study of Trees and Fern with Donkies Sheepshanks 23 2 0
1838 May 15 " John Constable, R.A. 52. Cottage in a Cornfield Burton 27 6 0
1838 May 15 " John Constable, R.A. 53. Hampstead Hath-at the Ponds Sheepshanks 37 5 6

It’s possible to use scripts to check that the cleaned up data makes sense. We can check the dates are sequential, for example (check_dates.py):

#!/usr/bin/python
# Usage: check_dates.py FILENAME
# Assumes file with each line starting in Graves date format: YYYY month (D)D
# Ensure that dates are sequential
# Won't catch minor errors, will catch major errors
import datetime
import sys

MONTHS={'jan':1, 'feb':2, 'mar':3, 'apr':4, 'may':5, 'jun':6, 'jul':7, 'aug':8,
        'sep':9, 'oct':10, 'nov':11, 'dec':12}

def main():
    if len(sys.argv) != 2:
        print "Usage: %s FILENAME" % sys.argv[0]
        sys.exit(1)
    last_date=datetime.date(1779, 1, 1)
    for line in open(sys.argv[1]):
        components = line.split()
        year = int(components[0])
        month = MONTHS[components[1][:3].lower()]
        day = int(components[2])
        try:
            line_date = datetime.date(year, month, day)
            if line_date < last_date:
                print "Date not successive: %s" % line
            last_date = line_date
        except ValueError, e:
            print "Bad date component: %s" % line

if __name__ == "__main__":
    main()

Once we have cleaned up the text, we can convert it to a machine-readable format using another script. Writing such a script is an interactive process: read the text, write a script that should be able to extract information from it, run the script, and then correct either the text (if the script fails because of remaining typos in the text) or the script (if the script fails to extract all of the information from the text). Here’s the script for the Constable data (lines_to_tsv.py):

#!/usr/bin/python
import re
import sys

################################################################################
# Assemble the regular expression to process each line
################################################################################

# Date of sale
# YYYY MONTH(.) D(D)
DATE = r'(.{4} \w+.? \d{1,2})'
# Auction house
# " or name. Names are arbitrary, so use a list of names
AUCTIONEER_LIST = ["B. Moulton", "Christie's", "Foster's", "Morrison & Co.",
                   "Paris", "Phillips", "Robinson & F.", ]
AUCTIONEER = ' ("|%s)' % '|'.join(AUCTIONEER_LIST)
# Owner of work
# Selling owner follows auction house and is followed by lot number
# It may be absent, in which case it is a hyphen
# Otherwise it's arbitrary but contains no numbers
# This is weak, we rely on the strength of the auctioneer & lot groups to fix it
OWNER = r' (-|\D+)'
# Lot number
# Possibly something in brackets, then one or more digits, with an optional
# single letter or punctuation character, then a full stop
LOT = r' (\([^)]+\) \d+.?\.|\d+.?\.)'
# Description
# Again this is arbitrary, so we rely on the strength of the adjacent groups
DESCRIPTION = r' (.+?)'
# Buyer
# This is complex because Description is arbitrary
# Buyer may be -, surname, initials, title and surname, and many others
BUYER_TITLES = [r'Captain .+', r'Col\. .+', r"D'\w+", r'De .+', r'Dr\. .+',
                r'Earl .+', r'La .+', r'Lord .+', r'Major .+',
                # Miss .+ fails???
                r'Miss \w+', r'Mr\. .+', r'Sir .+',]
BUYER_INSTITUTIONS = [r'Fine Art Society', r'National.+Gallery', r'New York',
                      r'New York Museum',]
BUYER_INITIALS = [r'[A-Z]\. [A-Z.]', ]
BUYER_INDIRECT = [r'\(.+\)',]
BUYER_SPECIAL = [r'Bought in', r'-',]
BUYER_NAME = [r'[A-Z]\. \w+', r'[A-Z]\. [A-Z]\. \w+', r'\w+',]
BUYER = r' (%s)?' % r'|'.join(BUYER_TITLES + BUYER_INSTITUTIONS +
                              BUYER_INITIALS + BUYER_INDIRECT +
                              BUYER_SPECIAL + BUYER_NAME)
# Sale price
# This may be absent entirely to indicate a group purchase, in which case we
# cannot check for a leading space, so make both the leading space and the
# other choices optional to handle that case
# It may be a hyphen to indicate absent data
# It may be pounds, shillings and pence, including zeros
# Or it may be Withdrawn
# Or it may be a quantity of French francs
PRICE = r' ?(\d+ francs|\d+ \d+ \d+|Withdrawn|-)?'
# The assembled regex for a line
LINE = r'^' + DATE + AUCTIONEER + OWNER + LOT + DESCRIPTION + BUYER + PRICE + r'$'
LINE_REGEX = re.compile(LINE)

################################################################################
# Convert lines to tab separated values
################################################################################

# The column containing the auctioneer value
AUCTIONEER_COLUMN = 1

def process(infile):
    """Convert each line to tab-delimited fields"""
    # Lines that fail to match
    fails = []
    auctioneer = ""
    for line in infile:
        matches = LINE_REGEX.match(line)
        try:
            # Convert the tuple to a list in case we need to assign to it
            columns = list(matches.groups())
            # Get or cache the auctioneer, so we replace " with the actual one
            if columns[AUCTIONEER_COLUMN] != '"':
                auctioneer = columns[AUCTIONEER_COLUMN]
            else:
                columns[AUCTIONEER_COLUMN] = auctioneer
            print '\t'.join(columns)
        except Exception, e:
            print e
            fails.append("FAIL: %s" % line)
    return fails

def print_header():
    """Print the column headers"""
    print "date\tauctioneer\towner\tlot\tdescription\tbuyer\tprice"

################################################################################
# Main flow of execution
################################################################################

def main():
    if len(sys.argv) != 2:
        print "Usage: %s FILENAME" % sys.argv[0]
        sys.exit(1)
    infile = open(sys.argv[1])
    print_header()
    fails = process(infile)
    sys.stderr.write(''.join(fails))

if __name__ == "__main__":
    main()

And here’s some of the output in tab separated value format, complete with header:

date	auctioneer	owner	lot	description	buyer	price
1834 June 7	Christie's	-	67.	Landscape with Figures	-	-
1838 May 15	Foster's	John Constable, R.A.	3.	Stonehenge, etc.	Smith	4 14 6
1838 May 15	Foster's	John Constable, R.A.	10.	Glebe Farm, etc.	Williams	3 15 0
1838 May 15	Foster's	John Constable, R.A.	12.	Salisbury Cathedral and Helmingham Park	Allnutt	3 9 0
1838 May 15	Foster's	John Constable, R.A.	13.	Salisbury Cathedral and Glebe Farm	Carpenter	24 10 6
1838 May 15	Foster's	John Constable, R.A.	14.	Comlield. Study for N.G. picture	Radford	9 19 6
1838 May 15	Foster's	John Constable, R.A.	23.	Salisbury Cathedral, etc.	Leslie	11 11 0
1838 May 15	Foster's	John Constable, R.A.	26.	Dedham	Rulton	8 8 0
1838 May 15	Foster's	John Constable, R.A.	29.	View in Helmingham Park	Swaby	16 5 6
1838 May 15	Foster's	John Constable, R.A.	30.	Salisbury Cathedral from Bishop's Garden	Archbutt	16 16 0
1838 May 15	Foster's	John Constable, R.A.	31.	Hadleigh Castle. Sketch	Smith	3 13 6
1838 May 15	Foster's	John Constable, R.A.	33.	Two Views of East Bergholt	Archbutt	24 3 0
1838 May 15	Foster's	John Constable, R.A.	35.	River Scene and Horse Jumping	Archbutt	52 10 0
1838 May 15	Foster's	John Constable, R.A.	37.	Salisbury Cathedral from Meadows	Williams	6 10 0
1838 May 15	Foster's	John Constable, R.A.	39.	Mill on the Stour. Sketch	Hilditch	7 17 6
1838 May 15	Foster's	John Constable, R.A.	40.	Opening Waterloo Bridge. Sketch	Joy	2 10 0
1838 May 15	Foster's	John Constable, R.A.	41.	Weymouth Bay. Sketch.	Swaby	4 4 0
1838 May 15	Foster's	John Constable, R.A.	42.	Waterloo Bridge and Brighton	Archbutt	5 0 0
1838 May 15	Foster's	John Constable, R.A.	43.	Chain Pier and Dedham Church	Stuart	5 5 0
1838 May 15	Foster's	John Constable, R.A.	44.	Hampstead Heath and Waterloo Bridge	Morton	4 14 0
1838 May 15	Foster's	John Constable, R.A.	45.	Weymouth Bay, Waterloo Bridge, and two others	Burton	7 7 0
1838 May 15	Foster's	John Constable, R.A.	46.	East Bergholt, Dedham, etc.	Nursey	4 14 6
1838 May 15	Foster's	John Constable, R.A.	47.	Weymouth Bay and four others	Williams	1 13 0
1838 May 15	Foster's	John Constable, R.A.	48.	Moonlight and Landscape with Rainbow	Leslie	5 5 0
1838 May 15	Foster's	John Constable, R.A.	49.	Three Landscapes	Archbutt	31 10 0
1838 May 15	Foster's	John Constable, R.A.	50.	Salisbury Madows	Sheepshanks	35 14 0
1838 May 15	Foster's	John Constable, R.A.	51.	Study of Trees and Fern with Donkies	Sheepshanks	23 2 0
1838 May 15	Foster's	John Constable, R.A.	52.	Cottage in a Cornfield	Burton	27 6 0
1838 May 15	Foster's	John Constable, R.A.	53.	Hampstead Hath-at the Ponds	Sheepshanks	37 5 6

We use tabs rather than commas as the separator because the values include commas. Alternatively we could wrap each value in speech marks and comma separate them.
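
As a minimal sketch of that alternative, here is how R’s write.csv quotes values so that embedded commas are safe (the rows and the output filename are made up for illustration):

## Hypothetical example rows containing commas in their values
sales<-data.frame(description=c("Salisbury Cathedral, etc.", "Dedham"),
                  buyer=c("Leslie", "Rulton"))
## write.csv wraps each string in double quotes, so the embedded commas
## do not break the columns
write.csv(sales, "constable-sample.csv", row.names=FALSE)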

Next we can load the data into R and examine it.
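
As a minimal sketch, assuming the script’s output has been redirected to a file called constable.tsv (a hypothetical name), that might look like this:

## Read the tab-separated output of lines_to_tsv.py
## "constable.tsv" is a hypothetical name for the redirected output
constable<-read.delim("constable.tsv", stringsAsFactors=FALSE)
## Take a first look at the columns and values
str(constable)
head(constable$description)
table(constable$auctioneer)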

Categories
Art Art History Free Culture Projects

Freeing Art History: Urinal

I commissioned the ultra-talented cwebber to make a 3D model of a urinal suitable for 3D printing and signing. It’s licenced under the Creative Commons Attribution-ShareAlike 3.0 Unported licence. Here’s a picture of it:

urinal.pngYou can download the original Blender file here.

And there’s a version suitable for 3D printing available for download here.

Which you can also download from thingiverse here.

(You’ll need to scale it to fit your printer.)

If you don’t have a 3D printer yet, you can order a 3D print of the model from Shapeways here. If it’s too large/expensive you can upload and print a smaller version.

Next I’d like to commission a 3D printable model of a glass ampoule suitable for containing a small volume of air from a town such as Paris…

Categories
Art Computing Art History Art Open Data

Art Data Analysis: Sparse Coding Analysis

Bruegel

Sparse Coding


Recently, statistical techniques have been used to assist art historians in the analysis of works of art. We present a novel technique for the quantification of artistic style that utilizes a sparse coding model. Originally developed in vision research, sparse coding models can be trained to represent any image space by maximizing the kurtosis of a representation of an arbitrarily selected image from that space. We apply such an analysis to successfully distinguish a set of authentic drawings by Pieter Bruegel the Elder from another set of well-known Bruegel imitations. We show that our approach, which involves a direct comparison based on a single relevant statistic, offers a natural and potentially more germane alternative to wavelet-based classification techniques that rely on more complicated statistical frameworks. Specifically, we show that our model provides a method capable of discriminating between authentic and imitation Bruegel drawings that numerically outperforms well-known existing approaches. Finally, we discuss the applications and constraints of our technique.

http://www.pnas.org/content/107/4/1279

You can download the pdf here.