Art Computing Art History Art Open Data

Exploring Art Data 16

The scanned and OCRed text from Graves’ Art Sales is very noisy. Let’s start cleaning it up.

Firstly we’ll improve the source images.

In Scan Tailor, after fixing the orientation and letting the program Split Pages and Deskew, we can set the Content Box to “Manual” in Select Content, and crop out the header on each page and any entries that do not refer to Constable on the first and last pages of the series of scans. In Output we can then set the Output Resolution to 600dpi Black and White, Thickness to 30 (selecting Apply To… All Pages), and Despeckle to maximum (selecting Apply To… All Pages).

The resulting images are not ideal for human beings to read but give better results when processed with Tesseract.

We can create a config file telling Tesseract which characters to expect to find in a file. This should help remove some of the stranger characters from the output file.

We can save the following:

tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,."&-'

in a config file for Tesseract. The location of config files may be /usr/share/tesseract/tessdata/configs/ or /usr/local/share/tessdata/configs/, and you may need to be root to access the directory in either case. e.g.:

su -c 'echo tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,.\"\&-'"\'"' > /usr/local/share/tessdata/configs/gravesartsales'

(Ignore the silly quoting we have to use to include a single quote in a single quoted string…)
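If the quoting is too fiddly, a quoted here-document is one alternative. The following is a minimal sketch rather than the method used above: it writes the same whitelist line to a local file (the CONFIG path is an assumption for illustration), which can then be copied into the Tesseract configs directory as root.

```shell
#!/bin/sh
# Hypothetical local path for the config file; override CONFIG to change it.
CONFIG="${CONFIG:-./gravesartsales}"

# Quoting the EOF delimiter stops the shell interpreting anything in the body,
# so the double quote, ampersand and single quote need no escaping.
cat > "${CONFIG}" <<'EOF'
tessedit_char_whitelist abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789,."&-'
EOF
```

The file can be checked with cat before it is copied anywhere that needs root access.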

We can then use the processed files (in the “out” directory of the Scan Tailor project) and this config file along with a shell script to create an improved version of the extracted text. Why a shell script? Writing a script allows us to iteratively improve our approach to the task, and it allows us and others to reproduce the task later. Shell scripts are sketchbooks and notebooks as well as useful tools in themselves.

Here’s the script:


#!/bin/bash

# The cleaned up pages in order
PAGES="ALIM0008 ALIM0005 ALIM0009 ALIM0006 ALIM0010 ALIM0007 ALIM0011"
# The directory the input images and output text are in
DIRECTORY="out"
# The combined output text
RESULT="constable.txt"

# Create empty output file
: > "${RESULT}"

for PAGE in ${PAGES}
do
    # Perform OCR using the whitelist config file
    # (newer Tesseract releases spell the option --psm)
    tesseract -psm 6 "${DIRECTORY}/${PAGE}.tif" "${DIRECTORY}/${PAGE}" \
        gravesartsales
    # Append results to output file
    cat "${DIRECTORY}/${PAGE}.txt" >> "${RESULT}"
done

If we save the script next to the out folder of the Scan Tailor project and run it, the results in constable.txt can be seen to be much improved on the original version:

$ cat constable.txt 

1834 June 7 Christie's - 67. Landscape with Figures - -
1838 May 15 Foster's John Constable, R.A. 3. Stonehenge, etc. Smith 4 14 6
1838 May 15 john Constable, R.A. 1o. Glebe Farm, etc. Williams 3 15 o
1838 May 15 JohnConstable,R.A. 12. SalisburyCathedraland l-lelminghamPark Allnutt 3 9 o
1838 May 15 John Constable, R.A. 13. Salisbury Cathedral and Glebe Farm Carpenter 24 1o 6
1838 May 15 john Constable, R.A. 14. Cornlield. Study for N.G. picture Radford 9 19 6
1838 May 15 John Constable, R.A. 23. Salisbury Cathedral, etc. Leslie 11 11 o
1838 May 15 john Constable, R.A. 26. Dedham Rulton 8 8 o
1838 May 15 John Constable, R.A. 29. View in I-lelmingharn Park Swaby 16 5 6
1838 May 15 John Constable, R.A. 30. SalisburyCathedral from Bishop'sGarden Archbutt 16 16 o
1838 May 15 john Constable, R.A. 31. Hadleigh Castle. Sketch Smith 3 13 6
1838 May 15 john Constable, R.A. 33. Two Views of Eat Bergholt Archbutt 24 3 o
1838 May 15 john Constable, R.A. 35. River Scene and Horse jumping Archbutt 52 lo o
1838 May 15 John Constable, R.A. 37. Salisbury Cathedral from Meadows Williams 6 1o o
1838 May 15 john Constable, R.A. 39. Mill on the Stour. Sketch I-lilditch 7 17 6
1838 May 15 john Constable, R.A. 4o. Opening Waterloo Bridge. Sketch Joy 2 1o o
1838 May 15 john Constable, R.A. 41. Weymouth Bay. Sketch. Swaby 4 4 o
1838 May 15 john Constable, R.A. 42. Waterloo Bridge and Brighton Archbutt 5 o o
1838 May 15 john Constable, R.A. 43. Chain Pier and Dedham Church Stuart 5 5 o
1838 May 15 john Constable, R.A. 44. l-lampstead Heath and Waterloo Bridge Morton 4 14 o
1838 May 15 john Constable, R.A. 45. Weymouth Bay, Waterloo Bridge,
and two others Burton
1838 May 15 john Constable, R.A. 46. East Bergholt, Dedham, etc. Nursey
1838 May 15 john Constable, R.A. 47. Weymouth Bay and four others Williams
1838 May 15 john Constable, R.A. 48. Moonlight and Landscape with
Rainbow Leslie
I338 May 15 John Constable, R.A. 49. Three Landscapes Archbutt
I338 May 15 john Constable, R.A. 5o. Salisbury Meadows Sheepshanks
1838 May 15 John Constable, R.A. 51. Study of Trees and Fem with
Donkies Sheepshanlts 23 a o
I838 May 15 John Constable, R.A. 52. Cottage in a Comtield Burton a7 6 o
1333 May 15 ,, john Constable, R.A. 53. Hampstead Heath-at t.l1e Ponds Sheepshanks 37 5 6

1838 May 15 Foster's John Constable, R.A. 54. Flatford Mill-Horse and Barge Leslie
1838 May 15 john Constable, R.A. 55. View near Flatford Mill Rochard
1838 May 15 John Constable R.A. 56. Hampstead Heath Burton
1838 May 15 john Constable, R.A. 57. Gillingharn Mill Leslie
1838 May 15 John Constable, R.A. 58. East Bergholt Nursey
1838 May 15 john Constable, R.A. 59. Flatford-Barge Building Sheepshanlrs
1838 May 15 john Constable, R.A. 6o. Two Views near Petworth Swaby
1838 May 15 john Constable, R.A. 61. Hampstead Heath -London in distance Archbutt
1838 May 15 John Constable, R.A. 65. Dedham Vale-Long Valley Norton
1838 May 15 John Constable, R.A. 66. London from l-lampstead Burton
1838 May 15 John Constable, R.A. 67. Flatford Mill-Dark Allnutt
1838 May 15 john Constable, R.A. 68. Brighton and Chain Pier. Ex. 1827 Tiflin
1838 May 15 john Constable, R.A. 69. The Lock near Flatford Mill Archbutt
1838 May 15 John Constable, R.A. 7o. The Glebe Farm. R.A. 1835 Miss Constable
1838 May 15 John Constable, R.A. 71. The Cenotaph, etc. R.A. 1836 Miss Constable
1838 May 15 john Constable, R.A. 72. Salisbury Cathedral from Bishop's
Garden. 1823 Tiflin
1838 May 15 john Constable, R.A. 73. View in Helmingham Park. R.A. 1830 Allnutt
1838 May 15 john Constable, R.A. 74. Opening of Waterloo Bridge. R.A. 1832 Mosley
1838 May 15 Iohn Constable, R.A. 75. View of Dedham-Gipsies. R.A. 1828 M. Bone
1838 May 15 john Constable, R.A. 76. The Loclr. R.A. 1824. Sold at
Foster's, February 15th, 1855, for 6903 Birch
1838 May 15 john Constable, R.A. 77. On the River Stour-Horse on a
Barge. R.A. 1819 Morton
1838 May 15 John Constable, R.A. 78. I-Iadleigh Cutie. R.A. 1819 Miss Constable
1838 May 15 John Constable, R.A. 79. Salisbury Cathedral from Meadows. 1831 Ellis
1333 May I5 John Constable, R.A. 8o. Dedham Mill and Church Brown
I838 May 15 ,, john Constable, R.A. 81. Arundel Castle and Mill. 1837 1. Constable
1839 April 13 Christie's Samuel Archbutt. 114. Salisbury Cathedral from Meadows Theobald
I839 April 13 Samuel Archbutt. 115. Embarltation of George IV, Waterloo
Bridge Bought in 43 1 o
1345 May 16 Mr. Taunton. 41. Salisbury Cathedral Bought in 441 o o
1846 May 16 Mr. Taunton. 42. Dedham Bought in 357 o 0
1345 June 4 ,, Edward Higginson. 77. Waggon passing through a River Rought 378 o o
1848 May 3 Phillips Ralph Thomas. 176. Salisbury Cathedral - -
I848 June 2 Christie's Sir Thos. Baring. 21. Opening Waterloo Bridge Barton 33 2 o
13 49 May 17 Taunton. 11o. Salisbury Cathedral from the Meadows. The
celebrated picture Rought 43o 1o o
1349 M87 I7 Taunton. 111. Dedham, with Towing Path Bought in 157 1o o
1351 June 13 H03lrlh- 46. Hadleigh Castle Winter 32o 5 0
1353 M111 1 R. Morris. 131. A Lock on the Stour Wass 105 o o
1353 I 1111' 1 Charles Birch. 41. jumping Horse on the Stour Gambart 393 15 0
1353 I1llY 7 .1 Charles Birch. 42. Opening of London Bridge Bought in 252 o o
1855 Feb. 15 Foster's Charles Birch. 18. The Lock. 55 x 48 1-lolmgg 368 a o

1855 Mar. 31 B. Archer QBurtonJ. 99. The Wl1ite Horse. f.Cl1efd'auvreJ Horlgson 630 0 0
1858 Feb. 3 Henry Wallis. 104. Opening of Waterloo Bridge. 51 x 86 - -
1858 May 21 John Miller. 162. Salisbury Cathedral. Sketch - 49 0 0
1858 May 22 John Miller. 227. Comlield-Reapers. A Plough in Foreground - 63 1 0
1859 June 13 ,, Potts. 240. Dedham. From Constable's sale Wallis 197 8 0
1860 April 25 Foster's C. R. Leslie, R.A. 87. House with Hatchment and Trees - -
1860 April 25 C. R. Leslie, R.A. 90. Willy Lott's House - -
1860 April 25 C. R. Leslie, R.A. 92. Sketch in Suffolk, with inscription - -
1860 April 25 C. R. Leslie, R.A. 93. Mill at Arundcl - -
1860 April 25 C. R. Leslie, R.A. 94. Lock on the Stour -
1860 April 25 C. R. Leslie, R.A. 95. Stonehenge. Engraved - -
1860 April 25 C. R. Leslie, R.A. 96. A Running Brook -
1860 April 25 C. R. Leslie, R.A. 97. The Glebe Farm. Presented to Leslie -
1860 April 25 C. R. Leslie, R.A. 98. Hampstead Heath, with Surrey Hills
1860 April 27 C. R. Leslie, R A. 386. Burning of Houses of Parliament. Drawing -
1860 April 27 C. R. Leslie, R.A. 387. Jacques and Wounded Deer. Drawing
1860 April 27 C. R. Leslie, R.A. 388. Mill at Colchester. Drawing -
1860 April 27 C. R. Leslie, R.A. 390. Brighton Fishing Boats. Drawing - -
1860 April 27 C. R. Leslie, R.A. 392. South Stoke. Drawing - -
1860 April 27 C. R. Leslie, R.A. 393. Dover-Two French Luggers. Drawing - -
1860 April 27 C. R. Leslie, R.A. 394. Studies of Trees. Chalk -
1860 May 17 J. Constable, R.A. 60. Colchester Church -
1860 May 17 J. Constable, R.A. 61. Hampstead towards Harrow -
1860 May 17 J. Constable, R.A. 62. Hadlow Castle -
1860 May 17 J. Constable, R.A. 63. Flatford
1860 May 17 J. Constable, R.A. 64. A Mill - -
1860 May 17 J. Constable, R.A. 65. Cattle on Hampstmd Heath - -
1861 Feb. 6 Henry Wallis. 86. Opening of Waterloo Bridge. 86 x 52 Davenport 464 o 0
1861 May 3 E. Gambart. 294. The Lock. 475-x 55. The original picture Leatham 231 0 o
1863 May 16 ,, Gentleman. 160. The Glebe Farm - -
1863 June 17 Foster's Charles Pemberton. 47. Near Dedham-River and Boats -
1863 June 17 Charles Pemberton. 69. The Leaping Horse. 72 x 54 -
1865 May 6 i 44. Cathedral-Salisbury. 19Q x 199
1865 May 6 ,, i 45. The Mill Stream. Engraved. 33 x 38 -
1866 Mar. 28 B. Moulton Thomas Churchyard. 64. Willy Lott's House. 24 x 20 Cox
1866 Mar. 28 Thomas Churchyard. 65. Flatford Mill. 16 11 13 -
1856 Mar. 28 Thomas Churchyard. 66. Bergholl Heath. 19 x 12 Cox
1866 Mar. 28 Thomas Churchyard. 67. View at Dedham. 25 x 18 Pearce
1866 May 19 Christie's George Young. 25. The Hay Wain Cox
1867 June 22 ,, -i 91. Landscape -
1867 Dec. 11 Foster's i 121. The Leaping Horse, Dedham Lock. E. Pemberton coll.
1870 May 21 Christie's Edwin Bullock. 86. Weymouth Bay
1870 May 21 Edwin Bullock. 109. Hampstead Heath
1870 May 21 ,, Edwin Bullock. 1 15. Heath Scene-Three Peasants in Cart

1872 Mar. I6 Christie's G. R. Burnett. I2o. On the Stour, near Canterbury Agnew
I872 Mar. I6 G. R. Burnett. I2I. Opening of Waterloo Bridge Agnew
1872 April 26 Joseph Gillott. I93. Approach to London from Hampstmd Agnew
I872 April 26 Joseph Gillott. I95. Landscape with Cottage New York
1872 April 26 Joseph Gillott. I96. On the Stour-Dedham Church New York
X872 April 26 Joseph Gillott. I97. On the Stour, with Cow New York
I872 April 26 Joseph Gillott. I98. Weymouth Bay New York
x873 June 5 John Hargreaves. 292. Heath Scene--Three Peasants in a Cart Agnew
1874 June I3 A. Wood. 5I. Hampstead Heath. Bullock 81 Hargreave coll. Bought in
1875 April 23 Sam Mendel. 3I5. On Suffolk River-Watermill Ashton
1875 June I2 T. Woolner, R.A. I34. On the Stour -
I875 June I2 T. Woolner, R.A. I35. View nun Highgate. Young
1875 July 3 Jesse Watts Russell. 26. Harwich Lighthouse Smith
1876 May 6 Wynn Ellis. 36. The Glebe Farm. I8 x 235 Agnew
1878 April 6 Munro of Novar. I2. Stralford St. Mary, Suffolk. I2 x I95 Martinmu
I878 April 6 Munro of Novar. I3. Hampstcad Heath. I2 x I95 Bentley
I878 April 6 Munro of Novar. I4. Ploughing-Windmill. Ioix I4 Agnew
1879 May 3 Jonathan Nield. I2. Landscape and Watermill Currie
1879 May 3 Jonathan Nield. I3. Stoke by Neyland Permain
1879 May 3 Jonathan Nield. I4. Thames-Westminster Agnew
1879 May 5 Joseph Fenton. I49. Embarkation of George IV from Whitehall Agnew
1879 May Io W. Fuller Maitland. 7I. Vale of Dedham. I8II. 29521 445 Daniel
1879 May Io W. Fuller Maitland. 72. Weymouth Bay. Sketch. 2I x 298 Daniel
I879 May 3o James Hughes Anderdon. Io8. A Brook Scene. C. R. Leslie coll. Agnew
1879 May 3o James Hughes Anderdon. III. Malvern Hall. -R.A. I878 Salting
I881 July 9 William Sharp. 1Io6 N.J 72. Hampstead Heath Agnew
1882 Mar. I8 G. R. Burnett. 1158 N.J I03. Opening Wstminster Bridge Permain
I883 May 5 J. M. Dunlop. 1285 P.J 6I. View on the Stour-Children Angling Martin
1883 May 5 Henry Woods. 15I3 P.J I46. Salisbury Cathedral. Heugli coll. Brooks
1883 June 8 J. Scovell. 158 SJ 243. Helmingham Park. Engraved Fielder
1883 Dec. 8 Edward Fitzgerald. 1573 SJ 27. The Edge of a Wood-Cows
Watering W. C. Quilter
I334 May 3 S. Dunning. 1869 S.J I37. River Scene-Two Children Fishing Agnew
I335 Fell il Mrs. George Vaughan. 194 V and I 34 V.J 86. The Lock Lesser
1886 Man 11 H. McConnell. 125-, w.y 65. Flatford Mill. ,5 X 55 Brooks
1886 Mar. 27 H. McConnell. 66. Dell in Helmington Park. 44 x 5I S. White
1335 MW I5 Henry Barton. 1294 W.J I68. Landscape, C-mvel Cart, etc. M. Colnaghi
I986 May H 5- Addinsm 1541 W9 s5. Windmill and Landscape. ll 295.
GillOll collection Permgin 141 I5 Q
I357 May 7 Malcolm Onne. 36. Mudow Scene and Sheep. 6 x 9 Permain 55 I 3 o
1337 July II Constable V. Blundell. 68. Hampstead Heath. I830 Stewart 1050 o o
1'37 July ll Constable V. Blundell. 72. Salisbury from Fields Agnew 94 Io o
3337 II-IIY II Constable V. Blundell. 8I. West End Fields, Hampstead Agnew 294 o o
I888 Mar. 24 ,, Frederick Fish. 280. The Mill Stream Laser 346 Io o

1888 April 14 Christie's A. Andrews. 154. The Lock. 35 x 2911- Fraser
1888 April 28 -i 44. Flntford Mill Withdrawn
1890 April 26 John Hunt. 104. Carrying Hay. 35 x 47 Lesser
1890 June 23 Captain Constable. 75. Salisbury. 1821. Dowdeswell
1890 June 23 Captain Constable. 77. Coast Scene. Colnaghi
1890 June 23 Captain Constable. 78. Stormy Sunset-Brighton. 1824 Agnew
And many others under L100.
1891 April 25 Marquis de Santurce. 16. Windmill--Peasant Ploughing. 15 x 20 Norman 210 0 0
1891 May 28 Miss Isabel Constable. 109. Abram Constable Gooden 45 3 0
1891 May 28 Miss Isabel Constable. 142. Lock on the Stour Gooden 94 10 0
1891 May 28 Miss Isabel Constable. 148. The Stour, Flatford Mill Gooden 105 0 0
1891 May 28 Miss Isabel Constable. 149. landscape with Cottages Colnaghi 151 4 0
1891 May 28 Miss lsabel Constable. 150. Dedham Vale Colquhoun 514 10 0
1891 June 27 Sir William R. Drake. 15. Willy Lot's House Dowdeswell 105 0 0
1892 Feb. 13 Charles L. Collard. 133. Noon. Sketch Gooden 262 10 0
1892 Mar. 19 i 738. Dedham Vale - 131 5 0
1892 April 30 Messrs. Murriela. 64. Cattle under Trees. 10 x 13 Agnew 161 15 0
1892 April 30 Messrs. Murrieta. 65. Cottages and Trees. 10 x 13 Agnew 105 0 0
1892 April 30 Messrs. Murrieta. 66. Hampstead Heath. 6Q x 11 Hardy 115 10 0
1892 June 17 Miss Isabel Constable. 253. Hadleigh Boussod 110 5 0
1892 June 17 Miss Isabel Constable. 261. Brighton Dowdeswell 309 15 0
1892 June 17 Miss Isabel Constable. 262. Hampstead Heath Boussod 472 10 0
1893 Feb. 18 -Z 69. Hampstead Heath Wallis 160 13 0
1893 April 15 Edwin Webster. 20. Waterloo Bridge. Sketch Laurie 209 15 0
1893 April 29 Ralph Brocklebank. 94. Landscape, with Church. 17 x 14 Earle 31 10 0
1893 June 3 J. Stewart Hodgson. 27. Hampstcad Heath. 1830. 26 x 39 Wallis 2625 0 0
1893 June 19 Vicat Cole, R.A. 188. Landscape-Sheep and Cottage Wallis 178 10 0
1894 April 21 Henry Hibbert. 126. Hampstead Heath. 1827. 241 x 31f Tooth 1835 10 o
1894 April 28 Richard Hemming. 84. On the River Stour. 51 x 73 Agnew 6510 0 0
1894 April 30 Dr. Barford. 126. Dedham Mill Tooth 117 12 0
1894 May 5 John Graham. 44. The Dell, Helmi Gooden 241 10 0
1894 May 26 ,, Jol1n Gibbon. 6. Yarmouth Jetty. Gooden 514 10 0
1894 Nov. 15 Robinson 11' F. -- 134. On the Stour Tooth 309 10 0
1895 April 27 Christie's James Orrock. 276. Gravel Pits RadclylTe 99 15 0
1895 April 27 James Orrock. 288. A Lock on the Stour Simmons 105 0 0
1895 April 27 James Orrock. 296. Brighton Beach Silber 325 10 0
1895 April 27 James Orrock. 297. Near Bergholt Wilson 346 10 0
1895 May 18 Thos. Woolner, R.A. 112. Near Highgate. 12 11 19 Salting 189 0 0
1895 June 8 J. Clark. 87. Barges on the Stour. 40911 539 Lawrie 472 10 0
1895 June 15 James Price. 26. The Mill Tail. 5111 81 Agnew 378 0 0
1895 July 6 Charles Frederick Huth. 6. Windmill and Cottages. 6 x 8 Vokins 110 5 0
1895 July 6 Charles Frederick Huth. 12. Cottage, Angler, Dog. 8 x 91 Colnaghi 183 15 0
1895 July 6 Charles Frederick Huth. 77. Stratford Mill. 1820. 50 x 72 Agnew 8925 0 0
1395 April 16 ,, Eustace Constable. 34. Chesil Beach Agnew 246 15 0

I896 June 6 Christie's Z 32. On the Stour Anson 199 10 0
1896 June 13 Sir Julian Goldsmid. 52. Embarkation of George IV Tooth 2100 o 0
I896 July 14 Lord Leighton. 290. The Hay Wain. Study. 13511 11 Wallis 157 10 o
1896 July 14 Lord Leighton. 291. The Shower. 9511 12 Agnew 21o 0 0
1897 May 22 ,, Z 66. Salisbury Cathedral. 18 x 23 D. Nathan 141 15 o
1897 May 28 Robinson 8t F. Col. Unthank. 229. The Lock. 40 x 49 - 116 0 O
I898 Feb. 5 Christie's Z 11. View on the Stour Wigzell 420 o o
1898 May 7 Z 29. View from Hampstead Heath. 1824. 195 x 3o Marayos 493 10 o
1898 May 21 Joseph Ruston. 13. View on Hampstead Heath. 13 11 16f McLean 252 0 o
1593 July 2 Z 65. Watermill, Figures crossing Bridge. 17 x 235 Black 157 1o o
I899 Mar. 11 Sir John Kelk. 6. Salisbury Cathedral. 289 x 35f Agnew 1365 o o
I899 Mar. 27 Z 73. A Cottage in a Wood Gribble 120 15 o
1899 May 6 Sir John Fowler. 51. Ploughing-Windmill. 10911 13Q Radley 241 10 0
1399 July 15 Z 79 A. Lock, and Horse on Towing Path. 11 x 155 Wigzell 126 0 9
1990 Mgy 5 Mrs. Bloomfield Moore. 371. Gipsy Encampment, Dedham.
16511 27 Dunlhome 178 10 o
I901 Feb. 23 Z 101. View on the Stour. 24 x 30. Gillott collection Tooth 388 1o o
I901 Mar. 2 Hubert Martincau. 78. Stratford St. Mary's. 12 x 195 Colnaghi 756 o 0
I901 Mar. 30 Z 11. Ploughman, Bergholt. 8 x 21 Wallis 231 15 o
I901 May 18 ,, E. A. Lcatham. 121. The Lock. 55 X 47 Vicars 1995 0 o
1901 June 27 Robinson 81 F. Z 79. On the Stour Micholls 420 0 o
1902 Feb. 17 Christie's Z 116. View ol' Dedham Mill. 24 x 30 Leggatt 304 1o 0
1902 Mar. 15 Z 146. Hampstead Heath. 18 x 23 Ruten 157 1o o
1902 April 28 Z 65. Timber Waggon. 18 It 26 Arthur 189 0 o
I902 May 3 C. A. Barton. 5. Gillingham Mill. 19x 23 Fallte 1207 1o o
1902 May 3 C. A. Barton. 6. Brighton Beach. 12 x 165 Agnew 441 o o
ipa May 3 C. A. Barton. 7. Hampstead Heath. 9 x 12 Dubbs 231 o o
1-9115 May 3 Z 84. From Hampstead Heath. 135- x 175 Dubbs 105 o o
T902 July 7 -- 79. landscape, with Woodman. 19 x 30 - 105 0 0
1963 Feb. 28 Z 123. A House at Hampstead. 23.111 19Q Agnew 524 0 0
I903 May 16 R. T. H. Bruce. 28. Jumping Horse. Sketch. 19411 25 Wigzell 199 1o 0
I903 May 23 Reginald Vaile. 7. Dredgers on the Medway. 9121 131 Sedelmeyer 231 0 o
I903 May 23 Reginald Vaile. 8. Stonehenge. Engraved. 7 x 104 Colnaghi 75 2 o
1903 June 27 ,, Z 140. Hudleigh Castle. 169 I 22J Graham 157 10 o
1903 Nov. 6 Morrison 81 Co. D. McCorltindale. 77. The House on the Hill. Illustrated. 2o x 15 - -
1904 Mar. 19 Christie's C. F. Huth. 45. Mill at Gillingham. 10 x 125 Permain 178 0 0
I904 April 3o Z 79. Mill Stream, Flatford. 9 I 12 Amor 152 5 o
I904 April 30 Z 133. West End Fields. 9421 14 Agnew 598 1o o
1904 June 4 J. Orrock. 9. View near Bentley. Drawing. Boswell 210 o 0
1904 June 4 J. Orrock. 64. East Bergholt Mill. 34 x 44 Agnew 1050 o o
1904 June 4 J. Orroclt. 65. Hampstead Heath. 26 x 40 Low 546 0 0
1904 June 4 J. Orrock. 67. Lake-Figures on Road. 24511 295 A. Smith 420 o o
I904 June 4 J. Orrock. 68. The Glebe Farm. 20 x 28 Agnew 273 o o
1904 June 4 ,, J. Orroclt. 69. East Bergholt. 19 x 29 McLean 105 0 o

1904 June 4 Christie's I. Orrock. 70. A Glebe Farm. 27511 35 Richardson 199 ro 0
1904 June 6 I. Orrock. 246. Landscape and Figure. 7 at 105 Agnew 262 10 0
1904 Nov. 19 i 24. Helmingham Dell. 281136 Clare 262 10 0
1905 April 29 john Gabbitas. 82. Peasant Woman on Road. 8 x 115 Mchan 110 5 0
1905 April 29 John Gabbitas. 84. Cottage at Langham. 125x 145 Ogaton 294 0 0
1995 May 13 Charles Neck. 22. River-Road over Bridge. 34 x 45 Sulley 378 0 0
1905 May 20 Louis Hutli. 38. Salisbury Cathedral. 28 x 36 Colnaghi 1785 0 0
1905 May 2o Louis Huth. 39. Dedham Watermill. 21 x 30 Agnew 525 0 0
1906 Mar. 31 E. M. Denny. 5. Salisbury Bridge. Illustmted. 21 x 295 Knoedler 2835 0 0
1906 Mar. 31 E. M. Denny. 6. Strand on the Green. 11 x 155 Wallis 483 0 0
1907 April 2o ,, i 104. Salisbury Cathedral. 335 x 43 Gribble 1575 0 0
1907 May 16 Paris Sedelrneyer. 27. Hastings - 2200 francs
1907 May 16 ,, Sedelmeyer. 43. Stratford Church - 1600 francs
1907 June 14 Christi ' Lord Falkland. 23. The Canal Boat. 47 x 38 McLean 399 0 0
1907 June 28 i 49. Vale of Health, Hampstead. 10511 15 Gooden 283 10 0
1908 jan. 18 ,, Thos. McLean. r8. Heimingham Dell. 28 11 365 Bone 157 10 0
1908 May 6 Paris i 5. The Glebe Farm - 6000 francs
1908 May 23 Christie's Humphrey Roberts. 8. Opening ofWaterl0o Bridge. Illus. 175 x 32 Reid 1155 0 0
1908 May 23 Humphrey Roberts. 9. Brighton Beach. 125 x 195 Clark 556 10 0
1908 'May 23 Humphrey Roberts. 10. A Farm. 1 15 x 155 Gooden 336 0 0
1908 June 25 StephenG. Holland. 12. Salisbury Cathedral. 1826. Illus. 34 x 435 Knoedler 8190 0 0
1908 June 25 Stephen G. Holland. 14. Arundel Mill and Castle. 1 1 x 155 Colnaghi 336 0 0
1908 July 3 i 26. The Valley Farm. 50 x 40 Lane 651 0 0
1909 Mar. 27 Richard Hobson. 103. Hampstead Heath. 18511 255 Agnew 378 0 0
1909 April 24 Professor B. Bertrand. 33. Yarmouth jetty. 27 11 35 Holt 1449 o 0
1909 May 7 R. G. Behrens. 18. Nur Dedham. 105 x 14 Evelyn 1 15 1o 0
1909 May 21 E. H. Cuthberlson. 13. River Stour-Barges. 25 11 40 Vicars 714 0 0
1909 May 21 E. H. Cuthbertson. 14. ln Helmingham Park. 295 x 245 Gooden 441 0 0
1909 May 21 E. H. Cutbbertson. 15. Salisbury. 275 x 355 Leggatt 404 5 0
1909 May 21 E. H. Cuthbertson. 20. A Cornield, Brighton. 12511 195 Leggatt 126 0 0
1909 June 24 Holbrook Gaskell. 8. Arundel Mill and Castle. 27 x 37 Knoedler 8820 0 0
1909 July 9 Sir C. Quilter. 5. Brighton Beach. Drawing. 45 x 75 Wallis 162 15 0
1909 July 9 Sir C. Quilter. 50. Wat End Fields. Oil. 125 x 201- A. Gibson 630 0 0
1910 May 6 O. E. Coope. 8. The Vicarage. Oil. 185 x 235 Colnaghi 735 o 0
1910 June 17 Sir F. T. Mappin. 14. Stoke by Neyland. Oil. 49l x 65 Sulley 9240 0 o
1910 june 24 Armstrong Heirlooms. 47. Glebe Farm, Dedham. OiL 18 x 235 Tooth 2047 10 0
1910 June 24 ,, Armstrong Heirlooms. 48. Hampstead Heath. 15511 195 Gooden 131 5 0

There are still confusions between upper and lower case letters, between 0 and o, and between ” and ,, and the fractions cannot be recognized at all, but these can easily be found and fixed using a combination of scripting and manual editing.

Which is what we will do next.
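As a taste of the scripting side, here is a minimal sketch of the kind of filter that could catch two of the confusions visible above. fix_ocr is a hypothetical helper, and its two rules are guesses based on the sample output: a year that starts with I, l, x or X instead of 1, and a letter o in the pounds, shillings and pence columns at the end of a line. It is not a complete cleanup, and it leaves lot numbers, names and fractions alone.

```shell
#!/bin/sh
# Hypothetical filter: reads OCR text on standard input and writes a
# partially corrected copy on standard output.
# Note that awk rebuilds any line it changes with single spaces between fields.
fix_ocr() {
    awk '{
        # A leading I, l, x or X followed by three digits is a misread year.
        if ($1 ~ /^[IlxX][0-9][0-9][0-9]$/) $1 = "1" substr($1, 2)
        # In the last three fields (the price columns) a field made up
        # only of digits and the letter o has its o characters turned into 0.
        for (i = NF - 2; i <= NF; i++)
            if (i > 1 && $i ~ /^[0-9oO]+$/) gsub(/[oO]/, "0", $i)
        print
    }'
}

# Example use (file names are placeholders):
#   fix_ocr < constable.txt > constable-fixed.txt
```

Keeping each rule narrow like this makes it easier to check what a rule changed before adding the next one.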

Art History Art Open Data

Exploring Art Data 15

Let’s find an art historical data source that hasn’t already been digitised and made freely available.

Graves’ Art Sales is often referred to in economic studies of art history. It is in the public domain but isn’t (at the time of writing) available to download from any of the public domain text repository projects. Fortunately, copies of early-1970s facsimiles (also out of copyright) are available through online booksellers.

While waiting for the first volume to be delivered, I made a very simple homebrew book scanner. It’s a cardboard box cut in half, a bright lamp, a sheet of glass and a cheap digital camera on a tripod. The design is from , which also has more sophisticated designs available.

The scanner with volume one of Graves in place ready for scanning:


Here’s the book:


We’ll scan the pages containing the data of Constable’s sale prices. This is a simple (but slow) matter of photographing first all the front sides of those pages in turn, then turning the book around and photographing all the back sides. This makes scanning faster but does mean that the pages are out of order. Since we are only using a few pages here, we can rename them manually but there are scripts to help do this for entire books.
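The reordering itself can be sketched in shell. This is a rough illustration, not one of the scripts mentioned above: interleave is a hypothetical helper that takes two space-separated lists of file names, one per batch of photographs, and prints them in alternating order. Which batch comes first, and whether the second batch needs reversing, depends on how the book was photographed, so treat the argument order as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the names from two space-separated lists
# in alternating order, starting with the first list.
interleave() {
    first=$1
    second=$2
    result=""
    for a in $first; do
        result="${result:+$result }$a"
        if [ -n "$second" ]; then
            # Take the next name off the front of the second list.
            b=${second%% *}
            result="$result $b"
            case $second in
                *" "*) second=${second#* } ;;
                *) second="" ;;
            esac
        fi
    done
    printf '%s\n' "$result"
}

# Example with the camera's file names for the Constable pages:
interleave "ALIM0008 ALIM0009 ALIM0010 ALIM0011" "ALIM0005 ALIM0006 ALIM0007"
# prints: ALIM0008 ALIM0005 ALIM0009 ALIM0006 ALIM0010 ALIM0007 ALIM0011
```

The printed list can then be used to rename the files by hand or to drive a loop over the pages in reading order.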

A scanned page:


To rotate and clean up the pages we will use a piece of software called Scan Tailor. After processing in Scan Tailor, the above page looks like this:

We can extract the text from this page using the Tesseract Optical Character Recognition program:

$ tesseract ALIM0005.tif ALIM0005
$ cat ALIM0005.txt

CONSTABLE, john, R.A.-—-ranlinued
1838 May IS Foste1's john Constable, R.A. 54. Flatford Mill-Horse and Barge Leslie 52 IO O
1838 May IS ,, john Constable, R.A. 55. View near Flatford Mill Rochard II 0 6
1838 May I5 ,, john Constable, R.A. 56. Hampstead Heath Burton I7 6 6
1838 May IS ,, john Constable, R.A. 57. Gillingham Mill Leslie 37 16 6
1838 May I5 ,, john Constable, R.A. 58. East Bergholt Nursey 5 I5 6
1838 May IS ,, john Constable, R.A. 59. Flatl`ord·-Barge Building Sheepshanlrs SI 9 O
1838 May IS ,, john Constable, R.A. 60. Two Views near Pétworth Swaby 7 7 0
1838 May IS ,. john Constable, R.A. 61. Hampstead Heath—London in distance Archbutt Jl IO 0
1838 May I5 ,, john Constable, R.A. 65. Dedham Vale—Long Valley Norton 25 4 6
1838 May IS ., john Constable, R.A. 66. London from Hampstead Burton 63 0 0
1838 May I5 ,, john Constable, R.A. 67. Flatford Mill—Dark Allnutt 34 I3 0
1838 May IS ,, john Constable, R.A. 68. Brighton and Chain Pier. Ex. 1827 Tiffin 45 3 0
1838 May I5 ., john Constable, R.A. 69. The Lock near Flatford Mill Archbutt 44 2 0
1838 May IS ,, john Constable, R.A. 70. The Glebe Farm. R.A. 1835 Miss Constable 74 ll 0
1838 May IS ,, john Constable, R.A. 71. The Cenotaph, etc. R.A. 1836 Miss Constable 42 0 0
1838 May IS ,, john Constable, R.A. 72. Salisbury Cathedral from Bishop’s
Garden. 1823 Tiflin 64 1 0
1838 May I5 john Constable, R.A. 73. View in Helmingham Park. R./\. 1830 Allnutt 56 I4 0
1838 May I5 john Constable, R./\. 74. Opening of \Vaterloo Bridge. R.A. 1832 Mosley 63 0 0
1838 May IS john Constable, R.A. 75. View of Dedham—Gipsies. R.A. 1828 M. Bone 105 0 0
1838 May IS john Constable, R.A. 76. The Lock. R.A. 1824. Sold at
Foster's, February lslh, 1855, for {903 Birch 131 0 0
1838 May I5 john Constable, R.A. 77. On the River Stour—Ho1se on a
Barge. R.A. 1819 Morton 157 I0 0
1838 May IS ,, john Constable, R.A. 78. Hadleigh Castle. R.A. 18:9 Miss Constable 105 0 °
1838 May IS ,, john Constable, R.A. 79. Salisbury Cathedral from Meadows. 1831 Ellis 110 5 0
1838 May IS ,, john Constable, R.A. 80. Dedham Mill and Church Brown 45 3 0
1838 May IS ,, john Constable, R.A. 81. Arundel Castle and Mill. 1837 j. Constable 7S I5 0
1839 April 13 Christie's Samuel Archbutt. 114. Salisbury Cathedral from Meadows Theobald 3l I0 0
1839 April 13 ,, Samuel Archbutt. 115. Embarlcation of George IV, Waterloo
Bridge Bought in 43 1 0
1846 May 16 ,, Mr. 'I`aunton. 41. Salisbury Cathedral Bought in 441 0 0
1846 May 16 ,, Mr. Taunton. 42. Dedham Bought in 357 0 0
1846 june 4 ,, Edward Higginson. 77. Waggon passing through a River Rought 378 0 0
1848 May 3 Phillips Ralph Thomas. 176. Salisbury Cathedral — —-
1848 june 2 Christie's Sir Thos. Baring. 21. Opening \Vaterloo Bridge Barton 33 2 0
1849 May 27 ,, Taunton. 110. Salisbury Cathedral from the Meadows. The
celebrated picture Rought 430 I0 0
1849 May 27 ,, Taunton. ll 1. Dedham, with Towing Path Bought in 157 I0 0
ISS! june 13 ,, Hogarth. 46. Hadleigh Castle Winter 320 5 0
1853 Mar. 7 ,, R. Morris. 131. A Lock on the Stour Wass 105 0 0
1853 july 7 ,, Charles Birch. 41. jumping Horse on the Stour Gambart 393 I5 0
1853 .l'·'lY 7 1. Charles Birch. 42. Opening of London Bridge Bought in 252 0 0
185; Feb. I5 F0ster's Charles Birch. 18. The Lock. 55 11 48 Holmes 860 0 0

The data is quite noisy. It’s possible to clean up a few pages by hand individually, but cleaning up an entire volume would be more practical with Internet-based collaboration. Project Gutenberg’s Distributed Proofreaders project is a good example of this.

Next time we’ll clean up the data and load it into R.

Art History Art Open Data

Art Data Analysis: The Sale Of The Late King’s Goods

In “The Sale Of The Late King’s Goods” (Macmillan, 2006, ISBN 1405041528) Jerry Brotton surveys the inventories, invoices and auction records of the art collected by King Charles I.

This isn’t quantitative analysis of art data, but Brotton does use data such as the purchase dates, prices and other hard facts of Charles’s art collection during his life and after the King’s execution to drive and underwrite the dramatic narrative of artistic and political history.

Art History Art Open Data

Art Data Analysis: Old Masters Auctions And The Weather (Contains link to download full PDF)


“Psychological evidence predicts that sunny weather is associated with an upbeat mood. Although standard economic theory presumes invariant preferences and full rationality, the finance literature has documented a strong relationship between morning sunshine in the city of a country’s stock exchange and daily market index returns. In this paper we examine the effect of different weather conditions on art auction selling prices. Our sample includes art prices at auctions conducted from 1786 to 1909 in England. With respect to the main variables identified by the literature as being associated with agents’ moods, we find that the length of daylight duration (from sunrise to sunset) on which the auction is conducted has a significant positive effect on the auction selling prices in all our model specifications. In addition, we find in some specifications direct positive effects of hours of sunshine during the day, precipitation, temperature, and whether the daylight duration increases relative to the previous day, on auction selling prices.”

One of the advantages of having multiple large historical datasets freely available is that they can be combined or cross-referenced to find novel information, such as that art auction prices are affected by the weather.

Aesthetics Art History Art Open Data

Exploring Art Data 14

If we save the data of Roger de Piles’ scores for artists to a tab-separated file we can load them into R:

## Load the tab separated values for the table of artist scores
## (the file name is a placeholder; use wherever you saved the table)
scores<-read.delim("scores.tsv",
                   colClasses=c("character", "integer", "integer", "integer",
                                "integer"))
## Replace NA values with zero
scores[is.na(scores)]<-0
## Create the total score
scores<-cbind(scores, Total=apply(scores[2:5], 1, sum))

This allows us to find the lowest and highest scores:

## Min, max of each score
scoreMinMax<-function(scores, column){
  ## Lowest and highest values in the chosen column
  lowest<-min(scores[column])
  highest<-max(scores[column])
  cat(column, "\nMin (", lowest, "): ", sep="")
  cat(scores$Painter[scores[column] == lowest], sep=", ")
  cat("\nMax (", highest, "): ", sep="")
  cat(scores$Painter[scores[column] == highest], sep=", ")
  cat("\n")
}

> scoreMinMax(scores, "Composition")
Min (0): Guido Reni, Gianfrancesco Penni
Max (18): Guercino, Rubens
> scoreMinMax(scores, "Drawing")
Min (6): Giovanni Bellini, Lucas van Leyden, Caravaggio, Palma il Vecchio, Rembrandt
Max (18): Raphael
> scoreMinMax(scores, "Colour")
Min (0): Pietro Testa
Max (18): Giorgione, Titian
> scoreMinMax(scores, "Expression")
Min (0): Jacopo Bassano, Giovanni Bellini, Caravaggio, Palma il Vecchio, Gianfrancesco Penni
Max (18): Raphael
> scoreMinMax(scores, "Total")
Min (23): Gianfrancesco Penni
Max (65): Raphael, Rubens

Cluster the artists:

## Clustering Utilities

## List the names belonging to each cluster
clustersNames<-function(clusters, names){
  clusterCount<-length(clusters$size)
  lapply(1:clusterCount,
         function(cluster){ names[clusters$cluster == cluster] })
}

## Print each cluster's names on one line
printClustersNames<-function(clustersNames){
  clusterCount<-length(clustersNames)
  for(cluster in 1:clusterCount){
    cat("Cluster", cluster, ":",
        paste(unlist(clustersNames[cluster]), collapse=", "), "\n\n")
  }
}

## Cluster based on the numeric scores. 8 = 2x2x2 (Low/High)
clusters<-kmeans(scores[2:5], 8)
names<-clustersNames(clusters, scores$Painter)
printClustersNames(names)

Cluster 1 : Correggio, Rembrandt, Van Dyck
Cluster 2 : Andrea del Sarto, Federico Barocci, Daniele da Volterra, Guercino, Lucas Jordaens, Giovanni Lanfranco, Otho Venius, Perin del Vaga, Primaticcio, Francesco Salviati, Taddeo Zuccari
Cluster 3 : Charles Le Brun, Il Domenichino, Giulio Romano, Leonardo da Vinci, Eustache Le Sueur
Cluster 4 : I Carracci, Raphael, Rubens, Vanius
Cluster 5 : Guido Reni, Gianfrancesco Penni
Cluster 6 : Jacopo Bassano, Giovanni Bellini, Caravaggio, Murillo, Palma il Vecchio
Cluster 7 : Sebastian Bourdon, Cavalier D'Arpino, Albrecht Dürer, Lucas van Leyden, Michelangelo, Il Parmigianino, Pietro Testa, Federico Zuccari
Cluster 8 : Abraham van Diepenbeeck, Giorgione, Giovanni da Udine, Holbein, Jacob Jordaens, Palma il Giovane, Sebastiano del Piombo, Teniers, Tintoretto, Titian, Veronese
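
k-means starts from randomly chosen centres, so the cluster numbering and even the membership can differ from run to run; the listing above is one such run. Here is a minimal sketch of making a clustering reproducible by fixing the seed and trying several random starts. It assumes a small made-up table in place of the real scores: the painters P1 to P12, their numbers, the choice of three clusters and the helper clusterToy are all invented for illustration.

```r
## Synthetic stand-in for the scores table (not de Piles' data)
set.seed(42)
toy<-data.frame(Painter=paste0("P", 1:12),
                Composition=sample(0:18, 12, replace=TRUE),
                Drawing=sample(0:18, 12, replace=TRUE),
                Colour=sample(0:18, 12, replace=TRUE),
                Expression=sample(0:18, 12, replace=TRUE))

## Fix the seed before clustering, and try 25 random starts,
## keeping the best result (hypothetical helper)
clusterToy<-function(seed){
  set.seed(seed)
  kmeans(toy[2:5], centers=3, nstart=25)
}
first<-clusterToy(1)
second<-clusterToy(1)

## The same seed gives the same assignment of painters to clusters
print(identical(first$cluster, second$cluster))

## List the painters in each cluster
print(split(toy$Painter, first$cluster))
```

Calling set.seed() before kmeans() on the real scores table should make its clusters repeatable in the same way.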

And graph the scores:

## Stacked bar chart
## Allow room for names at bottom and legend at right
## 7 is from trial and error
par(xpd=TRUE, mar=par()$mar+c(7,0,0,7))
barplot(t(as.matrix(scores[2:5])), names.arg=scores$Painter,
        main="Roger de Piles' Ratings", col=rainbow(4), las=2, border=NA)
## Position legend in right margin
## 60 is from trial and error
## Use the same four colours as the bars so the legend matches
legend(60, 60, names(scores[2:5]), fill=rainbow(4), cex=0.75)


Art History Art Open Data

Art Data Analysis: Venus Iconography

Venus Iconography

The Topical Catalogues are a resource for further studies and offer a tool to develop applications of the quantitative approach in art history.
The application of an inverse power law, known as Lotka’s law of scientific productivity, is a singularity in art history.

Another well-defined art historical study with useful conclusions. K. Bender has assembled catalogues of depictions of the goddess Venus from various regions, and analysed the resulting data. The results fit a power law.
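
To show what fitting a power law involves, here is a minimal sketch. It assumes the usual statement of Lotka’s law, that the number of contributors with n works is roughly C / n^a, with a close to 2. The counts are synthetic and generated from that formula, they are not Bender’s data, and the log-log regression is a simple illustration rather than the method used in the study.

```r
## Synthetic counts following y = C / n^a exactly (illustrative constants)
n<-1:10                 # number of works per artist
C<-1000
a<-2
artists<-C / n^a        # number of artists with n works

## A power law is a straight line on log-log axes:
## log(y) = log(C) - a * log(n)
fit<-lm(log(artists) ~ log(n))
estimatedA<- -coef(fit)[["log(n)"]]
estimatedC<-exp(coef(fit)[["(Intercept)"]])
cat("a =", round(estimatedA, 3), " C =", round(estimatedC, 1), "\n")
```

With real counts the points will scatter around the line, and the slope of the fitted line gives an estimate of the exponent.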

Art History Art Open Data

Google Books Art History 2

In French:

Gazette des beaux-arts

Le trésor de la curiosité 1

Le trésor de la curiosité 2

Histoire des peintres de toutes les écoles: école Flamande

Catalogue de la galerie des tableaux



Art Art Computing Art History

Explor Update

I have updated my Explor compiler.

The new version uses a shared library for the Explor functions, and cats the source file to the Fortran compiler along with a file that contains the “END” command rather than creating an intermediate file.

You’ll need git, autotools, libtool and a Fortran compiler (gfortran, the successor to g77) installed. On Fedora the command to install them is something like:

su -c "yum install git gcc-gfortran libtool autoconf automake"

Fetch the source code:

git clone http://OFFLINEZIP.wpsho/git/explor.git

Set up the build environment:

cd explor



And install:

su -c "make install"

You can then make a test image:

cd examples
explor example6.explor

This will print messages something like this:

Compiling example6.explor to example6
Running example6 with output to

And if you open the resulting .ps file, you’ll see the image at the top of this post.

Aesthetics Art History Art Open Data

Art Data Analysis: Dissecting The Canon

Dissecting the Canon: Visual Subject Co-Popularity Networks in Art Research

This is a well-defined statistical study of the art historical literature of a particular period. It counts the number of times that ancient artworks are mentioned in Renaissance art literature. By measuring the popularity and co-popularity of artworks it uncovers several interesting facts.

Firstly, canons are identical with the most popular items over a distribution of popularity. Secondly, sub-tails of genres or subjects have broadly the same properties as the main long tail of which they are a sub-tail. Thirdly, the co-popularity of otherwise unrelated monuments may be a product of their spatial proximity at the time they were documented in the Renaissance.
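
To make the two measures concrete, here is a minimal sketch. It assumes a made-up table of which sources mention which monuments: the source and monument names are invented, and the paper’s own method may well differ. Popularity is counted as the number of sources mentioning a monument, and co-popularity as the number of sources mentioning both monuments of a pair.

```r
## Rows are sources, columns are monuments; 1 = the source mentions it
mentions<-matrix(c(1, 1, 0,
                   1, 1, 1,
                   1, 0, 0,
                   0, 1, 1),
                 nrow=4, byrow=TRUE,
                 dimnames=list(c("SourceA", "SourceB", "SourceC", "SourceD"),
                               c("MonumentX", "MonumentY", "MonumentZ")))

## Popularity: how many sources mention each monument, most popular first
popularity<-sort(colSums(mentions), decreasing=TRUE)
print(popularity)

## Co-popularity: entry [i, j] counts the sources mentioning both i and j
coPopularity<-crossprod(mentions)
## A monument's co-occurrence with itself is not of interest
diag(coPopularity)<-0
print(coPopularity)
```

Sorting the popularity counts gives the long tail, and the co-popularity matrix can be read as the weighted edges of a network of monuments.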

These facts are interesting in themselves and indicate further possibilities for research. They are also of use to more theoretical or social art historical approaches.

(I originally posted about this here.)

Art Art Computing Art History Art Open Data

Art Data Analysis: Art & Language

Art & Language are a conceptual art group founded in the late 1960s in England. Much of their early work didn’t look like art. It was essays, mathematical notation, transcripts of conversations, all different kinds of written materials. Faced with the opportunity to exhibit in a gallery setting to an artworld audience, A&L needed a realistic way of presenting their work so that a viewer who hadn’t been part of the original conversations might have a chance of navigating the results.

A&L’s solution was to assemble copies of all the texts in filing cabinets and produce an index to them. Texts were given “markers” (tags) and indexes of the relationships between each text’s tags were produced in print or on microfilm. Mainframe computer time was used to create the index for Index 04, although reports differ on which computer was used and whether the index was in fact random or not.
This is an obvious forerunner to Google or It is also a use of what would now be regarded as search technology to produce a genuinely artistic solution to a genuine artistic problem.
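
As a rough illustration of the idea, and not a reconstruction of A&L’s actual indexing system, here is a minimal sketch assuming three invented texts with invented markers. It builds an index from each marker to the texts carrying it, and then relates texts by the markers they share. The names markers, index and related are all hypothetical.

```r
## Hypothetical texts and their markers (tags)
markers<-list(Text1=c("ideology", "language"),
              Text2=c("language", "logic"),
              Text3=c("logic"))

## Index from each marker to the texts that carry it
## stack() gives a data frame with columns values (marker) and ind (text)
pairs<-stack(markers)
index<-split(as.character(pairs$ind), pairs$values)
print(index)

## Which other texts share at least one marker with a given text
related<-function(text){
  shared<-unlist(index[markers[[text]]], use.names=FALSE)
  setdiff(unique(shared), text)
}
print(related("Text2"))
```

Looking up a marker in the index is the filing-cabinet equivalent of a tag search, and related() follows the shared markers from one text to its neighbours.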