Problems to be Isolated, Described and Discussed

A & L developed slowly and untidily around a consensus that there were historical and objective problems which could be isolated and described, and thus discussed. This is what distinguished and distinguishes A & L from other artists or artistic formations. A & L saw these problems as matters to be articulated by work, rather than as professional aspects of their social lives to be adopted only once they had left the studio. Conversation, discussion, and conceptualisation became their primary practice, as art.

– p22, “A Provisional History of Art & Language”, Charles Harrison & Fred Orton.

Contemporary Art Daily Text Analysis

Contemporary Art Daily (CAD) is a leading contemporary art blog that publishes documentation for selected shows of contemporary art. It was started in 2008 by then art student Forrest Nash, who describes the site as follows:

Contemporary Art Daily is a website that publishes documentation of at least one contemporary art exhibition every day. We have an international purview, and we work hard to get especially high-quality documentation of the shows we publish.

Since 2008 CAD has published the details of more than 1800 shows including descriptive text, images of works included, and lists of artists involved in each show.

Nash describes the criteria used for selecting that documentation as follows:

Our criteria for Contemporary Art Daily is complicated and not perfectly reducible, but I like to say that we are generally trying to balance two motives that sometimes conflict with each other. On the one hand, we do have a kind of journalistic motive: we hope to in some way represent the breadth of what is happening in contemporary art, even when a particular artist is not of personal interest to us. On the other hand, we have a curatorial motive, to advance art we believe in and think is important. I am usually more concerned about making a mistake and failing to see or include something than I am accidentally letting something through the filter that doesn’t belong.

(from: http://metropolism.com/features/why-contemporary-art-daily/).

As a curated resource, CAD is not a statistically representative population sample of all available contemporary art shows. Like a museum collection, a survey show or a textbook, it is a mediated, value-laden view of the artworld. Its popularity demonstrates the appeal of this particular view to contemporary artworld audiences. Analyzing CAD is therefore a way of gaining an insight into one popular view of the contemporary artworld.

The HTML code of www.contemporaryartdaily.com was downloaded in January 2014 and processed with an R script to extract text and information from each post that announces a show in the site’s standard format. This data was then loaded by the R code in this file to generate the report you are now reading. For reasons of practicality and clarity, some analysis has been performed on the entire dataset, some on just the most popular entities (…most frequently occurring values) within it.

The presence or absence of surprises in the data may indicate fidelity or bias in the worldview of either Contemporary Art Daily or of the online contemporary artworld audience in relation to each other. The extent to which this generalizes to the culture or the reality of the wider contemporary artworld is open to question. Comparing CAD to the data of a more general art show resource website would provide evidence for this but is outside the scope of the current study. The reader’s intuition will have to suffice on these matters for now.

You can download an archive of the report here in several formats; the HTML version is by far the best:

Click here to download

The source code is available here:

https://gitorious.org/robmyers/contemporary-art-daily-analysis/

 

Simple Word Frequency in Contemporary Art Daily Press Releases

A simple word frequency count of press releases on Contemporary Art Daily (note split city names):

art:4511
exhibition:4422
work:4160
works:3906
new:3659
artist:3073
one:2195
gallery:2156
museum:1901
paintings:1898
link:1875
also:1765
space:1671
painting:1666
time:1521
like:1493
york:1462
first:1390
artists:1307
show:1282
objects:1230
solo:1222
two:1193
series:1179
form:1077
made:1058
contemporary:1027
world:985
images:978
present:967
sculptures:907
exhibitions:902
sculpture:892
well:873
way:865
group:849
image:832
life:805
film:794
forms:775
different:760
berlin:723
years:721
within:703
body:700
london:675
based:661
material:660
part:656
history:656
three:655
process:650
many:637
recent:636
often:623
2009:614
wall:610
materials:609
installation:607
practice:604
artistic:601
large:599
photographs:597
modern:595
lives:594
light:584
even:581
visual:579
2010:575
2008:568
black:567
since:566
together:556
object:554
use:552
including:551
would:543
white:540
born:536
american:524
become:521
place:518
pleased:516
used:507
another:502
viewer:498
2011:498
self:497
paris:495
early:493
abstract:492
could:491
point:483
room:483
something:482
around:482
project:482
back:476
language:471
drawings:468
subject:466
human:464
include:464
making:459
make:457
los:457
people:457
angeles:454
elements:453
production:453
various:451
created:451
view:449
color:445
surface:442
much:440
video:437
experience:437
title:436
yet:435
sense:432
still:432
found:431
photography:427
paper:427
presented:425
rather:425
social:418
shows:418
always:418
2012:417
city:417
pictures:412
paint:410
seen:408
using:406
galerie:405
hand:403
text:392
see:392
painted:390
create:390
public:388
working:383
2007:381
things:379
culture:379
historical:377
nature:377
idea:375
arts:375
past:374
kunsthalle:372
cultural:371
design:370
long:368
canvas:367
media:366
may:364
presents:363
installations:362
included:361
second:359
reality:356
pieces:355
scale:354
piece:354
relationship:354
whose:353
exhibited:352
specific:352
conceptual:350
thus:350
kind:350
year:350
drawing:346
produced:345
order:343
architecture:341
physical:341
last:341
context:339
formal:337
abstraction:337
spaces:337
individual:336
collection:334
narrative:334
political:333
sculptural:333
shown:331
figures:329
without:327
approach:326
becomes:323
real:323
meaning:323
almost:321
set:321
germany:321
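A count like the one above can be sketched in base R as follows. This is a minimal illustration rather than the script used for the report: the sample sentences, the stop word list and the function name wordFrequency are all assumptions. Splitting on every non-alphanumeric character is also what breaks city names such as “new york” into separate words.

```r
# Hypothetical sample text standing in for the scraped press releases.
press.releases <- c("New work by the artist in New York.",
                    "The exhibition presents new paintings and work on paper.")
# A tiny illustrative stop word list; the list used for the report is not given.
stop.words <- c("the", "by", "in", "and", "on")

wordFrequency <- function(texts, stop.words = character(0)) {
  # Lower-case, then split on anything that is not an ASCII letter or digit
  words <- unlist(strsplit(tolower(texts), "[^a-z0-9]+"))
  # Drop empty strings and stop words
  words <- words[nchar(words) > 0 & !(words %in% stop.words)]
  # Count each word and sort, most frequent first
  sort(table(words), decreasing = TRUE)
}

word.counts <- wordFrequency(press.releases, stop.words)
# Print in the word:count format used above
cat(paste0(names(word.counts), ":", word.counts), sep = "\n")
```

Keeping digits in the token pattern is why years such as 2009 can appear in a list like this one.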

Exploring Tate Art Open Data 0

Why visualise the Tate’s collection dataset?

The Tate is the UK’s largest art institution. The free and open release of Tate’s collection data shows just how far open data has come in the last decade, and makes a major resource available for study. This resource allows us to follow two lines of investigation.

The first is into the history of art, using the Tate’s collection as a model of art in general, particularly of British art. The Tate’s collection data describes the form, content, attribution and dates of a sample of art from the past several hundred years. This is a history of art, and as long as we place it in its historical context it can be a useful one.

The second is institutional critique, to analyse the Tate’s collection and contrast it with other collections and with other models of the history of art (verbal, data-based or otherwise). Rather than allowing or controlling for the historical context of the data this makes recovering and examining that context the focus.

It’s possible to succeed or fail at each, and neither requires taking the claims of museums to represent history or of data to represent reality at face value or in a vacuum. Data visualisation and statistical analysis are ways of dealing with datasets that would take a human reader many years to examine. They are forms of rhetoric, but they are also useful tools.

With suitable modesty of aims and suitable reflection on the historical and political contexts which have given rise to our tools and materials, let us begin…

Exploring Tate Art Open Data 2

This is the second in a series of posts examining Tate’s excellent collection dataset. You can read the first part here. The R and R Markdown code for this series is available at https://gitorious.org/robmyers/tate-data-r/ .

As before, let’s get started by loading the data.

source("../r/load_tate_data.r")

Movement Artwork Counts

Next let’s load some code to visualize the number of artworks in the collection categorized as being produced by a particular movement each year.

source("../r/movement_artwork_counts.r")

You can see the code in the Git repository above. It loads the Tate collection data files and declares some functions that we can use to plot movement artwork counts.

We can plot the number of artworks from a given movement, for example the Young British Artists (YBAs):

plotMovementFrequency("Young British Artists (YBA)")

YBAs
Or we can plot the combined counts for multiple movements, for example those since 1800:

plotArtworkCountsByYear()

Movements Since 1800
These figures are available as PDFs in the Git repository.
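The counting behind plots like these can be sketched with a cross-tabulation of movement against year. The sample rows, the data frame name and the helper countsForMovement are hypothetical; the code in the repository may well do this differently.

```r
# Hypothetical rows standing in for the artwork movement table; the column
# names are assumptions made for this sketch.
artwork.movements.sample <- data.frame(
  movement.name = c("British Pop", "British Pop", "British Pop",
                    "Conceptual Art", "Conceptual Art"),
  year = c(1964, 1964, 1967, 1967, 1970)
)

# Cross-tabulate movement against year of production
movement.year.counts <- table(artwork.movements.sample$movement.name,
                              artwork.movements.sample$year)

# Extract the yearly counts for a single movement as a named vector
countsForMovement <- function(counts, movement) {
  counts[movement, ]
}

british.pop.counts <- countsForMovement(movement.year.counts, "British Pop")
print(british.pop.counts)
```

Passing a vector like british.pop.counts to barplot() would then draw one bar per year.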

Movement Durations

When did a movement start and end, and how long did it last? We can plot this for movements as defined by the date of production of the artworks labelled as being part of that movement in the Tate collection.

source("../r/movement_durations.r")

First by movement name:

plotMovementDurations(movement.durations.alpha, movement.order.alpha)

Movements By Name

And then by movement start date:

plotMovementDurations(movement.durations.from, movement.order.from)

Movements By Start Date

These figures are also available as PDFs in the Git repository.
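As a rough sketch of the definition used here, a movement can be given a start and an end by taking the earliest and latest production years among its artworks. The sample data and the function name movementDurations are assumptions for illustration only.

```r
# Hypothetical artworks; one has no known year and is ignored.
artwork.sample <- data.frame(
  movement.name = c("Vorticism", "Vorticism", "Vorticism",
                    "British Pop", "British Pop", "British Pop"),
  year = c(1913, 1915, 1914, 1961, 1969, NA)
)

movementDurations <- function(artworks) {
  # Only dated artworks can contribute to a start or end year
  dated <- artworks[!is.na(artworks$year), ]
  movement <- as.character(dated$movement.name)
  # Earliest and latest production year per movement
  from <- tapply(dated$year, movement, min)
  to <- tapply(dated$year, movement, max)
  durations <- data.frame(movement.name = names(from),
                          from = as.vector(from),
                          to = as.vector(to))
  durations$duration <- durations$to - durations$from
  durations
}

durations <- movementDurations(artwork.sample)
# The two orderings plotted above: by name and by start date
durations.alpha <- durations[order(durations$movement.name), ]
durations.from <- durations[order(durations$from), ]
print(durations.from)
```

A movement represented by a single dated artwork gets a duration of zero under this definition, which is worth bearing in mind when reading the plots.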

Movement Influences

We can use artists who are in two or more movements as links between movements, constructing a network graph of social connections between movements.
As with “Collectivizing The Barr Model”, the Wikipedia data-based update of Alfred Barr’s handmade diagram for the MoMA Cubism & Abstract Art exhibition of 1936, we can extract a family tree (or Rhizome) of influence between art movements and otherwise use network analysis methods to study the social network of art movements:

plotMovementArtistLinks()

Movements Connected By Artists

Again, this figure is also available as a PDF in the Git repository.
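One way to derive such links, sketched here in base R under assumed column names and with made-up artists, is to build an artist-by-movement membership matrix and multiply it by itself. Each off-diagonal cell then holds the number of artists two movements share.

```r
# Hypothetical memberships; the artists are placeholders, not collection data.
artist.movements.sample <- data.frame(
  artist = c("Artist A", "Artist A", "Artist B", "Artist B",
             "Artist C", "Artist D", "Artist D"),
  movement.name = c("Unit One", "Seven and Five", "Unit One", "St Ives School",
                    "St Ives School", "Unit One", "Seven and Five")
)

# Artist x movement matrix: 1 where the artist belongs to the movement
membership <- unclass(table(artist.movements.sample$artist,
                            artist.movements.sample$movement.name))
membership[membership > 1] <- 1  # guard against duplicated rows

# Movement x movement matrix of shared artist counts
shared <- crossprod(membership)
diag(shared) <- 0  # a movement's link to itself is not of interest

# Turn the upper triangle into an edge list for plotting or network analysis
links <- which(shared > 0 & upper.tri(shared), arr.ind = TRUE)
edges <- data.frame(from = rownames(shared)[links[, "row"]],
                    to = colnames(shared)[links[, "col"]],
                    artists = shared[links])
print(edges)
```

An edge list in this shape can be handed to most network analysis packages, with the shared artist count used as an edge weight.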

Conclusions

As you can see, some of these graphics work better as posters or large-scale PDFs than as bitmaps. There’s much that could be done with curve fitting and comparison of movement artwork counts. And all the techniques of social network analysis can be applied to the graph of artists and movements.

Next we’ll look at artwork genres, which are not explicitly labelled in the collection dataset.

Exploring Tate Art Open Data 1

This is the first in a series of posts examining Tate's excellent collection dataset available at http://www.tate.org.uk/about/our-work/digital/collection-data .

I've processed that dataset using code for Mongo DB and Node.js available at https://gitorious.org/robmyers/tate-data/ .

The R and R Markdown code for this series is available at https://gitorious.org/robmyers/tate-data-r/ .

This document has been produced using Knitr. Text in light grey boxes is R code or the output of that code.

Let's get started by loading the data.

source("../r/load_tate_data.r")

That file reads the comma separated value (csv) files containing information about the Tate's collection and generates some useful extra tables of information. Now we have everything in memory we can start examining the collection data.
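As a minimal sketch of that loading step (not the contents of load_tate_data.r), the following reads a few made-up rows in the same comma separated format. The rows are invented; the column names are taken from the summary shown in the next section.

```r
# Hypothetical rows in CSV form; a real run would pass a file path to read.csv.
csv.text <- 'name,gender,yearOfBirth,yearOfDeath
"Example, Ada",Female,1930,
"Example, Ben",Male,1941,2001
"Example Group",,1967,
'

# Reading strings as factors makes summary() report counts per level
artist.sample <- read.csv(text = csv.text, stringsAsFactors = TRUE)

# Blank numeric fields become NA; blank text fields become an empty level
summary(artist.sample[c("gender", "yearOfBirth", "yearOfDeath")])
```

The empty gender level and the NA years in this sketch correspond to the unlabelled counts and NA's rows that appear in summaries of the real data.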

Artists

What can we find out about artists in general?

summary(artist[c("name", "gender", "dates", "yearOfBirth", "yearOfDeath", "placeOfBirth", 
    "placeOfDeath")])
              name         gender                 dates     
 Bateman, James :   2         : 112   dates not known:  59  
 Doyle, John    :   2   Female: 521   born 1967      :  42  
 Hone, Nathaniel:   2   Male  :2894   born 1936      :  38  
 Peri, Peter    :   2                 born 1930      :  36  
 Stokes, Adrian :   2                 born 1938      :  36  
 Wilson, Richard:   2                 born 1941      :  34  
 (Other)        :3515                 (Other)        :3282  
  yearOfBirth    yearOfDeath                      placeOfBirth 
 Min.   :1497   Min.   :1543                            : 491  
 1st Qu.:1855   1st Qu.:1874   London, United Kingdom   : 446  
 Median :1910   Median :1944   Paris, France            :  57  
 Mean   :1887   Mean   :1920   Edinburgh, United Kingdom:  47  
 3rd Qu.:1941   3rd Qu.:1982   New York, United States  :  43  
 Max.   :2004   Max.   :2013   Glasgow, United Kingdom  :  35  
 NA's   :57     NA's   :1309   (Other)                  :2408  
                    placeOfDeath 
                          :2079  
 London, United Kingdom   : 442  
 Paris, France            :  82  
 New York, United States  :  45  
 Roma, Italia             :  22  
 Edinburgh, United Kingdom:  18  
 (Other)                  : 839  

There are more male than female artists, and the YBA and Pop generations lead the births.

Depending on whether we treat place of birth or place of death as more representative, London and Paris are ahead of New York or Edinburgh.

We can smooth out the birth and death dates by grouping them by decade or century.

summary(artist.birth.decade)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   1500    1860    1910    1890    1940    2000      57 
summary(artist.death.decade)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   1540    1870    1940    1920    1980    2010    1309 
sort(table(artist.birth.decade), decreasing = TRUE)
artist.birth.decade
1940 1930 1960 1920 1970 1900 1950 1910 1880 1890 1860 1870 1840 1780 1800 
 363  285  256  255  222  217  197  186  153  151  136  123   77   72   69 
1850 1820 1830 1980 1790 1810 1760 1770 1740 1750 1730 1700 1720 1710 1630 
  69   67   65   58   57   49   45   44   42   38   31   27   15   13   12 
1680 1640 1660 1600 1580 1590 1610 1650 1690 1620 1990 2000 1500 1530 1540 
  10    9    8    6    5    4    4    4    4    3    3    3    2    2    2 
1550 1560 1670 1570 
   2    2    2    1 

summary(artist.birth.century)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   1500    1900    1900    1890    1900    2000      57 
summary(artist.death.century)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   1500    1900    1900    1920    2000    2000    1309 
sort(table(artist.death.decade), decreasing = TRUE)
artist.death.decade
2000 1980 1960 1990 1970 1940 2010 1920 1930 1950 1900 1910 1840 1860 1880 
 224  191  172  157  140  131  112  102   92   89   80   69   59   59   54 
1850 1870 1890 1820 1830 1800 1810 1780 1790 1700 1760 1770 1750 1720 1730 
  53   49   49   46   44   42   40   24   23   15   14   12   10    7    7 
1740 1680 1710 1640 1690 1620 1650 1660 1570 1670 1600 1630 1540 
   7    6    6    5    5    4    4    4    3    3    2    2    1 
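The grouping itself can be sketched as below. The helper names are hypothetical. The output above looks consistent with rounding to the nearest decade or century (the minimum birth year of 1497 appears as 1500), so the sketch shows rounding next to the more conventional flooring; which of the two the loading script uses is an assumption.

```r
# Hypothetical birth years, including one unknown value
birth.years <- c(1497, 1936, 1941, 1967, 2004, NA)

# Round to the nearest decade or century (1497 becomes 1500)
nearestDecade <- function(year) round(year, -1)
nearestCentury <- function(year) round(year, -2)

# Alternative: truncate to the start of the decade (1497 becomes 1490)
floorDecade <- function(year) floor(year / 10) * 10

birth.decade <- nearestDecade(birth.years)
birth.century <- nearestCentury(birth.years)

# Count artists per decade, most common first; NA values are dropped by table()
sort(table(birth.decade), decreasing = TRUE)
```

With rounding, a label such as 1900 covers years either side of 1900 rather than the calendar century, which affects how the century tables should be read.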

That's quite a different result from that suggested by the yearly results. Decade-wise, birth percentiles are clustered around the turn of the 20th century, deaths around the Second World War. But the largest number of births is in the 1930s/1940s, with the 1960s coming in second. The deaths look like they reflect the distribution of births, although it would be useful to confirm this statistically.

The maximum birth decade being the 2000s doesn't mean that the Tate is collecting child artists; the birth data also includes the years in which artist groups were started.

How well is gender represented in the collection?

table(artist.birth.decade, artist$gender)

artist.birth.decade     Female Male
               1500   1      0    1
               1530   1      0    1
               1540   0      0    2
               1550   0      0    2
               1560   0      0    2
               1570   0      0    1
               1580   0      0    5
               1590   1      0    3
               1600   3      0    3
               1610   1      0    3
               1620   0      0    3
               1630   0      1   11
               1640   0      0    9
               1650   0      0    4
               1660   0      0    8
               1670   1      0    1
               1680   0      0   10
               1690   0      0    4
               1700   4      1   22
               1710   0      0   13
               1720   0      1   14
               1730   0      0   31
               1740   1      1   40
               1750   0      3   35
               1760   0      1   44
               1770   0      1   43
               1780   1      5   66
               1790   1      0   56
               1800  10      0   59
               1810   0      2   47
               1820   0      1   66
               1830   1      6   58
               1840   0      5   72
               1850   0      2   67
               1860   1     10  125
               1870   0     15  108
               1880   4     23  126
               1890   4     18  129
               1900   8     38  171
               1910   3     37  146
               1920   2     33  220
               1930   4     38  243
               1940  12     62  289
               1950   2     40  155
               1960   6     77  173
               1970   8     70  144
               1980   3     21   34
               1990   2      0    1
               2000   2      0    1

table(artist.birth.century, artist$gender)

artist.birth.century      Female Male
                1500    2      0    5
                1600    5      1   44
                1700    6      4  157
                1800   13     24  576
                1900   39    293 1667
                2000   22    190  422

The first, unlabelled, column is for artists whose gender is not currently recorded in the data.

As we saw in the summary, there are more male artists than female artists in the Tate's collection. There is no decade or century in which this trend is reversed. The story is slightly different when we look at artistic movements.
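A check like this can be made mechanical. The sketch below uses made-up rows and assumed column names: it converts the counts to per-decade shares and lists any decade in which women outnumber men.

```r
# Hypothetical artists with a birth decade and a recorded gender
artist.sample <- data.frame(
  birth.decade = c(1930, 1930, 1930, 1960, 1960, 1960, 1960),
  gender = c("Male", "Male", "Female", "Male", "Female", "Female", "Male")
)

gender.by.decade <- table(artist.sample$birth.decade, artist.sample$gender)

# Share of each gender within each decade (each row sums to 1)
gender.share <- prop.table(gender.by.decade, margin = 1)
print(round(gender.share, 2))

# Decades in which female artists outnumber male artists, if any
more.female <- gender.by.decade[, "Female"] > gender.by.decade[, "Male"]
reversed <- rownames(gender.by.decade)[more.female]
print(reversed)
```

In this toy data the 1960s are evenly split, so no decade is reported as reversed.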

Movements

The data for artists includes information on artists' movements. If we looked at the artwork data there might be more, but we'll stick with the artists for now.

summary(artist.movements[c("artist.fc", "artist.gender", "movement.era.name", 
    "movement.name")])
                       artist.fc   artist.gender
 Ben Nicholson OM           :  6         :  5   
 Dame Barbara Hepworth      :  5   Female: 27   
 Gilbert Soest              :  5   Male  :324   
 Joseph Beuys               :  5                
 Sir Peter Lely             :  5                
 British School 17th century:  4                
 (Other)                    :326                
              movement.era.name
 16th and 17th century : 47    
 18th century          : 27    
 19th century          : 63    
 20th century 1900-1945: 95    
 20th century post-1945:124    


                                 movement.name
 Performance Art                        : 14  
 Conceptual Art                         : 10  
 Netherlands-trained, working in Britain: 10  
 Constructivism                         :  9  
 Body Art                               :  8  
 British Surrealism                     :  8  
 (Other)                                :297  
summary(artist.movements$movement.era.name)
 16th and 17th century           18th century           19th century 
                    47                     27                     63 
20th century 1900-1945 20th century post-1945 
                    95                    124 
summary(artist.movements$movement.name)
                         Performance Art 
                                      14 
                          Conceptual Art 
                                      10 
 Netherlands-trained, working in Britain 
                                      10 
                          Constructivism 
                                       9 
                                Body Art 
                                       8 
                      British Surrealism 
                                       8 
                          St Ives School 
                                       8 
                         Victorian/Genre 
                                       8 
                    Abstraction-Création 
                                       7 
                         British War Art 
                                       7 
                                   Court 
                                       7 
                       Environmental Art 
                                       7 
                            Later Stuart 
                                       7 
                             Picturesque 
                                       7 
                              Surrealism 
                                       7 
                               Symbolism 
                                       7 
                              Abject art 
                                       6 
                                 Baroque 
                                       6 
                  British Constructivism 
                                       6 
                   British Impressionism 
                                       6 
                               Decadence 
                                       6 
                          Pre-Raphaelite 
                                       6 
                                Unit One 
                                       6 
                            Grand Manner 
                                       5 
                             Kinetic Art 
                                       5 
                                Land Art 
                                       5 
                              Minimalism 
                                       5 
                         Neo-Romanticism 
                                       5 
                                Tachisme 
                                       5 
                               Vorticism 
                                       5 
                      Aesthetic Movement 
                                       4 
                       Camden Town Group 
                                       4 
                      Conversation Piece 
                                       4 
                                  Cubism 
                                       4 
                            Feminist Art 
                                       4 
                        Geometry of Fear 
                                       4 
                       Neo-Expressionism 
                                       4 
                             Restoration 
                                       4 
                         Return to Order 
                                       4 
                          Seven and Five 
                                       4 
                                 Sublime 
                                       4 
                             British Pop 
                                       3 
              Civil War and Commonwealth 
                                       3 
                                    Dada 
                                       3 
                           Fancy Picture 
                                       3 
                           Fin de Siècle 
                                       3 
                           Impressionism 
                                       3 
                            London Group 
                                       3 
                    New English Art Club 
                                       3 
                      Post-Impressionism 
                                       3 
                                   Tudor 
                                       3 
             Young British Artists (YBA) 
                                       3 
                            Art Informel 
                                       2 
                             Art Nouveau 
                                       2 
                    Auto-Destructive art 
                                       2 
                          Direct Carving 
                                       2 
                      Euston Road School 
                                       2 
                          Neo-Classicism 
                                       2 
                          Neo-Plasticism 
                                       2 
                           Newlyn School 
                                       2 
                           New Sculpture 
                                       2 
                             Optical Art 
                                       2 
                                 Pop Art 
                                       2 
              Post Painterly Abstraction 
                                       2 
                                Regional 
                                       2 
                               Situation 
                                       2 
              Situationist International 
                                       2 
                  Abstract Expressionism 
                                       1 
                               Actionism 
                                       1 
                           Arte Nucleare 
                                       1 
                  Artist Placement Group 
                                       1 
       Artists International Association 
                                       1 
                                 Bauhaus 
                                       1 
                                   Cobra 
                                       1 
                        Der Blaue Reiter 
                                       1 
                                De Stijl 
                                       1 
                            Early Stuart 
                                       1 
English-born, working in the Netherlands 
                                       1 
                           Expressionism 
                                       1 
                                 Fauvism 
                                       1 
                                  Fluxus 
                                       1 
      French-trained, working in Britain 
                                       1 
                                Futurism 
                                       1 
                    German Expressionism 
                                       1 
                              Grand Tour 
                                       1 
                       Independent Group 
                                       1 
     Italian-trained, working in Britain 
                                       1 
                                    Merz 
                                       1 
                        Metaphysical Art 
                                       1 
                    Modern Moral Subject 
                                       1 
                          Modern Realism 
                                       1 
                       Neo-Impressionism 
                                       1 
                             Neue Wilden 
                                       1 
                   New British Sculpture 
                                       1 
                          Norwich School 
                                       1 
                        Nouveau Réalisme 
                                       1 
                             Orientalist 
                                       1 
                           Origine group 
                                       1 
                        Post-Reformation 
                                       1 
                                 (Other) 
                                       9 

The artists included in the most movements are some of the grand elders of British 20th Century art. Being in an art movement doesn't improve gender representation.

Most movement memberships are post-1945. Performance art is more popular than Conceptual art, which is interesting given public discussion of state art funding in the UK. “Netherlands-trained, working in Britain” clearly isn't a movement; as with the birth dates, the movement name field doesn't always describe a movement per se.

Let's break down gender by movement.

table(artist.movements$movement.era.name, artist.movements$artist.gender)

                             Female Male
  16th and 17th century    5      0   42
  18th century             0      0   27
  19th century             0      0   63
  20th century 1900-1945   0      9   86
  20th century post-1945   0     18  106
movement.gender <- table(artist.movements$movement.name, artist.movements$artist.gender)
movement.gender <- movement.gender[order(movement.gender[, 2], decreasing = TRUE), 
    ]
movement.gender[1:20, ]

                                Female Male
  Performance Art             0      5    9
  Feminist Art                0      4    0
  Abject art                  0      3    3
  Abstraction-Création        0      2    5
  Constructivism              0      2    7
  St Ives School              0      2    6
  Body Art                    0      1    7
  Camden Town Group           0      1    3
  Kinetic Art                 0      1    4
  Minimalism                  0      1    4
  Rayonism                    0      1    0
  Seven and Five              0      1    3
  Surrealism                  0      1    6
  Unit One                    0      1    5
  Young British Artists (YBA) 0      1    2
  Abstract Expressionism      0      0    1
  Actionism                   0      0    1
  Aesthetic Movement          0      0    4
  Arte Nucleare               0      0    1
  Art Informel                0      0    2

Representation improves slightly over time. Unsurprisingly, feminist art has more female than male artists represented. Abject art is a tie, and there are more than half as many female performance artists as male ones.

Artworks

There are 69129 artworks in the dataset.

summary(artwork[c("artist", "title", "dateText")])
                            artist                    title      
 Turner, Joseph Mallord William:39389   [title not known]: 3659  
 Jones, George                 : 1046   [blank]          : 3520  
 Moore, Henry, OM, CH          :  623   Blank            : 1995  
 Daniell, William              :  612   [no title]       : 1883  
 Beuys, Joseph                 :  578   Untitled         :  627  
 British (?) School            :  388   Mountains        :  540  
 (Other)                       :26493   (Other)          :56905  
           dateText    
 date not known: 5993  
 1819          : 2908  
 1801          : 1331  
 c.1830–41     : 1194  
 1833          : 1171  
 1831          : 1170  
 (Other)       :55362  
summary(artwork$year)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   1540    1820    1830    1870    1950    2010    5397 

JMW Turner has tens of thousands more works in the collection than the next nearest artist. Is this a glitch? No, it's due to the fact that the Tate holds the Turner Bequest of around 30,000 works on paper.

What are artworks titled? Usually Untitled, or simply no title. “Mountains” appears to be the most popular actual title, although if we stemmed or otherwise abstracted and clustered the titles, other popular ones might emerge.
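A first step towards that kind of abstraction might look like the sketch below, which normalises case and punctuation and treats the placeholder titles seen in the summary as missing. The function name normaliseTitle and the list of placeholders are assumptions, and this is far short of real stemming or clustering.

```r
# Titles as they appear in the summary above, plus a case variant
titles <- c("[title not known]", "[blank]", "Blank", "[no title]",
            "Untitled", "Mountains", "mountains")

normaliseTitle <- function(title) {
  # Strip punctuation such as brackets, lower-case, and trim whitespace
  cleaned <- trimws(tolower(gsub("[^[:alnum:] ]", "", title)))
  # Treat the various placeholder titles as missing
  placeholder <- cleaned %in% c("title not known", "blank", "no title",
                                "untitled")
  ifelse(placeholder, NA_character_, cleaned)
}

normalised <- normaliseTitle(titles)
# Count the remaining actual titles, most common first
title.counts <- sort(table(normalised), decreasing = TRUE)
print(title.counts)
```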

The most popular years for artworks are in the early 1800s. This, and possibly the titles, are again attributable to Turner. It would probably be productive to remove Turner's works on paper (or more simply just remove all Turner's works) from the dataset and try again, as his presence is clearly skewing the analysis.
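Removing one artist before summarising is a one-line filter. The sketch below uses the artist name as it appears in the summary above, but the years are made up and the helper dropArtist is hypothetical.

```r
# Hypothetical artworks: four by Turner and two by later artists
artwork.sample <- data.frame(
  artist = c(rep("Turner, Joseph Mallord William", 4),
             "Moore, Henry, OM, CH", "Beuys, Joseph"),
  year = c(1801, 1819, 1819, 1833, 1950, 1974)
)

# Return the artworks that are not by the named artist
dropArtist <- function(artworks, name) {
  artworks[artworks$artist != name, ]
}

without.turner <- dropArtist(artwork.sample, "Turner, Joseph Mallord William")

# Compare the year distribution with and without Turner
summary(artwork.sample$year)
summary(without.turner$year)
```

In this toy data the median year moves from 1826 to 1962 once Turner is removed, the kind of shift the paragraph above anticipates for the real dataset.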

Both artists and artworks have movements. Let's look at how artwork movements differ from artists' movements.

summary(artwork.movements)
   artwork.id    
 Min.   :    22  
 1st Qu.:  6050  
 Median : 11496  
 Mean   : 21962  
 3rd Qu.: 21954  
 Max.   :114918  

                                                  artwork.title 
 [no title]                                              : 674  
 [title not known]                                       : 169  
 Untitled                                                : 116  
 Insertions into Ideological Circuits 2: Banknote Project:  54  
 Walking the Dog                                         :  39  
 Exquisite Corpse                                        :  37  
 (Other)                                                 :5894  
      year                   artwork.medium movement.era.id
 Min.   :1545   Screenprint on paper:1301   Min.   :  8    
 1st Qu.:1920   Oil paint on canvas :1113   1st Qu.:290    
 Median :1963   Lithograph on paper : 527   Median :415    
 Mean   :1936   Etching on paper    : 393   Mean   :327    
 3rd Qu.:1973   Graphite on paper   : 205   3rd Qu.:415    
 Max.   :2009   Bronze              : 113   Max.   :415    
 NA's   :303    (Other)             :3331                  
              movement.era.name  movement.id             movement.name 
 16th and 17th century : 177    Min.   :  293   British Pop     : 846  
 18th century          : 469    1st Qu.:  363   Conceptual Art  : 445  
 19th century          :1004    Median :  433   Pre-Raphaelite  : 405  
 20th century 1900-1945:1156    Mean   : 2421   St Ives School  : 400  
 20th century post-1945:4177    3rd Qu.: 1683   School of London: 373  
                                Max.   :18626   Neo-Classicism  : 310  
                                                (Other)         :4204  
summary(artwork.movements$movement.name)[1:20]
                British Pop              Conceptual Art 
                        846                         445 
             Pre-Raphaelite              St Ives School 
                        405                         400 
           School of London              Neo-Classicism 
                        373                         310 
                    Pop Art Young British Artists (YBA) 
                        246                         226 
          Independent Group              Constructivism 
                        178                         147 
            British War Art                  Minimalism 
                        141                         138 
            Victorian/Genre           Neo-Expressionism 
                        125                         111 
            Neo-Romanticism      Abstract Expressionism 
                        107                         102 
                 Surrealism            Geometry of Fear 
                         96                          84 
            Performance Art          British Surrealism 
                         81                          75 
summary(artwork.movements$movement.era.name)
 16th and 17th century           18th century           19th century 
                   177                    469                   1004 
20th century 1900-1945 20th century post-1945 
                  1156                   4177 

Pop and Pre-Raphaelitism gain in popularity, but Conceptualism and Surrealism are still popular.

Subjects

Each artwork is tagged with descriptions of the subjects that it depicts. Subjects have levels, from general to specific, which I've named the category, subcategory and subject. We can group the subjects of artworks by artists and movements to find out what their characteristic subjects were.

summary(artwork.subjects[c("artwork.title", "artwork.dateText", "category.name", 
    "subcategory.name", "subject.name")])
           artwork.title          artwork.dateText       category.name  
 [title not known]: 13992   date not known: 29732   nature      :76796  
 [no title]       :  8146   1819          : 12948   places      :60314  
 Untitled         :  2148   1833          :  5865   architecture:57507  
 Mountains        :   899   1801          :  5023   people      :52820  
 Shipping         :   462   1831          :  4817   objects     :22990  
 Walking the Dog  :   412   1840          :  4498   society     :20032  
 (Other)          :316957   (Other)       :280133   (Other)     :52557  
                      subcategory.name              subject.name   
 landscape                    : 32722   hill              :  9737  
 adults                       : 22048   wooded            :  8223  
 townscapes, man-made features: 21272   man               :  8164  
 seascapes and coasts         : 12202   figure            :  8118  
 water: inland                : 11839   townscape, distant:  7916  
 countries and continents     : 11704   England           :  7661  
 (Other)                      :231229   (Other)           :293197  
summary(artwork.subjects$category.name)[1:20]
                 abstraction                 architecture 
                       13304                        57507 
emotions, concepts and ideas                      history 
                       11583                         1948 
                   interiors         leisure and pastimes 
                        2467                         3446 
      literature and fiction                       nature 
                        2977                        76796 
                     objects                       people 
                       22990                        52820 
                      places          religion and belief 
                       60314                         4376 
                     society   symbols & personifications 
                       20032                         6242 
        work and occupations                         <NA> 
                        6214                           NA 
                        <NA>                         <NA> 
                          NA                           NA 
                        <NA>                         <NA> 
                          NA                           NA 
summary(artwork.subjects$subcategory.name)[1:16]
                       landscape                           adults 
                           32722                            22048 
   townscapes, man-made features             seascapes and coasts 
                           21272                            12202 
                   water: inland         countries and continents 
                           11839                            11704 
        UK countries and regions cities, towns, villages (non-UK) 
                           10800                            10160 
            non-representational                 transport: water 
                            9583                             9537 
   actions: postures and motions                      UK counties 
                            9055                             8867 
                        features                         military 
                            8694                             7091 
                formal qualities    UK cities, towns and villages 
                            6934                             6695 
summary(artwork.subjects$subject.name)[1:20]
              hill             wooded                man 
              9737               8223               8164 
            figure townscape, distant            England 
              8118               7916               7661 
             river              woman           mountain 
              7549               7303               5932 
            castle             bridge              rocky 
              5298               3769               3759 
             group              coast              Italy 
              3694               3545               3509 
     boat, sailing          townscape             colour 
              3381               3157               2859 
               sea              tower 
              2810               2803 

The summary looks like Turner is skewing the results again. The subjects are mostly English landscape of the early 19th century. But the categories are led by more non-representational subjects, before the subcategories and subjects return to landscape. People (“adults”, “man”, “woman”) emerge as popular subjects as well; indeed, “adults” is the second largest subcategory.

summary(artist.subjects[c("artist.name", "category.name", "subcategory.name", 
    "subject.name")])
                                          artist.name        category.name
 David Lucas                                    :1653   nature      :991  
 Jacques Lipchitz                               : 301   places      :551  
 Colin Lanceley                                 : 181   people      :471  
 Bernard Leach                                  : 104   architecture:345  
 Langlands & Bell (Ben Langlands and Nikki Bell):  78   abstraction :275  
 Linder                                         :  65   objects     :256  
 (Other)                                        :1091   (Other)     :584  
                 subcategory.name    subject.name 
 landscape               : 287    figure   : 157  
 adults                  : 250    England  : 144  
 weather                 : 198    wooded   : 138  
 non-representational    : 186    cloud    :  99  
 UK countries and regions: 151    man      :  84  
 animals: mammals        : 145    geometric:  74  
 (Other)                 :2256    (Other)  :2777  
summary(artist.subjects$category.name)[1:20]
                 abstraction                 architecture 
                         275                          345 
emotions, concepts and ideas                      history 
                         132                           16 
                   interiors         leisure and pastimes 
                          18                           36 
      literature and fiction                       nature 
                          36                          991 
                     objects                       people 
                         256                          471 
                      places          religion and belief 
                         551                           70 
                     society   symbols & personifications 
                         153                           51 
        work and occupations                         <NA> 
                          72                           NA 
                        <NA>                         <NA> 
                          NA                           NA 
                        <NA>                         <NA> 
                          NA                           NA 
summary(artist.subjects$subcategory.name)[1:16]
                    landscape                        adults 
                          287                           250 
                      weather          non-representational 
                          198                           186 
     UK countries and regions              animals: mammals 
                          151                           145 
                  UK counties townscapes, man-made features 
                          129                           118 
UK cities, towns and villages                 water: inland 
                          118                            99 
             formal qualities     from recognisable sources 
                           92                            89 
         seascapes and coasts actions: postures and motions 
                           66                            62 
             transport: water                   residential 
                           57                            49 
summary(artist.subjects$subject.name)[1:20]
       figure       England        wooded         cloud           man 
          157           144           138            99            84 
    geometric         woman       Suffolk          hill        colour 
           74            62            58            52            47 
          cow         river         horse          rain      ceramics 
           40            38            37            35            32 
monochromatic   River Stour         Essex       sunbeam      farmland 
           29            28            27            27            26 

The results from artist subjects don't differ appreciably from the artwork ones. We wouldn't expect any difference, but some artworks have more than one artist or have none, so this introduces variations.

summary(movement.subjects[c("movement.name", "era.name", "artwork.title ", "category.name", 
    "subcategory.name", "subject.name")])
Error: undefined columns selected
summary(movement.subjects$category.name)[1:20]
                 abstraction                 architecture 
                        4977                         2831 
emotions, concepts and ideas                      history 
                        4486                          578 
                   interiors         leisure and pastimes 
                         619                          821 
      literature and fiction                       nature 
                         858                         5634 
                     objects                       people 
                        7516                        11828 
                      places          religion and belief 
                        2635                         1296 
                     society   symbols & personifications 
                        4097                         1843 
        work and occupations                         <NA> 
                        1568                           NA 
                        <NA>                         <NA> 
                          NA                           NA 
                        <NA>                         <NA> 
                          NA                           NA 
summary(movement.subjects$subcategory.name)[1:16]
                          adults             non-representational 
                            4077                             3651 
                formal qualities    actions: postures and motions 
                            2500                             2138 
   clothing and personal effects                     inscriptions 
                            1602                             1402 
       from recognisable sources                             body 
                            1326                             1170 
                       landscape               universal concepts 
                            1161                             1049 
    emotions and human qualities                   social comment 
                             937                              913 
   townscapes, man-made features                      furnishings 
                             898                              868 
reading, writing, printed matter                         features 
                             820                              735 
summary(movement.subjects$subject.name)[1:20]
          woman             man          figure       geometric 
           1854            1649            1197            1191 
         colour    photographic irregular forms     head / face 
           1111             920             563             531 
       standing         England         sitting          female 
            519             503             497             476 
   printed text            text           group        gestural 
            443             428             411             389 
         wooded       landscape        man-made             sea 
            333             305             276             243 

“Insertions into Ideological Circuits 2: Banknote Project” has multiple JSON records with multiple movements and topics in each, so it's over-represented here. The subjects are still similar, although with more photography.
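One way to limit that kind of over-representation would be to count each title only once per movement and subject. This is a sketch on made-up rows. It assumes `movement.subjects` has `artwork.title`, `movement.name` and `subject.name` columns, which the failed summary above does not confirm.

```r
# Hypothetical rows in the shape of `movement.subjects`; the column names are
# assumptions based on the other summaries above.
movement.subjects <- data.frame(
  artwork.title = c("Banknote Project", "Banknote Project", "Banknote Project",
                    "Banknote Project", "Portrait A", "Portrait B"),
  movement.name = c("Conceptual Art", "Conceptual Art", "Conceptual Art",
                    "Conceptual Art", "British Pop", "British Pop"),
  subject.name = c("printed text", "printed text", "money",
                   "money", "woman", "woman")
)

# Keep one row per title/movement/subject combination.
deduplicated <- unique(
  movement.subjects[c("artwork.title", "movement.name", "subject.name")]
)

# Count subjects again on the de-duplicated rows.
subject.counts <- sort(table(deduplicated$subject.name), decreasing = TRUE)
subject.counts
```

Placeholder titles such as “[no title]” would need to be left out of this step, otherwise every untitled work would collapse into one.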

Conclusions

What can we conclude from this? The collection is dominated by male British pop artists, more from England than from Scotland or the rest of the UK. The subjects of artworks are what one would expect: landscape, human figures, abstracts. The Turner Bequest skews some of the data, and this should be accounted for or addressed in analysis. A few other artworks also skew some results.

Next we'll look more closely at artistic movements with some data visualizations.

Exploring Art Data: My _MON3Y AS AN 3RRROR | MON3Y.US Review

Reviewing almost 70 artworks quickly and in depth is a challenge. With _MON3Y AS AN 3RRROR | MON3Y.US, I chose the approach of describing each artwork’s notable features and then pulling out themes and commonalities at the end. Halfway through I realised that by changing each description into a standard format, I could write code to parse the descriptions and analyse them to help me find those themes and commonalities. So I did. The code is in R and it’s available here:

https://gitorious.org/robmyers/art-review-scripts/

The code loads various modules, parses the file and constructs a corpus and matrix from the words in each review. It then outputs various statistics and graphs regarding them.
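To give an idea of what “a corpus and matrix” means here, this is a minimal base R sketch that builds a document-term matrix from two made-up descriptions. It is not the code from the repository, and it leaves out steps such as stop word removal.

```r
# Two made-up review descriptions standing in for the parsed review file.
reviews <- c(first = "Animated glitched dollar bill video",
             second = "Texture mapped euro bill, animated")

# Tokenise: lower-case each review and split on anything that is not a
# letter or a digit, dropping empty strings.
tokens <- lapply(tolower(reviews), function(text) {
  words <- strsplit(text, "[^a-z0-9]+")[[1]]
  words[words != ""]
})

# The vocabulary is every distinct word across all reviews.
vocabulary <- sort(unique(unlist(tokens)))

# One row per review, one column per term, cells holding term counts.
dtm <- t(vapply(tokens, function(words) {
  as.integer(table(factor(words, levels = vocabulary)))
}, integer(length(vocabulary))))
rownames(dtm) <- names(reviews)
colnames(dtm) <- vocabulary

# Terms used at least twice across all reviews.
term.totals <- colSums(dtm)
frequent.terms <- names(term.totals)[term.totals >= 2]
frequent.terms
```

The frequency lists that follow are the same idea with higher thresholds applied to the full set of reviews.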

First up, which terms do I use most frequently, ten or more times:

 [1] "animated" "bill"     "dollar"   "euro"     "glitched" "image"   
 [7] "mapped"   "show"     "texture"  "video"

The most popular subjects are dollar and Euro bills. Art about them shows something about them. It does so using video, animations (whether video, Flash, or HTML5), images, glitch and texture mapping.

Terms I use five or more times:

 [1] "aesthetic"  "animated"   "art"        "background" "banknotes" 
 [6] "bill"       "collage"    "colour"     "dollar"     "economic"  
[11] "euro"       "flag"       "gif"        "glitched"   "graphic"   
[16] "hundred"    "image"      "loop"       "makes"      "mapped"    
[21] "money"      "note"       "piece"      "rendering"  "show"      
[26] "texture"    "video"      "words"

Flags and words join the subjects, and hundred unit notes are the most popular. Looped animated GIFs, collages and graphics join the forms, and figure/ground relations are there with the mention of “background”.

Finally let’s look at words I use three or more times:

 [1] "abstract"    "aesthetic"   "album"       "allow"       "american"   
 [6] "animated"    "apparently"  "application" "art"         "background" 
[11] "banknotes"   "bill"        "black"       "blue"        "changing"   
[16] "classic"     "collage"     "colour"      "composite"   "depicted"   
[21] "direct"      "dollar"      "economic"    "effective"   "euro"       
[26] "facebook"    "flag"        "flickering"  "frame"       "gif"        
[31] "glitched"    "google"      "graphic"     "grid"        "html5"      
[36] "hundred"     "image"       "landscape"   "like"        "link"       
[41] "loop"        "love"        "makes"       "mapped"      "million"    
[46] "money"       "monochrome"  "morphing"    "new"         "note"       
[51] "one"         "page"        "patterns"    "piece"       "pixelart"   
[56] "playing"     "polygons"    "possibly"    "price"       "rendering"  
[61] "screen"      "show"        "signs"       "sites"       "stack"      
[66] "style"       "texture"     "time"        "use"         "video"      
[71] "virtual"     "web"         "white"       "words"       "work"       
[76] "yellow"      "zoomed"

No surprises there, except possibly “love”. The code will confuse “Euro” and “European”, so that’s why the US is mentioned but not Europe. Facebook and Google add corporations to the subjects. Colours are added to the formal properties: yellow, blue, white, black. Landscape joins the subjects. And works play, are direct, are classic, have style, an aesthetic, a price, are new. And I weasel about them with “possibly”.

Next let’s look at the associations between words. First some obvious ones.

Money:

google           love          1990s            age        ambient 
  0.65           0.59           0.43           0.43           0.43

Art:

corrupted     miscoloured         nothing          purest            rows 
     0.75            0.75            0.75            0.75            0.75 
   street            look            much           piece         classic 
     0.75            0.52            0.52            0.48            0.41 

Net:

carefully    contract   described        form        sale    specific 
     1.00        1.00        1.00        1.00        1.00        1.00 
  another application       price       piece         art 
     0.70        0.49        0.44        0.43        0.36

The corruption found in association with art here is aesthetic, thanks to glitch art.
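Association scores like these are typically correlations between term columns of the document-term matrix (this is how `findAssocs` in the tm package works, assuming that is where the numbers come from). A base R sketch on a made-up matrix:

```r
# Hypothetical document-term matrix: rows are reviews, columns are terms.
dtm <- matrix(c(1, 1, 0,
                2, 1, 0,
                0, 0, 1,
                0, 1, 1),
              ncol = 3, byrow = TRUE,
              dimnames = list(paste0("review", 1:4),
                              c("dollar", "bill", "euro")))

# Correlate the "dollar" column with every other term column.
others <- dtm[, colnames(dtm) != "dollar"]
associations <- cor(dtm[, "dollar"], others)[1, ]

# Keep associations at or above a chosen threshold, strongest first.
threshold <- 0.3
strong <- sort(round(associations[associations >= threshold], 2),
               decreasing = TRUE)
strong
```

A score of 1.00 therefore means that two words always rise and fall together across the reviews, which with rare words can simply mean they both appear in a single review.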

The word cloud in the next section has some stand-out words. We can look at their associations as well to follow suggestions from within the data.

Dollar:

bill                         1950s 
0.87                          0.33

Video:

vimeo     amateur      batter       beach     clipart   commodity 
 0.40        0.39        0.39        0.39        0.39        0.39

Bill:

dollar                         1950s 
  0.87                          0.38

Videos are mostly on Vimeo. Dollar and bill occur together, so there are no surprises there.

Word clouds are a good way of quickly visualising word frequency. Here’s one of the words in the reviews:

wordcloud

Using the code from my old posts on Vasari’s Lives and on art bloggers we can find the most similar reviews:

Dominik Podsiadly :  JUST DO IT, Jefta Hoekendijk 

Maximilian Roganov :  Jasper Elings, Jefta Hoekendijk, Keigo Hara, Alfredo Salazar Caro | TMVRTX, Mathieu St-Pierre 

JUST DO IT :  Jefta Hoekendijk, Dominik Podsiadly, Lars Hulst 

Mitch Posada :  Dafna Ganani 

Lorna Mills & Yoshi Sodeoka :  Jennifer Chan

Jasper Elings :  Maximilian Roganov, Curt Cloninger, Adam Braffman, Δεριζαματζορ Προμπλεμ Ιναυστραλια

Alfredo Salazar Caro | TMVRTX :  Nick Briz, Maximilian Roganov 

Dafna Ganani :  Mitch Posada 

Jennifer Chan :  Lorna Mills & Yoshi Sodeoka 

Jefta Hoekendijk :  JUST DO IT, Maximilian Roganov, Lars Hulst, Dominik Podsiadly 

Keigo Hara :  Maximilian Roganov, Nick Briz 

Ellectra Radikal :  Lars Hulst 

A Bill Miller :  Mathieu St-Pierre 

Nicolas Sassoon :  Lars Hulst 

Curt Cloninger :  Jasper Elings, Nick Briz

Δεριζαματζορ Προμπλεμ Ιναυστραλια :  Jasper Elings

Lars Hulst :  Ellectra Radikal, JUST DO IT, Jefta Hoekendijk, Nicolas Sassoon

Nick Briz :  Alfredo Salazar Caro | TMVRTX, Keigo Hara, Curt Cloninger 

Adam Braffman :  Jasper Elings 

Rollin Leonard :  Maximilian Roganov 

Mathieu St-Pierre :  A Bill Miller, José Irion Neto, Maximilian Roganov 

José Irion Neto :  Mathieu St-Pierre 

Do those pairings make sense when you look at the art?

The clustering code from the same old posts produces different groupings:

Cluster 1 : Robert B. Lisek, Geraldine Juarez 

Cluster 2 : Mitch Posada, Nick Kegeyan, Dafna Ganani, Marco Cadioli, Andrey Keske, Guayayo Coco 

Cluster 3 : Rafaël Rozendaal, Adam Ferriss, Aaron Koblin + Takashi Kawashima, Maximilian Roganov, Fabien Zocco, Jasper Elings, Alfredo Salazar Caro | TMVRTX, Anthony Antonellis, Haydi Roket, Keigo Hara, A Bill Miller, Benjamin Berg, Δεριζαματζορ Προμπλεμ Ιναυστραλια, Nick Briz, Vince Mckelvie, Adam Braffman, Rollin Leonard, Mathieu St-Pierre 

Cluster 4 : Dominik Podsiadly, Thomas Cheneseau 

Cluster 5 : Ciro Múseres 

Cluster 6 : Curt Cloninger 

Cluster 7 : Miron Tee, Jan Robert Leegte, Paul Hertz, Jon Cates, León David Cobo, Kamilia Kard 

Cluster 8 : Nuria Güell, Paolo Cirio, Filipe Matos, Agente Doble | UAFC, JUST DO IT, Gustavo Romano, Tom Galle, Cesar Escudero, Jefta Hoekendijk, Gusti Fink, Ellectra Radikal, Aoto Oouchi, Kim Laughton, Martin Kohout, Marc Stumpel, LaTurbo Avedon, Nicolas Sassoon, Erica Lapadat-Janzen, Milos Rajkovic, Rozita Fogelman, Lars Hulst, Yemima Fink, José Irion Neto 

Cluster 9 : Emilio Vavarella 

Cluster 10 : Dave Greber, Lorna Mills & Yoshi Sodeoka, Jennifer Chan, Frère Reinert, V5MT, Addie Wagenknecht, Systaime, Émilie Brout & Maxime Marion, Georges Jacotey

I chose ten clusters arbitrarily. There’s some overlap between the results of the two techniques.
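To show where that arbitrary number enters, here is a small sketch using hierarchical clustering from base R. This is a swapped-in technique for illustration; the clustering code from the old posts may well work differently, and the matrix is made up.

```r
# Hypothetical document-term matrix: four reviews, three terms.
dtm <- matrix(c(3, 0, 0,
                2, 1, 0,
                0, 0, 4,
                0, 1, 3),
              ncol = 3, byrow = TRUE,
              dimnames = list(c("A", "B", "C", "D"),
                              c("dollar", "video", "glitched")))

# Hierarchical clustering on Euclidean distances between reviews.
tree <- hclust(dist(dtm))

# Cut the tree into k groups; k is the arbitrary choice.
k <- 2
clusters <- cutree(tree, k = k)

# List the reviews in each cluster.
split(names(clusters), clusters)
```

Changing `k` regroups the same tree, so it is cheap to compare a few values and see which groupings hold up.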

I wanted to try out Topic Modelling on the data but an algorithm for choosing the optimal number of topics simply returned the same number as there are documents. So I tried 8, 12 and 20.

12 gave “nice” results:

     Topic 1    Topic 2       Topic 3    Topic 4      Topic 5        
[1,] "video"    "mapped"      "price"    "bill"       "animated"     
[2,] "bill"     "dollar"      "changing" "dollar"     "architectural"
[3,] "dollar"   "texture"     "image"    "love"       "euro"         
[4,] "direct"   "bill"        "show"     "artist"     "glitched"     
[5,] "facebook" "virtual"     "allow"    "google"     "graphic"      
[6,] "faster"   "polygons"    "also"     "money"      "money"        
[7,] "page"     "constituent" "analysis" "monochrome" "zoomed"       
[8,] "abstract" "exploding"   "another"  "pixelart"   "1990s"        
     Topic 6      Topic 7           Topic 8       Topic 9    Topic 10  
[1,] "graphic"    "labels"          "dollar"      "dollar"   "texture" 
[2,] "abstract"   "landscape"       "glitched"    "euro"     "blank"   
[3,] "aesthetic"  "album"           "bill"        "note"     "blue"    
[4,] "album"      "animated"        "video"       "animated" "classic" 
[5,] "apparently" "appears"         "aesthetic"   "bill"     "economic"
[6,] "banknotes"  "art"             "application" "image"    "essay"   
[7,] "european"   "banknotecollage" "colour"      "loop"     "euro"    
[8,] "flag"       "banknotes"       "economic"    "american" "show"    
     Topic 11     Topic 12  
[1,] "bill"       "art"     
[2,] "dollar"     "bill"    
[3,] "video"      "depicted"
[4,] "background" "dollar"  
[5,] "flag"       "labour"  
[6,] "loop"       "video"   
[7,] "reactive"   "words"   
[8,] "roughly"    "1950s"   

The topics are clearer with more words; these are just the first few for each one. I think this is the closest to what I want in terms of discovering what I have written about, although as I say the choice is arbitrary (or at least aesthetic rather than statistical).

Using more code from the Vasari/bloggers posts, we can plot the associations between words:

plot

Changing the parameters and outputting to PDF creates a more detailed and readable graph. It’s fun, and in terms of usefulness it sits somewhere between topic modelling and frequency counts.

Finally let’s see how I feel about the art with sentiment analysis:

neutral positive 
     66        3 

I do try to find the positive in artworks but there was one that gave me an immediate and visceral negative reaction in the show (you can spot it if you look hard at the reviews). I’m surprised that so few count as positive. I “love” one of the pieces. Is it in the positive list?

[1] "Martin Kohout" "Marc Stumpel"  "Ciro Múseres"

It’s not. But one of the ones listed does mention “love”, so I don’t know what’s happened there. Sentiment analysis has improved greatly over the last few years, but apparently not in the library I was using.

If I was going to use these techniques to help review art I’d write longer “bag of words” descriptions for each artwork, with fragments of text and individual words acting almost as tags or streams of consciousness, and I would then use topic modelling and clustering to help pull out themes. I’d prefer to use an algorithm to choose the number of topics, as I feel this is more intellectually defensible, but I like the results enough to use it without. I’m disappointed by the performance of the sentiment analysis library I used. Next time I’ll try a different one.

Will there be a next time? Yes, the next time I’m reviewing a group show with more than a few artists. Producing this report has been labour intensive, but I’ve a library of code now and a better understanding of the issues. And I can automate report construction and revision using Knitr, which would allow me to mix Markdown text and R code without having to copy and reformat output.

Work In Progress: Tate Collection Data Movements

Here’s a sneak peek at one of the visualizations from my analysis of the Tate Collection data. It’s a graph of Movements linked by their artists:

tate-movements-sna-preview

More to come soon…

Exploring the Tate Collection Metadata

The Tate have released their collection metadata in an exemplary way here:

https://github.com/tategallery/collection

Let’s explore it using MongoDB, which you can find installation instructions for here.

First fetch and upload the JSON data:

git clone https://github.com/tategallery/collection.git
cd collection
find artists -name '*.json' -exec perl -p -e 's/\n/ /' '{}' \; -exec echo \; | mongoimport --db tate --collection artists
find artworks -name '*.json' -exec perl -p -e 's/\n/ /' '{}' \; -exec echo \; | mongoimport --db tate --collection artworks

Then in the mongo shell we can explore the artists and artworks:

////////////////////////////////////////////////////////////////////////
// Artists
////////////////////////////////////////////////////////////////////////

// List artist movements

db.artists.aggregate(
{$unwind: "$movements"},
{$project: {name: "$movements.name"}},
{$group: {_id: "movements", items: {$addToSet: "$name"}}}
)

// List artist eras

db.artists.aggregate(
{$unwind: "$movements"},
{$project: {name: "$movements.era.name"}},
{$group: {_id: "movements", items: {$addToSet: "$name"}}}
)

// Find artists by movement

db.artists.find({"movements.name":"Pop Art"})

// Find artists by era

db.artists.find({"movements.era.name":"20th century post-1945"})

// Find artists by birth year

db.artists.aggregate(
{$group: {_id: "$birth.time.startYear", artists: {$addToSet: "$fc"}}}
)

// Find artists by death year

db.artists.aggregate(
{$group: {_id: "$death.time.startYear", artists: {$addToSet: "$fc"}}}
)

// Count artists by gender

db.artists.aggregate(
    {$group : {_id : "$gender" , number : {$sum : 1}}},
    {$sort : {number : -1}}
)

// Count artists by birthplace

db.artists.aggregate(
    {$group : {_id : "$birth.place.name" , number : {$sum : 1}}},
    {$sort : {number : -1}}
)

////////////////////////////////////////////////////////////////////////
// Artworks
////////////////////////////////////////////////////////////////////////

// List artwork subject categories

db.artworks.aggregate(
{$unwind: "$subjects.children"},
{$unwind: "$subjects.children.children"},
{$group: {_id: "categories",
           categories: {$addToSet: "$subjects.children.children.name"}}}
)

// List artwork subjects

db.artworks.aggregate(
{$unwind: "$subjects.children"},
{$unwind: "$subjects.children.children"},
{$unwind: "$subjects.children.children.children"},
{$group: {_id: "subjects",
           subjects: {$addToSet: "$subjects.children.children.children.name"}}}
)

// List artwork categories and subjects

db.artworks.aggregate(
{$unwind: "$subjects.children"},
{$unwind: "$subjects.children.children"},
{$unwind: "$subjects.children.children.children"},
{$group: {_id: "category-subjects",
           subjects: {$addToSet: {category: "$subjects.children.children.name",
                       subject:"$subjects.children.children.children.name"}}}}
)

// List artwork movements

db.artworks.aggregate(
{$unwind: "$movements"},
{$group: {_id: "artwork-movements",
           movements: {$addToSet: "$movements.name"}}}
)

// Find artwork by category/subject group

db.artworks.find({"subjects.children.children.name":"UK counties"})

// Find artwork by subcategory/subject

db.artworks.find({"subjects.children.children.children.name":"beacon"})

// Find artwork by artist name

// $elemMatch makes sure the name and the role belong to the same contributor
db.artworks.find({"contributors": {$elemMatch: {"fc":"Andy Warhol", "role":"artist"}}})

// Find artwork by movement. Will exclude works with no movement.

db.artworks.find({"movements.name":"Pre-Raphaelite"})

// Find artworks without movements

db.artworks.find({"movementCount":0})

// Find artwork by date. Will exclude works with unknown date.

db.artworks.find({"dateRange.startYear": {$gte: 1900, $lt: 1910}})

// Find artworks without dates

db.artworks.find({"dateRange":null})

Exploring the data, it becomes clear that the structure of the metadata is wonderfully regular but some of the content is less so. For example, entries in the “artists” data may be attributions to movements rather than individuals, and both movements and individuals may have null gender. Locations in birth and death data can be a town or country name in any language, or a town and country separated by a comma. Not every artwork has a creation date, movements, or subjects.
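As an example of the kind of regularising involved, here is a base R sketch that splits “town, country” strings and sets aside the ambiguous ones. The sample values are made up, and treating the text before the first comma as the town is an assumption about the data.

```r
# Hypothetical birth place strings in the shapes described above.
places <- c("London, United Kingdom", "Paris, France",
            "Deutschland", "Edinburgh", NA)

# Only values containing a comma can be split reliably.
has.comma <- !is.na(places) & grepl(",", places, fixed = TRUE)

# Text before the first comma is taken as the town, the rest as the country.
town <- ifelse(has.comma, trimws(sub(",.*$", "", places)), NA)
country <- ifelse(has.comma, trimws(sub("^[^,]*,", "", places)), NA)

# Values without a comma could be a town or a country in any language,
# so keep them aside for manual review.
unresolved <- ifelse(has.comma, NA, places)

data.frame(raw = places, town, country, unresolved)
```

The unresolved column is where a lookup table of place names, or a pull request to the dataset itself, would come in.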

But this is standard for real-world data, and easy enough to regularise. The community can do this and submit a pull request. What’s important is that this is a high-quality metadata dataset from a world-class art institution. People are already starting to explore and visualise it. See here for a great example:

http://www.shardcore.org/shardpress/index.php/2013/11/06/tate-data-explorer/

Importing Tate Collection Data Into MongoDB

You have to feed records into Mongo one per line. Like this:

find artists -name '*.json' -exec perl -p -e 's/\n/ /' '{}' \; -exec echo \; | mongoimport --db tate --collection artists
find artworks -name '*.json' -exec perl -p -e 's/\n/ /' '{}' \; -exec echo \; | mongoimport --db tate --collection artworks
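The same flattening can be sketched in R, for anyone who would rather not use find and perl. The example builds its own throwaway JSON file in a temporary directory, and the file and directory names are made up for illustration.

```r
# Build a throwaway directory containing one pretty-printed JSON file.
input.dir <- file.path(tempdir(), "artists-example")
dir.create(input.dir, showWarnings = FALSE)
writeLines(c("{", "  \"fc\": \"Example Artist\",",
             "  \"gender\": \"Female\"", "}"),
           file.path(input.dir, "example-1.json"))

# Find every .json file under the directory, as `find -name '*.json'` does.
json.files <- list.files(input.dir, pattern = "\\.json$",
                         recursive = TRUE, full.names = TRUE)

# Join the lines of each file with spaces so each record sits on one line.
records <- vapply(json.files, function(path) {
  paste(readLines(path, warn = FALSE), collapse = " ")
}, character(1), USE.NAMES = FALSE)

# Write one record per line, ready to be passed to mongoimport.
output.file <- file.path(tempdir(), "artists.jsonl")
writeLines(records, output.file)
```

The resulting file could then be loaded with something like mongoimport --db tate --collection artists --file artists.jsonl, pointing at wherever the file was written.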