Exploring Art Data 12

Back to Vasari’s Lives.

We can compare Vasari’s description of Giovanni Cimabue to Wikipedia’s article on the artist.

The results show a surprising degree of similarity:


## install.packages("RCurl")
library(RCurl)
## Strip wiki code
deWikify<-function(text){
## Remove {{stuff}}
text<-gsub("\\{\\{[^}]+\\}\\}", "", text)
## Remove [[stuff]]
text<-gsub("\\[\\[[^]]+\\]\\]", "", text)
## Remove [stuff]
text<-gsub("\\[[^]]+\\]", "", text)
## Remove 
text<-gsub("<[^>]+>", "", text)
## Remove punctuation
#text<-gsub("[[:punct:]]", "", text)
## Lowercase words
text<-tolower(text)
text
}
## Get the text of a page from Wikipedia
getWikipediaArticle<-function(subject){
page<-getURL(paste("http://en.wikipedia.org/w/index.php?title=",
curlEscape(subject), "&action=raw", sep=""),
.opts=list(useragent="Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.1.3) Gecko/20090913 Firefox/3.5.3"))
deWikify(page)
}
cimabuePage<-getWikipediaArticle("Cimabue")
cimabue.corpus<-Corpus(VectorSource(c(artists[1], cimabuePage)),
readerControl=list(language="english",
reader=readPlain))
cimabueDtm<-DocumentTermMatrix(cimabue.corpus)
dissimilarity(cimabueDtm, method="cosine")

They seem reassuringly similar (similarity is 1.0 – dissimilarity):


1 2 0.1079431
Posted in Art History, Art Open Data