Back to Vasari’s Lives.
We can compare Vasari’s description of Giovanni Cimabue to Wikipedia’s article on the artist.
Fetching the Wikipedia text, stripping its markup, and comparing the two documents with a cosine measure shows a surprising degree of similarity:
    ## install.packages("RCurl")
    library(RCurl)
    ## Corpus, DocumentTermMatrix and dissimilarity come from the tm package,
    ## which is assumed to be loaded already (along with the artists vector)

    ## Strip wiki code
    deWikify<-function(text){
      ## Remove {{stuff}}
      text<-gsub("\\{\\{[^}]+\\}\\}", "", text)
      ## Remove [[stuff]]
      text<-gsub("\\[\\[[^]]+\\]\\]", "", text)
      ## Remove [stuff]
      text<-gsub("\\[[^]]+\\]", "", text)
      ## Remove <stuff>
      text<-gsub("<[^>]+>", "", text)
      ## Remove punctuation
      #text<-gsub("[[:punct:]]", "", text)
      ## Lowercase words
      text<-tolower(text)
      text
    }

    ## Get the text of a page from Wikipedia
    getWikipediaArticle<-function(subject){
      page<-getURL(paste("http://en.wikipedia.org/w/index.php?title=",
                         curlEscape(subject), "&action=raw", sep=""),
                   .opts=list(useragent="Mozilla/5.0 (X11; U; Linux i686; fr; rv:1.9.1.3) Gecko/20090913 Firefox/3.5.3"))
      deWikify(page)
    }

    cimabuePage<-getWikipediaArticle("Cimabue")

    ## Build a two-document corpus: Vasari's text and the Wikipedia text
    cimabue.corpus<-Corpus(VectorSource(c(artists[1], cimabuePage)),
                           readerControl=list(language="english", reader=readPlain))
    cimabueDtm<-DocumentTermMatrix(cimabue.corpus)
    dissimilarity(cimabueDtm, method="cosine")
They seem reassuringly similar (similarity is 1.0 – dissimilarity):
              1
    2 0.1079431
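To make the measure concrete, here is a minimal base R sketch of cosine similarity between two texts, followed by the conversion of the dissimilarity above into a similarity (about 0.89). The function name `cosineSimilarity` and its simple letters-only tokenizer are illustrative assumptions, not part of the original code. The tm package tokenizes and filters terms in its own way, so this sketch will not necessarily reproduce the figure above on the same texts.

```r
## Cosine similarity between two texts, using base R only.
## cosineSimilarity is a hypothetical helper written for illustration.
cosineSimilarity<-function(a, b){
  ## Split each text into lowercase words (runs of letters)
  wordsA<-strsplit(tolower(a), "[^a-z]+")[[1]]
  wordsB<-strsplit(tolower(b), "[^a-z]+")[[1]]
  ## Drop the empty strings that strsplit can leave at the start
  wordsA<-wordsA[wordsA != ""]
  wordsB<-wordsB[wordsB != ""]
  ## Count each word over the combined vocabulary of both texts
  vocab<-union(wordsA, wordsB)
  countsA<-as.numeric(table(factor(wordsA, levels=vocab)))
  countsB<-as.numeric(table(factor(wordsB, levels=vocab)))
  ## Cosine of the angle between the two count vectors
  sum(countsA * countsB) / (sqrt(sum(countsA^2)) * sqrt(sum(countsB^2)))
}

## Similarity is 1.0 minus the dissimilarity reported above
1.0 - 0.1079431
```

Identical texts score 1 and texts with no words in common score 0, so a value near 0.89 means the two word-count vectors point in nearly the same direction.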