Analyzing National Geographic Covers

Purely by chance and random surfing, I ended up here, staring at old covers of National Geographic. I wondered whether it is possible to analyze the evolution of the covers in some mildly scientific way. If you accept that pictures are nothing more than three-dimensional arrays, you can do quite a lot with them. The figure below, for example, was created by averaging the RGB values of each pixel over all 1263 available covers.



Let's start from the beginning. I scraped all available National Geographic covers up to December 2000 from here with some simple wget calls. Afterwards, I used a batch script in GIMP to rescale all covers to 630x420 and exported them as PNG files.

The R code below shows how the above figure was created.

library(png)
#get the file names of cover files (pattern is a regular expression)
lf=list.files(pattern="\\.png$")
#sort them in increasing order (initial sorting is lexicographic)
indx=order(as.integer(sapply(lf,function(x){strsplit(x,"-",fixed=TRUE)[[1]][1]})))
lf=lf[indx]
 
A=array(0,dim = c(630,420,3))
for(f in lf){
  tmp=readPNG(f)
  A=A+tmp
}
A=A/length(lf)
writePNG(A,"average.png")

The readPNG function reads each PNG file as a three-dimensional array with values between 0 and 1. Each slice along the third dimension is a 630x420 matrix (rows by columns) that holds one color channel: red, green or blue. After the loop, array A therefore holds the average red, green and blue value of each pixel over all covers.
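To make the array layout concrete, here is a minimal sketch that uses two tiny synthetic "covers" instead of real PNG files. The 2x2 size and the pixel values are made up for illustration; the height x width x channel layout is the same one readPNG produces.

```r
# Two hypothetical 2x2 "covers": one pure red, one pure blue.
# Layout mirrors readPNG output: height x width x channel, values in [0, 1].
red_cover  <- array(0, dim = c(2, 2, 3)); red_cover[, , 1]  <- 1
blue_cover <- array(0, dim = c(2, 2, 3)); blue_cover[, , 3] <- 1

covers <- list(red_cover, blue_cover)

# Element-wise sum of all arrays, divided by their count,
# gives the per-pixel, per-channel average.
avg <- Reduce(`+`, covers) / length(covers)

dim(avg)       # same shape as each cover
avg[1, 1, ]    # red, green and blue value of the top-left pixel
```

Averaging a red and a blue pixel gives half red and half blue, which is the same mixing that produces the blurry average cover.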

Average RGB Color of Covers


Another nifty thing you can do is calculate the average red, green and blue value for each cover.

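library(reshape2) #melt() is assumed to come from reshape2
k=0
rgbs=matrix(0,length(lf),3)
for(f in lf){
  tmp=readPNG(f)
  k=k+1
  #mean over all pixels, separately for each color channel
  rgbs[k,]=apply(tmp,3,mean)
}
df=data.frame(red=rgbs[,1],green=rgbs[,2],blue=rgbs[,3])
df.melt=melt(df)
df.melt$indx=rep(seq_along(lf),3)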


The variable rgbs now holds the average RGB values for each cover. Plotting the values reveals an interesting pattern of how the colors of the cover changed over time.




The red, green and blue lines show the average values stored in rgbs, and the rectangular inset shows the RGB color that corresponds to the three values of each cover.
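As a rough sketch of how three channel means can be turned into a single displayable color, the snippet below uses the base R function rgb(). The matrix rgbs_demo and its values are hypothetical stand-ins for rows of rgbs; this is not necessarily how the original figure was drawn.

```r
# Hypothetical per-cover channel means (rows = covers; columns = red, green, blue).
rgbs_demo <- matrix(c(0.8, 0.6, 0.2,
                      1.0, 0.0, 0.0),
                    ncol = 3, byrow = TRUE)

# rgb() (grDevices, attached by default) maps values in [0, 1] to hex color strings.
cover_cols <- rgb(rgbs_demo[, 1], rgbs_demo[, 2], rgbs_demo[, 3])
cover_cols
```

The resulting strings can be passed to the col argument of base graphics functions such as rect() to draw one colored strip per cover.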

Although the first illustrated cover appeared in July 1942, regularly illustrated covers didn't appear until 1959, which is clearly visible in the figure above. From 1914 until 1959, the cover had a fairly standard layout, so there is not much variety in color. Before 1914, the cover was mostly reddish. After 1959, each cover featured a unique illustration, which explains the strong oscillation.

An Animated Moving Average of Covers


As a last step, we can visualize the evolution of the cover in a small animation, by creating a moving average cover over the whole set.

This is the simple R code I used to generate the frames:
 
#read the png files and save the arrays in a list
lop=list()
library(png)
k=0 #reset the counter left over from the previous loop
for(f in lf){
  tmp=readPNG(f)
  k=k+1
  lop[[k]]=tmp
}
#create a moving average by combining 100 consecutive covers
for(i in 1:(length(lop)-99)){
  A=Reduce("+", lop[i:(i+99)]) / 100
  writePNG(A,paste0(i,".png"))
}

A word of warning: it is better not to hold all arrays in memory at once, as done here. The list lop takes up quite a lot of space (~7.5GB).
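One way around the memory problem is to keep only the covers of the current window and update a running sum. The sketch below is an assumption about how this could be done, not the code used for the animation; moving_average, read_fun and write_fun are hypothetical names, and the demo runs on synthetic 1x1 "covers" so that it needs no files.

```r
# Sliding-window average that holds at most `window` arrays in memory.
# read_fun stands in for readPNG, write_fun for a writePNG wrapper.
moving_average <- function(files, window, read_fun, write_fun) {
  buf <- vector("list", window)   # ring buffer with the last `window` arrays
  total <- 0                      # running sum of the arrays in the buffer
  for (j in seq_along(files)) {
    slot <- (j - 1) %% window + 1
    new <- read_fun(files[j])
    # subtract the array that falls out of the window, if there is one
    if (!is.null(buf[[slot]])) total <- total - buf[[slot]]
    total <- total + new
    buf[[slot]] <- new
    # once the window is full, emit one averaged frame per step
    if (j >= window) write_fun(total / window, j - window + 1)
  }
}

# Demo: synthetic 1x1x3 "covers" whose value is their index divided by 10.
fake_read <- function(f) array(as.numeric(f) / 10, dim = c(1, 1, 3))
frames <- list()
moving_average(as.character(1:5), window = 3, read_fun = fake_read,
               write_fun = function(a, i) frames[[i]] <<- a)
length(frames)   # one frame per window position
```

With the real covers, the call would look like moving_average(lf, 100, readPNG, function(a, i) writePNG(a, paste0(i, ".png"))), which keeps about 100 arrays in memory instead of 1263. Repeated adding and subtracting can accumulate tiny floating-point errors, so the frames may differ from the Reduce version in the last decimal places.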

Afterwards, the video is rendered with a single shell command:
 ffmpeg -i %d.png macover.webm

And this is the final product. Enjoy!

