Sunday, November 22, 2009
CRU Roundup...
The blogosphere is chock full of stuff on the CRU data hack, now being dubbed “ClimateGate”. Some that caught my fancy are here, here, here, and here. The third link is to ClimateAudit's mirror site – Steve McIntyre has been so overwhelmed with traffic the past couple of days that he's had to engage a bigger server!
The lamestream media has fallen on its face with this one, as it has with so many other scandals that required even the tiniest bit of research and imagination. Also, as with some past scandals (such as “RatherGate”), it seems to believe it has a vested interest – this time, in a pro-AGW stance. Pajamas-clad amateurs have shown up the “pros” once again...
Still More on the CRU Data Dump...
Here's one more file from the dump: calibrate_nhrecon.pro. This is another IDL program, and it appears to calibrate tree-ring data against direct temperature measurements:
;Where to start? Here, I guess:
; Calibrates, usually via regression, various NH and quasi-NH records
; against NH or quasi-NH seasonal or annual temperatures.
;
; Specify period over which to compute the regressions (stop in 1960 to avoid
; the decline that affects tree-ring density records)
;
perst=1881. ;note 1
peren=1960.
thalf=10. ; filter to use to give extra hi & lo pass info
;
; Select season of the temperature data against which to calibrate
;
if n_elements(doseas) eq 0 then doseas=0 ; 0=annual, 1=Apr-Sep, 2=Oct-Mar
seasname=['annual','aprsep','octmar']
;
; Select record to calibrate
;
if n_elements(dorec) eq 0 then dorec=0
recname=['esper','jones','mann','3tree','briffa','iwarm','mj2000','crowley03','rutherford04']
print,recname[dorec]
doland=0 ; for Mann et al. only, 0=use NH mean, 1=use land>20N mean
;
if recname[dorec] eq 'mann' then begin
if doland eq 0 then openw,1,'recon_mannNH.out'+seasname[doseas] $
else openw,1,'recon_mannLN20.out'+seasname[doseas]
endif else begin
openw,1,'recon_'+recname[dorec]+'.out'+seasname[doseas]
endelse
;
multi_plot,nrow=2
if !d.name eq 'X' then window,ysize=800
;
; Compute the >20N land instrumental temperature timseries and filter
;
datst=1860
daten=1980
print,'Reading temperatures'
ncid=ncdf_open('/cru/u2/f055/data/obs/grid/surface/crutem2_18512001.mon.nc')
tsmon=crunc_rddata(ncid,tst=[datst,0],tend=[daten,11],grid=gtemp)
ncdf_close,ncid
nmon=12
ntemp=gtemp.nt
nyrtemp=ntemp/nmon
yrtemp=reform(gtemp.year,nmon,nyrtemp)
yrtemp=reform(yrtemp(0,*))
;
; Compute the northern hemisphere >20N land series
;
; First extract >20N rows
kl=where(gtemp.y gt 20.)
ylat=gtemp.y(kl)
tsnorth=tsmon(*,kl,*)
; Compute latitude-weighted mean
nhmon=globalmean(tsnorth,ylat)
; Compute seasonal/annual mean
nhmon=reform(nhmon,nmon,nyrtemp)
case doseas of ;note 2
0: lvy=mkseason(nhmon,0,11,datathresh=6) ; could try 9,8 (Oct-Sep annual)!
1: lvy=mkseason(nhmon,3,8,datathresh=3)
2: lvy=mkseason(nhmon,9,2,datathresh=3)
endcase
;
; Filter it
;
filter_cru,thalf,tsin=lvy,tslow=lvylow,tshigh=lvyhi,/nan
;
; Read in record and filter
;
case recname[dorec] of
'esper': begin
openr,2,'/cru/u2/f055/data/paleo/esper2002/esper.txt'
readf,2,nyr
headst=''
readf,2,headst
rawdat=fltarr(7,nyr)
readf,2,rawdat
close,2
x=reform(rawdat(0,*))
densall=reform(rawdat(1,*))
end
'jones': begin
openr,2,'../tree5/phil_nhrecon.dat'
nyr=992
rawdat=fltarr(4,nyr)
readf,2,rawdat,format='(I5,F7.2,I3,F7.2)'
close,2
x=reform(rawdat(0,*))
densall=reform(rawdat(3,*))
end
'mann': begin
if doland eq 0 then begin
openr,2,'../tree5/mann_nhrecon1000.dat'
nyr=981
rawdat=fltarr(2,nyr)
readf,2,rawdat ;,format='(I6,F11.7)'
close,2
x=reform(rawdat(0,*))
densall=reform(rawdat(1,*))
endif else begin
openr,2,'../tree5/mannarea_all.dat'
nyr=981
rawdat=fltarr(11,nyr)
headdat=' '
readf,2,headdat
readf,2,rawdat ;,format='(I6,F11.7)'
close,2
x=reform(rawdat(0,*))
densall=reform(rawdat(10,*))
endelse
end
'3tree': begin
openr,2,'../tree6/tornyamataim.ave'
readf,2,nnn
rawdat=fltarr(2,nnn)
readf,2,rawdat
close,2
x=reform(rawdat(0,*))
densall=reform(rawdat(1,*))
end
'briffa': begin
restore,filename='/cru/u2/f055/tree6/bandtempNH_calmultipcr.idlsave'
x=yrmxd
densall=prednh
end
'iwarm': begin ; use warm-season instrumental series as the predictor!
x=yrtemp
densall=mkseason(nhmon,3,8,datathresh=3)
end
'mj2000': begin
openr,2,'/cru/u2/f055/data/paleo/ipccar4/data/mann03_orig.dat'
readf,2,ncol
readf,2,icol
readf,2,nyr
readf,2,nhead
headst=strarr(nhead)
rawdat=fltarr(ncol,nyr)
readf,2,headst
readf,2,rawdat
close,2
x=reform(rawdat(0,*))
densall=reform(rawdat(icol,*))
end
'crowley03': begin
openr,2,'/cru/u2/f055/data/paleo/ipccar4/data/crowley03_orig.dat'
readf,2,ncol
readf,2,icol
readf,2,nyr
readf,2,nhead
headst=strarr(nhead)
rawdat=fltarr(ncol,nyr)
readf,2,headst
readf,2,rawdat
close,2
x=reform(rawdat(0,*))
densall=reform(rawdat(icol,*))
end
'rutherford04': begin
openr,2,'/cru/u2/f055/data/paleo/ipccar4/data/rutherford04_orig.dat'
readf,2,ncol
readf,2,icol
readf,2,nyr
readf,2,nhead
headst=strarr(nhead)
rawdat=fltarr(ncol,nyr)
readf,2,headst
readf,2,rawdat
close,2
x=reform(rawdat(0,*))
densall=reform(rawdat(icol,*))
end
endcase
;
oopsx=x
oopsy=densall
;
kl=where((x ge datst) and (x le daten))
x=x(kl)
densall=densall(kl)
if total(abs(yrtemp-x)) ne 0 then message,'Incompatible years'
y1=densall
filter_cru,thalf,tsin=y1,tslow=ylow1,tshigh=yhi1,/nan
;
; Now correlate and regress them
;
printf,1,'Correlations and regression coefficients for '+seasname[doseas]
keeplist=where(finite(y1+lvy) and (x ge perst) and (x le peren),nkeep)
x=x(keeplist)
ts1=y1(keeplist)
ts2=lvy(keeplist)
r1=correlate(ts1,ts2)
dum=linfit(ts1,ts2,sigma=err_sigma)
c1=dum
printf,1,'Full timeseries:',r1,c1
plot,x,ts2
oplot,x,ts1*c1[1]+c1[0],thick=2
;
tsa=ts1(0:nkeep-2)
tsb=ts1(1:nkeep-1)
;mxd_ar1=correlate(tsa,tsb)
mxd_ar1=a_correlate(ts1,[-1])
tsa=ts2(0:nkeep-2)
tsb=ts2(1:nkeep-1)
;nht_ar1=correlate(tsa,tsb)
nht_ar1=a_correlate(ts2,[-1])
printf,1,'AR1 for MXD and NHEMI:',mxd_ar1,nht_ar1
;
tspred=c1(0)+c1(1)*ts1
tswant=ts2
plot,tspred,ts2,psym=1,$
xtitle='Scaled Esper et al. anomaly (!Uo!NC)',$
ytitle='Northern Hemisphere temperature anomaly (!Uo!NC)',$
/xstyle,xrange=[-0.6,0.3],$
/ystyle,yrange=[-0.6,0.3]
oplot,!x.crange,[0.,0.],linestyle=1
oplot,[0.,0.],!y.crange,linestyle=1
dum=linfit(tspred,ts2,sigma=se)
oplot,!x.crange,!x.crange*dum(1)+dum(0)
oplot,!x.crange,!x.crange*dum(1)+dum(0)+se(0)
oplot,!x.crange,!x.crange*dum(1)+dum(0)-se(0)
oplot,!x.crange,!x.crange*(dum(1)+se(1))+dum(0)
oplot,!x.crange,!x.crange*(dum(1)-se(1))+dum(0)
oplot,!x.crange,!x.crange*(dum(1)+se(1))+dum(0)+se(0)
oplot,!x.crange,!x.crange*(dum(1)-se(1))+dum(0)-se(0)
oplot,!x.crange,!x.crange*(dum(1)+se(1))+dum(0)-se(0)
oplot,!x.crange,!x.crange*(dum(1)-se(1))+dum(0)+se(0)
;
ts1=yhi1(keeplist)
ts2=lvyhi(keeplist)
r2=correlate(ts1,ts2)
dum=linfit(ts1,ts2)
c2=dum
printf,1,'High-pass :',r2,c2
;
ts1=ylow1(keeplist)
ts2=lvylow(keeplist)
r3=correlate(ts1,ts2)
dum=linfit(ts1,ts2)
c3=dum
printf,1,'Low-pass :',r3,c3
;
; Now compute the rms error between the reconstruction and the original
; timeseries
;
tserr=tswant-tspred
rmserr=sqrt( total( tserr^2 ) / float(n_elements(tserr)) )
printf,1,'RMS error between land>20Napr-sep temperature and Esper et al. reconstruction'
printf,1,rmserr
printf,1,'Uncertainties surrounding regression coefficients'
printf,1,err_sigma
;
printf,1,' '
printf,1,'Computations carried out over the period ',perst,peren
printf,1,' '
printf,1,'To separate low and high frequency components, a gaussian weighted'
printf,1,'filter was used with a half-width (years) of ',thalf
;
close,1
;
end
1. Another arbitrary-looking choice of limits. In this case, they've selected the period 1881 through 1960 to correlate the two data series. The comment above, however, is very suggestive of why the 1960 cutoff was chosen.
2. This piece of code looks like classic cherry-picking and fudging. Here the code's author is choosing arbitrary months and data thresholds when computing the seasonal or annual mean of the instrumental data (which will be used to calibrate the tree-ring data). The comment reinforces this notion.
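For the curious, here's roughly what that seasonal sampling seems to be doing. This is a minimal Python sketch of my *guess* at mkseason's semantics – the real routine is internal CRU code not included in the dump, so the function below is an assumption, and it doesn't handle the Oct-Mar wrap-around case:

```python
import numpy as np

def seasonal_mean(monthly, m0, m1, datathresh):
    """Average months m0..m1 (0-based, inclusive) for each year.

    A year's value is NaN unless at least `datathresh` of those months
    are present. This is a guess at mkseason's behavior, not the actual
    CRU routine; the wrap-around season case (e.g. Oct-Mar) is ignored.
    """
    sel = monthly[m0:m1 + 1, :]               # (months, years) slice
    nvalid = (~np.isnan(sel)).sum(axis=0)     # valid months per year
    means = np.nanmean(sel, axis=0)
    means[nvalid < datathresh] = np.nan       # enforce the data threshold
    return means

# Annual mean (months 0-11) requiring at least 6 valid months, as in the
# script's "datathresh=6" case:
monthly = np.full((12, 3), np.nan)
monthly[:, 0] = 1.0    # complete year
monthly[:7, 1] = 2.0   # 7 valid months -- kept
monthly[:3, 2] = 3.0   # only 3 valid months -- rejected
print(seasonal_mean(monthly, 0, 11, datathresh=6))
```

Note how the threshold silently converts sparsely observed years into missing values – reasonable in itself, but the choice of threshold changes which years survive into the calibration.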
Something that strikes me as I read through all this code: it looks amateurish, not at all like what I'd imagine code written by scientists expecting peer review would produce. I'm no scientist, so it may well just reflect my naiveté about things scientific. But I'm quite disappointed in it...
More on the CRU Data Dump...
The blogosphere is chock full of stuff this morning, mainly analyzing the emails and documents (more on that in another post). So this morning I thought I'd take a gander at some of the software code included in the hacker's dump. Below is the contents of an IDL file named briffa_sep98_e.pro. This code appears to be used to pre-process raw tree ring data. I've marked areas of interest with a numbered comment (e.g. “;note 1”) keyed to the footnotes that follow the code; otherwise the file's contents are unmodified:
;Well. Where to start? I guess I'll just dive right in...
; PLOTS 'ALL' REGION MXD timeseries from age banded and from hugershoff
; standardised datasets.
; Reads Harry's regional timeseries and outputs the 1600-1992 portion
; with missing values set appropriately. Uses mxd, and just the
; "all band" timeseries
;****** APPLIES A VERY ARTIFICIAL CORRECTION FOR DECLINE*********
;
yrloc=[1400,findgen(19)*5.+1904] ;note 1
valadj=[0.,0.,0.,0.,0.,-0.1,-0.25,-0.3,0.,-0.1,0.3,0.8,1.2,1.7,2.5,2.6,2.6,$
2.6,2.6,2.6]*0.75 ; fudge factor
if n_elements(yrloc) ne n_elements(valadj) then message,'Oooops!'
;
loadct,39
def_1color,20,color='red'
plot,[0,1]
multi_plot,nrow=4,layout='large'
if !d.name eq 'X' then begin
window, ysize=800
!p.font=-1
endif else begin
!p.font=0
device,/helvetica,/bold,font_size=18
endelse
;
; Get regional tree lists and rbar
;
restore,filename='reglists.idlsave'
harryfn=['nwcan','wnam','cecan','nweur','sweur','nsib','csib','tib',$
'esib','allsites']
;
rawdat=fltarr(4,2000)
for i = nreg-1 , nreg-1 do begin
fn='mxd.'+harryfn(i)+'.pa.mean.dat'
print,fn
openr,1,fn
readf,1,rawdat
close,1
;
densadj=reform(rawdat(2:3,*))
ml=where(densadj eq -99.999,nmiss)
densadj(ml)=!values.f_nan
;
x=reform(rawdat(0,*))
kl=where((x ge 1400) and (x le 1992)) ;note 3
x=x(kl)
densall=densadj(1,kl) ; all bands
densadj=densadj(0,kl) ; 2-6 bands
;
; Now normalise w.r.t. 1881-1960
;
mknormal,densadj,x,refperiod=[1881,1960],refmean=refmean,refsd=refsd
mknormal,densall,x,refperiod=[1881,1960],refmean=refmean,refsd=refsd
;
; APPLY ARTIFICIAL CORRECTION
;
yearlyadj=interpol(valadj,yrloc,x) ;note 2
densall=densall+yearlyadj
;
; Now plot them
;
filter_cru,20,tsin=densall,tslow=tslow,/nan
cpl_barts,x,densall,title='Age-banded MXD from all sites',$
xrange=[1399.5,1994.5],xtitle='Year',/xstyle,$
zeroline=tslow,yrange=[-7,3]
oplot,x,tslow,thick=3
oplot,!x.crange,[0.,0.],linestyle=1
;
endfor
;
; Restore the Hugershoff NHD1 (see Nature paper 2)
;
xband=x
restore,filename='../tree5/densadj_MEAN.idlsave'
; gets: x,densadj,n,neff
;
; Extract the post 1600 part
;
kl=where(x ge 1400)
x=x(kl)
densadj=densadj(kl)
;
; APPLY ARTIFICIAL CORRECTION
;
yearlyadj=interpol(valadj,yrloc,x) ;note 2
densadj=densadj+yearlyadj
;
; Now plot it too
;
filter_cru,20,tsin=densadj,tslow=tshug,/nan
cpl_barts,x,densadj,title='Hugershoff-standardised MXD from all sites',$
xrange=[1399.5,1994.5],xtitle='Year',/xstyle,$
zeroline=tshug,yrange=[-7,3],bar_color=20
oplot,x,tshug,thick=3,color=20
oplot,!x.crange,[0.,0.],linestyle=1
;
; Now overplot their bidecadal components
;
plot,xband,tslow,$
xrange=[1399.5,1994.5],xtitle='Year',/xstyle,$
yrange=[-6,2],thick=3,title='Low-pass (20-yr) filtered comparison'
oplot,x,tshug,thick=3,color=20
oplot,!x.crange,[0.,0.],linestyle=1
;
; Now overplot their 50-yr components
;
filter_cru,50,tsin=densadj,tslow=tshug,/nan
filter_cru,50,tsin=densall,tslow=tslow,/nan
plot,xband,tslow,$
xrange=[1399.5,1994.5],xtitle='Year',/xstyle,$
yrange=[-6,2],thick=3,title='Low-pass (50-yr) filtered comparison'
oplot,x,tshug,thick=3,color=20
oplot,!x.crange,[0.,0.],linestyle=1
;
; Now compute the full, high and low pass correlations between the two
; series
;
perst=1400.
peren=1992.
;
openw,1,'corr_age2hug.out'
thalf=[10.,30.,50.,100.]
ntry=n_elements(thalf)
printf,1,'Correlations between timeseries'
printf,1,'Age-banded vs. Hugershoff-standardised'
printf,1,' Region Full <10 >10 >30 >50 >100'
;
kla=where((xband ge perst) and (xband le peren))
klh=where((x ge perst) and (x le peren))
ts1=densadj(klh)
ts2=densall(kla)
;
r1=correlate(ts1,ts2)
rall=fltarr(ntry)
for i = 0 , ntry-1 do begin
filter_cru,thalf(i),tsin=ts1,tslow=tslow1,tshigh=tshi1,/nan
filter_cru,thalf(i),tsin=ts2,tslow=tslow2,tshigh=tshi2,/nan
if i eq 0 then r2=correlate(tshi1,tshi2)
rall(i)=correlate(tslow1,tslow2)
endfor
;
printf,1,'ALL SITES',r1,r2,rall,$
format='(A11,2X,6F6.2)'
;
printf,1,' '
printf,1,'Correlations carried out over the period ',perst,peren
;
close,1
;
end
1. This code constructs a pair of 20-element arrays that are used later as inputs to an interpolation routine. As the comments "fudge factor" and "very artificial correction for decline" would seem to indicate, these values appear to be completely arbitrary, unsupported by any underlying theory about why the raw data should be manipulated this way. I'll tell you what it looks like to this ancient, gray-bearded software developer who has constructed many software models in his career: it looks like someone manipulating a model's input to get the desired results. Publicly such a modeler would proclaim: “Lookee here! I crunched the tree-ring data and got this result!” – when the reality is that he “fudged” and “corrected” the data until the results matched his preconceived, desired outcome. Smoking code gun number one.
2. These two places are where the “very artificial correction” is being applied.
3. This code limits the data in the results to the years 1400 through 1992. It would be interesting to know why those particular years were chosen. I'm especially intrigued by the 1992 cutoff, as I've read elsewhere that late-20th-century tree ring data doesn't support the notion that tree ring density is a good proxy for temperature. Could that divergence be the reason for cutting the series off there?
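The correction itself is easy to reproduce. Here's a Python sketch with yrloc and valadj copied straight from the file, using numpy.interp in place of IDL's interpol (both do piecewise-linear interpolation in this usage):

```python
import numpy as np

# yrloc and valadj transcribed from briffa_sep98_e.pro.
yrloc = np.concatenate(([1400.0], np.arange(19) * 5.0 + 1904.0))
valadj = np.array([0.0, 0.0, 0.0, 0.0, 0.0, -0.1, -0.25, -0.3, 0.0, -0.1,
                   0.3, 0.8, 1.2, 1.7, 2.5, 2.6, 2.6, 2.6, 2.6, 2.6]) * 0.75

years = np.arange(1400, 1993)
yearlyadj = np.interp(years, yrloc, valadj)   # the per-year "correction"

# The adjustment is zero for five centuries, dips slightly in the
# mid-20th century, then ramps steeply upward after about 1950:
for y in (1900, 1930, 1960, 1980, 1992):
    print(y, round(float(yearlyadj[years == y][0]), 3))
```

By the end of the series the additive adjustment tops out at 2.6 × 0.75 = 1.95 – apparently in the normalized units of the density series, since the correction is applied after the 1881-1960 normalization step. Whatever the units, the shape of the curve is a hand-drawn ramp, which is what makes the word “artificial” in the code's own comment seem so apt.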