Projects‎ > ‎Solar activity‎ > ‎

2. Processing the data

We select one file - arbitrarily - to analyze a little further, prior to batch processing all files. We convert its header to a list for convenience.

sdo = readFITS("AIA20100513_0000_0211.fits")
Hdr2List = function(h) { 
    myL = list() 
    i = 1 
    while (i < length(h)) { 
        myL[[h[i]]] = h[i+1] 
        i = i + 2 
hdrlst = Hdr2List(sdo$hdr) 

Information on the keywords in the header (hdrlst) can be found here. Next we get the actual image into a matrix, and obtain information on the coordinate axes. Then we plot the image.

x = sdo$imDat
ax1 = axVec(1,sdo$axDat)
ax2 = axVec(2,sdo$axDat)

This results in the following image.

The distribution of the pixel values is such that there are only few at the higher range. As is common in solar imagery, log-transforming may improve matters.

xlog = x 
xlog[x < 1] = 1 # avoid taking log of zero or negatives
xlog = log(xlog)

Next we construct a mask to remove all pixels not on the solar disk. We retrieve the sun's radius in image coordinates from data in the header of the file.

sunR = as.numeric(hdrlst["IMSCL_MP"]) * as.numeric(hdrlst["R_SUN"])   # the sun's radius
M = outer(ax1^2,ax2^2,FUN="+") < sunR^2   # mask: TRUE if pixel in disk, FALSE otherwise
xlogM = xlog * M 

We do a simple analysis of the pixels representing the solar disk. We use the mask M to obtain the relevant data points. The functions summary() and quantile() give some distributional characteristics of the data.

xsun = x[M]
quantile(xsun, probs = seq(0, 1, 0.1))

The above can be done in batch, on all available data files. The results are accumulated in a data frame, which is written to disk. This data frame with results is analyzed in the next section (see link below). The following code implements the batch processing.

allFiles = dir("D:\\data\\SDOAIA\\") # replace with relevant directory!
A = data.frame(file = allFiles)
A$date = substr(A$file,4,11)

processFile = function(fileName) {
    sdo = readFITS(paste( "D:\\data\\SDOAIA\\" ,fileName,sep=""))
    hdrlst = Hdr2List(sdo$hdr) 
    myR = as.numeric(hdrlst["IMSCL_MP"]) * as.numeric(hdrlst["R_SUN"])
    ax1 = axVec(1,sdo$axDat)
    ax2 = axVec(2,sdo$axDat)
    sunMask = outer(ax1^2,ax2^2,FUN="+") < myR^2
    xsun = sdo$imDat[sunMask]
    res =, probs = seq(0, 1, 0.1))))
    res$sum = sum(xsun)
    res$pixcount = length(xsun)
    res$mean = mean(xsun)

b = processFile(A$file[1])
A = cbind(A,b) # now all columns exist
resnames = names(b)

for (f in A$file) {
    A[A$file == f,resnames] = processFile(f)
    print(A[A$file == f,])
names(A)[3:13] = paste("q",seq(0,100,10),sep="") # other names, for convenience later on
A$datechar = paste(substr(A$date,1,4),substr(A$date,5,6),substr(A$date,7,8),sep="-")
A$date = as.Date(A$datechar,format="%Y-%m-%d") # make date of date class, again for convenience later

The result is a file containing the data frame A, ready for further analysis.

Note: we encountered some corrupt files. The above routine failed on some files, and others resulted in odd statistics. Some manual investigation and comparisons with the input files learned that those files which had negative values in their 'xsun' pixels (see code above) indeed had corrupt images (noise, stripes, blank areas etc). Of the 760 files considered, 6 had issues. We removed them manually from the data frame. We have not investigated whether the files on the server are corrupt, or whether they got corrupted in the process of downloading. Code related to this issue is not shown.