Last review date
2010-03-25
Reviewer
Marco Bencivenni
Enrico Fattibene

Astrophysics: analysis of CMB data



The Planck satellite is producing a map of the Cosmic Microwave Background (CMB), which will help cosmologists understand the peculiarities of the Big Bang and some characteristics of the subsequent evolution of the Early Universe. The data collected by Planck are unavoidably contaminated by noise, both of astrophysical origin and from the electronic circuits of the satellite itself. Cosmologists apply filtering techniques to computer-generated simulations in order to enhance signals at a given scale, and compare them with the Planck data. Both the simulations and the Planck data then undergo a statistical analysis to highlight possible anomalies or peculiarities in the CMB map.

Running the jobs

A total of 11,000 simulations of CMB maps are run, which translates into 11,000 independent jobs sent to the Grid. Each job simulates maps at four of the nine frequency channels of the Planck satellite detectors. White noise is then added to each map, and the resulting map is filtered. The job also performs the statistical analysis of each filtered map at several size scales. The final output of a job is typically a fairly small file, which is saved in a Storage Element (SE). Each job takes about 7 CPU hours and runs on a single CPU.
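The figures above give a rough idea of the total computing effort. The following is a minimal sketch of that arithmetic; the number of concurrently available CPUs (500) is a purely hypothetical value chosen for illustration, and the wall-clock estimate ignores queueing and data transfer time.

```shell
#!/bin/sh
jobs=11000          # number of independent jobs (from the text)
hours_per_job=7     # approximate CPU hours per job (from the text)
slots=500           # HYPOTHETICAL number of CPUs available at the same time

total_cpu_hours=$((jobs * hours_per_job))

# Integer ceiling division: jobs run in "waves" of at most $slots at a time
waves=$(( (jobs + slots - 1) / slots ))
wall_hours=$((waves * hours_per_job))

echo "Total CPU time: ${total_cpu_hours} hours"
echo "Idealised wall-clock time with ${slots} slots: ${wall_hours} hours"
```

Because the jobs are independent, the wall-clock time scales down almost linearly with the number of available CPUs, which is what makes this workload a good fit for the Grid.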

The JDL

A sample Job Description Language (JDL) file would look like the following:

Executable          = "sim00001.sh";
StdOutput           = "sim.out";
StdError            = "sim.err";
InputSandbox        = {"sim00001.sh"};
OutputSandbox       = {"sim.out","sim.err"};
VirtualOrganisation = "planck";

The JDL is of the simplest kind: it requests that a shell script (which must be provided in the input sandbox) be run on a Worker Node (WN). That shell script takes care of all the operations required to run the job.
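Since every simulation is an independent job, each one needs its own JDL file. A minimal sketch of how such files could be generated is given below. It assumes that all scripts follow the zero-padded naming of the sample (sim00001.sh, sim00002.sh, and so on); the helper name write_jdl and the temporary output directory are hypothetical, and this is not necessarily how the original JDL files were produced.

```shell
#!/bin/sh
# HYPOTHETICAL helper: write the JDL for simulation number $1 into directory $2.
# Assumes the naming pattern of the sample JDL (sim00001.sh) for every job.
write_jdl() {
    num=$(printf '%05d' "$1")          # zero-pad the job number to 5 digits
    cat > "$2/sim${num}.jdl" << EOF
Executable          = "sim${num}.sh";
StdOutput           = "sim.out";
StdError            = "sim.err";
InputSandbox        = {"sim${num}.sh"};
OutputSandbox       = {"sim.out","sim.err"};
VirtualOrganisation = "planck";
EOF
}

# Example: generate the JDL files for the first three simulations
outdir=$(mktemp -d)
for n in 1 2 3; do
    write_jdl "$n" "$outdir"
done
ls "$outdir"
```

For the full run, the loop would cover the job numbers 1 to 11000, and each generated file would then be submitted with the job submission command of the Grid middleware in use.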

The shell script

As mentioned above, a bare shell script controls the execution of all the steps in the job. An example of such a script is dissected as follows:

lcg-cp -v --vo planck lfn:/path-to-data/simulation-data.tar \
    file:///tmp/simulation-data.tar
tar -xf /tmp/simulation-data.tar

An archive containing the data and executables required for the execution is copied over from an SE and unpacked locally.
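Transfers from an SE can fail temporarily, and a job that continues without its input data wastes its CPU time. The sketch below shows one way to guard this step with a small retry wrapper. The function name retry and the choice of three attempts are hypothetical and not part of the original script; the lcg-cp invocation is shown only as a comment, with the same placeholder paths as in the example above.

```shell
#!/bin/sh
# HYPOTHETICAL helper: run a command up to $1 times, stopping at the first success.
# Returns 0 if any attempt succeeds, 1 if all attempts fail.
retry() {
    attempts=$1
    shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $i of $attempts failed: $*" >&2
        i=$((i + 1))
    done
    return 1
}

# Intended use on a Worker Node (not executed here):
#   retry 3 lcg-cp -v --vo planck lfn:/path-to-data/simulation-data.tar
#       file:///tmp/simulation-data.tar || exit 1
#   tar -xf /tmp/simulation-data.tar || exit 1
```

Exiting with a non-zero status as soon as the staging step fails makes the failure visible in the job status instead of producing empty or misleading output files.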

cat > input.txt << input.txt
rm -rf input.txt

An input file is built and fed to the executable that generates the simulated maps at the selected frequencies (70, 100, 143 and 217 GHz). This executable also adds the appropriate white noise to the maps and uses them to generate a single combined map.

pwd
ls
ls -1 CMBGAUSS_SIMU_COMB_*.fits > names_simus.txt

cat > input_wavelets.txt << input_wavelets.txt

rm -rf input_wavelets.txt

This part generates an input file which is fed to a second executable. The executable filters the combined map with a Spherical Mexican Hat Wavelet (SMHW) at several size scales. The filtered maps are finally used to produce a statistical analysis.

tar cvf CMBGAUSS_SIMU_COMB_00001.tar CMBGAUSS_SIMU_COMB_*.fits
lcg-cr --vo planck -d selected-SE -l \
    lfn:/path-to-save/CMBGAUSS_SIMU_COMB_00001.tar \
    file://$PWD/CMBGAUSS_SIMU_COMB_00001.tar

lcg-cr --vo planck -d selected-SE -l \
    lfn:/path-to-save/stats_fullsky_00001.dat \
    file://$PWD/stats_fullsky_00001.dat

Finally, two output files are uploaded to the SE. The first one contains the combined simulated map generated by the first executable, and is about 55 MB in size. The second one contains the results of the statistical analysis produced by the second executable, and is about 30 kB in size.
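The per-job file sizes above also give a rough estimate of the storage needed on the SE for the whole run. The following sketch works through that arithmetic, assuming that every one of the 11,000 jobs uploads both files and using decimal units (1 GB = 1000 MB); the actual totals depend on how many jobs complete successfully.

```shell
#!/bin/sh
jobs=11000     # number of jobs (from the text)
map_mb=55      # approximate size of each combined map archive, in MB
stats_kb=30    # approximate size of each statistics file, in kB

total_map_gb=$((jobs * map_mb / 1000))       # MB -> GB, decimal units
total_stats_mb=$((jobs * stats_kb / 1000))   # kB -> MB, decimal units

echo "Combined maps: about ${total_map_gb} GB"
echo "Statistics files: about ${total_stats_mb} MB"
```

The combined maps dominate the storage requirement by several orders of magnitude, so they are the files to consider first when choosing the destination SE.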