Programming in C
C has been the de facto standard for system programming for more than two decades. Some of its features (dynamic memory allocation, structures, etc.) have made it increasingly attractive for implementing scientific codes. Several factors still favor Fortran, however: C lacks built-in complex arithmetic (unlike Fortran) and an established standard complex arithmetic library; the simplicity of Fortran usually makes the job of optimizing compilers easier, so a Fortran code is more likely than not to be faster than its C equivalent; and there is a huge volume of Fortran-based scientific library subroutines. As a result, many scientists and engineers program in mixed mode: partly in C (mostly the basic structure of the code, including memory allocation) and partly in Fortran (for the compute-intensive parts).
- Compiling & Linking
To compile a C program in the file c_program.c (note that on most systems C programs need to be in files with a .c extension):
% cc c_program.c -o executable_name
If you omit -o executable_name the executable binary will be called a.out by default.
If your program is split over more than one file, you can either compile them all on the same line:
% cc c_program1.c c_program2.c -o executable_name
or compile them separately and then link the resulting object files (".o" files) together:
% cc -c c_program1.c
% cc -c c_program2.c
% cc c_program1.o c_program2.o -o executable_name
If some of the include files that you require (via #include <include_file.h> statements in your C source code) are not in the standard search paths used by the C preprocessor cpp (which the C compiler calls automatically), you can specify the extra directories with the -Iinclude_search_path flag:
% cc -I/usr/local/mpich/include -c mpi.c
Similarly, if some of the library functions that your program uses are not found in the standard libraries the linker searches for a C program, you have to specify the libraries yourself, sometimes including the path to the library if it is not in a standard directory the linker looks in:
% cc mathc_program1.o mathc_program2.o -o executable_name -lm
% cc mpi.o -o mpi-test -L/usr/local/mpich/lib/IRIX/ch_shmem -lmpi -lm
If you are using a library whose filename is libblas.a, you specify it as -lblas.
cc is the standard name for a C compiler but not the only option:
- On all systems:
- gcc
- The GNU C compiler, a portable C compiler. It may not work on the SGIs from time to time. On the IBMs it is called rs6000-ibm-aix3.2.5-gcc.
- On the Suns running SunOS 4.1.* (Solaris 1.*) we have:
- Sun's K&R C compiler - usually slower than both acc and gcc.
- Sun's ANSI C compiler. Check the man pages for the flags that give the various language standards, the default being ANSI C plus Sun C compatibility extensions, without semantic changes required by ANSI C. Where Sun C and ANSI C specify different semantics for the same construct, the compiler will issue warnings about the conflict and use the Sun C interpretation. Usually produces the fastest binaries.
- On the Suns running Solaris 2.*, 7, 8 (SunOS 5.*, 5.7, 5.8) we have:
- cc, c89, acc
- Sun's ANSI compliant C compiler. Check the man pages for the flags that give the various language standards.
- On the SGIs we have:
- The MIPS (IRIX 5.*) and MIPSpro (IRIX 6.*) optimizing compiler. Check the man pages for the flags that give the various language standards, the default being extended ANSI/ISO.
- On the IBMs we have:
- cc, xlc & c89
- IBM's highly optimizing C compiler, with language standards conforming to extended C (cc) or ANSI C (xlc, c89).
- Optimization flags
Once you have developed and debugged a code, please make sure you compile it with optimization for any production runs. Running unoptimized production code wastes your time as well as CFM computing resources. You do need to take care, though: high optimization levels can alter the semantics of your code and produce significantly different, and hence erroneous, results. Check the man page of each compiler to see which optimization flags pose such a threat. When you use such flags, test your optimized code by comparing the results of one or more runs with those of the same code compiled with no optimization. If the difference is small (of the order of machine or algorithmic accuracy), go ahead and use the optimized code. If the difference is large enough for the results to be wrong, choose a lower optimization level and try again. Despite the extra trouble, please do compile your code optimized: you may be very surprised by how much the run time decreases, especially if the code is well written. And always remember that the best optimization is usually a better algorithm.
It is suggested that you look up the man pages for the compiler you plan to use for the best results. (Note that sometimes even supposedly safe optimization options could cause problems due to bugs in the optimizer.) Suggestions for optimization flags for the C compilers on our systems are as follows:
- All systems
- gcc 2.7.2
- safe: -O3
- high: -O3 -ffast-math -funroll-loops
- On the Suns with SunOS 4.1.* (Solaris 1.*):
- safe: -O2
- high: -O4 -dalign
- safe: -fast
- high: -fast -native -O3 -bsdmalloc (-O4 will also inline routines)
- On the Suns with Solaris 2.*, 7, 8 (SunOS 5.*, 5.7, 5.8):
- safe: -xO3 -xdepend -xchip=generic -xarch=generic
- high: -fast -xO4 -xdepend -xtarget=native
- For some reason, on UltraSparc-based machines at least, this flag combination currently chooses the wrong libraries. Please use:
-fast -xO4 -xdepend -xtarget=native -xarch=v8plusa -xchip=ultra
- You can also experiment with use of -xO5 instead of -xO4.
- Adding -xsafe=mem can also help sometimes.
- for autoparallelizing add: -xautopar -xloopinfo -xreduction
- Link with: -lfast
- On the SGIs with IRIX 5.3:
- safe: -O2
- high: -O2 -sopt,-r=3,-so=4,-o=5,-lo=s
- can use -O3 for even higher optimization but then -c has to be replaced by -j
- On the SGIs with IRIX 6.*:
- safe: -O2 -n32
- high: -O3 -n32 -OPT:roundoff=3:IEEE_arithmetic=3
- higher & unsafer: -O3 -n32 -OPT:roundoff=3:IEEE_arithmetic=3:alias=restrict
- very high with interprocedural optimization: -Ofast=IP??
To find the platform number IP?? execute uname -m. This option may break the correctness of your code though.
- For an executable tuned for a specific architecture, add one of the following (to find out the processor of your machine, execute hinv):
- -r5000 for an R5000 based machine.
- -r8000 for an R8000 based machine.
- -r10000 for an R10000 based machine.
- Do look at the man pages for the compilers, as there is a multitude of beneficial options one can try.
- For the time being, -v6 may need to be added to the flags above, as the MIPSpro7.1 compilers produce really slow code in a few cases; -v6 selects the MIPSpro6.2 compilers instead.
- If the processor is an R4*00, -mips3 should be added to the lines above; otherwise -mips4 should be added. hinv -t cpu should give this information.
- If your machine is still running IRIX6.0.1 (but not for long) or you require 64 bit addressing for very large datasets (greater than 2 GB), then replace -n32 with -64.
- Sometimes -O3 will produce wrong code - try using -O2 or even -O1 instead, leaving all other flags as they are.
- On the IBMs:
- safe: on SP2 wide/thin nodes, like cws.cfm.brown.edu: -O2 -qarch=pwr2 -qtune=pwr2 -Q
- safe: on SP2 silver nodes, like control.cfm.brown.edu: -O2 -qarch=ppc -qtune=604 -Q
- safe: on Power3 nodes, like control.cfm.brown.edu: -O2 -qarch=pwr3 -qtune=pwr3 -Q
- high: -O3 -qarch=pwr2 -qtune=pwr2 -Q -qinlglue -qproto
- high but safer: -O3 -qarch=pwr2 -qtune=pwr2 -Q -qstrict -qinlglue -qproto
- kxlc (KAP preprocessor driver for kapc followed by xlc). KAP is usually beneficial to the performance of your code.
- high: -O3 -qarch=pwr2 -qtune=pwr2 -Q -qinlglue -qproto +K5 +Ktmpdir=/tmp +Kargs=-r=3:-chs=128
- In rare cases, using -O2 instead of -O3 is actually the better option.
- For interprocedural optimizations add -qipa. Look at the man page for the compiler for more details.
- For cws.cfm.brown.edu replace -qarch=pwr2 -qtune=pwrx with -qarch=pwr -qtune=pwr and don't bother with KAP.
- IBM's Optimization and Tuning Guide for Fortran, C, and C++ can be seen online by typing: info -l xlf
- Keep in mind that single precision calculations on the Power and Power2 processors on our IBMs are significantly slower than double precision ones - so use single precision only if you have to and use the relevant flags (see the man page) to speed code up.
- If you are unsure about your architecture, you can also try -qarch=auto -qtune=auto
- For more C related information