Programming in C

C has been the de facto standard for system programming for more than two decades. Some of its features (such as dynamic memory allocation and structures) have made it increasingly attractive for implementing scientific codes. Several factors still work against it, however: the lack of built-in complex arithmetic (unlike Fortran); the absence of an established standard complex arithmetic library; the fact that in most cases the simplicity of Fortran makes the job of optimizing compilers easier (so a Fortran code is more likely than not to be faster than its C equivalent); and the huge volume of Fortran-based scientific library subroutines. As a result, many scientists and engineers program in mixed mode: partly in C (mostly the basic structure of the code, including memory allocation) and partly in Fortran (for the compute-intensive parts).
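The mixed-mode split described above might look roughly like the following sketch. The names make_scaled_ramp and DSCALE are hypothetical, and the calling convention shown (a trailing underscore on the external name, every argument passed by reference) is an assumption that holds for many Fortran compilers but not all of them, so check the manual for yours. The kernel is written in C here only so that the sketch links on its own.

```c
#include <stdlib.h>

/*
 * Prototype the C side would declare for a hypothetical Fortran routine
 *     SUBROUTINE DSCALE(N, ALPHA, X)
 * Assumptions (compiler dependent): the Fortran compiler appends an
 * underscore to the external name and passes all arguments by reference.
 */
void dscale_(const int *n, const double *alpha, double *x);

/* C part: owns the basic structure of the code and the memory allocation. */
double *make_scaled_ramp(int n, double alpha)
{
    double *x;
    int i;

    if (n <= 0)
        return NULL;
    x = malloc((size_t)n * sizeof *x);
    if (x == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        x[i] = (double)(i + 1);      /* 1, 2, ..., n */
    dscale_(&n, &alpha, x);          /* hand the data to the compute kernel */
    return x;                        /* caller is responsible for free() */
}

/*
 * Stand-in for the compute-intensive kernel. In a real mixed-mode code this
 * routine would live in a Fortran source file; it is written in C here only
 * so that the example is self-contained.
 */
void dscale_(const int *n, const double *alpha, double *x)
{
    int i;

    for (i = 0; i < *n; i++)
        x[i] *= *alpha;
}
```

In a real mixed-mode build the object file produced by the Fortran compiler would simply be added to the link line, usually together with the Fortran run-time libraries.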

Compiling & Linking

To compile and link a program contained in a single source file, c_program.c:

	% cc c_program.c -o executable_name

The -o executable_name option names the resulting executable; without it, the executable is called a.out.

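Several source files can be compiled and linked in one step:

	% cc c_program1.c c_program2.c -o executable_name

Alternatively, each file can be compiled separately into an object file with -c, and the object files linked afterwards:

	% cc -c c_program1.c
	% cc -c c_program2.c
	% cc c_program1.o c_program2.o -o executable_name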

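Header files are included in the source with # include <include_file.h>. If a header is not in one of the standard locations, add its directory to the search path with -Iinclude_search_path: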

	% cc -I/usr/local/mpich/include -c mpi.c
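As a rough illustration of how a header ties separately compiled files together, the sketch below shows three hypothetical files (vec.h, c_program1.c and c_program2.c) in a single listing; the file split is marked in comments. The names vec.h, dot and squared_norm are made up for this example. If vec.h lived in a directory outside the standard search path, that directory would be the one passed with -I.

```c
/* ---- vec.h: hypothetical header shared by both source files ---- */
#ifndef VEC_H
#define VEC_H

/* Declaration only; the definition lives in c_program2.c. */
double dot(const double *x, const double *y, int n);

#endif /* VEC_H */

/* ---- c_program2.c: would begin with  #include "vec.h"  ---- */
double dot(const double *x, const double *y, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* ---- c_program1.c: would also begin with  #include "vec.h"  ---- */
double squared_norm(const double *x, int n)
{
    /* The compiler only needs the prototype from vec.h here;
       the linker resolves the call when the object files are combined. */
    return dot(x, x, n);
}
```

The include guard (#ifndef VEC_H) keeps the declarations from being processed twice if the header ends up included more than once in the same source file.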

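Libraries are linked by adding -l flags at the end of the link line (for example -lm for the math library), with -L giving the directory to search for libraries that are not in the standard locations:

	% cc mathc_program1.o mathc_program2.o -o executable_name -lm
	% cc mpi.o -o mpi-test -L/usr/local/mpich/lib/IRIX/ch_shmem -lmpi -lm

A library file named libblas.a is linked with -lblas.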

Compilers

On all systems:

gcc
The GNU C compiler, a portable C compiler. It may occasionally fail to work on the SGIs. On the IBMs it is called rs6000-ibm-aix3.2.5-gcc.
On the Suns running SunOS 4.1.* (Solaris 1.*) we have:

cc
Sun's K&R C compiler - usually slower than both acc and gcc.
acc
Sun's ANSI C compiler. Check the man pages for the flags that give the various language standards, the default being ANSI C plus Sun C compatibility extensions, without semantic changes required by ANSI C. Where Sun C and ANSI C specify different semantics for the same construct, the compiler will issue warnings about the conflict and use the Sun C interpretation. Usually produces the fastest binaries.
On the Suns running Solaris 2.*, 7, 8 (SunOS 5.*, 5.7, 5.8) we have:

cc, c89, acc
Sun's ANSI compliant C compiler. Check the man pages for the flags that give the various language standards.
On the SGIs we have:

cc
The MIPS (IRIX 5.*) and MIPSpro (IRIX 6.*) optimizing compiler. Check the man pages for the flags that give the various language standards, the default being extended ANSI/ISO.
On the IBMs we have:

cc, xlc & c89
IBM's highly optimizing C compiler, with language standards conforming to extended C (cc) or ANSI C (xlc, c89).

Optimization flags

Once you have developed and debugged a code, please make sure you compile it with optimization for any production runs you make. Running unoptimized production code is a waste of your time as well as of CFM computing resources.
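One simple way to see what optimization buys you is to time the same routine in an unoptimized and an optimized build. The following is a minimal sketch using the standard clock() function; sample_kernel, kernel_result and cpu_seconds_per_call are hypothetical names, and the kernel is only a placeholder for your own compute-intensive routine.

```c
#include <time.h>

/* Result is stored in a volatile global so the compiler cannot
   discard the loop in sample_kernel as dead code. */
volatile double kernel_result;

/* Hypothetical compute kernel, used only as something to time:
   sums the first million terms of the harmonic series. */
void sample_kernel(void)
{
    double sum = 0.0;
    int i;

    for (i = 1; i <= 1000000; i++)
        sum += 1.0 / (double)i;
    kernel_result = sum;
}

/* Average CPU time in seconds for one call of kernel,
   measured over the given number of repeats. */
double cpu_seconds_per_call(void (*kernel)(void), int repeats)
{
    clock_t start, stop;
    int r;

    if (repeats <= 0)
        return 0.0;
    start = clock();
    for (r = 0; r < repeats; r++)
        kernel();
    stop = clock();
    return (double)(stop - start) / (double)CLOCKS_PER_SEC / (double)repeats;
}
```

Calling cpu_seconds_per_call(sample_kernel, 100) from a program built once without optimization and once with the flags listed for your compiler gives a first impression of the difference. clock() measures CPU time at limited resolution, so use enough repeats for the total to be well above a fraction of a second.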

Despite the extra trouble you may have to go through, please try to compile your code optimized. You may be very surprised by how much its run time decreases, especially if it is well written!

For each compiler, the flags below are listed at two levels: "high" and (supposedly) "safe".

All systems
- gcc 2.7.2
  - safe: -O3
  - high: -O3 -ffast-math -funroll-loops
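To see why -ffast-math belongs in the "high" rather than the "safe" group, consider compensated (Kahan) summation. The sketch below is an illustration, not part of the original guide: it relies on a correction term that is algebraically zero, and current GCC documentation describes -ffast-math as permitting optimizations that do not preserve strict IEEE semantics, so a compiler working under that flag may simplify the correction away. The function names are hypothetical.

```c
/* Plain left-to-right summation. */
double naive_sum(const double *x, int n)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += x[i];
    return sum;
}

/* Kahan compensated summation: c tracks the rounding error of each add. */
double kahan_sum(const double *x, int n)
{
    double sum = 0.0;
    double c = 0.0;                /* running compensation */
    int i;

    for (i = 0; i < n; i++) {
        double y = x[i] - c;       /* apply the correction from the last step */
        double t = sum + y;        /* low-order bits of y may be lost here */
        c = (t - sum) - y;         /* algebraically zero; in floating point it
                                      is the rounding error of sum + y */
        sum = t;
    }
    return sum;
}
```

With an array holding 1.0 followed by ten values of 1e-16, strict IEEE double arithmetic leaves the naive sum at 1.0 because each small term is rounded away, while the compensated sum picks up their combined contribution. If the two functions start returning the same value after you add an aggressive floating-point flag, the compiler has most likely optimized the compensation out.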
On the Suns with SunOS 4.1.* (Solaris 1.*):
- cc
  - safe: -O2
  - high: -O4 -dalign
- acc
  - safe: -fast
  - high: -fast -native -O3 -bsdmalloc (-O4 will also inline routines)
On the Suns with Solaris 2.*, 7, 8 (SunOS 5.*, 5.7, 5.8):
- cc
  - safe: -xO3 -xdepend -xchip=generic -xarch=generic
  - high: -fast -xO4 -xdepend -xtarget=native
    - On UltraSparc based machines at least, this flag combination currently chooses the wrong libraries for some reason. Please use the following instead:
      -fast -xO4 -xdepend -xtarget=native -xarch=v8plusa -xchip=ultra
    - You can also experiment with use of -xO5 instead of -xO4.
    - Adding -xsafe=mem can also help sometimes.
  - for autoparallelizing add: -xautopar -xloopinfo -xreduction
  - Link with: -lfast
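Whether the autoparallelizing flags help depends on the shape of your loops. The sketch below shows, under the usual textbook assumptions, the three cases that matter: a loop with independent iterations, a reduction (the case -xreduction is aimed at), and a loop with a dependence carried from one iteration to the next. The function names are hypothetical, and whether the compiler actually parallelizes a given loop should be checked in the -xloopinfo output rather than assumed.

```c
/* Independent iterations: y[i] depends only on x[i] and the old y[i],
   provided x and y do not overlap. A typical candidate for
   automatic parallelization. */
void axpy(int n, double a, const double *x, double *y)
{
    int i;

    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Reduction: every iteration updates the same scalar. It can be run in
   parallel only if the compiler recognizes the pattern and combines
   partial sums, which may change the result in the last few bits. */
double dot_product(int n, const double *x, const double *y)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Loop-carried dependence: x[i] needs the already updated x[i-1],
   so the iterations cannot be distributed as written. */
void running_sum(int n, double *x)
{
    int i;

    for (i = 1; i < n; i++)
        x[i] += x[i - 1];
}
```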
On the SGIs with IRIX 5.3:
- cc
  - safe: -O2
  - high: -O2 -sopt,-r=3,-so=4,-o=5,-lo=s
  - -O3 can be used for even higher optimization, but then -c has to be replaced by -j
On the SGIs with IRIX 6.*:
- cc
  - safe: -O2 -n32
  - high: -O3 -n32 -OPT:roundoff=3:IEEE_arithmetic=3
  - higher and less safe: -O3 -n32 -OPT:roundoff=3:IEEE_arithmetic=3:alias=restrict
  - very high with interprocedural optimization: -Ofast=IP??
    To find the platform number IP?? execute uname -m. This option may break the correctness of your code though.
  - For an executable tuned for a specific architecture add:
    - -r5000 for an R5000 based machine.
    - -r8000 for an R8000 based machine.
    - -r10000 for an R10000 based machine.
    To find out the processor of your machine execute hinv.
  - Certainly look at the man page for the compilers as there is a multitude of beneficial options one can try.
  - For the time being, -v6 may need to be added to the flags above, since in a few cases the MIPSpro7.1 compilers produce really slow code; -v6 selects the MIPSpro6.2 compilers instead.
  - If the processor is an R4*00, -mips3 should be added to the lines above; otherwise -mips4 should be added. hinv -t cpu should give this information.
  - If your machine is still running IRIX6.0.1 (but not for long) or you require 64 bit addressing for very large datasets (greater than 2 GB), then replace -n32 with -64.
  - Sometimes -O3 will produce wrong code - try using -O2 or even -O1 instead, leaving all other flags as they are.
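Options such as alias=restrict are "less safe" because they ask the compiler to assume that different pointers never refer to overlapping memory (check the man page for the exact meaning on your compiler version). The sketch below, with a hypothetical function add_first, shows a routine whose result depends on that assumption being true.

```c
/* Adds the value stored at b[0] to each of the n elements of a. */
void add_first(double *a, const double *b, int n)
{
    int i;

    for (i = 0; i < n; i++)
        a[i] += b[0];   /* if b may point into a, b[0] has to be re-read on
                           every iteration; a compiler told that the pointers
                           never overlap may load it once before the loop */
}
```

Called with two separate arrays, the function behaves the same either way. Called as add_first(a, &a[0], 3) on the array {1, 2, 3}, standard C semantics give {2, 4, 5}, because a[0] changes after the first iteration. A build that assumes no aliasing is entitled to produce {2, 3, 4} instead. If your code passes overlapping pointers anywhere, leave the aliasing options out.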

On the IBMs:
- xlc
  - safe: on SP2 wide/thin nodes, like cws.cfm.brown.edu: -O2 -qarch=pwr2 -qtune=pwr2 -Q
  - safe: on SP2 silver nodes, like control.cfm.brown.edu: -O2 -qarch=ppc -qtune=604 -Q
  - safe: on Power3 nodes, like control.cfm.brown.edu: -O2 -qarch=pwr3 -qtune=pwr3 -Q
  - high: -O3 -qarch=pwr2 -qtune=pwr2 -Q -qinlglue -qproto
  - high but safer: -O3 -qarch=pwr2 -qtune=pwr2 -Q -qstrict -qinlglue -qproto
- kxlc (KAP preprocessor driver for kapc followed by xlc). KAP is usually beneficial to the performance of your code.
  - high: -O3 -qarch=pwr2 -qtune=pwr2 -Q -qinlglue -qproto +K5 +Ktmpdir=/tmp +Kargs=-r=3:-chs=128
  - In rare cases, using -O2 instead of -O3 is actually the better option.
- For interprocedural optimizations add -qipa. Look at the man page for the compiler for more details.
- For cws.cfm.brown.edu replace -qarch=pwr2 -qtune=pwrx with -qarch=pwr -qtune=pwr and don't bother with KAP.
- IBM's Optimization and Tuning Guide for Fortran, C, and C++ can be seen online by typing: info -l xlf
- Keep in mind that single precision calculations on the Power and Power2 processors in our IBMs are significantly slower than double precision ones. Use single precision only if you have to, and in that case use the relevant flags (see the man page) to speed the code up.
- If you are unsure about your architecture, you can also try -qarch=auto -qtune=auto
For more C related information
- Tutorials & Courses
  - Another Tutorial by Brian W. Kernighan (the C presented in this tutorial is K&R C, not ANSI C)
  - C Programming Course - Course notes from the University of Strathclyde
  - Google's resources on C programming
  - The C programmer's pages
  - Programming in C
  - C documentation
  - The Ten Commandments for C Programmers
