While the set \( \mathbb{R} \) of real numbers is known to everyone, the way in which computers treat them is perhaps less well known. Computers have both an integer mode and a floating-point mode for representing numbers. The integer mode is used for calculations whose values are known to be integers and is of limited use in numerical analysis. Human beings do arithmetic in the decimal (base 10) number system, written with Hindu-Arabic numerals, whereas computers do arithmetic in the binary (base 2) number system. On input, a computer converts a number to base 2 (or perhaps base 16), performs the arithmetic in base 2, and finally translates the answer back into base 10 before displaying the result.

Binary (base 2) numbers can be expressed as sums of positive and negative powers of 2, and binary fractions as sums of negative powers only. For example, let R be a real number in the range [0,1]; then there exists a sequence of digits \( d_1 , d_2 , \ldots , d_n , \ldots \) such that

\[ R = \left( d_1 \times 2^{-1} \right) + \left( d_2 \times 2^{-2} \right) + \cdots + \left( d_n \times 2^{-n} \right) + \cdots , \]
where \( d_j \in \{ 0,1 \} . \) A standard way to represent a real number, called scientific notation, is obtained by shifting the decimal point and supplying an appropriate power of 10. For instance,
\begin{align*} 0.0001234 &= 1.234 \times 10^{-4} , \\ 314.159265 &= 3.14159265 \times 10^2 , \\ 9{,}988{,}000 &= 9.988 \times 10^6 . \end{align*}
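The binary digits \( d_j \) in the expansion of R given earlier can be generated by repeated doubling. The following is a minimal sketch, assuming R lies in [0,1); the example value 0.625 and the variable names are chosen here for illustration only.

```matlab
% Sketch: first n binary digits d_1, ..., d_n of R by repeated doubling.
R = 0.625;                 % illustrative value (assumption): 0.625 = 1/2 + 1/8
n = 8;                     % number of digits to extract
d = zeros(1, n);
r = R;
for j = 1:n
    r = 2*r;               % shift the binary point one place to the right
    d(j) = floor(r);       % the integer part is the next digit (0 or 1)
    r = r - d(j);          % keep only the fractional part
end
disp(d)                    % 1 0 1 0 0 0 0 0

% Rebuild R from its digits: sum of d_j * 2^(-j).
approx = sum(d .* 2.^(-(1:n)));
disp(approx == R)          % 1, since 0.625 has a finite binary expansion
```

A number such as 0.1 has no finite binary expansion, so for such values the reconstructed sum only approximates R.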
On one hand, since machines have limited resources, only a finite subset \( \mathbb{F} \) of \( \mathbb{R} \) can be represented. The numbers in this subset are called floating-point numbers. On the other hand, \( \mathbb{F} \) is characterized by properties that are different from those of \( \mathbb{R}. \) The reason is that any real number x is in principle truncated or rounded by the machine, giving rise to a new number (called the floating-point number), denoted by fl(x), which does not necessarily coincide with the original number x. Computers use a normalized floating-point binary representation for real numbers. This means that the mathematical quantity x is not actually stored in the computer. Instead, the computer stores a binary approximation to x:
\[ x \approx \pm q \times 2^n . \]
The number q is the mantissa; it is a finite binary expression satisfying the inequality \( 1/2 \le q < 1 . \) The integer n is called the exponent. If a 32-bit mantissa is used, numbers with about nine decimal places can be stored. However, computers that use 32 bits to represent single-precision real numbers use 8 bits for the exponent and 24 bits for the mantissa, which translates to about seven decimal places. (In the IEEE 754 single-precision format, one bit holds the sign and 23 mantissa bits are stored explicitly; the leading bit of a normalized mantissa is implicit, which gives 24 bits of precision.)
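The decomposition into mantissa and exponent can be inspected directly. The sketch below relies on the two-output form of log2, which returns a mantissa in [1/2, 1) and an integer exponent, and on eps, the gap between 1 and the next larger floating-point number. The value x = 10 and the variable names are illustrative assumptions.

```matlab
% Decompose x as q * 2^n with 1/2 <= q < 1.
x = 10;
[q, n] = log2(x);                       % two-output form: mantissa and exponent
fprintf('%g = %g * 2^%d\n', x, q, n);   % 10 = 0.625 * 2^4

% Gap between 1 and the next representable number in each precision.
eps_double = eps;                       % 2^(-52), roughly 2.2e-16
eps_single = eps('single');             % 2^(-23), roughly 1.2e-07
fprintf('double: %.4e   single: %.4e\n', eps_double, eps_single);
```

The two gaps are consistent with roughly 16 significant decimal digits in double precision and roughly seven in single precision.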

Since \( \mathbb{F} \) is a proper subset of \( \mathbb{R} , \) elementary algebraic operations on floating-point numbers do not enjoy all the properties of the analogous operations on \( \mathbb{R}. \) Precisely, commutativity still holds for addition (that is, fl(x + y) = fl(y + x)) as well as for multiplication (fl(xy) = fl(yx)), but other properties such as associativity and distributivity are violated. Moreover, 0 is no longer the unique neutral element of addition: there exists at least one number b different from 0 such that fl(a + b) = a.
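Both failures can be observed in a few lines. The following is a sketch in IEEE double precision; the particular values 0.1, 0.2, 0.3 and a = 1 are chosen here for illustration.

```matlab
% Associativity of addition can fail.
lhs = (0.1 + 0.2) + 0.3;
rhs = 0.1 + (0.2 + 0.3);
disp(lhs == rhs)          % 0: the two sums differ in the last bit

% Search for a nonzero b with a + b == a by repeated halving.
a = 1;
b = 1;
while a + b ~= a
    b = b/2;
end
disp(b)                   % about 1.1102e-16, that is, eps/2
disp(a + b == a)          % 1, although b is not zero
```

The loop stops at b = eps/2 because the exact sum 1 + eps/2 lies halfway between two adjacent floating-point numbers and is rounded back to 1.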

To become acquainted with the differences between \( \mathbb{R} \) and \( \mathbb{F}, \) let us run a few experiments that illustrate the way a computer deals with real numbers. Note that whether we use MATLAB or Octave rather than another language is just a matter of convenience. Indeed, the results of our calculations depend primarily on the manner in which the computer works, and only to a lesser degree on the programming language.

Sometimes you may wish to view more or fewer digits than the default display shows. There are many different number viewing options in MATLAB but, for the purposes of this text, we restrict them to “short,” “long,” and “bank” unless you are specifically told otherwise. The short format is the default setting in both MATLAB and Octave: MATLAB displays four digits after the decimal point, whereas Octave displays five significant digits. The long format displays about 16 significant digits, which is essentially all that a double-precision number carries. The bank format displays exactly two digits after the decimal point. You can change the formatting by typing one of the following commands:

format short
format long
format bank
Note that this changes only how the numbers are displayed; it does not alter the actual value being used. Be aware that Octave's default short format shows five significant digits (for example, 0.14286) rather than MATLAB's four decimal places.
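That the format affects only the display can be checked with a short sketch. The value 1/7 is used for illustration; the exact layout printed by disp differs between MATLAB and Octave, so no displayed output is claimed here.

```matlab
x = 1/7;

format short
disp(x)                  % short display of x
format long
disp(x)                  % long display of the same stored value
format bank
disp(x)                  % two digits after the decimal point
format short             % return to the default display

% If the stored value had been cut to the short display 0.1429,
% this difference would be exactly zero. It is not.
delta = x - 0.1429;
disp(delta ~= 0)         % 1
```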

Let us consider the rational number x = 1/7, whose decimal representation is \( 0.\overline{142857}. \) This is an infinite representation, since the number of decimal digits is infinite. To get its computer representation, let us enter the ratio 1/7 after the prompt; we obtain

>> 1/7
ans =
    0.1429

which is a number with only four decimal digits, the last of which differs from the fourth digit of the exact value (9 instead of 8, because the displayed value is rounded).

The same number can be displayed in different ways depending on the format declaration that is made. For instance, for the number 1/7, the following output formats are available in MATLAB:
format short yields 0.1429,
format short e yields 1.4286e-01,
format short g yields 0.14286,
format long yields 0.142857142857143,
format long e yields 1.428571428571428e-01,
format long g yields 0.142857142857143.
The same formats are available in Octave, but the results do not necessarily coincide with those of MATLAB:
format short yields 0.14286,
format short e yields 1.4286e-01,
format short g yields 0.14286,
format long yields 0.142857142857143,
format long e yields 1.42857142857143e-01,
format long g yields 0.142857142857143.

These differences, even if slight, may lead to slightly different displayed results in the treatment of our examples.
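When the same printed digits are wanted in both programs, one option is to state the number of digits explicitly with sprintf or fprintf instead of relying on the format setting. This is a suggestion rather than part of the original discussion; the variable names are illustrative.

```matlab
x = 1/7;
s4  = sprintf('%.4f',  x);   % four digits after the decimal point
s15 = sprintf('%.15f', x);   % fifteen digits after the decimal point
disp(s4)                     % 0.1429
disp(s15)                    % 0.142857142857143
```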

The default arithmetic used in MATLAB is double precision (about 16 significant decimal digits). By default, output is displayed with 4 digits to the right of the decimal point. To control the formatting of output on the screen, use the format command. The default formatting is obtained using

>> format short

To obtain the full accuracy available in a number, you can use

>> format long

MATLAB can be used as a calculator. Let us present some simple examples.

>> x=4
x =
       4
>> y=x^2
y =
       16
>> z = factorial(y)
z =
      2.0923e+13
>> w=log(z)*1.e-05
w =
       3.0672e-04
>> format long
>> w
w =
       3.067186010608067e-04
>> format long eng
>> w
w =
       306.718601060807e-006
>> format short
>> w
w =
      3.0672e-04
>> sin(pi)
ans =
        1.2246e-16
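The last result in the session is not exactly zero because pi is the floating-point number closest to π, not π itself, so sin(pi) returns approximately the difference between the two. A minimal sketch of how such a value is usually treated follows; the tolerance 1e-12 is an arbitrary choice made here for illustration.

```matlab
s = sin(pi);                 % about 1.2246e-16, not exactly 0
disp(s == 0)                 % 0: an exact comparison with zero fails
disp(abs(s) < eps)           % 1: the value is below the rounding level near 1

% Comparing against a tolerance is the usual remedy.
tol = 1e-12;                 % illustrative tolerance (assumption)
disp(abs(s) < tol)           % 1
```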

Summary of arithmetic operations.

  • The basic operations are expressed as + (addition), - (subtraction), * (multiplication), / (division).
  • Exponents are expressed with the caret ^. Input at the command prompt is typed on a single line, so an exponent or a fraction that involves more than one term is delimited with parentheses, as in 2^(1/3), rather than by moving the cursor.
  • Absolute values are computed with the abs( ) command. Vertical bars are not used for this purpose; in MATLAB, || is the logical OR operator.
  • Square roots (radicals) are computed with sqrt( ).
  • MATLAB accepts pi as π.
  • When using division, be sure to enclose numerators and denominators in parentheses to prevent errors.
  • When you wish to substitute a value for a symbolic variable, use the subs command from the Symbolic Math Toolbox. The generic syntax is subs(expr, old, new), which replaces old by new in expr.
    The most recently computed result that was not assigned to a variable is stored in ans, which is useful when you want to avoid assigning every expression to a variable. Note that % does not play this role in MATLAB: it starts a comment.
  • If a and b are two variables, then none of ab, a(b), (a)b, or (a)(b) denotes their product; the operator * must always be written, as in a*b.
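Several of the points in this summary can be tried in one short script. The values of a and b and the variable names are illustrative assumptions.

```matlab
a = 3;
b = -4;

p  = a*b;                  % the product needs an explicit *: -12
m  = abs(b);               % absolute value: 4
r  = sqrt(a^2 + b^2);      % square root: 5
c  = 2*pi*a;               % pi is a built-in constant

q1 = (a + b)/(a - b);      % parentheses around numerator and denominator: -1/7
q2 = a + b/a - b;          % without them only b/a is a quotient: 3 - 4/3 + 4

disp([p m r])              % -12 4 5
disp([q1 q2])              % two different values
```

The last two lines show why the parentheses matter: q1 and q2 are built from the same symbols but are different numbers.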

     

Once you have assigned a value to a variable, MATLAB remembers it until the variable is cleared or the session ends. To remove a variable you can use the ‘clear’ statement; try
>> clear w
>> w
Undefined function or variable 'w'.

If you type ‘clear’ and omit the variable name, then all variables are cleared. Don’t do that now, but it is useful when you want to start a fresh calculation.

The order of operations is the standard order of precedence that different operations have in relation to one another. Powers are executed before multiplication and division, which are executed before addition and subtraction. Parentheses, (), can also be used in MATLAB to override the standard order of operations.
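A few one-line checks make the precedence rules concrete. The expressions below are chosen for illustration.

```matlab
v1 = 2 + 3*4^2;       % power, then multiplication, then addition: 2 + 3*16 = 50
v2 = (2 + 3)*4^2;     % parentheses are evaluated first: 5*16 = 80
v3 = -2^2;            % the power is applied before the unary minus: -4
v4 = (-2)^2;          % 4
v5 = 8/4/2;           % operators of equal precedence act left to right: 1

disp([v1 v2 v3 v4 v5])   % 50 80 -4 4 1
```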

MATLAB can handle the expression 1/0, which evaluates to infinity (Inf). Note that MATLAB returns “not a number,” or NaN, for 0/0. You can also type Inf at the command prompt to denote infinity, or NaN to denote a quantity that is not a number but that you nevertheless wish to be handled as a numerical value.
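The behaviour of these special values can be explored with a short sketch. The variable names are illustrative.

```matlab
a = 1/0;              % Inf
b = -1/0;             % -Inf
c = 0/0;              % NaN
d = Inf - Inf;        % NaN as well: the difference is undefined

disp(isinf(a))        % 1
disp(isnan(c))        % 1
disp(c == c)          % 0: NaN is not equal to anything, not even itself
```

Because NaN never compares equal, isnan is the reliable way to test for it.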