
# Markov Chains

In a probabilistic description of an experiment or process, the outcome or result is uncertain and may take any value from a known set of states. For example, if a coin is tossed successively, the outcome of the n-th toss could be a head or a tail; if an ordinary die is rolled, the outcome may be 1, 2, 3, 4, 5, or 6. Associated with each outcome is a number p ($$0 \le p \le 1$$ ), called the probability of the outcome, which is a measure of the likelihood of occurrence of the outcome. If we denote the outcome of a roll of a die by X, then we write $$\Pr [X=i] = p_i$$ to mean that the probability that X takes the value i is equal to $$p_i ,$$ where $$p_1 + p_2 + \cdots + p_6 = 1 .$$ The symbol X is called a random variable and, in general, X may take any real value.

Often we are given the information that a particular event B has occurred, in which case the probability of occurrence of another event A will be affected. This gives rise to the concept of the conditional probability of the event A given that B has occurred, which is denoted by $$\Pr [A\,|\,B]$$ and defined as

$$\label{EqMarkov.1} \Pr [A\,|\,B] = \Pr [A \ \mbox{ and } B]/\Pr [B] ,$$
provided that $$\Pr [B] \ne 0.$$
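As a concrete sketch of this definition, consider one roll of a fair die with the hypothetical events B = "the roll is even" and A = "the roll is greater than 3"; the conditional probability can be computed directly from the formula:

```python
from fractions import Fraction

# One roll of a fair die: outcomes 1..6, each with probability 1/6.
# Hypothetical events:  B = "roll is even",  A = "roll is greater than 3".
outcomes = range(1, 7)
pr = Fraction(1, 6)

pr_B = sum(pr for x in outcomes if x % 2 == 0)                  # Pr[B] = 1/2
pr_A_and_B = sum(pr for x in outcomes if x % 2 == 0 and x > 3)  # Pr[A and B] = 1/3

# Definition:  Pr[A | B] = Pr[A and B] / Pr[B]
pr_A_given_B = pr_A_and_B / pr_B
print(pr_A_given_B)   # 2/3
```

Knowing the roll is even shrinks the sample space to {2, 4, 6}, two of which exceed 3, in agreement with the computed value 2/3.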

A stochastic process is a process in which the change of the system from one state to another is not predetermined but can only be specified in terms of certain probabilities, which are obtained from the past behavior or history of the system.

Now we define a first-order Markov process, or Markov chain, as a stochastic process in which the transition probability of a state depends exclusively on its immediately preceding state. A more explicit definition can be given as follows.
A discrete-time Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event (a property sometimes characterized as "memorylessness"). Roughly speaking, a process satisfies the Markov property when:
• The outcome of the n-th trial depends only on the outcome of the (n−1)-th trial and not on the outcomes of earlier trials; and
• In two successive trials of the experiment, the probability of going from state Si to state Sj remains the same.
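These two properties can be sketched in a short simulation (the two-state chain and its transition probabilities below are hypothetical, chosen only for illustration); note that each step consults only the current state, never the earlier history:

```python
import random

# Hypothetical two-state chain: transition probabilities out of each state.
# Row P[s] lists Pr[next = 0] and Pr[next = 1] given current state s.
P = {0: [0.9, 0.1],
     1: [0.5, 0.5]}

def step(state, rng):
    """One trial: the next state is drawn using only the current state."""
    return 0 if rng.random() < P[state][0] else 1

rng = random.Random(42)       # fixed seed so the run is reproducible
path = [0]                    # start in state 0
for _ in range(10):
    path.append(step(path[-1], rng))
print(path)
```

The transition probabilities in P never change between trials (the second bullet), and `step` receives only `path[-1]` (the first bullet).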

A Markov process is named after the Russian mathematician Andrey Andreyevich Markov (1856--1922), a student of Pafnuty Chebyshev (Saint Petersburg, Russia). Markov studied Markov processes in the early 20th century, publishing his first paper on the topic in 1906. Markov and his younger brother Vladimir Andreevich Markov (1871--1897) proved the Markov brothers' inequality. His son, another Andrei Andreevich Markov (1903--1979), was also a notable mathematician, making contributions to constructive mathematics and recursive function theory. One of his famous Ph.D. students was Jacob/Yakov Davidovich Tamarkin (1888--1945), who, upon immigration to the USA, worked at Brown University from 1927.

Markov chains have many applications as statistical models of real-world processes, such as studying cruise control systems in motor vehicles, queues or lines of customers arriving at an airport, exchange rates of currencies, storage systems such as dams, and population growth of certain animal species. The algorithm known as PageRank, which was originally proposed for the internet search engine Google, is based on a Markov process. Furthermore, Markov processes are the basis for general stochastic simulation methods known as Gibbs sampling and Markov chain Monte Carlo, which are used for simulating random objects with specific probability distributions and have found extensive application in Bayesian statistics.

To understand a system described by a Markov process or a Markov chain, the following concepts are necessary.

A probability vector is a column vector, p = [p1, p2, … , pm], whose entries are nonnegative and sum to one, that is, $\sum_{i=1}^m p_i = 1 .$
Note that in probability theory it is customary to consider row vectors, in contrast to engineering, where vectors are column vectors.
A stochastic matrix (also called a transition matrix or Markov matrix) is a matrix P = [ pi,j ] of order m × m in which each row is a probability vector, so $$\sum_{j=1}^m p_{i,j} = 1.$$ This is a right stochastic matrix; when the sum of the entries in every column is 1, the matrix is called a left stochastic matrix.
The transition probability is the probability of the system moving from one state to another. The transition matrix of a Markov chain is a probability matrix T = [ ti,j ] of order m × m, where ti,j is the transition probability of the system going from state Si to state Sj on successive trials of the experiment.
A transition matrix T is said to be regular if all the entries of at least one of its powers $$T^n$$ are strictly positive. A Markov chain is regular if its transition matrix is regular.
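Regularity is easy to check numerically by powering the matrix; the small chain below is a hypothetical example whose matrix contains zeros, yet its square is already strictly positive:

```python
import numpy as np

# Hypothetical transition matrix with zero entries.
T = np.array([[0.0, 1.0],
              [0.5, 0.5]])

def is_regular(T, max_power=20):
    """Return True if some power T^n (n <= max_power) is strictly positive."""
    P = np.eye(len(T))
    for _ in range(max_power):
        P = P @ T
        if np.all(P > 0):
            return True
    return False

# T itself has zeros, but T^2 = [[0.5, 0.5], [0.25, 0.75]] is positive.
print(is_regular(T))   # True
```

By contrast, the identity matrix is not regular, since every power of it keeps its off-diagonal zeros.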

If T is a regular stochastic matrix acting on column probability vectors (so the columns of T sum to one), then there exists a unique probability vector t such that ${\bf T}\,{\bf t} = {\bf t} .$ This unique probability vector is also called a steady-state vector.
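Since t satisfies T t = t, it is an eigenvector of T for the eigenvalue 1, which gives a direct way to compute it. A minimal sketch, using a hypothetical 2 × 2 column-stochastic matrix:

```python
import numpy as np

# Hypothetical column-stochastic matrix (each column sums to one).
T = np.array([[0.9, 0.5],
              [0.1, 0.5]])

# The steady-state vector is the eigenvector for the eigenvalue 1,
# rescaled so that its entries sum to one.
vals, vecs = np.linalg.eig(T)
k = int(np.argmin(np.abs(vals - 1)))   # index of the eigenvalue closest to 1
t = np.real(vecs[:, k])
t = t / t.sum()                        # normalize into a probability vector

print(t)   # approximately [0.833333, 0.166667], i.e. [5/6, 1/6]
```

One can verify by hand that T t = t for t = [5/6, 1/6]: 0.9·(5/6) + 0.5·(1/6) = 5/6 and 0.1·(5/6) + 0.5·(1/6) = 1/6.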

Example 1: Suppose that m molecules are distributed between urns (I) and (II), and at each step or trial, a molecule is chosen at random and transferred from its urn to the other urn. Let Xn denote the number of molecules in urn (I) just after the n-th step. Then $$X_0 , X_1 , \ldots$$ is a Markov chain with state space $$\left\{ 0, 1, \ldots , m \right\}$$ and transition probabilities

$\begin{split} P_{i,i-1} &= \frac{i}{m} , \quad\mbox{if } 1 \le i \le m , \\ P_{i,i+1} &= \frac{m-i}{m} , \quad\mbox{if } 0 \le i \le m-1 , \\ P_{i,j} &=0 , \quad\mbox{if } |i-j| >1. \end{split}$
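These formulas can be turned into an explicit transition matrix; the sketch below assumes m = 4 molecules and checks that every row is a probability vector:

```python
import numpy as np

# Urn chain of Example 1, with m = 4 molecules (assumed for illustration).
# Row i holds the transition probabilities out of state i
# (i = number of molecules currently in urn I).
m = 4
P = np.zeros((m + 1, m + 1))
for i in range(m + 1):
    if i > 0:
        P[i, i - 1] = i / m          # a molecule leaves urn I
    if i < m:
        P[i, i + 1] = (m - i) / m    # a molecule enters urn I

print(P)
print(P.sum(axis=1))   # every row sums to 1: P is a (right) stochastic matrix
```

Note the boundary behavior: from state 0 the chain must move to state 1, and from state m it must move to state m − 1.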

Example 2: In $$\mathbb{R}^n ,$$ the vectors $$e_1 =[1,0,0,\ldots , 0] , \quad e_2 =[0,1,0,\ldots , 0], \quad \ldots , e_n =[0,0,\ldots , 0,1]$$ form a basis for n-dimensional real space, called the standard basis. Its dimension is n.

Example 3: In state elections, it has been determined that voters will switch votes according to the following probabilities.
| To \ From | Socialist | Capitalist | Independent |
|---|---|---|---|
| Socialist | 0.7 | 0.35 | 0.4 |
| Capitalist | 0.2 | 0.6 | 0.3 |
| Independent | 0.1 | 0.05 | 0.3 |

The transition matrix for the Markov chain is

${\bf T} = \begin{bmatrix} 0.7 & 0.35 & 0.4 \\ 0.2 & 0.6 & 0.3 \\ 0.1 & 0.05 & 0.3 \end{bmatrix}$
Since T is a regular matrix, there exists a unique vector t = [x, y, z]T such that T t = t. Since t is a probability vector, x + y + z = 1, and further
$\begin{bmatrix} 0.7 & 0.35 & 0.4 \\ 0.2 & 0.6 & 0.3 \\ 0.1 & 0.05 & 0.3 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} .$
This leads to the system of algebraic equations
$\begin{split} 1 &= x+y+z , \\ x &= 0.7\,x + 0.35\,y + 0.4\,z , \\ y &= 0.2\,x + 0.6\,y + 0.3\,z , \\ z &= 0.1\,x + 0.05\,y + 0.3\,z . \end{split}$
Solving this system with Mathematica, we get
Solve[{1 == x + y + z, x == 0.7*x + 0.35*y + 0.4*z, y == 0.2*x + 0.6*y + 0.3*z, z == 0.1*x + 0.05*y + 0.3*z}, {x, y, z}]
{{x -> 0.546392, y -> 0.350515, z -> 0.103093}}
$x \approx 0.546392, \qquad y \approx 0.350515, \qquad z \approx 0.103093 .$
Hence, the percentage of voters who will vote socialist is about 54.64%, capitalist about 35.05%, and independent about 10.31%. Note that if the given information is close to reality, then the conclusion will also be close to reality.
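As a cross-check outside Mathematica, the same system can be solved with NumPy (a sketch): replace one of the redundant balance equations by the normalization x + y + z = 1 and solve the resulting linear system.

```python
import numpy as np

# Transition matrix from Example 3 (columns sum to one).
T = np.array([[0.7, 0.35, 0.4],
              [0.2, 0.6,  0.3],
              [0.1, 0.05, 0.3]])

# Steady state: (T - I) t = 0 together with x + y + z = 1.
A = T - np.eye(3)
A[2, :] = 1.0                 # overwrite the last (redundant) row by x+y+z = 1
b = np.array([0.0, 0.0, 1.0])

t = np.linalg.solve(A, b)
print(t)   # approximately [0.546392, 0.350515, 0.103093]
```

In exact arithmetic the solution is t = [53/97, 34/97, 10/97], which rounds to the values reported by Mathematica.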

Example: The set of monomials $$\left\{ 1, x, x^2 , \ldots , x^n \right\}$$ forms a basis for the set of all polynomials of degree at most n; its dimension is n+1. ■

Example: The infinite set of monomials $$\left\{ 1, x, x^2 , \ldots , x^n , \ldots \right\}$$ forms a basis for the set of all polynomials. ■

Theorem: Let V be a vector space and $$\beta = \left\{ {\bf u}_1 , {\bf u}_2 , \ldots , {\bf u}_n \right\}$$ be a subset of V. Then β is a basis for V if and only if each vector v in V can be uniquely decomposed into a linear combination of vectors in β, that is, can be uniquely expressed in the form

${\bf v} = \alpha_1 {\bf u}_1 + \alpha_2 {\bf u}_2 + \cdots + \alpha_n {\bf u}_n$
for unique scalars $$\alpha_1 , \alpha_2 , \ldots , \alpha_n .$$

If the vectors $$\left\{ {\bf u}_1 , {\bf u}_2 , \ldots , {\bf u}_n \right\}$$ form a basis for a vector space V, then every vector in V can be uniquely expressed in the form

${\bf v} = \alpha_1 {\bf u}_1 + \alpha_2 {\bf u}_2 + \cdots + \alpha_n {\bf u}_n$
for appropriately chosen scalars $$\alpha_1 , \alpha_2 , \ldots , \alpha_n .$$ Therefore, v determines a unique n-tuple of scalars $$\left[ \alpha_1 , \alpha_2 , \ldots , \alpha_n \right]$$ and, conversely, each n-tuple of scalars determines a unique vector $${\bf v} \in V$$ by using the entries of the n-tuple as the coefficients of a linear combination of $${\bf u}_1 , {\bf u}_2 , \ldots , {\bf u}_n .$$ This fact suggests that V is like the n-dimensional vector space $$\mathbb{R}^n ,$$ where n is the number of vectors in the basis for V.
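The correspondence between v and its coordinate n-tuple amounts to solving one linear system; a small sketch with a hypothetical basis of $$\mathbb{R}^2$$:

```python
import numpy as np

# Hypothetical basis of R^2 and a vector to decompose.
u1 = np.array([1.0,  1.0])
u2 = np.array([1.0, -1.0])
v  = np.array([3.0,  1.0])

# The coordinates alpha satisfy  alpha1*u1 + alpha2*u2 = v,
# i.e.  B @ alpha = v  with the basis vectors as columns of B.
B = np.column_stack([u1, u2])
alpha = np.linalg.solve(B, v)   # unique, since B is invertible

print(alpha)   # [2. 1.]  because v = 2*u1 + 1*u2
```

Uniqueness of the coordinates corresponds exactly to the invertibility of the matrix B whose columns are the basis vectors.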

Theorem: Let S be a linearly independent subset of a vector space V, and let v be an element of V that is not in S. Then $$S \cup \{ {\bf v} \}$$ is linearly dependent if and only if v belongs to the span of the set S.

Theorem: If a vector space V is generated by a finite set S, then some subset of S is a basis for V.

A vector space is called finite-dimensional if it has a basis consisting of a finite number of elements. The unique number of elements in each basis for V is called the dimension of V and is denoted by dim(V). A vector space that is not finite-dimensional is called infinite-dimensional.

The next example demonstrates how Mathematica can determine a basis, or a set of linearly independent vectors, from a given set. Note that a basis is not unique: even changing the order of the vectors may lead the software to return a different set of linearly independent vectors.

Example: Suppose we are given four linearly dependent vectors:  MatrixRank[m = {{1, 2, 0, -3, 1, 0}, {1, 2, 2, -3, 1, 2}, {1, 2, 1, -3, 1, 1}, {3, 6, 1, -9, 4, 3}}] 
Out[1]= 3

Then each of the following scripts determine a subset of linearly independent vectors:

 m[[ Flatten[ Position[#, Except[0, _?NumericQ], 1, 1]& /@ Last @ QRDecomposition @ Transpose @ m ] ]] 
Out[2]= {{1, 2, 0, -3, 1, 0}, {1, 2, 2, -3, 1, 2}, {3, 6, 1, -9, 4, 3}}

or, using the following subroutine,

 MinimalSublist[x_List] :=
  Module[{tm, ntm, ytm, mm = x},
   {tm = RowReduce[mm] // Transpose,
    ntm = MapIndexed[{#1, #2, Total[#1]} &, tm, {1}],
    ytm = Cases[ntm, {___, ___, d_ /; d == 1}]};
   Cases[ytm, {b_, {a_}, c_} :> mm[[All, a]]] // Transpose] 

we apply it to our set of vectors.

 m1 = {{1, 2, 0, -3, 1, 0}, {1, 2, 1, -3, 1, 2}, {1, 2, 0, -3, 2, 1}, {3, 6, 1, -9, 4, 3}}; MinimalSublist[m1] 
Out[3]= {{1, 0, 1}, {1, 1, 1}, {1, 0, 2}, {3, 1, 4}}

The output lists, for each row of m1, its entries in the selected columns; transposing it displays the three linearly independent columns of m1: {{1, 1, 1, 3}, {0, 1, 0, 1}, {1, 1, 2, 4}}.

One can also use the standard Mathematica command IndependenceTest. ■
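A Python analogue of the Mathematica scripts above can be sketched with SymPy: the pivot columns of the reduced row echelon form of the transpose of m index a maximal set of linearly independent rows of m.

```python
import sympy as sp

# The matrix from the first Mathematica example (rank 3, so its four
# rows are linearly dependent).
m = sp.Matrix([[1, 2, 0, -3, 1, 0],
               [1, 2, 2, -3, 1, 2],
               [1, 2, 1, -3, 1, 1],
               [3, 6, 1, -9, 4, 3]])

# rref of m^T reports pivot columns; each pivot column of m^T is a row of m.
_, pivots = m.T.rref()
independent_rows = [list(m.row(i)) for i in pivots]

print(pivots)            # (0, 1, 3)
print(independent_rows)  # rows 1, 2, and 4 of m
```

This selects rows 1, 2, and 4, matching the subset returned by the QRDecomposition-based Mathematica script.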

1. Using matrix $${\bf N} = \begin{bmatrix} 8 & -1 \\ 13 & 18 \end{bmatrix} ,$$ send message