SymPy's Architecture

Software architecture is of central importance in any large software project because it establishes predictable patterns of usage and development.

Basic Usage

Since most users need the majority of SymPy's functions and subroutines, it is strongly recommended to import all SymPy functions into the global Python namespace. From here on, all examples assume that the following statement has been executed:

>>> from sympy import *

Symbolic variables, called symbols, must be defined and assigned to Python variables before they can be used. This is typically done through the symbols function, which may create multiple symbols in a single function call. For instance,

>>> x, y, z = symbols('x y z')

creates three symbols representing variables named x, y, and z. In this particular instance, these symbols are all assigned to Python variables of the same name. However, the user is free to assign them to different Python variables while still representing the same symbols, such as a, b, c = symbols('x y z'). To minimize potential confusion, though, all examples in this tutorial assume that the symbols x, y, and z have been assigned to Python variables identical to their symbolic names.
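The distinction between a Python variable name and a symbol name can be sketched as follows. This is a minimal illustration; the variable names a, b, and c are taken from the sentence above, and nothing else is assumed beyond the symbols function and the name attribute of Symbol.

```python
from sympy import Symbol, symbols

# The Python names a, b, c are arbitrary; the symbolic names remain x, y, z.
a, b, c = symbols('x y z')

print(a, b, c)            # prints the symbolic names, not the Python names
print(a.name)             # the name stored on the symbol itself
print(a == Symbol('x'))   # a refers to the symbol named x
```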

Expressions are created from symbols using Python’s mathematical syntax. For instance, the following Python code creates the expression \( \left( x^3 - x\,y + 3\,x \right) / y^2 \):

>>> (x**3 - x*y + 3*x)/y**2

SymPy expressions are immutable. This simplifies the design of SymPy by allowing expression interning. It also enables expressions to be hashed and stored in Python dictionaries, thereby permitting features such as caching.
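As a small sketch of why hashability matters, the snippet below uses an expression as a dictionary key. The dictionary cache and the stored string are hypothetical and only stand in for whatever a real cache would hold.

```python
from sympy import sin, symbols

x, y = symbols('x y')

# Expressions are hashable, so they can serve as dictionary keys.
cache = {}
cache[sin(x) + y] = "stored result"

# A structurally equal expression built separately finds the same entry.
hit = cache[y + sin(x)]
same_hash = hash(sin(x) + y) == hash(y + sin(x))
print(hit, same_hash)
```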

A computer algebra system (CAS) represents mathematical expressions as data structures. For example, the mathematical expression x + y is represented as a tree with three nodes, +, x, and y, where x and y are ordered children of +. As users manipulate mathematical expressions with traditional mathematical syntax, the CAS manipulates the underlying data structures. Automated optimizations and computations such as integration, simplification, etc. are all functions that consume and produce expression trees.

In SymPy every symbolic expression is an instance of a Python Basic class, a superclass of all SymPy types providing common methods to all SymPy tree-elements, such as traversals. The children of a node in the tree are held in the args attribute. A terminal or leaf node in the expression tree has empty args.

For example, consider the expression \( x\,y + 3 \):

>>> expr = x*y + 3

By order of operations, the root of the expression tree for expr is an addition, so it is of type Add. The child nodes of expr are 3 and x*y.

 >>> type(expr)
 <class 'sympy.core.add.Add'>
 >>> expr.args
 (3, x*y)

Descending further into the expression tree eventually reaches every part of the expression. For example, the first child node (given by expr.args[0]) is 3. Its class is Integer, and it has an empty args tuple, indicating that it is a leaf node.

 >>> expr.args[0]
 3
 >>> type(expr.args[0])
 <class 'sympy.core.numbers.Integer'>
 >>> expr.args[0].args
 ()

A useful way to view an expression tree is using the srepr function, which returns a string representation of an expression as valid Python code with all the nested class constructor calls to create the given expression.

 >>> srepr(expr) 
 "Add(Mul(Symbol('x'), Symbol('y')), Integer(3))"   
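Because the string returned by srepr is valid Python, it can in principle be evaluated to rebuild the expression. The sketch below assumes the constructor names appearing in the string (Add, Mul, Symbol, Integer) have been imported, and uses eval purely for illustration.

```python
from sympy import Add, Integer, Mul, Symbol, srepr

x, y = Symbol('x'), Symbol('y')
expr = x*y + 3

code = srepr(expr)        # a string of nested constructor calls
rebuilt = eval(code)      # evaluated with the names imported above
print(code)
print(rebuilt == expr)
```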

Every SymPy expression satisfies a key identity invariant:

 expr.func(*expr.args) == expr  

This means that expressions are rebuildable from their args. Note that in SymPy the == operator represents exact structural equality, not mathematical equality. This allows testing if any two expressions are equal to one another as expression trees. For example, even though \( \left( x+2 \right)^2 \) and \( x^2 + 4\,x +4 \) are equal mathematically, SymPy gives

 >>> (x + 2)**2 == x**2 + 4*x + 4 
 False   
because they are different as expression trees (the former is a Pow object and the latter is an Add object).
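A short sketch of both points follows: the rebuild invariant, and two common ways to test mathematical rather than structural equality. Using expand or simplify for this purpose is one possible approach, not the only one.

```python
from sympy import expand, simplify, symbols

x = symbols('x')
lhs = (x + 2)**2
rhs = x**2 + 4*x + 4

rebuildable = lhs.func(*lhs.args) == lhs      # the invariant stated above
structurally_equal = lhs == rhs               # different trees
equal_after_expand = expand(lhs) == rhs       # same tree once expanded
difference = simplify(lhs - rhs)              # reduces to zero

print(rebuildable, structurally_equal, equal_after_expand, difference)
```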

Python allows classes to override mathematical operators. The Python interpreter translates the above x*y + 3 to, roughly, (x.__mul__(y)).__add__(3). Both x and y, returned from the symbols function, are Symbol instances. The 3 in the expression is processed by Python as a literal, and is stored as Python’s built-in int type. When 3 is passed to the __add__ method of the SymPy expression, it is converted to the SymPy type Integer(3) before being stored in the resulting expression tree. In this way, SymPy expressions can be built in the natural way using Python operators and numeric literals.
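The translation described here can be checked directly. The sketch below calls the operator methods by hand and uses sympify, the public conversion function, to show the int-to-Integer step; that __add__ performs this conversion through the same mechanism internally is an assumption of the illustration.

```python
from sympy import Integer, symbols, sympify

x, y = symbols('x y')

via_operators = x*y + 3
via_methods = x.__mul__(y).__add__(3)   # roughly what Python does
converted = sympify(3)                  # Python int -> SymPy Integer

print(via_operators == via_methods)
print(type(3).__name__, type(converted).__name__)
```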

SymPy performs logical inference through its assumptions system. The assumptions system allows users to specify that symbols have certain common mathematical properties, such as being positive, imaginary, or integral. SymPy is careful to never perform simplifications on an expression unless the assumptions allow them. For instance, the identity \( \sqrt{x^2} =x \) holds if x is nonnegative (\( x \ge 0 \) ). However, for general complex x, no such identity holds.

 >>> x = Symbol('x')
 >>> sqrt(x**2)
 sqrt(x**2)

By assuming the most general case, that symbols are complex by default, SymPy avoids performing mathematically invalid operations. However, in many cases users will wish to simplify expressions containing terms like \( \sqrt{x^2} . \)

Assumptions are set on Symbol objects when they are created. For instance, Symbol('x', positive=True) will create a symbol named x that is assumed to be positive:

 >>> x = Symbol('x', positive=True)
 >>> sqrt(x**2)
 x

Some of the common assumptions that SymPy allows are positive, negative, real, nonpositive, integer, prime, and commutative. Assumptions on any object can be checked with the is_assumption attributes, like x.is_positive.
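A brief sketch of setting and querying assumptions follows. The symbol names n, p, and z are chosen only for this example.

```python
from sympy import Symbol, sqrt

n = Symbol('n', integer=True)
p = Symbol('p', positive=True)
z = Symbol('z')                      # no assumptions beyond the defaults

print(n.is_integer)                  # the assumption that was set
print(p.is_positive, p.is_negative)  # set directly, and its consequence
print(z.is_positive)                 # nothing is known about z
print(sqrt(p**2), sqrt(z**2))        # only the positive case simplifies
```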

Assumptions are only needed to restrict a domain so that certain simplifications can be performed. They are not required to make the domain match the input of a function. For instance, one can create the object \( \sum_{k=0}^m f(k) \) as Sum(f(k), (k, 0, m)) without setting integer=True when creating the Symbol object k.
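The summation example can be written out as below. The undefined function f is created with Function('f'), and substituting a concrete upper limit before calling doit is just one way to show that the object behaves as a sum.

```python
from sympy import Function, Sum, symbols

k, m = symbols('k m')        # note: no integer=True on k
f = Function('f')

total = Sum(f(k), (k, 0, m))
print(total)

# With a concrete upper limit the sum can be expanded term by term.
three_terms = total.subs(m, 2).doit()
print(three_terms)
```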

The assumptions system additionally has deductive capabilities. The assumptions use a three-valued logic based on the Python built-in objects True, False, and None. Note that False is returned if the SymPy object doesn’t or can’t have the assumption. For example, both I.is_real and I.is_prime return False for the imaginary unit I.

None represents the “unknown” case. This could mean that given assumptions do not unambiguously specify the truth of an attribute. For instance, Symbol('x', real=True).is_positive will give None because a real symbol might be positive or negative. The None could also mean that not enough is known or implemented to compute the given fact. For instance, (pi + E).is_irrational gives None, because determining whether π + e is rational or irrational is an open problem in mathematics.

Basic implications between the facts are used to deduce assumptions. For instance, the assumptions system knows that being an integer implies being rational, so Symbol('x', integer=True).is_rational returns True. Furthermore, expressions compute the assumptions on themselves based on the assumptions of their arguments. For instance, if x and y are both created with positive=True, then (x + y).is_positive will be True whereas (x - y).is_positive will be None.
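The three possible answers and the deductions described above can be collected in one short sketch. The dictionary results is only a convenience for printing.

```python
from sympy import I, Symbol

r = Symbol('r', real=True)
n = Symbol('n', integer=True)
x = Symbol('x', positive=True)
y = Symbol('y', positive=True)

results = {
    "I.is_real": I.is_real,                      # cannot hold: False
    "I.is_prime": I.is_prime,                    # cannot hold: False
    "r.is_positive": r.is_positive,              # undetermined: None
    "n.is_rational": n.is_rational,              # integer implies rational
    "(x + y).is_positive": (x + y).is_positive,  # deduced from the arguments
    "(x - y).is_positive": (x - y).is_positive,  # sign depends on the values
}
for query, answer in results.items():
    print(query, "->", answer)
```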

The typical way to create a custom SymPy object is to subclass an existing SymPy class, usually Basic, Expr, or Function. All SymPy classes used for expression trees should be subclasses of the base class Basic, which defines some basic methods for symbolic expression trees. Expr is the subclass for mathematical expressions that can be added and multiplied together. Instances of Expr typically represent complex numbers, but may also include other “rings” like matrix expressions. Not all SymPy classes are subclasses of Expr. For instance, logic expressions such as And(x, y) are subclasses of Basic but not of Expr.
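The class relationships described here can be checked with isinstance. This sketch only inspects existing classes and defines no new ones.

```python
from sympy import And, Basic, Expr, symbols

x, y = symbols('x y')

conjunction = And(x, y)   # a logic expression
total = x + y             # an arithmetic expression

print(isinstance(conjunction, Basic), isinstance(conjunction, Expr))
print(isinstance(total, Basic), isinstance(total, Expr))
```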

Additionally, the following table (Table 1) gives a compact listing of all major capabilities present in the SymPy codebase. It samples the breadth of topics and application domains that SymPy serves. Unless stated otherwise, all features noted in Table 1 are symbolic in nature. Numeric features are discussed in Section Numerics.

 Feature  Description
 Calculus  Algorithms for computing derivatives, integrals, and limits.
 Category Theory  Representation of objects, morphisms, and diagrams. Tools for drawing diagrams with Xy-pic.
 Code Generation  Generation of compilable and executable code in a variety of different programming languages from expressions directly. Target languages include C, Fortran, Julia, JavaScript, Mathematica, MATLAB and Octave, Python, and Theano.
 Combinatorics  Permutations, combinations, partitions, subsets, various permutation groups (such as polyhedral, Rubik, symmetric, and others), Gray codes, and Prufer sequences.
 Discrete Math  Summation, products, tools for determining whether summation and product expressions are convergent, absolutely convergent, hypergeometric, and for determining other properties; computation of Gosper’s normal form for two univariate polynomials.
 Cryptography  Block and stream ciphers, including shift, affine, substitution, Vigenère’s, Hill’s, bifid, RSA, Kid RSA, linear-feedback shift registers, and Elgamal encryption.
 Diff Geometry  Representations of manifolds, metrics, tensor products, and coordinate systems in Riemannian and pseudo-Riemannian geometries.
 Geometry  Representations of 2D geometrical entities, such as lines and circles. Enables queries on these entities, such as asking the area of an ellipse, checking for collinearity of a set of points, or finding the intersection between objects.
 Lie Algebras  Representations of Lie algebras and root systems.
 Logic  Boolean expressions, equivalence testing, satisfiability, and normal forms.
 Matrices  Tools for creating matrices of symbols and expressions. Both sparse and dense representations, as well as symbolic linear algebraic operations (e.g., inversion and factorization), are supported.
 Matrix Calculus  Matrices with symbolic dimensions (unspecified entries). Block matrices.
 Number Theory  Prime number generation, primality testing, integer factorization, continued fractions, Egyptian fractions, modular arithmetic, quadratic residues, partitions, binomial and multinomial coefficients, prime number tools, and hexadecimal digits of π.
 Plotting  Hooks for visualizing expressions via matplotlib or as text drawings when lacking a graphical back-end. 2D function plotting, 3D function plotting, and 2D implicit function plotting are supported.
 Polynomials  Polynomial algebras over various coefficient domains. Functionality ranges from simple operations (e.g., polynomial division) to advanced computations (e.g., Gröbner bases and multivariate factorization over algebraic number domains).
 Printing  Functions for printing SymPy expressions in the terminal with ASCII or Unicode characters and converting SymPy expressions to LaTeX and MathML.
 Quantum Mechanics  Quantum states, bra–ket notation, operators, basis sets, representations, tensor products, inner products, outer products, commutators, anticommutators, and specific quantum system implementations.
 Series  Series expansion, sequences, and limits of sequences. This includes Taylor, Laurent, and Puiseux series as well as special series, such as Fourier and formal power series.
 Sets  Representations of empty, finite, and infinite sets. This includes special sets such as for all natural, integer, and complex numbers. Operations on sets such as union, intersection, Cartesian product, and building sets from other sets are supported.
 Simplification  Functions for manipulating and simplifying expressions. Includes algorithms for simplifying hypergeometric functions, trigonometric expressions, rational functions, combinatorial functions, square root denesting, and common subexpression elimination.
 Solvers  Functions for symbolically solving equations, systems of equations, both linear and non-linear, inequalities, ordinary differential equations, partial differential equations, Diophantine equations, and recurrence relations.
 Special Functions  Implementations of a number of well-known special functions, including Dirac delta, gamma, beta, Gauss error functions, Fresnel integrals, exponential integrals, logarithmic integrals, trigonometric integrals, Bessel, Hankel, Airy, B-spline, Riemann zeta, Dirichlet eta, polylogarithm, Lerch transcendent, hypergeometric, elliptic integrals, Mathieu, Jacobi polynomials, Gegenbauer polynomials, Chebyshev polynomials, Legendre polynomials, Hermite polynomials, Laguerre polynomials, and spherical harmonic functions.
 Statistics  Support for a random variable type as well as the ability to declare this variable from prebuilt distribution functions such as Normal, Exponential, Coin, Die, and other custom distributions.
 Tensors  Symbolic manipulation of indexed objects.
 Vectors  Basic operations on vectors and differential calculus with respect to 3D Cartesian coordinate systems.

========================================

All symbolic objects are implemented using subclasses of the Basic class. First, you need to create symbols using Symbol('x') or numbers using Integer(5) or Float(34.3). Then you construct the expression using any class from SymPy. For example, Add(Symbol('a'), Symbol('b')) gives an instance of the Add class. You can call any method that the particular class supports.

For easier use, there is syntactic sugar for expressions like:

cos(x) + 1 is equal to cos(x).__add__(1) is equal to Add(cos(x), Integer(1))

or

2/cos(x) is equal to cos(x).__rtruediv__(2) is equal to Mul(Rational(2), Pow(cos(x), Rational(-1))).
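Both equivalences can be confirmed by comparing the operator form with the explicit constructor form. This is only a check of the statements above, written with the class names they mention.

```python
from sympy import Add, Integer, Mul, Pow, Rational, Symbol, cos

x = Symbol('x')

# Operator syntax on the left, explicit constructors on the right.
sum_matches = cos(x) + 1 == Add(cos(x), Integer(1))
quotient_matches = 2/cos(x) == Mul(Rational(2), Pow(cos(x), Rational(-1)))

print(sum_matches, quotient_matches)
```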

So, you can write normal expressions using Python arithmetic like this:

from sympy import Symbol

a = Symbol("a")
b = Symbol("b")
e = (a + b)**2
print(e)  # print is a function in Python 3

but from the SymPy point of view, we just need the classes Add, Mul, Pow, Rational, Integer.

Automatic evaluation to canonical form

For computation, all expressions need to be in a canonical form. This is done when the particular instance is created, and only the inexpensive operations necessary to put the expression in canonical form are performed. The canonical form therefore does not mean the simplest possible expression. The exact list of operations performed depends on the implementation. The definition of the canonical form is arbitrary; the only requirement is that all equivalent expressions must have the same canonical form. We tried to make the conversion to a canonical (standard) form as fast as possible, and to make the result what you would write by hand. For example, b*a + -4 + b + a*b + 4 + (a + b)**2 becomes 2*a*b + b + (a + b)**2.
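The example from the paragraph above can be reproduced as follows. The comparison against the expanded form is an addition of this sketch, included to show that canonicalization leaves (a + b)**2 unexpanded.

```python
from sympy import symbols

a, b = symbols('a b')

# Like terms are combined and the constants cancel at construction time.
e = b*a + -4 + b + a*b + 4 + (a + b)**2
by_hand = 2*a*b + b + (a + b)**2

print(e)
print(e == by_hand)        # same canonical form
print(e == e.expand())     # expansion is a separate, explicit step
```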

Whenever you construct an expression, for example Add(x, x), the Add.__new__() is called and it determines what to return. In this case:

>>> from sympy import Add
>>> from sympy.abc import x
>>> e = Add(x, x)
>>> e
2*x

>>> type(e)
<class 'sympy.core.mul.Mul'>

e is actually an instance of Mul(2, x), because Add.__new__() returned Mul.
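For comparison, automatic evaluation can be switched off for a single constructor call with the evaluate=False keyword. This keyword is not discussed in the text above; the sketch is included only to make the effect of Add.__new__() visible.

```python
from sympy import Add, Mul
from sympy.abc import x

evaluated = Add(x, x)                    # canonicalized on construction
unevaluated = Add(x, x, evaluate=False)  # kept exactly as written

print(type(evaluated).__name__, evaluated)
print(type(unevaluated).__name__, unevaluated.args)
```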

Comparisons

Expressions can be compared using regular Python syntax:

>>> from sympy.abc import x, y
>>> x + y == y + x
True

>>> x + y == y - x
False

We made the following decision in SymPy: a = Symbol("x") and b = Symbol("x"), created with the same string "x", are the same thing, i.e. a == b is True. We chose this because it is more natural: exp(x) == exp(x) is True for the same instance of x but different instances of exp, so we chose to have exp(x) == exp(x) hold for different instances of x as well.

Sometimes, you need to have a unique symbol, for example as a temporary one in some calculation, which is going to be substituted for something else at the end anyway. This is achieved using Dummy("x"). So, to sum it up:

>>> from sympy import Symbol, Dummy
>>> Symbol("x") == Symbol("x")
True

>>> Dummy("x") == Dummy("x")
False

Debugging

Starting with version 0.6.4, you can turn debug messages on or off with the environment variable SYMPY_DEBUG, which is expected to have the value True or False. For example, to turn on debugging, you would issue:

[user@localhost]: SYMPY_DEBUG=True ./bin/isympy

Functionality

There are no given requirements on classes in the library. For example, if a class does not implement the fdiff() method and you construct an expression using that class, then trying to use the Basic.series() method will raise an exception because fdiff() cannot be found in your class. This "duck typing" has the advantage that you implement only the functionality you need.

You can define the cos class like this:

class cos(Function):
    pass

and use it like 1 + cos(x), but if you don't implement the fdiff() method, you will not be able to call (1 + cos(x)).series().

The symbolic object is characterized (defined) by the things it can do, so by implementing more methods like fdiff(), subs(), etc., you are creating a "shape" of the symbolic object. Useful things to implement in new classes are: hash() (to use the class in comparisons), fdiff() (to use it in series expansion), subs() (to use it in expressions where some parts are being substituted), and series() (if the series cannot be computed using the general Basic.series() method). When you create a new class, don't worry about this too much. Just try to use it in your code and you will realize immediately which methods need to be implemented in each situation.
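As a sketch of implementing one such method, the class below adds fdiff() to a user-defined function. The name mycos is hypothetical, chosen to avoid clashing with SymPy's own cos, and the example exercises the method through diff, which applies the chain rule around fdiff().

```python
from sympy import Function, Symbol, diff, sin
from sympy.core.function import ArgumentIndexError

class mycos(Function):
    """Hypothetical cosine-like function that only defines its derivative."""

    def fdiff(self, argindex=1):
        # Derivative with respect to the argindex-th argument (1-based).
        if argindex != 1:
            raise ArgumentIndexError(self, argindex)
        return -sin(self.args[0])

x = Symbol('x')
derivative = diff(mycos(x**2), x)   # chain rule: fdiff times d(x**2)/dx
print(derivative)
```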

All objects in SymPy are immutable, in the sense that any operation just returns a new instance (it can return the same instance only if nothing changed). A common mistake is to change the current instance, like self.arg = self.arg + 1 (wrong!). Use arg = self.arg + 1; return arg instead. The object is immutable in the sense of the symbolic expression it represents. It can modify itself to keep track of, for example, its hash, or it can recalculate anything regarding the expression it contains, but the expression itself cannot be changed. So you can pass any instance to other objects without worrying that it will change or that doing so would break anything.
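The practical consequence is that operations never alter their operands. The sketch below keeps a reference to an expression and shows that it is unchanged after substitution and arithmetic; the variable names are chosen only for the example.

```python
from sympy import symbols

x = symbols('x')
e = x + 1

f = e.subs(x, 2)     # a new object; e is left untouched
g = e + 1            # likewise a new expression

print(e, f, g)
```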