Basis Functions
Basis functions control how smooth terms behave in your GAMs. Different basis types suit different kinds of data and modeling requirements:
ThinPlateSpline
Thin plate regression spline basis.
Parameters:
- shrinkage (bool | None, default: False) – If True, the penalty is modified so that the term is shrunk to zero for a high enough smoothing parameter.
- m (int | None, default: None) – The order of the derivative in the thin plate spline penalty. If \(d\) is the number of covariates for the smooth term, this must satisfy \(m>(d+1)/2\). If left as None, the smallest value satisfying \(m>(d+1)/2\) will be used, which creates "visually smooth" functions.
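The default rule for m can be worked out by hand. The sketch below is plain Python and independent of the library; `default_penalty_order` is a hypothetical helper name used only for illustration. It returns the smallest integer \(m\) satisfying \(m>(d+1)/2\).

```python
def default_penalty_order(d: int) -> int:
    """Smallest integer m satisfying m > (d + 1) / 2 for d covariates."""
    if d < 1:
        raise ValueError("d must be a positive integer")
    # (d + 1) // 2 is floor((d + 1) / 2); adding 1 gives the first integer
    # strictly greater than (d + 1) / 2.
    return (d + 1) // 2 + 1


for d in (1, 2, 3, 4):
    print(d, default_penalty_order(d))
```

For one or two covariates this gives m=2, i.e. a second-derivative penalty.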
CubicSpline
Cubic regression spline basis.
Cubic splines use piecewise cubic polynomials with knots placed throughout
the data range. They tend to be computationally efficient, but often
perform slightly worse than thin plate splines and are limited to
univariate smooths. Note that being restricted to one-dimensional smooths
does not prevent their use in multivariate T smooths, which are
constructed from marginal bases.
Parameters:
- cyclic (bool, default: False) – If True, creates a cyclic spline where the function values and derivatives match at the boundaries. Use for periodic data like time of day, angles, or seasonal patterns. Default is False.
- shrinkage (bool, default: False) – If True, adds a penalty to the null space (linear component). Helps with model selection and identifiability. Default is False. Cannot be used with cyclic=True.
Raises:
- ValueError – If both cyclic and shrinkage are True (incompatible options).
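The incompatibility rule can be illustrated with a small stand-in. `CubicSplineOptions` below is a hypothetical class written for this example, not the library's `CubicSpline`; it only mirrors the documented behavior that setting both options raises a ValueError.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CubicSplineOptions:
    """Hypothetical stand-in for illustration; not the library's CubicSpline."""

    cyclic: bool = False
    shrinkage: bool = False

    def __post_init__(self) -> None:
        # Mirror the documented rule: the two options are incompatible.
        if self.cyclic and self.shrinkage:
            raise ValueError("cyclic and shrinkage cannot both be True")


CubicSplineOptions(cyclic=True)      # fine: periodic data such as time of day
CubicSplineOptions(shrinkage=True)   # fine: penalized null space
try:
    CubicSplineOptions(cyclic=True, shrinkage=True)
except ValueError as err:
    print(f"rejected: {err}")
```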
BSpline
B-spline basis with derivative-based penalties.
These are univariate (but note univariate smooths can be used for multivariate
smooths constructed with T).
BSpline(degree=3, penalty_orders=[2]) constructs a conventional cubic spline.
Parameters:
- degree (int, default: 3) – The degree of the B-spline basis (e.g. 3 for a cubic spline).
- penalty_orders (Iterable[int] | None, default: None) – The derivative orders to penalize. Defaults to [degree - 1].
PSpline
P-spline (penalized spline) basis as proposed by Eilers and Marx (1996).
Uses B-spline bases penalized by discrete penalties applied directly to the basis
coefficients. Note for most use cases splines with derivative-based penalties (e.g.
ThinPlateSpline or CubicSpline) tend to yield better MSE performance.
PSpline(degree=3, penalty_order=2) is cubic-spline-like.
Parameters:
- degree (int, default: 3) – Degree of the B-spline basis (e.g. 3 for cubic).
- penalty_order (int | None, default: None) – The difference order to penalize. 0-th order is a ridge penalty. Defaults to degree-1.
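To make the idea of a discrete difference penalty concrete, here is a minimal pure-Python sketch. The helper names are hypothetical, and this illustrates the general construction from Eilers and Marx rather than the code mgcv runs. It builds the difference matrix \(D\) of a given order and the penalty \(D^T D\).

```python
def difference_matrix(n_coef: int, order: int) -> list[list[float]]:
    """Rows apply order-th differences to a coefficient vector."""
    # Start from the identity: zero differencing leaves coefficients as-is.
    rows = [[1.0 if i == j else 0.0 for j in range(n_coef)] for i in range(n_coef)]
    for _ in range(order):
        # Differencing adjacent rows raises the difference order by one.
        rows = [[b - a for a, b in zip(r0, r1)] for r0, r1 in zip(rows, rows[1:])]
    return rows


def penalty_matrix(n_coef: int, order: int) -> list[list[float]]:
    """Penalty S = D^T D, so beta^T S beta is the sum of squared differences."""
    d = difference_matrix(n_coef, order)
    return [
        [sum(row[i] * row[j] for row in d) for j in range(n_coef)]
        for i in range(n_coef)
    ]


print(difference_matrix(5, 2)[0])  # first second-order difference row
print(penalty_matrix(3, 0))        # order 0 gives the identity (ridge penalty)
```

With a second-order penalty, coefficients that are constant or linear in their index have zero second differences and so are left unpenalized.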
DuchonSpline
Duchon spline basis - a generalization of thin plate splines.
These smoothers allow the use of lower orders of derivative in the penalty than conventional thin plate splines, while still yielding continuous functions.
The following description is adapted from mgcv. Duchon’s (1977) construction generalizes the usual thin plate spline penalty as follows. The usual thin plate spline penalty is given by the integral of the squared Euclidean norm of a vector of mixed partial \(m\)-th order derivatives of the function w.r.t. its arguments. Duchon re-expresses this penalty in the Fourier domain, and then weights the squared norm in the integral by the Euclidean norm of the Fourier frequencies, raised to the power \(2s\), where \(s\) is a user-selected constant.
If \(d\) is the number of arguments of the smooth:
- It is required that \(-d/2 < s < d/2\).
- If \(s=0\) then the usual thin plate spline is recovered.
- To obtain continuous functions we further require that \(m + s > d/2\).
For example, DuchonSpline(m=1, s=d/2) can be used in order to use first
derivative penalization for any \(d\), and still yield continuous functions.
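The conditions listed above are easy to check before fitting. The following is a minimal sketch in plain Python; `check_duchon` is a hypothetical helper, not part of the library, and it simply reports which of the stated conditions a given \((m, s, d)\) violates.

```python
from fractions import Fraction


def check_duchon(m: int, s, d: int) -> list[str]:
    """Return the documented conditions on (m, s, d) that are violated."""
    s = Fraction(s)  # exact arithmetic, so 0.5 and Fraction(1, 2) agree
    problems = []
    if (2 * s).denominator != 1:
        problems.append("s should be an integer divided by 2")
    if not (Fraction(-d, 2) < s < Fraction(d, 2)):
        problems.append("need -d/2 < s < d/2")
    if not (m + s > Fraction(d, 2)):
        problems.append("need m + s > d/2 for continuous functions")
    return problems


print(check_duchon(m=2, s=0, d=2))               # usual thin plate spline
print(check_duchon(m=1, s=0, d=2))               # first derivative, s=0
print(check_duchon(m=1, s=Fraction(1, 2), d=2))  # first derivative, s=1/2
```

For two covariates, a first-derivative penalty with \(s=0\) fails the continuity condition, while the same penalty with \(s=1/2\) satisfies every condition.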
Parameters:
- m – Order of derivative to penalize.
- s – \(s\) as described above; should be an integer divided by 2.
SplineOnSphere
Isotropic smooth for data on a sphere (latitude/longitude coordinates).
This should be used with exactly two variables, where the first represents latitude on the interval [-90, 90] and the second represents longitude on the interval [-180, 180].
Parameters:
- m – An integer in [-1, 4]. Setting m=-1 uses DuchonSpline(m=2, s=1/2). Setting m=0 signals to use the 2nd order spline on the sphere, computed by Wendelberger’s (1981) method. For m>0, (m+2)/2 is the penalty order, with m=2 equivalent to the usual second derivative penalty.
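Coordinates often arrive with longitudes on [0, 360) rather than the interval expected here. The sketch below is a possible preprocessing step, assuming coordinates in degrees. Both helpers are hypothetical and not part of the library.

```python
def wrap_longitude(lon: float) -> float:
    """Map any longitude in degrees onto [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


def check_latitude(lat: float) -> float:
    """Return lat unchanged, or raise if it lies outside [-90, 90]."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} is outside [-90, 90]")
    return lat


# Latitude first, longitude second, matching the variable order described above.
points = [(51.5, 359.9), (-33.9, 18.4), (35.7, 139.7)]
cleaned = [(check_latitude(lat), wrap_longitude(lon)) for lat, lon in points]
print(cleaned)
```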
RandomEffect
Random effect basis for correlated grouped data.
This can be used with any mixture of numeric or categorical variables. Acts
similarly to an Interaction but penalizes the corresponding coefficients with
a multiple of the identity matrix (i.e. a ridge penalty), corresponding to an
assumption of i.i.d. normality of the parameters.
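The effect of a ridge penalty on group-level coefficients can be seen in a small closed-form case. The sketch below is plain Python with a hypothetical helper. It assumes a single categorical variable, no intercept, and a fixed penalty weight `lam`, whereas in practice the smoothing parameter would be estimated.

```python
from collections import defaultdict


def ridge_group_effects(groups, y, lam):
    """Ridge-penalized group effects for a one-hot design (no intercept).

    Minimizes sum((y_i - b[g_i])**2) + lam * sum(b_g**2). With a one-hot
    design the solution decouples to b_g = sum(y in g) / (n_g + lam).
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for g, value in zip(groups, y):
        totals[g] += value
        counts[g] += 1
    return {g: totals[g] / (counts[g] + lam) for g in totals}


groups = ["a", "a", "b", "b", "c"]
y = [1.0, 3.0, -2.0, -4.0, 10.0]
print(ridge_group_effects(groups, y, lam=0.0))  # plain group means
print(ridge_group_effects(groups, y, lam=1.0))  # every effect shrunk toward zero
```

Groups with fewer observations are shrunk more strongly, which is the usual behavior of random effect estimates.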
MarkovRandomField
Markov Random Field basis for discrete spatial data with neighborhood structure.
The smoothing penalty encourages similar values in neighboring locations. When using
this basis, the variable passed to S should be a categorical variable representing
the area labels.
Parameters:
- polys (list[ndarray]) – List of numpy arrays defining the spatial polygons or neighborhood structure. Each array represents the boundary or connectivity information for a spatial unit.
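One common way to encode "similar values in neighboring locations" is a graph-Laplacian penalty. The sketch below assumes that form and uses a hypothetical neighbor mapping as input rather than polygons, so the penalty the library actually builds may differ in detail.

```python
def neighbourhood_penalty(neighbours: dict[str, set[str]]):
    """Build a graph-Laplacian-style penalty from a symmetric neighbour mapping."""
    labels = sorted(neighbours)
    index = {label: i for i, label in enumerate(labels)}
    n = len(labels)
    penalty = [[0.0] * n for _ in range(n)]
    for label, adjacent in neighbours.items():
        i = index[label]
        penalty[i][i] = float(len(adjacent))  # diagonal: number of neighbours
        for other in adjacent:
            penalty[i][index[other]] = -1.0   # off-diagonal: -1 per neighbour
    return labels, penalty


def quadratic_form(penalty, beta):
    """beta^T P beta, equal to the sum of squared neighbour differences."""
    size = len(beta)
    return sum(
        beta[i] * penalty[i][j] * beta[j] for i in range(size) for j in range(size)
    )


# Three areas in a row: A - B - C.
labels, penalty = neighbourhood_penalty({"A": {"B"}, "B": {"A", "C"}, "C": {"B"}})
print(quadratic_form(penalty, [1.0, 1.0, 1.0]))  # equal values everywhere
print(quadratic_form(penalty, [0.0, 1.0, 3.0]))  # (0 - 1)**2 + (1 - 3)**2
```

Equal coefficients across all areas incur no penalty, while differences between neighbors are penalized quadratically.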
RandomWigglyCurve
S for each level of a categorical variable.
When using this basis, the first variable of the smooth should be a numeric variable, and the second should be a categorical variable.
Unlike using a categorical by variable, e.g. S(x, by="group"):
- The terms share a smoothing parameter.
- The terms are fully penalized, with separate penalties on each null space component (e.g. intercepts). The terms are non-centered, and can be used with an intercept without introducing indeterminacy, due to the penalization.
Parameters:
- bs (BasisLike, default: <factory>) – Any singly penalized basis function. Defaults to ThinPlateSpline. Only the type of the basis is passed to mgcv (i.e. what is returned by str(bs)). This is a limitation of mgcv, which provides no way to pass more details for setting up the basis function.
BasisLike
Protocol defining the interface for GAM basis functions.
All basis function classes must implement this protocol to be usable with smooth terms. The protocol ensures basis functions can be converted to appropriate mgcv R syntax and provide any additional parameters needed.