Terms

Terms are the components of GAM models (e.g. linear terms, smooths and intercepts).
If you are familiar with mgcv, the mgcv representation of any term can be inspected using str(term).
Adding terms together is supported as syntactic sugar for creating a list of terms, i.e.

from pymgcv.terms import S, L
assert L("x0") + S("x1") == [L("x0"), S("x1")]
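
As a rough illustration of how this kind of sugar can work, the sketch below overloads `__add__` and `__radd__` on a hypothetical `ToyTerm` class. `ToyTerm` is an assumption made for the example and is not pymgcv's implementation; it only shows how chained additions can flatten into a single list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTerm:
    """Hypothetical stand-in for a term; not a pymgcv class."""

    kind: str
    varname: str

    def __add__(self, other):
        # term + term -> [term, term]; term + list -> [term, *list]
        if isinstance(other, ToyTerm):
            return [self, other]
        if isinstance(other, list):
            return [self, *other]
        return NotImplemented

    def __radd__(self, other):
        # list + term -> [*list, term], which makes chained sums work
        if isinstance(other, list):
            return [*other, self]
        return NotImplemented


a, b, c = ToyTerm("L", "x0"), ToyTerm("S", "x1"), ToyTerm("S", "x2")
terms = a + b + c  # evaluated as (a + b) + c, i.e. list + term
```

Defining `__radd__` is what lets a third term be appended to the list produced by the first addition.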

L

L(name: str)

Linear (parametric) term with no basis expansion.

If the variable is continuous, the term will be included in the model matrix as is, with a single corresponding coefficient.
If the variable is categorical, the term will be expanded (one-hot encoded) into a set of dummy variables.

Parameters:

name (str) –

Name of the variable to include as a linear term.
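
The two cases above can be pictured with a small pure-Python sketch. The function `linear_term_columns` and the column names it produces are hypothetical and only illustrate the idea; the actual model matrix is built by mgcv, which may, for example, drop a reference level to keep the model identifiable.

```python
def linear_term_columns(values, name="x"):
    """Return {column_name: column} for a toy linear term.

    Numeric input is passed through unchanged; string input is one-hot encoded.
    """
    if all(isinstance(v, (int, float)) for v in values):
        # Continuous variable: a single column, used as is.
        return {name: [float(v) for v in values]}
    # Categorical variable: one indicator column per level.
    levels = sorted(set(values))
    return {
        f"{name}[{level}]": [1.0 if v == level else 0.0 for v in values]
        for level in levels
    }


continuous = linear_term_columns([0.5, 1.5, 2.0])
categorical = linear_term_columns(["a", "b", "a"])
```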

S

S(
    *varnames: str,
    by: str | None = None,
    k: int | None = None,
    bs: BasisLike | None = None,
    id: int | None = None,
    fx: bool = False,
)

Smooth term.

For all the arguments, passing None will use the mgcv defaults.

Note

For multiple variables, this creates an isotropic smooth, meaning all variables are treated on the same scale. If variables have very different scales or units, consider using T.
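
To see why scale matters for an isotropic smooth, the sketch below uses plain Euclidean distance, which is what a radially symmetric basis such as a thin plate spline depends on. The data and the `nearest` helper are made up for illustration. Changing the unit of one variable changes which observation counts as "closest", so the smooth itself would change.

```python
import math


def nearest(points, target):
    """Return the point with the smallest Euclidean distance to target."""
    return min(points, key=lambda p: math.dist(p, target))


# Toy observations as (distance, temperature in degrees C).
points_m = [(1000.0, 20.0), (0.0, 25.0)]  # distance in metres
target = (0.0, 20.0)
nearest_in_metres = nearest(points_m, target)

# The same observations with distance expressed in kilometres.
points_km = [(d / 1000, t) for d, t in points_m]
nearest_in_km = nearest(points_km, target)
```

In metres the second observation is closest; in kilometres the first one is. A tensor product smooth is designed to avoid this dependence on units.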

Parameters:

*varnames (str) –

Names of variables to smooth over. For single variables, creates a univariate smooth. For multiple variables, creates an isotropic multi-dimensional smooth.
k (int | None, default: None ) –

The dimension of the basis used to represent the smooth term. The default depends on the basis and number of variables that the smooth is a function of.
bs (BasisLike | None, default: None ) –

Basis function. For available options see Basis Functions. If left as None, uses ThinPlateSpline.
by (str | None, default: None ) –

Name of a variable used to scale the smooth. If it is numeric, it multiplies the smooth, and the "by" variable should not be included as a separate main effect (as the smooth is usually not centered). If it is categorical, a separate smooth is created for each factor level. In this case the smooths are centered, so the categorical variable should be included as a main effect.
id (int | None, default: None ) –

Identifier for grouping smooths with shared penalties. If using a categorical by variable, providing an id will ensure a shared smoothing parameter for each level.
fx (bool, default: False ) –

Indicates whether the term is a fixed d.f. regression spline (True) or a penalized regression spline (False).
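
The two behaviours of the by argument can be written out as a small numeric sketch. The functions below are toy stand-ins (a sine and a cosine in place of fitted smooths) and are assumptions made for illustration, not part of pymgcv.

```python
import math


def smooth(x):
    """Toy stand-in for a fitted smooth f(x)."""
    return math.sin(x)


def numeric_by_contribution(x, by):
    # Numeric by: the smooth is multiplied by the by variable, by * f(x).
    return by * smooth(x)


# Categorical by: each factor level gets its own smooth.
level_smooths = {"a": math.sin, "b": math.cos}


def factor_by_contribution(x, level):
    return level_smooths[level](x)


scaled = numeric_by_contribution(math.pi / 2, by=2.0)
per_level = [factor_by_contribution(0.0, level) for level in ("a", "b")]
```

With a numeric by variable the term behaves like a coefficient that varies smoothly with x; with a categorical by variable each level follows a different curve.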

T

T(
    *varnames: str,
    by: str | None = None,
    k: int | Iterable[int] | None = None,
    bs: BasisLike | Iterable[BasisLike] | None = None,
    d: Iterable[int] | None = None,
    id: int | None = None,
    fx: bool = False,
    np: bool = True,
    interaction_only: bool = False,
)

Tensor product smooth for scale-invariant multi-dimensional smoothing.

Tensor smooths build smooth functions of multiple variables from marginal smooths, which makes them robust to variables on different scales. Sequence arguments must have one entry per variable if d is not provided, and otherwise one entry per element of d.

Parameters:

*varnames (str) –

Names of variables for the tensor smooth.
k (int | Iterable[int] | None, default: None ) –

The basis dimension for each marginal smooth. If an integer, all marginal smooths will have the same basis dimension.
bs (BasisLike | Iterable[BasisLike] | None, default: None ) –

Basis type to use, or an iterable of basis types, one for each marginal smooth. Defaults to CubicSpline.
d (Iterable[int] | None, default: None ) –

Sequence specifying the dimension of each marginal smooth. For example, with three variables, (2, 1) specifies one two-dimensional marginal smooth and one one-dimensional marginal smooth. This is useful for space-time smooths (two spatial dimensions and one time dimension).
by (str | None, default: None ) –

Variable name for 'by' variable scaling the tensor smooth, or creating a smooth for each level of a categorical by variable.
id (int | None, default: None ) –

Identifier for sharing penalties across multiple tensor smooths.
fx (bool, default: False ) –

Indicates whether the term is a fixed d.f. regression spline (True) or a penalized regression spline (False).
np (bool, default: True ) –

If False, use a single penalty for the tensor product. If True (default), use separate penalties for each marginal.
interaction_only (bool, default: False ) –

If True, creates ti() instead of te() - interaction only, excluding main effects of individual variables.
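
The relationship between the variables, d and the sequence arguments can be sketched as follows. The helpers `group_marginals` and `expand_per_marginal` are hypothetical, and the sketch assumes that variables are grouped into marginals consecutively, in the order given.

```python
from itertools import accumulate


def group_marginals(varnames, d=None):
    """Group variable names into marginal smooths according to d."""
    d = [1] * len(varnames) if d is None else list(d)
    if sum(d) != len(varnames):
        raise ValueError("sum(d) must equal the number of variables")
    ends = list(accumulate(d))
    starts = [0, *ends[:-1]]
    return [tuple(varnames[s:e]) for s, e in zip(starts, ends)]


def expand_per_marginal(value, n_marginals):
    """Repeat a single integer for every marginal, or validate a sequence."""
    if isinstance(value, int):
        return [value] * n_marginals
    value = list(value)
    if len(value) != n_marginals:
        raise ValueError("expected one entry per marginal smooth")
    return value


# Space-time example: a 2D spatial marginal and a 1D time marginal.
marginals = group_marginals(("lon", "lat", "time"), d=(2, 1))
k = expand_per_marginal(10, len(marginals))
default_marginals = group_marginals(("x0", "x1"))
```

Here a k of 10 is repeated for both marginals, whereas a sequence such as (20, 5) would have to contain exactly two entries.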

Interaction

Interaction(*varnames: str)

Parametric interaction term between multiple variables.

Any categorical variables involved in an interaction are expanded into indicator variables representing all combinations at the specified interaction order. Numeric variables are incorporated by multiplication (i.e. with each other and with any indicator variables).

Note that this does not automatically include main effects or lower order interactions.

Parameters:

*varnames (str) –

Variable names to include in the interaction. Two or more variables are required.

Example

# Two-way interaction (multiplication if both numeric)
from pymgcv.terms import Interaction
age_income = Interaction('age', 'income')

# Three-way interaction
varnames = ['group0', 'group1', 'group2']
three_way = Interaction(*varnames)

# Generate all pairwise interactions
from itertools import combinations
pairs = [Interaction(*pair) for pair in combinations(varnames, 2)]

Initialize an interaction term.

Parameters:

*varnames (str) –

Names of variables to include in the interaction. Must be 2 or more variables.
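
The expansion described for Interaction can be sketched in pure Python. The function `interaction_columns` and its column labels are hypothetical; it builds the full set of indicator combinations as described above, whereas the coding actually used in a fitted model is determined by mgcv and may differ.

```python
from itertools import product


def interaction_columns(numeric, categorical):
    """Toy interaction columns from dicts mapping variable name -> column."""
    n_rows = len(next(iter({**numeric, **categorical}.values())))
    # Numeric variables enter by row-wise multiplication.
    numeric_product = [1.0] * n_rows
    for column in numeric.values():
        numeric_product = [p * v for p, v in zip(numeric_product, column)]
    if not categorical:
        return {":".join(numeric): numeric_product}
    names = list(categorical)
    level_sets = [sorted(set(categorical[name])) for name in names]
    columns = {}
    # One indicator column per combination of factor levels.
    for combo in product(*level_sets):
        pairs = list(zip(names, combo))
        indicator = [
            1.0 if all(categorical[name][i] == level for name, level in pairs) else 0.0
            for i in range(n_rows)
        ]
        label = ":".join([*numeric, *(f"{name}[{level}]" for name, level in pairs)])
        columns[label] = [p * ind for p, ind in zip(numeric_product, indicator)]
    return columns


both_numeric = interaction_columns({"age": [2.0, 3.0], "income": [10.0, 20.0]}, {})
by_group = interaction_columns({"age": [2.0, 3.0]}, {"group": ["a", "b"]})
```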

Offset

Offset(name: str)

Offset term, added to the linear predictor as is.

This means:

For log-link models: an offset induces a multiplicative effect on the response scale.
For identity-link models: an offset induces an additive effect on the response scale.

Parameters:

name (str) –

Name of the variable to use as an offset. Must be present in the modeling data.
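
The two effects listed above follow directly from the inverse link. As a worked sketch with made-up numbers, where `eta` stands for the rest of the linear predictor and `exposure` is a hypothetical exposure variable whose logarithm is used as the offset:

```python
import math


def log_link_mean(eta, offset=0.0):
    # Log link: mean = exp(eta + offset) = exp(offset) * exp(eta)
    return math.exp(eta + offset)


def identity_link_mean(eta, offset=0.0):
    # Identity link: mean = eta + offset
    return eta + offset


eta = 0.3
exposure = 50.0

rate = log_link_mean(eta)
expected_count = log_link_mean(eta, offset=math.log(exposure))
shifted = identity_link_mean(eta, offset=2.0)
```

With a log link, using log(exposure) as the offset multiplies the mean by the exposure, which is the usual way to model rates with count data.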

Intercept

Intercept()

Intercept term.

By default, this is added to all formulas in the model. To control the intercept terms yourself, set add_intercepts to False on GAM, in which case only intercepts added explicitly are included in the model.

TermLike

Protocol defining the interface for GAM model terms.

All term types in pymgcv must implement this protocol. It defines the basic interface for model terms including variable references, string representations, and the ability to compute partial effects. See the source code of this class for more details.

Attributes:

varnames –

Tuple of variable names used by this term. For univariate terms, this contains a single variable name. For multivariate terms (like tensor smooths), this contains multiple variable names.
by –

Optional name of a 'by' variable that scales this term.

label

label() -> str

The label used by pymgcv for the term in plotting and columns.

All labels must be unique in a formula. Labels should be implemented such that each unique label maps to a unique mgcv identifier (but not necessarily the other way around).

mgcv_identifier

mgcv_identifier(formula_idx: int = 0) -> str

Generate the mgcv identifier for the term.

When computing partial effects, we look for a column with this name in the mgcv output, and if it is not found we fall back on using partial_effect.

Example

from pymgcv.terms import S
assert S("x1").mgcv_identifier(formula_idx=1) == "s.1(x1)"

Parameters:

formula_idx (int, default: 0 ) –

Index of the formula in multi-formula models.

__str__

__str__() -> str

Convert the term to mgcv formula syntax.
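
To make the shape of this interface concrete, here is a minimal sketch using typing.Protocol. `TermLikeSketch` and `ToySmooth` are hypothetical names; the sketch covers only the attributes and methods listed above, omits the partial effect computation (whose signature is not shown here), and ignores by in the generated strings. The `s.1(x1)` identifier matches the example above, while the `s(x1)` form for formula index 0 and for `__str__` is an assumption based on mgcv's formula syntax.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class TermLikeSketch(Protocol):
    """Hypothetical, reduced version of the term interface."""

    varnames: tuple[str, ...]
    by: Optional[str]

    def label(self) -> str: ...

    def mgcv_identifier(self, formula_idx: int = 0) -> str: ...

    def __str__(self) -> str: ...


@dataclass(frozen=True)
class ToySmooth:
    """Toy smooth term satisfying the sketch protocol; not a pymgcv class."""

    varnames: tuple[str, ...]
    by: Optional[str] = None

    def label(self) -> str:
        return f"s({','.join(self.varnames)})"

    def mgcv_identifier(self, formula_idx: int = 0) -> str:
        # Formulas after the first get an index suffix, e.g. "s.1(x1)".
        prefix = "s" if formula_idx == 0 else f"s.{formula_idx}"
        return f"{prefix}({','.join(self.varnames)})"

    def __str__(self) -> str:
        return f"s({','.join(self.varnames)})"


term = ToySmooth(("x1",))
identifier = term.mgcv_identifier(formula_idx=1)
```

Because the protocol is structural, `ToySmooth` satisfies it without inheriting from it; any class providing the same attributes and methods would do.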