High-performance MoG calculations

The idea of fitting a MoG to a dataset is to have a semi-analytical description of the probability distribution of the dataset, even though the original distribution is unknown. Although Multimin allows generating nearly pure functions for the MoGs fitted through the get_function method, these routines are written in Python, which is a less efficient language than, for example, C. Multimin includes an optional tool that allows creating optimized routines with C code. To run these routines, it is necessary to download and install the GNU Scientific Library.

Installation and importing

If you’re running this in Google Colab or need to install the package, uncomment and run the following cell:

[1]:

try:
    from google.colab import drive
    %pip install -Uq multimin
except ImportError:
    print("Not running in Colab, skipping installation")
    %load_ext autoreload
    %autoreload 2
!mkdir -p gallery/

# Uncomment to install from GitHub (development version)
# !pip install git+https://github.com/seap-udea/MultiMin.git

Not running in Colab, skipping installation

[2]:

import multimin as mn
import numpy as np
import matplotlib.pyplot as plt

import warnings
warnings.filterwarnings("ignore")

deg = np.pi / 180
figprefix = "cmog"

Welcome to MultiMin v0.11.2. ¡Al infinito y más allá!

Verifying that GSL is installed

To verify that the library is installed, you can use the imultimin command from the command line:

[3]:

!imultimin --verify

Verifying MultiMin installation...
Welcome to MultiMin v0.11.2. ¡Al infinito y más allá!
✅ MultiMin package found: /Users/jzuluaga/dev/multimin/src/multimin/__init__.py
✅ Shared library found: /Users/jzuluaga/dev/multimin/src/multimin/lib/multimin.dylib

If it is not installed you can try to install it:

!imultimin --install-gsl

And follow the instructions.

3D distribution with one variable on a finite domain

Let’s try it in an infinite domain:

[4]:

weights = [0.5,0.5]
mus = [[1.0, 0.5, -0.5], [1.0, -0.5, +0.5],]
sigmas = [[1, 1.2, 2.3], [0.8, 0.2, 3.3]]
angles = [
    [10*deg, 30*deg, 20*deg],
    [-20*deg, 0*deg, 30*deg],
]
Sigmas = mn.Stats.calc_covariance_from_rotation(sigmas, angles)

MoG = mn.MixtureOfGaussians(mus=mus, weights=weights, Sigmas=Sigmas)

sample = MoG.rvs(5000)

MixtureOfGaussians.rvs executed in 0.2905158996582031 seconds

Normally:

[5]:

function, pymog = MoG.get_function()

from multimin import Util

def mog(X):

    mu1_1 = 1.0
    mu1_2 = 0.5
    mu1_3 = -0.5
    mu1 = [mu1_1, mu1_2, mu1_3]
    Sigma1 = [[1.132931, -0.33952, 0.223726], [-0.33952, 2.282436, -1.675999], [0.223726, -1.675999, 4.314632]]
    n1 = Util.nmd(X, mu1, Sigma1)

    mu2_1 = 1.0
    mu2_2 = -0.5
    mu2_3 = 0.5
    mu2 = [mu2_1, mu2_2, mu2_3]
    Sigma2 = [[0.621908, 0.102606, 0.0], [0.102606, 0.058092, 0.0], [0.0, 0.0, 10.89]]
    n2 = Util.nmd(X, mu2, Sigma2)

    w1 = 0.5
    w2 = 0.5

    return (
        w1*n1
        + w2*n2
    )

Now we can get the C-MoG version:

[6]:

function, cmog = MoG.get_function(cmog=True)

import numpy as np
from multimin import Util, cmog

def mog(X):
    weights = np.array([0.5, 0.5], dtype=np.float64)
    mus = np.array([
        [1, 0.5, -0.5],
        [1, -0.5, 0.5],
    ], dtype=np.float64)
    Sigmas = np.array([
        [
            [1.13293, -0.33952, 0.223726],
            [-0.33952, 2.28244, -1.676],
            [0.223726, -1.676, 4.31463],
        ],
        [
            [0.621908, 0.102606, 0],
            [0.102606, 0.0580922, 0],
            [0, 0, 10.89],
        ],
    ], dtype=np.float64)
    return cmog.mog_c(X, weights, mus, Sigmas)

And test it:

[7]:

X = np.array([[0.0, 0.3, 1.2]])
cmog(X), pymog(X)

[7]:

(array([0.00478285]), np.float64(0.004782856043975975))

For a single point let’s see the performance of both approaches:

[8]:

X = [0.2, 0.3, 0.5]
%timeit pymog(X)
%timeit cmog(X)

69.4 μs ± 6.67 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
13 μs ± 392 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

And for many points:

[9]:

X = np.random.normal(size=(10000,3))
%timeit pymog(X)
%timeit cmog(X)

454 μs ± 20.2 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
339 μs ± 47.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Fitting 3D data with a finite domain on one variable

It also works for truncated MoGs.

[10]:

weights = [0.5, 0.5]
mus = [[0.0, 0.3, 0.0], [0.0, 0.7, 0.0]]  # two bumps along the bounded variable (y)
sigmas = [[0.6, 0.15, 0.6], [0.6, 0.15, 0.6]]
Sigmas = [np.diag(s)**2 for s in sigmas]

MoG_3d = mn.MixtureOfGaussians(
    mus=mus,
    weights=weights,
    Sigmas=Sigmas,
    domain=[None, [0, 1], None],  # only variable 1 in [0, 1]
)

Normally:

[11]:

function, pymog = MoG_3d.get_function()

import numpy as np
from multimin import Util

def mog(X):

    a = [-np.inf, 0.0, -np.inf]
    b = [np.inf, 1.0, np.inf]

    mu1_1 = 0.0
    mu1_2 = 0.3
    mu1_3 = 0.0
    mu1 = [mu1_1, mu1_2, mu1_3]
    Sigma1 = [[0.36, 0.0, 0.0], [0.0, 0.0225, 0.0], [0.0, 0.0, 0.36]]
    Z1 = 0.977248
    n1 = Util.tnmd(X, mu1, Sigma1, a, b, Z=Z1)

    mu2_1 = 0.0
    mu2_2 = 0.7
    mu2_3 = 0.0
    mu2 = [mu2_1, mu2_2, mu2_3]
    Sigma2 = [[0.36, 0.0, 0.0], [0.0, 0.0225, 0.0], [0.0, 0.0, 0.36]]
    Z2 = 0.977248
    n2 = Util.tnmd(X, mu2, Sigma2, a, b, Z=Z2)

    w1 = 0.5
    w2 = 0.5

    return (
        w1*n1
        + w2*n2
    )

Now we can get the C-MoG version:

[12]:

function, cmog = MoG_3d.get_function(cmog=True)

import numpy as np
from multimin import Util, cmog

def mog(X):
    weights = np.array([0.5, 0.5], dtype=np.float64)
    mus = np.array([
        [0, 0.3, 0],
        [0, 0.7, 0],
    ], dtype=np.float64)
    Sigmas = np.array([
        [
            [0.36, 0, 0],
            [0, 0.0225, 0],
            [0, 0, 0.36],
        ],
        [
            [0.36, 0, 0],
            [0, 0.0225, 0],
            [0, 0, 0.36],
        ],
    ], dtype=np.float64)
    a = np.array([-np.inf, 0, -np.inf], dtype=np.float64)
    b = np.array([np.inf, 1, np.inf], dtype=np.float64)
    Zs = np.array([0.977248, 0.977248], dtype=np.float64)
    return cmog.tmog_c(X, weights, mus, Sigmas, a, b, Zs)

And test it:

[13]:

X = np.array([[0.0, 0.3, 1.2]])
cmog(X)

[13]:

array([0.08374225])

For a single point let’s see the performance of both approaches:

[14]:

X = [0.2, 0.3, 0.5]
%timeit pymog(X)
%timeit cmog(X)

80.8 μs ± 6.14 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
17.4 μs ± 181 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

And for many points:

[15]:

X = np.random.normal(size=(10000,3))
%timeit pymog(X)
%timeit cmog(X)

948 μs ± 58.4 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
224 μs ± 22.9 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

MultiMin – Multivariate Gaussian fitting with finite domains