
High-performance MoG calculations
The idea of fitting a MoG to a dataset is to have a semi-analytical description of the probability distribution of the dataset, even though the original distribution is unknown. Although Multimin allows generating nearly pure functions for the MoGs fitted through the get_function method, these routines are written in Python, which is a less efficient language than, for example, C. Multimin includes an optional tool that allows creating optimized routines with C code. To run these
routines, it is necessary to download and install the GNU Scientific Library.
Installation and importing
If you’re running this in Google Colab or need to install the package, uncomment and run the following cell:
[1]:
try:
from google.colab import drive
%pip install -Uq multimin
except ImportError:
print("Not running in Colab, skipping installation")
%load_ext autoreload
%autoreload 2
!mkdir -p gallery/
# Uncomment to install from GitHub (development version)
# !pip install git+https://github.com/seap-udea/MultiMin.git
Not running in Colab, skipping installation
[2]:
import multimin as mn
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")
deg = np.pi / 180
figprefix = "cmog"
Welcome to MultiMin v0.11.2. ¡Al infinito y más allá!
Verifying that GSL is installed
To verify that the library is installed, you can use the imultimin command from the command line:
[3]:
!imultimin --verify
Verifying MultiMin installation...
Welcome to MultiMin v0.11.2. ¡Al infinito y más allá!
✅ MultiMin package found: /Users/jzuluaga/dev/multimin/src/multimin/__init__.py
✅ Shared library found: /Users/jzuluaga/dev/multimin/src/multimin/lib/multimin.dylib
If it is not installed you can try to install it:
!imultimin --install-gsl
And follow the instructions.
3D distribution with one variable on a finite domain
Let’s try it in an infinite domain:
[4]:
weights = [0.5,0.5]
mus = [[1.0, 0.5, -0.5], [1.0, -0.5, +0.5],]
sigmas = [[1, 1.2, 2.3], [0.8, 0.2, 3.3]]
angles = [
[10*deg, 30*deg, 20*deg],
[-20*deg, 0*deg, 30*deg],
]
Sigmas = mn.Stats.calc_covariance_from_rotation(sigmas, angles)
MoG = mn.MixtureOfGaussians(mus=mus, weights=weights, Sigmas=Sigmas)
sample = MoG.rvs(5000)
MixtureOfGaussians.rvs executed in 0.2905158996582031 seconds
Normally:
[5]:
function, pymog = MoG.get_function()
from multimin import Util
def mog(X):
mu1_1 = 1.0
mu1_2 = 0.5
mu1_3 = -0.5
mu1 = [mu1_1, mu1_2, mu1_3]
Sigma1 = [[1.132931, -0.33952, 0.223726], [-0.33952, 2.282436, -1.675999], [0.223726, -1.675999, 4.314632]]
n1 = Util.nmd(X, mu1, Sigma1)
mu2_1 = 1.0
mu2_2 = -0.5
mu2_3 = 0.5
mu2 = [mu2_1, mu2_2, mu2_3]
Sigma2 = [[0.621908, 0.102606, 0.0], [0.102606, 0.058092, 0.0], [0.0, 0.0, 10.89]]
n2 = Util.nmd(X, mu2, Sigma2)
w1 = 0.5
w2 = 0.5
return (
w1*n1
+ w2*n2
)
Now we can get the C-MoG version:
[6]:
function, cmog = MoG.get_function(cmog=True)
import numpy as np
from multimin import Util, cmog
def mog(X):
weights = np.array([0.5, 0.5], dtype=np.float64)
mus = np.array([
[1, 0.5, -0.5],
[1, -0.5, 0.5],
], dtype=np.float64)
Sigmas = np.array([
[
[1.13293, -0.33952, 0.223726],
[-0.33952, 2.28244, -1.676],
[0.223726, -1.676, 4.31463],
],
[
[0.621908, 0.102606, 0],
[0.102606, 0.0580922, 0],
[0, 0, 10.89],
],
], dtype=np.float64)
return cmog.mog_c(X, weights, mus, Sigmas)
And test it:
[7]:
X = np.array([[0.0, 0.3, 1.2]])
cmog(X), pymog(X)
[7]:
(array([0.00478285]), np.float64(0.004782856043975975))
For a single point let’s see the performance of both approaches:
[8]:
X = [0.2, 0.3, 0.5]
%timeit pymog(X)
%timeit cmog(X)
69.4 μs ± 6.67 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
13 μs ± 392 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
And for many points:
[9]:
X = np.random.normal(size=(10000,3))
%timeit pymog(X)
%timeit cmog(X)
454 μs ± 20.2 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
339 μs ± 47.5 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Fitting 3D data with a finite domain on one variable
It also works for truncated MoGs.
[10]:
weights = [0.5, 0.5]
mus = [[0.0, 0.3, 0.0], [0.0, 0.7, 0.0]] # two bumps along the bounded variable (y)
sigmas = [[0.6, 0.15, 0.6], [0.6, 0.15, 0.6]]
Sigmas = [np.diag(s)**2 for s in sigmas]
MoG_3d = mn.MixtureOfGaussians(
mus=mus,
weights=weights,
Sigmas=Sigmas,
domain=[None, [0, 1], None], # only variable 1 in [0, 1]
)
Normally:
[11]:
function, pymog = MoG_3d.get_function()
import numpy as np
from multimin import Util
def mog(X):
a = [-np.inf, 0.0, -np.inf]
b = [np.inf, 1.0, np.inf]
mu1_1 = 0.0
mu1_2 = 0.3
mu1_3 = 0.0
mu1 = [mu1_1, mu1_2, mu1_3]
Sigma1 = [[0.36, 0.0, 0.0], [0.0, 0.0225, 0.0], [0.0, 0.0, 0.36]]
Z1 = 0.977248
n1 = Util.tnmd(X, mu1, Sigma1, a, b, Z=Z1)
mu2_1 = 0.0
mu2_2 = 0.7
mu2_3 = 0.0
mu2 = [mu2_1, mu2_2, mu2_3]
Sigma2 = [[0.36, 0.0, 0.0], [0.0, 0.0225, 0.0], [0.0, 0.0, 0.36]]
Z2 = 0.977248
n2 = Util.tnmd(X, mu2, Sigma2, a, b, Z=Z2)
w1 = 0.5
w2 = 0.5
return (
w1*n1
+ w2*n2
)
Now we can get the C-MoG version:
[12]:
function, cmog = MoG_3d.get_function(cmog=True)
import numpy as np
from multimin import Util, cmog
def mog(X):
weights = np.array([0.5, 0.5], dtype=np.float64)
mus = np.array([
[0, 0.3, 0],
[0, 0.7, 0],
], dtype=np.float64)
Sigmas = np.array([
[
[0.36, 0, 0],
[0, 0.0225, 0],
[0, 0, 0.36],
],
[
[0.36, 0, 0],
[0, 0.0225, 0],
[0, 0, 0.36],
],
], dtype=np.float64)
a = np.array([-np.inf, 0, -np.inf], dtype=np.float64)
b = np.array([np.inf, 1, np.inf], dtype=np.float64)
Zs = np.array([0.977248, 0.977248], dtype=np.float64)
return cmog.tmog_c(X, weights, mus, Sigmas, a, b, Zs)
And test it:
[13]:
X = np.array([[0.0, 0.3, 1.2]])
cmog(X)
[13]:
array([0.08374225])
For a single point let’s see the performance of both approaches:
[14]:
X = [0.2, 0.3, 0.5]
%timeit pymog(X)
%timeit cmog(X)
80.8 μs ± 6.14 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
17.4 μs ± 181 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
And for many points:
[15]:
X = np.random.normal(size=(10000,3))
%timeit pymog(X)
%timeit cmog(X)
948 μs ± 58.4 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
224 μs ± 22.9 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)