API¶

API for the methods in the bicm module.

class bicm.BiCM(bin_mat)[source]¶

Bases: object

Bipartite Configuration Model for undirected binary bipartite networks.

This class implements the Bipartite Configuration Model (BiCM), which can be used as a null model for the analysis of undirected and binary bipartite networks. The class provides methods for calculating the biadjacency matrix of the null model and for quantifying node similarities in terms of p-values.

add2inqueue(nprocs, plam_mat, nlam_mat, k1, k2)[source]¶

Add elements to the in-queue to calculate the p-values.

Parameters:	nprocs (int) – number of processes running in parallel plam_mat (numpy.array (square matrix)) – array containing the list of probabilities for the single observations of \(\Lambda\)-motifs nlam_mat (numpy.array (square matrix)) – array containing the observations of \(\Lambda\)-motifs k1 (int) – lower interval limit k2 (int) – upper interval limit

check_input_matrix_is_binary()[source]¶

Check that the input matrix is binary, i.e. entries are 0 or 1.

Raises:	AssertionError – raise an error if the input matrix is not binary

equations(xx)[source]¶

Return the equations of the log-likelihood maximization problem.

Note that the equations for the row-nodes depend only on the column-nodes and vice versa, see [Saracco2015].

Parameters:	xx (numpy.array) – Lagrange multipliers which have to be solved
Returns:	equations to be solved (\(f(x) = 0\))
Return type:	numpy.array

static flat2triumat_dim(k)[source]¶

Return the dimension of the matrix hosting k triangular elements.

Parameters:	k (int) – the number of elements in the upper triangular part of the corresponding square matrix, excluding the diagonal
Returns:	dimension of the corresponding square matrix
Return type:	int

static flat2triumat_idx(k, n)[source]¶

Convert an array index into the index couple of a triangular matrix.

k is the index of an array of length \(\binom{n}{2}{2}\), which contains the elements of an upper triangular matrix of dimension n excluding the diagonal. The function returns the index couple \((i, j)\) that corresponds to the entry k of the flat array.

Note

\(k \in \left[0,\ldots, \binom{n}{2} - 1\right]\)
returned indices:
- \(i \in [0,\ldots, n - 1]\)
- \(j \in [i + 1,\ldots, n - 1]\)

Parameters:	k (int) – flattened array index n (int) – dimension of the square matrix
Returns:	matrix index tuple (row, column)
Return type:	tuple

get_biadjacency_matrix(xx)[source]¶

Calculate the biadjacency matrix of the null model.

The biadjacency matrix describes the BiCM null model, i.e. the optimal average graph \(<G>^*\) with the average link probabilities \(<G>^*_{rc} = p_{rc}\) , \(p_{rc} = \frac{x_r \cdot x_c}{1 + x_r\cdot x_c}.\) \(x\) are the solutions of the equation system which has to be solved for the null model. Note that \(r\) and \(c\) are taken from opposite bipartite node sets, thus \(r \neq c\).

Parameters:	xx (numpy.array) – solutions of the equation system (Lagrange multipliers)
Returns:	biadjacency matrix of the null model
Return type:	numpy.array
Raises:	ValueError – raise an error if \(p_{rc} < 0\) or \(p_{rc} > 1\) for any \(r, c\)

get_lambda_motif_block(mm, k1, k2)[source]¶

Return a subset of \(\Lambda\)-motifs as observed in mm.

Given the binary input matrix mm, count the number of \(\Lambda\)-motifs for all the node couples specified by the interval \(\left[k_1, k_2\right[\).

Note

The \(\Lambda\)-motifs are counted between the row-nodes of the input matrix mm.
If \(k_2 \equiv \binom{mm.shape[0]}{2}\), the interval becomes \(\left[k_1, k_2\right]\).

Parameters:	mm (numpy.array) – binary matrix k1 (int) – lower interval limit k2 (int) – upper interval limit
Returns:	array of observed \(\Lambda\)-motifs
Return type:	numpy.array

get_plambda_block(biad_mat, k1, k2)[source]¶

Return a subset of the \(\Lambda\) probability matrix.

Given the biadjacency matrix biad_mat with \(\mathbf{M}_{rc} = p_{rc}\), which describes the probabilities of row-node r and column-node c being linked, the method returns the matrix

\(P(\Lambda)_{ij} = \left(M_{i\alpha_1} \cdot M_{j\alpha_1}, M_{i\alpha_2} \cdot M_{j\alpha_2}, \ldots\right),\)

for all the node couples in the interval \(\left[k_1, k_2\right[\). \((i, j)\) are two row-nodes of biad_mat and \(\alpha_k\) runs over the nodes in the opposite layer.

Note

The probabilities are calculated between the row-nodes of the input matrix biad_mat.
If \(k_2 \equiv \binom{biad\_mat.shape[0]}{2}\), the interval becomes \(\left[k1, k2\right]\).

Parameters:	biad_mat (numpy.array) – biadjacency matrix k1 (int) – lower interval limit k2 (int) – upper interval limit
Returns:	\(\Lambda\)-motif probability matrix
Return type:	numpy.array

get_pvalues_q(plam_mat, nlam_mat, k1, k2, parallel=True)[source]¶

Calculate the p-values of the observed \(\Lambda\)-motifs.

For each number of \(\Lambda\)-motifs in nlam_mat for the node interval \(\left[k1, k2\right[\), construct the Poisson Binomial distribution using the corresponding probabilities in plam_mat and calculate the p-value.

Parameters:

plam_mat (numpy.array (square matrix)) – array containing the list of probabilities for the single observations of \(\Lambda\)-motifs
nlam_mat (numpy.array (square matrix)) – array containing the observations of \(\Lambda\)-motifs
k1 (int) – lower interval limit
k2 (int) – upper interval limit
parallel (bool) – if True, the calculation is executed in parallel; if False, only one process is started

get_triup_dim(bip_set)[source]¶

Return the number of possible node couples in bip_set.

Parameters:	bip_set (bool) – selects row-nodes (`True`) or column-nodes (`False`)
Returns:	return the number of node couple combinations corresponding to the layer `bip_set`
Return type:	int
Raises:	ValueError – raise an error if the parameter `bip_set` is neither `True` nor `False`

jacobian(xx)[source]¶

Return a NumPy array with the Jacobian of the equation system.

Parameters:	xx (numpy.array) – Lagrange multipliers which have to be solved
Returns:	Jacobian
Return type:	numpy.array

lambda_motifs(bip_set, parallel=True, filename=None, delim='\t', binary=True, num_chunks=4)[source]¶

Calculate and save the p-values of the \(\Lambda\)-motifs.

For each node couple in the bipartite layer specified by bip_set, calculate the p-values of the corresponding \(\Lambda\)-motifs according to the link probabilities in the biadjacency matrix of the BiCM null model.

The results can be saved either as a binary .npy or a human-readable .csv file, depending on binary.

Note

The total number of p-values that are calculated is split into num_chunks chunks, which are processed sequentially in order to avoid memory allocation errors. Note that a larger value of num_chunks will lead to less memory occupation, but comes at the cost of slower processing speed.
The output consists of a one-dimensional array of p-values. If the bipartite layer bip_set contains n nodes, this means that the array will contain \(\binom{n}{2}\) entries. The indices (i, j) of the nodes corresponding to entry k in the array can be reconstructed using the method BiCM.flat2_triumat_idx(). The number of nodes n can be recovered from the length of the array with BiCM.flat2_triumat_dim()
If binary == False, the filename should end with .csv. If binary == True, it will be saved in binary NumPy .npy format and the suffix .npy will be appended automatically. By default, the file is saved in binary format.

Parameters:

bip_set (bool) – select row-nodes (True) or column-nodes (False)
parallel (bool) – select whether the calculation of the p-values should be run in parallel (True) or not (False)
filename (str) – name of the output file
delim (str) – delimiter between entries in the .csv``file, default is ``\t
binary (bool) – if True, the file will be saved in the binary NumPy format .npy, otherwise as .csv
num_chunks (int) – number of chunks of p-value calculations that are performed sequentially

Raises:

ValueError – raise an error if the parameter bip_set is neither True nor False

make_bicm(x0=None, method='hybr', jac=None, tol=None, callback=None, options=None)[source]¶

Create the biadjacency matrix of the BiCM null model.

Solve the log-likelihood maximization problem to obtain the BiCM null model which respects constraints on the degree sequence of the input matrix.

The problem is solved using scipy‘s root function with the solver defined by method. The status of the solver after running ``scipy.root``and the difference between the network and BiCM degrees are printed in the console.

The default solver is the modified Powell method hybr. Least-squares can be chosen with method='lm' for the Levenberg-Marquardt approach.

Depending on the solver, keyword arguments kwargs can be passed to the solver. Please refer to the scipy.optimize.root documentation for detailed descriptions.

Note

It can happen that the solver method used by scipy.root does not converge to a solution. In this case, please try another method or different initial conditions and refer to the scipy.optimize.root documentation.

Parameters:

x0 (1d numpy.array, optional) – initial guesses for the solutions. The first entries are the initial guesses for the row-nodes, followed by the initial guesses for the column-nodes.
method (str, optional) –
type of solver, default is ‘hybr’. For other solvers, see the scipy.optimize.root documentation.
jac (bool or callable, optional) – Jacobian of the system
tol (float, optional) – tolerance for termination. For detailed control, use solver-specific options.
callback (function, optional) – optional callback function to be called at every iteration as callback(self.equations, x), see scipy.root documentation
options (dict, optional) – a dictionary of solver options, e.g. xtol or maxiter, see scipy.root documentation
kwargs – solver-specific options, please refer to the SciPy documentation

Raises:

ValueError – raise an error if not enough initial conditions are provided

outqueue2pval_mat(nprocs, pvalmat)[source]¶: Put the results from the out-queue into the p-value array.

print_max_degree_differences()[source]¶

Print the maximal differences between input network and BiCM degrees.

Check that the degree sequence of the solved BiCM null model graph corresponds to the degree sequence of the input graph.

pval_process_worker()[source]¶: Calculate p-values and add them to the out-queue.

static save_array(mat, filename, delim='\t', binary=False)[source]¶

Save the array mat in the file filename.

The array can either be saved as a binary NumPy .npy file or as a human-readable .npy file.

Note

The relative path has to be provided in the filename, e.g. ../data/pvalue_matrix.csv.
If binary==True, NumPy automatically appends the format ending .npy to the file.

Parameters:	mat (numpy.array) – array filename (str) – name of the output file delim (str) – delimiter between values in file binary (bool) – if `True`, save as binary `.npy`, otherwise as a `.csv` file

save_biadjacency(filename, delim='\t', binary=False)[source]¶

Save the biadjacendy matrix of the BiCM null model.

The matrix can either be saved as a binary NumPy .npy file or as a human-readable .csv file.

Note

The relative path has to be provided in the filename, e.g. ../data/pvalue_matrix.csv.
If binary==True, NumPy automatically appends the format ending .npy to the file.

Parameters:	filename (str) – name of the output file delim (str) – delimiter between values in file binary (bool) – if `True`, save as binary `.npy`, otherwise as a `.csv` file

set_degree_seq()[source]¶

Return the node degree sequence of the input matrix.

Returns:	node degree sequence [degrees row-nodes, degrees column-nodes]
Return type:	numpy.array
Raises:	AssertionError – raise an error if the length of the returned degree sequence does not correspond to the total number of nodes

solve_equations(x0=None, method='hybr', jac=None, tol=None, callback=None, options=None)[source]¶

Solve the system of equations of the maximum log-likelihood problem.

The system of equations is solved using scipy‘s root function with the solver defined by method. The solutions correspond to the Lagrange multipliers

\[x_i = \exp(-\theta_i).\]

Depending on the solver, keyword arguments kwargs can be passed to the solver. Please refer to the scipy.optimize.root documentation for detailed descriptions.

The default solver is the modified Powell method hybr. Least-squares can be chosen with method='lm' for the Levenberg-Marquardt approach.

Note

It can happen that the solver method used by scipy.root does not converge to a solution. In this case, please try another method or different initial conditions and refer to the scipy.optimize.root documentation.

Parameters:	x0 (1d numpy.array, optional) – initial guesses for the solutions. The first entries are the initial guesses for the row-nodes, followed by the initial guesses for the column-nodes. method (str, optional) – type of solver, default is ‘hybr’. For other solvers, see the scipy.optimize.root documentation. jac (bool or callable, optional) – Jacobian of the system tol (float, optional) – tolerance for termination. For detailed control, use solver-specific options. callback (function, optional) – optional callback function to be called at every iteration as `callback(self.equations, x)`, see `scipy.root` documentation options (dict, optional) – a dictionary of solver options, e.g. `xtol` or `maxiter`, see scipy.root documentation kwargs – solver-specific options, please refer to the SciPy documentation
Returns:	solution of the equation system
Return type:	scipy.optimize.OptimizeResult
Raises:	ValueError – raise an error if not enough initial conditions are provided

split_range(n, m=4)[source]¶

Split the interval \(\left[0,\ldots, n\right]\) in m parts.

Parameters:	n (int) – upper limit of the range m – number of part in which range should be split
Returns:	delimiter indices for the `m` parts
Return type:	list

test_average_degrees(eps=0.01)[source]¶

Test the constraints on the node degrees.

Check that the degree sequence of the solved BiCM null model graph corresponds to the degree sequence of the input graph.

Parameters:	eps (float) – maximum difference between degrees of the real network and the BiCM

static triumat2flat_dim(n)[source]¶

Return the size of the triangular part of a n x n matrix.

Parameters:	n (int) – the dimension of the square matrix
Returns:	number of elements in the upper triangular part of the matrix (excluding the diagonal)
Return type:	int

static triumat2flat_idx(i, j, n)[source]¶

Convert an matrix index couple to a flattened array index.

Given a square matrix of dimension n and the index couple (i, j) of the upper triangular part of the matrix, return the index which the matrix element would have in a flattened array.

Note

\(i \in [0, ..., n - 1]\)
\(j \in [i + 1, ..., n - 1]\)
returned index \(\in [0,\, n (n - 1) / 2 - 1]\)

Parameters:	i (int) – row index j (int) – column index n (int) – dimension of the square matrix
Returns:	flattened array index
Return type:	int