API¶
API for the methods in the bicm module.
-
class
bicm.BiCM(bin_mat)[source]¶ Bases:
objectBipartite Configuration Model for undirected binary bipartite networks.
This class implements the Bipartite Configuration Model (BiCM), which can be used as a null model for the analysis of undirected and binary bipartite networks. The class provides methods for calculating the biadjacency matrix of the null model and for quantifying node similarities in terms of p-values.
-
add2inqueue(nprocs, plam_mat, nlam_mat, k1, k2)[source]¶ Add elements to the in-queue to calculate the p-values.
Parameters: - nprocs (int) – number of processes running in parallel
- plam_mat (numpy.array (square matrix)) – array containing the list of probabilities for the single observations of \(\Lambda\)-motifs
- nlam_mat (numpy.array (square matrix)) – array containing the observations of \(\Lambda\)-motifs
- k1 (int) – lower interval limit
- k2 (int) – upper interval limit
-
check_input_matrix_is_binary()[source]¶ Check that the input matrix is binary, i.e. entries are 0 or 1.
Raises: AssertionError – raise an error if the input matrix is not binary
-
equations(xx)[source]¶ Return the equations of the log-likelihood maximization problem.
Note that the equations for the row-nodes depend only on the column-nodes and vice versa, see [Saracco2015].
Parameters: xx (numpy.array) – Lagrange multipliers which have to be solved Returns: equations to be solved (\(f(x) = 0\)) Return type: numpy.array
-
static
flat2triumat_dim(k)[source]¶ Return the dimension of the matrix hosting
ktriangular elements.Parameters: k (int) – the number of elements in the upper triangular part of the corresponding square matrix, excluding the diagonal Returns: dimension of the corresponding square matrix Return type: int
-
static
flat2triumat_idx(k, n)[source]¶ Convert an array index into the index couple of a triangular matrix.
kis the index of an array of length \(\binom{n}{2}{2}\), which contains the elements of an upper triangular matrix of dimensionnexcluding the diagonal. The function returns the index couple \((i, j)\) that corresponds to the entrykof the flat array.Note
- \(k \in \left[0,\ldots, \binom{n}{2} - 1\right]\)
- returned indices:
- \(i \in [0,\ldots, n - 1]\)
- \(j \in [i + 1,\ldots, n - 1]\)
Parameters: - k (int) – flattened array index
- n (int) – dimension of the square matrix
Returns: matrix index tuple (row, column)
Return type: tuple
-
get_biadjacency_matrix(xx)[source]¶ Calculate the biadjacency matrix of the null model.
The biadjacency matrix describes the BiCM null model, i.e. the optimal average graph \(<G>^*\) with the average link probabilities \(<G>^*_{rc} = p_{rc}\) , \(p_{rc} = \frac{x_r \cdot x_c}{1 + x_r\cdot x_c}.\) \(x\) are the solutions of the equation system which has to be solved for the null model. Note that \(r\) and \(c\) are taken from opposite bipartite node sets, thus \(r \neq c\).
Parameters: xx (numpy.array) – solutions of the equation system (Lagrange multipliers) Returns: biadjacency matrix of the null model Return type: numpy.array Raises: ValueError – raise an error if \(p_{rc} < 0\) or \(p_{rc} > 1\) for any \(r, c\)
-
get_lambda_motif_block(mm, k1, k2)[source]¶ Return a subset of \(\Lambda\)-motifs as observed in
mm.Given the binary input matrix
mm, count the number of \(\Lambda\)-motifs for all the node couples specified by the interval \(\left[k_1, k_2\right[\).Note
- The \(\Lambda\)-motifs are counted between the row-nodes
of the input matrix
mm. - If \(k_2 \equiv \binom{mm.shape[0]}{2}\), the interval becomes \(\left[k_1, k_2\right]\).
Parameters: - mm (numpy.array) – binary matrix
- k1 (int) – lower interval limit
- k2 (int) – upper interval limit
Returns: array of observed \(\Lambda\)-motifs
Return type: numpy.array
- The \(\Lambda\)-motifs are counted between the row-nodes
of the input matrix
-
get_plambda_block(biad_mat, k1, k2)[source]¶ Return a subset of the \(\Lambda\) probability matrix.
Given the biadjacency matrix
biad_matwith \(\mathbf{M}_{rc} = p_{rc}\), which describes the probabilities of row-noderand column-nodecbeing linked, the method returns the matrix\(P(\Lambda)_{ij} = \left(M_{i\alpha_1} \cdot M_{j\alpha_1}, M_{i\alpha_2} \cdot M_{j\alpha_2}, \ldots\right),\)
for all the node couples in the interval \(\left[k_1, k_2\right[\). \((i, j)\) are two row-nodes of
biad_matand \(\alpha_k\) runs over the nodes in the opposite layer.Note
- The probabilities are calculated between the row-nodes of the
input matrix
biad_mat. - If \(k_2 \equiv \binom{biad\_mat.shape[0]}{2}\), the interval becomes \(\left[k1, k2\right]\).
Parameters: - biad_mat (numpy.array) – biadjacency matrix
- k1 (int) – lower interval limit
- k2 (int) – upper interval limit
Returns: \(\Lambda\)-motif probability matrix
Return type: numpy.array
- The probabilities are calculated between the row-nodes of the
input matrix
-
get_pvalues_q(plam_mat, nlam_mat, k1, k2, parallel=True)[source]¶ Calculate the p-values of the observed \(\Lambda\)-motifs.
For each number of \(\Lambda\)-motifs in
nlam_matfor the node interval \(\left[k1, k2\right[\), construct the Poisson Binomial distribution using the corresponding probabilities inplam_matand calculate the p-value.Parameters: - plam_mat (numpy.array (square matrix)) – array containing the list of probabilities for the single observations of \(\Lambda\)-motifs
- nlam_mat (numpy.array (square matrix)) – array containing the observations of \(\Lambda\)-motifs
- k1 (int) – lower interval limit
- k2 (int) – upper interval limit
- parallel (bool) – if
True, the calculation is executed in parallel; ifFalse, only one process is started
-
get_triup_dim(bip_set)[source]¶ Return the number of possible node couples in
bip_set.Parameters: bip_set (bool) – selects row-nodes ( True) or column-nodes (False)Returns: return the number of node couple combinations corresponding to the layer bip_setReturn type: int Raises: ValueError – raise an error if the parameter bip_setis neitherTruenorFalse
-
jacobian(xx)[source]¶ Return a NumPy array with the Jacobian of the equation system.
Parameters: xx (numpy.array) – Lagrange multipliers which have to be solved Returns: Jacobian Return type: numpy.array
-
lambda_motifs(bip_set, parallel=True, filename=None, delim='\t', binary=True, num_chunks=4)[source]¶ Calculate and save the p-values of the \(\Lambda\)-motifs.
For each node couple in the bipartite layer specified by
bip_set, calculate the p-values of the corresponding \(\Lambda\)-motifs according to the link probabilities in the biadjacency matrix of the BiCM null model.The results can be saved either as a binary
.npyor a human-readable.csvfile, depending onbinary.Note
- The total number of p-values that are calculated is split into
num_chunkschunks, which are processed sequentially in order to avoid memory allocation errors. Note that a larger value ofnum_chunkswill lead to less memory occupation, but comes at the cost of slower processing speed. - The output consists of a one-dimensional array of p-values. If
the bipartite layer
bip_setcontainsnnodes, this means that the array will contain \(\binom{n}{2}\) entries. The indices(i, j)of the nodes corresponding to entrykin the array can be reconstructed using the methodBiCM.flat2_triumat_idx(). The number of nodesncan be recovered from the length of the array withBiCM.flat2_triumat_dim() - If
binary == False, thefilenameshould end with.csv. Ifbinary == True, it will be saved in binary NumPy.npyformat and the suffix.npywill be appended automatically. By default, the file is saved in binary format.
Parameters: - bip_set (bool) – select row-nodes (
True) or column-nodes (False) - parallel (bool) – select whether the calculation of the p-values should
be run in parallel (
True) or not (False) - filename (str) – name of the output file
- delim (str) – delimiter between entries in the
.csv``file, default is ``\t - binary (bool) – if
True, the file will be saved in the binary NumPy format.npy, otherwise as.csv - num_chunks (int) – number of chunks of p-value calculations that are performed sequentially
Raises: ValueError – raise an error if the parameter
bip_setis neitherTruenorFalse- The total number of p-values that are calculated is split into
-
make_bicm(x0=None, method='hybr', jac=None, tol=None, callback=None, options=None)[source]¶ Create the biadjacency matrix of the BiCM null model.
Solve the log-likelihood maximization problem to obtain the BiCM null model which respects constraints on the degree sequence of the input matrix.
The problem is solved using
scipy‘s root function with the solver defined bymethod. The status of the solver after running ``scipy.root``and the difference between the network and BiCM degrees are printed in the console.The default solver is the modified Powell method
hybr. Least-squares can be chosen withmethod='lm'for the Levenberg-Marquardt approach.Depending on the solver, keyword arguments
kwargscan be passed to the solver. Please refer to the scipy.optimize.root documentation for detailed descriptions.Note
It can happen that the solver
methodused byscipy.rootdoes not converge to a solution. In this case, please try anothermethodor different initial conditions and refer to the scipy.optimize.root documentation.Parameters: - x0 (1d numpy.array, optional) – initial guesses for the solutions. The first entries are the initial guesses for the row-nodes, followed by the initial guesses for the column-nodes.
- method (str, optional) –
type of solver, default is ‘hybr’. For other solvers, see the scipy.optimize.root documentation.
- jac (bool or callable, optional) – Jacobian of the system
- tol (float, optional) – tolerance for termination. For detailed control, use solver-specific options.
- callback (function, optional) – optional callback function to be called at
every iteration as
callback(self.equations, x), seescipy.rootdocumentation - options (dict, optional) – a dictionary of solver options, e.g.
xtolormaxiter, see scipy.root documentation - kwargs – solver-specific options, please refer to the SciPy documentation
Raises: ValueError – raise an error if not enough initial conditions are provided
-
outqueue2pval_mat(nprocs, pvalmat)[source]¶ Put the results from the out-queue into the p-value array.
-
print_max_degree_differences()[source]¶ Print the maximal differences between input network and BiCM degrees.
Check that the degree sequence of the solved BiCM null model graph corresponds to the degree sequence of the input graph.
-
static
save_array(mat, filename, delim='\t', binary=False)[source]¶ Save the array
matin the filefilename.The array can either be saved as a binary NumPy
.npyfile or as a human-readable.npyfile.Note
- The relative path has to be provided in the filename, e.g. ../data/pvalue_matrix.csv.
- If
binary==True, NumPy automatically appends the format ending.npyto the file.
Parameters: - mat (numpy.array) – array
- filename (str) – name of the output file
- delim (str) – delimiter between values in file
- binary (bool) – if
True, save as binary.npy, otherwise as a.csvfile
-
save_biadjacency(filename, delim='\t', binary=False)[source]¶ Save the biadjacendy matrix of the BiCM null model.
The matrix can either be saved as a binary NumPy
.npyfile or as a human-readable.csvfile.Note
- The relative path has to be provided in the filename, e.g. ../data/pvalue_matrix.csv.
- If
binary==True, NumPy automatically appends the format ending.npyto the file.
Parameters: - filename (str) – name of the output file
- delim (str) – delimiter between values in file
- binary (bool) – if
True, save as binary.npy, otherwise as a.csvfile
-
set_degree_seq()[source]¶ Return the node degree sequence of the input matrix.
Returns: node degree sequence [degrees row-nodes, degrees column-nodes] Return type: numpy.array Raises: AssertionError – raise an error if the length of the returned degree sequence does not correspond to the total number of nodes
-
solve_equations(x0=None, method='hybr', jac=None, tol=None, callback=None, options=None)[source]¶ Solve the system of equations of the maximum log-likelihood problem.
The system of equations is solved using
scipy‘s root function with the solver defined bymethod. The solutions correspond to the Lagrange multipliers\[x_i = \exp(-\theta_i).\]Depending on the solver, keyword arguments
kwargscan be passed to the solver. Please refer to the scipy.optimize.root documentation for detailed descriptions.The default solver is the modified Powell method
hybr. Least-squares can be chosen withmethod='lm'for the Levenberg-Marquardt approach.Note
It can happen that the solver
methodused byscipy.rootdoes not converge to a solution. In this case, please try anothermethodor different initial conditions and refer to the scipy.optimize.root documentation.Parameters: - x0 (1d numpy.array, optional) – initial guesses for the solutions. The first entries are the initial guesses for the row-nodes, followed by the initial guesses for the column-nodes.
- method (str, optional) –
type of solver, default is ‘hybr’. For other solvers, see the scipy.optimize.root documentation.
- jac (bool or callable, optional) – Jacobian of the system
- tol (float, optional) – tolerance for termination. For detailed control, use solver-specific options.
- callback (function, optional) – optional callback function to be called at
every iteration as
callback(self.equations, x), seescipy.rootdocumentation - options (dict, optional) – a dictionary of solver options, e.g.
xtolormaxiter, see scipy.root documentation - kwargs – solver-specific options, please refer to the SciPy documentation
Returns: solution of the equation system
Return type: scipy.optimize.OptimizeResult
Raises: ValueError – raise an error if not enough initial conditions are provided
-
split_range(n, m=4)[source]¶ Split the interval \(\left[0,\ldots, n\right]\) in
mparts.Parameters: - n (int) – upper limit of the range
- m – number of part in which range should be split
Returns: delimiter indices for the
mpartsReturn type: list
-
test_average_degrees(eps=0.01)[source]¶ Test the constraints on the node degrees.
Check that the degree sequence of the solved BiCM null model graph corresponds to the degree sequence of the input graph.
Parameters: eps (float) – maximum difference between degrees of the real network and the BiCM
-
static
triumat2flat_dim(n)[source]¶ Return the size of the triangular part of a
n x nmatrix.Parameters: n (int) – the dimension of the square matrix Returns: number of elements in the upper triangular part of the matrix (excluding the diagonal) Return type: int
-
static
triumat2flat_idx(i, j, n)[source]¶ Convert an matrix index couple to a flattened array index.
Given a square matrix of dimension
nand the index couple(i, j)of the upper triangular part of the matrix, return the index which the matrix element would have in a flattened array.Note
- \(i \in [0, ..., n - 1]\)
- \(j \in [i + 1, ..., n - 1]\)
- returned index \(\in [0,\, n (n - 1) / 2 - 1]\)
Parameters: - i (int) – row index
- j (int) – column index
- n (int) – dimension of the square matrix
Returns: flattened array index
Return type: int
-