API¶
API for the methods in the bicm
module.
-
class
bicm.
BiCM
(bin_mat)[source]¶ Bases:
object
Bipartite Configuration Model for undirected binary bipartite networks.
This class implements the Bipartite Configuration Model (BiCM), which can be used as a null model for the analysis of undirected and binary bipartite networks. The class provides methods for calculating the biadjacency matrix of the null model and for quantifying node similarities in terms of p-values.
-
add2inqueue
(nprocs, plam_mat, nlam_mat, k1, k2)[source]¶ Add elements to the in-queue to calculate the p-values.
Parameters: - nprocs (int) – number of processes running in parallel
- plam_mat (numpy.array (square matrix)) – array containing the list of probabilities for the single observations of \(\Lambda\)-motifs
- nlam_mat (numpy.array (square matrix)) – array containing the observations of \(\Lambda\)-motifs
- k1 (int) – lower interval limit
- k2 (int) – upper interval limit
-
check_input_matrix_is_binary
()[source]¶ Check that the input matrix is binary, i.e. entries are 0 or 1.
Raises: AssertionError – raise an error if the input matrix is not binary
-
equations
(xx)[source]¶ Return the equations of the log-likelihood maximization problem.
Note that the equations for the row-nodes depend only on the column-nodes and vice versa, see [Saracco2015].
Parameters: xx (numpy.array) – Lagrange multipliers which have to be solved Returns: equations to be solved (\(f(x) = 0\)) Return type: numpy.array
-
static
flat2triumat_dim
(k)[source]¶ Return the dimension of the matrix hosting
k
triangular elements.Parameters: k (int) – the number of elements in the upper triangular part of the corresponding square matrix, excluding the diagonal Returns: dimension of the corresponding square matrix Return type: int
-
static
flat2triumat_idx
(k, n)[source]¶ Convert an array index into the index couple of a triangular matrix.
k
is the index of an array of length \(\binom{n}{2}{2}\), which contains the elements of an upper triangular matrix of dimensionn
excluding the diagonal. The function returns the index couple \((i, j)\) that corresponds to the entryk
of the flat array.Note
- \(k \in \left[0,\ldots, \binom{n}{2} - 1\right]\)
- returned indices:
- \(i \in [0,\ldots, n - 1]\)
- \(j \in [i + 1,\ldots, n - 1]\)
Parameters: - k (int) – flattened array index
- n (int) – dimension of the square matrix
Returns: matrix index tuple (row, column)
Return type: tuple
-
get_biadjacency_matrix
(xx)[source]¶ Calculate the biadjacency matrix of the null model.
The biadjacency matrix describes the BiCM null model, i.e. the optimal average graph \(<G>^*\) with the average link probabilities \(<G>^*_{rc} = p_{rc}\) , \(p_{rc} = \frac{x_r \cdot x_c}{1 + x_r\cdot x_c}.\) \(x\) are the solutions of the equation system which has to be solved for the null model. Note that \(r\) and \(c\) are taken from opposite bipartite node sets, thus \(r \neq c\).
Parameters: xx (numpy.array) – solutions of the equation system (Lagrange multipliers) Returns: biadjacency matrix of the null model Return type: numpy.array Raises: ValueError – raise an error if \(p_{rc} < 0\) or \(p_{rc} > 1\) for any \(r, c\)
-
get_lambda_motif_block
(mm, k1, k2)[source]¶ Return a subset of \(\Lambda\)-motifs as observed in
mm
.Given the binary input matrix
mm
, count the number of \(\Lambda\)-motifs for all the node couples specified by the interval \(\left[k_1, k_2\right[\).Note
- The \(\Lambda\)-motifs are counted between the row-nodes
of the input matrix
mm
. - If \(k_2 \equiv \binom{mm.shape[0]}{2}\), the interval becomes \(\left[k_1, k_2\right]\).
Parameters: - mm (numpy.array) – binary matrix
- k1 (int) – lower interval limit
- k2 (int) – upper interval limit
Returns: array of observed \(\Lambda\)-motifs
Return type: numpy.array
- The \(\Lambda\)-motifs are counted between the row-nodes
of the input matrix
-
get_plambda_block
(biad_mat, k1, k2)[source]¶ Return a subset of the \(\Lambda\) probability matrix.
Given the biadjacency matrix
biad_mat
with \(\mathbf{M}_{rc} = p_{rc}\), which describes the probabilities of row-noder
and column-nodec
being linked, the method returns the matrix\(P(\Lambda)_{ij} = \left(M_{i\alpha_1} \cdot M_{j\alpha_1}, M_{i\alpha_2} \cdot M_{j\alpha_2}, \ldots\right),\)
for all the node couples in the interval \(\left[k_1, k_2\right[\). \((i, j)\) are two row-nodes of
biad_mat
and \(\alpha_k\) runs over the nodes in the opposite layer.Note
- The probabilities are calculated between the row-nodes of the
input matrix
biad_mat
. - If \(k_2 \equiv \binom{biad\_mat.shape[0]}{2}\), the interval becomes \(\left[k1, k2\right]\).
Parameters: - biad_mat (numpy.array) – biadjacency matrix
- k1 (int) – lower interval limit
- k2 (int) – upper interval limit
Returns: \(\Lambda\)-motif probability matrix
Return type: numpy.array
- The probabilities are calculated between the row-nodes of the
input matrix
-
get_pvalues_q
(plam_mat, nlam_mat, k1, k2, parallel=True)[source]¶ Calculate the p-values of the observed \(\Lambda\)-motifs.
For each number of \(\Lambda\)-motifs in
nlam_mat
for the node interval \(\left[k1, k2\right[\), construct the Poisson Binomial distribution using the corresponding probabilities inplam_mat
and calculate the p-value.Parameters: - plam_mat (numpy.array (square matrix)) – array containing the list of probabilities for the single observations of \(\Lambda\)-motifs
- nlam_mat (numpy.array (square matrix)) – array containing the observations of \(\Lambda\)-motifs
- k1 (int) – lower interval limit
- k2 (int) – upper interval limit
- parallel (bool) – if
True
, the calculation is executed in parallel; ifFalse
, only one process is started
-
get_triup_dim
(bip_set)[source]¶ Return the number of possible node couples in
bip_set
.Parameters: bip_set (bool) – selects row-nodes ( True
) or column-nodes (False
)Returns: return the number of node couple combinations corresponding to the layer bip_set
Return type: int Raises: ValueError – raise an error if the parameter bip_set
is neitherTrue
norFalse
-
jacobian
(xx)[source]¶ Return a NumPy array with the Jacobian of the equation system.
Parameters: xx (numpy.array) – Lagrange multipliers which have to be solved Returns: Jacobian Return type: numpy.array
-
lambda_motifs
(bip_set, parallel=True, filename=None, delim='\t', binary=True, num_chunks=4)[source]¶ Calculate and save the p-values of the \(\Lambda\)-motifs.
For each node couple in the bipartite layer specified by
bip_set
, calculate the p-values of the corresponding \(\Lambda\)-motifs according to the link probabilities in the biadjacency matrix of the BiCM null model.The results can be saved either as a binary
.npy
or a human-readable.csv
file, depending onbinary
.Note
- The total number of p-values that are calculated is split into
num_chunks
chunks, which are processed sequentially in order to avoid memory allocation errors. Note that a larger value ofnum_chunks
will lead to less memory occupation, but comes at the cost of slower processing speed. - The output consists of a one-dimensional array of p-values. If
the bipartite layer
bip_set
containsn
nodes, this means that the array will contain \(\binom{n}{2}\) entries. The indices(i, j)
of the nodes corresponding to entryk
in the array can be reconstructed using the methodBiCM.flat2_triumat_idx()
. The number of nodesn
can be recovered from the length of the array withBiCM.flat2_triumat_dim()
- If
binary == False
, thefilename
should end with.csv
. Ifbinary == True
, it will be saved in binary NumPy.npy
format and the suffix.npy
will be appended automatically. By default, the file is saved in binary format.
Parameters: - bip_set (bool) – select row-nodes (
True
) or column-nodes (False
) - parallel (bool) – select whether the calculation of the p-values should
be run in parallel (
True
) or not (False
) - filename (str) – name of the output file
- delim (str) – delimiter between entries in the
.csv``file, default is ``\t
- binary (bool) – if
True
, the file will be saved in the binary NumPy format.npy
, otherwise as.csv
- num_chunks (int) – number of chunks of p-value calculations that are performed sequentially
Raises: ValueError – raise an error if the parameter
bip_set
is neitherTrue
norFalse
- The total number of p-values that are calculated is split into
-
make_bicm
(x0=None, method='hybr', jac=None, tol=None, callback=None, options=None)[source]¶ Create the biadjacency matrix of the BiCM null model.
Solve the log-likelihood maximization problem to obtain the BiCM null model which respects constraints on the degree sequence of the input matrix.
The problem is solved using
scipy
‘s root function with the solver defined bymethod
. The status of the solver after running ``scipy.root``and the difference between the network and BiCM degrees are printed in the console.The default solver is the modified Powell method
hybr
. Least-squares can be chosen withmethod='lm'
for the Levenberg-Marquardt approach.Depending on the solver, keyword arguments
kwargs
can be passed to the solver. Please refer to the scipy.optimize.root documentation for detailed descriptions.Note
It can happen that the solver
method
used byscipy.root
does not converge to a solution. In this case, please try anothermethod
or different initial conditions and refer to the scipy.optimize.root documentation.Parameters: - x0 (1d numpy.array, optional) – initial guesses for the solutions. The first entries are the initial guesses for the row-nodes, followed by the initial guesses for the column-nodes.
- method (str, optional) –
type of solver, default is ‘hybr’. For other solvers, see the scipy.optimize.root documentation.
- jac (bool or callable, optional) – Jacobian of the system
- tol (float, optional) – tolerance for termination. For detailed control, use solver-specific options.
- callback (function, optional) – optional callback function to be called at
every iteration as
callback(self.equations, x)
, seescipy.root
documentation - options (dict, optional) – a dictionary of solver options, e.g.
xtol
ormaxiter
, see scipy.root documentation - kwargs – solver-specific options, please refer to the SciPy documentation
Raises: ValueError – raise an error if not enough initial conditions are provided
-
outqueue2pval_mat
(nprocs, pvalmat)[source]¶ Put the results from the out-queue into the p-value array.
-
print_max_degree_differences
()[source]¶ Print the maximal differences between input network and BiCM degrees.
Check that the degree sequence of the solved BiCM null model graph corresponds to the degree sequence of the input graph.
-
static
save_array
(mat, filename, delim='\t', binary=False)[source]¶ Save the array
mat
in the filefilename
.The array can either be saved as a binary NumPy
.npy
file or as a human-readable.npy
file.Note
- The relative path has to be provided in the filename, e.g. ../data/pvalue_matrix.csv.
- If
binary==True
, NumPy automatically appends the format ending.npy
to the file.
Parameters: - mat (numpy.array) – array
- filename (str) – name of the output file
- delim (str) – delimiter between values in file
- binary (bool) – if
True
, save as binary.npy
, otherwise as a.csv
file
-
save_biadjacency
(filename, delim='\t', binary=False)[source]¶ Save the biadjacendy matrix of the BiCM null model.
The matrix can either be saved as a binary NumPy
.npy
file or as a human-readable.csv
file.Note
- The relative path has to be provided in the filename, e.g. ../data/pvalue_matrix.csv.
- If
binary==True
, NumPy automatically appends the format ending.npy
to the file.
Parameters: - filename (str) – name of the output file
- delim (str) – delimiter between values in file
- binary (bool) – if
True
, save as binary.npy
, otherwise as a.csv
file
-
set_degree_seq
()[source]¶ Return the node degree sequence of the input matrix.
Returns: node degree sequence [degrees row-nodes, degrees column-nodes] Return type: numpy.array Raises: AssertionError – raise an error if the length of the returned degree sequence does not correspond to the total number of nodes
-
solve_equations
(x0=None, method='hybr', jac=None, tol=None, callback=None, options=None)[source]¶ Solve the system of equations of the maximum log-likelihood problem.
The system of equations is solved using
scipy
‘s root function with the solver defined bymethod
. The solutions correspond to the Lagrange multipliers\[x_i = \exp(-\theta_i).\]Depending on the solver, keyword arguments
kwargs
can be passed to the solver. Please refer to the scipy.optimize.root documentation for detailed descriptions.The default solver is the modified Powell method
hybr
. Least-squares can be chosen withmethod='lm'
for the Levenberg-Marquardt approach.Note
It can happen that the solver
method
used byscipy.root
does not converge to a solution. In this case, please try anothermethod
or different initial conditions and refer to the scipy.optimize.root documentation.Parameters: - x0 (1d numpy.array, optional) – initial guesses for the solutions. The first entries are the initial guesses for the row-nodes, followed by the initial guesses for the column-nodes.
- method (str, optional) –
type of solver, default is ‘hybr’. For other solvers, see the scipy.optimize.root documentation.
- jac (bool or callable, optional) – Jacobian of the system
- tol (float, optional) – tolerance for termination. For detailed control, use solver-specific options.
- callback (function, optional) – optional callback function to be called at
every iteration as
callback(self.equations, x)
, seescipy.root
documentation - options (dict, optional) – a dictionary of solver options, e.g.
xtol
ormaxiter
, see scipy.root documentation - kwargs – solver-specific options, please refer to the SciPy documentation
Returns: solution of the equation system
Return type: scipy.optimize.OptimizeResult
Raises: ValueError – raise an error if not enough initial conditions are provided
-
split_range
(n, m=4)[source]¶ Split the interval \(\left[0,\ldots, n\right]\) in
m
parts.Parameters: - n (int) – upper limit of the range
- m – number of part in which range should be split
Returns: delimiter indices for the
m
partsReturn type: list
-
test_average_degrees
(eps=0.01)[source]¶ Test the constraints on the node degrees.
Check that the degree sequence of the solved BiCM null model graph corresponds to the degree sequence of the input graph.
Parameters: eps (float) – maximum difference between degrees of the real network and the BiCM
-
static
triumat2flat_dim
(n)[source]¶ Return the size of the triangular part of a
n x n
matrix.Parameters: n (int) – the dimension of the square matrix Returns: number of elements in the upper triangular part of the matrix (excluding the diagonal) Return type: int
-
static
triumat2flat_idx
(i, j, n)[source]¶ Convert an matrix index couple to a flattened array index.
Given a square matrix of dimension
n
and the index couple(i, j)
of the upper triangular part of the matrix, return the index which the matrix element would have in a flattened array.Note
- \(i \in [0, ..., n - 1]\)
- \(j \in [i + 1, ..., n - 1]\)
- returned index \(\in [0,\, n (n - 1) / 2 - 1]\)
Parameters: - i (int) – row index
- j (int) – column index
- n (int) – dimension of the square matrix
Returns: flattened array index
Return type: int
-