Classes

class BiMMSBM.core.metadata_layer(lambda_val, meta_name)[source]

Bases: object

Base class for nodes_layer metadata. It holds extra information about the nodes.

It has two subclasses:
  • exclusive_metadata

  • inclusive_metadata

Parameters:
  • lambda_val (float) – Metadata visibility parameter: the importance given to the metadata when the model is inferred.

  • meta_name (str) – Name of the metadata.

Variables:
  • lambda_val (float) – Metadata visibility parameter

  • meta_name (str) – Name of the metadata column in the node_layer class

  • N_att (int) – Number of different categorical attributes of the metadata.

  • dict_codes (dict) – A dictionary to store codes related to the metadata. Codes are integers ranging from 0 to N_att-1.

  • links (2D NumPy array) – Array representing links between nodes and metadata using its codes.

__len__()[source]

Returns the number of different categorical attributes.

__str__()[source]

Returns the name of the metadata.

Notes

This class provides a structure to manage metadata associated with nodes.

property N_att

Number of different categorical attributes of the metadata.

Returns:

Number of different categorical attributes

Return type:

int

__init__(lambda_val, meta_name)[source]

Initialize the metadata_layer instance.

Parameters:
  • lambda_val (float) – Parameter that represents the importance of the metadata when the model is inferred.

  • meta_name (str) – Name of the metadata.

__len__()[source]

Returns the number of different categorical attributes.

Returns:

Number of different categorical attributes

Return type:

int

__str__()[source]

Returns the name of the metadata.

Returns:

Name of the metadata

Return type:

str

property dict_codes

A dictionary property to store codes related to the metadata.

Returns:

Dictionary containing codes related to the metadata.

Return type:

dict

property links

Array representing links between nodes and metadata using its codes.

Returns:

2D array containing the links between nodes and metadata

Return type:

np.array
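The relation between dict_codes, N_att and links can be illustrated with a small standalone sketch. It does not call BiMMSBM; the toy data, the variable names and the choice of assigning codes in order of first appearance are assumptions made only for illustration.

```python
import numpy as np

# Hypothetical toy data: one attribute per node (exclusive metadata).
node_ids = [0, 1, 2, 3]
genres = ["drama", "comedy", "drama", "horror"]

# Assign integer codes 0..N_att-1 in order of first appearance.
# (Assumption: the library may order the codes differently.)
dict_codes = {}
for genre in genres:
    dict_codes.setdefault(genre, len(dict_codes))

N_att = len(dict_codes)

# Each row of links pairs a node id with the code of its attribute.
links = np.array([[n, dict_codes[g]] for n, g in zip(node_ids, genres)])
```

Here links is a 2D integer array with one row per node-attribute connection, which is the layout the Variables list above describes.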

class BiMMSBM.core.exclusive_metadata(lambda_val, meta_name)[source]

Bases: metadata_layer

Class for handling exclusive metadata in a nodes layer.

This class inherits from metadata_layer and adds functionality for handling exclusive metadata, where each node can only have one attribute from a set of possible attributes.

Variables:
  • lambda_val (float) – Metadata visibility parameter

  • meta_name (str) – Name of the metadata column in the node_layer class

  • N_att (int) – Number of different categorical attributes of the metadata.

  • dict_codes (dict) – A dictionary to store codes related to the metadata. Codes are integers ranging from 0 to N_att-1.

  • links (2D NumPy array) – Array representing links between nodes and metadata using its codes.

  • qka (np.array) – Probability matrix between groups and attributes

  • masks_att_list (list) – List of integer arrays; the array at index att holds the positions in links of the links connected to attribute att.
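A minimal sketch of how a structure like masks_att_list could be derived from a links array. This is plain NumPy under the assumption that the second column of links holds the attribute code; it is not taken from the library source.

```python
import numpy as np

# Hypothetical links array: column 0 is the node id, column 1 the attribute code.
links = np.array([[0, 0], [1, 1], [2, 0], [3, 2]])
N_att = 3

# masks_att_list[att] holds the row positions of links whose attribute is att.
masks_att_list = [np.flatnonzero(links[:, 1] == att) for att in range(N_att)]
```

Indexing links with masks_att_list[att] then selects every link that touches attribute att without rescanning the whole array.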

__len__()

Returns the number of different categorical attributes.

__str__()

Returns the name of the metadata.

__init__(lambda_val, meta_name)[source]

Initialization of the exclusive_metadata class

Parameters:
  • lambda_val (float) – Metadata visibility

  • meta_name (str) – Name of the metadata column in the node_layer class

property qka

Probability matrix between groups and attributes.

Returns:

Matrix of probabilities between groups and attributes

Return type:

np.array

class BiMMSBM.core.inclusive_metadata(lambda_val, meta_name, Tau)[source]

Bases: metadata_layer

Class for handling inclusive metadata in a nodes layer.

This class inherits from metadata_layer and adds functionality for handling inclusive metadata, where each node can have multiple attributes from a set of possible attributes.

Variables:
  • lambda_val (float) – Metadata visibility parameter

  • meta_name (str) – Name of the metadata column in the node_layer class

  • N_att (int) – Number of different categorical attributes of the metadata.

  • dict_codes (dict) – A dictionary to store codes related to the metadata. Codes are integers ranging from 0 to N_att-1.

  • links (2D NumPy array) – Array representing links between nodes and metadata using its codes.

  • Tau (int) – Number of membership groups of this metadata

  • q_k_tau (np.array) – Probability matrix between groups, membership groups and attributes

  • neighbours_meta (list) – List where the index is the attribute and the element is an array of the nodes that are connected to the same attribute.

  • masks_att_list (list) – List of integer arrays; the array at index att holds the positions in links of the links connected to attribute att.

  • labels (np.array) – Array of labels of the links: 0 if not connected, 1 if connected.

  • masks_label_list (list) – List of masks indicating which links have label r, where r is the index in the list.

init_q_k_tau(K, Tau)[source]

Initialization of the q_k_tau matrix.

code_inclusive_metadata(meta_row)[source]

Code the inclusive metadata that is a string of metadata separated by inclusive_metadata._separator.

__len__()

Returns the number of different categorical attributes.

__str__()

Returns the name of the metadata.

__init__(lambda_val, meta_name, Tau)[source]

Initialization of the inclusive_metadata class

Parameters:
  • lambda_val (float) – Metadata visibility

  • meta_name (str) – Name of the metadata column in the node_layer class

  • Tau (int) – Number of membership groups of this metadata

code_inclusive_metadata(meta_row)[source]

Code the inclusive metadata that is a string of metadata separated by inclusive_metadata._separator.

Parameters:

meta_row (str) – String of metadata separated by inclusive_metadata._separator

Returns:

String of metadata ids separated by inclusive_metadata._separator

Return type:

str
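The encoding described above can be sketched as a standalone function. The name encode_row, its explicit dict_codes and separator arguments, and the sample codes are hypothetical; the real method takes only meta_row and reads the separator from inclusive_metadata._separator.

```python
def encode_row(meta_row, dict_codes, separator="|"):
    """Translate 'name|name' into 'id|id' using dict_codes (illustrative helper)."""
    return separator.join(str(dict_codes[name]) for name in meta_row.split(separator))


# Hypothetical attribute codes for a "genres" metadata.
dict_codes = {"action": 0, "comedy": 1, "drama": 2}
coded = encode_row("drama|action", dict_codes)
```

With these codes, the row "drama|action" is turned into the string "2|0", keeping the original order of the attributes.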

init_q_k_tau(K, Tau)[source]

Initialization of the q_k_tau matrix.

Parameters:
  • K (int) – Number of groups

  • Tau (int) – Number of membership groups

Returns:

Initialized q_k_tau matrix of shape (K, Tau, N_att)

Return type:

np.array

Raises:

ValueError – If K or Tau are not positive
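A hedged sketch of an initializer with the documented shape and error behavior. The function name, the explicit N_att and seed arguments, and the use of uniform random values are assumptions; how the library actually fills or normalizes the matrix is not stated in this reference.

```python
import numpy as np


def init_q_k_tau_sketch(K, Tau, N_att, seed=None):
    """Return a random array of shape (K, Tau, N_att); illustrative only."""
    if K <= 0 or Tau <= 0:
        raise ValueError("K and Tau must be positive")
    rng = np.random.RandomState(seed)
    # Uniform values in [0, 1); any normalization step is left out on purpose.
    return rng.rand(K, Tau, N_att)


q_k_tau = init_q_k_tau_sketch(K=2, Tau=3, N_att=4, seed=0)
```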

property q_k_tau

Probability matrix between groups, membership groups and attributes.

Returns:

Matrix of probabilities between groups, membership groups and attributes

Return type:

np.array

class BiMMSBM.core.nodes_layer(K, nodes_name, nodes_info, *, separator='\t', dict_codes=None, **kwargs)[source]

Bases: object

Base class of a layer that contains nodes

It is initialized from a DataFrame, which can be modified through the df attribute.

The rest of the columns of the DataFrame can contain information (metadata) about the nodes. This metadata can be added as a metadata_layer object, treating the network as a multipartite network. Metadata is classified as exclusive_metadata (if a node accepts only one attribute) or inclusive_metadata (if a node accepts more than one attribute).

For more information about metadata, see metadata_layer, exclusive_metadata and inclusive_metadata.

These objects can be added to a BiNet (bipartite network), where connections between nodes_layer objects are used to infer links and their labels (see BiNet).

Variables:
  • K (int) – Number of membership groups for the layer.

  • node_type (str) – Name of the layer. It corresponds to the column that contains the nodes’ names.

  • df (pandas DataFrame) – DataFrame that contains information of the nodes. It contains one column with the nodes’ name and the rest are its metadata.

  • dict_codes (dict) – Dictionary with the integer id of the nodes. The key is the nodes’ name and the value its id.

  • dict_decodes (dict) – Dictionary with the integer id of the nodes. The key is the nodes’ id and the value its name.

  • meta_exclusives (list of metadata_layer) – List of the exclusive metadata objects that will be used in the inference.

  • meta_inclusives (list of metadata_layer) – List of the inclusive metadata objects that will be used in the inference.

  • meta_neighbours_exclusives – Dictionaries of lists that contain, for each node, its exclusive metadata neighbours.

  • meta_neighbours_inclusives – Dictionaries of lists that contain, for each node, its inclusive metadata neighbours.

  • nodes_observed_inclusive – List of arrays, one per metadata, with the nodes that have an attribute of that metadata assigned.

__contains__(node_id)[source]

Check if a node exists in the layer.

Parameters:

node_id (int) – Node identifier

Returns:

True if the node exists, False otherwise

Return type:

bool

__delitem__(metadata_name)[source]

Deletes the metadata object with the name metadata_name

Parameters:

metadata_name (str) – Name of the metadata

__getitem__(metadata_name)[source]

Returns the metadata object with the name metadata_name

Parameters:

metadata_name (str) – Name of the metadata

Returns:

metadata_layer object with the name metadata_name

Return type:

metadata_layer

__init__(K, nodes_name, nodes_info, *, separator='\t', dict_codes=None, **kwargs)[source]

Initialization of the nodes_layer class

Parameters:
  • K (int) – Number of membership groups for the layer.

  • nodes_name (str) – Name of the nodes column in the nodes_layer class

  • nodes_info (str or pandas DataFrame) – If it is a string, it is the directory of the file with the nodes information. If it is a DataFrame, it is the DataFrame with the nodes information.

  • separator (str, default: "\t") – Separator of the columns of the file that contains the nodes information

  • dict_codes (dict, default None) – Dictionary with the integer id of the nodes. The key is the nodes’ name and the value its id.

__iter__()[source]

Returns an iterator over the nodes.

Returns:

Iterator over node IDs

Return type:

iterator

__len__()[source]

Returns the number of nodes in the layer.

Returns:

Number of nodes

Return type:

int

__setitem__(metadata_name, metadata)[source]

Sets the metadata object with the name metadata_name

Parameters:
  • metadata_name (str) – Name of the metadata

  • metadata (metadata_layer) – metadata_layer object with the name metadata_name

add_exclusive_metadata(lambda_val, meta_name, *, dict_codes=None, **kwargs)[source]

Add exclusive_metadata object to nodes_layer object

Parameters:
  • meta_name (str) – Name of the metadata that should be in the node dataframe

  • lambda_val (float) – Value of the metadata visibility

  • dict_codes (dict, None, default: None) – Dictionary where the keys are the names of metadata’s type, and the values are the ids. If None, the program will generate the ids.

add_inclusive_metadata(lambda_val, meta_name, Tau, *, dict_codes=None, separator='|', **kwargs)[source]

Add inclusive_metadata object to nodes_layer object

Parameters:
  • meta_name (str) – Name of the metadata that should be in the node dataframe

  • lambda_val (float) – Value of the metadata visibility

  • Tau (int) – Number of membership groups of metadata

  • separator (str, default: "|") – Separator that is used to differentiate the different metadata assigned for each node

  • dict_codes (dict, None, default: None) – Dictionary where the keys are the names of metadata’s type, and the values are the ids. If None, the program will generate the ids.

classmethod create_simple_layer(K, nodes_list, nodes_name, dict_codes=None)[source]

Create a nodes_layer object from a list or pandas Series containing only the known nodes, without metadata

Parameters:
  • K (int) – Number of membership groups of nodes_layer

  • nodes_list (array-like, DataFrame or Series) – array-like, DataFrame or pandas Series with all the nodes

  • nodes_name (str) – Name of the nodes type (users, movies, metabolites…) that are or will be in DataFrame

  • dict_codes (dict, None, default: None) – Dictionary where the keys are the names of nodes, and the values are their ids. If None, the program will generate the ids.
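When dict_codes is None the ids are generated automatically. The sketch below shows one plausible way to build the name-to-id and id-to-name dictionaries from a list of nodes; assigning ids in order of first appearance is an assumption, and the code does not use BiMMSBM.

```python
# Hypothetical list of node names, possibly with repetitions.
nodes_list = ["ana", "bob", "ana", "eve"]

# Keep each name once, preserving the order of first appearance.
unique_nodes = list(dict.fromkeys(nodes_list))

# dict_codes maps name -> id, dict_decodes maps id -> name.
dict_codes = {name: i for i, name in enumerate(unique_nodes)}
dict_decodes = {i: name for name, i in dict_codes.items()}
```

The two dictionaries are inverses of each other, which matches the description of the dict_codes and dict_decodes variables of nodes_layer.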

property dict_codes
property dict_decodes
classmethod load_nodes_layer_from_file(df, json_dir='.')[source]

Loads the nodes_layer object from a JSON file

Parameters:

json_dir (str) – Directory where the json with the nodes_layer information is saved

read_file(filename, separator='\t')[source]

Reads the nodes information from a file and returns it as a pandas DataFrame.

Parameters:
  • filename (str) – The filename or path to the file containing nodes information.

  • separator (str, default: "\t") – Separator of the nodes DataFrame. Default is "\t".

Returns:

A pandas DataFrame containing nodes information.

Return type:

DataFrame

save_nodes_layer(dir='.')[source]

It saves the nodes_layer object

Parameters:

dir (str) – Directory where the json with the nodes_layer information will be saved

update_K(K)[source]

Update the number of membership groups of nodes_layer and reinitialize the membership matrix

Parameters:

K (int) – Number of membership groups of nodes_layer

update_N(N_nodes)[source]

Update the number of nodes and reinitialize the membership matrix

Parameters:

N_nodes (int) – Number of nodes

update_exclusives_id(em, dict_codes)[source]

Changes the ids (the integer assigned to each metadata attribute) given the dict_codes.

Parameters:

dict_codes (dict) – Dictionary where the keys are the names of metadata’s type, and the values are the ids.

update_inclusives_id(im, dict_codes)[source]

Changes the ids (the integer assigned to each metadata attribute) given the dict_codes.

Parameters:

dict_codes (dict) – Dictionary where the keys are the names of metadata’s type, and the values are the ids. If None, ids will be generated automatically.

class BiMMSBM.core.BiNet(links, links_label, *, nodes_a=None, nodes_b=None, Ka=1, nodes_a_name='nodes_a', Kb=1, nodes_b_name='nodes_b', separator='\t', dict_codes=None, dict_codes_a=None, dict_codes_b=None)[source]

Bases: object

Class of a Bipartite Network, where two layers of different types of nodes are connected (users->items, politicians->bills, patient->microbiome…) and these links can be labeled with information of the interaction (ratings, votes…).

Variables:
  • labels_name (str) – Name of the labels column

  • N_labels (int) – Number of different label types.

  • labels_array (ndarray) – Array with all the ids of the labels.

  • labels_name – List of the names of the different labels.

  • labels_training – Array with all the ids of the labels used to train the MMSBM

  • df – DataFrame with the links information: which nodes are connected and with which label.

  • dict_codes (dict) – Dictionary with the integer ids of the labels. Keys are label names, and values are corresponding ids.

  • nodes_a,nodes_b (nodes_layer) – nodes_layer objects of the nodes that are part of the bipartite network.

  • links (2D-array) – 2D-array with the links of the nodes that are connected

  • links_training (2D-array) – 2D-array with the links of the nodes that are connected used to train the MMSBM

__len__()

Returns the number of different labels.

__str__()

Returns the name of the labels column.

init_EM(tol=0.001, training=None, seed=None)[source]

Initializes the EM algorithm to find the most plausible membership parameters of the MMSBM.

init_EM_from_directory(training=None, dir='.')[source]

Initializes the EM algorithm using parameters saved in files in a specified directory.

EM_step(N_steps=1)[source]

Performs N_steps steps of the EM algorithm.

get_log_likelihoods()[source]

Returns the loglikelihood of the current state of the MMSBM.

get_links_probabilities(links=None)[source]

Returns the probability of each link in links.

get_predicted_labels(links=None)[source]

Returns the predicted label of each link in links.

get_accuracy(predicted_labels=None, test_labels=None, Pij=None, links=None, estimator='max_probability')[source]

Returns the accuracy of the predicted labels.

deep_copying()[source]

Returns a deep copy of the BiNet instance.

converges()[source]

Returns True if the EM algorithm has converged, False otherwise.

Notes

This class provides a structure to manage bipartite networks.

EM_step(N_steps=1)[source]

Performs the N_steps number of steps to update the model parameters.

Parameters:

N_steps (int, default: 1) – Number of EM steps to be performed. Default is 1.

Notes

This method updates the model parameters using the Expectation Maximization (EM) estimation. The Maximum a Posteriori algorithm is employed for iterative updates.

During each step, the following updates are performed:
  • Update of nodes_a parameters (BiNet.nodes_a.theta).

  • Update of exclusive_meta and inclusive_meta for nodes_a (BiNet.nodes_a.meta.theta).

  • Update of nodes_b parameters (BiNet.nodes_b.theta).

  • Update of exclusive_meta and inclusive_meta for nodes_b (BiNet.nodes_b.meta.theta).

  • Update of link probabilities (BiNet.pkl) and omega (BiNet.omega).

After each step, a deep copy of the current model parameters is stored for convergence tracking.

It is recommended to perform multiple EM steps to refine the model parameters.
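To make the flow of one step concrete, here is a sketch of a single EM update for a plain MMSBM without any metadata terms. It is a simplification written for illustration, not the library's implementation: the function name, the array layouts (theta_a of shape (N_a, K_a), theta_b of shape (N_b, K_b), pkl of shape (K_a, K_b, N_labels)) and the update formulas are assumptions based on the standard EM equations for this kind of model.

```python
import numpy as np


def em_step_sketch(links, labels, theta_a, theta_b, pkl):
    """One EM update of a plain MMSBM (no metadata); illustrative only."""
    a, b = links[:, 0], links[:, 1]

    # E-step: omega[l, k, m] is proportional to
    # theta_a[a_l, k] * theta_b[b_l, m] * pkl[k, m, label_l].
    omega = (theta_a[a][:, :, None] * theta_b[b][:, None, :]
             * pkl[:, :, labels].transpose(2, 0, 1))
    omega /= omega.sum(axis=(1, 2), keepdims=True)

    # M-step: each membership row is the average of omega over the node's links.
    new_theta_a = np.zeros_like(theta_a)
    np.add.at(new_theta_a, a, omega.sum(axis=2))
    new_theta_a /= np.bincount(a, minlength=len(theta_a))[:, None]

    new_theta_b = np.zeros_like(theta_b)
    np.add.at(new_theta_b, b, omega.sum(axis=1))
    new_theta_b /= np.bincount(b, minlength=len(theta_b))[:, None]

    # M-step: label probabilities for every pair of groups.
    new_pkl = np.zeros_like(pkl)
    for r in range(pkl.shape[2]):
        new_pkl[:, :, r] = omega[labels == r].sum(axis=0)
    new_pkl /= new_pkl.sum(axis=2, keepdims=True)
    return new_theta_a, new_theta_b, new_pkl


# Toy network: every node appears in at least one link.
links = np.array([[0, 0], [0, 1], [1, 0], [2, 1]])
labels = np.array([0, 1, 1, 0])
rng = np.random.RandomState(0)
theta_a = rng.dirichlet(np.ones(2), size=3)
theta_b = rng.dirichlet(np.ones(2), size=2)
pkl = rng.dirichlet(np.ones(2), size=(2, 2))

theta_a, theta_b, pkl = em_step_sketch(links, labels, theta_a, theta_b, pkl)
```

After the update, each membership row and each pkl[k, m, :] vector still sums to one. The sketch assumes every node has at least one link; a node with no links would cause a division by zero.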

__getitem__(nodes_type)[source]

Returns the nodes_layer object of the specified type.

Parameters:

nodes_type (str) – Name of the nodes_layer object to return.

Returns:

The nodes_layer object of the specified type.

Return type:

nodes_layer

__init__(links, links_label, *, nodes_a=None, nodes_b=None, Ka=1, nodes_a_name='nodes_a', Kb=1, nodes_b_name='nodes_b', separator='\t', dict_codes=None, dict_codes_a=None, dict_codes_b=None)[source]

Initialization of a BiNet class

Parameters:
  • links (str, pandas DataFrame) – DataFrame or directory containing the links between nodes_a and nodes_b and their labels.

  • links_label (str) – Name of the column where the labels are stored in the links DataFrame.

  • nodes_a (nodes_layer, str, DataFrame, None, default: None) – One of the node layers that form the bipartite network.

    • If nodes_layer: Existing instance of the nodes_layer class representing the first layer.

    • If str: Directory containing the file with the nodes_a information.

    • If pd.DataFrame: DataFrame with the nodes_a information.

    • If None: A simple nodes_layer will be created from the information in links.

  • nodes_b (nodes_layer, str, DataFrame, None, default: None) – One of the node layers that form the bipartite network.

    • If nodes_layer: Existing instance of the nodes_layer class representing the second layer.

    • If str: Directory containing the file with the nodes_b information.

    • If pd.DataFrame: DataFrame with the nodes_b information.

    • If None: A simple nodes_layer will be created from the information in links.

  • Ka (int, default: 1) – Number of membership groups for nodes_a layer

  • Kb (int, default: 1) – Number of membership groups for nodes_b layer

  • nodes_a_name (str, default: nodes_a) – Name of the column where the names of nodes_a are in the links DataFrame and nodes_a DataFrame

  • nodes_b_name (str, default: nodes_b) – Name of the column where the names of nodes_b are in the links DataFrame and nodes_b DataFrame

  • dict_codes (dict, None, default: None) – Dictionary where the keys are the names of the labels, and the values are the ids. If None, new ids will be provided.

  • dict_codes_a (dict, None, default: None) – Dictionary where the keys are the names of the nodes from nodes_a and the values are the ids. If None, new ids will be provided.

  • dict_codes_b (dict, None, default: None) – Dictionary where the keys are the names of the nodes from nodes_b and the values are the ids. If None, new ids will be provided.

  • separator (str, default: "\t") – Separator used to read the links DataFrame file.

__setitem__(nodes_type, nodes_layer)[source]

Sets the nodes_layer object of the specified type.

Parameters:
  • nodes_type (str) – Name of the nodes_layer object to set.

  • nodes_layer (nodes_layer) – The nodes_layer object to set.

add_metadata_to_node(node_id, meta_name, meta_value)[source]

Add metadata to a specific node.

Parameters:
  • node_id (int) – Node identifier

  • meta_name (str) – Name of the metadata column

  • meta_value (any) – Value of the metadata

Return type:

None

Raises:

KeyError – If node_id or meta_name doesn’t exist

add_node(node_id, node_name=None)[source]

Add a node to the layer.

Parameters:
  • node_id (int) – Unique identifier for the node

  • node_name (str, optional) – Name of the node, defaults to None

Returns:

The created node dictionary

Return type:

dict

Raises:

ValueError – If node_id already exists

add_nodes_from_list(node_list)[source]

Add multiple nodes from a list.

Parameters:

node_list (list) – List of node identifiers

Return type:

None

clear()[source]

Remove all nodes and metadata from the layer.

converges()[source]

Checks if the parameters have converged during the EM procedure.

Returns:

True if the parameters have converged, False otherwise.

Return type:

bool

Notes

Convergence is determined based on the tolerance (self.tol) set for the model.

  • Meta Convergence:
    • Checks convergence for each layer’s theta and metadata parameters.

    • Metadata parameters include zeta, q_k_tau, and omega for both inclusive and exclusive metadata.

  • Links Convergence:
    • Checks convergence for pkl (link probabilities) and omega parameters.

deep_copying()[source]

Performs a deep copy of all parameters in the EM algorithm.

Notes

This method creates deep copies of various parameters to store their current states for future reference and convergence checking.

  • Link Parameters:
    • pkl_old: Deep copy of the link probabilities (self.pkl).

    • omega_old: Deep copy of omega (self.omega).

  • Metadata parameters (for each layer):
    • theta_old: Deep copy of the layer’s theta parameters (self.theta).

    • Inclusive metadata:
      • zeta_old: Deep copy of zeta (meta.zeta).

      • q_k_tau_old: Deep copy of q_k_tau (meta.q_k_tau).

      • omega_old: Deep copy of omega (meta.omega).

    • Exclusive metadata:
      • qka_old: Deep copy of qka (meta.qka).

      • omega_old: Deep copy of omega (meta.omega).
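The snapshot-and-compare pattern behind deep_copying and converges can be sketched as follows. The helper names, the dictionary of parameters and the criterion (largest absolute change below tol) are assumptions chosen for illustration; the reference only states that convergence depends on self.tol.

```python
import copy

import numpy as np


def snapshot(params):
    """Deep-copy the parameters so later in-place updates do not alter the copy."""
    return copy.deepcopy(params)


def has_converged(params, params_old, tol=0.001):
    """True if no parameter entry changed by tol or more (assumed criterion)."""
    return all(np.max(np.abs(params[name] - params_old[name])) < tol
               for name in params)


params = {"theta": np.array([[0.5, 0.5]]), "pkl": np.array([[0.2, 0.8]])}
old = snapshot(params)

params["theta"][0] = [0.5004, 0.4996]      # small in-place update
small_change = has_converged(params, old)  # every change is below tol

params["pkl"][0] = [0.3, 0.7]              # larger in-place update
large_change = has_converged(params, old)  # pkl moved by 0.1
```

A deep copy is needed here because the updates modify the arrays in place; a shallow copy would share the same arrays and the comparison would always report no change.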

property dict_codes

Dictionary with the integer ids of the labels. Keys are label names, and values are corresponding ids. The ids go from 0 to N_labels-1.

get_accuracy(predicted_labels=None, test_labels=None, Pij=None, links=None, estimator='max_probability')[source]

Computes the ratio of correctly predicted labels of the model given the MMSBM parameters. The predicted labels can be obtained with different estimators:
  • max_probability: The predicted label will be the most plausible label

  • mean: The predicted label will be the mean

Parameters:
  • predicted_labels (array-like, default: None) – Array-like with the predicted label ids given by the MMSBM. If None, predictions will be generated using the specified links and estimator.

  • test_labels (array-like, default: None) – List or array with the observed labels. If None, labels from self.labels_array are taken given pos_test_labels

  • links (ndarray of 1 or 2 dimensions, pandas DataFrame, default: None) – Array with links for which label probabilities are computed.

    • If a 2D array: the first column must contain the ids from the nodes_a layer and the second column the ids from the nodes_b layer.

    • If a 1D array: it must contain the positions of the links list from the self.df attribute.

    • If a pandas DataFrame: it must contain at least two columns with the names of the nodes’ layers and a column with the same name as the labels column from BiNet.df.

    • If None: self.links_training will be used.

  • estimator ({"max_probability", "mean"}, default: max_probability) – Estimator used to get the predicted labels.

    • max_probability: Selects the most plausible label.

    • mean: Selects the mean label (sum [Pij(l)*l]).

Returns:

accuracy – Ratio of correctly predicted labels to the total number of predicted labels.

Return type:

float
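The returned ratio is the fraction of links whose predicted label id equals the observed one. A minimal NumPy sketch with made-up label ids, independent of the library:

```python
import numpy as np

# Hypothetical predicted and observed label ids for four links.
predicted_labels = np.array([0, 2, 1, 1])
test_labels = np.array([0, 2, 2, 1])

# Ratio of correctly predicted labels to the total number of predictions.
accuracy = np.mean(predicted_labels == test_labels)
```

Three of the four predictions match, so the ratio here is 0.75.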

get_all_nodes_metadata(meta_name)[source]

Get metadata values for all nodes.

Parameters:

meta_name (str) – Name of the metadata column

Returns:

Dictionary mapping node IDs to their metadata values

Return type:

dict

Raises:

KeyError – If meta_name doesn’t exist

get_links_probabilities(links=None)[source]

Computes the label probabilities for links in the trained BiNet.

Parameters:

links (ndarray or DataFrame, optional, default: None) – Array or DataFrame with links for which probabilities are computed.

  • If 2D array: the first column should contain node IDs from the nodes_a layer, and the second column node IDs from the nodes_b layer.

  • If 1D array: it should contain positions of links in the self.df attribute.

  • If DataFrame: it should have at least two columns with the names of the nodes layers.

  • If None: self.links_training will be used.

Returns:

Pij_r – Pij_r[l, r] is the probability that link l has label r.

Return type:

ndarray, shape (len(links), self.N_labels)
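A sketch of how an array with the documented shape (len(links), N_labels) can be computed in a mixed-membership block model. The formula, a sum over group pairs of theta_a[i, k] * theta_b[j, m] * pkl[k, m, r], and the array layouts are assumptions based on the usual MMSBM definition, not code taken from BiMMSBM.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical trained parameters; each membership row sums to one.
theta_a = rng.dirichlet(np.ones(2), size=3)    # (N_a, K_a)
theta_b = rng.dirichlet(np.ones(2), size=4)    # (N_b, K_b)
pkl = rng.dirichlet(np.ones(3), size=(2, 2))   # (K_a, K_b, N_labels)

# 2D links array: first column nodes_a ids, second column nodes_b ids.
links = np.array([[0, 1], [2, 3], [1, 0]])

# Pij_r[l, r] = sum over k, m of theta_a[i_l, k] * theta_b[j_l, m] * pkl[k, m, r]
Pij_r = np.einsum("lk,lm,kmr->lr",
                  theta_a[links[:, 0]], theta_b[links[:, 1]], pkl)
```

Because the memberships and each pkl[k, m, :] vector are normalized, every row of Pij_r is a probability distribution over the labels.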

get_log_likelihoods()[source]

Computes the log-likelihoods of every bipartite network in the multipartite network, that is, the log-likelihood of the BiNet network and the log-likelihoods of the metadata networks.

get_node_metadata(node_id, meta_name)[source]

Get metadata value for a specific node.

Parameters:
  • node_id (int) – Node identifier

  • meta_name (str) – Name of the metadata column

Returns:

The metadata value

Return type:

any

Raises:

KeyError – If node_id or meta_name doesn’t exist

get_predicted_labels(links=None, Pij=None, estimator='max_probability', to_return='df')[source]

Computes the predicted labels of the model based on the MMSBM parameters. Different estimators can be used:
  • max_probability: The predicted label will be the most plausible label

  • mean: The predicted label will be the mean

Parameters:
  • links (ndarray of 1 or 2 dimensions, pandas DataFrame, default: None) – Array with links for which label probabilities are computed.

    • If a 2D array: the first column must contain the ids from the nodes_a layer and the second column the ids from the nodes_b layer.

    • If a 1D array: it must contain the positions of the links list from the self.df attribute.

    • If a pandas DataFrame: it must contain at least two columns with the names of the nodes’ layers and a column with the same name as the labels column from BiNet.df.

    • If None: self.links_training will be used.

  • Pij (ndarray, default: None) – Array with the probabilities of the links to have each label. If None, it will compute the probabilities using self.get_links_probabilities(links).

  • estimator ({"max_probability","average"}, default: max_probability) – Estimator used to get predicted labels.

    • “max_probability”: Selects the most plausible label.

    • “average”: Selects the average label (sum [Pij(l) * l]).

  • to_return ({"df","ids", "both"}, default: df) – Option to choose how the predicted labels will be returned.

    • “df”: Returns a DataFrame with columns for nodes from both layers and an additional column called “Predicted + self.label_name”.

    • “ids”: Returns a ndarray of ints with the ids of the predicted labels.

    • “both”: Returns both the DataFrame and the ndarray with the ids, in this order.

Returns:

  • labels_id (ndarray) – Predicted labels id.

  • labels_df (pandas DataFrame) – DataFrame whose columns are nodes_a, nodes_b and the predicted labels

Notes

If Pij is provided, it will use the given probabilities; otherwise, it will compute probabilities using self.get_links_probabilities(links).
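The two estimators can be illustrated directly on a probability array. The sketch below uses a made-up Pij and treats the label ids 0..N_labels-1 as the values being averaged, following the formula sum [Pij(l) * l]; whether the library rounds the averaged value is not stated here, so no rounding is applied.

```python
import numpy as np

# Hypothetical label probabilities for two links and three labels.
Pij = np.array([[0.1, 0.2, 0.7],
                [0.5, 0.3, 0.2]])
label_ids = np.arange(Pij.shape[1])

# "max_probability": the id of the most plausible label for each link.
max_probability_labels = Pij.argmax(axis=1)

# Averaged estimator: the expected label id, sum over l of Pij(l) * l.
mean_labels = Pij @ label_ids
```

For the first link the most plausible label is 2 while the expected label id is 1.6, which shows that the averaged estimator only makes sense when the label ids are ordered (ratings, for example).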

init_EM(tol=0.001, training=None, seed=None)[source]

Initialize the EM algorithm to get the most plausible membership parameters of the MMSBM

Parameters:
  • tol (float, default: 0.001) – Tolerance of the algorithm when finding the parameters.

  • seed (int, None, default: None) – Seed used to generate the matrices. The random generator is initialized with np.random.RandomState(seed).

  • training (DataFrame, list, default: None) –

    • If DataFrame: DataFrame with the links used to train the MMSBM.

    • If list or ndarray: List or array containing the indexes of the links list used for training.

    • If None: Uses self.links and self.labels_array.

Notes

This method initializes the EM algorithm by setting up probability matrices (BiNet.pkl), memberships (BiNet.nodes_a.theta and BiNet.nodes_b.theta), and managing links to train. The tolerance, seed, and training data can be specified to customize the initialization process.
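A sketch of a seeded random initialization of the kind described above, using np.random.RandomState as mentioned for the seed parameter. The helper name, the array shapes and the normalization over the last axis are assumptions for illustration.

```python
import numpy as np


def random_normalized(shape, rng):
    """Random non-negative array whose last axis sums to one (illustrative)."""
    m = rng.rand(*shape)
    return m / m.sum(axis=-1, keepdims=True)


seed = 42
rng = np.random.RandomState(seed)

theta_a = random_normalized((5, 2), rng)   # memberships of nodes_a, K_a = 2
theta_b = random_normalized((4, 3), rng)   # memberships of nodes_b, K_b = 3
pkl = random_normalized((2, 3, 2), rng)    # label probabilities, N_labels = 2
```

Fixing the seed makes the starting point reproducible, which is useful when comparing several EM runs.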

init_EM_from_directory(training=None, dir='.')[source]

Initialize the Expectation Maximization (EM) algorithm to obtain the most plausible membership parameters of the Mixed-Membership Stochastic Block Model (MMSBM) using parameters saved in files located in a specified directory.

Parameters:
  • dir (str, default: ".") – Directory from which the files with the MMSBM parameters will be loaded.

  • training (pd.DataFrame, list, ndarray, default: None) –

    • If pd.DataFrame: DataFrame containing the training links and labels.

    • If list or ndarray: List or array containing the positions of the links list from self.df attribute.

    • If None: Uses self.links_training and self.labels_training.

classmethod load_BiNet_from_file(df_links, json_dir, layers=True, *, nodes_a=None, nodes_b=None)[source]

Loads the BiNet data from a JSON file in json_dir

Parameters:
  • json_dir (str) – Directory where the JSON with the BiNet information is saved

  • layers (bool, default: True) – If True, it loads the nodes_layer objects from the JSON file in the same directory.

classmethod load_BiNet_from_json(json_file, links, links_label, *, nodes_a=None, nodes_b=None, nodes_a_dir=None, nodes_b_dir=None, separator='\t')[source]

Load a BiNet instance from a JSON file containing MMSBM parameters and link information.

Parameters:
  • json_file (str) – Path to the JSON file containing MMSBM parameters.

  • links (str, pandas DataFrame) – DataFrame or directory containing the links between nodes_a and nodes_b and their labels.

  • links_label (array-like) – Array-like object representing the labels corresponding to the links.

  • nodes_a (nodes_layer, str, pd.DataFrame, None, default: None) –

    • If nodes_layer: Existing instance of the nodes_layer class representing the first layer.

    • If str or pd.DataFrame: If str, a name for the first layer. If pd.DataFrame, DataFrame with nodes and attributes.

    • If None: The first layer will be created later.

  • nodes_b (nodes_layer, str, pd.DataFrame, None, default: None) –

    • If nodes_layer: Existing instance of the nodes_layer class representing the second layer.

    • If str or pd.DataFrame: If str, a name for the second layer. If pd.DataFrame, DataFrame with nodes and attributes.

    • If None: The second layer will be created later as a simple layer (no metadata)

  • separator (str, default: "\t") – Separator used in the provided JSON file.

Returns:

BN – Instance of the BiNet class loaded from the JSON file.

Return type:

BiNet

Notes

This class method allows loading a BiNet instance from a JSON file, along with links and labels. It constructs both nodes layers’ objects with metadata initialized based on the provided information.

remove_metadata(meta_name)[source]

Remove a metadata column from all nodes.

Parameters:

meta_name (str) – Name of the metadata column

Return type:

None

Raises:

KeyError – If meta_name doesn’t exist

remove_node(node_id)[source]

Remove a node from the layer.

Parameters:

node_id (int) – Node identifier

Return type:

None

Raises:

KeyError – If node_id doesn’t exist

save_BiNet(dir='.', layers=True)[source]

Saves the BiNet data into a JSON file in dir. If layers==True, it also saves the nodes_layer objects in JSON files in the same directory.

Parameters:
  • dir (str) – Directory where the JSON with the BiNet information will be saved

  • layers (bool, default: True) – If True, it saves the nodes_layer objects in JSON files in the same directory.
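Saving model parameters to JSON requires converting NumPy arrays to lists, and loading requires the reverse step. The round trip below is a generic sketch of that idea; the file name binet_sketch.json and the keys of the dictionary are hypothetical and do not reflect the actual layout written by save_BiNet.

```python
import json
import os
import tempfile

import numpy as np

# Hypothetical parameters to persist.
params = {"Ka": 2, "Kb": 3, "pkl": np.full((2, 3, 2), 0.5)}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "binet_sketch.json")  # hypothetical file name

    # Arrays are not JSON serializable, so they are stored as nested lists.
    serializable = {key: value.tolist() if isinstance(value, np.ndarray) else value
                    for key, value in params.items()}
    with open(path, "w") as f:
        json.dump(serializable, f)

    with open(path) as f:
        loaded = json.load(f)

# Rebuild the array from the nested lists after loading.
pkl_loaded = np.array(loaded["pkl"])
```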