Guide to pgmpy: Probabilistic Graphical Models with Python Code

Probabilistic Graphical Models (PGMs) are a powerful way of representing joint probability distributions over a set of random variables, and they allow inference to be performed in a computationally efficient way. A PGM exploits conditional independence between the random variables to create a graph structure representing their relationships, and the joint probability distribution of these variables can then be recovered by combining the parameters attached to the graph.

What are the types of Graph Models?


Mainly, there are two types of graph models:
Bayesian Graph Models: These models consist of a Directed Acyclic Graph (DAG), and each random variable has a conditional probability distribution associated with it. These models represent causation between the random variables.
Markov Graph Models: These models are undirected graphs and represent non-causal relationships between the random variables; a short sketch follows below.
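
As a quick illustration of the undirected case, here is a minimal sketch (it assumes the MarkovModel class from pgmpy.models, which newer pgmpy releases rename to MarkovNetwork):

 from pgmpy.models import MarkovModel
 from pgmpy.factors.discrete import DiscreteFactor

 # A simple chain A - B - C with one factor per edge
 mm = MarkovModel([('A', 'B'), ('B', 'C')])
 factor_ab = DiscreteFactor(['A', 'B'], cardinality=[2, 2], values=[5, 1, 1, 10])
 factor_bc = DiscreteFactor(['B', 'C'], cardinality=[2, 2], values=[5, 1, 1, 10])
 mm.add_factors(factor_ab, factor_bc)
 mm.check_model()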

pgmpy is a Python framework for working with these types of graph models. Several graph models and inference algorithms are implemented in pgmpy. pgmpy also allows users to create their own inference algorithms without digging into the details of its source code. Let's get started with the implementation part.

Installation

Install pgmpy via PyPI:

!pip install pgmpy
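
To confirm the installation, import the package and print its version (a minimal sanity check; pgmpy exposes a standard __version__ attribute):

 import pgmpy
 print(pgmpy.__version__)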

pgmpy Demo – Create Bayesian Network

In this demo, we are going to create a Bayesian network. A Bayesian network is parameterized by a conditional probability distribution (CPD) for each node: every node is represented as P(node | Pa(node)), where Pa(node) is the set of the node's parents in the network. The joint distribution is then the product of these CPDs; for the student network used below, it factorizes as P(D, I, G, L, S) = P(D) P(I) P(G | D, I) P(L | G) P(S | I).

As an example, we are going to implement the classic student model (five variables: Difficulty, Intelligence, Grade, Letter, SAT) using the pgmpy Python library.

  1. Import the required methods from pgmpy.
 from pgmpy.models import BayesianModel
 from pgmpy.factors.discrete import TabularCPD 
  2. Initialize the model by passing the edge list as shown below.
 # Defining the model structure. We can define the network by just passing a list of edges.
 model = BayesianModel([('D', 'G'), ('I', 'G'), ('G', 'L'), ('I', 'S')]) 
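
As a quick sanity check (a minimal sketch; BayesianModel behaves like a networkx directed graph, so the usual accessors are available):

 print(model.nodes())
 print(model.edges())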

  3. Define all the conditional probability tables (CPDs), one for each node. In pgmpy these are created with the TabularCPD class.

 # Defining individual CPDs.
 cpd_d = TabularCPD(variable='D', variable_card=2, values=[[0.6], [0.4]])
 cpd_i = TabularCPD(variable='I', variable_card=2, values=[[0.7], [0.3]])
 # In pgmpy, the columns of a CPD are the evidence configurations and the rows are the states of the variable.
 # cpd_g represents P(grade | diff, intel)
 cpd_g = TabularCPD(variable='G', variable_card=3, 
                    values=[[0.3, 0.05, 0.9,  0.5],
                            [0.4, 0.25, 0.08, 0.3],
                            [0.3, 0.7,  0.02, 0.2]],
                   evidence=['I', 'D'],
                   evidence_card=[2, 2])
 cpd_l = TabularCPD(variable='L', variable_card=2, 
                    values=[[0.1, 0.4, 0.99],
                            [0.9, 0.6, 0.01]],
                    evidence=['G'],
                    evidence_card=[3])
 cpd_s = TabularCPD(variable='S', variable_card=2,
                    values=[[0.95, 0.2],
                            [0.05, 0.8]],
                    evidence=['I'],
                    evidence_card=[2])
  4. Add the CPDs defined above to the initialized model.
 # Associating the CPDs with the network
 model.add_cpds(cpd_d, cpd_i, cpd_g, cpd_l, cpd_s)
  5. Verify the network using the check_model() method; it checks the network structure and the CPDs, and returns True when every CPD is correctly defined and sums to 1.
 # check_model checks for the network structure and CPDs and verifies that the CPDs are correctly 
 # defined and sum to 1.
 model.check_model() 
  6. In the steps above we haven't provided state names, so pgmpy automatically initialized the states as 0, 1, 2, and so on; it also provides a way of setting the state names explicitly. An example of this is shown below.
 cpd_g_sn = TabularCPD(variable='G', variable_card=3, 
                       values=[[0.3, 0.05, 0.9,  0.5],
                               [0.4, 0.25, 0.08, 0.3],
                               [0.3, 0.7,  0.02, 0.2]],
                       evidence=['I', 'D'],
                       evidence_card=[2, 2],
                       state_names={'G': ['A', 'B', 'C'],
                                    'I': ['Dumb', 'Intelligent'],
                                    'D': ['Easy', 'Hard']}) 
  7. Print the CPDs. A CPD with auto-numbered states can be printed with a plain print command, and a CPD attached to the model can be retrieved with the get_cpds() method, as sketched below.
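A minimal sketch of both (assuming the cpd_g and model objects defined in the earlier steps):

 # Print a CPD whose states were auto-numbered by pgmpy
 print(cpd_g)
 # Retrieve a CPD attached to the model by variable name
 print(model.get_cpds('G'))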
  8. Next, find the independencies in the given Bayesian network. There are two types of independencies defined by a Bayesian network: local and global.

Local Independencies: a variable is independent of its non-descendants given its parents. This can be written as (X ⊥ NonDesc(X) | Pa(X)), where NonDesc(X) is the set of variables that are not descendants of X and Pa(X) is the set of parents of X.

 # Getting the local independencies of a variable.
 model.local_independencies('G') 

Or,

 # Getting all the local independencies in the network.
 model.local_independencies(['D', 'I', 'S', 'G', 'L']) 

Global Independencies: many different structures can give rise to global independencies. For two connected nodes there are only two possible structures, X → Y and Y → X, and in both cases a change in one node affects the other. For three nodes there are more possibilities (chains, forks, and colliders), and the independencies that hold depend on which structure connects them.
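
pgmpy can enumerate every independence implied by the network structure; here is a minimal sketch using the model's get_independencies() method:

 # Getting all the independencies implied by the network structure
 print(model.get_independencies())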

  9. Inference from Bayesian models. In this step, we will compute distributions from the Bayesian model discussed above using Variable Elimination, a very basic exact inference method. For example, we will compute the probability distribution of G by marginalizing over all the other variables. The Python code for this is given below.
 from pgmpy.inference import VariableElimination
 infer = VariableElimination(model)
 g_dist = infer.query(['G'])
 print(g_dist) 

For computing a conditional distribution such as P(G | D=0, I=1), we need to pass the evidence as an extra argument. Since we attached the CPDs without explicit state names, the states are referred to by their integer indices.

print(infer.query(['G'], evidence={'D': 0, 'I': 1}))

  10. In this step, we will predict values for new data points. The difference between step 9 and this step is that we are now interested in the most probable state of the variable instead of the full probability distribution. In pgmpy this is known as a MAP query. Here's an example:

infer.map_query(['G'])

Or,

infer.map_query(['G'], evidence={'D': 0, 'I': 1})

You can check the full demo here.

pgmpy Demo – Extensibility 

As discussed above, pgmpy provides a way to create your own inference algorithm. In this demo, we are going to do exactly that. pgmpy contains base classes such as:

  • BaseInference for inference
  • BaseFactor for model parameters
  • BaseEstimators for parameter and model learning

To add new features, create a new class that inherits from the appropriate base class; we can then simply use the rest of pgmpy's functionality with this new class.

Following are the steps:

  1. Import all the required methods and packages.
 # A simple Exact inference algorithm
 import itertools
 from pgmpy.inference.base import Inference
 from pgmpy.factors import factor_product 
  2. Define your own inference class by inheriting from the base class provided by pgmpy. For this particular algorithm, we will multiply all the factors/CPDs of the network, reduce by the evidence, and marginalize over the remaining variables to answer the desired query.
 class SimpleInference(Inference):
     # By inheriting Inference we can use self.model, self.factors and self.cardinality in our class
     def query(self, var, evidence):
         # self.factors is a dict of the form of {node: [factors_involving_node]}
         factors_list = set(itertools.chain(*self.factors.values()))
         product = factor_product(*factors_list)
         reduced_prod = product.reduce(evidence, inplace=False)
         reduced_prod.normalize()
         var_to_marg = set(self.model.nodes()) - set(var) - set([state[0] for state in evidence])
         marg_prod = reduced_prod.marginalize(var_to_marg, inplace=False)
         return marg_prod 
  3. Now, as in the demo above, we initialize the Bayesian model, prepare the conditional probability tables for all variables, and add them to the initialized model.
 # Defining a model
 from pgmpy.models import BayesianModel
 from pgmpy.factors.discrete import TabularCPD
 model = BayesianModel([('A', 'J'), ('R', 'J'), ('J', 'Q'), ('J', 'L'), ('G', 'L')])
 cpd_a = TabularCPD('A', 2, values=[[0.2], [0.8]])
 cpd_r = TabularCPD('R', 2, values=[[0.4], [0.6]])
 cpd_j = TabularCPD('J', 2, values=[[0.9, 0.6, 0.7, 0.1],
                                    [0.1, 0.4, 0.3, 0.9]],
                   evidence=['A', 'R'], evidence_card=[2, 2])
 cpd_q = TabularCPD('Q', 2, values=[[0.9, 0.2], [0.1, 0.8]],
                   evidence=['J'], evidence_card=[2])
 cpd_l = TabularCPD('L', 2, values=[[0.9, 0.45, 0.8, 0.1],
                                    [0.1, 0.55, 0.2, 0.9]],
                   evidence=['J', 'G'], evidence_card=[2, 2])
 cpd_g = TabularCPD('G', 2, values=[[0.6], [0.4]])
 model.add_cpds(cpd_a, cpd_r, cpd_j, cpd_q, cpd_l, cpd_g) 
  4. Finally, run a query through the customized inference algorithm and compare it with the built-in VariableElimination method (a comparison sketch follows the snippet below).
 # Doing inference with our SimpleInference
 infer = SimpleInference(model)
 a = infer.query(var=['A'], evidence=[('J', 0), ('R', 1)]) 
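
For the comparison, here is a minimal sketch that answers the same query with pgmpy's VariableElimination (note that it takes evidence as a dict rather than a list of tuples); the two posteriors should match:

 from pgmpy.inference import VariableElimination
 ve = VariableElimination(model)
 b = ve.query(['A'], evidence={'J': 0, 'R': 1})
 print(a)
 print(b)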

You can check the full demo here.

Conclusion

In this article, we have discussed the pgmpy Python library, which provides a simple API for working with graphical models (Bayesian models, Markov models, etc.). It is highly modular and quite extensible.

Official code, docs and tutorials are available at:

  • GitHub: https://github.com/pgmpy/pgmpy
  • Documentation: https://pgmpy.org
