Open Federated Learning (OpenFL) is a Python 3 library for running machine learning experiments using a federated learning approach. The framework was developed by Intel Labs and the Intel Internet of Things Group.
If you are unfamiliar with the term ‘federated learning’, read ‘What is federated learning?’, ‘How does it work?’ and ‘Advantages of federated learning approach’ sections of this article before proceeding.
Two major components of OpenFL
- Collaborator (can use any deep learning framework such as PyTorch or TensorFlow)
- Aggregator (framework agnostic)
Before going into the open-source project’s details, let us briefly overview these and some other underlying terminologies.
Basic terminologies
- Collaborators: The clients in the federated learning system that can access the local training, validation and test datasets. A local dataset should never leave its collaborator.
- Parameter Server: It sends a global model to the collaborators.
- Aggregator: It receives locally-tuned models from the collaborators. It then combines those models (typically using federated averaging algorithms) into a new global model.
- Federation round: It is the interval where an aggregation is performed. Within a single round, a collaborator may carry out local training on the model for multiple epochs.
- Federated Learning (FL) plan: An FL plan defines the following aspects:
- Address of the aggregator
- Global model to be sent to the collaborators
- Federation parameters like encryption to be used for network connections and the number of federation rounds
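The federated-averaging step described under 'Aggregator' above can be pictured in plain NumPy: the aggregator takes a weighted average of the collaborators' parameters, weighted by local dataset sizes. The function below is only an illustrative sketch of the idea, not OpenFL's internal implementation.

```python
import numpy as np

def federated_average(weight_sets, sample_counts):
    """Weighted average of model parameters.

    weight_sets: one list of parameter arrays per collaborator
    sample_counts: number of local training samples per collaborator
    """
    total = sum(sample_counts)
    averaged = []
    # zip(*...) walks the collaborators' parameter lists layer by layer
    for layer_weights in zip(*weight_sets):
        averaged.append(sum(w * (n / total)
                            for w, n in zip(layer_weights, sample_counts)))
    return averaged

# Two collaborators with one-layer "models" and unequal data sizes
a = [np.array([1.0, 1.0])]
b = [np.array([3.0, 3.0])]
global_weights = federated_average([a, b], sample_counts=[1, 3])
print(global_weights[0])  # [2.5 2.5] -- pulled towards the larger dataset
```

Because collaborator b holds three times as much data, the global model lands closer to b's parameters than to a's.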
FL plan – the base of an OpenFL experiment
The overall design of the OpenFL library centres around the Federated Learning (FL) Plan. A YAML file defines the collaborators, aggregator, connections, models, data, and any other parameters that describe how the model training process will evolve.
Practical implementation
Requirement: Python 3.6 or higher version
Here’s a demonstration of OpenFL implemented in Google Colab with Python 3.7.10, Torch 1.7.1, Torchvision 0.8.2 and OpenFL 1.0. A step-wise explanation of the code follows:
- Installation of OpenFL package
!pip install openfl
- Install the dependencies (torch, torchvision) and the mnist helper package for the MNIST dataset
!pip install torch torchvision mnist
- Import required libraries and classes
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import openfl.native as fx
from openfl.federated import FederatedModel, FederatedDataSet
- Set up the default workspace
fx.init('torch_cnn_mnist')
The available workspace templates can be found here. We have used the ‘torch_cnn_mnist’ workspace, which comprises a PyTorch CNN model, downloads the MNIST dataset and trains it in a federation.
The structure of the workspace directory can be seen in the output as follows:
- Define a function to form a one-hot representation of output labels
def one_hot(labels, classes):
    # np.eye() returns a matrix with 1's on the diagonal and 0's elsewhere;
    # indexing its rows by the labels yields one-hot vectors
    return np.eye(classes)[labels]
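To see what this does, each label simply selects the corresponding row of an identity matrix. A quick check with a hypothetical label array and 4 classes:

```python
import numpy as np

def one_hot(labels, classes):
    return np.eye(classes)[labels]

print(one_hot(np.array([0, 2, 1]), 4))
# Each label becomes a row of the 4x4 identity matrix:
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 1. 0. 0.]]
```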
- Carry out image transformations
torchvision.transforms provides several common image transformations. Using torchvision.transforms.Compose(), many transforms (passed as a list) can be chained together.
We are applying two transformations:
(i) transforms.ToTensor() converts a PIL image or numpy.ndarray to a tensor.
(ii) transforms.Normalize() normalizes the image with the mean and standard deviation provided as parameters. Of the two sequences passed to transforms.Normalize(), the first gives the per-channel means and the second the per-channel standard deviations.
trf = transforms.Compose([transforms.ToTensor(),
                          # MNIST images have a single channel, so one mean
                          # and one standard deviation are needed
                          transforms.Normalize((0.5,), (0.5,))])
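The arithmetic behind Normalize is simple: each pixel becomes (x − mean) / std. With mean and std both 0.5, pixel values in [0, 1] are mapped onto [−1, 1], which can be verified without torchvision at all:

```python
# Normalize computes (x - mean) / std per channel.
def normalize(x, mean=0.5, std=0.5):
    return (x - mean) / std

print(normalize(0.0))  # -1.0 (black pixel)
print(normalize(0.5))  #  0.0 (mid grey)
print(normalize(1.0))  #  1.0 (white pixel)
```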
- Prepare the training data
train = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=trf)
The training images will be downloaded into the ‘./data’ directory, and the transformation defined by ‘trf’ in step (6) will be applied to them.
- Separate out training images and their output labels
t_img, t_label = train.train_data, np.array(train.train_labels)
# numpy.expand_dims() adds a channel dimension to the t_img array
# torch.from_numpy() forms a tensor from a numpy.ndarray
t_img = torch.from_numpy(np.expand_dims(t_img, axis=1)).float()
# Form the one-hot representation of the labels by calling the one_hot()
# function defined in step (5); 10 is the number of output classes
t_label = one_hot(t_label, 10)
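The expand_dims call inserts a channel axis so the images match the (N, C, H, W) layout that PyTorch convolutions expect. A minimal shape check, using a small dummy array in place of the real MNIST tensors:

```python
import numpy as np

# A small dummy batch standing in for the MNIST image stack
imgs = np.zeros((4, 28, 28))
imgs = np.expand_dims(imgs, axis=1)  # insert a channel axis at position 1
print(imgs.shape)  # (4, 1, 28, 28) -- the (N, C, H, W) layout
```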
- Prepare the data for validation.
valid = torchvision.datasets.MNIST(root='./data',train=False, download=True,transform=trf)
- Separate the validation images and their labels, as done for the training set in step (8)
v_img, v_label = valid.test_data, np.array(valid.test_labels)
v_img = torch.from_numpy(np.expand_dims(v_img, axis=1)).float()
v_label = one_hot(v_label, 10)
- The FederatedDataSet class of the openfl.federated module wraps in-memory NumPy datasets and includes a setup function that splits the data into N mutually exclusive chunks, one for each collaborator participating in the experiment.
feature_shape = t_img.shape[1]
classes = 10
# Prepare the data for the federation
data = FederatedDataSet(t_img, t_label, v_img, v_label,
                        batch_size=32, num_classes=classes)
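The "mutually exclusive chunks" behaviour can be pictured with np.array_split: each collaborator receives a disjoint, roughly equal slice of the feature and label arrays. This is only an illustrative sketch of the idea, not FederatedDataSet's actual code.

```python
import numpy as np

def shard(X, y, num_collaborators):
    """Split features and labels into disjoint, roughly equal shards."""
    return list(zip(np.array_split(X, num_collaborators),
                    np.array_split(y, num_collaborators)))

# Tiny hypothetical dataset of 10 samples split across 2 collaborators
X = np.arange(10).reshape(10, 1)
y = np.arange(10)
shards = shard(X, y, 2)
print(len(shards))         # 2 -- one shard per collaborator
print(shards[0][0].shape)  # (5, 1) -- half the samples each
```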
- Define the neural network layers
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, 3)      # 1st convolution
        self.pool = nn.MaxPool2d(2, 2)        # pooling layer
        self.conv2 = nn.Conv2d(16, 32, 3)     # 2nd convolution
        self.fc1 = nn.Linear(32 * 5 * 5, 32)  # 1st fully connected layer
        self.fc2 = nn.Linear(32, 84)          # 2nd fully connected layer
        self.fc3 = nn.Linear(84, 10)          # 3rd fully connected layer

    def forward(self, x):  # forward propagation
        x = self.pool(F.relu(self.conv1(x)))  # convolution + pooling
        x = self.pool(F.relu(self.conv2(x)))  # convolution + pooling
        x = x.view(x.size(0), -1)             # flatten
        x = F.relu(self.fc1(x))  # activation for 1st FC layer
        x = F.relu(self.fc2(x))  # activation for 2nd FC layer
        x = self.fc3(x)
        # torch.nn.functional.log_softmax() applies softmax followed by logarithm
        return F.log_softmax(x, dim=1)
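The 32 * 5 * 5 input size of fc1 follows from tracing the spatial dimensions: each 3×3 convolution (stride 1, no padding) shrinks a side by 2, and each 2×2 max-pool halves it (flooring). A quick sanity check of that arithmetic:

```python
def conv_out(n, kernel):  # stride 1, no padding
    return n - kernel + 1

def pool_out(n, kernel=2, stride=2):
    return (n - kernel) // stride + 1

n = 28                        # MNIST image side
n = pool_out(conv_out(n, 3))  # conv1 + pool: 28 -> 26 -> 13
n = pool_out(conv_out(n, 3))  # conv2 + pool: 13 -> 11 -> 5
print(32 * n * n)             # 800 = 32 * 5 * 5, the fc1 input size
```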
- Define optimizer
opt = lambda x: optim.Adam(x, lr=1e-4)  # 'lr' is the learning rate
- Define a function for binary cross entropy metric
def cross_entropy(output, target):
    return F.binary_cross_entropy_with_logits(input=output, target=target)
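binary_cross_entropy_with_logits fuses a sigmoid with binary cross-entropy in a numerically stable way. As a sketch of what it computes, its stable per-element formula can be reproduced in NumPy:

```python
import numpy as np

def bce_with_logits(logits, targets):
    # Stable form: max(x, 0) - x*t + log(1 + exp(-|x|))
    x, t = np.asarray(logits), np.asarray(targets)
    return np.mean(np.maximum(x, 0) - x * t + np.log1p(np.exp(-np.abs(x))))

# A logit of 0 means sigmoid(0) = 0.5, so the loss against target 1 is log 2
print(round(bce_with_logits([0.0], [1.0]), 4))  # 0.6931
```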
- Build the model using the FederatedModel class of the openfl.federated module. It wraps the network definition and its forward function, the lambda optimizer (so a fresh optimizer can be attached to each newly instantiated network), the loss function and the data loader.
fmodel = FederatedModel(build_model=Net, optimizer=opt,
                        loss_fn=cross_entropy, data_loader=data)
- Build collaborator models
c_models = fmodel.setup(num_collaborators=2)
# Define which collaborators will take part in the experiment
collaborators = {'one': c_models[0], 'two': c_models[1]}
- Check the training and validation data sizes for the original dataset and for each of the two collaborators
# Original MNIST dataset
print(f'Original training data size: {len(t_img)}')
print(f'Original validation data size: {len(v_img)}\n')

# 1st collaborator's data
print(f"Collaborator 1's training data size: {len(c_models[0].data_loader.X_train)}")
print(f"Collaborator 1's validation data size: {len(c_models[0].data_loader.X_valid)}\n")

# 2nd collaborator's data
print(f"Collaborator 2's training data size: {len(c_models[1].data_loader.X_train)}")
print(f"Collaborator 2's validation data size: {len(c_models[1].data_loader.X_valid)}\n")
Sample output:
Original training data size: 60000
Original validation data size: 10000
Collaborator 1's training data size: 30000
Collaborator 1's validation data size: 5000
Collaborator 2's training data size: 30000
Collaborator 2's validation data size: 5000
- Get the current values of the FL plan.
import json
# fx.get_plan() returns all the plan values that can be set
print(json.dumps(fx.get_plan(), indent=4, sort_keys=True))
- Run the experiment and return the trained FederatedModel
According to the output of step (18), the experiment will run for 10 rounds by default. We can change this by overriding the ‘aggregator.settings.rounds_to_train’ parameter.
final_model = fx.run_experiment(collaborators,
                                {'aggregator.settings.rounds_to_train': 5})
Sample condensed output: for each round of the experiment, the output reports the accuracy of each locally tuned model and of the aggregated global model.