Open Federated Learning (OpenFL) is a Python 3 library for running machine learning experiments using a federated learning approach. The framework was developed by Intel Labs and the Intel Internet of Things Group.
If you are unfamiliar with the term ‘federated learning’, read ‘What is federated learning?’, ‘How does it work?’ and ‘Advantages of federated learning approach’ sections of this article before proceeding.
Two major components of OpenFL
- Collaborator (can use any deep learning framework such as PyTorch or TensorFlow)
- Aggregator (framework agnostic)
Before going into the open-source project’s details, let us briefly overview these and some other underlying terminologies.
Basic terminologies
- Collaborators: The clients in the federated learning system that can access the local training, validation and test datasets. A local dataset should never leave its collaborator.
- Parameter Server: It sends a global model to the collaborators.
- Aggregator: It receives locally-tuned models from the collaborators. It then combines those models (typically using federated averaging algorithms) into a new global model.
- Federation round: It is the interval where an aggregation is performed. Within a single round, a collaborator may carry out local training on the model for multiple epochs.
- Federated Learning (FL) plan: An FL plan defines the following aspects:
- Address of the aggregator
- Global model to be sent to the collaborators
- Federation parameters like encryption to be used for network connections and the number of federation rounds
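The federated-averaging step described under 'Aggregator' above can be pictured in plain NumPy: the aggregator takes a weighted average of the collaborators' parameters, weighted by local dataset sizes. The function below is only an illustrative sketch of the idea, not OpenFL's internal implementation.

```python
import numpy as np

def federated_average(weight_sets, sample_counts):
    """Weighted average of model parameters.

    weight_sets: one list of parameter arrays per collaborator
    sample_counts: number of local training samples per collaborator
    """
    total = sum(sample_counts)
    averaged = []
    # zip(*...) walks the collaborators' parameter lists layer by layer
    for layer_weights in zip(*weight_sets):
        averaged.append(sum(w * (n / total)
                            for w, n in zip(layer_weights, sample_counts)))
    return averaged

# Two collaborators with one-layer "models" and unequal data sizes
a = [np.array([1.0, 1.0])]
b = [np.array([3.0, 3.0])]
global_weights = federated_average([a, b], sample_counts=[1, 3])
print(global_weights[0])  # [2.5 2.5] -- pulled towards the larger dataset
```

Because collaborator b holds three times as much data, the global model lands closer to b's parameters than to a's.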
FL plan – the base of an OpenFL experiment
The overall design of the OpenFL library centres around the Federated Learning (FL) Plan. A YAML file defines the collaborators, aggregator, connections, models, data, and any other parameters that describe how the model training process will evolve.
Practical implementation
Requirement: Python 3.6 or higher version
Here’s a demonstration of OpenFL implemented in Google Colab with Python 3.7.10, Torch 1.7.1, Torchvision 0.8.2 and OpenFL 1.0. A step-wise explanation of the code follows:
- Installation of OpenFL package
!pip install openfl
- Install the dependencies (torch, torchvision) and the mnist helper package for the MNIST dataset
!pip install torch torchvision mnist
- Import required libraries and classes
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import openfl.native as fx
from openfl.federated import FederatedModel, FederatedDataSet
- Set up the default workspace
fx.init('torch_cnn_mnist')
The available workspace templates can be found here. We have used the ‘torch_cnn_mnist’ workspace, which comprises a PyTorch CNN model, downloads the MNIST dataset and trains it in a federation.
The structure of the workspace directory can be seen in the output as follows:
- Define a function to form a one-hot representation of output labels
def one_hot(labels, classes):
    # np.eye() returns a matrix with 1's on the diagonal and 0's elsewhere;
    # indexing its rows by the labels yields one-hot vectors
    return np.eye(classes)[labels]
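To see what this does, each label simply selects the corresponding row of an identity matrix. A quick check with a hypothetical label array and 4 classes:

```python
import numpy as np

def one_hot(labels, classes):
    return np.eye(classes)[labels]

print(one_hot(np.array([0, 2, 1]), 4))
# Each label becomes a row of the 4x4 identity matrix:
# [[1. 0. 0. 0.]
#  [0. 0. 1. 0.]
#  [0. 1. 0. 0.]]
```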
- Carry out image transformations
torchvision.transforms provides several common image transformations. Using torchvision.transforms.Compose(), many transforms (passed as a list) can be chained together.
We are applying two transformations:
(i) transforms.ToTensor() converts a PIL image or numpy.ndarray to a tensor.
(ii) transforms.Normalize() normalizes the image with the mean and standard deviation provided as parameters. Of the two sequences passed to transforms.Normalize(), the first gives the per-channel means and the second the per-channel standard deviations.
trf = transforms.Compose([transforms.ToTensor(),
                          # MNIST images have a single channel, so one mean
                          # and one standard deviation are needed
                          transforms.Normalize((0.5,), (0.5,))])
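The arithmetic behind Normalize is simple: each pixel becomes (x − mean) / std. With mean and std both 0.5, pixel values in [0, 1] are mapped onto [−1, 1], which can be verified without torchvision at all:

```python
# Normalize computes (x - mean) / std per channel.
def normalize(x, mean=0.5, std=0.5):
    return (x - mean) / std

print(normalize(0.0))  # -1.0 (black pixel)
print(normalize(0.5))  #  0.0 (mid grey)
print(normalize(1.0))  #  1.0 (white pixel)
```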
- Prepare the training data
train = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=trf)
The training images will be downloaded into the ‘./data’ directory, and the transformation defined by ‘trf’ in step (6) will be applied to them.
- Separate out training images and their output labels
t_img, t_label = train.train_data, np.array(train.train_labels)
# numpy.expand_dims() adds a channel dimension to the t_img array
# torch.from_numpy() forms a tensor from a numpy.ndarray
t_img = torch.from_numpy(np.expand_dims(t_img, axis=1)).float()
# Form the one-hot representation of the labels by calling the one_hot()
# function defined in step (5); 10 is the number of output classes
t_label = one_hot(t_label, 10)
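The expand_dims call inserts a channel axis so the images match the (N, C, H, W) layout that PyTorch convolutions expect. A minimal shape check, using a small dummy array in place of the real MNIST tensors:

```python
import numpy as np

# A small dummy batch standing in for the MNIST image stack
imgs = np.zeros((4, 28, 28))
imgs = np.expand_dims(imgs, axis=1)  # insert a channel axis at position 1
print(imgs.shape)  # (4, 1, 28, 28) -- the (N, C, H, W) layout
```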
- Prepare the data for validation.
valid = torchvision.datasets.MNIST(root='./data',train=False, download=True,transform=trf)
- Separate the validation images and their labels, as done for the training set in step (8)
v_img, v_label = valid.test_data, np.array(valid.test_labels)
v_img = torch.from_numpy(np.expand_dims(v_img, axis=1)).float()
v_label = one_hot(v_label, 10)
- The FederatedDataSet class of the openfl.federated module wraps in-memory NumPy datasets and includes a setup function that splits the data into N mutually exclusive chunks, one for each collaborator participating in the experiment.
feature_shape = t_img.shape[1]
classes = 10
# Prepare the data for the federation
data = FederatedDataSet(t_img, t_label, v_img, v_label,
                        batch_size=32, num_classes=classes)
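The "mutually exclusive chunks" behaviour can be pictured with np.array_split: each collaborator receives a disjoint, roughly equal slice of the feature and label arrays. This is only an illustrative sketch of the idea, not FederatedDataSet's actual code.

```python
import numpy as np

def shard(X, y, num_collaborators):
    """Split features and labels into disjoint, roughly equal shards."""
    return list(zip(np.array_split(X, num_collaborators),
                    np.array_split(y, num_collaborators)))

# Tiny hypothetical dataset of 10 samples split across 2 collaborators
X = np.arange(10).reshape(10, 1)
y = np.arange(10)
shards = shard(X, y, 2)
print(len(shards))         # 2 -- one shard per collaborator
print(shards[0][0].shape)  # (5, 1) -- half the samples each
```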
- Define the neural network layers
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, 3)      # 1st convolution
        self.pool = nn.MaxPool2d(2, 2)        # pooling layer
        self.conv2 = nn.Conv2d(16, 32, 3)     # 2nd convolution
        self.fc1 = nn.Linear(32 * 5 * 5, 32)  # 1st fully connected layer
        self.fc2 = nn.Linear(32, 84)          # 2nd fully connected layer
        self.fc3 = nn.Linear(84, 10)          # 3rd fully connected layer

    def forward(self, x):  # forward propagation
        x = self.pool(F.relu(self.conv1(x)))  # convolution + pooling
        x = self.pool(F.relu(self.conv2(x)))  # convolution + pooling
        x = x.view(x.size(0), -1)             # flatten
        x = F.relu(self.fc1(x))  # activation for 1st FC layer
        x = F.relu(self.fc2(x))  # activation for 2nd FC layer
        x = self.fc3(x)
        # torch.nn.functional.log_softmax() applies softmax followed by logarithm
        return F.log_softmax(x, dim=1)
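The 32 * 5 * 5 input size of fc1 follows from tracing the spatial dimensions: each 3×3 convolution (stride 1, no padding) shrinks a side by 2, and each 2×2 max-pool halves it (flooring). A quick sanity check of that arithmetic:

```python
def conv_out(n, kernel):  # stride 1, no padding
    return n - kernel + 1

def pool_out(n, kernel=2, stride=2):
    return (n - kernel) // stride + 1

n = 28                        # MNIST image side
n = pool_out(conv_out(n, 3))  # conv1 + pool: 28 -> 26 -> 13
n = pool_out(conv_out(n, 3))  # conv2 + pool: 13 -> 11 -> 5
print(32 * n * n)             # 800 = 32 * 5 * 5, the fc1 input size
```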
- Define optimizer
opt = lambda x: optim.Adam(x, lr=1e-4)  # 'lr' is the learning rate
- Define a function for binary cross entropy metric
def cross_entropy(output, target):
    return F.binary_cross_entropy_with_logits(input=output, target=target)
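binary_cross_entropy_with_logits fuses a sigmoid with binary cross-entropy in a numerically stable way. As a sketch of what it computes, its stable per-element formula can be reproduced in NumPy:

```python
import numpy as np

def bce_with_logits(logits, targets):
    # Stable form: max(x, 0) - x*t + log(1 + exp(-|x|))
    x, t = np.asarray(logits), np.asarray(targets)
    return np.mean(np.maximum(x, 0) - x * t + np.log1p(np.exp(-np.abs(x))))

# A logit of 0 means sigmoid(0) = 0.5, so the loss against target 1 is log 2
print(round(bce_with_logits([0.0], [1.0]), 4))  # 0.6931
```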
- Build the model using the FederatedModel class of the openfl.federated module. It wraps the network definition and its forward function, the lambda optimizer (so a fresh optimizer can be attached to each newly instantiated network), the loss function and the data loader.
fmodel = FederatedModel(build_model=Net, optimizer=opt,
                        loss_fn=cross_entropy, data_loader=data)
- Build collaborator models
c_models = fmodel.setup(num_collaborators=2)
# Define which collaborators will take part in the experiment
collaborators = {'one': c_models[0], 'two': c_models[1]}
- Check the training and validation data sizes for the original dataset and for each of the two collaborators
# Original MNIST dataset
print(f'Original training data size: {len(t_img)}')
print(f'Original validation data size: {len(v_img)}\n')

# 1st collaborator's data
print(f"Collaborator 1's training data size: {len(c_models[0].data_loader.X_train)}")
print(f"Collaborator 1's validation data size: {len(c_models[0].data_loader.X_valid)}\n")

# 2nd collaborator's data
print(f"Collaborator 2's training data size: {len(c_models[1].data_loader.X_train)}")
print(f"Collaborator 2's validation data size: {len(c_models[1].data_loader.X_valid)}\n")
Sample output:
Original training data size: 60000
Original validation data size: 10000
Collaborator 1's training data size: 30000
Collaborator 1's validation data size: 5000
Collaborator 2's training data size: 30000
Collaborator 2's validation data size: 5000
- Get the current values of the FL plan.
import json
# fx.get_plan() returns all the plan values that can be set
print(json.dumps(fx.get_plan(), indent=4, sort_keys=True))
- Run the experiment and return the trained FederatedModel
According to the output of step (18), the experiment will run for 10 rounds by default. We can change this by overriding the ‘aggregator.settings.rounds_to_train’ parameter.
final_model = fx.run_experiment(collaborators,
                                {'aggregator.settings.rounds_to_train': 5})
Sample condensed output: for each round of the experiment, the output reports the accuracy of each locally tuned model and of the aggregated global model.