Metrics
Continual Learning has many different metrics due to the nature of the task. Continuum provides a logger module that accumulates the predictions of the model. The logger can then compute various continual learning metrics from the saved predictions.
Disclaimer: We aim to provide some tools for metric evaluation. Nevertheless, there are many different ways of evaluating CL, and different metrics need different information: for example, forward transfer might need the test accuracy on future tasks or the accuracy of the same model trained from scratch. In our metric tools, we assume that test accuracy can be measured on future tasks and that task transitions are clear and available. We do not provide tools for measuring compute efficiency.
Pseudo-code
logger = Logger()
for task in scenario:
    for (x, y, t) in task:
        predictions = model(x)
        # accumulate predictions, targets and task ids
        logger.add([predictions, y, t])
    # tell the logger that the current task is over
    logger.end_task()
print(f"Metric result: {logger.my_pretty_metric}")
Here is a list of all implemented metrics:
Name | Code | ↑ / ↓ |
---|---|---|
Accuracy | accuracy | ↑ |
Accuracy A | accuracy_A | ↑ |
Backward Transfer | backward_transfer | ↑ |
Positive Backward Transfer | positive_backward_transfer | ↑ |
Remembering | remembering | ↑ |
Forward Transfer | forward_transfer | ↑ |
Forgetting | forgetting | ↓ |
Model Size Growth | model_size_growth | ↓ |
Accuracy:
Computes the accuracy of a given task.
Accuracy A:
Accuracy as defined in Diaz-Rodriguez and Lomonaco et al.
Note that it differs slightly from the normal accuracy: it gives each task's
accuracy equal weight, whereas the normal accuracy weights each task by its
share of all targets.
Example:
- Given task 1 with 50,000 images and task 2 with 1,000 images.
- With normal accuracy, task 1 has more importance in the average accuracy.
- With this accuracy A, task 1 has as much importance as task 2.
Reference:
* Don’t forget, there is more than forgetting: new metrics for Continual Learning
Diaz-Rodriguez and Lomonaco et al. NeurIPS Workshop 2018
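The weighting difference described in the example above can be illustrated with a short sketch in plain Python. It is a simplified illustration, not Continuum's implementation: the helper names `pooled_accuracy` and `task_balanced_accuracy` and the counts of correct predictions are made up for this example. The definition in the referenced paper averages over an accuracy matrix collected across training steps; the sketch only isolates the equal-weight aspect.

```python
def pooled_accuracy(correct_per_task, total_per_task):
    """Accuracy over all samples pooled together (large tasks dominate)."""
    return sum(correct_per_task) / sum(total_per_task)


def task_balanced_accuracy(correct_per_task, total_per_task):
    """Mean of per-task accuracies (every task has the same weight)."""
    per_task = [c / n for c, n in zip(correct_per_task, total_per_task)]
    return sum(per_task) / len(per_task)


# Hypothetical results: task 1 has 50,000 images, task 2 has 1,000 images.
totals = [50_000, 1_000]
corrects = [45_000, 500]  # 90% correct on task 1, 50% correct on task 2

pooled = pooled_accuracy(corrects, totals)
balanced = task_balanced_accuracy(corrects, totals)
print(f"pooled: {pooled:.3f}, task-balanced: {balanced:.3f}")
```

With these made-up numbers the pooled accuracy stays close to the accuracy of the large task, while the task-balanced value sits halfway between the two tasks.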
Backward Transfer:
Measures the influence that learning a task has on the performance on previous tasks.
Reference:
* Gradient Episodic Memory for Continual Learning
Lopez-Paz & Ranzato, NeurIPS 2017
Note: To measure backward transfer, the logger has to contain the accuracy on past tasks at task t.
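As a rough illustration of the definition in the referenced paper, backward transfer can be computed from an accuracy matrix. The sketch below assumes a matrix `R` where `R[i][j]` is the test accuracy on task `j` after training up to task `i` (0-indexed); the function name and the numbers are hypothetical, and this is not necessarily how the logger computes it internally.

```python
def backward_transfer(R):
    """BWT from an accuracy matrix R, where R[i][j] is the test accuracy
    on task j after training up to task i (0-indexed)."""
    T = len(R)
    if T < 2:
        raise ValueError("backward transfer needs at least two tasks")
    # Compare the final accuracy on each earlier task with the accuracy
    # obtained right after that task was learned.
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


# Hypothetical accuracy matrix for 3 tasks.
R = [
    [0.90, 0.10, 0.10],
    [0.80, 0.85, 0.15],
    [0.70, 0.75, 0.95],
]
bwt = backward_transfer(R)
print(f"BWT: {bwt:.3f}")
```

A negative value means that learning later tasks degraded performance on earlier ones.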
Positive Backward Transfer:
Computes the positive gain of backward transfer.
Reference:
* Don’t forget, there is more than forgetting: new metrics for Continual Learning
Diaz-Rodriguez and Lomonaco et al. NeurIPS Workshop 2018
Remembering:
Computes the forgetting part of Backward transfer.
Reference:
* Don’t forget, there is more than forgetting: new metrics for Continual Learning
Diaz-Rodriguez and Lomonaco et al. NeurIPS Workshop 2018
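Positive Backward Transfer and Remembering split a backward transfer value by its sign. The sketch below follows the definitions as we understand them from the referenced paper (positive part, and one minus the magnitude of the negative part); the function names are chosen for this example and the input values are hypothetical.

```python
def positive_backward_transfer(bwt):
    """Keep only the improvement on past tasks (0 if there is none)."""
    return max(bwt, 0.0)


def remembering(bwt):
    """1 means nothing was forgotten; lower values mean more forgetting."""
    return 1.0 - abs(min(bwt, 0.0))


# Hypothetical backward transfer values: one negative, one positive.
for bwt in (-0.15, 0.05):
    print(
        f"BWT={bwt:+.2f} -> "
        f"BWT+={positive_backward_transfer(bwt):.2f}, "
        f"REM={remembering(bwt):.2f}"
    )
```

Both quantities are "higher is better", which matches the arrows in the table of implemented metrics.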
Forward Transfer:
Measures the influence that learning a task has on the performance of future tasks.
Reference:
* Gradient Episodic Memory for Continual Learning
Lopez-Paz & Ranzato, NeurIPS 2017
Note: To measure forward transfer, the logger has to contain the accuracy on future tasks at task t.
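To make the need for accuracy on future tasks concrete, here is a minimal sketch of forward transfer following the definition in the referenced paper. It assumes an accuracy matrix `R` where `R[i][j]` is the test accuracy on task `j` after training up to task `i` (0-indexed), and a baseline list `b` with the accuracy of a randomly initialized model on each task. The function name, `b`, and all numbers are assumptions made for illustration.

```python
def forward_transfer(R, b):
    """FWT from an accuracy matrix R and random-init baselines b.

    R[i][j]: test accuracy on task j after training up to task i.
    b[j]:    test accuracy on task j of a randomly initialized model.
    """
    T = len(R)
    if T < 2:
        raise ValueError("forward transfer needs at least two tasks")
    # Accuracy on task j just before it is learned, minus the baseline.
    return sum(R[j - 1][j] - b[j] for j in range(1, T)) / (T - 1)


# Hypothetical accuracy matrix and baselines for 3 tasks.
R = [
    [0.90, 0.10, 0.10],
    [0.80, 0.85, 0.15],
    [0.70, 0.75, 0.95],
]
b = [0.10, 0.10, 0.10]
fwt = forward_transfer(R, b)
print(f"FWT: {fwt:.3f}")
```

The entries `R[j - 1][j]` lie above the diagonal, which is why the test set of future tasks must be evaluated at each step.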
Forgetting:
Measures the average forgetting.
Reference:
* Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence
Chaudhry et al. ECCV 2018
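The average forgetting from the referenced paper can be sketched as follows: for each earlier task, take the best accuracy reached at any previous step and subtract the current accuracy, then average over earlier tasks. The matrix layout (`R[i][j]` is the test accuracy on task `j` after training up to task `i`, 0-indexed), the function name, and the numbers are assumptions for this example rather than the logger's internals.

```python
def average_forgetting(R):
    """Average forgetting after the last task in accuracy matrix R."""
    k = len(R)
    if k < 2:
        raise ValueError("forgetting needs at least two tasks")
    total = 0.0
    for j in range(k - 1):
        # Best accuracy on task j at any step before the last one.
        best_past = max(R[step][j] for step in range(k - 1))
        total += best_past - R[k - 1][j]
    return total / (k - 1)


# Hypothetical accuracy matrix for 3 tasks.
R = [
    [0.90, 0.10, 0.10],
    [0.80, 0.85, 0.15],
    [0.70, 0.75, 0.95],
]
forgetting = average_forgetting(R)
print(f"Forgetting: {forgetting:.3f}")
```

Lower values are better, in line with the ↓ arrow in the table of implemented metrics.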
Model Size Growth:
Evaluates the evolution of the model size.
Detailed Example
from torch.utils.data import DataLoader

from continuum import ClassIncremental
from continuum.datasets import MNIST
from continuum.metrics import Logger

# DATA_PATH must be defined beforehand: the folder where MNIST is stored or downloaded.
train_scenario = ClassIncremental(
    MNIST(data_path=DATA_PATH, download=True, train=True),
    increment=2
)
test_scenario = ClassIncremental(
    MNIST(data_path=DATA_PATH, download=True, train=False),
    increment=2
)

# model = ...

# Evaluate on the test sets of all tasks, including future ones.
test_loader = DataLoader(test_scenario[:])
logger = Logger(list_subsets=['train', 'test'])

for task_id, train_taskset in enumerate(train_scenario):
    train_loader = DataLoader(train_taskset)

    for x, y, t in train_loader:
        predictions = y  # placeholder for model(x)

        logger.add([predictions, y, None], subset="train")
        print(f"Online accuracy: {logger.online_accuracy}")

    for x_test, y_test, t_test in test_loader:
        preds_test = y_test  # placeholder for model(x_test)

        logger.add([preds_test, y_test, t_test], subset="test")

    print(f"Task: {task_id}, acc: {logger.accuracy}, avg acc: {logger.average_incremental_accuracy}")

    if task_id > 0:
        print(f"BWT: {logger.backward_transfer}, FWT: {logger.forward_transfer}")

    logger.end_task()
Advanced Use of logger
The logger is designed to save any type of tensor under a corresponding keyword. For example, you may want to save a latent vector at each epoch.
from torch.utils.data import DataLoader
from continuum.metrics import Logger

model = ...  # initialize your model here
list_keywords = ["latent_vector"]
logger = Logger(list_keywords=list_keywords, list_subsets=['train', 'test'])

# `task_scenario` and `epochs` are assumed to be defined beforehand.
for taskset in task_scenario:
    task_loader = DataLoader(taskset)
    for epoch in range(epochs):
        for x, y, t in task_loader:
            # Do your model training here, with losses and optimizer...
            latent_vector = model.get_latent_vector_fancy_method_you_designed()
            logger.add(latent_vector, keyword='latent_vector', subset="train")
        logger.end_epoch()
    logger.end_task()
If you want to log results to compute metrics AND log your latent vector, you can declare and use your logger as follows:
# Logger declaration with several keywords
logger = Logger(list_keywords=["performance", "latent_vector"], list_subsets=['train', 'test'])
# [...]
# Log test results for metrics (predictions, targets, task ids)
logger.add([predictions, y, t], keyword='performance', subset="test")
# [...]
# Log the latent vector while testing
logger.add(latent_vector, keyword='latent_vector', subset="test")
At the end of training, or at any other point, you can retrieve all the logged data.
logger = Logger(list_keywords=["performance", "latent_vector"], list_subsets=['train', 'test'])
# [... a long training and logging adventure ...]
logs_latent = logger.get_logs(keyword='latent_vector', subset='test')
# You can explore the logs as follows:
for task_id in range(len(logs_latent)):
    for epoch_id in range(len(logs_latent[task_id])):
        # The list of all latent vectors saved at task_id and epoch_id, in chronological order.
        list_of_latent_vector_logged = logs_latent[task_id][epoch_id]
We hope it might be useful for you :)