Metrics

Continual learning uses many different metrics because of the nature of the task. Continuum provides a logger module that accumulates the model's predictions. The logger can then compute various continual learning metrics from the saved predictions.

Disclaimer: We aim to provide some tools for metric evaluation. Nevertheless, there are many different ways of evaluating CL, and different metrics need different information: for example, forward transfer might need the test accuracy on future tasks or the accuracy of the same model trained from scratch. In our metric tools, we assume that test accuracy can be measured on future tasks and that task transitions are clear and available. We do not provide tools for measuring compute efficiency.

Pseudo-code

logger = Logger()
for task in scenario:
    for (x, y, t) in task:
        predictions = model(x, y, t)

        logger.add([predictions, y, t])
    logger.end_task()
print(f"Metric result: {logger.my_pretty_metric}")
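
To make the accumulate-then-compute pattern concrete, here is a minimal sketch of a toy logger. `ToyLogger` is a hypothetical stand-in written for illustration only; it is not Continuum's `Logger` and only mimics the `add` / `end_task` flow of the pseudo-code above.

```python
class ToyLogger:
    """Hypothetical stand-in for illustration; not Continuum's Logger."""

    def __init__(self):
        self.finished_tasks = []  # one list of (prediction, target) pairs per finished task
        self.current = []         # pairs logged since the last end_task()

    def add(self, predictions, targets):
        # Accumulate each prediction next to its target.
        self.current.extend(zip(predictions, targets))

    def end_task(self):
        # Close the current task and start a fresh buffer.
        self.finished_tasks.append(self.current)
        self.current = []

    @property
    def accuracy(self):
        # Accuracy over everything logged for the current task so far.
        if not self.current:
            return 0.0
        correct = sum(int(p == y) for p, y in self.current)
        return correct / len(self.current)


toy_logger = ToyLogger()
toy_logger.add(predictions=[0, 1, 1, 0], targets=[0, 1, 0, 0])
toy_accuracy = toy_logger.accuracy  # 3 correct predictions out of 4
toy_logger.end_task()
```

The design point is that metrics are derived lazily from stored predictions, so new metrics can be computed without re-running the model.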

Here is a list of all implemented metrics:

Name                          Code
----------------------------  ----------------------------
Accuracy                      accuracy
Accuracy A                    accuracy_A
Backward Transfer             backward_transfer
Positive Backward Transfer    positive_backward_transfer
Remembering                   remembering
Forward Transfer              forward_transfer
Forgetting                    forgetting
Model Size Growth             model_size_growth

Accuracy:

Computes the accuracy of a given task.

Accuracy A:

Accuracy as defined in Diaz-Rodriguez and Lomonaco.

Note that it differs slightly from the normal accuracy: it gives each task's
accuracy equal weight, while the normal accuracy is computed over all targets,
so tasks with more samples count for more.

Example:
- Given task 1 with 50,000 images and task 2 with 1,000 images.
- With normal accuracy, task 1 has more importance in the average accuracy.
- With this accuracy A, task 1 has as much importance as task 2.

Reference:
* Don’t forget, there is more than forgetting: new metrics for Continual Learning
  Diaz-Rodriguez, Lomonaco et al. NeurIPS Workshop 2018
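
The 50,000 / 1,000 image example above can be sketched numerically. This is only an illustration of the equal-weight idea, not Continuum's implementation of `accuracy_A`; the function names and the per-task accuracies (90% and 50%) are made up for the example.

```python
def sample_weighted_accuracy(correct_per_task, total_per_task):
    # "Normal" accuracy: every sample counts once, so large tasks dominate.
    return sum(correct_per_task) / sum(total_per_task)


def task_balanced_accuracy(correct_per_task, total_per_task):
    # Equal-weight variant: average the per-task accuracies.
    per_task = [c / n for c, n in zip(correct_per_task, total_per_task)]
    return sum(per_task) / len(per_task)


# Made-up results: task 1 has 50,000 images (90% correct),
# task 2 has 1,000 images (50% correct).
correct = [45_000, 500]
total = [50_000, 1_000]

weighted = sample_weighted_accuracy(correct, total)  # dominated by task 1
balanced = task_balanced_accuracy(correct, total)    # both tasks count equally
```

With these numbers the sample-weighted value stays close to task 1's accuracy, while the balanced value sits halfway between the two tasks.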

Backward Transfer:

Measures the influence that learning a task has on the performance on previous tasks.

Reference:
* Gradient Episodic Memory for Continual Learning
  Lopez-Paz & Ranzato, NeurIPS 2017

Note: To measure backward transfer, the logger has to contain the accuracy on past tasks at task t.
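
As a rough illustration, backward transfer in the sense of Lopez-Paz & Ranzato can be computed from a matrix of accuracies. The sketch below assumes `acc[i][j]` holds the test accuracy on task j after training on task i; the function name and the numbers are invented for the example, and this is not necessarily how Continuum computes `backward_transfer` internally.

```python
def backward_transfer(acc):
    """Average of (final accuracy on task j) - (accuracy on task j right after learning it)."""
    n_tasks = len(acc)
    if n_tasks < 2:
        raise ValueError("backward transfer needs at least two tasks")
    final_row = acc[-1]
    diffs = [final_row[j] - acc[j][j] for j in range(n_tasks - 1)]
    return sum(diffs) / (n_tasks - 1)


# Made-up accuracies: row i = after training on task i, column j = evaluated on task j.
acc_matrix = [
    [0.90, 0.10, 0.10],
    [0.80, 0.85, 0.15],
    [0.70, 0.75, 0.80],
]
bwt = backward_transfer(acc_matrix)  # negative here: earlier tasks got worse
```

A negative value indicates forgetting on earlier tasks; a positive value means later training improved them.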

Positive Backward Transfer:

Computes the positive gain of backward transfer.

Reference:
* Don’t forget, there is more than forgetting: new metrics for Continual Learning
  Diaz-Rodriguez, Lomonaco et al. NeurIPS Workshop 2018

Remembering:

Computes the forgetting part of Backward transfer.

Reference:
* Don’t forget, there is more than forgetting: new metrics for Continual Learning
  Diaz-Rodriguez, Lomonaco et al. NeurIPS Workshop 2018
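
Positive backward transfer and remembering split a backward transfer value into its gain and its loss. The sketch below follows the definitions in the referenced paper, BWT+ = max(BWT, 0) and REM = 1 - |min(BWT, 0)|; the function names are chosen for illustration and the input values are made up.

```python
def positive_backward_transfer(bwt):
    # Keep only the improvement on past tasks; forgetting is clipped to 0.
    return max(bwt, 0.0)


def remembering(bwt):
    # 1.0 means nothing was forgotten; lower values mean more forgetting.
    return 1.0 - abs(min(bwt, 0.0))


# A model that forgets (negative backward transfer, made-up value).
rem_forgetful = remembering(-0.15)
pos_forgetful = positive_backward_transfer(-0.15)

# A model that improves on past tasks (positive backward transfer, made-up value).
rem_improving = remembering(0.05)
pos_improving = positive_backward_transfer(0.05)
```

For any single backward transfer value, at most one of the two metrics departs from its neutral value (0 for positive backward transfer, 1 for remembering).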

Forward Transfer:

Measures the influence that learning a task has on the performance of future tasks.

Reference:
* Gradient Episodic Memory for Continual Learning
  Lopez-Paz & Ranzato, NeurIPS 2017

Note: To measure forward transfer, the logger has to contain the accuracy on future tasks at task t.
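
One way to picture forward transfer is the definition from Lopez-Paz & Ranzato, which compares the accuracy on a task just before learning it with a baseline (in the paper, a randomly initialized model). The sketch assumes `acc[i][j]` is the test accuracy on task j after training on task i and `baseline[j]` is the baseline accuracy on task j. As noted in the disclaimer, other variants need different information, so this is an assumption-laden illustration rather than Continuum's exact computation.

```python
def forward_transfer(acc, baseline):
    """Average of (accuracy on task j just before learning it) - (baseline accuracy on task j)."""
    n_tasks = len(acc)
    if n_tasks < 2:
        raise ValueError("forward transfer needs at least two tasks")
    diffs = [acc[j - 1][j] - baseline[j] for j in range(1, n_tasks)]
    return sum(diffs) / (n_tasks - 1)


# Made-up accuracies: row i = after training on task i, column j = evaluated on task j.
acc_matrix = [
    [0.90, 0.10, 0.10],
    [0.80, 0.85, 0.15],
    [0.70, 0.75, 0.80],
]
# Made-up baseline, e.g. chance-level accuracy on each task.
baseline_acc = [0.10, 0.10, 0.10]

fwt = forward_transfer(acc_matrix, baseline_acc)
```

Only the entries above the diagonal are used, which is why test accuracy on future tasks must be available at each step.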

Forgetting:

Measures the average forgetting.

Reference:
* Riemannian Walk for Incremental Learning: Understanding Forgetting and Intransigence
  Chaudhry et al. ECCV 2018
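
A minimal sketch of average forgetting as defined by Chaudhry et al.: for each earlier task, take the best accuracy reached at any previous step and subtract the current accuracy, then average over earlier tasks. It assumes `acc[i][j]` is the test accuracy on task j after training on task i; the function name and the numbers are made up, and Continuum's own `forgetting` may differ in details.

```python
def average_forgetting(acc):
    """Mean over earlier tasks of (best past accuracy) - (accuracy after the last task)."""
    n_tasks = len(acc)
    if n_tasks < 2:
        raise ValueError("forgetting needs at least two tasks")
    last = n_tasks - 1
    total = 0.0
    for j in range(last):
        # Best accuracy on task j over all steps before the last one.
        best_past = max(acc[step][j] for step in range(last))
        total += best_past - acc[last][j]
    return total / last


# Made-up accuracies: row i = after training on task i, column j = evaluated on task j.
acc_matrix = [
    [0.90, 0.10, 0.10],
    [0.80, 0.85, 0.15],
    [0.70, 0.75, 0.80],
]
avg_forgetting = average_forgetting(acc_matrix)
```

Unlike backward transfer, this measure compares against the best past accuracy rather than the accuracy right after learning the task.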

Model Size Growth:

Evaluates the evolution of the model size.

Detailed Example

from torch.utils.data import DataLoader

from continuum import ClassIncremental
from continuum.datasets import MNIST
from continuum.metrics import Logger

# Placeholder: directory where MNIST is stored or will be downloaded.
DATA_PATH = "./data"

train_scenario = ClassIncremental(
    MNIST(data_path=DATA_PATH, download=True, train=True),
    increment=2
)
test_scenario = ClassIncremental(
    MNIST(data_path=DATA_PATH, download=True, train=False),
    increment=2
)

# model = ...

test_loader = DataLoader(test_scenario[:])
logger = Logger(list_subsets=['train', 'test'])

for task_id, train_taskset in enumerate(train_scenario):
    train_loader = DataLoader(train_taskset)

    for x, y, t in train_loader:
        predictions = y  # placeholder: replace with model(x)

        logger.add([predictions, y, None], subset="train")
        print(f"Online accuracy: {logger.online_accuracy}")

    for x_test, y_test, t_test in test_loader:
        preds_test = y_test  # placeholder: replace with model(x_test)

        logger.add([preds_test, y_test, t_test], subset="test")

    print(f"Task: {task_id}, acc: {logger.accuracy}, avg acc: {logger.average_incremental_accuracy}")
    if task_id > 0:
        print(f"BWT: {logger.backward_transfer}, FWT: {logger.forward_transfer}")

    logger.end_task()

Advanced Use of logger

The logger is designed to save any type of tensor under a corresponding keyword. For example, you may want to save a latent vector at each epoch.

from continuum.metrics import Logger

model = ...  # initialize your model here

list_keywords = ["latent_vector"]

logger = Logger(list_keywords=list_keywords, list_subsets=['train', 'test'])

# task_scenario, task_loader and epochs are placeholders for your own setup.
for task in task_scenario:
    for epoch in range(epochs):
        for x, y, t in task_loader:
            # Train your model here (losses, optimizer, ...).
            ...
        latent_vector = model.get_latent_vector_fancy_method_you_designed()
        logger.add(latent_vector, keyword='latent_vector', subset="train")
        logger.end_epoch()

    logger.end_task()

If you want to log results to compute metrics AND log your latent vector, you can declare and use your logger as follows:

# Logger declaration with several keywords
logger = Logger(list_keywords=["performance", "latent_vector"], list_subsets=['train', 'test'])

# [...]
# log test results for metrics
logger.add([predictions, y, t], keyword='performance', subset="test")

# [...]
# log latent vector while testing
logger.add(latent_vector, keyword='latent_vector', subset="test")

At the end of training, or at any point you like, you can retrieve all the logged data.

logger = Logger(list_keywords=["performance", "latent_vector"], list_subsets=['train', 'test'])

# [... a long training and logging adventure ... ]

logs_latent = logger.get_logs(keyword='latent_vector', subset='test')

# you can explore the logs as follows
for task_id in range(len(logs_latent)):
    for epoch_id in range(len(logs_latent[task_id])):
        # the list of all latent vectors saved for this task_id and epoch_id, in chronological order.
        list_of_latent_vector_logged = logs_latent[task_id][epoch_id]
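
The nested indexing above suggests that the logs behave like a list of tasks, each holding a list of epochs, each holding the entries logged during that epoch. Under that assumption, the sketch below walks such a structure with `enumerate`. `fake_logs` is a hand-written stand-in with strings in place of tensors; it is not real output of `get_logs`.

```python
# Stand-in for logs indexed as [task_id][epoch_id]: 2 tasks, 2 epochs each.
fake_logs = [
    [["t0-e0-a", "t0-e0-b"], ["t0-e1-a"]],
    [["t1-e0-a"], ["t1-e1-a", "t1-e1-b"]],
]

# Flatten into (task_id, epoch_id, entry) triples, keeping chronological order.
flat_entries = []
for task_id, task_logs in enumerate(fake_logs):
    for epoch_id, epoch_logs in enumerate(task_logs):
        for entry in epoch_logs:
            flat_entries.append((task_id, epoch_id, entry))
```

A flat list of triples like this is convenient for filtering by task or epoch, or for building a table of logged values.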

We hope it might be useful for you :)