Diabetic Retinopathy - Detecting Blindness

APTOS 2019 Blindness Detection

Diabetic Retinopathy is a disease that affects the retina of the eye, and millions of people around the world suffer from it.

Currently, diagnosis relies on fundus photography, a technique that involves photographing the rear of the eye.

Medical screening for diabetic retinopathy occurs around the world, but is more difficult for people living in rural areas.

Using machine learning and computer vision, we attempt to automate the process of diagnosis, which is currently performed manually by doctors.

On Kaggle (https://www.kaggle.com/c/aptos2019-blindness-detection/data and https://www.kaggle.com/c/diabetic-retinopathy-detection) we have access to a dataset of tens of thousands of real-world clinical images of both healthy patients and patients with the disease, labelled by trained clinicians.

Using this dataset, we'll be able to train a machine learning model to achieve a high level of accuracy when predicting occurrences of the disease in patients.

Results

We train our model on a combined dataset of approximately 40,000 images. We then perform inference against the public and private leaderboards on Kaggle for the APTOS 2019 competition. The public leaderboard is scored on approximately 30% of the test dataset and the private leaderboard on the remaining 70%. The test set contains approximately 13,000 images totalling over 20 GB.

The models are trained offline and then uploaded to a Kaggle private dataset linked to the kernel, which we use solely for inference.
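As a rough sketch of that kernel-side flow, using the helpers defined in the code section below (the dataset path and model name here are hypothetical):

# Hypothetical kernel-side inference: weights trained offline sit in a
# private dataset attached to the kernel. Path and model name are made up;
# data_bunch is built exactly as in the code section below.
model_path = '/kaggle/input/my-private-models/unf_b5_final'  # hypothetical
learner = get_cnn_learner(efficient_net('b5'), data_bunch, tofp16=False)
learner.load(model_path)  # fastai's Learner.load accepts an absolute path
preds, _ = learner.get_preds(DatasetType.Test)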

We also make sure to pre-process the test images with the same treatments that were applied to the training data, using a custom ItemList to transform the test set before running our predictions.
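The idea behind that custom ItemList, sketched minimally here (the resize body is a placeholder, not the actual treatment; imports come from the setup cell below), is to override ItemList.open so that every image, whether train, validation, or test, passes through the same pipeline:

# Minimal sketch of a pre-processing ItemList (illustrative only).
class CleanedImageList(ImageList):
    def __init__(self, *args, image_size:int=224, mode:int=0, **kwargs):
        super().__init__(*args, **kwargs)
        self.image_size, self.mode = image_size, mode
        self.copy_new += ['image_size', 'mode']  # carry params across splits

    def open(self, fn):
        img = cv2.cvtColor(cv2.imread(str(fn)), cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (self.image_size, self.image_size))  # placeholder treatment
        return Image(pil2tensor(img, np.float32).div_(255))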

Using an ensemble of B3 and B5 EfficientNets, we achieve a Quadratic Weighted Kappa score of 0.905775.

In comparison, the winning solution achieved 0.936129.
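The ensemble itself is straightforward prediction averaging; a sketch, where learner_b3 and learner_b5 are hypothetical names for the two trained learners:

# Average the regression outputs of the two models, then round and
# clip back to the 0-4 diagnosis grades for submission.
preds_b3, _ = learner_b3.get_preds(DatasetType.Test)
preds_b5, _ = learner_b5.get_preds(DatasetType.Test)
avg = (preds_b3 + preds_b5) / 2
diagnosis = avg.squeeze().round().clamp(0, 4).long()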

Contents

This notebook is organised as follows:

  1. Code is listed first, roughly sectioned into the following key parts. It is worth going through this code; as you read the experiment discussion in the next major section, you can refer back to the relevant parts.
    • Imports and Setup
    • Image processing
    • Metrics
    • Learner and Databunch
    • Predictions and Inference
    • Pipeline experimental methods
  2. Outline of experiment and results
    • Data exploration
    • Image processing baselines
    • Model and architecture baselines, deciding whether regression or classification is the better approach
    • Adding data and data augmentations
    • Increasing image size
    • Tuning other hyperparameters like dropout and weight decay
    • Progressive resizing
    • Increasing epochs and training times
    • Ensembling
  3. Appendix
    • Image pre-processing methods
    • Select experiments
    • References

Imports and Setup

The following cell contains all of the setup code needed for initialisation whenever restarting the kernel.

In [30]:
from fastai.callbacks import *
from fastai.vision import *
from fastai.metrics import error_rate

# Import Libraries here
import os
import json 
import shutil
import zipfile
import numpy as np
import pandas as pd
import PIL
import cv2

from PIL import ImageEnhance

import scipy as sp
from functools import partial
from sklearn import metrics
from sklearn.metrics import cohen_kappa_score
from sklearn.metrics import confusion_matrix

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms, utils
import torchvision.transforms.functional as TF

from torchvision.models import *

%reload_ext autoreload
%autoreload 2
%matplotlib inline
import pretrainedmodels
%load_ext jupyternotify
    

# set the random seed
np.random.seed(42)
    
    
import fastai; fastai.__version__
The jupyternotify extension is already loaded. To reload it, use:
  %reload_ext jupyternotify
Out[30]:
'1.0.57'

Datasource Selectors

In my experiments I've set up a few data sources: the 2019 dataset only, the 2015 dataset only, and the combined 2019+2015 datasets. The code below helps me switch between them so I can benchmark with various options.

In [31]:
# Downloaded from https://www.kaggle.com/benjaminwarner/resized-2015-2019-blindness-detection-images

def switch2019Only(sub_folder:str = ''):
    base_dir = '/hdd/data/blindness-detection/2015_and_2019/'

    !mkdir -p "{base_dir}"

    train_img_path = f'{base_dir}train/{sub_folder}'  # need to split this folder into train and val sets
    test_img_path = f'{base_dir}test/{sub_folder}' # images only, use to test

    df_train = pd.read_csv(base_dir + 'labels/trainLabels19.csv')
    df_train.head()
    
    return (train_img_path, base_dir, train_img_path, test_img_path, df_train)
In [32]:
# Downloaded from https://www.kaggle.com/benjaminwarner/resized-2015-2019-blindness-detection-images

def switch2015Only(sub_folder:str = ''):
    base_dir = '/hdd/data/blindness-detection/2015_and_2019/'

    !mkdir -p "{base_dir}"

    train_img_path = f'{base_dir}train/{sub_folder}'  # need to split this folder into train and val sets
    test_img_path = f'{base_dir}test/{sub_folder}' # images only, use to test

    df_train = pd.read_csv(base_dir + 'labels/trainLabels15.csv')
    df_train.columns = ['id_code', 'diagnosis']
    df_train.head()

    
    return (train_img_path, base_dir, train_img_path, test_img_path, df_train)
In [33]:
# Downloaded from https://www.kaggle.com/benjaminwarner/resized-2015-2019-blindness-detection-images

def switch2019And2015(sub_folder:str = ''):
    base_dir = '/hdd/data/blindness-detection/2015_and_2019/'

    !mkdir -p "{base_dir}"

    train_img_path = f'{base_dir}train/{sub_folder}'  # need to split this folder into train and val sets
    test_img_path = f'{base_dir}test/{sub_folder}' # images only, use to test

    df_train_15 = pd.read_csv(base_dir + 'labels/trainLabels15.csv')
    df_train_15.columns = ['id_code', 'diagnosis']
    df_train_15.head()

    df_train_19 = pd.read_csv(base_dir + 'labels/trainLabels19.csv')
    df_train_19.head()

    df_train = pd.concat([df_train_15, df_train_19])
    df_train=df_train.reset_index(drop=True)
    df_train.head()

    
    return (train_img_path, base_dir, train_img_path, test_img_path, df_train)
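Each selector returns the same five-element tuple, so swapping datasets is a one-line change downstream:

# Swap datasources without touching the rest of the pipeline; each
# selector returns (data_in_use, base_dir, train_img_path, test_img_path, df_train).
data_source = switch2019And2015()
data_in_use, base_dir, train_img_path, test_img_path, df_train = data_source
len(df_train)  # combined 2015 + 2019 labels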

    

Metrics

In [34]:
# ---------- Metrics ----------

# The competition is scored with the Quadratic Weighted Kappa metric,
# which measures agreement between predicted and actual grades.
def quadratic_kappa(y_hat, y):
    # round the regression output to the nearest grade before scoring
    return torch.tensor(cohen_kappa_score(torch.round(y_hat), y, weights='quadratic'),
                        device='cuda:0')
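# Quick sanity check on toy values: y_hat is rounded to the nearest
# grade first, so these agree perfectly and kappa = 1.0. The result
# tensor lands on cuda:0, matching the GPU training setup.
quadratic_kappa(torch.tensor([0.2, 1.7, 3.1, 4.0]),
                torch.tensor([0., 2., 3., 4.]))  # tensor(1., device='cuda:0')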
    

Learner and Data

In [68]:
from torch.utils.data.sampler import WeightedRandomSampler

class OverSamplingCallback(LearnerCallback):
    "Rebalance training batches by sampling images in inverse proportion to their class frequency."
    def __init__(self, learn:Learner, weights:torch.Tensor=None):
        super().__init__(learn)
        self.labels = self.learn.data.train_dl.dataset.y.items
        _, counts = np.unique(self.labels, return_counts=True)
        # default: weight each sample by the inverse frequency of its class
        self.weights = (weights if weights is not None else
                        torch.DoubleTensor((1/counts)[self.labels.astype(int)]))
        self.label_counts = np.bincount([self.learn.data.train_dl.dataset.y[i].data
                                         for i in range(len(self.learn.data.train_dl.dataset))])
        # draw enough samples to give every class as many as the largest class
        self.total_len_oversample = int(self.learn.data.c * np.max(self.label_counts))

    def on_train_begin(self, **kwargs):
        # swap the default batch sampler for a weighted random sampler
        self.learn.data.train_dl.dl.batch_sampler = BatchSampler(
            WeightedRandomSampler(self.weights, self.total_len_oversample),
            self.learn.data.train_dl.batch_size, False)
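# Toy illustration of the default per-sample weights (illustrative only):
# class counts here are [3, 1], so each sample gets the inverse frequency
# of its class, and the rare class is drawn about as often overall.
toy_labels = np.array([0, 0, 0, 1])
_, toy_counts = np.unique(toy_labels, return_counts=True)    # counts = [3, 1]
toy_weights = torch.DoubleTensor((1/toy_counts)[toy_labels])
print(toy_weights)  # tensor([0.3333, 0.3333, 0.3333, 1.0000], dtype=torch.float64)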
    

    
# ---------- Learner and Databunch ----------
def get_data_bunch_explore(data_source, image_size, bs=64, mode=0, use_xtra_tfms=False):
    # unpack the data source tuple
    data_in_use, base_dir, train_img_path, test_img_path, df_train = data_source
    print(f'Using data in: {train_img_path}') # print out which dataset is in use
    
    # let's start off with a small image size first 
    # and use progressive resizing to see how our initial model is performing
    sz = image_size 
    
    # 1. Set up the source with our custom pre-processing ItemList
    source = (CleanedImageList
                .from_df(df_train, train_img_path, suffix='.jpg', image_size=sz, mode=mode)
                .split_by_rand_pct(0.2, seed=42)
                .label_from_df(cols='diagnosis', label_cls=FloatList))
    
    # 2. Optionally add the heavy data augmentations
    if use_xtra_tfms:
        tfms = get_transforms(do_flip=True, 
                              flip_vert=True,
                              max_rotate=360,
                              max_zoom=False, 
                              max_lighting=0.1,
                              p_lighting=0.5,
                              xtra_tfms=zoom_crop(scale=(1.01, 1.45), do_rand=True))
        source = source.transform(tfms, size=sz)
    
    # 3. Build the databunch
    data_bunch = source.databunch(bs=bs).normalize(imagenet_stats)
    
    return data_bunch
    
def get_data_bunch(data_source, image_size, bs=64, use_xtra_tfms=False):
    
    # unpack the data source tuple
    data_in_use, base_dir, train_img_path, test_img_path, df_train = data_source
    print(f'Using data in: {train_img_path}') # print out which dataset is in use
    
    # let's start off with a small image size first 
    # and use progressive resizing to see how our initial model is performing
    sz = image_size 
    
    # 1. Set up the source
    source = (ImageList
                .from_df(df_train, train_img_path, suffix='.jpg')
                .split_by_rand_pct(0.2, seed=42)
                .label_from_df(cols='diagnosis', label_cls=FloatList))
    
    # 2. Optionally add the heavy data augmentations
    if use_xtra_tfms:
        tfms = get_transforms(do_flip=True, 
                              flip_vert=True,
                              max_rotate=360,
                              max_zoom=False, 
                              max_lighting=0.1,
                              p_lighting=0.5,
                              xtra_tfms=zoom_crop(scale=(1.01, 1.45), do_rand=True))
        source = source.transform(tfms, size=sz)
    
    # 3. Build the databunch
    data_bunch = source.databunch(bs=bs).normalize(imagenet_stats)
    
    # 4. Add the test set from the sample submission file.
    # Remember: for inference, we should apply the same image processing as we trained on!
    sample_df = pd.read_csv(base_dir + 'sample_submission.csv')
    data_bunch.add_test(ImageList.from_df(sample_df, base_dir, folder='test', suffix='.jpg'))

    return data_bunch

# "pretrained" is hardcoded to adapt to the PyTorch model function
from efficientnet_pytorch import EfficientNet
def efficient_net(b_class='b5'):
    return EfficientNet.from_pretrained(f'efficientnet-{b_class}', num_classes=1)


def get_cnn_learner(arch, data_bunch, tofp16=True, oversample=False):
    
    # 1. Choose callbacks: always plot training curves; optionally oversample
    callback_fns = [ShowGraph]
    if oversample:
        print('is oversampling')
        callback_fns = [partial(OverSamplingCallback), ShowGraph]

    # 2. Set up a new learner
    learner = Learner(data_bunch, arch, model_dir="models",
                      metrics=quadratic_kappa, callback_fns=callback_fns)
    
    # 3. Train in mixed precision by default
    if tofp16:
        learner = learner.to_fp16()

    return learner

Pipelining Helper methods

I use these general helper methods to run training. They encapsulate a lot of the benchmarking and training runs that I execute, and make it easy to pass hyperparameters through whilst abstracting away some of the CNN setup code. A usage sketch follows the code cell.

In [36]:
class Experiment():
    
    def __init__(self, name, data_source, arch, image_size, bs, wd,
                 use_xtra_tfms, oversample, pretrained_model_name=None):
        
        super().__init__()
        
        self.name = name
        self.data_source = data_source
        self.arch = arch
        self.image_size = image_size
        self.bs = bs
        self.wd = wd
        self.use_xtra_tfms = use_xtra_tfms
        self.oversample = oversample
        self.pretrained_model_name = pretrained_model_name
        
        self.data_in_use, self.base_dir, self.train_img_path, self.test_img_path, self.df_train = self.data_source
        
        # build the learner and databunch once
        self.learner, self.data_bunch = get_learner_and_databunch(
                                            self.arch, 
                                            self.data_source,
                                            image_size=self.image_size,
                                            bs=self.bs,
                                            use_xtra_tfms=self.use_xtra_tfms,
                                            oversample=self.oversample)
        
        # optionally warm-start from a previously saved model
        if self.pretrained_model_name:
            print(f'Loading pretrained model: {self.pretrained_model_name}')
            self.learner.load(self.base_dir + self.pretrained_model_name)
            self.learner.to_fp16()
            
    def find_lr(self):
        # find the inital lr for frozen training
        self.learner.lr_find(wd=self.wd)
        self.learner.recorder.plot(suggestion=True)
        
        
    def fit_frozen(self, epochs, lr):
        
        self.learner.fit_one_cycle(
            epochs, 
            lr, 
            wd=self.wd, 
            callbacks=[SaveModelCallback(self.learner, monitor='valid_loss', name=f'best_{self.name}')])
        
        %notify -m "fit_one_cycle finished"
        
        print(f'Saved model: {self.base_dir + self.name}')
        self.learner.save(self.base_dir + self.name)
        
    def unfreeze(self):
        self.learner.unfreeze()
        self.learner.lr_find()
        self.learner.recorder.plot()
        
    def fit_unfrozen(self, epochs, lr):
        
        self.learner.fit_one_cycle(
            epochs, 
            lr, 
            wd=self.wd, 
            callbacks=[SaveModelCallback(self.learner, monitor='valid_loss', name=f'best_unf_{self.name}')])
        
        %notify -m "unfrozen fit_one_cycle finished"


        self.learner.save(self.base_dir + 'unf_' + self.name)
        print(f'Saved model: {self.base_dir + "unf_" + self.name}')
        
    def load_frozen(self):
        self.learner.load(self.base_dir + self.name)
        self.learner.to_fp16()
              
    def load_best_frozen(self):
        self.learner.load(self.train_img_path + 'models/best_' + self.name)
        self.learner.to_fp16()
        print(f'Loaded best model {self.train_img_path + "models/best_" + self.name}')
        
    
        
    def get_kappa_score(self):
        get_kappa_score(self.learner)
        
    def show_batch(self):
        self.data_bunch.show_batch(4, figsize=(20,20))

# Helper to set up a learner and a databunch with baselined defaults,
# returning both as a tuple. The only required parameters are the
# architecture and the data source. For everything else
# you can pass in override values to test different hyperparameters.
def get_learner_and_databunch(
        arch, 
        data_source,
        image_size=128,
        bs=64,
        use_xtra_tfms=False,
        oversample=False):

    # data bunch
    data_bunch = get_data_bunch(
        data_source,
        image_size, 
        bs=bs, 
        use_xtra_tfms=use_xtra_tfms)

    # create a learner
    learner = get_cnn_learner(arch, data_bunch, oversample=oversample) 
    
        
    return (learner, data_bunch)


def get_kappa_score(learner):
    preds, y = learner.get_preds()
    score = quadratic_kappa(preds, y)
    print('Kappa score is {0}'.format(score))

    return score
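Putting the pieces together, a typical run through the wrapper looks like this (the experiment name and hyperparameter values here are illustrative, not the tuned settings):

# Illustrative end-to-end run with the Experiment wrapper.
exp = Experiment(
    name='b5_288_example',
    data_source=switch2019And2015(),
    arch=efficient_net('b5'),
    image_size=288, bs=32, wd=1e-2,
    use_xtra_tfms=True, oversample=True)
exp.find_lr()                        # pick a frozen-stage learning rate
exp.fit_frozen(epochs=10, lr=1e-3)
exp.unfreeze()                       # re-run lr_find for the unfrozen net
exp.fit_unfrozen(epochs=10, lr=slice(1e-5, 1e-4))
exp.get_kappa_score()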

Summary

The following is an outline of how I approached the problem, roughly in the order I tackled the project. At each stage the aim was to find the best settings that would allow me to move forward with each experiment, and I spent a lot of time getting to know the data, baselining, and trying to uncover bugs in the training process.

Roughly, the order of operations for this project is outlined below:

  1. Exploratory Data analysis
  2. Image processing baselines
  3. Model and architecture baselines, deciding whether regression or classification is the better approach
  4. Adding data and data augmentations
  5. Increasing image size
  6. Tuning other hyperparameters like dropout and weight decay
  7. Progressive resizing
  8. Increasing epochs and training times
  9. Ensembling

Common training settings

  • Transfer learning with pre-trained weights
  • The fit_one_cycle policy to vary learning rates for best results
  • Adam as our optimiser
  • Treating the task as a regression problem with MSELoss as our cost function (see the sketch after this list)
  • Oversampling the dataset
  • The expanded dataset from 2015 and 2019
  • Heavy data augmentations: flipping, rotation, zoom, crops, and lighting
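The regression framing is worth spelling out: labelling with FloatList makes fastai treat the targets as continuous values with an MSE-style loss, and at inference we map the continuous outputs back onto the 0-4 diagnosis grades. A minimal sketch, reusing the variables from the code section above:

# Minimal sketch of the regression framing. FloatList labels make
# fastai infer an MSE-style loss; predictions are rounded and clipped
# back to the 0-4 grades.
src = (ImageList
        .from_df(df_train, train_img_path, suffix='.jpg')
        .split_by_rand_pct(0.2, seed=42)
        .label_from_df(cols='diagnosis', label_cls=FloatList))
data = src.databunch(bs=32).normalize(imagenet_stats)
learn = Learner(data, efficient_net('b5'), metrics=quadratic_kappa)
preds, _ = learn.get_preds()
grades = preds.squeeze().round().clamp(0, 4).long()  # back to grades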

Things we did not attempt

  • Stratified K-folds
  • Test Time Augmentation (TTA)
  • Pseudo labelling

Exploratory Data Analysis (EDA)

Before we start any training, we try to get a good sense of the raw data, understand its distributions, and explore its features and idiosyncrasies.

Getting to know our dataset is an important first step, and helps us tune our model towards more accurate predictions.
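For example, a first look at the labels shows how skewed the 2019 data is towards grade 0 (no disease), which motivates the oversampling used later:

# Counts per diagnosis grade (0 = no DR, 4 = proliferative DR).
_, _, _, _, df_train = switch2019Only()
df_train['diagnosis'].value_counts().sort_index()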

In [19]:
data_source = switch2019Only()

learner, data_bunch = get_learner_and_databunch(
        efficient_net('b2'), 
        data_source,
        image_size=224,
        bs=32,
        use_xtra_tfms=False,
        oversample=False)

data_bunch.show_batch(5, figsize=(20,20))
Loaded pretrained weights for efficientnet-b2
Using data in: /hdd/data/blindness-detection/2015_and_2019/train/