I'm working on my Master's Thesis, in which I'm trying to evaluate the predictive uncertainties achieved with two different GP approximations applied on top of a DNN (BERT in this case). One of the methods is DUE (van Amersfoort et al., 2022) applied to a binary classification task. In the original paper, based on the source code (https://github.com/y0ast/DUE), it appears a Softmax likelihood was used. If I've understood correctly, the GPyTorch implementation of this uses the SVDKL approach described in Wilson et al. (2016), with mixing weights etc. I would, however, want to use a Bernoulli likelihood to maintain better comparability with my other method.
I thought I already had everything running properly, getting accuracies over 0.9 and estimated calibration errors in the range of 0.5-0.7, but further review showed that all the probits were practically zero, corresponding to estimated class probabilities in the range 0.499-0.501. I wonder whether this is expected behaviour, or is there something that could explain it? My model is defined as in the DUE source with BERT as a feature extractor. The training loop for a single epoch is as follows:
with `likelihood = gpytorch.likelihoods.BernoulliLikelihood()` and `elbo_fn = gpytorch.mlls.VariationalELBO(likelihood, model.gp, num_data=len(training_set))`
Any help would be highly appreciated as my due date approaches at a frightening speed :)
Also, as I'm not quite familiar with the SVDKL approach, would setting `mixing_weights=False` in my setting be equivalent to logistic regression with one-hot encoded labels? If so, this could serve as a backup plan for me...

-Antti
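As a sanity check on that backup plan, the link functions can be compared numerically: with two latent functions and no mixing weights, the two-class softmax reduces to a sigmoid of the latent difference (i.e. logistic regression on the latents), whereas the Bernoulli likelihood applies a probit link Φ(f) to a single latent. A toy check in plain PyTorch (no GP involved):

```python
import torch

f = torch.linspace(-3.0, 3.0, 7)                 # values of a single latent function

# Bernoulli likelihood: probit link, p(y=1) = Phi(f)
probit = torch.distributions.Normal(0.0, 1.0).cdf(f)

# Two-class softmax over latents (f0, f1) with no mixing weights:
# p(y=1) = softmax([f0, f1])[1] = sigmoid(f1 - f0)
f0 = torch.zeros_like(f)
softmax_p1 = torch.softmax(torch.stack([f0, f]), dim=0)[1]

print(torch.allclose(softmax_p1, torch.sigmoid(f)))  # True: softmax == logistic here
```

Both links give p = 0.5 exactly when the latent is zero, which matches the probabilities in the 0.499-0.501 range above; they differ only in tail behaviour.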