super overfitting issue on tinyVGG model in section 6. Thank you friends #1180

ErsiZhao · 2025-02-20T00:19:37Z


Here is the full code. So far I have tried:
resizing the images to 128x128
different transform augmentations (see the code below)
adding more hidden layers to the model
training for more epochs

test_acc seems to be stuck at 0.1979-0.26.

###############code below##################

import os

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Pick the device only after torch has been imported
device = "cuda" if torch.cuda.is_available() else "cpu"

class TinyVGG_change(nn.Module):
    def __init__(self,
                 input_shape: int,
                 hidden_units: int,
                 output_shape: int) -> None:
        super().__init__()
        # Block 1: conv -> ReLU -> conv -> ReLU -> max pool (halves height and width)
        self.Conv_black_1 = nn.Sequential(
            nn.Conv2d(
                in_channels=input_shape,
                out_channels=hidden_units,
                kernel_size=3,  # 3x3
                stride=1,
                padding=1
            ),
            nn.ReLU(),
            nn.Conv2d(
                in_channels=hidden_units,
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=1
            ),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2,
                         stride=2)
        )
        # Block 2: same layout as block 1
        self.Conv_black_2 = nn.Sequential(
            nn.Conv2d(
                in_channels=hidden_units,
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=1
            ),
            nn.ReLU(),
            nn.Conv2d(
                in_channels=hidden_units,
                out_channels=hidden_units,
                kernel_size=3,
                stride=1,
                padding=1
            ),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2,
                         stride=2)
        )

        self.classifier = nn.Sequential(
            nn.Flatten(),
            # Two 2x2 max pools turn a 64x64 input into 16x16 feature maps;
            # using hidden_units here keeps the size correct if hidden_units changes
            nn.Linear(in_features=hidden_units * 16 * 16, out_features=256),
            nn.ReLU(),
            nn.Dropout(0.5),  # 50% dropout
            nn.Linear(256, output_shape)
        )

    def forward(self, x):
        x = self.Conv_black_1(x)
        x = self.Conv_black_2(x)
        x = self.classifier(x)
        return x

train_transform_trivial = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(5),  # Reduce rotation from 10 to 5 degrees
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),  # Less translation
    transforms.ColorJitter(brightness=0.05, contrast=0.05, saturation=0.05),  # Less jitter
    # Note: this crop outputs 128x128 images, while the test transform and the classifier use 64x64
    transforms.RandomResizedCrop(128, scale=(0.9, 1.0)),  # Reduce cropping
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

test_transform_simple = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor()
])

# train_dir and test_dir are not defined in this snippet
train_data_augmented = datasets.ImageFolder(root=train_dir,
                                            transform=train_transform_trivial,
                                            target_transform=None)

test_data_simple = datasets.ImageFolder(root=test_dir,
                                        transform=test_transform_simple,
                                        target_transform=None)

BATCH_SIZE = 32
NUM_WORKERS = os.cpu_count()

torch.manual_seed(42)
torch.cuda.manual_seed(42)
train_dataloader_augmented = DataLoader(dataset=train_data_augmented,
                                        batch_size=BATCH_SIZE,
                                        num_workers=NUM_WORKERS,
                                        shuffle=True)
test_dataloader_simple = DataLoader(dataset=test_data_simple,
                                    batch_size=BATCH_SIZE,
                                    num_workers=NUM_WORKERS,
                                    shuffle=False)
train_dataloader_augmented, test_dataloader_simple

NUM_EPOCHS = 40
model_1 = TinyVGG_change(
    input_shape=3,
    hidden_units=32,
    output_shape=len(train_data_augmented.classes)
).to(device)
loss = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model_1.parameters(), lr=0.001, momentum=0.9, weight_decay=0.0005)

from timeit import default_timer as timer
start_time = timer()

# train() is not defined in this snippet
model_1_results = train(model=model_1,
                        train_dataloader=train_dataloader_augmented,
                        test_dataloader=test_dataloader_simple,
                        optimizer=optimizer,
                        loss_fn=loss,
                        epochs=NUM_EPOCHS,
                        device=device)
end_time = timer()
print(f"Total training time: {end_time-start_time:.3f} seconds")
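One thing that may be worth checking in the code above is the input size the classifier expects. Its first Linear layer is sized for 16x16 feature maps, which corresponds to 64x64 inputs after two 2x2 max pools. The sketch below does that arithmetic in plain Python. The helper names `conv_out` and `flattened_features` are made up for illustration, and the layer settings (3x3 convs with stride 1 and padding 1, 2x2 pools with stride 2) are taken from the posted model.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a Conv2d/MaxPool2d layer (dilation 1, floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_features(image_size: int, hidden_units: int, num_blocks: int = 2) -> int:
    """Flattened feature count after `num_blocks` conv-conv-maxpool blocks (hypothetical helper)."""
    size = image_size
    for _ in range(num_blocks):
        size = conv_out(size, kernel=3, stride=1, padding=1)  # first 3x3 conv keeps the size
        size = conv_out(size, kernel=3, stride=1, padding=1)  # second 3x3 conv keeps the size
        size = conv_out(size, kernel=2, stride=2, padding=0)  # 2x2 max pool halves the size
    return hidden_units * size * size


print(flattened_features(64, 32))   # 64x64 input, 32 hidden units
print(flattened_features(128, 32))  # 128x128 input, 32 hidden units
print(flattened_features(64, 128))  # 64x64 input, 128 hidden units
```

If the number printed for a given image size and `hidden_units` differs from the `in_features` of the first Linear layer, the forward pass will raise a shape error for that configuration.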

Prezzo-K · 2025-02-20T02:38:25Z


It seems like your model is underfitting: it looks stuck in a local minimum and is bouncing back and forth. I'd recommend trying the Adam optimizer instead of SGD and leaving weight decay at its default unless you want to regularize. Also try a few different learning rates and see what works. Finally, I wouldn't recommend using transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) if your dataset is quite small.
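A related point about the Normalize call, offered as a sketch rather than a diagnosis: in the code posted above it appears in the training transform but not in the test transform, so the two pipelines feed the model differently scaled values. The snippet below only does the arithmetic. It assumes pixel values in [0, 1] after ToTensor, and `normalized_range` is a hypothetical helper, not a torchvision function.

```python
# Mean and std values copied from the Normalize call in the post
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalized_range(mean, std):
    """Per-channel (min, max) of (x - mean) / std for inputs x in [0, 1]."""
    return [((0.0 - m) / s, (1.0 - m) / s) for m, s in zip(mean, std)]


ranges = normalized_range(IMAGENET_MEAN, IMAGENET_STD)
for channel, (low, high) in zip("RGB", ranges):
    # Without Normalize, the test images stay in [0, 1]
    print(f"{channel}: normalized train range [{low:.3f}, {high:.3f}], test range [0.000, 1.000]")
```

Whichever choice is made about normalization, applying the same one to both the train and test transforms keeps the inputs comparable.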

2 replies

ErsiZhao Feb 20, 2025
Author

Copy that, Prezzo. Here are some new changes and results. I also tried changing the learning rate to 0.00001, but had no luck reducing the overfitting on the test loss. These changes did make training on the training set more stable, though, so thank you for your help. Do you think the dataset is too small?
1. I'm not quite sure what the correct dataset size is.
2. What is the ideal model accuracy? Is 70% considered a good model?
3. I also did some googling about ViTs, YOLO, Detectron2 and CLIP models. Have you tried those models?
train_transform_trivial = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomRotation(5),  # Reduce rotation from 10 to 5 degrees
    transforms.RandomAffine(degrees=0, translate=(0.05, 0.05)),  # Less translation
    transforms.ColorJitter(brightness=0.05, contrast=0.05, saturation=0.05),  # Less jitter
    # transforms.RandomResizedCrop(128, scale=(0.9, 1.0)),  # Reduce cropping
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # <-- new (commented out)
])

NUM_EPOCHS = 100  # <-- new

model_1 = TinyVGG_change(
    input_shape=3,
    hidden_units=128,  # <-- new
    output_shape=len(train_data_augmented.classes)
).to(device)

optimizer = torch.optim.Adam(params=model_1.parameters(), lr=0.001)  # <-- new

Prezzo-K Feb 20, 2025

Yeah, I can see it is still overfitting. Many things could be going wrong, including the one you hinted at: the dataset.

1. I'm not quite sure what the correct dataset size is.
It all depends on your use case. For learning purposes, toy datasets such as the pizza_steak_sushi dataset in this repo are excellent.

2. What is the ideal model accuracy? Is 70% considered a good model?
For industry use or similar, that is pretty low and you wouldn't use that model. For learning purposes it is okay, though if you can increase it, all the better.

3. I also did some googling about ViTs, YOLO, Detectron2 and CLIP models. Have you tried those models?
Yeah, try ViTs. In my experience they are the best option when a model overfits, but they are always hungry for data, so if your dataset is small a ViT will only make things worse. You can use CNNs in that case.
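To put a number on "still overfitting", one option is to compare the per-epoch train and test losses. The sketch below assumes the results returned by train() form a dict with "train_loss" and "test_loss" lists, which is not shown in this thread. The helper names are hypothetical and the numbers are made up for illustration, not results from this discussion.

```python
def best_epoch(test_losses):
    """0-based index of the epoch with the lowest test loss."""
    return min(range(len(test_losses)), key=test_losses.__getitem__)


def generalization_gap(train_losses, test_losses):
    """Per-epoch test loss minus train loss; a steadily growing gap suggests overfitting."""
    return [test - train for train, test in zip(train_losses, test_losses)]


# Made-up numbers for illustration only
results = {
    "train_loss": [1.10, 0.95, 0.80, 0.60, 0.40],
    "test_loss": [1.12, 1.05, 1.02, 1.08, 1.20],
}

stop_at = best_epoch(results["test_loss"])
gaps = generalization_gap(results["train_loss"], results["test_loss"])
print(f"lowest test loss at epoch index {stop_at}")
print([round(gap, 2) for gap in gaps])
```

In this made-up run the test loss bottoms out early while the train loss keeps falling, which is the pattern where stopping earlier or adding regularization tends to help more than training for additional epochs.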
