This repository consists of:
- vision.datasets: Data loaders for popular vision datasets
- vision.transforms: Common image transformations such as random crop, rotations etc.
- [WIP] vision.models: Model definitions and pre-trained models for popular models such as AlexNet, VGG, ResNet etc.
Binaries:
conda install torchvision -c https://conda.anaconda.org/t/6N-MsQ4WZ7jo/soumith
From Source:
pip install -r requirements.txt
pip install .
The following dataset loaders are available:
Datasets have the API:
- __getitem__
- __len__
They all subclass torch.utils.data.Dataset.
Hence, they can all be loaded in parallel by multiple worker processes (python multiprocessing) using the standard torch.utils.data.DataLoader.
For example:
torch.utils.data.DataLoader(coco_cap, batch_size=args.batchSize, shuffle=True, num_workers=args.nThreads)
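To make the two-method contract concrete, here is a minimal sketch of a dataset object. SquaresDataset is a hypothetical name, not part of torchvision. A real dataset would subclass torch.utils.data.Dataset; the base class is left out here only so the snippet runs without PyTorch installed.

```python
class SquaresDataset(object):
    """Hypothetical toy dataset: sample i is the pair (i, i * i)."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # Return one (input, target) pair for the given index.
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index, index * index

    def __len__(self):
        # Number of samples in the dataset.
        return self.n


squares = SquaresDataset(5)
print(len(squares))
print(squares[3])
```

Anything that answers len() and integer indexing in this way is the kind of object a loader can draw samples from.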
In the constructor, each dataset has a slightly different API as needed, but they all take the keyword args:
- transform - a function that takes in an image and returns a transformed version
  - common stuff like ToTensor, RandomCrop, etc. These can be composed together with transforms.Compose (see transforms section below)
- target_transform - a function that takes in the target and transforms it. For example, take in the caption string and return a tensor of word indices.
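As an illustration of the caption example, the following is a minimal sketch of a target_transform for one caption string. The helpers build_vocab and make_target_transform, the whitespace tokenization, and the -1 index for unknown words are all assumptions made for this sketch. It returns a plain Python list; wrapping the result in a tensor is left out so the snippet runs without PyTorch.

```python
def build_vocab(captions):
    """Map each distinct lowercase word to an index, in order of first appearance."""
    vocab = {}
    for caption in captions:
        for word in caption.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def make_target_transform(vocab, unknown_index=-1):
    """Return a function that turns a caption string into a list of word indices."""
    def target_transform(caption):
        # Words missing from the vocabulary map to unknown_index (an assumption).
        return [vocab.get(word, unknown_index) for word in caption.lower().split()]
    return target_transform


vocab = build_vocab(["a plane over a mountain", "a mountain view"])
to_indices = make_target_transform(vocab)
print(to_indices("a plane view"))
```

A function built this way could be passed as the target_transform keyword argument. For a dataset whose target is a list of captions, it would need to be applied to each caption in the list.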
This requires the COCO API to be installed
dset.CocoCaptions(root="dir where images are", annFile="json annotation file", [transform, target_transform])
Example:
import torchvision.datasets as dset
import torchvision.transforms as transforms
cap = dset.CocoCaptions(root='dir where images are',
                        annFile='json annotation file',
                        transform=transforms.ToTensor())
print('Number of samples: ', len(cap))
img, target = cap[3] # load 4th sample
print("Image Size: ", img.size())
print(target)
Output:
Number of samples: 82783
Image Size: (3L, 427L, 640L)
[u'A plane emitting smoke stream flying over a mountain.',
u'A plane darts across a bright blue sky behind a mountain covered in snow',
u'A plane leaves a contrail above the snowy mountain top.',
u'A mountain that has a plane flying overheard in the distance.',
u'A mountain view with a plume of smoke in the background']
dset.CocoDetection(root="dir where images are", annFile="json annotation file", [transform, target_transform])
dset.LSUN(db_path, classes='train', [transform, target_transform])
- db_path = root directory for the database files
- classes =
  - 'train' - all categories, training set
  - 'val' - all categories, validation set
  - 'test' - all categories, test set
  - ['bedroom_train', 'church_train', ...] - a list of categories to load
A generic data loader where the images are arranged in this way:
root/dog/xxx.png
root/dog/xxy.png
root/dog/xxz.png
root/cat/123.png
root/cat/nsdf3.png
root/cat/asd932_.png
dset.ImageFolder(root="root folder path", [transform, target_transform])
It has the members:
- self.classes - The class names as a list
- self.class_to_idx - Corresponding class indices
- self.imgs - The list of (image path, class-index) tuples
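The relationship between the folder layout and these three members can be sketched as follows. This is not the library's implementation: scan_image_folder is a hypothetical helper, and the alphabetical ordering of classes and the lack of any file-extension filtering are assumptions of the sketch.

```python
import os
import tempfile


def scan_image_folder(root):
    """Build (classes, class_to_idx, imgs) from a root/<class>/<file> layout."""
    # Each sub-directory of root is treated as one class.
    classes = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    class_to_idx = {name: i for i, name in enumerate(classes)}
    imgs = []
    for name in classes:
        class_dir = os.path.join(root, name)
        for fname in sorted(os.listdir(class_dir)):
            imgs.append((os.path.join(class_dir, fname), class_to_idx[name]))
    return classes, class_to_idx, imgs


# Recreate a tiny version of the layout in a temporary directory (empty files).
root = tempfile.mkdtemp()
for rel in ["dog/xxx.png", "dog/xxy.png", "cat/123.png"]:
    os.makedirs(os.path.join(root, os.path.dirname(rel)), exist_ok=True)
    open(os.path.join(root, rel), "w").close()

classes, class_to_idx, imgs = scan_image_folder(root)
print(classes)
print(class_to_idx)
print([(os.path.basename(path), idx) for path, idx in imgs])
```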
This is simply implemented with an ImageFolder dataset.
The data is preprocessed as described here
Transforms are common image transforms.
They can be chained together using transforms.Compose
- ToTensor() - converts a PIL Image to a Tensor
- Normalize(mean, std) - normalizes the image given mean, std (for example: mean = [0.3, 1.2, 2.1])
- Scale(size, interpolation=Image.BILINEAR) - scales the smaller image edge to the given size. Interpolation modes are options from PIL
- CenterCrop(size) - center-crops the image to the given size
- RandomCrop(size) - randomly crops the image to the given size
- RandomHorizontalFlip() - flips the image horizontally with probability 0.5
- RandomSizedCrop(size, interpolation=Image.BILINEAR) - random crop covering 0.08 to 1 of the original size, with aspect ratio 3/4 to 4/3 (Inception-style)
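The arithmetic behind Normalize(mean, std) is a per-channel subtract-and-divide. The sketch below shows it on a channels-first image stored as nested Python lists. The function name normalize and the list representation are assumptions for illustration; the real transform operates on tensors.

```python
def normalize(image, mean, std):
    """Return a copy of a channels-first image with channel c mapped to (x - mean[c]) / std[c]."""
    return [
        [[(x - m) / s for x in row] for row in channel]
        for channel, m, s in zip(image, mean, std)
    ]


# A 2-channel image with one row of two pixels per channel.
image = [[[0.5, 1.0]], [[0.2, 0.4]]]
out = normalize(image, mean=[0.5, 0.2], std=[0.5, 0.1])
print(out)
```

Each channel uses its own mean and std entry, which is why both arguments are given as one value per channel.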
Several transforms can be composed together. For example:
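import torchvision.transforms as transforms

# Random crop and flip for augmentation, then convert and normalize per channel.
transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])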
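Conceptually, composing transforms means applying a list of callables in order, each one receiving the previous one's output. The sketch below illustrates the idea with plain numbers instead of images. ComposeSketch is a hypothetical stand-in written for this example, not the library class.

```python
class ComposeSketch(object):
    """Hypothetical stand-in that applies a list of callables in order."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        # Feed the output of each transform into the next one.
        for t in self.transforms:
            x = t(x)
        return x


pipeline = ComposeSketch([lambda x: x + 1, lambda x: x * 10])
print(pipeline(2))  # (2 + 1) * 10 = 30
```

Order matters: swapping the two steps in the pipeline above would give 2 * 10 + 1 instead.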