[FEATS] [FusedDropoutLayerNorm] [FusedDenseGELUDense]
Kye committed Dec 20, 2023
1 parent bddc2df commit 80e55d0
Showing 6 changed files with 294 additions and 33 deletions.
137 changes: 137 additions & 0 deletions docs/zeta/nn/modules/fused_dropout_layernorm.md
@@ -0,0 +1,137 @@
# FusedDropoutLayerNorm Documentation

## Overview

The `FusedDropoutLayerNorm` module in PyTorch is designed to combine two commonly used operations in neural networks: dropout and layer normalization. This fusion aims to enhance the efficiency of the model by reducing the overhead associated with sequential operations. The module is particularly useful in scenarios where both dropout and layer normalization are critical for the model's performance.

## Class Definition

### `FusedDropoutLayerNorm`

```python
class FusedDropoutLayerNorm(nn.Module):
    """
    This class fuses Dropout and LayerNorm into a single module for efficiency.

    Args:
        dim (int): Input dimension of the layer.
        dropout (float, optional): Probability of an element to be zeroed. Defaults to 0.1.
        eps (float, optional): A value added to the denominator for numerical stability. Defaults to 1e-5.
        elementwise_affine (bool, optional): A flag to enable learning of affine parameters. Defaults to True.
    """
```

## Constructor Parameters

| Parameter | Type | Description | Default Value |
|---------------------|---------|----------------------------------------------------------|---------------|
| `dim` | int | The input dimension of the layer. | - |
| `dropout` | float | Dropout probability. | 0.1 |
| `eps` | float | Epsilon for numerical stability in LayerNorm. | 1e-5 |
| `elementwise_affine`| bool | Enables learning of affine parameters in LayerNorm. | True |

## Methods

### `forward`

```python
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """
    Forward pass of FusedDropoutLayerNorm.

    Args:
        x (torch.Tensor): The input tensor.

    Returns:
        torch.Tensor: The output tensor after applying dropout and layer normalization.
    """
```

## Examples

### Basic Usage

```python
import torch
from torch import nn
from zeta.nn import FusedDropoutLayerNorm

# Initialize the module
model = FusedDropoutLayerNorm(dim=512)

# Create a sample input tensor
x = torch.randn(1, 512)

# Forward pass
output = model(x)

# Check output shape
print(output.shape) # Expected: torch.Size([1, 512])
```

### Integration in a Neural Network

```python
import torch
import torch.nn as nn
from zeta.nn import FusedDropoutLayerNorm

class SampleModel(nn.Module):
    def __init__(self):
        super(SampleModel, self).__init__()
        self.linear = nn.Linear(512, 512)
        self.fused_dropout_layernorm = FusedDropoutLayerNorm(512)

    def forward(self, x):
        x = self.linear(x)
        x = self.fused_dropout_layernorm(x)
        return x

# Example
model = SampleModel()
input_tensor = torch.randn(10, 512)
output = model(input_tensor)
print(output.shape) # Expected: torch.Size([10, 512])
```

### Custom Configuration

```python
import torch
from zeta.nn import FusedDropoutLayerNorm

# Custom configuration
dropout_rate = 0.2
epsilon = 1e-6
elementwise_affine = False

# Initialize the module with custom configuration
model = FusedDropoutLayerNorm(512, dropout=dropout_rate, eps=epsilon, elementwise_affine=elementwise_affine)

# Sample input
x = torch.randn(1, 512)

# Forward pass
output = model(x)
print(output.shape) # Expected: torch.Size([1, 512])
```

## Architecture and Working

The `FusedDropoutLayerNorm` module is architecturally a combination of two standard PyTorch layers, `nn.Dropout` and `nn.LayerNorm`. Fusing them into a single module ensures the two operations are applied back-to-back in one call, reducing per-layer overhead in model code.

- **Dropout**: This operation randomly zeroes some of the elements of the input tensor with probability `dropout` during training. It helps prevent overfitting.
- **Layer Normalization**: This operation normalizes the input across the feature dimension. It stabilizes the learning process and accelerates the training of deep neural networks. Both steps are written out explicitly below.
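
For reference, the two steps can be written explicitly. With dropout probability $p$, keep mask $m_i \sim \mathrm{Bernoulli}(1-p)$, and (when `elementwise_affine=True`) learnable parameters $\gamma$ and $\beta$, a training-mode forward pass computes:

```math
\tilde{x}_i = \frac{m_i \, x_i}{1 - p},
\qquad
y_i = \gamma_i \, \frac{\tilde{x}_i - \mathrm{E}[\tilde{x}]}{\sqrt{\mathrm{Var}[\tilde{x}] + \epsilon}} + \beta_i
```

These are the standard definitions of dropout and layer normalization; the module does not alter them, it only applies them in sequence.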

By integrating these two operations, `FusedDropoutLayerNorm` provides a streamlined process in which dropout is applied first, followed by layer normalization. This design choice is made for computational efficiency and is particularly beneficial in transformer models and other deep learning architectures where both operations are frequently used. A minimal unfused sketch of the same computation is shown below.
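
As a sanity check, the following sketch reproduces the same two-step computation without the fused module and verifies the manual layer-norm arithmetic against `nn.LayerNorm`. It assumes only the behavior described above (dropout followed by layer normalization over the last dimension), not any internal implementation detail:

```python
import torch
from torch import nn

# Unfused reference: dropout followed by layer normalization over the last dim.
dim, p, eps = 512, 0.1, 1e-5
dropout = nn.Dropout(p)
layer_norm = nn.LayerNorm(dim, eps=eps)

x = torch.randn(4, dim)

# Dropout (training mode) zeroes elements with probability p and
# rescales the survivors by 1 / (1 - p).
h = dropout(x)

# Layer normalization: normalize each row across its `dim` features,
# then apply the learnable affine parameters (weight, bias).
mean = h.mean(dim=-1, keepdim=True)
var = h.var(dim=-1, unbiased=False, keepdim=True)
manual = (h - mean) / torch.sqrt(var + eps) * layer_norm.weight + layer_norm.bias

print(torch.allclose(manual, layer_norm(h), atol=1e-6))  # True
```

In `eval()` mode, `nn.Dropout` is the identity, so the fused module reduces to a plain `LayerNorm` at inference time.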

## Purpose and Importance

The primary purpose of `FusedDropoutLayerNorm` is to provide a more efficient way to apply both dropout and layer normalization in a model. This efficiency is particularly crucial in large-scale models, where computational resources and runtime are significant concerns. The module is designed to be versatile and can be easily integrated into various neural network architectures, especially those involving transformer models.

## Conclusion

The `FusedDropoutLayerNorm` module in PyTorch is a practical and efficient solution for models that require both dropout and layer normalization. Its fused architecture not only enhances computational efficiency but also simplifies the model design process. The module is flexible, allowing for easy customization and integration into diverse neural network architectures.

70 changes: 70 additions & 0 deletions tests/nn/modules/test_fused_dropout_layernom.py
@@ -0,0 +1,70 @@
import torch
from torch import nn
from zeta.nn.modules.fused_dropout_layernom import FusedDropoutLayerNorm


def test_class_init():
    model = FusedDropoutLayerNorm(512)

    assert isinstance(model.dropout, nn.Dropout)
    assert isinstance(model.layer_norm, nn.LayerNorm)


def test_class_init_with_args():
    model = FusedDropoutLayerNorm(
        512, dropout=0.2, eps=1e-6, elementwise_affine=False
    )

    assert isinstance(model.dropout, nn.Dropout)
    assert isinstance(model.layer_norm, nn.LayerNorm)
    assert model.dropout.p == 0.2
    assert model.layer_norm.eps == 1e-6
    assert model.layer_norm.elementwise_affine is False


def test_forward():
    model = FusedDropoutLayerNorm(512)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_different_input():
    model = FusedDropoutLayerNorm(512)
    x = torch.randn(2, 512)
    out = model(x)

    assert out.shape == torch.Size([2, 512])


def test_forward_with_different_dim():
    model = FusedDropoutLayerNorm(256)
    x = torch.randn(1, 256)
    out = model(x)

    assert out.shape == torch.Size([1, 256])


def test_forward_with_different_dropout():
    model = FusedDropoutLayerNorm(512, dropout=0.2)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_different_eps():
    model = FusedDropoutLayerNorm(512, eps=1e-6)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_no_elementwise_affine():
    model = FusedDropoutLayerNorm(512, elementwise_affine=False)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])
15 changes: 13 additions & 2 deletions tests/nn/modules/test_fused_gelu_dense.py
@@ -2,6 +2,7 @@
import torch
from zeta.nn.modules.fused_gelu_dense import FusedDenseGELUDense


def test_class_init():
    model = FusedDenseGELUDense(512, 1024)

@@ -11,60 +12,70 @@ def test_class_init():
    assert model.has_fp16_weights == False
    assert model.threshold == 6.0


def test_class_init_with_args():
    model = FusedDenseGELUDense(512, 1024, bias=False, has_fp16_weights=True, threshold=5.0)
    model = FusedDenseGELUDense(
        512, 1024, bias=False, has_fp16_weights=True, threshold=5.0
    )

    assert model.dim == 512
    assert model.dim_out == 1024
    assert model.bias == False
    assert model.has_fp16_weights == True
    assert model.threshold == 5.0


def test_forward():
    model = FusedDenseGELUDense(512, 1024)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_different_input():
    model = FusedDenseGELUDense(512, 1024)
    x = torch.randn(2, 512)
    out = model(x)

    assert out.shape == torch.Size([2, 512])


def test_forward_with_different_dim():
    model = FusedDenseGELUDense(256, 512)
    x = torch.randn(1, 256)
    out = model(x)

    assert out.shape == torch.Size([1, 256])


def test_forward_with_different_dim_out():
    model = FusedDenseGELUDense(512, 2048)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_no_bias():
    model = FusedDenseGELUDense(512, 1024, bias=False)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_fp16_weights():
    model = FusedDenseGELUDense(512, 1024, has_fp16_weights=True)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_different_threshold():
    model = FusedDenseGELUDense(512, 1024, threshold=5.0)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])
9 changes: 6 additions & 3 deletions zeta/cloud/main.py
@@ -1,6 +1,8 @@
import logging
from typing import Any
from sky import Resources, AWS

from sky import AWS, Resources

from zeta.cloud.sky_api import SkyInterface

skyapi = SkyInterface(stream_logs_enabled=True)
@@ -14,8 +16,9 @@
def zetacloud(
    task_name: str = None,
    cluster_name: str = "ZetaTrainingRun",
    setup: str = "pip install -r requirements.txt",
    cloud: Any = AWS(),
    gpus: str = None,
    gpus: str = "V100:4",
    filename: str = "train.py",
    stop: bool = False,
    down: bool = False,
@@ -34,7 +37,7 @@ def zetacloud(
    try:
        task = skyapi.create_task(
            name=task_name,
            setup="pip install -r requirements.txt",
            setup=setup,
            run=f"python {filename}",
            workdir=".",
        )
51 changes: 51 additions & 0 deletions zeta/nn/modules/fused_dropout_layernom.py
@@ -0,0 +1,51 @@
import torch
from torch import nn


class FusedDropoutLayerNorm(nn.Module):
    """FusedDropoutLayerNorm

    Args:
        dim (int): Input dimension
        dropout (float, optional): Dropout. Defaults to 0.1.
        eps (float, optional): Epsilon. Defaults to 1e-5.
        elementwise_affine (bool, optional): Elementwise affine. Defaults to True.

    Examples:
        >>> x = torch.randn(1, 512)
        >>> model = FusedDropoutLayerNorm(512)
        >>> out = model(x)
        >>> out.shape
        torch.Size([1, 512])
    """

    def __init__(
        self,
        dim: int,
        dropout: float = 0.1,
        eps: float = 1e-5,
        elementwise_affine: bool = True,
        *args,
        **kwargs,
    ):
        super(FusedDropoutLayerNorm, self).__init__()

        # Dropout initialization
        self.dropout = nn.Dropout(dropout)

        # LayerNorm initialization
        self.layer_norm = nn.LayerNorm(
            dim, eps=eps, elementwise_affine=elementwise_affine, *args, **kwargs
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass: dropout followed by layer normalization.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: The normalized output tensor.
        """
        x = self.dropout(x)
        return self.layer_norm(x)
