Commit 80e55d0: [FEATS] [FusedDropoutLayerNorm] [FusedDenseGELUDense]
Kye committed on Dec 20, 2023 (1 parent: bddc2df)
Showing 6 changed files with 294 additions and 33 deletions.
@@ -0,0 +1,137 @@
# FusedDropoutLayerNorm Documentation

## Overview

The `FusedDropoutLayerNorm` module in PyTorch combines two commonly used operations in neural networks, dropout and layer normalization, into a single module. This fusion aims to improve efficiency by reducing the overhead of invoking the two operations as separate layers. The module is particularly useful in scenarios where both dropout and layer normalization are critical to a model's performance.
## Class Definition

### `FusedDropoutLayerNorm`

```python
class FusedDropoutLayerNorm(nn.Module):
    """
    This class fuses Dropout and LayerNorm into a single module for efficiency.

    Args:
        dim (int): Input dimension of the layer.
        dropout (float, optional): Probability of an element to be zeroed. Defaults to 0.1.
        eps (float, optional): A value added to the denominator for numerical stability. Defaults to 1e-5.
        elementwise_affine (bool, optional): A flag to enable learning of affine parameters. Defaults to True.
    """
```
## Constructor Parameters

| Parameter            | Type  | Description                                          | Default Value |
|----------------------|-------|------------------------------------------------------|---------------|
| `dim`                | int   | The input dimension of the layer.                    | -             |
| `dropout`            | float | Dropout probability.                                 | 0.1           |
| `eps`                | float | Epsilon for numerical stability in LayerNorm.        | 1e-5          |
| `elementwise_affine` | bool  | Enables learning of affine parameters in LayerNorm.  | True          |
## Methods

### `forward`

```python
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """
    Forward pass of FusedDropoutLayerNorm.

    Args:
        x (torch.Tensor): The input tensor.

    Returns:
        torch.Tensor: The output tensor after applying dropout and layer normalization.
    """
```
## Examples

### Basic Usage

```python
import torch
from zeta.nn import FusedDropoutLayerNorm

# Initialize the module
model = FusedDropoutLayerNorm(dim=512)

# Create a sample input tensor
x = torch.randn(1, 512)

# Forward pass
output = model(x)

# Check output shape
print(output.shape)  # Expected: torch.Size([1, 512])
```
### Integration in a Neural Network

```python
import torch
import torch.nn as nn
from zeta.nn import FusedDropoutLayerNorm


class SampleModel(nn.Module):
    def __init__(self):
        super(SampleModel, self).__init__()
        self.linear = nn.Linear(512, 512)
        self.fused_dropout_layernorm = FusedDropoutLayerNorm(512)

    def forward(self, x):
        x = self.linear(x)
        x = self.fused_dropout_layernorm(x)
        return x


# Example
model = SampleModel()
input_tensor = torch.randn(10, 512)
output = model(input_tensor)
print(output.shape)  # Expected: torch.Size([10, 512])
```
### Custom Configuration

```python
import torch
from zeta.nn import FusedDropoutLayerNorm

# Custom configuration
dropout_rate = 0.2
epsilon = 1e-6
elementwise_affine = False

# Initialize the module with custom configuration
model = FusedDropoutLayerNorm(
    512, dropout=dropout_rate, eps=epsilon, elementwise_affine=elementwise_affine
)

# Sample input
x = torch.randn(1, 512)

# Forward pass
output = model(x)
print(output.shape)  # Expected: torch.Size([1, 512])
```
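### Training vs. Evaluation Mode

Because the module wraps `nn.Dropout`, dropout is only active while the module is in training mode; after calling `.eval()` only layer normalization is applied. The snippet below is a minimal sketch of this standard PyTorch behavior and is not taken from the repository's documentation.

```python
import torch
from zeta.nn import FusedDropoutLayerNorm

model = FusedDropoutLayerNorm(512)
x = torch.randn(4, 512)

model.train()
out_train = model(x)  # dropout zeroes some elements before normalization

model.eval()
out_eval_1 = model(x)  # dropout is a no-op; only LayerNorm is applied
out_eval_2 = model(x)

# In eval mode the output is deterministic for a fixed input
assert torch.allclose(out_eval_1, out_eval_2)
```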
## Architecture and Working

The `FusedDropoutLayerNorm` module is architecturally a combination of two PyTorch layers: `nn.Dropout` and `nn.LayerNorm`. Fusing them into a single module keeps the two operations together in one call, reducing overhead and simplifying the model definition.

- **Dropout**: This operation randomly zeroes some of the elements of the input tensor with probability `dropout` during training. It helps prevent overfitting.
- **Layer Normalization**: This operation normalizes the input across the features (see the formula below). It stabilizes the learning process and accelerates the training of deep neural networks.
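For reference, `nn.LayerNorm` computes the standard layer-normalization transform below; this is the usual PyTorch definition, quoted here as background rather than taken from this commit.

```math
y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta
```

Here the mean and variance are taken over the `dim` features of each sample, $\epsilon$ is the `eps` argument, and $\gamma$ and $\beta$ are the learnable affine parameters (omitted when `elementwise_affine=False`).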
By integrating these two operations, `FusedDropoutLayerNorm` ensures a streamlined process where dropout is applied first, followed by layer normalization. This design choice is made for computational efficiency and is particularly beneficial in transformer models and other deep learning architectures where both operations are frequently used.
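To make the composition concrete, the module behaves like the following unfused sequence of standard PyTorch layers. This is a minimal sketch for illustration (names like `p_drop` are ours, not part of the zeta API), not a claim about how the fused version is implemented internally.

```python
import torch
from torch import nn

dim, p_drop = 512, 0.1

# Unfused reference: dropout followed by layer normalization as two separate modules
dropout = nn.Dropout(p_drop)
layer_norm = nn.LayerNorm(dim, eps=1e-5, elementwise_affine=True)

x = torch.randn(8, dim)
out = layer_norm(dropout(x))  # same order of operations as FusedDropoutLayerNorm

print(out.shape)  # torch.Size([8, 512])
```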
## Purpose and Importance

The primary purpose of `FusedDropoutLayerNorm` is to provide a more efficient way to apply both dropout and layer normalization in a model. This efficiency is particularly crucial in large-scale models where computational resources and runtime are significant concerns. The module is designed to be versatile and can be easily integrated into various neural network architectures, especially those involving transformer models.
## Conclusion

The `FusedDropoutLayerNorm` module in PyTorch is a practical and efficient solution for models that require both dropout and layer normalization. Its fused architecture not only enhances computational efficiency but also simplifies the model design process. The module is flexible, allowing for easy customization and integration into diverse neural network architectures.
@@ -0,0 +1,70 @@
import torch
from torch import nn
from zeta.nn.modules.fused_dropout_layernom import FusedDropoutLayerNorm


def test_class_init():
    model = FusedDropoutLayerNorm(512)

    assert isinstance(model.dropout, nn.Dropout)
    assert isinstance(model.layer_norm, nn.LayerNorm)


def test_class_init_with_args():
    model = FusedDropoutLayerNorm(
        512, dropout=0.2, eps=1e-6, elementwise_affine=False
    )

    assert isinstance(model.dropout, nn.Dropout)
    assert isinstance(model.layer_norm, nn.LayerNorm)
    assert model.dropout.p == 0.2
    assert model.layer_norm.eps == 1e-6
    assert model.layer_norm.elementwise_affine is False


def test_forward():
    model = FusedDropoutLayerNorm(512)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_different_input():
    model = FusedDropoutLayerNorm(512)
    x = torch.randn(2, 512)
    out = model(x)

    assert out.shape == torch.Size([2, 512])


def test_forward_with_different_dim():
    model = FusedDropoutLayerNorm(256)
    x = torch.randn(1, 256)
    out = model(x)

    assert out.shape == torch.Size([1, 256])


def test_forward_with_different_dropout():
    model = FusedDropoutLayerNorm(512, dropout=0.2)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_different_eps():
    model = FusedDropoutLayerNorm(512, eps=1e-6)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_no_elementwise_affine():
    model = FusedDropoutLayerNorm(512, elementwise_affine=False)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])
@@ -0,0 +1,51 @@
import torch
from torch import nn


class FusedDropoutLayerNorm(nn.Module):
    """FusedDropoutLayerNorm

    Args:
        dim (int): Input dimension
        dropout (float, optional): Dropout. Defaults to 0.1.
        eps (float, optional): Epsilon. Defaults to 1e-5.
        elementwise_affine (bool, optional): Elementwise affine. Defaults to True.

    Examples:
        >>> x = torch.randn(1, 512)
        >>> model = FusedDropoutLayerNorm(512)
        >>> out = model(x)
        >>> out.shape
        torch.Size([1, 512])
    """

    def __init__(
        self,
        dim: int,
        dropout: float = 0.1,
        eps: float = 1e-5,
        elementwise_affine: bool = True,
        *args,
        **kwargs,
    ):
        super(FusedDropoutLayerNorm, self).__init__()

        # Dropout initialization
        self.dropout = nn.Dropout(dropout)

        # LayerNorm initialization
        self.layer_norm = nn.LayerNorm(
            dim, eps=eps, elementwise_affine=elementwise_affine, *args, **kwargs
        )
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass

        Args:
            x (torch.Tensor): Input tensor

        Returns:
            torch.Tensor: Tensor after dropout followed by layer normalization
        """
        x = self.dropout(x)
        return self.layer_norm(x)