[FEATS] [FusedDropoutLayerNorm] [FusedDenseGELUDense]
Kye committed Dec 20, 2023
1 parent bddc2df commit 80e55d0
Showing 6 changed files with 294 additions and 33 deletions.
137 changes: 137 additions & 0 deletions docs/zeta/nn/modules/fused_dropout_layernorm.md
@@ -0,0 +1,137 @@
# FusedDropoutLayerNorm Documentation

## Overview

The `FusedDropoutLayerNorm` module in PyTorch is designed to combine two commonly used operations in neural networks: dropout and layer normalization. This fusion aims to enhance the efficiency of the model by reducing the overhead associated with sequential operations. The module is particularly useful in scenarios where both dropout and layer normalization are critical for the model's performance.

## Class Definition

### `FusedDropoutLayerNorm`

```python
class FusedDropoutLayerNorm(nn.Module):
    """
    This class fuses Dropout and LayerNorm into a single module for efficiency.

    Args:
        dim (int): Input dimension of the layer.
        dropout (float, optional): Probability of an element to be zeroed. Defaults to 0.1.
        eps (float, optional): A value added to the denominator for numerical stability. Defaults to 1e-5.
        elementwise_affine (bool, optional): A flag to enable learning of affine parameters. Defaults to True.
    """
```

## Constructor Parameters

| Parameter | Type | Description | Default Value |
|---------------------|---------|----------------------------------------------------------|---------------|
| `dim` | int | The input dimension of the layer. | - |
| `dropout` | float | Dropout probability. | 0.1 |
| `eps` | float | Epsilon for numerical stability in LayerNorm. | 1e-5 |
| `elementwise_affine`| bool | Enables learning of affine parameters in LayerNorm. | True |

## Methods

### `forward`

```python
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """
    Forward pass of FusedDropoutLayerNorm.

    Args:
        x (torch.Tensor): The input tensor.

    Returns:
        torch.Tensor: The output tensor after applying dropout and layer normalization.
    """
```

## Examples

### Basic Usage

```python
import torch
from torch import nn
from zeta.nn import FusedDropoutLayerNorm

# Initialize the module
model = FusedDropoutLayerNorm(dim=512)

# Create a sample input tensor
x = torch.randn(1, 512)

# Forward pass
output = model(x)

# Check output shape
print(output.shape) # Expected: torch.Size([1, 512])
```

### Integration in a Neural Network

```python
import torch
import torch.nn as nn
from zeta.nn import FusedDropoutLayerNorm

class SampleModel(nn.Module):
    def __init__(self):
        super(SampleModel, self).__init__()
        self.linear = nn.Linear(512, 512)
        self.fused_dropout_layernorm = FusedDropoutLayerNorm(512)

    def forward(self, x):
        x = self.linear(x)
        x = self.fused_dropout_layernorm(x)
        return x

# Example
model = SampleModel()
input_tensor = torch.randn(10, 512)
output = model(input_tensor)
print(output.shape) # Expected: torch.Size([10, 512])
```

### Custom Configuration

```python
import torch
from zeta.nn import FusedDropoutLayerNorm

# Custom configuration
dropout_rate = 0.2
epsilon = 1e-6
elementwise_affine = False

# Initialize the module with custom configuration
model = FusedDropoutLayerNorm(512, dropout=dropout_rate, eps=epsilon, elementwise_affine=elementwise_affine)

# Sample input
x = torch.randn(1, 512)

# Forward pass
output = model(x)
print(output.shape) # Expected: torch.Size([1, 512])
```

## Architecture and Working

The `FusedDropoutLayerNorm` module is architecturally a combination of two standard PyTorch layers, `nn.Dropout` and `nn.LayerNorm`. Fusing them into a single module ensures the two operations are applied back-to-back in one call, reducing per-layer overhead in model code.

- **Dropout**: This operation randomly zeroes some of the elements of the input tensor with probability `dropout` during training. It helps prevent overfitting.
- **Layer Normalization**: This operation normalizes the input across the feature dimension. It stabilizes the learning process and accelerates the training of deep neural networks. Both steps are written out explicitly below.
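
For reference, the two steps can be written explicitly. With dropout probability $p$, keep mask $m_i \sim \mathrm{Bernoulli}(1-p)$, and (when `elementwise_affine=True`) learnable parameters $\gamma$ and $\beta$, a training-mode forward pass computes:

```math
\tilde{x}_i = \frac{m_i \, x_i}{1 - p},
\qquad
y_i = \gamma_i \, \frac{\tilde{x}_i - \mathrm{E}[\tilde{x}]}{\sqrt{\mathrm{Var}[\tilde{x}] + \epsilon}} + \beta_i
```

These are the standard definitions of dropout and layer normalization; the module does not alter them, it only applies them in sequence.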

By integrating these two operations, `FusedDropoutLayerNorm` provides a streamlined process in which dropout is applied first, followed by layer normalization. This design choice is made for computational efficiency and is particularly beneficial in transformer models and other deep learning architectures where both operations are frequently used. A minimal unfused sketch of the same computation is shown below.
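
As a sanity check, the following sketch reproduces the same two-step computation without the fused module and verifies the manual layer-norm arithmetic against `nn.LayerNorm`. It assumes only the behavior described above (dropout followed by layer normalization over the last dimension), not any internal implementation detail:

```python
import torch
from torch import nn

# Unfused reference: dropout followed by layer normalization over the last dim.
dim, p, eps = 512, 0.1, 1e-5
dropout = nn.Dropout(p)
layer_norm = nn.LayerNorm(dim, eps=eps)

x = torch.randn(4, dim)

# Dropout (training mode) zeroes elements with probability p and
# rescales the survivors by 1 / (1 - p).
h = dropout(x)

# Layer normalization: normalize each row across its `dim` features,
# then apply the learnable affine parameters (weight, bias).
mean = h.mean(dim=-1, keepdim=True)
var = h.var(dim=-1, unbiased=False, keepdim=True)
manual = (h - mean) / torch.sqrt(var + eps) * layer_norm.weight + layer_norm.bias

print(torch.allclose(manual, layer_norm(h), atol=1e-6))  # True
```

In `eval()` mode, `nn.Dropout` is the identity, so the fused module reduces to a plain `LayerNorm` at inference time.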

## Purpose and Importance

The primary purpose of `FusedDropoutLayerNorm` is to provide a more efficient way to apply both dropout and layer normalization in a model. This efficiency is particularly crucial in large-scale models, where computational resources and runtime are significant concerns. The module is designed to be versatile and can be easily integrated into various neural network architectures, especially those involving transformer models.

## Conclusion

The `FusedDropoutLayerNorm` module in PyTorch is a practical and efficient solution for models that require both dropout and layer normalization. Its fused architecture not only enhances computational efficiency but also simplifies the model design process. The module is flexible, allowing for easy customization and integration into diverse neural network architectures.

70 changes: 70 additions & 0 deletions tests/nn/modules/test_fused_dropout_layernom.py
@@ -0,0 +1,70 @@
import torch
from torch import nn
from zeta.nn.modules.fused_dropout_layernom import FusedDropoutLayerNorm


def test_class_init():
    model = FusedDropoutLayerNorm(512)

    assert isinstance(model.dropout, nn.Dropout)
    assert isinstance(model.layer_norm, nn.LayerNorm)


def test_class_init_with_args():
    model = FusedDropoutLayerNorm(
        512, dropout=0.2, eps=1e-6, elementwise_affine=False
    )

    assert isinstance(model.dropout, nn.Dropout)
    assert isinstance(model.layer_norm, nn.LayerNorm)
    assert model.dropout.p == 0.2
    assert model.layer_norm.eps == 1e-6
    assert model.layer_norm.elementwise_affine is False


def test_forward():
    model = FusedDropoutLayerNorm(512)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_different_input():
    model = FusedDropoutLayerNorm(512)
    x = torch.randn(2, 512)
    out = model(x)

    assert out.shape == torch.Size([2, 512])


def test_forward_with_different_dim():
    model = FusedDropoutLayerNorm(256)
    x = torch.randn(1, 256)
    out = model(x)

    assert out.shape == torch.Size([1, 256])


def test_forward_with_different_dropout():
    model = FusedDropoutLayerNorm(512, dropout=0.2)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_different_eps():
    model = FusedDropoutLayerNorm(512, eps=1e-6)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_no_elementwise_affine():
    model = FusedDropoutLayerNorm(512, elementwise_affine=False)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])
15 changes: 13 additions & 2 deletions tests/nn/modules/test_fused_gelu_dense.py
@@ -2,6 +2,7 @@
import torch
from zeta.nn.modules.fused_gelu_dense import FusedDenseGELUDense


def test_class_init():
    model = FusedDenseGELUDense(512, 1024)

@@ -11,60 +12,70 @@ def test_class_init():
    assert model.has_fp16_weights == False
    assert model.threshold == 6.0


def test_class_init_with_args():
    model = FusedDenseGELUDense(512, 1024, bias=False, has_fp16_weights=True, threshold=5.0)
    model = FusedDenseGELUDense(
        512, 1024, bias=False, has_fp16_weights=True, threshold=5.0
    )

    assert model.dim == 512
    assert model.dim_out == 1024
    assert model.bias == False
    assert model.has_fp16_weights == True
    assert model.threshold == 5.0


def test_forward():
    model = FusedDenseGELUDense(512, 1024)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_different_input():
    model = FusedDenseGELUDense(512, 1024)
    x = torch.randn(2, 512)
    out = model(x)

    assert out.shape == torch.Size([2, 512])


def test_forward_with_different_dim():
    model = FusedDenseGELUDense(256, 512)
    x = torch.randn(1, 256)
    out = model(x)

    assert out.shape == torch.Size([1, 256])


def test_forward_with_different_dim_out():
    model = FusedDenseGELUDense(512, 2048)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_no_bias():
    model = FusedDenseGELUDense(512, 1024, bias=False)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_fp16_weights():
    model = FusedDenseGELUDense(512, 1024, has_fp16_weights=True)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])


def test_forward_with_different_threshold():
    model = FusedDenseGELUDense(512, 1024, threshold=5.0)
    x = torch.randn(1, 512)
    out = model(x)

    assert out.shape == torch.Size([1, 512])
9 changes: 6 additions & 3 deletions zeta/cloud/main.py
@@ -1,6 +1,8 @@
import logging
from typing import Any
from sky import Resources, AWS

from sky import AWS, Resources

from zeta.cloud.sky_api import SkyInterface

skyapi = SkyInterface(stream_logs_enabled=True)
@@ -14,8 +16,9 @@
def zetacloud(
    task_name: str = None,
    cluster_name: str = "ZetaTrainingRun",
    setup: str = "pip install -r requirements.txt",
    cloud: Any = AWS(),
    gpus: str = None,
    gpus: str = "V100:4",
    filename: str = "train.py",
    stop: bool = False,
    down: bool = False,
@@ -34,7 +37,7 @@ def zetacloud(
    try:
        task = skyapi.create_task(
            name=task_name,
            setup="pip install -r requirements.txt",
            setup=setup,
            run=f"python {filename}",
            workdir=".",
        )
51 changes: 51 additions & 0 deletions zeta/nn/modules/fused_dropout_layernom.py
@@ -0,0 +1,51 @@
import torch
from torch import nn


class FusedDropoutLayerNorm(nn.Module):
    """FusedDropoutLayerNorm

    Args:
        dim (int): Input dimension
        dropout (float, optional): Dropout. Defaults to 0.1.
        eps (float, optional): Epsilon. Defaults to 1e-5.
        elementwise_affine (bool, optional): Elementwise affine. Defaults to True.

    Examples:
        >>> x = torch.randn(1, 512)
        >>> model = FusedDropoutLayerNorm(512)
        >>> out = model(x)
        >>> out.shape
        torch.Size([1, 512])
    """

    def __init__(
        self,
        dim: int,
        dropout: float = 0.1,
        eps: float = 1e-5,
        elementwise_affine: bool = True,
        *args,
        **kwargs,
    ):
        super(FusedDropoutLayerNorm, self).__init__()

        # Dropout initialization
        self.dropout = nn.Dropout(dropout)

        # LayerNorm initialization
        self.layer_norm = nn.LayerNorm(
            dim, eps=eps, elementwise_affine=elementwise_affine, *args, **kwargs
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass: dropout followed by layer normalization.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            torch.Tensor: The normalized output tensor.
        """
        x = self.dropout(x)
        return self.layer_norm(x)
