diff --git a/docs/zeta/nn/modules/fused_dropout_layernorm.md b/docs/zeta/nn/modules/fused_dropout_layernorm.md
new file mode 100644
index 00000000..eab36b9c
--- /dev/null
+++ b/docs/zeta/nn/modules/fused_dropout_layernorm.md
@@ -0,0 +1,137 @@
+# FusedDropoutLayerNorm Documentation
+
+## Overview
+
+The `FusedDropoutLayerNorm` module in PyTorch is designed to combine two commonly used operations in neural networks: dropout and layer normalization. This fusion aims to enhance the efficiency of the model by reducing the overhead associated with sequential operations. The module is particularly useful in scenarios where both dropout and layer normalization are critical for the model's performance.
+
+## Class Definition
+
+### `FusedDropoutLayerNorm`
+
+```python
+class FusedDropoutLayerNorm(nn.Module):
+    """
+    This class fuses Dropout and LayerNorm into a single module for efficiency.
+
+    Args:
+        dim (int): Input dimension of the layer.
+        dropout (float, optional): Probability of an element to be zeroed. Defaults to 0.1.
+        eps (float, optional): A value added to the denominator for numerical stability. Defaults to 1e-5.
+        elementwise_affine (bool, optional): A flag to enable learning of affine parameters. Defaults to True.
+    """
+```
+
+## Constructor Parameters
+
+| Parameter            | Type  | Description                                          | Default Value |
+|----------------------|-------|------------------------------------------------------|---------------|
+| `dim`                | int   | The input dimension of the layer.                    | -             |
+| `dropout`            | float | Dropout probability.                                 | 0.1           |
+| `eps`                | float | Epsilon for numerical stability in LayerNorm.        | 1e-5          |
+| `elementwise_affine` | bool  | Enables learning of affine parameters in LayerNorm.  | True          |
+
+## Methods
+
+### `forward`
+
+```python
+def forward(self, x: torch.Tensor) -> torch.Tensor:
+    """
+    Forward pass of FusedDropoutLayerNorm.
+
+    Args:
+        x (torch.Tensor): The input tensor.
+
+    Returns:
+        torch.Tensor: The output tensor after applying dropout and layer normalization.
+    """
+```
+
+## Examples
+
+### Basic Usage
+
+```python
+import torch
+from torch import nn
+from zeta.nn import FusedDropoutLayerNorm
+
+# Initialize the module
+model = FusedDropoutLayerNorm(dim=512)
+
+# Create a sample input tensor
+x = torch.randn(1, 512)
+
+# Forward pass
+output = model(x)
+
+# Check output shape
+print(output.shape)  # Expected: torch.Size([1, 512])
+```
+
+### Integration in a Neural Network
+
+```python
+import torch
+import torch.nn as nn
+from zeta.nn import FusedDropoutLayerNorm
+
+class SampleModel(nn.Module):
+    def __init__(self):
+        super(SampleModel, self).__init__()
+        self.linear = nn.Linear(512, 512)
+        self.fused_dropout_layernorm = FusedDropoutLayerNorm(512)
+
+    def forward(self, x):
+        x = self.linear(x)
+        x = self.fused_dropout_layernorm(x)
+        return x
+
+# Example
+model = SampleModel()
+input_tensor = torch.randn(10, 512)
+output = model(input_tensor)
+print(output.shape)  # Expected: torch.Size([10, 512])
+```
+
+### Custom Configuration
+
+```python
+import torch
+from zeta.nn import FusedDropoutLayerNorm
+
+# Custom configuration
+dropout_rate = 0.2
+epsilon = 1e-6
+elementwise_affine = False
+
+# Initialize the module with custom configuration
+model = FusedDropoutLayerNorm(512, dropout=dropout_rate, eps=epsilon, elementwise_affine=elementwise_affine)
+
+# Sample input
+x = torch.randn(1, 512)
+
+# Forward pass
+output = model(x)
+print(output.shape)  # Expected: torch.Size([1, 512])
+```
+
+## Architecture and Working
+
+The `FusedDropoutLayerNorm` module is architecturally a combination of two PyTorch layers: `nn.Dropout` and `nn.LayerNorm`. The fusion of these layers into a single module ensures that the operations are performed sequentially and efficiently, thereby reducing the computational overhead.
+
+- **Dropout**: This operation randomly zeroes some of the elements of the input tensor with probability `dropout` during training. It helps prevent overfitting.
+- **Layer Normalization**: This operation normalizes the input across the features. It stabilizes the learning process and accelerates the training of deep neural networks.
+
+By integrating these two operations, `FusedDropoutLayerNorm` ensures a streamlined process where dropout is applied first, followed by layer normalization. This design choice is made for computational efficiency and is particularly beneficial in transformer models and other deep learning architectures where both operations are frequently used.
+
+## Purpose and Importance
+
+The primary purpose of `FusedDropoutLayerNorm` is to provide a more efficient way to apply both dropout and layer normalization in a model. This efficiency is particularly crucial in large-scale models where computational resources and runtime are significant concerns. The module is designed to be versatile and can be easily integrated into various neural network architectures, especially those involving transformer models.
+
+## Conclusion
+
+The `FusedDropoutLayerNorm` module in PyTorch is a practical and efficient solution for models that require both dropout and layer normalization. Its fused architecture not only enhances computational efficiency but also simplifies the model design process. The module is flexible, allowing for easy customization and integration into diverse neural network architectures.
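A quick way to sanity-check the behavior documented above (dropout first, then layer normalization) is the minimal sketch below. It assumes only the module path used by the tests in this patch, `zeta.nn.modules.fused_dropout_layernom`: since dropout is the identity in `eval()` mode, the fused module should reproduce a plain `nn.LayerNorm`.

```python
# Minimal sketch: verifies the documented order (dropout, then LayerNorm).
# Import path taken from tests/nn/modules/test_fused_dropout_layernom.py.
import torch
from torch import nn

from zeta.nn.modules.fused_dropout_layernom import FusedDropoutLayerNorm

torch.manual_seed(0)
x = torch.randn(4, 512)

fused = FusedDropoutLayerNorm(512, dropout=0.1).eval()  # eval() disables dropout
reference = nn.LayerNorm(512).eval()

# With dropout inactive, the fused module reduces to plain layer normalization.
assert torch.allclose(fused(x), reference(x), atol=1e-6)
print("fused output matches nn.LayerNorm in eval mode")
```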
+
diff --git a/tests/nn/modules/test_fused_dropout_layernom.py b/tests/nn/modules/test_fused_dropout_layernom.py
new file mode 100644
index 00000000..e38567d8
--- /dev/null
+++ b/tests/nn/modules/test_fused_dropout_layernom.py
@@ -0,0 +1,70 @@
+import torch
+from torch import nn
+
+from zeta.nn.modules.fused_dropout_layernom import FusedDropoutLayerNorm
+
+
+def test_class_init():
+    model = FusedDropoutLayerNorm(512)
+
+    assert isinstance(model.dropout, nn.Dropout)
+    assert isinstance(model.layer_norm, nn.LayerNorm)
+
+
+def test_class_init_with_args():
+    model = FusedDropoutLayerNorm(
+        512, dropout=0.2, eps=1e-6, elementwise_affine=False
+    )
+
+    assert isinstance(model.dropout, nn.Dropout)
+    assert isinstance(model.layer_norm, nn.LayerNorm)
+    assert model.dropout.p == 0.2
+    assert model.layer_norm.eps == 1e-6
+    assert model.layer_norm.elementwise_affine is False
+
+
+def test_forward():
+    model = FusedDropoutLayerNorm(512)
+    x = torch.randn(1, 512)
+    out = model(x)
+
+    assert out.shape == torch.Size([1, 512])
+
+
+def test_forward_with_different_input():
+    model = FusedDropoutLayerNorm(512)
+    x = torch.randn(2, 512)
+    out = model(x)
+
+    assert out.shape == torch.Size([2, 512])
+
+
+def test_forward_with_different_dim():
+    model = FusedDropoutLayerNorm(256)
+    x = torch.randn(1, 256)
+    out = model(x)
+
+    assert out.shape == torch.Size([1, 256])
+
+
+def test_forward_with_different_dropout():
+    model = FusedDropoutLayerNorm(512, dropout=0.2)
+    x = torch.randn(1, 512)
+    out = model(x)
+
+    assert out.shape == torch.Size([1, 512])
+
+
+def test_forward_with_different_eps():
+    model = FusedDropoutLayerNorm(512, eps=1e-6)
+    x = torch.randn(1, 512)
+    out = model(x)
+
+    assert out.shape == torch.Size([1, 512])
+
+
+def test_forward_with_no_elementwise_affine():
+    model = FusedDropoutLayerNorm(512, elementwise_affine=False)
+    x = torch.randn(1, 512)
+    out = model(x)
+
+    assert out.shape == torch.Size([1, 512])
diff --git a/tests/nn/modules/test_fused_gelu_dense.py b/tests/nn/modules/test_fused_gelu_dense.py
index 5ea5ce5a..f0390bf7 100644
--- a/tests/nn/modules/test_fused_gelu_dense.py
+++ b/tests/nn/modules/test_fused_gelu_dense.py
@@ -2,6 +2,7 @@
 import torch
 
 from zeta.nn.modules.fused_gelu_dense import FusedDenseGELUDense
 
+
 def test_class_init():
     model = FusedDenseGELUDense(512, 1024)
@@ -11,8 +12,11 @@ def test_class_init():
     assert model.has_fp16_weights == False
     assert model.threshold == 6.0
 
+
 def test_class_init_with_args():
-    model = FusedDenseGELUDense(512, 1024, bias=False, has_fp16_weights=True, threshold=5.0)
+    model = FusedDenseGELUDense(
+        512, 1024, bias=False, has_fp16_weights=True, threshold=5.0
+    )
 
     assert model.dim == 512
     assert model.dim_out == 1024
@@ -20,6 +24,7 @@ def test_class_init_with_args():
     assert model.has_fp16_weights == True
     assert model.threshold == 5.0
 
+
 def test_forward():
     model = FusedDenseGELUDense(512, 1024)
     x = torch.randn(1, 512)
@@ -27,6 +32,7 @@ def test_forward():
 
     assert out.shape == torch.Size([1, 512])
 
+
 def test_forward_with_different_input():
     model = FusedDenseGELUDense(512, 1024)
     x = torch.randn(2, 512)
@@ -34,6 +40,7 @@ def test_forward_with_different_input():
 
     assert out.shape == torch.Size([2, 512])
 
+
 def test_forward_with_different_dim():
     model = FusedDenseGELUDense(256, 512)
     x = torch.randn(1, 256)
@@ -41,6 +48,7 @@ def test_forward_with_different_dim():
 
     assert out.shape == torch.Size([1, 256])
 
+
 def test_forward_with_different_dim_out():
     model = FusedDenseGELUDense(512, 2048)
     x = torch.randn(1, 512)
@@ -48,6 +56,7 @@ def test_forward_with_different_dim_out():
 
     assert out.shape == torch.Size([1, 512])
 
+
 def test_forward_with_no_bias():
     model = FusedDenseGELUDense(512, 1024, bias=False)
     x = torch.randn(1, 512)
@@ -55,6 +64,7 @@ def test_forward_with_no_bias():
 
     assert out.shape == torch.Size([1, 512])
 
+
 def test_forward_with_fp16_weights():
     model = FusedDenseGELUDense(512, 1024, has_fp16_weights=True)
     x = torch.randn(1, 512)
@@ -62,9 +72,10 @@ def test_forward_with_fp16_weights():
 
     assert out.shape == torch.Size([1, 512])
 
+
 def test_forward_with_different_threshold():
     model = FusedDenseGELUDense(512, 1024, threshold=5.0)
     x = torch.randn(1, 512)
     out = model(x)
 
-    assert out.shape == torch.Size([1, 512])
\ No newline at end of file
+    assert out.shape == torch.Size([1, 512])
diff --git a/zeta/cloud/main.py b/zeta/cloud/main.py
index 7b3e1e4e..3d46183d 100644
--- a/zeta/cloud/main.py
+++ b/zeta/cloud/main.py
@@ -1,6 +1,8 @@
 import logging
 from typing import Any
-from sky import Resources, AWS
+
+from sky import AWS, Resources
+
 from zeta.cloud.sky_api import SkyInterface
 
 skyapi = SkyInterface(stream_logs_enabled=True)
@@ -14,8 +16,9 @@ def zetacloud(
     task_name: str = None,
     cluster_name: str = "ZetaTrainingRun",
+    setup: str = "pip install -r requirements.txt",
     cloud: Any = AWS(),
-    gpus: str = None,
+    gpus: str = "V100:4",
     filename: str = "train.py",
     stop: bool = False,
     down: bool = False,
@@ -34,7 +37,7 @@ def zetacloud(
     try:
         task = skyapi.create_task(
             name=task_name,
-            setup="pip install -r requirements.txt",
+            setup=setup,
             run=f"python {filename}",
             workdir=".",
         )
diff --git a/zeta/nn/modules/fused_dropout_layernom.py b/zeta/nn/modules/fused_dropout_layernom.py
new file mode 100644
index 00000000..8850d47b
--- /dev/null
+++ b/zeta/nn/modules/fused_dropout_layernom.py
@@ -0,0 +1,51 @@
+import torch
+from torch import nn
+
+
+class FusedDropoutLayerNorm(nn.Module):
+    """FusedDropoutLayerNorm
+
+    Args:
+        dim (int): Input dimension
+        dropout (float, optional): Dropout. Defaults to 0.1.
+        eps (float, optional): Epsilon. Defaults to 1e-5.
+        elementwise_affine (bool, optional): Elementwise affine. Defaults to True.
+
+    Examples:
+        >>> x = torch.randn(1, 512)
+        >>> model = FusedDropoutLayerNorm(512)
+        >>> out = model(x)
+        >>> out.shape
+        torch.Size([1, 512])
+    """
+
+    def __init__(
+        self,
+        dim: int,
+        dropout: float = 0.1,
+        eps: float = 1e-5,
+        elementwise_affine: bool = True,
+        *args,
+        **kwargs,
+    ):
+        super(FusedDropoutLayerNorm, self).__init__()
+
+        # Dropout initialization
+        self.dropout = nn.Dropout(dropout)
+
+        # LayerNorm initialization
+        self.layer_norm = nn.LayerNorm(
+            dim, eps=eps, elementwise_affine=elementwise_affine, *args, **kwargs
+        )
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        """Forward pass: dropout followed by layer normalization.
+
+        Args:
+            x (torch.Tensor): Input tensor.
+
+        Returns:
+            torch.Tensor: The dropped-out, layer-normalized tensor.
+        """
+        x = self.dropout(x)
+        return self.layer_norm(x)
diff --git a/zeta/nn/modules/fused_gelu_dense.py b/zeta/nn/modules/fused_gelu_dense.py
index d47d934e..885ac458 100644
--- a/zeta/nn/modules/fused_gelu_dense.py
+++ b/zeta/nn/modules/fused_gelu_dense.py
@@ -1,6 +1,7 @@
-import torch 
+import torch
 from torch import nn
 
+
 class FusedDenseGELUDense(nn.Module):
     """FuseFusedDenseGELUDense
 
@@ -10,7 +11,7 @@ class FusedDenseGELUDense(nn.Module):
         bias (bool, optional): Bias. Defaults to True.
         has_fp16_weights (bool, optional): Use fp16 weights. Defaults to False.
         threshold (float, optional): Threshold for quantization. Defaults to 6.0.
-        
+
     Examples:
         >>> x = torch.randn(1, 512)
         >>> model = FusedDenseGELUDense(512, 1024)
@@ -18,6 +19,7 @@ class FusedDenseGELUDense(nn.Module):
         >>> out.shape
         torch.Size([1, 512])
     """
+
     def __init__(
         self,
        dim: int,
@@ -26,18 +28,18 @@ def __init__(
         has_fp16_weights: bool = False,
         threshold: float = 6.0,
         *args,
-        **kwargs
+        **kwargs,
     ):
         super(FusedDenseGELUDense, self).__init__()
-        self.dim = dim 
+        self.dim = dim
         self.dim_out = dim_out
         self.bias = bias
         self.has_fp16_weights = has_fp16_weights
         self.threshold = threshold
-        
-        
+
         try:
             import bitsandbytes as bnb
+
             # Using bitsandbytes for quantization
             self.dense1 = bnb.nn.Linear8bitLt(
                 dim,
@@ -46,9 +48,9 @@ def __init__(
                 has_fp16_weights=has_fp16_weights,
                 threshold=threshold,
                 *args,
-                **kwargs
+                **kwargs,
             )
-            
+
             # Reverse
             self.dense2 = bnb.nn.Linear8bitLt(
                 dim_out,
@@ -57,31 +59,19 @@ def __init__(
                 has_fp16_weights=has_fp16_weights,
                 threshold=threshold,
                 *args,
-                **kwargs
+                **kwargs,
             )
-            
+
         except ModuleNotFoundError:
             # Using torch.nn.Linear
-            self.dense1 = nn.Linear(
-                dim,
-                dim_out,
-                bias=bias
-                *args,
-                **kwargs
-            )
-            
+            self.dense1 = nn.Linear(dim, dim_out, bias=bias, *args, **kwargs)
+
             # Dense 2
-            self.dense2 = nn.Linear(
-                dim_out,
-                dim,
-                bias=bias
-                *args,
-                **kwargs
-            )
-            
+            self.dense2 = nn.Linear(dim_out, dim, bias=bias, *args, **kwargs)
+
         # Activation
         self.act = nn.GELU()
-        
+
     def forward(self, x: torch.Tensor) -> torch.Tensor:
         """Forward pass
 
@@ -95,4 +85,3 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
         x = self.act(x)
         x = self.dense2(x)
         return x
-        
\ No newline at end of file
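For reference, a minimal usage sketch of the reformatted `FusedDenseGELUDense` (import path as in the tests above). It illustrates the `dim -> dim_out -> dim` round trip performed in `forward` and behaves the same whether the `bitsandbytes` branch or the plain `nn.Linear` fallback is taken; the batch size of 8 here is only illustrative.

```python
# Minimal sketch: exercises FusedDenseGELUDense as defined in this patch.
# Import path taken from tests/nn/modules/test_fused_gelu_dense.py.
import torch

from zeta.nn.modules.fused_gelu_dense import FusedDenseGELUDense

model = FusedDenseGELUDense(dim=512, dim_out=1024)

x = torch.randn(8, 512)
out = model(x)

# dense1 expands 512 -> 1024, GELU is applied, dense2 projects 1024 -> 512,
# so the output keeps the input feature size.
print(out.shape)  # Expected: torch.Size([8, 512])
```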