Add Dynamic Model Import and ModelSpec Definition #814

fegin · 2025-01-31T18:40:37Z

Stack from ghstack (oldest at bottom):

-> Add Dynamic Model Import and ModelSpec Definition #814

What does this PR do?

This PR introduces ModelSpec to describe a model and how to parallelize a model.
- All the models should call register_model_spec().
- Users can also use --experimental.custom_model_path to dynamically import a model that is not implemented by TorchTitan. The module should also call register_model_spec().
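
The registration flow described above can be sketched as follows. This is a minimal illustration rather than the PR's implementation: only the names `ModelSpec` and `register_model_spec()` come from the description, while the field names, the `_MODEL_SPECS` registry, `get_model_spec()`, and `import_custom_model()` are assumptions made for the example.

```python
import importlib
from dataclasses import dataclass
from types import ModuleType
from typing import Callable, Optional


@dataclass
class ModelSpec:
    # Field names are illustrative assumptions, not the PR's actual definition.
    name: str
    build_model: Callable[[], object]
    parallelize_fn: Callable[[object], object]
    pipelining_fn: Optional[Callable[[object], object]] = None


# Hypothetical module-level registry keyed by model name.
_MODEL_SPECS: dict[str, ModelSpec] = {}


def register_model_spec(spec: ModelSpec) -> None:
    """Record a spec and refuse to silently overwrite an existing name."""
    if spec.name in _MODEL_SPECS:
        raise ValueError(f"Model spec {spec.name!r} is already registered.")
    _MODEL_SPECS[spec.name] = spec


def get_model_spec(name: str) -> ModelSpec:
    """Look up a previously registered spec by name (hypothetical helper)."""
    if name not in _MODEL_SPECS:
        raise KeyError(f"Unknown model spec {name!r}.")
    return _MODEL_SPECS[name]


def import_custom_model(module_path: str) -> ModuleType:
    """Import a user module by dotted path (hypothetical helper).

    The imported module is expected to call register_model_spec() itself
    when it is imported.
    """
    return importlib.import_module(module_path)


# What a user module might execute at import time (toy stand-ins only).
register_model_spec(
    ModelSpec(
        name="toy",
        build_model=lambda: {"layers": 2},
        parallelize_fn=lambda model: model,
    )
)
spec = get_model_spec("toy")
model = spec.parallelize_fn(spec.build_model())
```

Keeping registration as an import-time side effect means the trainer only needs a module path and a model name, which is what lets an out-of-tree model plug in without edits to the trainer.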
This PR also refactors OptimizersContainer and LRSchedulersContainer.
- Fixes an issue where optimizers accept parameters whose requires_grad is False.
- Improves typing and docstrings.
- Improves function and class reusability.
- OptimizersContainer now inherits from torch.optim.Optimizer.
This PR also moves parallelize_llama and pipelining_llama to the llama folder.

Why do we need this PR?
This allows users to use TorchTitan with a new model without intrusively changing TorchTitan code.

Next steps

Dataloader is not included yet.
Checkpoint customization is not included yet.

[ghstack-poisoned]

**What does this PR do?**

1. Introduces `ModelSpec` to describe a model and how to parallelize it.
2. Requires all models to define `build_model_spec()` or `model_spec`, which will be imported by the `model` module.
3. Calls `build_model_specs()` in the trainer to obtain `model_specs`, which are then used to retrieve the corresponding model spec.
4. Allows users to dynamically import a model not implemented by TorchTitan using `--experimental.model_module_path`.

**Why do we need this PR?**

This PR enables users to integrate new models with TorchTitan without making intrusive changes to the TorchTitan codebase.

**Next steps**

1. This PR includes only the model definitions, configurations, tokenizer, `parallelize_fn`, and `pipelining_fn`. We may want to extend `ModelSpec` to include the optimizer and learning rate scheduler.
2. The current TorchTitan parallelize and `pipelining_fn` import `ModelArgs`, which can lead to circular imports. This issue needs to be addressed.

ghstack-source-id: f0847f5efebfdf8c6619f58c1b0131a233502eaf
Pull Request resolved: #814

tianyu-l

Initial pass looks great. Had some suggestions on restructuring.

torchtitan/models/llama/model.py

torchtitan/models/__init__.py

torchtitan/config_manager.py

torchtitan/models/llama/__init__.py

torchtitan/config_manager.py

fduwjj · 2025-01-31T22:37:15Z

torchtitan/models/llama/model.py

 from torchtitan.models.norms import build_norm


 @dataclass
-class ModelArgs:
+class ModelArgs(BaseModelArgs):


Down the road we will have many models, like an MM model. Do we want all model args to inherit from this? Currently we use different model args for different model architectures.

This is mainly for typing for now, but it also preserves the ability to introduce common model args.

torchtitan/models/__init__.py

fegin · 2025-02-06T23:59:59Z

Tests and documentation will come in the next few updates.

tianyu-l

This is super cool! Thank you for unlocking torchtitan to reach the next level.

tianyu-l · 2025-02-07T01:01:15Z

torchtitan/model_spec.py

+    # TorchTitan library. A better way would be to have a dataloader class
+    # and a ``build_dataloader`` function that take job_config to consume
+    # the different dataloader and tokenizer configs.
+    tokenizer: str


currently tokenizer is part of data loader
https://github.com/pytorch/torchtitan/blob/main/torchtitan/datasets/hf_datasets.py#L186

maybe let's remove it for now

tianyu-l · 2025-02-07T01:06:57Z

torchtitan/models/llama/__init__.py

+from .parallelize_llama import parallelize_llama
+from .pipeline_llama import pipeline_llama
+
+__all__ = ["parallelize_llama", "pipeline_llama", "ModelArgs", "Transformer"]


do we need to expose these fields in llama/__init__.py?

Yes, so that users can reuse the parallelism APIs from llama.

That makes sense. But maybe also llama3_configs?
I imagine someone may want to implement new parallelisms while relying on the existing definitions of Llama 3 8B/70B/405B. In that case they don't need ModelArgs, only the preset configs.

tianyu-l · 2025-02-07T01:10:46Z

torchtitan/models/llama/model.py

 from torchtitan.models.norms import build_norm


 @dataclass
-class ModelArgs:
+class ModelArgs(BaseModelArgs):


can we rename it to "TransformerModelArgs"?

tianyu-l · 2025-02-07T01:12:18Z

torchtitan/models/llama/pipeline_llama.py

@@ -67,10 +70,16 @@ def pipeline_llama_manual_split(

    splits = (
        job_config.experimental.pipeline_parallel_split_points
-        or generate_split_points(job_config, parallel_dims.pp, model_config)
+        or generate_split_points(job_config, parallel_dims.pp, model_config.n_layers)


tianyu-l · 2025-02-07T01:23:01Z

torchtitan/optimizer.py

+        # that is immutable. As long as ``training.steps`` and ``training.warmup_steps``
+        # in ``job_config`` remain unchanged when resuming from a checkpoint, this
+        # approach is safe. We call ``copy()`` here to ensure extra safety.
+        # TODO: Should we deepcopy the state_dict?


I think we should -- that was the intention.
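
To illustrate why the TODO matters, here is a small sketch in plain Python of how `copy()` and `copy.deepcopy()` differ on a nested dictionary. The keys below are illustrative stand-ins and not the actual contents of the scheduler state dict in this PR.

```python
import copy

# Illustrative state with one immutable value and one nested mutable value.
state_dict = {"last_epoch": 10, "base_lrs": [0.1, 0.01]}

shallow = state_dict.copy()        # new outer dict, shared nested objects
deep = copy.deepcopy(state_dict)   # new outer dict and new nested objects

# Rebinding a top-level key in the original affects neither copy.
state_dict["last_epoch"] = 11

# Mutating a nested object in place leaks into the shallow copy only.
state_dict["base_lrs"][0] = 0.5

shallow_sees_mutation = shallow["base_lrs"][0] == 0.5
deep_sees_mutation = deep["base_lrs"][0] == 0.5
```

A shallow copy is therefore only safe while every value in the state dict is immutable or never mutated in place; `deepcopy` removes that assumption at the cost of copying the nested data.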

tianyu-l · 2025-02-07T01:28:26Z

torchtitan/parallelisms/pipelining_utils.py

+# TODO: It's unclear if this API is general enough to be used by other models.
+# If not, we should move it to a Transformer-specific directory.


tianyu-l · 2025-02-07T01:34:13Z

torchtitan/optimizer.py

I have to say the docs added in this file look fabulous ✨

tianyu-l · 2025-02-07T01:37:20Z

torchtitan/optimizer.py

+        # We need to call super().__init__() to initialize some necessary optimizer
+        # functionality such as hooks.


can we put this to where _post_init is called for better readability?

tianyu-l · 2025-02-07T01:45:26Z

torchtitan/models/llama/pipeline_llama.py

@@ -36,7 +39,7 @@ def pipeline_llama(
    device: DeviceType,
    model_config: ModelArgs,
    loss_fn: Callable[..., torch.Tensor],
-):
+) -> tuple[_PipelineSchedule, list[nn.Module]]:


Typing in uppercase vs. lowercase seems inconsistent throughout the PR. Is this intentional? And what's the recommended way?

hmm it seems only for state_dict you used uppercase, maybe because of compatibility.

Lowercase is the recommended way if we don't support Python <= 3.8. After PyTorch 2.6, that's the case. So we should just change to the lowercase one. I may revisit the code and try to change everything to lowercase.
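
For reference, the two annotation styles under discussion look like this. The functions are hypothetical examples written for this comparison; the lowercase form uses the built-in generics from PEP 585, which require Python 3.9 or newer.

```python
from typing import Dict, List, Tuple


# Uppercase style: aliases from the typing module, needed on Python <= 3.8.
def count_names_old(names: List[str]) -> Tuple[List[str], Dict[str, int]]:
    counts: Dict[str, int] = {}
    for name in names:
        counts[name] = counts.get(name, 0) + 1
    return sorted(counts), counts


# Lowercase style: built-in generics (PEP 585), available from Python 3.9.
def count_names_new(names: list[str]) -> tuple[list[str], dict[str, int]]:
    counts: dict[str, int] = {}
    for name in names:
        counts[name] = counts.get(name, 0) + 1
    return sorted(counts), counts


old_result = count_names_old(["b", "a", "b"])
new_result = count_names_new(["b", "a", "b"])
```

The two functions behave identically at runtime; the difference is only in which annotation spelling the minimum supported Python version allows.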

tianyu-l · 2025-02-07T01:49:04Z

torchtitan/utils.py

I feel this file is growing to be too big -- we basically throw things here when we don't know where to put them.
Maybe let's revisit later as a BE thing.

Lol, yeah, that is the legacy issue of putting everything into the utils file. I agree we should split it. But we can do it in another BE PR.

tianyu-l · 2025-02-07T04:25:16Z

torchtitan/model_spec.py

+
+
+@dataclass
+class ModelSpec:


Since the Spec is not only about the model, e.g. conceptually there can be multiple ways to do training for the same model (gpu/tpu, customized parallelize/pipeline), shall we consider renaming it to TrainSpec?

That's actually a good question and suggestion. I am open to this option.

tianyu-l · 2025-02-07T04:30:45Z

torchtitan/model_spec.py

+    build_optimizers_fn: Callable[[List[nn.Module], JobConfig], OptimizersContainer]
+    build_lr_schedulers_fn: Callable[
+        [List[nn.Module], JobConfig], LRSchedulersContainer
+    ]


For some models we may need to alter loss_fn as well, e.g. in diffusion models. We may add that later when necessary.
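
One way such an extension could look is a spec field that carries the loss function with a default. This is a sketch under stated assumptions only: `SpecWithLoss`, `LossFn`, and both loss functions are hypothetical, and plain float sequences stand in for tensors.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical loss signature over plain floats instead of tensors.
LossFn = Callable[[Sequence[float], Sequence[float]], float]


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average of squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mean_absolute_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average of absolute differences between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


@dataclass
class SpecWithLoss:
    # Hypothetical spec: a default loss that individual models can override.
    name: str
    loss_fn: LossFn = mean_squared_error


default_spec = SpecWithLoss(name="default")
custom_spec = SpecWithLoss(name="custom", loss_fn=mean_absolute_error)

default_loss = default_spec.loss_fn([1.0, 2.0], [1.0, 4.0])
custom_loss = custom_spec.loss_fn([1.0, 2.0], [1.0, 4.0])
```

With a default in place, existing specs would not need to change, and only models that need a different objective would set the field.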

Update

df1bc6a

[ghstack-poisoned]

facebook-github-bot added the CLA Signed label Jan 31, 2025

fegin requested review from tianyu-l, wconstab and fduwjj January 31, 2025 18:40

fegin changed the title ~~Allow users to use the customized model~~ Add Dynamic Model Import and ModelSpec Definition Jan 31, 2025

Update

dfc1649

[ghstack-poisoned]

Update

720f12a

[ghstack-poisoned]

Update

225bfcc

[ghstack-poisoned]

Update

650152e

[ghstack-poisoned]

tianyu-l reviewed Jan 31, 2025

fduwjj reviewed Jan 31, 2025

torchtitan/config_manager.py (outdated)

fduwjj reviewed Jan 31, 2025

torchtitan/models/__init__.py (outdated)

fduwjj reviewed Jan 31, 2025

torchtitan/models/__init__.py (outdated)

fduwjj reviewed Jan 31, 2025

torchtitan/models/__init__.py (outdated)

xffxff mentioned this pull request Feb 3, 2025

[RFC] Implement model-specific 4d parallelism fla-org/flash-linear-attention#148

Open

fegin mentioned this pull request Feb 5, 2025

should we have an extension point for model transforms out of tree? #790

Open

Update

687fda9

[ghstack-poisoned]

Update

6a51325

[ghstack-poisoned]

Update

5b33b65

[ghstack-poisoned]

Update

2e569d7

[ghstack-poisoned]

Update

bab9bf5

[ghstack-poisoned]

tianyu-l reviewed Feb 7, 2025


fegin mentioned this pull request Feb 7, 2025

Introducing a generic ModelHandler interface. #823

Open
