
Commit c42f15e

update doc
1 parent bfe15b0 commit c42f15e

1 file changed: +11 -11 lines


readme.md

@@ -50,28 +50,21 @@ And you're good to go.
 **Note: for the ultimate 5-10x performance increase, you'll need to let `fast_gauss`'s shader directly write to your desired framebuffer.**
 
 Currently, we are trying to automatically detect whether you're managing your own OpenGL context (i.e. opening up a GUI) by checking for the module `OpenGL` during the import of `fast_gauss`.
-
-If detected, all rendering command will return `None`s and we will directly write to the bound framebuffer at the time of the draw call.
-
+If detected, all rendering commands will return `None`s and we will directly write to the bound framebuffer at the time of the draw call.
 Thus if you're running in a GUI (OpenGL-based) environment, the output of our rasterizer will be `None`s and does not require further processing.
 
 - [ ] TODO: Improve offline rendering performance.
 - [ ] TODO: Add a warning to the user if they're performing further processing on the returned values.
 
-
 **Note: the speedup is the most visible when the pixel-to-point ratio is high.**
 
 That is, when there are large Gaussians and very high-resolution rendering, the speedup is more visible.
-
 The CUDA-based software implementation is more resolution-sensitive, and for some extremely dense point clouds (> 1 million points), the CUDA implementation might be faster.
-
 This is because the typical rasterization-based pipeline on modern graphics hardware is [not well-optimized for small triangles](https://www.youtube.com/watch?v=hf27qsQPRLQ&list=WL).
 
-
 **Note: for best performance, cache the persistent results (for example, the 6 elements of the covariance matrix).**
 
 This is more of a general tip and not directly related to `fast_gauss`.
-
 However, the impact is more observable here since we haven't implemented a fast 3D covariance computation (from scales and rotations) in the shader yet.
 Only a PyTorch implementation is available for now.
 
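The import-time detection described in this hunk (checking for the `OpenGL` module when `fast_gauss` is imported, then returning `None` instead of an image) can be sketched as follows. This is a minimal sketch, not `fast_gauss`'s code: it assumes the check looks at `sys.modules`, i.e. whether `OpenGL` has already been imported by the time `fast_gauss` is, and the names `opengl_context_detected`, `GUI_MODE`, `render`, `draw` and `read_back` are all hypothetical.

```python
import sys
from typing import Callable, List, Optional


def opengl_context_detected() -> bool:
    # Assumption: an already-imported `OpenGL` package (e.g. PyOpenGL pulled in
    # by a GUI) is taken as a sign that the caller owns the OpenGL context.
    return "OpenGL" in sys.modules


# Decided once, at import time, mirroring the behaviour described above.
GUI_MODE = opengl_context_detected()


def render(
    draw: Callable[[], None],
    read_back: Callable[[], List[float]],
    gui_mode: Optional[bool] = None,
) -> Optional[List[float]]:
    """Hypothetical entry point: `draw` issues the draw call, `read_back` copies pixels to the host."""
    if gui_mode is None:
        gui_mode = GUI_MODE
    draw()  # writes into whatever framebuffer is currently bound
    if gui_mode:
        return None  # the image is already in the caller's framebuffer
    return read_back()  # offline rendering: hand the pixels back to the caller


frame = [0.25, 0.5]
print(render(lambda: None, lambda: frame, gui_mode=True))   # None
print(render(lambda: None, lambda: frame, gui_mode=False))  # [0.25, 0.5]
```

Returning `None` in GUI mode is what makes the direct-to-framebuffer path cheap: nothing is read back to the host, so any caller that post-processes the return value has to handle `None`.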

@@ -83,12 +76,10 @@ Thus, store the concatenated tensors instead and avoid concatenating them in eve
 - [ ] TODO: Warn users if they're not properly precomputing the covariance matrix.
 - [ ] TODO: Implement a more optimized `OptimizedGaussians` that precomputes things and applies a cache, similar to that of the vertex shader (see [Invocation frequency](https://www.khronos.org/opengl/wiki/Vertex_Shader)).
 
-
 **Note: it's recommended to pass in a CPU tensor in the `GaussianRasterizationSettings` to avoid explicit synchronizations for even better performance.**
 
 - [ ] TODO: Add a warning to the user if GPU tensors are detected.
 
-
 **Note: the second output of the `GaussianRasterizer` is not radii anymore (since we're not going to use it for the backward pass), but the alpha values of the rendered image instead.**
 
 And the alpha channel content seems to be bugged currently, will debug.
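The covariance tip and the `OptimizedGaussians` TODO amount to computing the 3D covariance Σ = R S Sᵀ Rᵀ from scales and rotations once and then reusing its 6 unique elements every frame. Below is a minimal pure-Python sketch of that idea, assuming quaternions in (w, x, y, z) order, per-axis scales, and an upper-triangle ordering (xx, xy, xz, yy, yz, zz); these conventions are assumptions, and the class `CachedCovariance` is hypothetical, not the `OptimizedGaussians` class from the TODO. Real code would do the same computation batched in PyTorch.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (w, x, y, z), assumed convention


def quat_to_rotmat(q: Quat) -> List[List[float]]:
    # Normalize first so that non-unit quaternions still yield a rotation.
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance_6(scale: Vec3, rot: Quat) -> Tuple[float, ...]:
    """Upper triangle of Sigma = R S S^T R^T, ordered (xx, xy, xz, yy, yz, zz)."""
    r = quat_to_rotmat(rot)
    s2 = [s * s for s in scale]

    def entry(i: int, j: int) -> float:
        # Sigma_ij = sum_k R_ik * s_k^2 * R_jk
        return sum(r[i][k] * s2[k] * r[j][k] for k in range(3))

    return tuple(entry(i, j) for i, j in ((0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)))


class CachedCovariance:
    """Hypothetical cache: compute the 6 covariance elements once, reuse them every frame."""

    def __init__(self, scales: Sequence[Vec3], rotations: Sequence[Quat]) -> None:
        self.update(scales, rotations)

    def update(self, scales: Sequence[Vec3], rotations: Sequence[Quat]) -> None:
        # Call only when the Gaussians' scales or rotations actually change.
        self.cov6 = [covariance_6(s, q) for s, q in zip(scales, rotations)]

    def get(self) -> List[Tuple[float, ...]]:
        # Per-frame access: no recomputation, just the stored values.
        return self.cov6


cache = CachedCovariance([(1.0, 2.0, 3.0)], [(1.0, 0.0, 0.0, 0.0)])
print(cache.get()[0])  # (1.0, 0.0, 0.0, 4.0, 0.0, 9.0)
```

With the identity rotation the covariance is just diag(s²), which makes the example easy to check by hand. For static scenes `update` runs once, so the per-frame cost drops to passing the stored values along.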
@@ -107,7 +98,7 @@ And the alpha channel content seems to be bugged currently, will debug.
 
 ## Implementation
 
-**Goal:**
+**Guidelines**
 
 - Let the professionals do the work.
 - Let the GPU do the large-scale sorting.
@@ -119,6 +110,15 @@ And the alpha channel content seems to be bugged currently, will debug.
 - Enabled by using `non_blocking=True` data passing and moving sync points as early as possible.
 - Boosted by the fact that we're sorting on the GPU, so there is no need to perform synchronized host-to-device copies.
 
+**Why does a global sort work?**
+
+The OpenGL specification is somewhat vague, but there is this passage
+(in the 4th paragraph of section 2.1 of chapter 2 of this specification: https://registry.khronos.org/OpenGL/specs/gl/glspec44.core.pdf):
+
+> Commands are always processed in the order in which they are received, although there may be an indeterminate delay before the effects of a command are realized. This means, for example, that one primitive must be drawn completely before any subsequent one can affect the framebuffer.
+
+Thus if the order of the data in the vertex buffer (or as specified by an index buffer) is back-to-front, and alpha blending is enabled, you can count on OpenGL to update the framebuffer in the correct back-to-front order.
+
 - [ ] TODO: Expand implementation details.
 
 ## Environment
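To illustrate the "Why does a global sort work?" argument, the sketch below simulates a single pixel on the CPU; it is not OpenGL code. It assumes non-premultiplied "over" blending, as with `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)` (the blend state `fast_gauss` actually uses is not stated here), and that a larger depth means farther from the camera. The splat tuples and helper names are hypothetical. Because primitives are blended strictly in submission order, one global far-to-near sort reproduces an independent front-to-back compositing reference, while an unsorted order generally does not.

```python
from typing import List, Tuple

# Hypothetical splat covering the pixel: (depth, color, alpha); larger depth = farther.
Splat = Tuple[float, float, float]


def blend_over(dst: float, color: float, alpha: float) -> float:
    # One channel of glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA).
    return alpha * color + (1.0 - alpha) * dst


def draw_in_submission_order(splats: List[Splat], background: float) -> float:
    # Each primitive is fully blended before the next one touches the framebuffer.
    dst = background
    for _, color, alpha in splats:
        dst = blend_over(dst, color, alpha)
    return dst


def global_sort_back_to_front(splats: List[Splat]) -> List[Splat]:
    # One global sort: farthest first, so nearer splats are blended on top.
    return sorted(splats, key=lambda s: s[0], reverse=True)


def reference_front_to_back(splats: List[Splat], background: float) -> float:
    # Independent reference: front-to-back compositing with accumulated transmittance.
    color_acc, transmittance = 0.0, 1.0
    for _, color, alpha in sorted(splats, key=lambda s: s[0]):
        color_acc += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    return color_acc + transmittance * background


splats = [(2.0, 1.0, 0.5), (5.0, 0.0, 0.8), (1.0, 0.2, 0.3)]
sorted_result = draw_in_submission_order(global_sort_back_to_front(splats), background=0.0)
unsorted_result = draw_in_submission_order(splats, background=0.0)
print(sorted_result, reference_front_to_back(splats, background=0.0), unsorted_result)
```

The first two printed values agree (about 0.41, up to floating-point rounding), while the unsorted submission order gives about 0.13. On the GPU the same holds per pixel, which is why sorting the vertex (or index) buffer once is sufficient.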
