@mtlprintf #418

Open
wants to merge 3 commits into
base: main
Conversation

tgymnich
Member

@tgymnich tgymnich commented Sep 16, 2024

Implement @mtlprintf and friends using os_log

TODO:

  • Printing floats does not work yet: the vararg calling convention promotes them to doubles, which our IR checker then rejects
  • Version-check the @mtlprintf macro
  • Add tests
  • Capturing and logging are mutually exclusive
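
The float limitation in the first item follows from C's default argument promotions, which apply to every argument passed through a variadic parameter list. The sketch below models those promotions on Julia types. `c_vararg_type` is a hypothetical helper written for illustration and is not part of Metal.jl or GPUCompiler.jl.

```julia
# Hypothetical helper (not part of Metal.jl): models C's default argument
# promotions, which apply to arguments passed through a `...` parameter list.
c_vararg_type(::Type{Float32}) = Float64   # float -> double
c_vararg_type(::Type{T}) where {T<:Union{Bool,Int8,Int16,UInt8,UInt16}} = Cint  # small integers -> int
c_vararg_type(::Type{T}) where {T} = T     # everything else is passed unchanged

# A Float32 argument to a printf-style call ends up as a Float64,
# which is the type the IR checker mentioned above complains about.
promoted = map(c_vararg_type, (Float32, Int32, Int8))
println(promoted)
```

Under this model, any `Float32` handed to a printf-style call reaches the formatter as a `Float64`, so the double appears in the IR even though the user never wrote one.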

depends on: JuliaGPU/GPUCompiler.jl#630
notify: #226

@tgymnich tgymnich self-assigned this Sep 16, 2024
@tgymnich tgymnich marked this pull request as ready for review September 16, 2024 12:12
@tgymnich
Member Author

@maleadt Any idea how we can implement the version check for the @mtlprintf macro? I know we could check the AIR version inside the kernel, but I'd like to avoid that.

Also, can we get rid of the double check in check_ir?

@christiangnrd
Contributor

Would it be worth benchmarking the performance difference between having logging active vs not?

@tgymnich
Member Author

tgymnich commented Sep 17, 2024

Would it be worth benchmarking the performance difference between having logging active vs not?

@christiangnrd Sure. I don't expect there to be much overhead besides allocation of the log buffer and checking it for logs after running a kernel. But we might want to look into only conditionally adding MTLLogState since logging also prevents GPU frame capture.

@christiangnrd christiangnrd mentioned this pull request Sep 17, 2024
2 tasks
@maleadt
Member

maleadt commented Sep 17, 2024

@maleadt Any idea how we can implement the version check for the @mtlprintf macro? I know we could check the AIR version inside the kernel, but I'd like to avoid that.

Given that the macro expands way too early, I don't think there's anything we can do but check in the kernel. Why are you opposed to that? GPUCompiler.jl has infrastructure to optimize those checks away; see, e.g., how CUDA.jl exposes the device capability and PTX ISA version to the kernel.
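
To illustrate why a check at macro-expansion time is too early, here is a minimal host-only sketch that needs no GPU. `TARGET`, `@expansion_time_check` and `@run_time_check` are hypothetical names invented for this example. A check evaluated while the macro expands is frozen into the function, whereas a check emitted into the generated code sees the target version that is actually in effect when the code runs.

```julia
# Hypothetical stand-in for the Metal version a kernel is being compiled for.
const TARGET = Ref(v"3.1")

# Evaluates the check while the macro is being expanded and bakes in the result.
macro expansion_time_check()
    ok = TARGET[] >= v"3.2"
    return ok
end

# Emits the check into the generated code, so it is evaluated when the code runs.
macro run_time_check()
    return :(TARGET[] >= v"3.2")
end

f_early() = @expansion_time_check   # expanded now, while TARGET is still 3.1
f_late()  = @run_time_check

TARGET[] = v"3.2"                   # the target changes after expansion

println("expansion-time check: ", f_early())
println("run-time check:       ", f_late())
```

The early check keeps reporting the stale answer, which is the situation a macro-level macOS version test would be in if the kernel were later compiled for a different Metal version.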

@tgymnich
Member Author

We could also wrap the macro and accompanying functions in if Metal.macos_version() >= v"15".

@christiangnrd
Contributor

If we do that, we should have definitions in both cases and give an informative error if Metal.macos_version() < v"15".
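
A rough sketch of the "definitions in both cases" idea, runnable without Metal.jl. `fake_macos_version` and `@demo_printf` are hypothetical stand-ins for `Metal.macos_version()` and `@mtlprintf`, and the supported branch only prints the format string instead of doing real formatting.

```julia
# Hypothetical stand-in for Metal.macos_version(); here it pretends to be macOS 14.
fake_macos_version() = v"14.5"

if fake_macos_version() >= v"15"
    # Supported platform: a placeholder implementation that prints the format string.
    macro demo_printf(fmt, args...)
        return :(print($(esc(fmt))))
    end
else
    # Unsupported platform: the macro still exists, but its expansion raises a clear error.
    macro demo_printf(fmt, args...)
        return :(error("@demo_printf requires macOS 15 or higher"))
    end
end

# Using the macro on the "old" platform fails with the informative message.
message = try
    @demo_printf("value: %d\n", 42)
    "no error"
catch err
    sprint(showerror, err)
end
println(message)
```

Defining the macro in both branches means user code still parses and expands on older systems, and the failure is an explicit message rather than an undefined-macro error.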

@maleadt
Member

maleadt commented Sep 18, 2024

Actually, looks like I provided the run-time queries already:

@device_function @inline metal_version() = SimpleVersion(metal_major(), metal_minor())
@device_function @inline air_version() = SimpleVersion(air_major(), air_minor())

So we can just use that in the generated code, generating an error when emitting code for an older platform. That of course depends on #416 for meaningful reporting, but we'll get there.

I'd rather not simply check based on the macOS version during macro expansion, since we might want to target older Metal versions than the system supports.

Contributor

@christiangnrd christiangnrd left a comment

Looks good! However, do you know what's causing the tests to hang?

lib/mtl/command_queue.jl (resolved)
docs/src/usage/kernel.md (resolved)
@tgymnich
Member Author

tgymnich commented Sep 19, 2024

Looks good! However, do you know what's causing the tests to hang?

@christiangnrd The hangs are caused by this one line:

@print_and_throw "@mtlprintf requires Metal 3.2 (macOS 15) or higher"

test/output.jl (outdated, resolved)
@christiangnrd
Contributor

christiangnrd commented Sep 21, 2024

@maleadt Could we have one of the Apple Silicon runners upgraded to Sequoia so the output tests don't get ignored? Edit: All the runners are running 13.3.1. Should we also have one on macOS 14?

I would also like to see #420 merged first (with benchmarks run on macOS 15) to see how big the impact of enabling logging is.

@tgymnich
Member Author

@christiangnrd I recently made changes so that logging (e.g. MTLLogState and friends) is only enabled whenever we actually use the feature.

@christiangnrd
Contributor

Just pushed a whitespace-only formatting commit

@christiangnrd
Contributor

@christiangnrd I recently made changes so that logging (e.g. MTLLogState and friends) is only enabled whenever we actually use the feature.

In that case I still think we should be able to test on macOS 15, but I think we should merge this as soon as it's ready.

@maleadt
Member

maleadt commented Sep 23, 2024

I've upgraded one of the workers to macOS 15:
image

See the macos_version tag which can be used to select on this.

@maleadt
Member

maleadt commented Sep 24, 2024

@christiangnrd The hangs are caused by this one line:

@print_and_throw "@mtlprintf requires Metal 3.2 (macOS 15) or higher"

How did this get fixed?

@maleadt
Member

maleadt commented Sep 24, 2024

I've also updated one of the juliaecosystem workers to 15.0, so we can revert to that queue.

image

@christiangnrd
Contributor

How did this get fixed?

I assume by no longer running when macos_version() < 15. I think I got ahead of myself with the review.

@maleadt
Member

maleadt commented Sep 24, 2024

I assume by no longer running when macos_version() < 15.

Right, but that's not great. Does it mean that any kernel using logging output will first generate a non-fatal error message on the host, and then hang in the kernel? And if, on macOS 15, we use (the hypothetical, but useful) @metal metal=v"3.1", would it hang too?

EDIT: suggested capability implemented here: #430

@christiangnrd
Contributor

The following code hangs in the REPL, but not when run using include or when called from the terminal:

using Metal
function f()
    @mtlprintln("Testing...")
    return
end
@metal f()

@maleadt
Member

maleadt commented Sep 24, 2024

The following code hangs in the REPL, but not when run using include or when called from the terminal

Isn't that because in the REPL we force synchronization via an AST transform hook? What happens if you synchronize manually?

@christiangnrd
Contributor

@maleadt When I add Metal.synchronize() to the end of the file it hangs in all situations.

Contributor

@github-actions github-actions bot left a comment

Metal Benchmarks

| Benchmark suite | Current: 0840aa4 | Previous: 8652754 | Ratio |
|---|---|---|---|
| latency/precompile | 4599693584 ns | 4401680834 ns | 1.04 |
| latency/ttfp | 6702643541.5 ns | 6678542687 ns | 1.00 |
| latency/import | 722647167 ns | 721498042 ns | 1.00 |
| integration/metaldevrt | 715958 ns | 708167 ns | 1.01 |
| integration/byval/slices=1 | 1498958.5 ns | 1530625 ns | 0.98 |
| integration/byval/slices=3 | 11746791 ns | 11010542 ns | 1.07 |
| integration/byval/reference | 1489417 ns | 1585084 ns | 0.94 |
| integration/byval/slices=2 | 2602291.5 ns | 2472708 ns | 1.05 |
| kernel/indexing | 464895.5 ns | 454333 ns | 1.02 |
| kernel/indexing_checked | 466812.5 ns | 455667 ns | 1.02 |
| kernel/launch | 8417 ns | 8459 ns | 1.00 |
| array/construct | 27659.666666666668 ns | 27638.916666666664 ns | 1.00 |
| array/broadcast | 460729.5 ns | 464625 ns | 0.99 |
| array/random/randn/Float32 | 804708 ns | 813083 ns | 0.99 |
| array/random/randn!/Float32 | 610958 ns | 634041 ns | 0.96 |
| array/random/rand!/Int64 | 552250 ns | 552750 ns | 1.00 |
| array/random/rand!/Float32 | 581958.5 ns | 577083 ns | 1.01 |
| array/random/rand/Int64 | 795125 ns | 800833.5 ns | 0.99 |
| array/random/rand/Float32 | 599209 ns | 583709 ns | 1.03 |
| array/copyto!/gpu_to_gpu | 639042 ns | 643166.5 ns | 0.99 |
| array/copyto!/cpu_to_gpu | 585875.5 ns | 600020.5 ns | 0.98 |
| array/copyto!/gpu_to_cpu | 736041.5 ns | 777166.5 ns | 0.95 |
| array/accumulate/1d | 1332458 ns | 1334916 ns | 1.00 |
| array/accumulate/2d | 1420438 ns | 1419167 ns | 1.00 |
| array/iteration/findall/int | 2084291.5 ns | 2072542 ns | 1.01 |
| array/iteration/findall/bool | 1812750 ns | 1854833 ns | 0.98 |
| array/iteration/findfirst/int | 1687750 ns | 1674333 ns | 1.01 |
| array/iteration/findfirst/bool | 1644416.5 ns | 1643833 ns | 1.00 |
| array/iteration/scalar | 3675458.5 ns | 3625334 ns | 1.01 |
| array/iteration/logical | 3255666 ns | 3281021 ns | 0.99 |
| array/iteration/findmin/1d | 1615416 ns | 1572104 ns | 1.03 |
| array/iteration/findmin/2d | 1319125 ns | 1325292 ns | 1.00 |
| array/reductions/reduce/1d | 1048770.5 ns | 1055583 ns | 0.99 |
| array/reductions/reduce/2d | 691041.5 ns | 690959 ns | 1.00 |
| array/reductions/mapreduce/1d | 1052625 ns | 1057604.5 ns | 1.00 |
| array/reductions/mapreduce/2d | 694708 ns | 700416.5 ns | 0.99 |
| array/permutedims/4d | 836583 ns | 846917 ns | 0.99 |
| array/permutedims/2d | 846937.5 ns | 856979.5 ns | 0.99 |
| array/permutedims/3d | 922750 ns | 916917 ns | 1.01 |
| array/copy | 610166 ns | 610041 ns | 1.00 |
| metal/synchronization/stream | 14208 ns | 14667 ns | 0.97 |
| metal/synchronization/context | 14500 ns | 14916 ns | 0.97 |

This comment was automatically generated by workflow using github-action-benchmark.
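
As a reading aid, the Ratio column appears to be the current timing divided by the previous one, rounded to two decimals. This is an inference from the numbers rather than something the bot states. A small check on three rows copied from the table:

```julia
# Rows copied from the benchmark table above: (name, current ns, previous ns).
rows = [
    ("latency/precompile", 4599693584, 4401680834),
    ("integration/byval/slices=3", 11746791, 11010542),
    ("integration/byval/reference", 1489417, 1585084),
]

# Assumed definition: ratio = current / previous, rounded to two decimals.
# Timings are in ns, so a ratio above 1 means the current commit is slower.
ratios = [(name, round(cur / prev; digits=2)) for (name, cur, prev) in rows]

for (name, r) in ratios
    println(rpad(name, 30), r)
end
```

The computed values match the 1.04, 1.07 and 0.94 reported for these rows.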

@tgymnich
Member Author

I opened an issue for the hang: #433

@maleadt maleadt force-pushed the os-log branch 3 times, most recently from f70ccac to 95e47f1 Compare October 1, 2024 20:15
@maleadt
Member

maleadt commented Oct 1, 2024

On the assumption that the conditional @print_and_throw generating just a trap on macOS 14 is what caused the hangs here, I simplified the logic so that the kernel launch code simply errors when unsupported logging is used. However, that does not fix the issue. In fact, even on my now-upgraded macOS 15 installation, a simple kernel doing I/O hangs...

julia> using Metal

julia> function kernel()
           @mtlprint("Hello, World\n")
           return
       end
kernel (generic function with 1 method)

julia> Metal.@sync @metal kernel()
Hello, World

# hang

The Metal.@sync is there just to illustrate what the AST transform hook is doing behind the scenes. So I guess we'll have to fix #433 first, with the above being another data point that the generated IR is not necessarily the issue (which #433 (comment) already hinted at).

@maleadt maleadt added enhancement kernels Things about kernels and how they are compiled. labels Oct 1, 2024
Labels
kernels Things about kernels and how they are compiled.
3 participants