
StackOverflow when used with Flux #1823

Open
BioTurboNick opened this issue Sep 14, 2024 · 2 comments

Comments

@BioTurboNick

I don't know where to begin with troubleshooting or making a minimal example, or what a more specific title would be. This is my first time trying Enzyme.

I changed the Flux train! function from:

Flux.train!(network, (training_data,), opt_state)

to:

Flux.train!(Duplicated(network, make_zero(network)), (training_data,), opt_state)

But I'm not sure that's correct; documentation on this usage is a bit sparse.
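For context, a minimal sketch of the call pattern I'm attempting (the model, data, and loss here are hypothetical stand-ins, assuming Flux 0.14 with the Enzyme extension loaded):

```julia
using Flux, Enzyme
using Enzyme: make_zero

# Hypothetical toy model and data, just to illustrate the call shape.
network = Chain(Dense(4 => 8, relu), Dense(8 => 1))
x = rand(Float32, 4, 16)
y = rand(Float32, 1, 16)
loss(m, x, y) = Flux.mse(m(x), y)

opt_state = Flux.setup(Adam(1f-3), network)

# Wrapping the model in Duplicated with a zeroed shadow is what selects
# the Enzyme-based gradient path inside Flux.train!.
Flux.train!(loss, Duplicated(network, make_zero(network)), [(x, y)], opt_state)
```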

Also, I'm not sure why only 22 frames are shown.

ERROR: StackOverflowError:
Stacktrace:
  [1] LLVMRunPassManager
    @ C:\Users\nicho\.julia\packages\LLVM\UqMfW\lib\15\libLLVM.jl:3385 [inlined]
  [2] run!
    @ C:\Users\nicho\.julia\packages\LLVM\UqMfW\src\passmanager.jl:39 [inlined]
  [3] #18868
    @ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler\optimize.jl:2010 [inlined]        
  [4] LLVM.ModulePassManager(::Enzyme.Compiler.var"#18868#18875"{LLVM.Module}; kwargs::@Kwargs{})
    @ LLVM C:\Users\nicho\.julia\packages\LLVM\UqMfW\src\passmanager.jl:33
  [5] ModulePassManager
    @ C:\Users\nicho\.julia\packages\LLVM\UqMfW\src\passmanager.jl:30 [inlined]
  [6] removeDeadArgs!(mod::LLVM.Module, tm::LLVM.TargetMachine)
    @ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler\optimize.jl:2008  
  [7] post_optimze!(mod::LLVM.Module, tm::LLVM.TargetMachine, machine::Bool)
    @ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler\optimize.jl:2283  
  [8] post_optimze!(mod::LLVM.Module, tm::LLVM.TargetMachine)
    @ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler\optimize.jl:2282  
  [9] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
    @ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7260
 [10] _thunk
    @ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7241 [inlined]
 [11] cached_compilation
    @ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7282 [inlined]
 [12] thunkbase(ctx::LLVM.Context, mi::Core.MethodInstance, ::Val{…}, ::Type{…}, ::Type{…}, tt::Type{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Type{…}, ::Val{…})
    @ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7355
 [13] #s2055#19000
    @ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7407 [inlined]
 [14]
    @ Enzyme.Compiler .\none:0
 [15] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core .\boot.jl:602
 [16] autodiff(::ReverseMode{…}, ::Const{…}, ::Type{…}, ::Const{…}, ::Duplicated{…}, ::Const{…}, ::Const{…})
    @ Enzyme C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\Enzyme.jl:263
 [17] autodiff
    @ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\Enzyme.jl:332 [inlined]
 [18] macro expansion
    @ C:\Users\nicho\.julia\packages\Flux\HBF2N\ext\FluxEnzymeExt\FluxEnzymeExt.jl:34 [inlined]  
 [19] macro expansion
    @ C:\Users\nicho\.julia\packages\ProgressLogging\6KXlp\src\ProgressLogging.jl:328 [inlined]  
 [20] train!(loss::Function, model::Duplicated{…}, data::Tuple{…}, opt::@NamedTuple{…}; cb::Nothing)
    @ FluxEnzymeExt C:\Users\nicho\.julia\packages\Flux\HBF2N\ext\FluxEnzymeExt\FluxEnzymeExt.jl:30
 [21] train!(loss::Function, model::Duplicated{DecodeNet{…}}, data::Tuple{Tuple{…}}, opt::@NamedTuple{arch::@NamedTuple{…}})
    @ FluxEnzymeExt C:\Users\nicho\.julia\packages\Flux\HBF2N\ext\FluxEnzymeExt\FluxEnzymeExt.jl:27
 [22] train_network(name::String; learning_rate_schedule::Vector{…}, training_batch_size::Int64, evaluation_batch_size::Int64, iters_per_eval::Int64, seed::Int64, decode::Bool, wandb::Bool)     
    @ Main c:\Users\nicho\Repos\DeepLoco.jl\src\train.jl:213
@wsmoses
Member

wsmoses commented Sep 15, 2024

Can you include complete runnable code so we can try to reproduce this? Could you also share your OS and package versions?

@BioTurboNick
Author

I'll see if I can boil down a MWE. In the meantime:

Julia Version 1.10.5
Commit 6f3fdf7b36 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 12 × Snapdragon(R) X 12-core X1E80100 @ 3.40 GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, bdver1)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS =

Status `~/Repos/DeepLoco.jl/Project.toml`
⌅ [052768ef] CUDA v5.4.3
  [082447d4] ChainRules v1.71.0
  [992eb4ea] CondaPkg v0.2.23
  [b4f34e82] Distances v0.10.11
  [31c24e10] Distributions v0.25.111
⌃ [7da242da] Enzyme v0.12.36
  [587475ba] Flux v0.14.19
  [033835bb] JLD2 v0.5.2
  [f1d291b0] MLUtils v0.4.4
  [91a5bcdd] Plots v1.40.8
  [6099a3de] PythonCall v0.9.23
  [e88e6eb3] Zygote v0.6.70
  [02a925ec] cuDNN v1.3.2
  [37e2e46d] LinearAlgebra
  [9a3f8284] Random
  [10745b16] Statistics v1.10.0

I see Enzyme just recently bumped to 0.13, but it seems Flux doesn't support it yet. Also, I realize this is x86_64 Julia running in emulation on Windows on ARM; I'll try AArch64 Julia via WSL later.
