
StackOverflow when used with Flux #1823

Open
BioTurboNick opened this issue Sep 14, 2024 · 2 comments

Comments

@BioTurboNick

I don't know where to begin with troubleshooting or making a minimal example, or what a more specific title would be. This is my first time trying Enzyme.

I changed the Flux train! function from:

Flux.train!(network, (training_data,), opt_state)

to:

Flux.train!(Duplicated(network, make_zero(network)), (training_data,), opt_state)

But I'm not sure that's correct; documentation on this usage is a bit sparse.
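For context, a minimal sketch of the call pattern I'm attempting (the model, data, and loss here are hypothetical stand-ins, assuming Flux 0.14 with the Enzyme extension loaded):

```julia
using Flux, Enzyme
using Enzyme: make_zero

# Hypothetical toy model and data, just to illustrate the call shape.
network = Chain(Dense(4 => 8, relu), Dense(8 => 1))
x = rand(Float32, 4, 16)
y = rand(Float32, 1, 16)
loss(m, x, y) = Flux.mse(m(x), y)

opt_state = Flux.setup(Adam(1f-3), network)

# Wrapping the model in Duplicated with a zeroed shadow is what selects
# the Enzyme-based gradient path inside Flux.train!.
Flux.train!(loss, Duplicated(network, make_zero(network)), [(x, y)], opt_state)
```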

Also, I'm not sure why only 22 frames are shown.

ERROR: StackOverflowError:
Stacktrace:
  [1] LLVMRunPassManager
    @ C:\Users\nicho\.julia\packages\LLVM\UqMfW\lib\15\libLLVM.jl:3385 [inlined]
  [2] run!
    @ C:\Users\nicho\.julia\packages\LLVM\UqMfW\src\passmanager.jl:39 [inlined]
  [3] #18868
    @ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler\optimize.jl:2010 [inlined]        
  [4] LLVM.ModulePassManager(::Enzyme.Compiler.var"#18868#18875"{LLVM.Module}; kwargs::@Kwargs{})
    @ LLVM C:\Users\nicho\.julia\packages\LLVM\UqMfW\src\passmanager.jl:33
  [5] ModulePassManager
    @ C:\Users\nicho\.julia\packages\LLVM\UqMfW\src\passmanager.jl:30 [inlined]
  [6] removeDeadArgs!(mod::LLVM.Module, tm::LLVM.TargetMachine)
    @ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler\optimize.jl:2008  
  [7] post_optimze!(mod::LLVM.Module, tm::LLVM.TargetMachine, machine::Bool)
    @ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler\optimize.jl:2283  
  [8] post_optimze!(mod::LLVM.Module, tm::LLVM.TargetMachine)
    @ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler\optimize.jl:2282  
  [9] _thunk(job::GPUCompiler.CompilerJob{Enzyme.Compiler.EnzymeTarget, Enzyme.Compiler.EnzymeCompilerParams}, postopt::Bool)
    @ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7260
 [10] _thunk
    @ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7241 [inlined]
 [11] cached_compilation
    @ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7282 [inlined]
 [12] thunkbase(ctx::LLVM.Context, mi::Core.MethodInstance, ::Val{…}, ::Type{…}, ::Type{…}, tt::Type{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Val{…}, ::Type{…}, ::Val{…})
    @ Enzyme.Compiler C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7355
 [13] #s2055#19000
    @ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\compiler.jl:7407 [inlined]
 [14]
    @ Enzyme.Compiler .\none:0
 [15] (::Core.GeneratedFunctionStub)(::UInt64, ::LineNumberNode, ::Any, ::Vararg{Any})
    @ Core .\boot.jl:602
 [16] autodiff(::ReverseMode{…}, ::Const{…}, ::Type{…}, ::Const{…}, ::Duplicated{…}, ::Const{…}, ::Const{…})
    @ Enzyme C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\Enzyme.jl:263
 [17] autodiff
    @ C:\Users\nicho\.julia\packages\Enzyme\TiboG\src\Enzyme.jl:332 [inlined]
 [18] macro expansion
    @ C:\Users\nicho\.julia\packages\Flux\HBF2N\ext\FluxEnzymeExt\FluxEnzymeExt.jl:34 [inlined]  
 [19] macro expansion
    @ C:\Users\nicho\.julia\packages\ProgressLogging\6KXlp\src\ProgressLogging.jl:328 [inlined]  
 [20] train!(loss::Function, model::Duplicated{…}, data::Tuple{…}, opt::@NamedTuple{…}; cb::Nothing)
    @ FluxEnzymeExt C:\Users\nicho\.julia\packages\Flux\HBF2N\ext\FluxEnzymeExt\FluxEnzymeExt.jl:30
 [21] train!(loss::Function, model::Duplicated{DecodeNet{…}}, data::Tuple{Tuple{…}}, opt::@NamedTuple{arch::@NamedTuple{…}})
    @ FluxEnzymeExt C:\Users\nicho\.julia\packages\Flux\HBF2N\ext\FluxEnzymeExt\FluxEnzymeExt.jl:27
 [22] train_network(name::String; learning_rate_schedule::Vector{…}, training_batch_size::Int64, evaluation_batch_size::Int64, iters_per_eval::Int64, seed::Int64, decode::Bool, wandb::Bool)     
    @ Main c:\Users\nicho\Repos\DeepLoco.jl\src\train.jl:213
@wsmoses
Member

wsmoses commented Sep 15, 2024

Can you include complete runnable code so we can try to reproduce this? Could you also share your OS and package versions?

@BioTurboNick
Author

I'll see if I can boil down a MWE. In the meantime:

Julia Version 1.10.5
Commit 6f3fdf7b36 (2024-08-27 14:19 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 12 × Snapdragon(R) X 12-core X1E80100 @ 3.40 GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, bdver1)
Threads: 1 default, 0 interactive, 1 GC (on 12 virtual cores)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS =

Status `~/Repos/DeepLoco.jl/Project.toml`
⌅ [052768ef] CUDA v5.4.3
  [082447d4] ChainRules v1.71.0
  [992eb4ea] CondaPkg v0.2.23
  [b4f34e82] Distances v0.10.11
  [31c24e10] Distributions v0.25.111
⌃ [7da242da] Enzyme v0.12.36
  [587475ba] Flux v0.14.19
  [033835bb] JLD2 v0.5.2
  [f1d291b0] MLUtils v0.4.4
  [91a5bcdd] Plots v1.40.8
  [6099a3de] PythonCall v0.9.23
  [e88e6eb3] Zygote v0.6.70
  [02a925ec] cuDNN v1.3.2
  [37e2e46d] LinearAlgebra
  [9a3f8284] Random
  [10745b16] Statistics v1.10.0

I see Enzyme just recently bumped to 0.13, but it seems Flux doesn't support it yet. Also, I realize this is x86_64 Julia running in emulation on Windows on ARM; I'll try AArch64 Julia via WSL later.
