file-embed: Additions to extra-source-files don't trigger rebuild of module #6689

Open
andreasabel opened this issue Feb 21, 2025 · 11 comments

andreasabel commented Feb 21, 2025

Report against Stack-3.3.1 (latest).
I noticed that Stack does not seem to honor the extra-source-files when deciding whether to rebuild a project, in contrast to Cabal which does so.

This is relevant when using file-embed to embed files into the executable. Changes to the embedded files should trigger a recompilation to keep the executable up-to-date.

E.g. I want that these files are always embedded in their latest version:

extra-source-files:
    data/README.md
    data/sub/Foo.md

I added the second one in a second step. I see the following behavioral difference between Stack and Cabal:

$ stack run                                                                              ✔ │ ghc-9.4.8 hs │ 10:06:41 
File: README.md
Content: "This is a test of the `file-embed` package,\napplied to the `data-dir` of a Haskell project.\n"

$ cabal run                                                                              ✔ │ ghc-9.4.8 hs │ 10:14:10 
File: README.md
Content: "This is a test of the `file-embed` package,\napplied to the `data-dir` of a Haskell project.\n"
File: sub/Foo.md
Content: "This is a file in a subdirectory.\nIt contains 2 lines.\n"

WANT: Changes to the amount and content of the extra-source-files should trigger a rebuild.

I attach a reproducer to play with.

embed-datadir-0.1.tar.gz

andreasabel added a commit to agda/agda that referenced this issue Feb 21, 2025
Via Template Haskell we embed the files in `src/data` as `ByteString`s
into the Agda binary.

They will be dumped into `$AGDA_DIR/share/$VERSION` when Agda starts
and this directory does not exist, or via `agda --setup`.

The $Agda_datadir becomes obsolete.

Agda needs to be recompiled now when a datafile is added or changed.
Under Cabal, this is taken care of by their listing in
`extra-source-files`, Stack however needs some explicit nudge to
recompile (see commercialhaskell/stack#6689).
andreasabel added a commit to agda/agda that referenced this issue Feb 22, 2025
@mpilgrem
Member

mpilgrem commented Feb 22, 2025

Possibly relevant history:

The Cabal user guide currently documents that both extra-files (added in cabal-version: 3.14) and extra-source-files add files to an sdist archive, but the latter are also tracked by cabal build (first documented with Cabal 3.10, introduced in Cabal 3.4:

).

@mpilgrem
Member

mpilgrem commented Feb 22, 2025

On Windows 11, I can't reproduce stack build (or stack run) not respecting changes to extra-source-files, per se:

❯ stack run
testESF-0.1.0.0: unregistering (local file changes: README.md)
testESF> build (lib + exe) with ghc-9.8.4

Also, I can't reproduce§ with package file-embed and $(embedDir "data") with lts-23.9 (GHC 9.8.4, template-haskell-2.21.0.0, file-embed-0.0.16.0) or lts-21.25 (GHC 9.4.8, template-haskell-2.19.0.0, file-embed-0.0.15.0). In each case, stack build recognises a change to README.md and embeds the changed file into the built executable.

§ EDIT: OK, I now see there is a problem when a new file is added to extra-source-files and Main.hs has already been compiled. I see that the Haddock documentation for embedFileIfExists has:

Warning: When a build is compiled with the file missing, a recompile when the file exists might not trigger an embed of the file. You might try to fix this by doing a clean build.

I am wondering if that is a more general problem with file-embed.

EDIT2: If I understand correctly, file-embed-0.0.16.0 has:

pairToExp :: FilePath -> (FilePath, B.ByteString) -> Q Exp
pairToExp _root (path, bs) = do
#if MIN_VERSION_template_haskell(2,7,0)
    qAddDependentFile $ _root ++ '/' : path -- <<< This tells GHC to track changes to file contents
#endif
    exp' <- bsToExp bs
    return $! TupE
#if MIN_VERSION_template_haskell(2,16,0)
      $ map Just
#endif
      [LitE $ StringL path, exp']

So, if a file is added subsequently: [1] GHC does not know to recompile Main.hs; and [2] Stack has no reason to recompile Main.hs either (as it has not changed).
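
To make the gap concrete, here is a minimal sketch of a directory-embedding splice. It is not file-embed's actual code: `embedDirSketch` and the module name `EmbedSketch` are hypothetical, it is non-recursive, and it embeds file contents as `String` literals rather than `ByteString`s. It relies only on `addDependentFile` and `runIO` from template-haskell plus the directory and filepath packages.

```haskell
module EmbedSketch (embedDirSketch) where

import Control.Monad (filterM)
import Language.Haskell.TH (Exp, Q, listE, runIO, stringE, tupE)
import Language.Haskell.TH.Syntax (addDependentFile)
import System.Directory (doesFileExist, listDirectory)
import System.FilePath ((</>))

-- | Hypothetical, simplified stand-in for a directory-embedding splice.
-- The generated expression has type [(FilePath, String)].
embedDirSketch :: FilePath -> Q Exp
embedDirSketch dir = do
  -- The directory is listed once, at the moment the splice runs.
  names <- runIO (listDirectory dir)
  files <- runIO (filterM (\n -> doesFileExist (dir </> n)) names)
  listE (map embedOne files)
  where
    embedOne name = do
      let path = dir </> name
      -- Only files that exist right now are registered as dependencies.
      -- A file created later is unknown to GHC, so nothing marks the
      -- using module as out of date when it appears.
      addDependentFile path
      contents <- runIO (readFile path)
      -- Force the lazy read so the file handle is closed promptly.
      length contents `seq` tupE [stringE name, stringE contents]
```

A module using it would enable `TemplateHaskell` and write something like `$(embedDirSketch "data")`. Editing a listed file invalidates that module; adding a new file to the directory does not.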

mpilgrem changed the title from "Changes to the extra-source-files do not trigger a rebuild" to "file-embed: Additions to extra-source-files don't trigger rebuild of module" on Feb 22, 2025
@andreasabel
Author

@mpilgrem Thanks for the analysis!

I see that you flagged this as resolution: upstream issue, does this mean that you think that the problem is solely in file-embed?
Then one would wonder why Cabal can do this correctly, but Stack cannot.

It seems that changes to extra-source-files are more aggressively triggering rebuilds in Cabal compared to Stack.
This was perhaps annoying at the time when extra-doc-files (and now extra-files) did not exist, so folks had to abuse extra-source-files just to add files to the sdist tarball.

Just from a theoretical perspective, file-embed by itself has no means to get the case right when files are added to a directory embedded by embedDir. It can, when executed, mark all the files in the directory as involved in the build (qAddDependentFile), so that changes to these files will trigger a rebuild subsequently. But as it cannot know the future, it cannot prepare for files that are added to the directory. So in this case, the trigger for a rebuild has to come externally, namely from the build orchestrator (Stack or Cabal) that sees a new entry in the file list denoted by extra-source-files (so, after expansion of wildcards) and will force a total rebuild.
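
The external trigger described above can be sketched as a comparison of two file lists. This is purely illustrative and is neither Stack's nor Cabal's actual rebuild logic; the `Decision` type and the `decide` function are hypothetical names, and the lists are assumed to be the already wildcard-expanded extra-source-files entries from the previous and the current build.

```haskell
import qualified Data.Set as Set

-- | What a hypothetical build orchestrator decides after re-reading the
-- package description.
data Decision
  = NoForcedRecompile
  | ForceRecompile [FilePath]  -- ^ newly listed files that no module tracks yet
  deriving (Eq, Show)

-- | Compare the expanded extra-source-files list of the previous build
-- with the current one. Any newly listed file forces recompilation,
-- because no earlier splice can have registered it as a dependency.
decide :: [FilePath] -> [FilePath] -> Decision
decide previous current
  | null added = NoForcedRecompile
  | otherwise  = ForceRecompile added
  where
    added =
      Set.toAscList (Set.fromList current `Set.difference` Set.fromList previous)

main :: IO ()
main = do
  print (decide ["data/README.md"] ["data/README.md", "data/sub/Foo.md"])
  print (decide ["data/README.md"] ["data/README.md"])
```

In this sketch, changes to the contents of already listed files are deliberately left out, since those are covered by the dependencies the splice registered when it last ran.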

I see that you found the respective PR for cabal:

So you are likely aware of all of this already; I just wanted to write it down for the sake of clarity.

@mpilgrem
Member

@andreasabel, I think Cabal (the tool) (version 3.14.1.0) has identical behaviour to Stack's, given the following experiment (no package.yaml file):

❯ stack exec --no-ghc-package-path -- cabal --version # Record Cabal version ...
cabal-install version 3.14.1.0
compiled using version 3.14.1.0 of the Cabal library

❯ stack exec --no-ghc-package-path -- ghc --version # Record GHC version in environment ...
The Glorious Glasgow Haskell Compilation System, version 9.8.4

❯ stack exec --no-ghc-package-path -- cabal run # First build of embed-datadir-0.1
Resolving dependencies...
Build profile: -w ghc-9.8.4 -O1
In order, the following will be built (use -v for more details):
 - embed-datadir-0.1 (exe:embed-datadir) (first run)
Configuring executable 'embed-datadir' for embed-datadir-0.1...
Preprocessing executable 'embed-datadir' for embed-datadir-0.1...
Building executable 'embed-datadir' for embed-datadir-0.1...
[1 of 2] Compiling Main             ( Main.hs, dist-newstyle\build\x86_64-windows\ghc-9.8.4\embed-datadir-0.1\x\embed-datadir\build\embed-datadir\embed-datadir-tmp\Main.o )
[2 of 2] Compiling Paths_embed_datadir ( dist-newstyle\build\x86_64-windows\ghc-9.8.4\embed-datadir-0.1\x\embed-datadir\build\embed-datadir\autogen\Paths_embed_datadir.hs, dist-newstyle\build\x86_64-windows\ghc-9.8.4\embed-datadir-0.1\x\embed-datadir\build\embed-datadir\embed-datadir-tmp\Paths_embed_datadir.o )
[3 of 3] Linking dist-newstyle\\build\\x86_64-windows\\ghc-9.8.4\\embed-datadir-0.1\\x\\embed-datadir\\build\\embed-datadir\\embed-datadir.exe
File: README.md
Content: "This is a test of the `file-embed` package,\napplied to the `data-dir` of a Haskell project.\n"
File: sub\Foo.md
Content: "This is a file in a subdirectory.\nIt contains 2 lines.\n"

❯ # Add new Foo2.md to data and *.cabal file

❯ stack exec --no-ghc-package-path -- cabal run # Rebuild, but no recompilation of Main ...
Resolving dependencies...
Build profile: -w ghc-9.8.4 -O1
In order, the following will be built (use -v for more details):
 - embed-datadir-0.1 (exe:embed-datadir) (configuration changed)
Configuring executable 'embed-datadir' for embed-datadir-0.1...
Preprocessing executable 'embed-datadir' for embed-datadir-0.1...
Building executable 'embed-datadir' for embed-datadir-0.1...
File: README.md
Content: "This is a test of the `file-embed` package,\napplied to the `data-dir` of a Haskell project.\n"
File: sub\Foo.md
Content: "This is a file in a subdirectory.\nIt contains 2 lines.\n"

❯ stack exec --no-ghc-package-path -- cabal clean # Start afresh ...

❯ stack exec --no-ghc-package-path -- cabal run # Now new Foo2.md is recognised ...
Resolving dependencies...
Build profile: -w ghc-9.8.4 -O1
In order, the following will be built (use -v for more details):
 - embed-datadir-0.1 (exe:embed-datadir) (first run)
Configuring executable 'embed-datadir' for embed-datadir-0.1...
Preprocessing executable 'embed-datadir' for embed-datadir-0.1...
Building executable 'embed-datadir' for embed-datadir-0.1...
[1 of 2] Compiling Main             ( Main.hs, dist-newstyle\build\x86_64-windows\ghc-9.8.4\embed-datadir-0.1\x\embed-datadir\build\embed-datadir\embed-datadir-tmp\Main.o )
[2 of 2] Compiling Paths_embed_datadir ( dist-newstyle\build\x86_64-windows\ghc-9.8.4\embed-datadir-0.1\x\embed-datadir\build\embed-datadir\autogen\Paths_embed_datadir.hs, dist-newstyle\build\x86_64-windows\ghc-9.8.4\embed-datadir-0.1\x\embed-datadir\build\embed-datadir\embed-datadir-tmp\Paths_embed_datadir.o )
[3 of 3] Linking dist-newstyle\\build\\x86_64-windows\\ghc-9.8.4\\embed-datadir-0.1\\x\\embed-datadir\\build\\embed-datadir\\embed-datadir.exe
File: README.md
Content: "This is a test of the `file-embed` package,\napplied to the `data-dir` of a Haskell project.\n"
File: sub\Foo.md
Content: "This is a file in a subdirectory.\nIt contains 2 lines.\n"
File: sub\Foo2.md
Content: "Added\r\n"

That would be consistent with the documentation in the Cabal User Guide, which says only 'partial' rebuilds:

... Files listed here are tracked by cabal build; changes in these files cause (partial) rebuilds.

@andreasabel
Author

Thanks for clarifying this!

I think this is a bug in Cabal too, so I reported the issue:

@andreasabel
Author

That would be consistent with the documentation in the Cabal User Guide, which says only 'partial' rebuilds:

... Files listed here are tracked by cabal build; changes in these files cause (partial) rebuilds.

Ha, I added this piece of documentation myself. But I guess I did not mean anything specific by "partial". I think I wanted to relativize "rebuilds" a bit since I had (and have) no clear model of Cabal's rebuild logic.

@mpilgrem
Member

@andreasabel, you asked about my labelling this as an 'upstream issue'. It seems to me that both Stack and Cabal (the tool) are behaving as expected:

  • they are rebuilding when a new file is added to extra-source-files;
  • they are recompiling a built module when the past use of qAddDependentFile has told GHC that the module depends on a file and that file's contents have changed subsequently; and
  • they are not recompiling a built module when there is nothing to trigger a recompilation.

It also seems to me that what you are experiencing is a known limitation of file-embed, albeit one that is not documented as well in that package's Haddock documentation as it might be.

My immediate solution would be:

  • to improve Stack's online documentation to cover this use case (that is in hand); and
  • to raise a pull request to improve the Haddock documentation of the file-embed package.

@andreasabel
Author

Ah thanks. I think I have misunderstood what triggering a rebuild can do.

So then using embedDir might not be a good idea. In my actual use case, I already departed from this, but I thought I was dealing with a limitation of Stack/Cabal.

@andreasabel
Author

Oops, I closed this by accident, and reopening an issue is not permitted here.
But I am ok either way.

@mpilgrem
Member

@andreasabel, the solution (of sorts) is to add ghc-options: -fforce-recomp to the package description of the package making use of file-embed's embedDir function.
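
A narrower variant of the same idea, not proposed in this thread and offered only as an assumption worth verifying, is to set the flag per module with an `OPTIONS_GHC` pragma, so that only the module containing the splice is always recompiled. The module below is a hypothetical reconstruction of what a reproducer's Main.hs might look like, not the attached one. It assumes the file-embed and bytestring packages are listed in build-depends and that a `data` directory exists in the package root.

```haskell
{-# LANGUAGE TemplateHaskell #-}
-- Assumption: recompile this module whenever GHC is invoked for the
-- package, instead of forcing recompilation of every module.
{-# OPTIONS_GHC -fforce-recomp #-}
module Main (main) where

import qualified Data.ByteString as B
import Data.FileEmbed (embedDir)
import Data.Foldable (for_)

-- | All files under data/, captured each time this module is compiled.
embedded :: [(FilePath, B.ByteString)]
embedded = $(embedDir "data")

main :: IO ()
main = for_ embedded $ \(path, content) -> do
  putStrLn ("File: " ++ path)
  putStrLn ("Content: " ++ show content)
```

Either form only helps when the build tool actually invokes GHC for the package, which is what a new entry in extra-source-files causes; the cost is that the module is recompiled on every such build.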
