Add in-image book reader #222

Open
wants to merge 32 commits into
base: master

LinqLover
Collaborator

About this version of the book

Squeak by Example is a classic book intended primarily for PDF/print format. Nevertheless, we made some effort to make it available right in Squeak. For this, we wrote our own LaTeX parser that uses Sandblocks' Tree-sitter integration and a LaTeX grammar for tree-sitter to convert over 10,000 SLOC of text into Squeak objects (don't try this at home!). You can read the result in this help structure or search it using the search bar at the top. Note that this parser is meant to be an approximation only, and to save our own sanity we didn't get every space right. Nevertheless, if you've got any questions or feedback, you can reach us at our GitHub repo: https://github.com/hpi-swa-lab/SqueakByExample-english

Quick start to try out the result (works on both Squeak 6.0 and Squeak 6.1Alpha):

"Load the SBE-Book package via Metacello."
Metacello new
	baseline: 'SBE';
	repository: 'github://hpi-swa-lab/SqueakByExample-english:extract-book/SmalltalkSources';
	get;
	load: #'SBE-Book'.

"Download the prebuilt book file (sbe60.sbebook) and hand it to SBEHelp."
book := (Smalltalk classNamed: #SBEBook) readFrom: (WebClient httpGet: 'https://github.com/LinqLover/SqueakByExample-english/raw/extract-book-builds/sbe60.sbebook') content.
(Smalltalk classNamed: #SBEHelp) book: book.
"Open the help browser on the book."
HelpBrowser open model showTopicNamed: #SBEHelp.

Implementation

This PR adds two (and a half) packages:

  • SBE-Book contains the DOM for the parsed book: a node hierarchy of parts, chapters, etc.; a HelpTopic adapter; and a couple of custom TextAttributes for UI-theme-dependent styling and serialization.
    • SBE-BookCompatibility-Squeak60 provides extension methods for compatibility.
  • SBE-ExtractBook uses Sandblocks' DomainCode interface to parse the SBE LaTeX sources with a tree-sitter grammar and compile them into the DOM structure.

A note on cost-benefit:

  • LaTeX parsing works, to some extent. As described in SBELatexBookExtractor class>>#todo, all of this is very challenging because the grammar we use is imperfect and LaTeX is not context-free by nature. E.g., a single \ct{$} anywhere in a file results in an unsound AST for the entire chapter (which remains accessible thanks to sb-tree-sitter).

  • I personally do not set any expectations regarding the internal quality of this component. It does not seem worth the effort to me; for now, I was mainly interested in the output.

  • Running the extractor yourself requires some dependencies that have not yet been merged upstream. The extractor is not compatible with Squeak 6.0.

  • I did a lot of fine-tuning to get the extractor to parse this particular book into a somewhat acceptable form. Still, this is very time-consuming, and there are several things that I left out for now. This includes:

    • Occasional unparsed LaTeX fragments in some sections
    • A few incomplete/missing code snippets due to escaping issues
    • Many redundant/missing spaces and some surprising linebreaks
    • Currently unsupported concepts, e.g., tables, index, bibliography, real footnotes

    Nevertheless, the large majority of pages render correctly and look somewhat nice, and it means we can finally read/search/analyze Squeak by Example in the image!
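
To illustrate the \ct{$} problem mentioned in the list above, here is a minimal workspace sketch. It is not code from SBE-ExtractBook and does not use tree-sitter; it only assumes (my assumption, not confirmed by the extractor or the grammar) that the trouble comes from a scanner treating every $ as a math-mode delimiter. The variable names and the sample string are hypothetical.

```smalltalk
"Hypothetical sketch: a naive scanner that toggles math mode on every $.
 This is an assumed failure mode, not the actual grammar's behavior."
| source inMath mathChars |
source := 'The literal \ct{$} a is a Character. All later prose is swallowed.'.
inMath := false.
mathChars := 0.
source do: [:char |
	char = $$
		ifTrue: [inMath := inMath not]
		ifFalse: [inMath ifTrue: [mathChars := mathChars + 1]]].
"Answer whether the scanner is still in math mode at the end,
 and how many characters it misread as math."
{inMath. mathChars}
```

Because the sample contains a single $, the scanner is still in math mode when the string ends, so everything after the $ is misclassified. An extractor would need context (here, being inside \ct{...}) to decide that this $ is a literal character.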

I may do some very minor tweaking of the UI, but 80% of the project is done and I am currently not planning to address the last 19.5%. However, if you find any urgent issues, I will be happy to take a look at them. Otherwise, I'd like to release this soon. Wdyt, can we merge this and ship it to our students? :-)

@LinqLover LinqLover self-assigned this Nov 21, 2023
@LinqLover
Collaborator Author

LinqLover commented Nov 25, 2023

Open todo:

  • Run extractor on CI
  • Document the installation of the reader in the readme and/or in the book. Make clear that the extractor is optional and should not hinder the overall development/authoring process.

Member

@codeZeilen codeZeilen left a comment

Impressive work! :) I can barely comment on the details, but what I reviewed looked good to me.

@LinqLover
Collaborator Author

@codeZeilen Thank you! :-) I'm not in a hurry to merge this and will point students to this branch in the meantime. In the long term, would you agree with including the extractor in the CI and mentioning the in-image reader in the readme and in the preface of the book? If yes, would we want to do this before or after archiving the 6.0 version on a separate branch (similar to #130)? :)

@LinqLover
Collaborator Author

Same question for #225. My $0.02: Given that we still ship the 6.0 version to students, it makes sense to me to keep it maintained in the default branch until we release the next edition. Maybe we would even want to release a 6.0.1 edition at some point, but not necessarily. Wdyt?
