Switch to Pandoc (commonmark-hs) #137

Closed
srid opened this issue Apr 25, 2020 · 17 comments · Fixed by #166

Comments

@srid
Owner

srid commented Apr 25, 2020

Requests for Pandoc support have come up a few times.

Let's switch to Pandoc commonmark-hs (which Pandoc will eventually be using as its markdown parser).

In order to migrate away from MMark to Pandoc, we will have to rewrite the replaceLink extension, which I've refactored to be general and small enough:

-- | MMark extension to replace links with some HTML.
replaceLink :: Map MarkdownLink (Html ()) -> Extension
replaceLink linkMap =
  Ext.inlineRender $ \f -> \case
    inline@(Ext.Link inner uri _title) ->
      MarkdownLink (Ext.asPlainText inner) uri
        & flip lookup linkMap
        & fromMaybe (f inline)
    inline ->
      f inline

Essentially, the extension takes a Map of links and, for each link found in the Map, renders the corresponding HTML in the final output (neuron computes the Map ahead of time). Links that are not in the Map are rendered as usual.
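
To make the lookup concrete, here is a minimal, self-contained sketch of the same idea on a toy inline AST. The `Inline` type, `plainText`, and `replaceLinks` are hypothetical stand-ins invented for illustration; they are not the MMark, commonmark-hs, or Pandoc types, and the HTML is kept as a plain `String` rather than `Html ()`.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- Hypothetical toy AST; the real MMark/Pandoc inline types are richer.
data Inline
  = Str String
  | Emph [Inline]
  | Link [Inline] String -- link text and URI
  | RawHtml String -- pre-rendered HTML, emitted verbatim
  deriving (Eq, Show)

-- A link is identified by its plain text and its URI, as in the issue.
data MarkdownLink = MarkdownLink String String
  deriving (Eq, Ord, Show)

-- Flatten inline content to plain text (assumed analogue of Ext.asPlainText).
plainText :: [Inline] -> String
plainText = concatMap go
  where
    go (Str s) = s
    go (Emph xs) = plainText xs
    go (Link xs _) = plainText xs
    go (RawHtml _) = ""

-- Replace every link found in the map with its pre-rendered HTML;
-- links that are not in the map are left untouched.
replaceLinks :: Map MarkdownLink String -> [Inline] -> [Inline]
replaceLinks linkMap = map go
  where
    go inline@(Link inner uri) =
      maybe inline RawHtml (Map.lookup (MarkdownLink (plainText inner) uri) linkMap)
    go (Emph xs) = Emph (replaceLinks linkMap xs)
    go inline = inline
```

For instance, with a map containing the key `MarkdownLink "2011401" "z:/"`, the inline `Link [Str "2011401"] "z:/"` becomes a `RawHtml` node, while an ordinary `Link [Str "home"] "https://example.com"` passes through unchanged.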

The following should continue to work:

  • z:, zquery:, <ID>, etc. links
  • Markdown YAML metadata (for title, date and tags)
@srid srid added the "help wanted" label Apr 25, 2020
@srid srid pinned this issue Apr 25, 2020
@srid srid removed the "help wanted" label Apr 27, 2020
@srid srid self-assigned this Apr 27, 2020
@srid
Owner Author

srid commented Apr 27, 2020

According to jgm/commonmark-hs#1, Pandoc will eventually use commonmark-hs for parsing Markdown, and will thus be less buggy and more performant. Its being a pure-Haskell parser is another advantage that I find relevant (for GHCJS support).

I'm inclined towards going with commonmark-hs at this point.

@srid srid changed the title Switch to Pandoc Switch to Pandoc (commonmark-hs) Apr 27, 2020
@Nadrieril
Contributor

Nadrieril commented Apr 28, 2020

What's the goal in making the switch? I understood we wanted to open the possibility of using Pandoc features like citations or image properties, in which case we'd want to use Pandoc itself rather than commonmark-hs, right?

@srid
Owner Author

srid commented Apr 28, 2020

What's the goal in making the switch? I understood we wanted to open the possibility of using Pandoc features like citations or image properties,

That's correct; however that cannot be at the expense of dropping GHCJS support (I have another project in mind that will need this) or adding a buggy and less performant parser (see the link above). Fortunately, commonmark-hs will eventually be used as the parser in Pandoc; so by porting to commonmark-hs we will eventually be supporting Pandoc.

As for things like citations or image properties, I imagine they will eventually get the corresponding commonmark extension ported (there are already some extensions here, of which fenced_divs today seems relevant to image properties); or we can write one ourselves, as commonmark's extension mechanism is more powerful than mmark's (the latter can only customize HTML output, whereas with the former you can write custom inline/block parsers).

@Nadrieril
Contributor

Ok, if I understand correctly then, the move to commonmark-hs is mainly to have a more accepting parser and powerful extension mechanism. On top of that, we would be able to build some new features. Using the actual Pandoc AST and rendering code isn't planned for now.

In fact, I don't think we care about Pandoc at all: as you say, commonmark-hs has a powerful extension mechanism and already supports quite a few extensions. If we ever want some Pandoc feature, commonmark-hs knows how to produce a Pandoc AST, so we don't even care whether Pandoc ever uses commonmark-hs as its main markdown parser.

Potential issue: it looks like commonmark-hs does not support YAML headers yet (jgm/commonmark-hs#17).
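
One conceivable workaround, sketched here under the assumption that the metadata block is delimited by `---` lines at the very top of the file, is to split the YAML header off before handing the body to the Markdown parser. `splitFrontMatter` is a hypothetical helper, not something commonmark-hs or neuron provides, and nothing in this thread says that this is what neuron ended up doing.

```haskell
-- Split a leading "---"-delimited YAML block from the rest of the document.
-- Returns the raw YAML text (if a complete block is present) and the body.
-- Assumption: the block starts on the first line and is closed by a line
-- that is exactly "---"; anything else is treated as having no header.
splitFrontMatter :: String -> (Maybe String, String)
splitFrontMatter input =
  case lines input of
    ("---" : rest)
      | (yaml, _close : body) <- break (== "---") rest ->
          (Just (unlines yaml), unlines body)
    _ -> (Nothing, input)
```

The YAML text would still need to be decoded by a YAML library to obtain the title, date and tags; only the splitting step is shown here.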

Note that fenced_divs is not what we want for images; we want https://github.com/jgm/commonmark-hs/blob/master/commonmark-extensions/test/attributes.md.

@srid srid removed their assignment May 1, 2020
@michelk
Contributor

michelk commented May 5, 2020

But on the other hand, if we used pandoc directly (+ lua-filters), we could also support other popular file formats like org-mode or rst, which would come in handy in some situations.

For example, I like being able to evaluate code blocks in Emacs org-mode:

#+TITLE: bla

#+NAME: someTable
#+BEGIN_SRC R :exports results :colnames yes
  data.frame(uno = c(1,2,3,4), dos = c('a', 'b', 'c', 'd'))
#+END_SRC

#+RESULTS: someTable
| uno | dos |
|-----+-----|
|   1 | a   |
|   2 | b   |
|   3 | c   |
|   4 | d   |

@maralorn
Contributor

maralorn commented May 7, 2020

I am also strongly in favor of using pandoc for the markdown.

pandoc is a very viable option for writing complete papers, theses or even books. Not being able to reuse my multi-line formulas because they are wrapped as mathjax code blocks is kind of a killer for using neuron.

@srid
Owner Author

srid commented May 7, 2020

@maralorn @michelk See here; the current Pandoc parser is infeasible because I need the neuron Markdown parser to work in GHCJS for other projects of mine.

Fortunately though, as @Nadrieril expressed here, commonmark-hs does provide a way to parse directly into the Pandoc AST. So any code that operates on the AST could immediately be supported, with extensions written for code operating on pre-parsed text. Pandoc's author explains the overall migration plan here:

The first step would be to use this library, instead of cmark-gfm, for
pandoc's 'commonmark' input, and gradually add extensions until
most of pandoc's major markdown extensions are supported. At that point
we might consider making 'commonmark' the default input format for
pandoc, instead of 'markdown'.

@Nadrieril
Contributor

A possible solution for people who want to write their zettels in org-mode, use citations, or whatnot would be to make neuron more library-like, so that people can reuse the cool bits but e.g. provide their own code for getting input data. If neuron uses the Pandoc AST internally, it would be quite flexible. Somewhat like a more opinionated rib, maybe.

@michelk
Contributor

michelk commented May 8, 2020

@srid you mentioned GHCJS. I also have a related project in mind:

What about a flashcard system, similar to Anki, where the title goes on the front and the body on the back?

Just an idea...

@michelk
Contributor

michelk commented May 8, 2020

I read something similar here.

@srid
Owner Author

srid commented May 8, 2020

@Nadrieril That's an interesting idea, one that is worth exploring I think; but we can do this in the neuron executable (without opening up the library-based hydra). I'm currently playing with commonmark-hs and am keeping this in mind in the background.

Discussing with @felko over at zulip chat ... my thoughts are: If we support multiple markup formats, then commonmark would be one of them, to be used by *.md files. Zettels written in *.org would use Pandoc's org-mode parser, and *.pmd can use Pandoc's native markdown parser, and so forth. What's common between them is the Pandoc AST, which means all of neuron's link/query processing would operate directly on the Pandoc AST, without requiring the user to use a particular source markup (they could use whatever format, as long as it can be converted to Pandoc's AST). There may be unforeseen issues or whatnot; for example, not all Pandoc source readers support the YAML block, which we use for title, date and tags.
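
The per-extension dispatch described above could look roughly like this. The `Markup` type and the `markupFor` function are hypothetical names chosen for illustration; the extensions `.md`, `.org` and `.pmd` are the ones mentioned in the comment, and in a real implementation each constructor would select a reader that produces a Pandoc AST.

```haskell
import System.FilePath (takeExtension)

-- Hypothetical set of supported source formats.
data Markup
  = CommonMark -- *.md, parsed with commonmark-hs
  | OrgMode -- *.org, parsed with Pandoc's org-mode reader
  | PandocMarkdown -- *.pmd, parsed with Pandoc's native markdown reader
  deriving (Eq, Show)

-- Pick the source format from the file extension;
-- Nothing means the file is not a recognised zettel format.
markupFor :: FilePath -> Maybe Markup
markupFor path =
  case takeExtension path of
    ".md" -> Just CommonMark
    ".org" -> Just OrgMode
    ".pmd" -> Just PandocMarkdown
    _ -> Nothing
```

Because every branch would end in the same AST type, the link/query processing downstream would not need to know which source format a zettel was written in.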

@michelk Yup, a flashcard app is something I wanted to write myself, using reflex.

@srid srid mentioned this issue May 9, 2020
@srid
Owner Author

srid commented May 10, 2020

The Pandoc AST feature branch (#166) is ready to use if anybody is interested in testing. I'll merge it to master soonish.

It parses text using commonmark, but uses the pandoc AST (thus neuron can potentially support other formats).

@michelk
Contributor

michelk commented May 12, 2020

Thank you. I only had to change math-blocks from

    ```mathjax
    tt
    ```

to

    $$
    tt
    $$

and from

`$tt$` to $tt$

and we don't need to escape [ and ] anymore.

Thanks a lot.

@maralorn
Contributor

That is exactly how it should be! So great!

@srid
Owner Author

srid commented May 12, 2020

Cool!

I'm gonna finish #172 (which switches to reflex-dom; but I'll make sure to test that mathjax/etc. continue to render as before) before merging all of this to master.

@srid srid closed this as completed in #166 May 13, 2020
@srid
Owner Author

srid commented May 13, 2020

This is now merged to master.

Note that as of #172 neuron uses reflex-dom (not pandoc's native HTML writer) to render the HTML from the AST. This README section contains instructions on how to hack on the renderer, for those interested in improving it.

@srid
Owner Author

srid commented May 13, 2020

Oh, also note that installation instructions have changed. In particular, you would need to run this command:

nix-env -if https://github.com/srid/neuron/archive/master.tar.gz

You can simply run this again, in order to upgrade your current install. Just make sure that you are still using the cache.

@srid srid unpinned this issue May 13, 2020