-
HTML has elements like this:
I used this command:
-
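First, note that `--mathjax` is an option for HTML output; it won't affect parsing of HTML at all.

For parsing HTML pages that include math in `$..$` or `\(..\)` and are meant to be processed with MathJax, you can use `-f html+tex_math_dollars` or `-f html+tex_math_single_backslash`.

For parsing HTML pages including raw MathML, you don't need to do anything; pandoc will recognize this as math.

Your input is quite strange. I assume this is not the raw page source but the JavaScript-processed output? You will have better luck with the raw page source.

In any case, this has the following structure: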
So, to process this cleanly with pandoc, we'll want to get rid of the cruft. We can use a Lua filter to remove the spans we don't want:

    function Span(el)
      -- drop the preview placeholder entirely
      if el.classes:includes("MathJax_Preview") then
        return {}
      -- drop MathJax's rendered HTML output
      elseif el.classes:includes("mjx-chtml") then
        return {}
      -- keep the assistive MathML, but unwrap it from its span
      elseif el.classes:includes("MJX_Assistive_MathML") then
        return el.content
      else
        return el
      end
    end

Now we can process your input with our filter (call it
or, if you prefer MathML:
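To make the three cases in the Lua filter above easier to see in isolation, here is a rough stdlib-only Python sketch that applies the same rules to an HTML string before it reaches pandoc. It is not part of the original answer, just an illustration. The class names come from the filter. `MathJaxCleaner`, `strip_mathjax`, and the sample markup are hypothetical, and the sample assumes the three spans are siblings, which may not match the actual page.

```python
from html.parser import HTMLParser

DROP = {"MathJax_Preview", "mjx-chtml"}   # spans removed together with their contents
UNWRAP = {"MJX_Assistive_MathML"}         # spans replaced by their contents


class MathJaxCleaner(HTMLParser):
    """Hypothetical helper mirroring the Span filter's three cases."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.stack = []     # one action per open <span>: "drop", "unwrap" or "keep"
        self.dropping = 0   # greater than 0 while inside a dropped span

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = set((dict(attrs).get("class") or "").split())
            if classes & DROP:
                action = "drop"
            elif classes & UNWRAP:
                action = "unwrap"
            else:
                action = "keep"
            self.stack.append(action)
            if action == "drop":
                self.dropping += 1
            if action != "keep":
                return  # the span tag itself is never emitted
        if not self.dropping:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "span" and self.stack:
            action = self.stack.pop()
            if action == "drop":
                self.dropping -= 1
                return
            if action == "unwrap":
                return
        if not self.dropping:
            self.out.append(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        # self-closing tags such as <mspace/> pass through unchanged
        if not self.dropping:
            self.out.append(self.get_starttag_text())

    def handle_data(self, data):
        if not self.dropping:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.dropping:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.dropping:
            self.out.append(f"&#{name};")


def strip_mathjax(html: str) -> str:
    cleaner = MathJaxCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)


# Made-up sample; the real page structure may differ.
sample = (
    '<p>Area: '
    '<span class="MathJax_Preview"></span>'
    '<span class="mjx-chtml"><span class="mjx-math">A</span></span>'
    '<span class="MJX_Assistive_MathML"><math><mi>A</mi></math></span>'
    '</p>'
)
print(strip_mathjax(sample))
```

The sketch ignores comments and doctype declarations, and anything nested inside a dropped span is dropped with it. The Lua filter is still the more robust route, since it works on pandoc's parsed document rather than on raw markup.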