grab_pxml/2

Module: pxml

grab_pxml/2 — reads the pxml term found in file Path
grab_pxml_with_tagged/3 — read PXML term in FilePath, including tagged components
grab_pxml_with_paths/5 — read PXML term in FilePath, tagged component tags and paths
parse_html_toks_to_pxml_vals/3 — parse a list of HTML-tokens
parse_html_toks_to_pxml/5 — parse a list of HTML-tokens
read_pxml_term/7 — - read a PXML term out of Tokens
read_pxml_comment/3 — read an HTML comment into PXML
unary_tag/1 — specifies syntactic roles tags

FORMS

grab_pxml(Path, PXML)

grab_pxml_with_tagged(FilePath, PXML, TagVals)

grab_pxml_with_paths(FilePath, PXML, TagVals, TgtTags, Paths)

parse_html_toks_to_pxml_vals(Tokens, PXML, TagVals)

parse_html_toks_to_pxml(Tokens, Terms, StackIn, StackOut, TagsValsDList)

read_pxml_term(Tokens, Term, RestTokens, StackIn, StackOut,

read_pxml_comment(Tokens, Features, RestTokens)

unary_tag(T)

DESCRIPTION

grab_pxml/2 Calls grab_html_tokens/2 to read the list L of HTML tokens out of Path, and then parses L into a (single !doctype) PXML term.

grab_pxml_with_tagged/3 Calls grab_html_tokens/2 to read the list L of HTML tokens out of FilePath, and then parses L into a (single !doctype) PXML term, where it accumulates tags of component terms in TagVals, with the tagged terms accumulated in (lists) on TagVals.

grab_pxml_with_paths/5 Calls grab_html_tokens/2 to read the list L of HTML tokens out of FilePath, and then parses L into a (single !doctype) PXML term, the the tagged terms (eqns) accumulated in (lists) on TagVals, and also accumulates pairs (Stack, Term) on Paths, where:
1) TgtTags is a list of HTML tags,
2) Stack is list [Tg1, Tg2 | …] of HTML terms representing the reversed parser stack with Tg1 belonging to TgtTags, and
3) Term was parsed out as Tg1 was popped from Stack.

parse_html_toks_to_pxml_vals/3 Calls parse_html_toks_to_pxml/5, ignoring the Stack arguments.

parse_html_toks_to_pxml/5 The workhorse. Parses a list of HTML-tokens, as produced by read_tokens/5 in html_tokens.pro, into a list of Prolog Terms consituting a PXML representation of the source. The pair (StackIn, StackOut) implements the parser stack.
The difference list

  TagsValsDList

provides a means of capturing components of the PXML output. ( See the comment for handle_tag/6 for a description of TagsValsDList. )

read_pxml_term/7 Reads the (largest) PXML term possible starting at the beginning of Tokens.

read_pxml_comment/3 Read from Tokens an HTML comment into Features, leaving RestTokens.

unary_tag/1 Syntactic roles of tags:
Spec rules about optional tags:
  https://html.spec.whatwg.org/multipage/syntax.html#optional-tags
See also:
  https://html.spec.whatwg.org/multipage/syntax.html
  https://html.spec.whatwg.org/multipage/parsing.html
unary_tag/1 is exported for use by pxml_utils.pro.

EXAMPLES