Markup - why and how

Heinz Wittenbrink

2017-10-24

Goals of the session

Understanding how digital (text) content is stored and processed

Knowledge of basic characteristics of markup languages

  • Descriptive Markup
  • Difference between text and markup

Basic knowledge about text encoding

  • How is text translated into bytes?
  • ASCII and Unicode

Knowledge of basics of markup processing

  • What is parsing?
  • Document Object Model

Descriptive Markup

Essentials

  • Digital content is mostly stored as marked up text.
  • The grammar of the markup is standardized.
  • Standardized markup is the basis on which software processes the text.
  • How software processes the text depends largely on the quality of the markup.

Markup before the digital age

The idea and terminology evolved from the “marking up” of paper manuscripts, i.e., the revision instructions by editors, traditionally written with a blue pencil on authors’ manuscripts.

Markup language - Wikipedia

Example of Markup

Source: Markup is


<!DOCTYPE html>
<html>
    <head>
        <title>Sample Manuscripts</title>
        <style type="text/css">
        h1 {font-family: Helvetica, sans-serif; font-size: 16px}
        p {font-family: Times-Roman, serif; font-size: 16px}
        </style>
    </head>
    <body>
        <h1>Sample Manuscripts throughout the Ages</h1>
        <p>The funny thing about sample manuscripts is that they never say anything really interesting.</p>
    </body>
</html>

Procedural vs. descriptive markup

  • Procedural markup: controls a specific application by setting its processing state (for example, switching a printer to italics)
  • Descriptive markup: describes what the text is and leaves the processing to the application

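Example: PCL


Ec(s0Supright text Ec(s1Stext set in italics

Ec stands for the escape character. The sequence (s0S switches the printer to upright style, (s1S to italic.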
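
For contrast with the procedural PCL codes, the following is a minimal sketch of descriptive markup in use. The same marked-up source is handed to two different "applications", and each decides for itself what to do with it. The element names (doc, title, p, em) and the function names to_plain_text and list_titles are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptive source: it says what the parts are, not how to print them.
SOURCE = (
    "<doc><title>Sample Manuscripts</title>"
    "<p>They never say anything <em>really</em> interesting.</p></doc>"
)
root = ET.fromstring(SOURCE)


def to_plain_text(paragraph):
    """Application 1: render a paragraph as plain text, marking <em> with asterisks."""
    parts = [paragraph.text or ""]
    for child in paragraph:
        text = child.text or ""
        parts.append(f"*{text}*" if child.tag == "em" else text)
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)


def list_titles(tree_root):
    """Application 2: ignore the body text and collect only the titles."""
    return [title.text for title in tree_root.iter("title")]


plain = to_plain_text(root.find("p"))
titles = list_titles(root)
print(plain)
print(titles)
```

Neither function needs the source to be changed. That is the practical benefit of describing the text instead of prescribing the processing.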

How does a computer distinguish text and markup?

  • Markup delimiters in HTML and XML: <, >
  • Markup delimiters for entities: &, ; (referencing special characters, escaping the literal values of < and >)
  • Possible markup delimiters in SGML: [, ], {, }
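
Because <, > and & are reserved as delimiters, literal occurrences in the text have to be escaped as entities. A minimal sketch using Python's standard html module (the sample string is invented for illustration):

```python
import html

# Text that contains the delimiter characters as ordinary content.
raw = 'if a < b & b > c: print("ok")'

# Replace &, < and > by entity references so a parser reads them as text.
escaped = html.escape(raw, quote=False)

# Resolving the entity references restores the original characters.
restored = html.unescape(escaped)

print(escaped)
print(restored == raw)
```

With quote=False only the three delimiter characters are replaced. Quotation marks are left alone, which is sufficient for text outside of attribute values.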

Encoding: Transforming text into bytes

ASCII Table and Description

Source: Ascii Table - ASCII character codes and html, octal, hex and decimal chart conversion

Extended ASCII Codes

Source: Ascii Table - ASCII character codes and html, octal, hex and decimal chart conversion

Unicode

Source: How does unicode SMS Works? - Quora

UTF-8 Encoding
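
The translation of characters into bytes can be tried out directly. The sketch below is only an illustration, and the helper name show_bytes is made up. It encodes a few characters and shows the resulting bytes as hexadecimal numbers.

```python
def show_bytes(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as space-separated hex digits."""
    return text.encode(encoding).hex(" ")


ascii_a = show_bytes("A", "ascii")        # one byte
utf8_a = show_bytes("A", "utf-8")         # same byte: UTF-8 is compatible with ASCII
latin1_auml = show_bytes("ä", "latin-1")  # one byte in an "extended ASCII" code page
utf8_auml = show_bytes("ä", "utf-8")      # two bytes in UTF-8
utf8_euro = show_bytes("€", "utf-8")      # three bytes in UTF-8

# ASCII has no code for "ä", so encoding fails.
try:
    "ä".encode("ascii")
    ascii_can_encode_auml = True
except UnicodeEncodeError:
    ascii_can_encode_auml = False

for label, value in [("A/ascii", ascii_a), ("A/utf-8", utf8_a),
                     ("ä/latin-1", latin1_auml), ("ä/utf-8", utf8_auml),
                     ("€/utf-8", utf8_euro)]:
    print(label, value)
```

The same character can therefore end up as different bytes, depending on the encoding. Software that reads the bytes has to know which encoding was used.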

Parsing: How a computer reads text

Parsing (US: /ˈpɑːrsɪŋ/; UK: /ˈpɑːzɪŋ/), syntax analysis or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).

Parsing - Wikipedia
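
As a small illustration of syntactic analysis, the following sketch feeds an HTML fragment to the parser from Python's standard library and records which parts it recognizes as markup and which as text. The class name TokenLogger and the sample fragment are assumptions made for this example.

```python
from html.parser import HTMLParser


class TokenLogger(HTMLParser):
    """Record each start tag, end tag, and text run the parser recognizes."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        if data.strip():  # skip runs that contain only whitespace
            self.tokens.append(("text", data))


parser = TokenLogger()
parser.feed("<p>Hello <em>markup</em>!</p>")
parser.close()

for kind, value in parser.tokens:
    print(kind, repr(value))
```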

SAX Parser
DOM Parser

Source: The Developer’s Digest - Spend your day here.: Parse XML with Java - SAX and DOM parser
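
The cited article uses Java. The difference between the two parser types can also be sketched with Python's standard library, as below. A SAX parser reports events while it reads, and the program has to keep track of where it is. A DOM parser first builds the whole tree in memory, which can then be queried. The sample document and the class name TitleCollector are invented for this illustration.

```python
import xml.sax
from xml.dom import minidom

XML = "<book><title>Markup</title><title>Parsing</title></book>"


class TitleCollector(xml.sax.ContentHandler):
    """SAX: react to events and remember the current state ourselves."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.titles = []

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True
            self.titles.append("")

    def characters(self, content):
        if self.in_title:
            self.titles[-1] += content  # text may arrive in several chunks

    def endElement(self, name):
        if name == "title":
            self.in_title = False


handler = TitleCollector()
xml.sax.parseString(XML.encode("utf-8"), handler)
sax_titles = handler.titles

# DOM: parse everything into a tree first, then query the tree.
document = minidom.parseString(XML)
dom_titles = [node.firstChild.data
              for node in document.getElementsByTagName("title")]

print(sax_titles)
print(dom_titles)
```

Both approaches yield the same titles. SAX needs little memory but more bookkeeping, while DOM is more convenient at the cost of holding the whole document in memory.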

The DOM: A technical model for digital information

DOM Tree
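
A DOM tree can be inspected programmatically. The sketch below parses a small, well-formed document with Python's minidom and lists its element and text nodes as an indented outline. The function name outline and the sample document are assumptions for this example. Browsers build their DOM with an HTML parser, not with an XML parser as used here.

```python
from xml.dom import minidom


def outline(node, depth=0):
    """Return the element and text nodes below `node` as indented lines."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + '"' + node.data.strip() + '"')
    return lines


doc = minidom.parseString(
    "<html><head><title>Sample</title></head>"
    "<body><h1>Heading</h1><p>Text</p></body></html>"
)
tree = outline(doc.documentElement)
print("\n".join(tree))
```

Each level of indentation corresponds to one parent-child relationship in the tree. The text itself sits in the leaves.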

Rendering of an HTML page

An introduction to browser rendering
