Encoding Documentation

1. Encoding of the printed text

  • 1.1 Transcription conventions
    • 1.1.1 Spelling, special characters, punctuation, and spacing

      Each word has been captured as it appears on the page: spelling has not been regularized or corrected. Selected special characters have been captured using standard Unicode entity references. Long s has been captured simply as a keyboard s. Ligatures have been captured as separate letters. Printer errors have been captured as they appear and have not been corrected. Punctuation has been captured as it appears in the original. Where it is unclear which punctuation mark has been used, the mark has been treated as an illegibility (see section 1.1.8).

    • 1.1.2 Abbreviation markers

      Abbreviation markers in the text have been captured using the appropriate Unicode entity references. Wherever an abbreviation marker appears, the word has been placed in <abbr> tags, and the full word which has been abbreviated has been included in <expan>. <abbr> and <expan> are placed together inside <choice> tags. The information included in <expan> is there to aid searching; it will not be displayed in the running text.
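      As a minimal sketch of how the <abbr>/<expan> pairing can serve both display and searching, the Python below flattens a line in one of two ways. For display it keeps <abbr>; for search it keeps <expan>. It uses only the standard library's xml.etree.ElementTree. The helper render and the sample line are hypothetical. The fragment is assumed to have no TEI namespace; in real files, tag names would be qualified with http://www.tei-c.org/ns/1.0.

```python
import xml.etree.ElementTree as ET

def render(elem, mode="display"):
    """Flatten an element to text, keeping <abbr> for display or <expan> for search.

    Hypothetical helper; assumes tags carry no TEI namespace.
    """
    skip = "expan" if mode == "display" else "abbr"
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != skip:
            parts.append(render(child, mode))
        # Text after the child's closing tag is always kept.
        parts.append(child.tail or "")
    return "".join(parts)

# Hypothetical line: "the" + combining tilde (&#x0303;) abbreviating "then".
line = ET.fromstring(
    '<l>Come <choice><abbr>the&#x0303;</abbr><expan>then</expan></choice>, my lord</l>'
)
print(render(line, "display"))
print(render(line, "search"))  # Come then, my lord
```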

    • 1.1.3 Hyphenation

      Two types of hyphen have been distinguished. Soft hyphens (i.e. those that simply indicate that a word has been split across a line or page) have been captured using the Unicode reference &#x00AD;. Hard or genuine hyphens belong to an actual hyphenated word, whether they occur inline or at the end of a line. They have been recorded using the Unicode reference &#x2010;. The same distinction between soft and genuine hyphens has been made in catchwords.

      Examples:
      Genuine hyphen:
      <l>Though all the earth ore&#x2010;whelme them to mens eyes</l></sp>

      Soft hyphen:
      <p>I heard thee speake me a speech once, but it was neuer ac&#x00AD;
      <lb/>ted, or if it was, not aboue once, for the play I remember pleasd […]</p>
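      The practical effect of this distinction shows up when lines are joined for searching: a soft hyphen and the following <lb/> disappear, while a genuine hyphen is kept. The Python sketch below assumes the standard library's re and xml.etree.ElementTree and fragments without the TEI namespace. The helper searchable_text is hypothetical and only handles <lb/> breaks directly inside the element.

```python
import re
import xml.etree.ElementTree as ET

SOFT_HYPHEN = "\u00ad"  # &#x00AD;: marks a word split across a line; not part of the word
HARD_HYPHEN = "\u2010"  # &#x2010;: a genuine hyphen; part of the word

def searchable_text(elem):
    """Join the lines of an <l> or <p> into one string (hypothetical helper)."""
    pieces = [elem.text or ""]
    for child in elem:
        if child.tag == "lb":
            pieces.append("\n")  # explicit line break
        else:
            pieces.append("".join(child.itertext()))  # e.g. text inside <hi>
        pieces.append(child.tail or "")
    text = "".join(pieces)
    # Soft hyphen at a line end: drop the hyphen and rejoin the word.
    text = re.sub(SOFT_HYPHEN + r"\s*\n\s*", "", text)
    # Genuine hyphen at a line end: keep the hyphen and rejoin the word.
    text = re.sub(HARD_HYPHEN + r"\s*\n\s*", HARD_HYPHEN, text)
    # Any other line break separates two words.
    return re.sub(r"\s*\n\s*", " ", text)

soft = ET.fromstring("<p>neuer ac&#x00AD;\n<lb/>ted, or if it was</p>")
print(searchable_text(soft))  # neuer acted, or if it was
```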

    • 1.1.4 Inverted type

      Where a character appears upside-down, it has been captured in <c> tags with a @rend attribute and the value “inverted.”

      Example:
      <l>Shoul<c rend="inverted">d</c> patch a wall t&#x0027;expell the waters flaw.</l>

    • 1.1.5 Superscript text

      Where a character appears superscripted, it has been captured in <c> tags with a @rend attribute and the value “superscript”. This chiefly occurs in handwritten annotation (see section 2.1). In the printed text, superscripted e in “ye” has been captured as the Unicode reference &#x0364; (that is, combining Latin small letter e).

    • 1.1.6 Decorative text

      Where a capital is dropped or decorated, it has been captured in <c> tags with a @rend attribute and the value “droppedCapital” or “decoratedCapital”. Decoration takes precedence: if a character is both dropped and decorated, it has been captured as <c rend="decoratedCapital">W</c>.

    • 1.1.7 Brackets introducing textual features

      Opening brackets are sometimes used to indicate a turnover or turnunder line, or to introduce a stage direction. These have been captured in <c> tags with a @rend attribute and the value “turnover”, “turnunder” or “stageDirection”, as appropriate.

      Example:
      <l>Foregod my Lord well spoken, with good accent and
      <lb rend="turnunder"/><c rend="turnunder">(</c>good discretion.</l>

      (For more on the @rend attribute of <lb/>, see section 1.2.1.)

    • 1.1.8 Illegible or unidentifiable material

      Any material which cannot be transcribed has been recorded using a <gap/> tag. For a discussion of <gap/>s, see section 2.2 below.

  • 1.2 Page layout
    • 1.2.1 Page and line breaks

      The pages of each quarto have been captured in the order in which they appear, with a <pb/> tag at the beginning of each page. Each <pb/> tag carries an @xml:id attribute containing a unique identifying string for the individual page, e.g. <pb facs="#ham-1605-22276a-bli-c01-image001" xml:id="ham-1605-22276a-bli-c01-001"/>. For details of the @facs attribute, see section 3 below. Each page of a double-page spread has its own <pb/>, including blank pages. If an image shows only a single page, or shows the binding, a single <pb/> is used.

      <lb/> (line break) is used where there is a line break which is not indicated by a closing structural tag (e.g. </l>, </p>, </head>, </stage>).

      Example:
      <head>The Tragedie of
      <lb/><name type="character" key="ham">HAMLET</name>
      <lb/>Prince of Denmarke.</head>

      Turnover or turnunder lines (where the end of a line has been printed just above or below the line it belongs to due to lack of space) have been captured in the @rend attribute of the <lb/> tag, e.g. <lb rend="turnover"/>.

    • 1.2.2 Highlighting

      <hi> tags have been used to indicate any typographic change, such as italicisation, where such information has not been recorded using the @rend attribute of another tag, as is the case with stage directions or speakers. The @rend attribute has been included whenever <hi> is used.

      Examples:
      <l>And prologue to the <hi rend="italic">Omen</hi> comming on</l>

      <speaker rend="italic">Ham.</speaker> <stage rend="italic, inline" type="exit">Exit</stage>

      Punctuation following or preceding a highlighted word has been captured outside the <hi> tags, unless the punctuation mark is itself highlighted.

    • 1.2.3 Decorative features

      Where a printed feature is purely decorative it has been captured as a <milestone> element with the attributes @unit and @rend. @unit always has the value “unspecified”; @rend can have the values “printersLine” and “decorativeBorder”.

    • 1.2.4 Forme works

      Catchwords, quire signatures, and running headers have been captured in <fw> forme work tags, with the appropriate @type attribute included – “signature”, “catchword” or “runningHeader”. Each <fw> also contains a @place attribute indicating its location on the page. The values for the @place attribute are:

      • top-centre
      • foot-centre
      • foot-right

      Example:
      <fw type="signature" place="foot-centre">B</fw><fw type="catchword" rend="italic" place="foot-right">Mar.</fw>

  • 1.3 Structural encoding
    • 1.3.1 Text: front, body, back

      Each copy of the play has been encoded as a single <text> containing <front> for front matter, <body> for the main body of the text, and <back> for back matter as appropriate.

      <front> contains the title page and any other material preceding the action of the play.

      <body> contains the play itself.

      <back> contains any material that appears after the close of the action of the play.

    • 1.3.2 Textual divisions

      Each of these larger divisions contains <div>s, which hold the individual structural divisions of the play. <div1>s are used for each of the play’s five acts, with the @type attribute “act” and an @n attribute giving the act number. <div2>s are used for each scene within an act, with the @type attribute “scene” and an @n attribute giving the scene number.

      Example:
      <div1 type="act" n="1">
      <div2 type="scene" n="2">

      The original quartos contain no act or scene divisions. We have included the act and scene divisions conventionally adopted by editors to aid navigation and searching; they are not intended for display.
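      Because the divisions exist for navigation, a simple table of contents can be read straight from the numbered <div>s. The following is a hypothetical Python sketch. It assumes xml.etree.ElementTree, integer @n values and no TEI namespace.

```python
import xml.etree.ElementTree as ET

def scene_index(body):
    """List (act, scene) numbers in document order (hypothetical navigation helper)."""
    index = []
    for act in body.iter("div1"):
        if act.get("type") != "act":
            continue
        for scene in act.iter("div2"):
            # Values are compared exactly, so @type must be "scene" verbatim.
            if scene.get("type") == "scene":
                index.append((int(act.get("n")), int(scene.get("n"))))
    return index

body = ET.fromstring(
    '<body>'
    '<div1 type="act" n="1"><div2 type="scene" n="1"/><div2 type="scene" n="2"/></div1>'
    '<div1 type="act" n="2"><div2 type="scene" n="1"/></div1>'
    '</body>'
)
print(scene_index(body))  # [(1, 1), (1, 2), (2, 1)]
```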

    • 1.3.3 Headings

      Headings which appear in the text have been recorded in <head> tags, at the beginning of the relevant <div>.

    • 1.3.4 Lines and paragraphs

      Verse lines have been encoded in <l> (line) tags.

      Prose passages have been encoded in <p> (paragraph) tags.

      Unless it is obviously prose, <l> tags have been used as the default.

      See also the use of <lb/> (line break) in section 1.2.1 above.

    • 1.3.5 Stage directions

      Stage directions have been recorded in <stage> tags, and a basic @type of direction included. The position of the stage direction and any font change have also been included using the @rend attribute. Positions include centred (i.e. in the middle of the text block), inline (printed within a line of the play text), and right-justified.

      Example:
      <stage rend="italic, centred" type="entrance">Enter <name type="character" key="bar">Bernardo</name>.</stage>

      Values for the type of stage direction are as follows:

      • entrance: describes an entrance
      • exit: describes an exit
      • business: describes stage business

    • 1.3.6 Speeches and speakers

      Each speech has been recorded in <sp> tags, and a @who attribute is included within each <sp> tag, containing a 3-letter key for each character.

      Within <sp>, the speaker has been recorded in <speaker> tags and the @rend attribute included.

      Example:
      <sp who="bar"><speaker rend="italic">Bar.</speaker>
      <l>Long liue the King.</l></sp>

      If there are multiple speakers, the 3-letter code for each character is included. Where the text indicates that “All” are speaking, but it is impossible to determine which characters are meant or where there are many unnamed characters involved, the code “all” is used. Where the text says “all” or “both” are speaking but it is clear which characters are speaking, the relevant character keys are used.

      The 3-letter codes used are:

      Hamlet ham
      Claudius cla
      Gertrude ger
      Polonius pol
      Corambis (Q1) crb
      Laertes lae
      Ophelia oph
      Reynaldo rey
      Montano (Q1) mon
      Horatio hor
      Ghost of Hamlet’s father gho
      Voltemand vol
      Cornelius cor
      Rosencrantz ros
      Guildenstern gui
      Francisco fra
      Barnardo bar
      Marcellus mar
      Gravedigger Clown 1 gr1
      Gravedigger Clown 2 gr2
      Fortinbras for
      Player who plays the King plk
      Player who plays the Queen plq
      Player who plays Lucianus luc
      Player who plays Prologue pro
      Courtier/Osrick osr
      Captain cap
      Messenger 1 me1
      Messenger 2 me2
      Gentleman 1 ge1
      Gentleman 2 ge2
      Sailor sai
      Lord lor
      Player pla (no link has been inferred with plk)
      Doctor/Priest doc
      Ambassador amb
      All all

      Details of character keys are provided in the character list contained in the TEI header. See section 3 below.
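      The following sketch checks @who values against the table above, in Python with xml.etree.ElementTree. It relies on two assumptions that the text does not state. First, multiple speakers are taken to be separated by spaces within @who, which is the usual TEI convention for lists of values. Second, the helper names who_keys and unknown_keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# The 37 speaker keys from the table above.
SPEAKER_KEYS = set(
    "ham cla ger pol crb lae oph rey mon hor gho vol cor ros gui fra bar mar "
    "gr1 gr2 for plk plq luc pro osr cap me1 me2 ge1 ge2 sai lor pla doc amb all".split()
)

def who_keys(sp):
    """Split @who into keys, assuming multiple speakers are space-separated."""
    return sp.get("who", "").split()

def unknown_keys(sp):
    """Return any @who keys that do not appear in the speaker table."""
    return [key for key in who_keys(sp) if key not in SPEAKER_KEYS]

sp = ET.fromstring('<sp who="ros gui"><speaker rend="italic">Both.</speaker></sp>')
print(who_keys(sp))      # ['ros', 'gui']
print(unknown_keys(sp))  # []
```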

    • 1.3.7 Trailers

      Where “Finis” appears at the end of a play, it has been recorded as <trailer>.

  • 1.4 Title pages

    For all of the tags specific to the title page, the attribute @rend can be used where desired to represent typeface, italicisation, etc.

    <titlePage> contains all of the title page material. Within this, <docTitle> contains the name of the play (with <titlePart type="main"> within it), and <byline> the statement of authorial responsibility. Within <byline>, <docAuthor> contains the name of the author. <docEdition> contains the edition statement, where it appears, and <docImprint> the publication details, which appear at the foot of the title page. Within <docImprint>, <docDate> has been used to encode the date of publication.

    The printer’s mark has been tagged as <figure> at the point of the text where it appears, and includes a <figDesc> containing a description.

    Example, with handwritten annotation removed:

    <titlePage>
    <pb facs="#ham-1611-22277x-bod-c01-image007" xml:id="ham-1611-22277x-bod-c01-007b"/>
    <docTitle><titlePart type="main">THE
    <lb/>TRAGEDY
    <lb/>OF
    <lb/><name type="character" ref="#ham">HAMLET</name>
    <lb/>Prince of Denmarke.</titlePart></docTitle>
    <byline>BY
    <lb/><docAuthor>WILLIAM SHAKESPEARE</docAuthor>.</byline>
    <docEdition>Newly imprinted and enlarged to almost as much
    <lb/>againe as it was, according to the true
    <lb/>and perfect Coppy.</docEdition>
    <figure><figDesc>Printer&#x0027;s mark depicting a bird and the motto
    &#x201C;NON ALTVM PETO IS&#x201D;.</figDesc></figure>
    <docImprint>AT LONDON,
    <lb/>Printed for <hi rend="italic">Iohn Smethwicke</hi>, and are to be sold at his shoppe
    <lb/>in Saint <hi rend="italic">Dunstons</hi> Church yeard in Fleetstreet.
    <lb/>Vnder the Diall. <docDate>1611</docDate>.</docImprint>
    </titlePage>

  • 1.5 Character name tagging

    Every instance of a character’s name (such as Hamlet, Ophelia, Horatio) in the running text of the play has been enclosed in <name> tags. This includes instances found in stage directions but excludes material captured in <speaker> or <fw> tags. @type and @ref attributes have been included. The @type attribute is always “character”. The @ref is the 3-letter code supplied for each character: the same code, where relevant, that has been used for the @who attribute in <sp> tags. Spelling varies considerably between the different copies, but the same character codes have been used throughout regardless of spelling. Only proper names are included in <name> tags, so character references such as “Ghost”, “King” and “Queen” are not tagged.

    <name> tags have been given to all characters that occur within the fictional universe of the play Hamlet. For this reason, non-speaking and non-appearing characters, such as Lamord and Claudio, have name tags. Characters from fictional works within Hamlet’s fictional universe have been considered distinct and are not given name tags. These include Gonzago and Baptista, from the play within the play, and Dido and Aeneas. Historical figures, such as Julius Caesar, have similarly not been given name tags.

    The 3-letter codes used are:

    Hamlet ham
    Claudius cla
    Gertrude ger
    Polonius pol
    Corambis (Q1) crb
    Laertes lae
    Ophelia oph
    Reynaldo rey
    Montano (Q1) mon
    Horatio hor
    Voltemand vol
    Cornelius cor
    Rosencrantz ros
    Guildenstern gui
    Francisco fra
    Barnardo bar
    Marcellus mar
    Fortinbras for
    Osrick osr
    Old Hamlet oha
    Old Fortinbrasse ofo
    Norway nor
    Claudio cld
    Lamord lam
    Yorick yor

    Details of character codes are provided in the character list contained in the TEI header. See section 3 below.

2. Manuscript annotation and damage

Any handwritten annotation present in any of the quartos has been captured, including additions, deletions, underlining and other marking. Where damage has made any part of the text, printed or handwritten, impossible to capture with certainty, this illegibility has been recorded together with an indication of the type of damage involved.

Project partner specialists at the Folger Shakespeare Library and the British Library have ensured that all manuscript additions, deletions, underlining and marking have been recorded for each file. Each tag checked includes a @resp attribute, indicating which institution has been responsible for the information contained in that tag. Possible values for the @resp attribute are bli for the British Library, fol for the Folger Shakespeare Library, and odl for the Oxford Digital Library.

  • 2.1 Manuscript annotation
    • 2.1.1 Additions

      <add> and <addSpan> indicate manuscript material added to a text, including material that has later been cancelled.

      <add> has been used to describe minor additions, such as characters or words. <add>s cannot float freely in the text; they always appear in a textual unit, such as <l> [line], <p> [paragraph] or <speaker>.

      <addSpan> has been used for larger textual interventions: for longer additions, and for passages which contain more complex features, such as (though not restricted to) a passage of several lines, entire speeches (of any length), or a sequence of manuscript interventions.

      <add> and <addSpan> include the attributes @hand, @place, @type and @resp. Both elements include the @n [number] attribute where necessary; see discussion below.

      @hand contains a code given to the hand responsible for the addition, e.g. <add hand="#aa" place="supralinear">thy</add>. <handNote> in the TEI header supplies the information about each hand, and the value of @hand points to that anchor, for example "#aa". For more on <handNote> see section 3 below.

      The values of @place can be:

      • for the margins around the text block: margin-top, margin-bot, margin-left, margin-right
      • for the margins of a mount on which a quarto’s page has been pasted: mount-top, mount-bot, mount-left, mount-right
      • for additions on unprinted pages which do not fall obviously under either of the above values: textBlock
      • for additions in the text block of a printed page: inline, where an addition appears within the line of printed text; supralinear and infralinear, where an addition to one line appears in the space above or below that line, respectively; interlinear, where an addition appears between lines and does not clearly belong to either of them

      The placement of <add> tags is largely governed by the location of the annotation on the page: <add>s are recorded in the closest valid tag. These include <fw>, <l>, <p> and <speaker>. Where an annotator has clearly indicated the addition belongs elsewhere, for example by using a line number, additions are transcribed in the indicated position.

      This is also the case where an annotation provides an alternative reading to the printed text. Here, the two interventions (<add> and <del>) are nested within one <subst> [substitution] tag in the running text. This construction does not imply an editorial judgment on our part; it describes the apparent intention of the annotator.

      For annotations, however brief, containing textual apparatus (e.g. <speaker>, <sp>, <l>), <addSpan> is used instead of <add>.

      The values of @type can be:

      • for bibliographic and codicological details (including provenance, and other similar additions that are not textual interventions or textual notes): bibliographic
      • for additions which complete partially cropped text: completedCropped (see section 2.1.4 below)
      • for graphic manuscript representations: figure
      • for textual interventions: intervention
      • for textual notes: note
      • for additions which supply cropped printed text: suppliedCropped (see section 2.1.4 below)
      • for additions which supply uninked type: suppliedUninked (see section 2.1.4 below)
      • for additions with no evident textual purpose: unclear

      There is a distinction between @type="note" and <note>: the former is where a textual note has been added in manuscript; the latter is a note provided by the editors of the electronic text. Where editorial <note>s are added, a @resp has been included in the tag to indicate who is responsible for the information contained in the <note>.

      The @n attribute has been used to transcribe note markers where manuscript additions include them. Note markers can include, but are not restricted to, numbers, figures and words (sometimes entire lines). If a figure is recorded in this way, a <figure> tag has not been included as well. Numbers, words and any character for which we are using the Unicode entity have been transcribed as they appear, e.g.:
      <add n="19">hartely farwell.</add>
      <add n="Hora. Most like &#x0026;c">

      Where the note marker is a figure, the figure's name has been supplied in square brackets as the @n's value, e.g.:
      <addSpan n="[double triangle]"/>

      Where a note marker appears in the text and at the addition, it has been captured twice, once in the running text of the addition, and a second time in the @n attribute.

      Where a figurative MS marking is not functioning as a note marker it has been captured using a <figure> tag, with a <figDesc> within it, e.g.:
      <add place="margin-right" hand="#aa" type="figure" resp="#fol"><figure><figDesc>Triangle.</figDesc></figure></add>

      Possible descriptions of figures used in <figDesc> include:

      • arrow: a mark like an arrow, or arrow-head, used as a pointer
      • asterisk: frequently drawn as a small x-cross with a dot in each angle (Beal)
      • asterism: a group of three asterisks placed thus (⁂) to direct attention to a particular passage (OED)
      • brace: a sign (} or ] or > or ), but may take more improvisational shapes) used in writing or printing, chiefly for the purpose of uniting together two or more lines, words, staves of music, etc. (adapted from OED)
      • caret: an inverted-v shaped mark placed in writing below the line, to indicate that something (written above or in the margin) has been omitted in that place (OED)
      • cross: two bars or lines [horizontal and vertical] crossing each other, used as a sign, ornament, etc.; mark or sign of small size used to mark a passage in a book, etc. (adapted from OED)
      • dash: a horizontal stroke (usually short and straight) (adapted from OED)
      • dot: a minute roundish mark (OED)
      • double oblique: two parallel slashes or diagonal strokes (adapted from OED)
      • double triangle: two adjoining triangles sharing a horizontal base line
      • flower: the representation of a flower of more than 3 or 4 petals (which would be trefoils and quatrefoils)
      • label: a slip of paper, cardboard, metal, etc. attached or intended to be attached (from OED)
      • line: a horizontal line, longer than a dash (and generally serving a different purpose) (adapted from OED)
      • manicule: hand or fist with pointing finger
      • marginal commas: single or double commas, sometimes inverted, used to mark a line or lines of text
      • mathematical formulas: use only for complex numeric equations or arithmetical problems; transcribe simple numeric or mathematical annotations in full
      • n.b.: abbreviation for nota bene, or “note well”
      • O: the letter considered with regard to its shape (OED)
      • oblique: a slash or diagonal stroke (adapted from OED)
      • quatrefoil: compound leaf or flower containing four, usually rounded, leaflets or petals radiating from a common centre (Beal)
      • scribble: a piece of random or casual doodling or drawing of unclear textual purpose, including pen trials made by writers to test a freshly-trimmed pen or a writing style (adapted from Beal)
      • stroke: a vertical stroke (usually short and straight) (adapted from OED)
      • trefoil: a leaf, such as a clover, comprising three rounded sections (Beal)
      • triangle: a rectilineal figure having three angles and three sides (OED)
      • X: the letter considered with regard to its shape (OED)

      Manuscript annotations have been transcribed on the same principles as printed text. Unicode character reference entities are used where necessary. Where abbreviations appear, they are treated in the same way. There is more variation in the various manuscript abbreviations: they can be more idiosyncratic, and they have a wide date range. Where a Unicode entity is not available, the transcription follows the text as closely as possible. For example, where a capital ‘N’ is followed by a superscript ‘o’ over two points, it has been transcribed:
      <choice><abbr>N<c rend="superscript">o</c>..</abbr><expan>Number</expan></choice>

    • 2.1.2 Deletions

      <del> describes any textual cancellation, including printed and manuscript text.

      It contains the attributes @hand, @type and @resp. It may use @n, as with <add> above.

      The values of @type can be:

      • braced, where cancelled text is indicated with a brace
      • crossed-braced, where cancelled text is indicated with a cross and a brace
      • crossed, where cancelled text is indicated with a cross
      • erased, where text has been erased (usually by abrasion)
      • gap, for details of which, see the discussion below in section 2.1.4
      • obscured, where text has been partially or completely concealed, e.g. by scribbling
      • overwritten, where text is marked for deletion by a hand which supplies the alternative reading by writing over the original
      • struckThrough, where one or more distinct strokes have been made through text
      • substituted, where another reading is supplied with no other signal to delete the original text
      • underdotted, where text is marked for deletion by a series of dots under the line
      • underlined, where text is marked for deletion by underlining (often where an alternative reading is supplied)
    • 2.1.3 Underlining

      All underlined text has been transcribed as such, regardless of the apparent function of the underlining. Where necessary, encoding showing underlining has been nested within another tag (e.g. a <del>).

      Printed text underlined in manuscript has been transcribed using <addSpan> and <anchor> tags, except when the underlining functions as a mark of deletion (see section 2.1.2 above). Because underlining may continue across different structural units (e.g. both the speaker and the speech), two empty elements are used, the first to mark the point where the underlining begins and the second to mark the point where it ends, using a unique identifier to associate them with each other.

      For example,
      <l>Non-underlined text <addSpan hand="#aa" type="underlining" spanTo="#ham-1625-22278x-bod-c01-addSpan001"/>underlined text</l> <l>More underlined text<anchor xml:id="ham-1625-22278x-bod-c01-addSpan001"/>non-underlined text continues</l>

      The scheme for assigning unique identifiers follows the above example, to ensure that they are unique across all the texts, and not just within any one text.

      Where manuscript text is underlined by the same hand as the writing, it is captured more simply in <hi rend="underlined"> tags. Where it is underlined in a different hand, the <addSpan> method has been employed to make the distinction between the hands.

      Where an annotation functioning as a note marker is underlined, the fact has been recorded using the @rend attribute and the value “underlinedNoteMarker”.
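      Because the underlining is marked by an <addSpan>/<anchor> pair, the underlined text must be recovered by walking the document in order rather than by reading a single element. The following is a minimal Python sketch using xml.etree.ElementTree, which exposes xml:id under the XML namespace. Several details are assumptions made for illustration: the wrapping <sp>, the restriction to @type="underlining" and the helper names. The sample is the example given earlier in this section.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

def events(elem):
    """Walk the tree in document order, yielding span starts, span ends and text."""
    if elem.tag == "addSpan" and elem.get("type") == "underlining":
        yield ("start", elem.get("spanTo", "").lstrip("#"))
    elif elem.tag == "anchor":
        yield ("end", elem.get(XML_ID))
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from events(child)
        if child.tail:
            yield ("text", child.tail)

def underlined_text(root):
    """Map each underlining span's identifier to the text between <addSpan/> and <anchor/>."""
    open_ids, spans = set(), {}
    for kind, value in events(root):
        if kind == "start":
            open_ids.add(value)
            spans[value] = []
        elif kind == "end":
            open_ids.discard(value)
        else:
            for span_id in open_ids:
                spans[span_id].append(value)
    return {span_id: "".join(chunks) for span_id, chunks in spans.items()}

root = ET.fromstring(
    '<sp><l>Non-underlined text <addSpan hand="#aa" type="underlining" '
    'spanTo="#ham-1625-22278x-bod-c01-addSpan001"/>underlined text</l> '
    '<l>More underlined text<anchor xml:id="ham-1625-22278x-bod-c01-addSpan001"/>'
    'non-underlined text continues</l></sp>'
)
print(underlined_text(root))
```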

    • 2.1.4 Complications

      If an annotation has been made, and then subsequently cancelled, the <del> tag has been nested within the <add> tag.

      For example:
      <add hand="#aa" place="textBlock" type="bibliographic" resp="#odl"><del type="struckThrough" hand="#ab" resp="#odl">C.34. e.5</del></add>

      If an annotation cancels a printed reading, it is captured in a <subst>-<add>-<del> formation; where the annotation itself is subsequently cancelled, it has been transcribed in an <add> tag alone. This ensures one reading is left, rather than both being cancelled.

      Where one annotation is marked for two textual substitutions, it has been transcribed twice, once in each place.

      If an annotation has overwritten the printed word, transforming it into something else, the <subst> tag has been used. In this example, 'my' has been changed to 'thy':

      <subst><del type="overwritten" hand="#aa" resp="#odl">m</del><add hand="#aa" place="supralinear" type="intervention" resp="#odl">th</add></subst>y
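      A <subst> of this kind yields two readings of the same passage. The hypothetical Python sketch below (xml.etree.ElementTree, no TEI namespace) returns either reading for the overwritten example above. The printed reading ignores manuscript <add>s; the annotated reading ignores <del>s.

```python
import xml.etree.ElementTree as ET

def version(elem, annotated):
    """Return the printed reading (annotated=False) or the annotated reading (True)."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "del" and annotated:
            pass  # the annotator cancelled this text
        elif child.tag == "add" and not annotated:
            pass  # manuscript additions are not part of the printed reading
        else:
            parts.append(version(child, annotated))
        parts.append(child.tail or "")
    return "".join(parts)

line = ET.fromstring(
    '<l><subst><del type="overwritten" hand="#aa" resp="#odl">m</del>'
    '<add hand="#aa" place="supralinear" type="intervention" resp="#odl">th</add>'
    '</subst>y</l>'
)
print(version(line, annotated=False))  # my
print(version(line, annotated=True))   # thy
```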

      Further complications arise where the printed text is inadvertently illegible (as opposed to deleted).

      Where the printed text has been partially cropped, or is otherwise inadvertently illegible, and completed in manuscript, the affected characters have been transcribed once, in <add> tags, with the @type "completedCropped", e.g.:

      Ha<add type="completedCropped" place="inline">m</add>

      Completed cropped <add>s are not paired with <gap/>s: by definition, the text is partially there. They are distinct from completely cropped and supplied text.

      Where the printed text has been entirely cropped, or is otherwise inadvertently illegible, (to the extent that it could not be confidently transcribed) and supplied in manuscript, the <add> is paired with a <gap/>, and follows this pattern:

      <l><gap reason="absent" agent="torn" extent="7" unit="chars"/><add type="suppliedCropped" place="inline">And wag</add>er o<gap reason="absent" agent="torn" extent="2" unit="chars"/><add type="suppliedCropped" place="inline">re</add> your heads; he being remisse,</l>

      The use of the @type value "suppliedCropped" succinctly conveys the same meaning as the <subst> tag might: it pairs the <add> with the <gap/> to overcome the implication that we are recording more text than there is.

      Where an addition supplies missing character(s) (rather than replacing printed text), a <subst> tag has been used for reasons of searchability. To make it valid, a <del> as well as an <add> tag is needed. The @type of this <del> will be "gap" (indicating that it is not a true deletion), e.g.:

      <l>More rela<subst><del type="gap"><gap reason="illegible" agent="uninkedType" unit="chars" extent="1" resp="#odl"/></del><add type="suppliedUninked" place="inline">t</add></subst>iue then this, the play's the thing</l>
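      Pairing a <gap/> with a supplied <add> lets software show either what survives on the page or the text as completed in manuscript. The following is a minimal Python sketch. It assumes xml.etree.ElementTree, no TEI namespace, and, as in the examples above, that every <gap/> in the fragment is paired with a supplying <add>. The placeholder characters and the helper name reading are arbitrary choices, not part of the encoding.

```python
import xml.etree.ElementTree as ET

SUPPLIED = {"suppliedCropped", "suppliedUninked"}
PLACEHOLDER = "\u25cf"  # one mark per missing character (an arbitrary display choice)

def reading(elem, with_supplied=True):
    """Render text with manuscript-supplied text, or with placeholders for gaps.

    Assumes every <gap/> here is paired with a supplying <add> (hypothetical helper).
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "gap":
            if not with_supplied:
                extent = int(child.get("extent", "0"))
                if child.get("unit") == "chars" and extent > 0:
                    parts.append(PLACEHOLDER * extent)
                else:
                    parts.append("[\u2026]")  # unknown extent, or a unit other than chars
        elif child.tag == "add" and child.get("type") in SUPPLIED:
            if with_supplied:
                parts.append(reading(child, with_supplied))
        else:
            parts.append(reading(child, with_supplied))
        parts.append(child.tail or "")
    return "".join(parts)

line = ET.fromstring(
    '<l><gap reason="absent" agent="torn" extent="7" unit="chars"/>'
    '<add type="suppliedCropped" place="inline">And wag</add>er o'
    '<gap reason="absent" agent="torn" extent="2" unit="chars"/>'
    '<add type="suppliedCropped" place="inline">re</add> your heads; he being remisse,</l>'
)
print(reading(line))                       # And wager ore your heads; he being remisse,
print(reading(line, with_supplied=False))  # placeholders where the page is torn
```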

  • 2.2 Damage

    Where text is unreadable, the fact is recorded using the empty <gap/> element, e.g. <gap reason="absent" agent="cropped" extent="1" unit="chars" resp="odl"/>.

    Each <gap/> has the attributes @reason, @agent, @extent, @unit and @resp.

    @reason can have the following values:

    • illegible
    • absent, used where something is missing (perhaps cropped) and we do not wish to imply its presence by using “illegible”
    • foreign, used for alphabets other than Roman

    @agent can have the following values:

    • abrasion
    • bleedThrough
    • cropped
    • damagedType
    • damp
    • deletion
    • excised
    • faded
    • foxed
    • hole
    • inkBlot
    • partiallyInkedType, where text is visible but illegible
    • repair
    • stain
    • torn
    • unclear, where the agent cannot be determined
    • uninkedType, where no text can be seen, but its presence can be inferred from a gap and reference to other copies of the edition
    • [alphabet name]

    @extent can have any numeric value. Where the extent is not known, the value will be given as "0".

    @unit can have the following values:

    • chars
    • words
    • lines
    • pages

    @resp can have the following values:

    • bli British Library
    • fol Folger Shakespeare Library
    • odl Oxford Digital Library

    Where illegible material extends across more than one word, whole words take precedence: fully illegible words are counted in word units, and characters are counted only within a partially illegible word. For example, a partially illegible 'Nay answer me' will be transcribed as 'Nay ans<gap unit="chars" extent="3"/> <gap unit="words" extent="1"/>'. (Other attributes will also be included as normal.)
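    The conventions for <gap/> lend themselves to a simple consistency check. The Python sketch below (xml.etree.ElementTree, no TEI namespace) lists departures from the attribute values given above; it is not a schema. Two of its rules are assumptions. An alphabet name in @agent is accepted only when @reason is “foreign”. @resp is accepted with or without a leading “#”, since both forms appear in the examples.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("reason", "agent", "extent", "unit", "resp")
REASONS = {"illegible", "absent", "foreign"}
AGENTS = {
    "abrasion", "bleedThrough", "cropped", "damagedType", "damp", "deletion",
    "excised", "faded", "foxed", "hole", "inkBlot", "partiallyInkedType",
    "repair", "stain", "torn", "unclear", "uninkedType",
}
UNITS = {"chars", "words", "lines", "pages"}
RESPS = {"bli", "fol", "odl"}

def gap_problems(gap):
    """List departures from the <gap/> conventions (a sketch, not a schema)."""
    problems = [f"missing @{name}" for name in REQUIRED if gap.get(name) is None]
    reason, agent = gap.get("reason"), gap.get("agent")
    extent, unit, resp = gap.get("extent"), gap.get("unit"), gap.get("resp")
    if reason is not None and reason not in REASONS:
        problems.append(f"unexpected @reason {reason!r}")
    # Assumption: an alphabet name in @agent is accepted only when @reason is "foreign".
    if agent is not None and agent not in AGENTS and reason != "foreign":
        problems.append(f"unexpected @agent {agent!r}")
    if extent is not None and not extent.isdigit():
        problems.append(f"non-numeric @extent {extent!r}")
    if unit is not None and unit not in UNITS:
        problems.append(f"unexpected @unit {unit!r}")
    # Assumption: @resp may be written with or without a leading "#".
    if resp is not None and resp.lstrip("#") not in RESPS:
        problems.append(f"unexpected @resp {resp!r}")
    return problems

ok = ET.fromstring('<gap reason="absent" agent="cropped" extent="1" unit="chars" resp="odl"/>')
print(gap_problems(ok))  # []
```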

3. The TEI header and copy-specific information

The TEI Header was created according to the TEI P5 guidelines available at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html

Particular attention was given to the following consideration:

The bibliographic description of a machine-readable or digital text resembles in structure that of a book, an article, or any other kind of textual object. The file description element of the TEI header has therefore been closely modelled on existing standards in library cataloguing; it should thus provide enough information to allow users to give standard bibliographic references to the electronic text, and to allow cataloguers to catalogue it.
(P5: Guidelines for Electronic Text Encoding and Interchange, 2.2 The File Description)

The TEI Header in the SQA project was created with reference to the bibliographic records of the original works contained in the English Short Title Catalogue (ESTC). Anglo-American Cataloguing Rules (AACR2) informed the format of the title and author fields, and references were provided to Library of Congress Name Authority files wherever possible.

Full addresses were provided for the organisations responsible for creating the texts, to maximize accessibility.

Full copy-specific information was not provided. Instead, the source description gives a link to the relevant ESTC record and the shelfmark of the original. Brief details of the holding library were encoded in a note, and details of any surrogates in the EEBO (Early English Books Online) database were also provided. The manuscript description element <msDesc> contains details of the hands responsible for all handwritten annotations in the text. Information about these hands follows this pattern:

  • Hand identifier: [#aa]
  • Scribe: [Name of annotator, if known]
  • Script: [secretary, copperplate, Italian etc.]
  • Medium: [brown [ink], pencil etc.]
  • Scope: [major or minor] (describes how widely the hand is used in this text)
  • Description: [further descriptive detail]

Script descriptions have been applied consistently across all copies of Hamlet, using the following list:

  • Italian
  • Mixed
  • Cursive
  • Roman
  • Round (Copperplate)
  • Secretary
  • Unidentifiable

"Unidentifiable" has been used where there is not enough information to judge the script, for example, in describing a hand which only provides dashes.

Standardized terms have also been used when describing locations that fall outside the text block:

  • Front and back pastedown
  • Flyleaf recto, flyleaf verso [to refer to any preliminary non-integral blank]
  • Endleaf recto, endleaf verso [to refer to any non-integral blank bound at the end]
  • Front board, Back board, Spine

Example of a <msDesc> element:

<msDesc><msIdentifier><repository>British Library</repository></msIdentifier>

<physDesc>

<handDesc>

<handNote xml:id="aa" scribe="hand-anon1" medium="pencil" scope="minor">This nineteenth-century hand appears only in the library housekeeping annotation ‘K. Shakespeare (W.)’ on the blank page facing the titlepage.</handNote>

<handNote xml:id="ab" scribe="hand-anon2" medium="pencil" scope="minor">This nineteenth-century hand appears only in the library housekeeping annotation ‘C.34.k.4’ (the shelfmark) and the crossing through of the previous shelfmark on the blank page facing the titlepage.</handNote>

<handNote xml:id="ac" scribe="hand-anon3" medium="pencil" scope="minor">This nineteenth-century hand appears only in the library housekeeping annotation ‘C.34.e.5’ (shelfmark, crossed through by hand <ref target="#ab">ab</ref>) on the blank page facing the titlepage.</handNote>

<handNote xml:id="ad" scribe="hand-cap" medium="pencil" scope="minor">This eighteenth-century hand appears only in the bibliographic annotations on the titlepage. ‘H.XXXII’ refers to the play's place in the original bound volumes, which were arranged by Edward Capell as he catalogued Garrick's collection.</handNote>

<handNote xml:id="ae" scribe="hand-anon4" medium="pencil" scope="minor">This nineteenth-century hand appears only in a numeral on sig. B2r.</handNote>

<handNote xml:id="af" scribe="hand-anon5" medium="brown" scope="minor">This seventeenth-century hand appears only in two minor annotations on sig. D3v.</handNote>

<handNote xml:id="ag" scribe="hand-anon6" medium="brown" scope="minor">This seventeenth-century hand appears only in one annotation on page F3v.</handNote>

<handNote xml:id="ah" scribe="hand-anon7" medium="brown" scope="minor">This seventeenth-century hand appears only in one annotation on page N4v.</handNote>

<handNote xml:id="ai" scribe="hand-anon8" medium="pencil" scope="minor">This eighteenth-century hand appears only in one annotation on page N4v.</handNote>

</handDesc>

<bindingDesc><p>In a 19th-century English gold tooled red grained sheep binding with the coat of arms of David Garrick tooled in gold in the centre of both covers. Author, title, place and date of publication are lettered in gold up the flat <locus facs="#ham-1611-22277x-bli-c01-image057">spine</locus> within a gold cartouche “SHAKESPEARE. HAMLET. LONDON. 1611”. The edges of the boards and the turn-ins are gold tooled. The edges of the leaves are gilt. With comb marbled paper endleaves. Signed at the top on the turn-in of the upper cover “TUCKETT. BINDER. BRITISH MUSEUM”.</p></bindingDesc>

<accMat><p>Yellow ink stamp <stamp>BRITISH MUSEUM</stamp> on <locus facs="#ham-1611-22277x-bli-c01-image007">facsimile image 007a</locus>.</p>

<p>Yellow ink stamp <stamp>BRITISH MUSEUM</stamp> on <locus facs="#ham-1611-22277x-bli-c01-image057">facsimile image 057a</locus>.</p></accMat>

</physDesc>

</msDesc>

<msDesc>, as seen in the example above, also contains a description of the physical book that carried the text of Hamlet. This includes a description of the binding and of any accompanying material, such as pasted-in catalogue descriptions, library stamps, and other material more closely associated with the physical book than with the text of Hamlet. Links to the appropriate facsimile image were provided wherever possible, using the @facs attribute of the <locus> element.

The association between the encoded text and its digital surrogate was created by means of the <facsimile> element. A <surface> was defined for each digital image, with the possibility of providing a link to each version of that image (e.g. jpg, tiff) and a brief description of the image contents (e.g. fol. B2v-B3r). Because of time constraints, the <facsimile> element was not exploited as fully as it might have been, and these links and descriptions are not provided in full.

Example:

<facsimile>
<surface xml:id="ham-1611-22277x-bli-c01-image001"><desc>Image 001</desc><graphic url="ham-1611-22277x-bli-c01-001.jpg"/></surface>
<surface xml:id="ham-1611-22277x-bli-c01-image002"><desc>Image 002</desc><graphic url="ham-1611-22277x-bli-c01-002.jpg"/></surface>
<surface xml:id="ham-1611-22277x-bli-c01-image003"><desc>Image 003</desc><graphic url="ham-1611-22277x-bli-c01-003.jpg"/></surface>
[…]
</facsimile>

Both the <locus> elements in the manuscript description of the TEI header and the <pb/> elements in the main text are linked to a <surface> element by means of the @facs attribute.

Examples:

<pb facs="#ham-1611-22277x-bli-c01-image004" xml:id="ham-1611-22277x-bli-c01-004a"/>

<accMat><p>Yellow ink stamp <stamp>BRITISH MUSEUM</stamp> on <locus facs="#ham-1611-22277x-bli-c01-image007">facsimile image 007a</locus>.</p></accMat>
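Because every @facs value is a pointer made of "#" followed by the xml:id of a <surface>, the link from a page break or locus to an image file can be followed mechanically. The Python sketch below is illustrative only: it assumes the files use the standard TEI P5 namespace and takes the first <graphic> in each <surface>. The sample document is constructed for the purpose, following the naming pattern of the examples above.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def surface_images(root):
    """Map each <surface> xml:id to the url of its first <graphic>."""
    images = {}
    for surface in root.iter(TEI + "surface"):
        graphic = surface.find(TEI + "graphic")
        if graphic is not None:
            images[surface.get(XML_ID)] = graphic.get("url")
    return images


def resolve_facs(root):
    """Yield (element name, @facs pointer, image url or None) for each element with @facs."""
    images = surface_images(root)
    for el in root.iter():
        facs = el.get("facs", "")
        if facs.startswith("#"):
            # None means the pointer has no matching <surface> with a <graphic>.
            yield el.tag.replace(TEI, ""), facs, images.get(facs[1:])


# Constructed sample following the naming pattern used in this section.
doc = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    "<facsimile>"
    '<surface xml:id="ham-1611-22277x-bli-c01-image004"><desc>Image 004</desc>'
    '<graphic url="ham-1611-22277x-bli-c01-004.jpg"/></surface>'
    "</facsimile>"
    '<text><body><pb facs="#ham-1611-22277x-bli-c01-image004" '
    'xml:id="ham-1611-22277x-bli-c01-004a"/></body></text>'
    "</TEI>"
)
for name, pointer, url in resolve_facs(doc):
    print(name, pointer, "->", url)
```

A None in the third position flags a pointer with no matching image, which may be useful for spotting the places where the <facsimile> element is incomplete.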

The header also uses <xi:include> to associate each file with further information about the named characters in the text and about the individuals whose handwriting is present in the annotations.
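For readers processing the files outside a TEI-aware tool, the included lists can be merged into the header with the XInclude support in Python's standard library (xml.etree.ElementInclude). This is a sketch only: the file name "characters.xml" and the placement of the include inside <particDesc> are hypothetical, and the shared file is held in memory so that the example is self-contained.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical shared file, held in memory instead of on disk.
SHARED = {
    "characters.xml": '<listPerson xmlns="http://www.tei-c.org/ns/1.0">'
                      '<person xml:id="gui"><p><persName>Guildenstern</persName>'
                      " Hamlet's fellow student</p></person></listPerson>",
}


def loader(href, parse, encoding=None):
    """Resolve an xi:include href from the in-memory SHARED table."""
    if parse == "xml":
        return ET.fromstring(SHARED[href])
    return SHARED[href]


# Hypothetical header fragment pointing at the shared character list.
header = ET.fromstring(
    '<particDesc xmlns="http://www.tei-c.org/ns/1.0" '
    'xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="characters.xml"/></particDesc>'
)
# Replaces each <xi:include> in place with the element returned by loader().
ElementInclude.include(header, loader=loader)
print(ET.tostring(header, encoding="unicode"))
```

In a real setup the default loader, which reads the referenced file from disk, would normally be used instead of the in-memory table.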

The list of characters links all occurrences of a particular named character across all of the copies and editions, regardless of spelling, by means of a three-letter code assigned to each <sp> and <name> element. This makes it possible to associate, for example, Guildensterne, Guyl, Guil, Guyldensterne, Gilderstone and Gyldensterne (among many other spellings) without compromising the capture of the original spelling. In the list contained in the TEI header, the most familiar modern spelling of the name, taken from the Arden Shakespeare edition, is given to identify the three-letter code.

Example:

<person xml:id="gui"><p><persName>Guildenstern</persName> Hamlet's fellow student</p></person>
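Because every speech carries the same code regardless of spelling, the variant spellings for a character can be gathered with a few lines of code. The Python sketch below assumes that the code is recorded in the @who attribute of <sp> (as "#gui") and that the printed speech prefix is held in a <speaker> element; neither detail is stated above, and the "ham" code in the sample is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI = "{http://www.tei-c.org/ns/1.0}"


def spellings_by_code(root):
    """Collect the distinct speech-prefix spellings recorded for each character code."""
    groups = defaultdict(set)
    for sp in root.iter(TEI + "sp"):
        # Assumption: the code is stored as a pointer in @who, e.g. who="#gui".
        code = sp.get("who", "").lstrip("#")
        speaker = sp.find(TEI + "speaker")
        # Only the text directly inside <speaker> is read; nested markup is ignored.
        if code and speaker is not None and speaker.text:
            groups[code].add(speaker.text.strip())
    return dict(groups)


# Constructed sample; "ham" is a hypothetical code.
body = ET.fromstring(
    '<body xmlns="http://www.tei-c.org/ns/1.0">'
    '<sp who="#gui"><speaker>Guyl</speaker><l>...</l></sp>'
    '<sp who="#gui"><speaker>Guil</speaker><l>...</l></sp>'
    '<sp who="#ham"><speaker>Ham</speaker><l>...</l></sp>'
    "</body>"
)
for code, spellings in sorted(spellings_by_code(body).items()):
    print(code, sorted(spellings))
```

Only the code is compared, so the original spellings stay untouched in the transcription while still being grouped together.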

In the list of hands, brief biographical information is provided about each scribe. This includes their full name, any identifying characteristic (such as left-sloping handwriting) and dates of birth and death where known.

Example:

<person xml:id="hand-cob"> <persName> <forename type="first">Thomas</forename> <forename type="middle">James</forename> <surname>Cobden-Sanderson</surname> </persName> <birth>1840</birth> <death>1922</death> </person>

4. The Advisory Forum of June 2008

To help define the potential uses of the resource and to inform early editorial decisions in the Hamlet pilot project, the Bodleian Library's Oxford Digital Library (ODL) hosted an Advisory Forum on 6 June 2008. The invited Advisory Group was drawn from experts in the field: Early Modern scholars and editors, creative practitioners and educational specialists, librarians and digital humanists. The Advisory Forum included round table and small group discussions, addressing direct questions as well as encouraging open conversation. Helpful information came from informal discussion as well as from structured sessions.

For an overview of participants, programme agenda, and a brief report on the day's outcomes, see the final report (SQA Advisory Forum 6 June 2008 final report).