Distributed Proofreaders

Formatting Guidelines

Version 1.9.c, generated January 1, 2006      

Formatting Guidelines in French / Directives de Formatage en français
Formatting Guidelines in Portuguese / Regras de Formatação em Português

Check out the Formatting Quiz!

Table of Contents

  • Formatting of the...
  • Specific Guidelines for Special Books
  • Common Problems

The Primary Rule

"Don't change what the author wrote!"

The final electronic book seen by a reader, possibly many years in the future, should accurately convey the intent of the author. If the author spelled words oddly, leave them spelled that way. If the author wrote outrageous racist or biased statements, leave them that way. If the author puts italics, bold text or a footnote every third word, mark them italicized, bolded or footnoted. (See Printer's Errors for proper handling of obvious misprints.)

We do change minor typographical conventions that don't affect the sense of what the author wrote. For example, we rejoin words that were broken at the end of a line (End-of-line Hyphenation). Changes such as these help us produce a consistently formatted version of the book. The rules we follow are designed to achieve this result. Please carefully read the rest of these Guidelines with this concept in mind.

To assist the next formatter and the post-processor, we also preserve line breaks. This allows them to easily compare the lines in the text to the lines in the image.

 

Summary Guidelines

The Formatting Summary is a short, 2-page printer-friendly (.pdf) document that summarizes the main points of these Guidelines, and gives examples of how to format. Beginning formatters are encouraged to print out this document and keep it handy while formatting.

You may need to download and install a .pdf reader. A free one is available from Adobe®.

About This Document

This document exists to reduce formatting differences when the formatting of one book is distributed among many volunteers, each working on different pages of the book. Following it helps us all format the same way, which makes it easier for the post-processor to eventually combine all these pages into one e-book.

It is not intended as any kind of a general editorial or typesetting rulebook.

We've tried to include in this document all the items that new users have asked about formatting and proofreading. If there are any items missing, or items that you consider should be done differently, or if something is vague, please let us know.

This document is a work in progress. Help us to progress by posting your suggested changes in the Documentation Forum in this thread.

Project Comments

On the proofreading interface page (Project Page) where you start formatting pages, there is a section called "Project Comments" containing information specific to that project (book). Read these before you start formatting pages! If the Project Manager wants you to format something in this book differently from the way specified in these Guidelines, that will be noted here. Instructions in the Project Comments override the rules in these Guidelines, so follow them. (This is also where the Project Manager may give you interesting tidbits of information about the author or the project.)

Please also read the Project Thread: The Project Manager may clarify project-specific guidelines here, and it is often used by volunteers to alert other volunteers to recurring issues within the project and how they can best be addressed.

On the Project Page, the link 'Images, Pages Proofread, & Differences' allows you to see how other volunteers have made changes. This Forum thread discusses different ways to use this information.

Forum/Discuss this Project

On the proofreading interface page (Project Page) where you start formatting pages, on the line "Forum", there is a link titled "Discuss this Project" (if the discussion has already started), or "Start a discussion on this Project" (if it hasn't). Clicking on that link will take you to a thread in the projects forum dedicated to this specific project. That is the place to ask questions about this book, inform the Project Manager about problems, etc. Using this project forum thread is the recommended way to communicate with the Project Manager and other volunteers who are working on this book.

Fixing errors on Previous Pages

When you select a project for formatting, the Project Comments page is loaded. This page contains links to pages from this project that you have recently worked on. (If you haven't proofread or formatted any pages yet, there will be no links shown.)

Pages listed under either "DONE" or "IN PROGRESS" are available to make corrections or to finish formatting. Just click on the link to the page. So if you discover that you made a mistake on a page, or marked something incorrectly, you can click on that page here and re-open it to fix the error.

You may also use the "Images, Pages Proofread, & Differences" or "Just My Pages" links on the Project Comments page. These pages will display an "Edit" link next to the pages you have worked on in the current round that can still be corrected.

For more detailed information, refer to either the Standard Proofreading Interface Help or the Enhanced Proofreading Interface Help, depending on which interface you are using.

Formatting of the...

Front/Back Title Page

Format all the text, just as it was printed on the page, whether all capitals, upper and lower case, etc., including the years of publication or copyright.

Older books often show the first letter as a large ornate graphic—format this as just the letter.

Sample Image:
title page image
Correctly Formatted Text:

GREEN FANCY

BY

GEORGE BARR McCUTCHEON

AUTHOR OF "GRAUSTARK," "THE HOLLOW OF HER HAND,"
"THE PRINCE OF GRAUSTARK," ETC.

<i>WITH FRONTISPIECE BY</i>
<i>C. ALLAN GILBERT</i>

NEW YORK
DODD, MEAD AND COMPANY

1917

Table of Contents

Format the Table of Contents just as it is printed in the book, whether all capitals, upper and lower case, etc., and surround it with /* and */. Leave a blank line between these markers and the rest of the text. Page number references should be retained and be placed at least six spaces past the end of the line.

Remove any periods or asterisks (leaders) used to align the page numbers.
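The leader rule above can be sketched in a few lines of Python. This is only an illustration, not a Distributed Proofreaders tool: the name strip_leaders and the gap parameter are hypothetical, and the pattern is a heuristic that handles Arabic page numbers only.

```python
import re

# Hypothetical helper for illustration; formatters normally do this by hand.
# Matches a run of leader characters (periods or asterisks, possibly mixed
# with spaces) followed by an Arabic page number at the end of the line.
LEADER_RE = re.compile(r"[ .*]*[.*][ .*]*(\d+)\s*$")

def strip_leaders(line, gap=6):
    """Replace the leaders before a trailing page number with `gap` spaces."""
    match = LEADER_RE.search(line)
    if match is None:
        # No leaders followed by a page number: leave the line as it is.
        return line.rstrip()
    entry = line[:match.start()].rstrip()
    return entry + " " * gap + match.group(1)

print(strip_leaders("Falls in with Friends . . . . . 15"))
```

Because it is a heuristic, an entry whose own text ends in a period directly before the page number would lose that period, so the result still needs checking against the image.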

Sample Image:

Correctly Formatted Text:

CONTENTS

/*
CHAPTER                                         PAGE

I. <sc>The First Wayfarer and the Second Wayfarer
Meet and Part on the Highway</sc>      1

II. <sc>The First Wayfarer Lays His Pack Aside and
Falls in with Friends</sc>      15

III. <sc>Mr. Rushcroft Dissolves, Mr. Jones Intervenes,
and Two Men Ride Away</sc>      35

IV. <sc>An Extraordinary Chambermaid, a Midnight
Tragedy, and a Man Who Said "Thank You"</sc>      50

V. <sc>The Farm-boy Tells a Ghastly Story, and an
Irishman Enters</sc>      67

VI. <sc>Charity Begins Far from Home, and a Stroll in
the Wildwood Follows</sc>      85

VII. <sc>Spun-gold Hair, Blue Eyes, and Various Encounters</sc>      103

VIII. <sc>A Note, Some Fancies, and an Expedition in
Quest of Facts</sc>      120

IX. <sc>The First Wayfarer, the Second Wayfarer, and
the Spirit of Chivalry Ascendant</sc>      134

X. <sc>The Prisoner of Green Fancy, and the Lament of
Peter the Chauffeur</sc>      148

XI. <sc>Mr. Sprouse Abandons Literature at an Early
Hour in the Morning</sc>      167

XII. <sc>The First Wayfarer Accepts an Invitation, and
Mr. Dillingford Belabors a Proxy</sc>      183

XIII. <sc>The Second Wayfarer Receives Two Visitors at
Midnight</sc>      199

XIV. <sc>A Flight, a Stone-cutter's Shed, and a Voice
Outside</sc>      221
*/

Blank Page

Format as [Blank Page] if both the text and the image are blank.

If there is text in the formatting text area and a blank image, or if there is an image but no text, follow the directions for a Bad Image or Bad Text.

Page Headers/Page Footers

Remove page headers and page footers, but not footnotes, from the text.

The page headers are normally at the top of the image and have a page number opposite them. Page headers may be the same all through the book (often the title of the book and the author's name), they may be the same for each chapter (often the chapter number), or they may be different on each page (describing the action on that page). Remove them all, regardless, including the page number.

A chapter header will start further down the page and won't have a page number on the same line. See the next section for a specific example.


Sample Image:

Correctly Formatted Text:
/#
In the United States?[A] In a railroad? In a mining company?
In a bank? In a church? In a college?

Write a list of all the corporations that you know or have
ever heard of, grouping them under the heads <i>public</i> and <i>private</i>.

How could a pastor collect his salary if the church should
refuse to pay it?

Could a bank buy a piece of ground "on speculation?" To
build its banking-house on? Could a county lend money if it
had a surplus? State the general powers of a corporation.
Some of the special powers of a bank. Of a city.

A portion of a man's farm is taken for a highway, and he is
paid damages; to whom does said land belong? The road intersects
the farm, and crossing the road is a brook containing
trout, which have been put there and cared for by the farmer;
may a boy sit on the public bridge and catch trout from that
brook? If the road should be abandoned or lifted, to whom
would the use of the land go?
#/




CHAPTER XXXV.

<sc>Commercial Paper.</sc>


<b>Kinds and Uses.</b>--If a man wishes to buy some commodity
from another but has not the money to pay for
it, he may secure what he wants by giving his written
promise to pay at some future time. This written
promise, or <i>note</i>, the seller prefers to an oral promise
for several reasons, only two of which need be mentioned
here: first, because it is <i>prima facie</i> evidence of
the debt; and, second, because it may be more easily
transferred or handed over to some one else.

If J. M. Johnson, of Saint Paul, owes C. M. Jones,
of Chicago, a hundred dollars, and Nelson Blake, of
Chicago, owes J. M. Johnson a hundred dollars, it is
plain that the risk, expense, time and trouble of sending
the money to and from Chicago may be avoided,

[Footnote A: The United States: "Its charter, the constitution. * * * Its flag the
symbol of its power; its seal, of its authority."--Dole.]

Chapter Headers

Format chapter headers as they appear in the text.

A chapter header may start a bit farther down the page than the page header and won't have a page number on the same line. Chapter Headers are often printed all caps; if so, keep them as all caps. Chapter Headers are usually printed in a larger font which may appear to be bold, but we do not mark them as bold text; however, you should include italics or small-caps markup if it appears in the header.

Put 4 blank lines before the "CHAPTER XXX". Include these blank lines even if the chapter starts on a new page; there are no 'pages' in an e-book, so the blank lines are needed. Then leave 1 (one) blank line between each additional part of the chapter header, such as a chapter description, opening quote, etc., and finally leave 2 (two) blank lines before the start of the text of the chapter.
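As a rough sketch of the blank-line counts just described, the following Python function assembles a chapter opening. The function name and its arguments are hypothetical and exist only for this illustration.

```python
def format_chapter_opening(header_parts, first_paragraph):
    """Lay out a chapter opening: 4 blank lines before the header,
    1 between header parts, 2 before the chapter text.

    The result is meant to be joined to the preceding text with a
    single newline.
    """
    lines = [""] * 4                      # four blank lines before "CHAPTER ..."
    for index, part in enumerate(header_parts):
        if index > 0:
            lines.append("")              # one blank line between header parts
        lines.extend(part.splitlines())   # keep the printed line breaks
    lines.extend(["", ""])                # two blank lines before the text
    lines.extend(first_paragraph.splitlines())
    return "\n".join(lines)

opening = format_chapter_opening(
    ["CHAPTER I", "THE FIRST WAYFARER AND THE SECOND WAYFARER"],
    "A solitary figure trudged along the narrow",
)
print("GREEN FANCY" + "\n" + opening)
```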

Old books often printed the first word or two of every chapter in all caps or small caps; change these to upper and lower case (first letter only capitalized).

Watch out for a missing double quote at the start of the first paragraph, which some publishers did not include or which the OCR missed due to a large capital in the original. If the author started the paragraph with dialog, insert the double quote.

Sample Image:

Correctly Formatted Text:
GREEN FANCY




CHAPTER I

THE FIRST WAYFARER AND THE SECOND WAYFARER
MEET AND PART ON THE HIGHWAY


A solitary figure trudged along the narrow
road that wound its serpentinous way
through the dismal, forbidding depths of
the forest: a man who, though weary and footsore,
lagged not in his swift, resolute advance. Night
was coming on, and with it the no uncertain prospects
of storm. Through the foliage that overhung
the wretched road, his ever-lifting and apprehensive
eye caught sight of the thunder-black, low-lying
clouds that swept over the mountain and bore
down upon the green, whistling tops of the trees.

At a cross-road below he had encountered a small
girl driving homeward the cows. She was afraid
of the big, strange man with the bundle on his back
and the stout walking stick in his hand: to her a
remarkable creature who wore "knee pants" and
stockings like a boy on Sunday, and hob-nail shoes,
and a funny coat with "pleats" and a belt, and a
green hat with a feather sticking up from the band.

Section Headers

Some texts have sections within chapters. Format these headers as they appear in the text. Leave 2 blank lines before the header and one after, unless the Project Manager has requested otherwise. If you are not sure if a header indicates a chapter or a section, post a question in the Project Thread, noting the page number. Section Headers are often printed in a larger font which may appear to be bold, but we do not mark them as bold text; however, you should include italics or small-caps markup if it appears in the header.

Other Major Divisions in Texts

Major Divisions in the text such as Preface, Foreword, Introduction, Prologue, Epilogue, Appendix, References, Conclusion, Glossary, Summary, Acknowledgements, Bibliography, etc., should be formatted in the same way as Chapter Headers, i.e. 4 blank lines before the heading and 2 blank lines before the start of the text.

Paragraph Side-Descriptions (Sidenotes)

Some books will have short descriptions of the paragraph along the side of the text. These are called sidenotes. Move sidenotes to just above the paragraph that they belong to. A sidenote should be surrounded by a sidenote tag [Sidenote:  and ], with the text of the sidenote placed in between. Format the sidenote text as it is printed, preserving the line breaks, italics, etc. Leave a blank line after the sidenote, so that it does not get merged into the paragraph when the text is rewrapped during post-processing.

If there are multiple sidenotes for a single paragraph, put them one after another at the start of the paragraph. Leave a blank line separating each of them.

If the paragraph began on a previous page, put the Sidenote at the top of the page and mark it with * so that the post-processor can see that it belongs on the previous page. Like this: *[Sidenote: (text of sidenote)]. The post-processor will move them to the appropriate place.

Sometimes a Project Manager will request that you put Sidenotes next to the sentence they apply to, rather than at the top or bottom of the paragraph. In this case, don't separate them out with blank lines.
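The default sidenote layout (tags above the paragraph, one blank line after each) can be sketched as follows. The helper names are hypothetical, and the sketch does not cover the case where a Project Manager asks for sidenotes next to the sentence they apply to.

```python
def sidenote_tag(text, continued_from_previous_page=False):
    """Wrap sidenote text in a [Sidenote: ...] tag, keeping its line breaks.

    A leading * marks a sidenote whose paragraph began on a previous page.
    """
    tag = "[Sidenote: " + text.strip("\n") + "]"
    return "*" + tag if continued_from_previous_page else tag

def place_sidenotes(sidenotes, paragraph):
    """Put each sidenote above the paragraph, separated by blank lines."""
    blocks = [sidenote_tag(note) for note in sidenotes]
    blocks.append(paragraph)
    return "\n\n".join(blocks)

print(sidenote_tag("Burning\ndiscs\nthrown into\nthe air.", True))
print(place_sidenotes(["The Midsummer\nfires in\nSwabia."], "In the valley of the Lech,"))
```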

Sample Image:

Correctly Formatted Text:

*[Sidenote: Burning
discs
thrown into
the air.]

that such as looked at the fire holding a bit of larkspur
before their face would be troubled by no malady of the
eyes throughout the year.[1] Further, it was customary at
Würzburg, in the sixteenth century, for the bishop's followers
to throw burning discs of wood into the air from a mountain
which overhangs the town. The discs were discharged by
means of flexible rods, and in their flight through the darkness
presented the appearance of fiery dragons.[2]

[Sidenote: The Midsummer
fires in
Swabia.]

[Sidenote: Omens
drawn from
the leaps
over the
fires.]

[Sidenote: Burning
wheels
rolled
down hill.]

In the valley of the Lech, which divides Upper Bavaria
from Swabia, the midsummer customs and beliefs are, or
used to be, very similar. Bonfires are kindled on the
mountains on Midsummer Day; and besides the bonfire
a tall beam, thickly wrapt in straw and surmounted by a
cross-piece, is burned in many places. Round this cross as
it burns the lads dance with loud shouts; and when the
flames have subsided, the young people leap over the fire in
pairs, a young man and a young woman together. If they
escape unsmirched, the man will not suffer from fever, and
the girl will not become a mother within the year. Further,
it is believed that the flax will grow that year as high as
they leap over the fire; and that if a charred billet be taken
from the fire and stuck in a flax-field it will promote the
growth of the flax.[3] Similarly in Swabia, lads and lasses,
hand in hand, leap over the midsummer bonfire, praying
that the hemp may grow three ells high, and they set fire
to wheels of straw and send them rolling down the hill.
Among the places where burning wheels were thus bowled
down hill at Midsummer were the Hohenstaufen mountains
in Wurtemberg and the Frauenberg near Gerhausen.[4]
At Deffingen, in Swabia, as the people sprang over the mid-*

[Footnote 1: <i>Op. cit.</i> iv. i. p. 242. We have
seen (p. 163) that in the sixteenth
century these customs and beliefs were
common in Germany. It is also a
German superstition that a house which
contains a brand from the midsummer
bonfire will not be struck by lightning
(J. W. Wolf, <i>Beiträge zur deutschen
Mythologie</i>, i. p. 217, § 185).]

[Footnote 2: J. Boemus, <i>Mores, leges et ritus
omnium gentium</i> (Lyons, 1541), p.
226.]

[Footnote 3: Karl Freiherr von Leoprechting,
<i>Aus dem Lechrain</i> (Munich, 1855),
pp. 181 <i>sqq.</i>; W. Mannhardt, <i>Der
Baumkultus</i>, p. 510.]

[Footnote 4: A. Birlinger, <i>Volksthümliches aus
Schwaben</i> (Freiburg im Breisgau, 1861-1862),
ii. pp. 96 <i>sqq.</i>, § 128, pp. 103
<i>sq.</i>, § 129; <i>id., Aus Schwaben</i> (Wiesbaden,
1874), ii. 116-120; E. Meier,
<i>Deutsche Sagen, Sitten und Gebräuche
aus Schwaben</i> (Stuttgart, 1852), pp.
423 <i>sqq.</i>; W. Mannhardt, <i>Der Baumkultus</i>,
p. 510.]

Paragraph Spacing/Indenting

Put a blank line before the start of paragraphs, even if a paragraph starts at the top of a page. You should not indent the start of paragraphs, but if all paragraphs are already indented, don't bother removing those spaces—that can be done automatically during post-processing.

See the Chapter Headers image/text for an example.

Multiple Columns

Format ordinary text which has been printed in two columns as a single column.

Spans of multiple-column text within single column sections should be formatted as a single column by placing the text from the left-most column first, the text from the next one after it, and so on. You do not need to mark where the columns were split, just join them together.

If the columns are lists of items, mark the start of the list with /* and the end with */ so that the lines do not get re-wrapped during post-processing. Leave a blank line between these markers and the rest of the text.

See also the Indexes, Lists of Items and Tables sections of these Guidelines.

Illustrations

Text for an illustration should be surrounded by an illustration tag [Illustration:  and ], with the caption text placed in between. Format the caption text as it is printed, preserving the line breaks, italics, etc.

If an illustration has no caption, add a tag [Illustration].

If the illustration is in the middle of or at the side of a paragraph, move the illustration tag to before or after the paragraph and leave a blank line to separate them. Rejoin the paragraph by removing any blank lines left by doing so.

If there is no paragraph break on the page, mark the illustration tag with an * like so *[Illustration: (text of caption)], move it to the top of the page, and leave 1 (one) blank line after it.
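The three tag forms described in this section can be summarized in a small sketch. The function name and parameters are hypothetical. Applying the * to an illustration with no caption is an assumption, since the text above only shows the * with a captioned tag.

```python
def illustration_tag(caption=None, no_paragraph_break=False):
    """Build an illustration tag, keeping the caption's line breaks."""
    if caption:
        tag = "[Illustration: " + caption.strip() + "]"
    else:
        tag = "[Illustration]"            # illustration without a caption
    # Assumption: the * prefix is used the same way with or without a caption.
    return "*" + tag if no_paragraph_break else tag

print(illustration_tag())
print(illustration_tag("<sc>Fig.</sc> 1.--APPARATUS FOR THE STUDY OF HORIZONTAL\nSEISMIC MOVEMENTS."))
print(illustration_tag("(text of caption)", no_paragraph_break=True))
```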

Sample Image:

Correctly Formatted Text:

[Illustration: Martha told him that he had always been her ideal and
that she worshipped him.

<i>Frontispiece</i>

<i>Her Weight in Gold</i>]


Sample Image: (Illustration in middle of paragraph)

Correctly Formatted Text:

such study are due to Italians. Several of these instruments
have already been described in this journal, and on the present
occasion we shall make known a few others that will
serve to give an idea of the methods employed.

[Illustration: <sc>Fig.</sc> 1.--APPARATUS FOR THE STUDY OF HORIZONTAL
SEISMIC MOVEMENTS.]

For the observation of the vertical and horizontal motions
of the ground, different apparatus are required. The

Footnotes/Endnotes

Footnotes are placed out-of-line; that is, the text of the footnote is left at the bottom of the page and a tag placed where it is referenced in the text.

During formatting, this means:

1. The number, letter, or other character that marks a footnote location should be surrounded with brackets ([ and ]). Remove any spaces before the [—keep it right next to the word being footnoted[1] or its punctuation mark,[2] as shown in the text, and the two examples in this sentence.

When footnotes are marked with a series of special characters (*, †, ‡, §, etc.) we replace these with capital letters in order (A, B, C, etc.).

2. A footnote should be surrounded by a footnote tag [Footnote #:  and ], with the footnote text placed in between, and the footnote number or letter placed where the # is shown in the tag. Format the footnote text as it is printed, preserving the line breaks, italics, etc. Leave the footnote text at the bottom of the page. Be sure to use the same tag in the footnote as you used in the text where the footnote was referenced. Place each footnote on a separate line in order of appearance. Place a blank line between each footnote if there is more than one.
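The two steps above can be sketched in Python. All three function names are hypothetical. The sketch letters one list of symbol markers at a time and supports at most 26 of them; whether lettering restarts on each page is not stated above, so check the Project Comments.

```python
import string

def assign_letters(symbol_markers):
    """Map symbol markers, in order of first appearance, to A, B, C, ..."""
    mapping = {}
    for symbol in symbol_markers:
        if symbol not in mapping:
            mapping[symbol] = string.ascii_uppercase[len(mapping)]
    return mapping

def mark_reference(word, marker):
    """Attach the bracketed marker directly to the word or its punctuation."""
    return word.rstrip() + "[" + marker + "]"

def footnote_tag(marker, text):
    """Wrap footnote text in a tag that reuses the marker from the text."""
    return "[Footnote " + marker + ": " + text.strip() + "]"

letters = assign_letters(["*", "\u2020", "\u2021"])   # *, dagger, double dagger
print(mark_reference("In the United States? ", letters["*"]))
print(footnote_tag("1", "Gaius Julius Caesar."))
```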

In some books, the Project Manager may ask that you move the footnotes in-line; read the Project Comments for instructions in this case.

See the Page Headers/Page Footers image/text for an example footnote.

If there's a footnote at the bottom of the page with no footnote marker in the text, especially if it starts mid-sentence or mid-word, it's probably a continuation of a footnote from a previous page. Leave it at the bottom of the page near the other footnotes, and surround it with *[Footnote: (text of footnote)] (without any footnote number or marker). The * indicates that the footnote was continued, and brings it to the attention of the post-processor.

If a footnote continues on the next page (the page ends before the footnote does), leave the footnote at the bottom of the page, and just put an asterisk * where the footnote ends, like this: [Footnote 1: (text of footnote)]*. (The * indicates that the footnote ended prematurely, and brings it to the attention of the post-processor, who will eventually join it up with the rest of the footnote text.)

If a continued footnote ends or starts on a hyphenated word, mark both the footnote and the word with *, thus:
    [Footnote 1: This footnote is continued and the last word in it is also con-*]*
for the leading fragment, and
    *[Footnote: *tinued onto the next page.]
for the trailing fragment.

If a footnote or endnote is referenced in the text but does not appear on that page, keep the footnote/endnote number or marker and surround it with square brackets [ and ]. This is common in scientific and technical books, where footnotes are often grouped at the end of chapters. See "Endnotes" below.

Original Text:
The principal persons involved in this argument were Caesar1, former military
leader and Imperator, and the orator Cicero2. Both were of the aristocratic
(Patrician) class, and were quite wealthy.

1 Gaius Julius Caesar.
2 Marcus Tullius Cicero.

Format with Out-of-Line Footnotes:
The principal persons involved in this argument were Caesar[1], former military
leader and Imperator, and the orator Cicero[2]. Both were of the aristocratic
(Patrician) class, and were quite wealthy.

[Footnote 1: Gaius Julius Caesar.]

[Footnote 2: Marcus Tullius Cicero.]

In some books, footnotes are separated from the main text by a horizontal line. We do not keep this line; just leave a blank line between the main text and the footnotes. (See example above.)

Endnotes are just footnotes that have been located together at the end of a chapter or at the end of the book, instead of on the bottom of each page. These are formatted in the same manner as out-of-line footnotes. Where you find an endnote reference in the text, just surround it with [ and ]. If you are formatting one of the ending pages with the endnotes text on it, surround the text of each note with [Footnote #: (text of endnote)], with the endnote text placed in between, and the endnote number or letter placed where the # is. Put a blank line after each endnote so that they remain separate paragraphs when the text is rewrapped during post-processing.

Footnotes in Poetry or Tables should be treated the same as other footnotes. Volunteers should tag them and leave them at the bottom of the page; the post-processor will decide on the final placement.

Original Footnoted Poetry:
Mary had a little lamb1
   Whose fleece was white as snow
And everywhere that Mary went
   The lamb was sure to go!

1 This lamb was obviously of the Hampshire breed,
well known for the pure whiteness of their wool.

Correctly Formatted Text:
/*
Mary had a little lamb[1]
  Whose fleece was white as snow
And everywhere that Mary went
  The lamb was sure to go!
*/

[Footnote 1: This lamb was obviously of the Hampshire breed,
well known for the pure whiteness of their wool.]

Italics

Format italicized text with <i> inserted at the start and </i> inserted at the end of the italics. (Note the "/" in the closing tag.)

Punctuation goes outside the italics, unless it is an entire sentence or section that is italicized, or the punctuation is itself part of a phrase, title or abbreviation that is italicized.

The periods that mark an abbreviated word in the title of a journal such as Phil. Trans. are part of the title for italicization purposes, and are included within the italic tags, thus: <i>Phil. Trans.</i>.

For dates and similar phrases, format the entire phrase as italics, rather than marking the words as italics and the numbers as non-italics. The reason is that many typefaces found in older texts used the same design for numbers in both regular and italics.

If the italicized text consists of a series/list of words or names, mark these up with italics tags individually.

Examples—Italics:

Original Text:
Enacted 4 July, 1776
Correctly Formatted Text:
<i>Enacted 4 July, 1776</i>

Original Text:
God knows what she saw in me! I spoke
in such an affected manner.
Correctly Formatted Text:
<i>God knows what she saw in me!</i> I spoke
in such an affected manner.

Original Text:
As in many other of these Studies, and
Correctly Formatted Text:
As in many other of these <i>Studies</i>, and

Original Text:
(Psychological Review, 1898, p. 160)
Correctly Formatted Text:
(<i>Psychological Review</i>, 1898, p. 160)

Original Text:
L. Robinson, art. "Ticklishness,"
Correctly Formatted Text:
L. Robinson, art. "<i>Ticklishness</i>,"

Original Text:
December 3, morning.
1323 Picadilly Circus
Correctly Formatted Text:
/*
<i>December 3, morning.</i>
1323 Picadilly Circus
*/

Original Text:
Volunteers may be tickled pink to read
Ticklishness, Tickling and Laughter,
Remarks on Tickling and Laughter
and Ticklishness, Laughter and Humour.
Correctly Formatted Text:
Volunteers may be tickled pink to read
<i>Ticklishness</i>, <i>Tickling and Laughter</i>,
<i>Remarks on Tickling and Laughter</i>
and <i>Ticklishness, Laughter and Humour</i>.

Bold Text

Format bold text (text printed in a heavier typeface) with <b> inserted before the bold text and </b> after it. (Note the "/" in the closing tag.)

Punctuation goes outside the bold tags, unless it is an entire sentence or section that is in bold, or the punctuation is itself part of a phrase, title or abbreviation that is in bold type.

See the Page Headers/Page Footers image/text for an example.

Some Project Managers may specify in the Project Comments that bold text be rendered as all caps.

Superscripts

Older books often abbreviated words as contractions, and printed them as superscripts, for example (in the original, the "rl" of "Genrl" and the "d" of "Ld" are printed as superscripts):
    Genrl Washington defeated Ld Cornwall's army.
Format these by inserting a single caret to identify this as a superscripted abbreviation/contraction, like this:
    Gen^rl Washington defeated L^d Cornwall's army.

In scientific & technical works, format superscripted characters with a caret followed by curly braces { and } surrounding them, even if there is only one character superscripted.
For example (in the original, "n-1" is printed as a superscript):
    ... up to xn-1 elements in the array.
would be formatted as
    ... up to x^{n-1} elements in the array.

The Project Manager may specify in the Project Comments that superscripted text be marked up differently.

Subscripts

Subscripted text is often found in scientific works, but is not common in other material. Format subscripted text by inserting an underline character _ and surrounding the text with curly braces { and }.
For example (in the original, the "2" is printed as a subscript):
    H2O.
would be formatted as
    H_{2}O.
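The caret and underline markup from the Superscripts and Subscripts sections can be sketched together. The function names are hypothetical, and the caller decides which characters were raised or lowered by looking at the image.

```python
def superscript(base, raised, technical=False):
    """Mark `raised` as a superscript following `base`.

    Contractions use a bare caret; scientific and technical works
    put the raised characters in curly braces.
    """
    if technical:
        return base + "^{" + raised + "}"
    return base + "^" + raised

def subscript(base, lowered):
    """Mark `lowered` as a subscript following `base`."""
    return base + "_{" + lowered + "}"

print(superscript("Gen", "rl") + " Washington")    # contraction
print(superscript("x", "n-1", technical=True))     # technical work
print(subscript("H", "2") + "O.")
```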

Underlined Text

Format underlined text as Italics, with <i> and </i>. (Note the "/" in the closing tag.)

Underlining was often used to indicate emphasis when the typesetter was unable to actually italicize the text, for example in a typewritten document.

Some Project Managers may specify in the Project Comments that underlined text be marked up with the <u> and </u> tags.

S p a c e d   O u t   Text (gesperrt)

Format   s p a c e d   o u t   text as Italics, with <i> and </i>, and remove the extra spaces between letters in each word. (Note the "/" in the closing tag.)

This was a typesetting technique used to emphasize a piece of text in older German (and some Italian) books. Italics serve that purpose for modern readers, and extra spacing may not be clear on all the different screen sizes & fonts where people may read the final e-book.
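A minimal sketch of closing up a spaced-out phrase follows. The function name is hypothetical. It assumes the OCR text has exactly one space between the letters of a word and at least two spaces between words; real OCR output may not be that regular.

```python
import re

def close_up_spaced_text(phrase):
    """Remove the letter spacing from a gesperrt phrase and mark it italic."""
    # Two or more spaces are taken to be a word boundary (an assumption).
    words = re.split(r"\s{2,}", phrase.strip())
    # A single space inside a word is letter spacing, so remove it.
    closed = " ".join(word.replace(" ", "") for word in words)
    return "<i>" + closed + "</i>"

print(close_up_spaced_text("s p a c e d   o u t"))
```

Pass in only the spaced-out phrase itself, since ordinary text run through this function would lose its word spaces.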

Font size changes

Normally we do not do anything to mark changes in font size.

The exception to this is when the font size changes to indicate a block quotation.

Words in all Capitals

Format words that are printed in all capital letters as all capital letters.

The exception to this is the first word of a chapter: many old books typeset the first word of these in all caps; this should be changed to upper and lower case, so "ONCE upon a time," becomes "Once upon a time,".

Words in Small Capitals

Format words that are printed in Mixed Small Caps as mixed upper and lowercase, and surround the text with <sc> and </sc> markup.
    Example: This is Small Caps
    would correctly be: <sc>This is Small Caps</sc>.

Format words that are printed in all small caps as ALL-CAPS, and surround the text with <sc> and </sc> markup.
    Example: You cannot be serious about aardvarks!
    would correctly be: You cannot be serious about <sc>AARDVARKS</sc>!

Words in headings (Chapter Headings, Section Headings, Captions, etc.) that are entirely all-capped should be formatted as all-caps without any <sc> </sc>. The first word of a chapter that is in Small Caps should be changed to mixed case without the tags.
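The two small-caps cases can be sketched as below. The function name and the all_small_caps flag are hypothetical; deciding which case applies, and whether a heading or first word is exempt, is still done by looking at the image.

```python
def small_caps(text, all_small_caps=False):
    """Wrap text in <sc> markup.

    Mixed small caps keep their upper and lower case; text printed
    entirely in small caps is converted to ALL-CAPS first.
    """
    inner = text.upper() if all_small_caps else text
    return "<sc>" + inner + "</sc>"

print(small_caps("This is Small Caps"))
print("You cannot be serious about " + small_caps("aardvarks", all_small_caps=True) + "!")
```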

Large, Ornate opening Capital letter (Drop Cap)

Format large and ornate graphic first letters of a chapter, section, or paragraph as just the letter.

Dashes, Hyphens, and Minus Signs

There are generally four such marks you will see in books:

  1. Hyphens. These are used to join words together, or sometimes to join prefixes or suffixes to a word.
    Leave these as a single hyphen, with no spaces on either side.
    Note that there is a common exception to this shown in the second example below.
  2. En-dashes. These are just a little longer, and are used for a range of numbers, or for a mathematical minus sign.
    Format these as a single hyphen, too. Spaces before or after are determined by the way it was done in the book; usually no spaces in number ranges, usually spaces around mathematical minus signs, sometimes both sides, sometimes just before.
  3. Em-dashes & long dashes. These serve as separators between words—sometimes for emphasis like this—or when a speaker gets a word caught in his throat——!
    Format these as two hyphens if the em-dash is short and four hyphens if the em-dash is long. Don't leave a space before or after, even if it looks like there was a space in the original book image.
  4. Deliberately Omitted or Censored Words or Names.
    Format these as 4 hyphens. When it represents a word, we leave appropriate space around it like it's really a word. If it's only part of a word, then no spaces—join it with the rest of the word.

Note: If an em-dash appears at the start or end of a line of your OCR'd text, join it with the other line so that there are no spaces or line breaks around it. Only if the author used an em-dash to start or end the paragraph or line of poetry or dialog should you leave it at the start or end of a line. See the examples below.

Examples—Dashes, Hyphens, and Minus Signs:

Type: Hyphen
Original Image:
    semi-detached
Correctly Formatted Text:
    semi-detached

Type: Hyphen
Original Image:
    three- and four-part harmony
Correctly Formatted Text:
    three- and four-part harmony

Type: Hyphen
Original Image:
    discoveries which the Crus-
    aders made and brought home with
Correctly Formatted Text:
    discoveries which the Crusaders
    made and brought home with

Type: Hyphen
Original Image:
    factors which mold char-
    acter—environment, training and heritage,
Correctly Formatted Text:
    factors which mold character--environment,
    training and heritage,

Type: En-dash
Original Image:
    See pages 21–25
Correctly Formatted Text:
    See pages 21-25

Type: En-dash
Original Image:
    –14° below zero
Correctly Formatted Text:
    -14° below zero

Type: En-dash
Original Image:
    X – Y = Z
Correctly Formatted Text:
    X - Y = Z

Type: En-dash
Original Image:
    2–1/2
Correctly Formatted Text:
    2-1/2

Type: Em-dash
Original Image:
    I am hurt;—A plague
    on both your houses!—I am dead.
Correctly Formatted Text:
    I am hurt;--A plague
    on both your houses!--I am dead.

Type: Em-dash
Original Image:
    sensations—sweet, bitter, salt, and sour
    —if even all of these are simple tastes. What
Correctly Formatted Text:
    sensations--sweet, bitter, salt, and sour--if
    even all of these are simple tastes. What

Type: Em-dash
Original Image:
    senses—touch, smell, hearing, and sight—
    with which we are here concerned,
Correctly Formatted Text:
    senses--touch, smell, hearing, and sight--with
    which we are here concerned,

Type: Em-dash
Original Image:
    It is the east, and Juliet is the sun!—
Correctly Formatted Text:
    It is the east, and Juliet is the sun!--

Type: Longer Em-dash
Original Image:
    "Three hundred——" "years," she was going to say, but the left-hand cat interrupted her.
Correctly Formatted Text:
    "Three hundred----" "years," she was going to say, but the left-hand cat interrupted her.

Type: long dash
Original Image:
    As the witness Mr. —— testified,
Correctly Formatted Text:
    As the witness Mr. ---- testified,

Type: long dash
Original Image:
    As the witness Mr. S—— testified,
Correctly Formatted Text:
    As the witness Mr. S---- testified,

Type: long dash
Original Image:
    the famous detective of ——B Baker St.
Correctly Formatted Text:
    the famous detective of ----B Baker St.

Type: long dash
Original Image:
    “You —— Yankee”, she yelled.
Correctly Formatted Text:
    "You ---- Yankee", she yelled.
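
The character substitutions in these dash rules can be sketched in a few lines of Python. This is only an illustration, not a tool the site provides: the function name to_ascii_dashes is hypothetical, and the sketch covers the character replacement alone. Joining lines and closing up spaces around em-dashes still have to be done by eye against the page image.

```python
def to_ascii_dashes(text: str) -> str:
    """Replace typographic dashes with the hyphen runs the rules call for."""
    # A two-em dash character, if the OCR produced one, becomes four hyphens.
    text = text.replace("\u2e3a", "----")
    # An em-dash becomes two hyphens; two em-dashes in a row give four.
    text = text.replace("\u2014", "--")
    # An en-dash (number range or minus sign) becomes a single hyphen.
    return text.replace("\u2013", "-")
```

For instance, passing the scanned line "See pages 21–25" through this function gives "See pages 21-25", and "Mr. S——" becomes "Mr. S----".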

End-of-line Hyphenation

Where a hyphen appears at the end of a line, join the two halves of the hyphenated word back together. If it is really a hyphenated word like well-meaning, join the two halves leaving the hyphen in-between. But if it was just hyphenated because it wouldn't fit on the line, and is not a word that is usually hyphenated, then join the two halves and remove the hyphen. Keep the joined word on the top line, and put a line break after it to preserve the line formatting—this makes it easier for the volunteers who come after you. See the Dashes, Hyphens, and Minus Signs section of these Guidelines for examples of each kind (nar-row turns into narrow, but low-lying keeps the hyphen). If the word is followed by punctuation, then carry that punctuation onto the top line, too.

Words like to-day and to-morrow that we don't commonly hyphenate now were often hyphenated in the old books we are working on. Leave them hyphenated the way the author did. If you're not sure whether the author hyphenated the word, join the two halves, leave the hyphen in place, and put an * after it, like this: to-*day. The asterisk will bring it to the attention of the post-processor, who has access to all the pages and can determine how the author typically wrote this word.
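
The three outcomes of the end-of-line rule (drop the hyphen, keep it, or flag it with an asterisk) can be sketched as a small Python helper. The function name rejoin_eol_hyphen and its keep_hyphen parameter are hypothetical, chosen for this illustration. The decision about whether a word is really hyphenated remains a human judgment that the caller passes in.

```python
from typing import Optional, Tuple


def rejoin_eol_hyphen(
    top: str, bottom: str, keep_hyphen: Optional[bool] = False
) -> Tuple[str, str]:
    """Move the second half of a broken word up to the top line.

    keep_hyphen=False drops the hyphen, True keeps it, and None marks an
    uncertain case with "-*" for the post-processor.
    """
    stripped = top.rstrip()
    # Only act on a single trailing hyphen; "--" is an em-dash.
    if not stripped.endswith("-") or stripped.endswith("--"):
        return top, bottom
    # The first word of the bottom line, with any punctuation attached to it,
    # moves up so the line break falls after the rejoined word.
    head, _, rest = bottom.lstrip().partition(" ")
    if keep_hyphen is None:
        joiner = "-*"
    elif keep_hyphen:
        joiner = "-"
    else:
        joiner = ""
    return stripped[:-1] + joiner + head, rest
```

The two returned strings are the new top and bottom lines, so the line break is preserved for the volunteers who come later.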

End-of-page Hyphenation

Format end-of-page hyphens by leaving the hyphen at the end of the last line, and mark it with a * after the hyphen.
For example, format:
 
    something Pat had already become accus-
as:
    something Pat had already become accus-*

On pages that start with part of a word from the previous page or an em-dash, place a * before the partial word or em-dash.
To continue the above example, format:
 
    tomed to from having to do his own family
as:
    *tomed to from having to do his own family

These markings indicate to the post-processor that the word must be rejoined when the pages are combined to produce the final e-book.

Single word at bottom of page

Format these by deleting the word, even if it's the second half of a hyphenated word.

In some older books, the single word at the bottom of the page (called a "catchword", usually printed near the right margin) indicates the first word on the next page of the book (called an "incipit"). It was used to alert the printer to print the correct reverse (called "verso"); to make it easier for printers' helpers to make up the pages prior to binding; also to help the reader avoid turning over more than one page.

Contractions

Remove any extra space in contractions, for example: would n't should be formatted as wouldn't.

This was often an early printers' convention, where the space was retained to indicate that 'would' and 'not' were originally separate words. It is also sometimes an artifact of the OCR. Remove the extra space in either case.

Some Project Managers may specify in the Project Comments not to remove extra spaces in contractions, particularly in the case of texts which contain slang, dialect, or are written in languages other than English.
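
As a rough illustration of the contraction rule, the Python sketch below closes up the space before n't. The function name close_up_contractions is hypothetical, and the pattern covers only the n't case with a plain ASCII apostrophe. It should not be run on projects whose comments ask for the spaces to be kept.

```python
import re


def close_up_contractions(text: str) -> str:
    """Remove the space between a word and a following n't."""
    # A word character, one or more spaces, then n't at a word boundary.
    return re.sub(r"(\w) +(n't)\b", r"\1\2", text)
```

Text that is already closed up, such as wouldn't, passes through unchanged.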

Poetry/Epigrams

This section applies to an occasional Poem or Epigram in a mainly non-poetry book. For an entire book of poetry, see the special guidelines for Poetry Books.

Mark poetry or epigrams so the post-processor can find it more quickly. Insert a separate line with /* at the start of the poetry or epigram and a separate line with */ at the end. Leave a blank line between these markers and the rest of the text.

Preserve the relative indentation of the individual lines of the poem or epigram by adding 2, 4, 6 (or more) spaces in front of the indented lines to make them resemble the original.

When a line of verse is too long for the printed page, many texts wrap the continuation onto the next printed line and place a wide indentation in front of it. These continuation lines should be rejoined with the line above. Continuation lines usually start with a lower case letter. They appear at irregular points, unlike normal indentation, which occurs at regular intervals in the metre of the poem.

If the poetry is centered on the printed page, don't try to center the lines of poetry during formatting. Move the lines to the left margin, and preserve the relative indentation of the lines.

Footnotes in poetry should be treated the same as usual footnotes during formatting. See footnotes for details.

Line Numbers in poetry should be kept. Put them at the end of the line, leaving at least 6 spaces between them and the end of the text. See Line Numbers for details.

Check the Project Comments for the specific text you are formatting. Books of poetry often have special instructions from the Project Manager. Many times, you won't have to follow all these formatting guidelines for a book that is mostly or entirely poetry.


Sample Image:

Correctly Formatted Text:
to the scenery of his own country:

/*
          Oh, to be in England
          Now that April's there,
      And whoever wakes in England
      Sees, some morning, unaware,
That the lowest boughs and the brushwood sheaf
Round the elm-tree hole are in tiny leaf,
While the chaffinch sings on the orchard bough
              In England--now!

And after April, when May follows,
And the whitethroat builds, and all the swallows!
Hark! where my blossomed pear-tree in the hedge
Leans to the field and scatters on the clover
Blossoms and dewdrops--at the bent spray's edge--
That's the wise thrush; he sings each song twice over,
Lest you should think he never could recapture
The first fine careless rapture!
And though the fields look rough with hoary dew,
All will be gay, when noontide wakes anew
The buttercups, the little children's dower;
--Far brighter than this gaudy melon-flower!
*/

So it runs; but it is only a momentary memory;
and he knew, when he had done it, and to his

Letters/Correspondence

Format letters and correspondence as you would paragraphs. Put a blank line before the start of the letter; you do not need to duplicate any indenting.

Surround consecutive heading or footer lines (such as addresses, date blocks, salutations or signatures) with /* and */ markers. Leave a blank line between the markers and the rest of the text. The markers will ensure the individual lines are kept in post-processing and not rewrapped.

Don't indent the heading or footer lines, even if they are indented or right justified in the original—just put them at the left margin. The post-processor will format them as needed.

Sample Image:

Correctly Formatted Text:

<i>John James Audubon to Claude François Rozier</i>

[Letter No. 1, addressed]

/*
<sc>M. Fr. Rozier</sc>,
Merchant-Nantes.
<sc>New York</sc>, <i>10 January, 1807.</i>

<sc>Dear Sir:</sc>
*/

We have had the pleasure of receiving by the <i>Penelope</i> your
consignment of 20 pieces of linen cloth, for which we send our
thanks. As soon as we have sold them, we shall take great
pleasure in making our return.

Lists of Items

Surround lists with /* and */ markers. Leave a blank line between these markers and the rest of the text. The markers will ensure the individual lines are not rewrapped during post-processing. Use this markup for any such list that should not be reformatted, including lists of questions & answers, items in a recipe, etc.

Original Text:
Andersen, Hans Christian   Daguerre, Louis J. M.    Melville, Herman
Bach, Johann Sebastian     Darwin, Charles          Newton, Isaac
Balboa, Vasco Nunez de     Descartes, René          Pasteur, Louis
Bierce, Ambrose            Earhart, Amelia          Poe, Edgar Allan
Carroll, Lewis             Einstein, Albert         Ponce de Leon, Juan
Churchill, Winston         Freud, Sigmund           Pulitzer, Joseph
Columbus, Christopher      Lewis, Sinclair          Shakespeare, William
Curie, Marie               Magellan, Ferdinand      Tesla, Nikola
Correctly Formatted Text:
/*
Andersen, Hans Christian
Bach, Johann Sebastian
Balboa, Vasco Nunez de
Bierce, Ambrose
Carroll, Lewis
Churchill, Winston
Columbus, Christopher
Curie, Marie
Daguerre, Louis J. M.
Darwin, Charles
Descartes, René
Earhart, Amelia
Einstein, Albert
Freud, Sigmund
Lewis, Sinclair
Magellan, Ferdinand
Melville, Herman
Newton, Isaac
Pasteur, Louis
Poe, Edgar Allan
Ponce de Leon, Juan
Pulitzer, Joseph
Shakespeare, William
Tesla, Nikola
*/

Tables

Surround tables with /* and */ markers. Leave a blank line between these markers and the rest of the text. The markers will ensure the individual lines are not rewrapped during post-processing. Format the table with spaces to look approximately like the original table. Don't make the table wider than 75 characters. Project Gutenberg's guidelines go on to say "...except where it can't be helped. Never, ever longer than 80...".

Do not use tabs for formatting—use space characters only. Tab characters will line up differently between computers, and your careful formatting will not always display the same way.
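
The width and tab rules lend themselves to a mechanical check. The following Python sketch is one way to do it. The function name check_table and the message wording are made up for this example, while the limits of 75 and 80 characters come from the text above.

```python
from typing import List

PREFERRED_WIDTH = 75  # "Don't make the table wider than 75 characters"
HARD_LIMIT = 80       # "Never, ever longer than 80"


def check_table(lines: List[str]) -> List[str]:
    """Return a list of human-readable problems found in a table's lines."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if "\t" in line:
            problems.append(f"line {number}: contains a tab character")
        width = len(line.rstrip("\n"))
        if width > HARD_LIMIT:
            problems.append(f"line {number}: {width} characters, over {HARD_LIMIT}")
        elif width > PREFERRED_WIDTH:
            problems.append(f"line {number}: {width} characters, over {PREFERRED_WIDTH}")
    return problems
```

An empty list means the table passes both checks. Whether the layout still conveys the author's meaning is something only a person comparing it with the image can judge.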

It's often hard to format tables in plain ASCII text; just do your best. This is much easier if you use a mono-spaced font such as DPCustomMono or Courier. Remember that the goal is to preserve the Author's meaning, while producing a readable table in an e-book. Sometimes this requires sacrificing the original format of the table on the printed page. Check the Project Comments and discussion thread because other volunteers may have settled on a specific format. If there is nothing there, you might find something useful in the Gallery of Table Layouts forum thread.

Footnotes in tables should go at the end of the table. See footnotes for details.

Sample Image:

Correctly Formatted Text:
/*
Deg. C.   Millimeters of Mercury.    Gasolene.
               Pure Benzene.

 -10°               13.4                 43.5
   0°               26.6                 81.0
 +10°               46.6                132.0
  20°               76.3                203.0
  40°              182.0                301.8
*/

Sample Image:

Correctly Formatted Text:
/*
TABLE II.

-----------------------+----+-----++-------------------------+----+------
                       | C  |     ||                         |  C |
Flat strips compared   | o  |     ||                         |  o |
with round wire 30 cm. | p  |Iron.|| Parallel wires 30 cm.   |  p | Iron.
in length.             | p  |     || in length.              |  p |
                       | e  |     ||                         |  e |
                       | r  |     ||                         |  r |
                       | .  |     ||                         |  . |
-----------------------+----+-----++-------------------------+----+------
Wire 1 mm. diameter    | 20 | 100 || Wire 1 mm. diameter     | 20 |  100
-----------------------+----+-----++-------------------------+----+------
        STRIPS.        |    |     ||       SINGLE WIRE.      |    |
0.25 mm. thick, 2 mm.  |    |     ||                         |    |
  wide                 | 15 |  35 || 0.25 mm. diameter       | 16 |   48
Same, 5 mm. wide       | 13 |  20 || Two  similar wires      | 12 |   30
 "   10  "    "        | 11 |  15 || Four    "      "        |  9 |   18
 "   20  "    "        | 10 |  14 || Eight   "      "        |  8 |   10
 "   40  "    "        |  9 |  13 || Sixteen "      "        |  7 |    6
Same strip rolled up in|    |     || Same 16 wires bound     |    |
  the form of a wire   | 17 |  15 ||   close together        | 18 |   12
-----------------------+----+-----++-------------------------+----+------
*/

Block Quotations

Surround block quotations with /# and #/ markers. Leave a blank line between these markers and the rest of the text. The markers will ensure the block quotation is formatted properly during post-processing.

Apart from adding the markers, block quotations should be formatted as any other text.

Block quotations are long quotations (typically several lines and sometimes several pages) and are often (but not always) printed with wider margins or in a smaller font size—sometimes both.

Sample Image:

Correctly Formatted Text:

later day was welcomed in their home on the Hudson.
Dr. Bakewell's contribution was as follows:[24]

/#
The uncertainty as to the place of Audubon's birth has been
put to rest by the testimony of an eye witness in the person
of old Mandeville Marigny now dead some years. His repeated
statement to me was, that on his plantation at Mandeville,
Louisiana, on Lake Ponchartrain, Audubon's mother was
his guest; and while there gave birth to John James Audubon.
Marigny was present at the time, and from his own lips, I have,
as already said, repeatedly heard him assert the above fact.
He was ever proud to bear this testimony of his protection
given to Audubon's mother, and his ability to bear witness as
to the place of Audubon's birth, thus establishing the fact that
he was a Louisianian by birth.
#/

We do not doubt the candor and sincerity of the
excellent Dr. Bakewell, but are bound to say that the
incidents as related above betray a striking lapse of

Double Quotes

For quotes in English, format these as plain ASCII " double quotes.

Do not change double quotes to single quotes. Leave them as the Author wrote them.

For quotes from other languages, use the quotation marks appropriate to that language if they are available.

The French equivalent, guillemets, «like this», are available from the pulldown menus in the proofreading interface. Remember to remove space between the guillemets and the quoted text; if needed, it will be added in post-processing. The same applies to languages which use reversed guillemets, »like this«.

The quotation marks used in some texts (in German or other languages), „like this”, are also available in the pulldown menus; for the sake of simplicity, you should always use „ and ” regardless of the actual quotes used in the original text, as long as the quotes used in the original text are clearly lower and upper. If needed, the quotes will be changed to the ones used in the text in post-processing.

The Project Manager may instruct you in the Project Comments to format non-English language quotation marks differently for a particular book.

Single Quotes

Format these as the plain ASCII ' single quote (apostrophe).

Do not change single quotes to double quotes. Leave them as the Author wrote them.

Quote Marks on each line

In general, format quotation marks at the beginning of each line of a quotation by removing all of them except for the one at the start of the first line of the quotation.

If the quotation goes on for multiple paragraphs, each paragraph should have an opening quote mark on the first line of the paragraph.

Often there is no closing quotation mark until the very end of the quoted section of text, which may not be on the same page you are formatting. Leave it that way—do not add closing quotation marks that are not in the page image.

There are some language specific exceptions. In French, for example, dialog within quotations uses a combination of different punctuation to indicate various speakers. If you are not familiar with a particular language, check the Project Comments or leave a message for the Project Manager in the Forum Discussion for clarification.

Periods Between Sentences

Format periods between sentences with a single space after them.

You do not need to remove extra spaces after periods if they're already in the scanned text—we can do that automatically during post-processing. See the Chapter Headers image and text for an example.

Punctuation

In general, there should be no space before punctuation characters except opening quotation marks. If scanned text has a space before punctuation, remove it. This applies even to languages, such as French, which normally use spaces before punctuation characters.

Spaces before punctuation sometimes appear because books typeset in the 1700's & 1800's often used partial spaces before punctuation such as a semicolon or comma.

Scanned Text:
and so it goes ; ever and ever.
Correctly Formatted Text:
and so it goes; ever and ever.
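
A pattern-based version of this rule might look like the Python sketch below. The function name strip_space_before_punctuation is hypothetical and the punctuation set is an assumption limited to common marks. A period that begins a run of dots is deliberately skipped, because the English ellipsis rule in these Guidelines keeps a space before three dots.

```python
import re


def strip_space_before_punctuation(text: str) -> str:
    """Remove spaces that come directly before common punctuation marks."""
    # Match spaces followed by ; : , ! ? or by a single period.
    # "\.(?!\.)" refuses a period that is followed by another period, so the
    # space in "know ... is" survives.
    return re.sub(r" +([;:,!?]|\.(?!\.))", r"\1", text)
```

Spaces after the punctuation are left alone, which matches the guidance that extra spaces between words need not be removed.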

Line Breaks

Leave all line breaks in so that the next formatter and the post-processor can compare the lines in the text to the lines in the image easily. Be especially careful about this when rejoining hyphenated words or moving words around em-dashes. If the previous volunteer removed the line breaks, please replace them so that they once again match the image.

Extra blank lines that are not in the image should be removed except where we intentionally add them for formatting. But blank lines at the bottom of the page are fine—these are removed during post-processing.

Extra Spaces or Tabs Between Words

Extra spaces and tab characters between words are common in OCR output. You don't need to bother removing these—that can be done automatically during post-processing.

However, extra spaces around punctuation, em-dashes, quote marks, etc. do need to be removed when they separate the symbol from the word.

For example, in A horse ;   my kingdom for a horse. the space between the word "horse" and the semicolon should be removed. But the 2 spaces after the semicolon are fine—you don't have to delete one of them.

Trailing Space at End-of-line

Do not bother inserting spaces at the ends of lines of text. It is a waste of your time for something that we can take care of automatically later. Similarly do not waste your time removing extra spaces at the ends of lines.

Line Numbers

Keep line numbers. Place them at least six spaces past the right hand end of the line, even if they are on the left side of the poetry/text in the original image.

Line numbers are numbers in the margin for each line, or sometimes every fifth or tenth line, and are common in books of poetry. Since poetry will not be reformatted in the e-book version, the line numbers will be useful to readers.

Extra Spacing/Stars/Line Between Paragraphs

Most paragraphs start on the line immediately after the end of the previous one. Sometimes two paragraphs are separated to indicate a "thought break." A "thought break" may take the form of a line of stars, hyphens or some other character, a plain or floridly decorated horizontal line, a simple decoration, or even just an extra blank line or two.

A "thought break" may represent a change of scene or subject, a lapse in time or a bit of suspense. This is intended by the author, so we preserve them by putting a blank line, <tb>, and then another blank line.

Project Managers and/or Post-Processors may request that additional information be retained in the thought break mark-up. For example, some books distinguish different types of breaks by using different styles, such as a line of stars in one place and a blank line in another. In these cases, the Project Comments may request that these be marked up as <tb stars> and <tb>. Please, as always, read the Project Comments carefully so that you will know what is required for each project. Also be careful not to carry these special requests into other projects with different requirements.

Sometimes printers used decorative lines to mark the ends of chapters. As we already mark Chapter Headers, there is no need to add a "thought break" marker.

The proofreading interface has the "thought break" marker available to cut and paste.


Sample Image:
thought break
Correctly Formatted Text:

like the gentleman with the spiritual hydrophobia
in the latter end of Uncle Tom's Cabin.
Unconsciously Mr. Dixon has done his best to
prove that Legree was not a fictitious character.

<tb>

Joel Chandler Harris, Harry Stillwell Edwards,
George W. Cable, Thomas Nelson Page,
James Lane Allen, and Mark Twain are Southern
men in Mr. Griffith's class. I recommend

Period Pause "..." (Ellipsis)

The guidelines are different for English and Languages Other Than English (LOTE).

ENGLISH: Leave a space before the three dots, and a space after. The exception is at the end of a sentence, when there would be no space, four dots, and a space after. This is also the case for any other ending punctuation mark: the 3 dots follow immediately, without any space.

For example:

     That I know ... is true.
     This is the end....
     Wherefore art thou Romeo?...

Sometimes you will see it with the punctuation at the end; so format it that way:

     Wherefore art thou Romeo...?

Remove extra dots, if any, or add new ones, if necessary, to bring the number to three (or four) as appropriate.

LOTE: Use the general rule "Follow closely the style used in the printed page." In particular, insert spaces if there are spaces before or between the periods, and use the same number of periods as appear in the image. Sometimes the printed page is unclear: in that case, insert a [**unclear] to draw the attention of the post-processor. (Note: Post-processors should replace those regular spaces with non-breaking spaces.)

Accented/Non-ASCII Characters

Please proofread these using the proper UTF-8 characters. For characters which are not in Unicode, see the Project Manager instructions in the Project Comments.

If they are not on your keyboard, there are several ways of inputting these characters:

  • The pull-down menus in the proofreading interface.
  • Applets included with your operating system.
    • Windows: "Character Map"
      Access it through:
      Start: Run: charmap, or
      Start: Accessories: System Tools: Character Map.
    • Macintosh: Key Caps or "Keyboard Viewer"
      For OS 9 and lower this is on the Apple Menu,
      For OS X through 10.2, this is located in the Applications, Utilities folder
      For OS X 10.3 and higher, this is in the Input Menu as "Keyboard Viewer."
    • Linux: Various, depending on your desktop environment.
      For KDE, try KCharSelect (in the Utilities submenu of the start menu).
  • An on-line program, such as Edicode.
  • Keyboard shortcuts.
    (See tables for Windows and Macintosh below.)
  • Switching to a keyboard layout or locale which supports "deadkey" accents.
    • Windows: Control Panel (Keyboard, Input Locales)
    • Macintosh: Input Menu (on Menu Bar)
    • Linux: Change the keyboard in your X configuration.

The original Project Gutenberg will post, as a minimum, 7-bit ASCII versions of texts, but versions using other character encodings which can preserve more of the information from the original text are accepted. Project Gutenberg Europe publishes UTF-8 as its default encoding, but other appropriate encodings are also welcomed.

Currently for Distributed Proofreaders this means using Latin-1 or ISO 8859-1 and -15, and in the future will include Unicode.

Distributed Proofreaders Europe already uses Unicode.

For Windows:

  • You can use the Character Map program (Start: Run: charmap) to select an individual letter, and then cut & paste.
  • The dropdown menus in the proofreading interface.
  • Or you can type the Alt+NumberPad shortcut codes for these characters.
    This is faster than using cut & paste, once you get used to the codes.
    Press and hold the Alt key, type the four digits on the Number Pad and release the Alt key; note: the number row over the letters won't work.
    You must type all 4 digits, including the leading 0 (zero). Note that the capital version of a letter is 32 less than the lower case.
    Also note that these codes may not work with some system settings.
    The table below shows the codes we use. (Print-friendly version of this table)
    Do not use other special characters unless the Project Manager tells you to in the Project Comments.
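
The relationship between a character and its four-digit Alt code can be explored with a short Python sketch. It rests on an assumption that is not stated in these Guidelines: that the codes are the character's byte value in the Windows-1252 code page, which is the usual behaviour on Western-locale Windows systems. The function name alt_code is hypothetical.

```python
def alt_code(char: str) -> str:
    """Return the Alt+NumberPad code for a single character.

    Assumption: the code is the character's byte value in Windows-1252,
    written as four digits with a leading zero.
    """
    # encode() raises UnicodeEncodeError for characters outside the code page.
    value = char.encode("cp1252")[0]
    return f"Alt-{value:04d}"
```

Under that assumption, alt_code("à") gives Alt-0224 and alt_code("À") gives Alt-0192, which shows the difference of 32 between lower and upper case. The pair ÿ and Ÿ (0255 and 0159) is an exception to the 32 rule.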

Windows Shortcuts for Latin-1 symbols
` grave     ´ acute (aigu)  ^ circumflex  ~ tilde     ¨ umlaut
à Alt-0224  á Alt-0225      â Alt-0226    ã Alt-0227  ä Alt-0228
À Alt-0192  Á Alt-0193      Â Alt-0194    Ã Alt-0195  Ä Alt-0196
è Alt-0232  é Alt-0233      ê Alt-0234                ë Alt-0235
È Alt-0200  É Alt-0201      Ê Alt-0202                Ë Alt-0203
ì Alt-0236  í Alt-0237      î Alt-0238                ï Alt-0239
Ì Alt-0204  Í Alt-0205      Î Alt-0206                Ï Alt-0207
ò Alt-0242  ó Alt-0243      ô Alt-0244    õ Alt-0245  ö Alt-0246
Ò Alt-0210  Ó Alt-0211      Ô Alt-0212    Õ Alt-0213  Ö Alt-0214
ù Alt-0249  ú Alt-0250      û Alt-0251                ü Alt-0252
Ù Alt-0217  Ú Alt-0218      Û Alt-0219                Ü Alt-0220
                                          ñ Alt-0241  ÿ Alt-0255
                                          Ñ Alt-0209  Ÿ Alt-0159

° ring        å Alt-0229  Å Alt-0197
/ slash       ø Alt-0248  Ø Alt-0216
Æ ligature    æ Alt-0230  Æ Alt-0198
Œ ligature    œ Use [oe]  Œ Use [OE]
cedilla       ç Alt-0231  Ç Alt-0199
Icelandic     Þ Alt-0222  þ Alt-0254  Ð Alt-0208  ð Alt-0240
sz ligature   ß Alt-0223
superscripts  ¹ Alt-0185  ² Alt-0178  ³ Alt-0179
ordinals      º Alt-0186  ª Alt-0170
currency      ¢ Alt-0162  £ Alt-0163  ¥ Alt-0165  $ Alt-0036  ¤ Alt-0164
mathematics   ± Alt-0177  × Alt-0215  ÷ Alt-0247  ¬ Alt-0172
              ° Alt-0176  µ Alt-0181
              ¼ Alt-0188 [1]  ½ Alt-0189 [1]  ¾ Alt-0190 [1]
marks         © Alt-0169  ® Alt-0174  ™ Alt-0153  ¶ Alt-0182
              § Alt-0167  ¦ Alt-0166
accents       ´ Alt-0180  ¨ Alt-0168  ¯ Alt-0175  ¸ Alt-0184
punctuation   ¿ Alt-0191  ¡ Alt-0161  « Alt-0171  » Alt-0187
              · Alt-0183  * Alt-0042

[1] Unless specifically requested by the Project Comments, please do not use the fraction symbols, but instead use the guidelines for Fractions. (1/2, 1/4, 3/4, etc.)

For Apple Macintosh:

  • You can use the "Key Caps" program as a reference.
    In OS 9 & earlier, this is located in the Apple Menu; in OS X through 10.2, it is located in the Applications, Utilities folder.
    This brings up a picture of the keyboard, and pressing shift, opt, command, or combinations of those keys shows how to produce each character. Use this reference to see how to type that character, or you can cut & paste it from here into the text in the proofreading interface.
  • In OS X 10.3 and higher, the same function is now a palette available from the Input menu (the drop-down menu attached to your locale's flag icon in the menu bar). It's labeled "Show Keyboard Viewer." If this isn't in your Input menu, or if you don't have that menu, you can activate it by opening System Preferences, the "International" panel, and selecting the "Input Menu" pane. Ensure that "Show input menu in menu bar" is checked. In the spreadsheet view, check the box for "Keyboard Viewer" in addition to any input locales you use.
  • If you are using the enhanced proofreading interface, the more tag creates a pop-up window containing these characters, which you can then cut & paste.
  • Or you can type the Apple Opt- shortcut codes for these characters.
    This is a lot faster than using cut & paste, once you get used to the codes.
    Hold the Opt key and type the accent symbol, then type the letter to be accented (or, for some codes, only hold the Opt key and type the symbol).
    These instructions are for the US-English keyboard layout. They may not work for other keyboard layouts.
    The table below shows the codes we use. (Print-friendly version of this table)
    Do not use other special characters unless the Project Manager tells you to in the Project Comments.

Apple Mac Shortcuts for Latin-1 symbols
` grave     ´ acute (aigu)  ^ circumflex  ~ tilde     ¨ umlaut
à Opt-`, a  á Opt-e, a      â Opt-i, a    ã Opt-n, a  ä Opt-u, a
À Opt-~, A  Á Opt-e, A      Â Opt-i, A    Ã Opt-n, A  Ä Opt-u, A
è Opt-~, e  é Opt-e, e      ê Opt-i, e                ë Opt-u, e
È Opt-~, E  É Opt-e, E      Ê Opt-i, E                Ë Opt-u, E
ì Opt-~, i  í Opt-e, i      î Opt-i, i                ï Opt-u, i
Ì Opt-~, I  Í Opt-e, I      Î Opt-i, I                Ï Opt-u, I
ò Opt-~, o  ó Opt-e, o      ô Opt-i, o    õ Opt-n, o  ö Opt-u, o
Ò Opt-~, O  Ó Opt-e, O      Ô Opt-i, O    Õ Opt-n, O  Ö Opt-u, O
ù Opt-~, u  ú Opt-e, u      û Opt-i, u                ü Opt-u, u
Ù Opt-~, U  Ú Opt-e, U      Û Opt-i, U                Ü Opt-u, U
                                          ñ Opt-n, n  ÿ Opt-u, y
                                          Ñ Opt-n, N  Ÿ Opt-u, Y

° ring        å Opt-a         Å Opt-A
/ slash       ø Opt-o         Ø Opt-O
Æ ligature    æ Opt-'         Æ Opt-"
Œ ligature    œ Use [oe]      Œ Use [OE]
cedilla       ç Opt-c         Ç Opt-C
Icelandic     Þ (none) ‡      þ Shift-Opt-6   Ð (none) ‡
              ð (none) ‡
sz ligature   ß Opt-s
superscripts  ¹ (none) ‡      ² (none) ‡      ³ (none) ‡
ordinals      º Opt-0         ª Opt-9
currency      ¢ Opt-4         £ Opt-3         ¥ Opt-y
              $ Shift-4       ¤ Shift-Opt-2
mathematics   ± Opt-+         × (none) ‡      ÷ Opt-/
              ¬ Opt-l         ° Opt-*         µ Opt-m
              ¼ (none) ‡ [1]  ½ (none) ‡ [1]  ¾ (none) ‡ [1]
marks         © Opt-g         ® Opt-r         ™ Opt-2
              ¶ Opt-7         § Opt-6         ¦ (none) ‡
accents       ´ Opt-E         ¨ Opt-U         ¯ Shift-Opt-,
              ¸ Opt-Z
punctuation   ¿ Opt-?         ¡ Opt-1         « Opt-\
              » Shift-Opt-\   · Opt-8         * (none) ‡

‡ Note: No equivalent shortcut, use drop-down menus.

[1] Unless specifically requested by the Project Comments, please do not use the fraction symbols, but instead use the guidelines for Fractions. (1/2, 1/4, 3/4, etc.)

Characters with Diacritical marks

In some projects, you will find characters with special marks either above or below the normal latin A..Z character. These are called diacritical marks, and indicate a special pronunciation for this character.

If such a character does not exist in Unicode, it should be entered by using combining diacritical marks: these are Unicode symbols which can't appear alone, but appear above (or below) the letter after which they are placed. They could be entered by first entering the base letter, and then the combining mark, using applets and programs mentioned above.

On some systems, diacritical marks may not appear exactly where they should, but, for example, moved to the right. They should still be used, as people with other systems will see them correctly. However, if, for any reason, you can't see or enter combining marks properly, mark such a letter with an *. Note that Modifier diacritical marks also exist; these should not be used.
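
The difference between combining marks and modifier marks can be checked with Python's standard unicodedata module. The sketch below is illustrative only: the helper name with_combining_mark is hypothetical, and applying NFC normalization (which uses a single precomposed character where Unicode has one) is a choice made for this example, not something the Guidelines prescribe.

```python
import unicodedata


def with_combining_mark(base: str, mark: str) -> str:
    """Attach a combining diacritical mark to a base letter."""
    # Combining marks have a non-zero combining class; modifier (spacing)
    # marks such as U+02C6 have class 0 and are rejected here.
    if unicodedata.combining(mark) == 0:
        raise ValueError(f"{mark!r} is not a combining mark")
    # Base letter first, then the mark; NFC composes the pair when a
    # precomposed character exists and leaves it as two code points otherwise.
    return unicodedata.normalize("NFC", base + mark)
```

For example, "e" followed by U+0301 (combining acute) normalizes to the single character é, while "q" followed by U+0303 (combining tilde) has no precomposed form and stays as two code points.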

Non-Latin Characters

There are projects which contain text printed in non-Latin characters; that is, characters other than the Latin A...Z characters, for example Greek, Cyrillic (used in Russian, Slavic and other languages), Hebrew, or Arabic characters.

These characters should be entered in the text just as Latin characters are. (WITHOUT transliteration!)

If a document is written entirely in a non-Latin script, it is best to install a keyboard driver which supports the language. Consult your operating system manual for instructions on how to do that.

If the script appears only occasionally, you may use a separate program to enter it. See above for some of the programs.

If you are uncertain about a character or an accent, mark it with an * to bring it to the attention of the next formatter or the post-processor.

For scripts which cannot be so easily entered, such as Arabic, surround the text with appropriate markers: [Arabic: **] and leave it as scanned. Include the ** so the post-processor can address it later.

Fractions

Format fractions as follows: 2½ becomes 2-1/2. The hyphen prevents the whole and fractional part from becoming separated when the lines are rewrapped during post-processing.
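
The 2-1/2 form can be sketched in a few lines of Python. This is only an illustration, not a DP tool: the name ascii_fractions is hypothetical, and the sketch assumes that only the three fraction symbols ¼, ½ and ¾ occur and that any whole number sits directly in front of the symbol with no space.

```python
import re

# Assumption: only these three fraction symbols are handled.
FRACTIONS = {"\u00bc": "1/4", "\u00bd": "1/2", "\u00be": "3/4"}
PATTERN = re.compile(r"(\d*)([\u00bc\u00bd\u00be])")

def ascii_fractions(text: str) -> str:
    """Replace fraction symbols with n/d, adding a hyphen after a whole number."""
    def repl(match):
        whole = match.group(1)
        frac = FRACTIONS[match.group(2)]
        # The hyphen keeps "2" and "1/2" together when lines are rewrapped.
        return f"{whole}-{frac}" if whole else frac
    return PATTERN.sub(repl, text)
```

For example, ascii_fractions("2½ miles") gives "2-1/2 miles", while a fraction with no whole number in front of it is written without the hyphen.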

Page References "See Pg. 123"

Format page number references within the text such as (see p. 123) as they appear in the image.

Check the Project Comments to see if the Project Manager has special requirements for page references.

Indexes

Please retain page numbers in index pages. Surround the index with /* and */ tags, leaving a blank line before /* and after */. You don't need to align the numbers as they appear in the scan; just put a comma or semicolon, followed by the page numbers.

Indexes are often printed in 2 columns; this narrower space can cause entries to split onto the next line. Rejoin these back onto a single line.

Indexes are a case where long lines created by following this rule are acceptable, since the lines will be re-wrapped to the proper width and indentation during post-processing.

Place one blank line between each entry in the index.

For sub-topic listings in an index, start each one on a new line, indented 2 spaces.

Treat each new section in an index (A, B, C...) the same as a section header by placing 2 blank lines before it.

Old books sometimes printed the first word of each letter in the index in all caps or small caps; change this to match the style used for the rest of the index entries.

Scanned Text:
Elizabeth I, her royal Majesty the
     Queen, 123, 144-155.
  birth of, 145.
  christening, 146-147.
  death and burial, 152.

Ethelred II, the Unready, 33.

Correctly Formatted Text: (with rejoined lines)
/*
Elizabeth I, her royal Majesty the Queen, 123, 144-155.
  birth of, 145.
  christening, 146-147.
  death and burial, 152.

Ethelred II, the Unready, 33.
*/

Scanned Text:
Hooker, Jos., maj. gen. U. S. V., 345; assigned
  to command Porter's corps, 350; afterwards,
  McDowell's, 367; in pursuit of Lee, 380;
  at South Mt., 382; unacceptable to Halleck,
  retires from active service, 390.

Hopkins, Henry H., 209; notorious secessionist in
  Kanawha valley, 217; controversy with Gen.
  Cox over escaped slave, 233.

Hosea, Lewis M., 187; capt. on Gen. Wilson's staff, 194.

Correctly Formatted Text: (with index subtopics aligned)
/*
Hooker, Jos., maj. gen. U.S.V., 345;
  assigned to command Porter's corps, 350;
  afterwards, McDowell's, 367;
  in pursuit of Lee, 380;
  at South Mt., 382;
  unacceptable to Halleck, retires from active service, 390.

Hopkins, Henry H., 209;
  notorious secessionist in Kanawha valley, 217;
  controversy with Gen. Cox over escaped slave, 233.

Hosea, Lewis M., 187;
  capt. on Gen. Wilson's staff, 194.
*/
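
The second example can be approximated mechanically. The Python sketch below is an illustration under stated assumptions, not an official DP tool: split_subtopics is a hypothetical name, and the sketch assumes that semicolons appear only between sub-topics, which will not hold for every index. It rejoins the wrapped lines of one entry and then starts each sub-topic on a new line, indented 2 spaces.

```python
def split_subtopics(entry_lines):
    """Rejoin one wrapped index entry, then put each ';'-separated sub-topic on its own line."""
    # Rejoin lines that were split by the narrow printed column.
    joined = " ".join(line.strip() for line in entry_lines)
    parts = [p.strip() for p in joined.split(";") if p.strip()]
    out = []
    for i, part in enumerate(parts):
        is_last = i == len(parts) - 1
        text = part if is_last else part + ";"
        # The main entry stays at the margin; sub-topics are indented 2 spaces.
        out.append(text if i == 0 else "  " + text)
    return out
```

The punctuation of the scan is kept as it is; only the line breaks and indentation change. The result still needs to be checked against the image, since a semicolon inside a sub-topic would be split incorrectly.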

Plays: Actor Names/Stage Directions

For all plays:

  • Format cast listings (Dramatis Personæ) as lists.
  • Put four blank lines before the beginning of an Act.
  • Put two blank lines before the beginning of each Scene.
  • In dialogue, treat a change in speaker as a new paragraph, with one blank line between.
  • Format actor names as they are in the original text, whether they are italics, bold or all capital letters.
  • Stage directions are formatted as they are in the original text.
    If the stage direction is on a line by itself, format it that way; if it is at the end of a line of dialogue, leave it there; if it is right-justified at the end of a line of dialogue, leave six spaces between the dialogue and the stage directions.
    Stage directions often begin with an opening bracket and omit the closing bracket.
    This convention is retained; do not close the brackets. Italics are generally placed inside the brackets.

For metrical plays: (Plays written as rhymed poetry)

  • Many plays are metrical, and like poetry should not be rewrapped. Surround metred text with /* and */ as for poetry. If stage directions are on their own line, do not surround these with /* and */. (Stage directions are not metrical, and so can be safely rewrapped in the PP stage, so should not be contained within the /* */ tags that protect the metrical dialogue from being rewrapped.)
  • Preserve relative indenting of dialogue when a single metrical line is shared by more than one speaker.
  • Rejoin metrical lines that were split due to width restrictions of the paper, just as in poetry.
    If the continuation is only a word or so, it is often shown on the line above or below following a (, rather than having a line of its own.
    See the example.

Please check the Project Comments, as the Project Manager may specify different formatting.

Sample Image:
title page image
Correctly Formatted Text:

/*
Has not his name for nought, he will be trode upon:
What says my Printer now?

<i>Clow.</i> Here's your last Proof, Sir.
You shall have perfect Books now in a twinkling.

<i>Lap.</i> These marks are ugly.

<i>Clow.</i> He says, Sir, they're proper:
Blows should have marks, or else they are nothing worth.

<i>Lap.</i> But why a Peel-crow here?

<i>Clow.</i> I told 'em so Sir:
A scare-crow had been better.

<i>Lap.</i> How slave? look you, Sir,
Did not I say, this <i>Whirrit</i>, and this <i>Bob</i>,
Should be both <i>Pica Roman</i>.

<i>Clow.</i> So said I, Sir, both <i>Picked Romans</i>,
And he has made 'em <i>Welch</i> Bills,
Indeed I know not what to make on 'em.

<i>Lap.</i> Hay-day; a <i>Souse</i>, <i>Italica</i>?

<i>Clow.</i> Yes, that may hold, Sir,
<i>Souse</i> is a <i>bona roba</i>, so is <i>Flops</i> too.
*/


Sample Image:
title page image
Correctly Formatted Text:

/*
<sc>Clin.</sc> And do I hold thee, my Antiphila,
Thou only wish and comfort of my soul!

<sc>Syrus.</sc> In, in, for you have made our good man wait. (<i>Exeunt.</i>
*/




ACT THE THIRD.


<sc>Scene I.</sc>


/*
<sc>Chrem.</sc> 'Tis now just daybreak.--Why delay I then
To call my neighbor forth, and be the first
To tell him of his son's return?--The youth,
I understand, would fain not have it so.
But shall I, when I see this poor old man
Afflict himself so grievously, by silence
Rob him of such an unexpected joy,
When the discov'ry can not hurt the son?
No, I'll not do't; but far as in my pow'r
Assist the father. As my son, I see,
Ministers to th' occasions of his friend,
Associated in counsels, rank, and age,
So we old men should serve each other too.
*/


<sc>SCENE II.</sc>

<i>Enter</i> <sc>Menedemus.</sc>


/*
<sc>Mene.</sc> (<i>to himself</i>). Sure I'm by nature form'd for misery
Beyond the rest of humankind, or else
'Tis a false saying, though a common one,
"That time assuages grief." For ev'ry day
My sorrow for the absence of my son
Grows on my mind: the longer he's away,
The more impatiently I wish to see him,
The more pine after him.

<sc>Chrem.</sc> But he's come forth. (<i>Seeing</i> <sc>Menedemus.</sc>)
Yonder he stands. I'll go and speak with him.
Good-morrow, neighbor! I have news for you;
Such news as you'll be overjoy'd to hear.
*/


Sample Image:
Plays image
Correctly Formatted Text:

[<i>Hernda has come from the grove and moves up to his side</i>]

/*
<i>Her.</i> [<i>Adoringly</i>] And you the master!

<i>Hud.</i> Daughter, you owe my lord Megario
Some pretty thanks.      [<i>Kisses her cheek</i>]

<i>Her.</i>        I give them, sir.
*/
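
The six-space rule for right-justified stage directions can be sketched in Python as follows. This is a hedged illustration only: six_space_gap is a hypothetical name, and the sketch assumes that a right-justified stage direction begins with an opening square bracket and is separated from the dialogue by two or more spaces.

```python
import re

# Assumption: dialogue, then a run of 2+ spaces, then a direction starting with "[".
RIGHT_JUSTIFIED = re.compile(r"(\S) {2,}(\[.*)$")

def six_space_gap(line: str) -> str:
    """Leave exactly six spaces between dialogue and a right-justified stage direction."""
    return RIGHT_JUSTIFIED.sub(lambda m: m.group(1) + " " * 6 + m.group(2), line)
```

Stage directions that follow the speaker's name after a single space, and indented continuations of a shared metrical line, are left unchanged because they do not match the pattern.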


Sample Image:
Plays image
Correctly Formatted Text:

/*
Am. Sure you are fasting;
Or not slept well to night; some dream (Ismena?)

Ism. My dreams are like my thoughts, honest and innocent,
Yours are unhappy; who are these that coast us?
You told me the walk was private.
*/

Anything else that needs special handling or that you're unsure of

While formatting, if you encounter something that isn't covered in these guidelines that you think needs special handling or that you are not sure how to handle, post your question, noting the png (page) number, in the Project Discussion thread (a link to the project-specific forum is in the Project Comments), and put a note in the formatted text explaining the problem. Your note will explain to the next volunteer or post-processor what the problem or question is.

Start your note with a square bracket and two asterisks [** and end it with another square bracket ]. This clearly separates it from the Author's text and signals the Post-Processor to stop and carefully examine this part of the text and the matching image to address any issues. Any comments put in by a previous volunteer must be left in place. Agreement or disagreement can be added, but even if you know the answer, you absolutely must not remove the comment. If you have found a source which clarifies the problem, please cite it so the post-processor can also refer to it.
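
Because every note opens with [** and closes with ], notes are easy to find by machine. The Python sketch below shows one way a later volunteer or post-processor might list them. It is an illustration, not a DP tool: find_notes is a hypothetical name, and the sketch assumes that a note does not itself contain a closing square bracket.

```python
import re

# A note starts with "[**" and runs to the next "]".
NOTE = re.compile(r"\[\*\*([^\]]*)\]")

def find_notes(page_text: str):
    """Return the text of every [** ... ] note on a page, in order of appearance."""
    return [m.group(1).strip() for m in NOTE.finditer(page_text)]
```

Ordinary bracketed markup such as [Footnote 1: ...] is not matched, because it lacks the two asterisks directly after the opening bracket.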

Previous Proofreaders' Notes/Comments

Any notes or comments put in by a previous volunteer must be left in place. You may add agreement or disagreement to the existing note but even if you know the answer, you absolutely must not remove the comment. If you have found a source which clarifies the problem, please cite it so the post-processor can also refer to it.

If you are formatting in a later round and come across a note from a volunteer in a previous round that you know the answer to, please take a moment and provide Feedback to them by clicking on their name in the proofreading interface and posting a private message to them explaining how to handle the situation in the future. Please, as already stated, do not remove the note.


 

Specific Guidelines for Special Books

These particular types of books have specific guidelines that add to or modify the normal guidelines given in this document. Projects for these books are often difficult, and are not recommended for beginning volunteers. They are more appropriate to experienced volunteers or people who have expertise in the particular field.

Click on the link below when you need to see the guidelines for one of these types of books.

 

Common Problems

OCR Problems: 1-l-I

OCR commonly has trouble distinguishing between the digit '1' (one), the lowercase letter 'l' (ell), and the uppercase letter 'I'. This is especially true for books where the pages may be in poor condition.

Watch out for these. Read the context of the sentence to determine which is the correct character, but be careful—often your mind will automatically 'correct' these as you are reading.

Noticing these is much easier if you use a mono-spaced font such as DPCustomMono or Courier.

OCR Problems: 0-O

OCR commonly has trouble distinguishing between the digit '0' (zero), and the uppercase letter 'O'. This is especially true for books where the pages may be in poor condition.

Watch out for these. Normally the context of the sentence is sufficient to determine which is the correct character, but be careful—often your mind will automatically 'correct' these as you are reading.

Noticing these is much easier if you use a mono-spaced font such as DPCustomMono or Courier.
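
A simple script can point out some of these before you read the page. The Python sketch below is a rough heuristic, not a DP tool: suspicious_tokens is a hypothetical name, and the rule it applies (flag any token that mixes digits with letters and contains one of 1, l, I, 0 or O) is an assumption made for illustration. It will also flag legitimate text such as "1st", so treat the result as a list of places to look, not a list of errors.

```python
import re

CONFUSABLE = set("1lI0O")

def suspicious_tokens(text: str):
    """Return tokens that mix digits and letters and contain 1, l, I, 0 or O."""
    hits = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        has_digit = any(c.isdigit() for c in token)
        has_letter = any(c.isalpha() for c in token)
        if has_digit and has_letter and CONFUSABLE & set(token):
            hits.append(token)
    return hits
```

A token made entirely of digits or entirely of letters is never flagged, so an 'I' misread as 'l' inside an ordinary word still has to be caught by reading against the image.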

OCR Problems: Hyphens and Dashes

OCR commonly has trouble distinguishing between dashes and hyphens. Format these carefully: OCR'd text often has only one hyphen for an em-dash that should have two. See the rules for hyphenated words and em-dashes for more detailed information.

Noticing these is much easier if you use a mono-spaced font such as DPCustomMono or Courier.
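
One way to draw attention to possible dash errors is to look for a lone hyphen with a space next to it. The Python sketch below is an illustrative heuristic only: suspicious_hyphens is a hypothetical name, and the assumption that a single hyphen beside a space may be a mis-scanned em-dash will not be right in every book. A hyphen at the very end of a line is ignored, since that is more likely to be end-of-line hyphenation.

```python
import re

# A single hyphen: not preceded or followed by another hyphen.
SINGLE_HYPHEN = re.compile(r"(?<!-)-(?!-)")

def suspicious_hyphens(line: str):
    """Return positions of single hyphens that have a space directly before or after."""
    hits = []
    for match in SINGLE_HYPHEN.finditer(line):
        i = match.start()
        before = line[i - 1] if i > 0 else ""
        after = line[i + 1] if i + 1 < len(line) else ""
        if before == " " or after == " ":
            hits.append(i)
    return hits
```

Each position returned is a place to compare with the image; the script cannot tell which character the printer actually used.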

OCR Problems: Scannos

Another common OCR issue is misrecognition of characters. We call these errors "scannos" (like "typos"). This misrecognition can result in a word which:

  • appears to be correct at first glance, but is actually misspelled.
    These can usually be caught by running the spellcheck from the proofreading interface.
  • is changed to a different but otherwise valid word that does not match what is in the page image.
    These are subtle because they can only be caught by someone actually reading the text.

Possibly the most common example of the second type is "and" being OCR'd as "arid." Other examples: "eve" for "eye", "Torn" for "Tom", "train" for "tram". This type is harder to spot, and we have a special term for it: "Stealth Scannos." We collect examples of Stealth Scannos in this thread.

Spotting scannos is much easier if you use a mono-spaced font such as DPCustomMono or Courier.
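
Since Stealth Scannos are valid words, a spellcheck will not find them, but a short watch list can. The Python sketch below is an illustration, not a DP tool: flag_stealth_scannos is a hypothetical name, and the watch list contains only the four examples given above. Every hit still has to be compared with the image, because words such as "train" and "eve" are usually correct.

```python
import re

# Watch list built only from the examples in this section:
# the OCR'd word, and the word it may have replaced.
STEALTH_SCANNOS = {"arid": "and", "eve": "eye", "Torn": "Tom", "train": "tram"}

def flag_stealth_scannos(text: str):
    """Return (position, word, possible original) for each watch-list word found."""
    hits = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group(0)
        if word in STEALTH_SCANNOS:  # case-sensitive, so "torn" is not flagged
            hits.append((match.start(), word, STEALTH_SCANNOS[word]))
    return hits
```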

Handwritten Notes in Book

Do not include handwritten notes in a book (unless it is overwriting faded, printed text to make it more visible). Do not include handwritten marginal notes made by readers, etc.

Some Project Managers may ask that handwritten notes be marked with [HW: (text of the note)].

Bad Images

If an image is bad (not loading, chopped off, unable to be read), please put a post about this bad image in the Project Comments forum. Do not click on "Return Page to Round"; if you do, the page will be reissued to the next formatter. Instead, click on the "Report Bad Page" button so this page is 'quarantined'.

Note that some page images are quite large, and it is common for your browser to have difficulty displaying them, especially if you have several windows open or are using an older computer. Before reporting this as a bad page, try clicking on the "Image" line on the bottom of the page to bring up just the image in a new window. If that brings up a good image, then the problem is probably in your browser or system.

It's fairly common for the image to be good, but the OCR scan is missing the first line or two of the text. Please just type in the missing line(s). If nearly all of the lines are missing in the scan, then either type in the whole page (if you are willing to do that), or just click on the "Return Page to Round" button and the page will be reissued to someone else. If there are several pages like this, you might post a note in the Project Comments forum to notify the Project Manager.

Wrong Image for Text

If the image does not match the text given, please post about the wrong image in the Project Comments forum. Do not click on "Return Page to Round"; if you do, the page will be reissued to the next formatter. Instead, click on the "Report Bad Page" button so this page is 'quarantined'.

Previous Proofreading and Formatting Mistakes

If a previous volunteer made a lot of mistakes or missed a lot of things, please take a moment and provide Feedback to them by clicking on their name in the proofreading interface and posting a private message to them explaining how to handle the situation so that they will know how in the future.

Please be nice! Everyone here is a volunteer and presumably trying their best. The point of your feedback message should be to inform them of the correct way to format, rather than to criticize them. Give a specific example from their work showing what they did, and what they should have done.

If the previous volunteer did an outstanding job, you can also send them a message about that—especially if they were working on a particularly difficult page.

Printer Errors/Misspellings

Correct all of the words which the OCR has misread (scannos), but do not correct what may appear to you to be misspellings or printer errors that occur on the scanned image. Many of the older texts have words spelled differently from modern usage and we retain these older spellings, including any accented characters.

If you are unsure, place a note in the txet [**typo for text?] and ask in the Project Discussion thread. If you do make a change, include a note describing what you changed: [**Transcriber's Note: typo fixed, changed from "txet" to "text"]. Include the two asterisks ** so the post-processor will notice it.

Factual Errors in Texts

In general, don't correct factual errors in the author's book. Many of the books we are preparing have statements of fact in them that we no longer accept as accurate. Leave them as the author wrote them.

A possible exception is in technical or scientific books, where a known formula or equation may be given incorrectly, especially if it is shown correctly on other pages of the book. Notify the Project Manager about these, either by sending them a message via the Forum, or by inserting [**note sic explain-your-concern] at that point in the text.

Uncertain Items

     [...to be completed...]

Copyright Distributed Proofreaders