OCR (Optical Character Recognition) recovery.
Copyright Julian H. Stacey, July 2017
Recovery Notes:
- Scanning: The document below was scanned to .tiff by
berklix.com/scanjet/ 4 times, in monochrome, with density 0,
+30, +60, & +90 darker.
The actual scans are not online, to save space.
- OCR Optimisation: +90 darker was generally, but not
everywhere, better than +60, as some pages have darker print at
the top of the paper page etc. (it's not an artifact of the
scanner; more noticeable on Page A=2 & less noticeable
on P23. Maybe the typist tired by the bottom of some pages?)
- OCR (Optical Character Recognition) by Tesseract.
- Comparison of texts by eye with mgdiff, a
colourising differential comparator:
BSDforge.com/projects/textproc/mgdiff/
svnweb.freebsd.org/ports/head/textproc/mgdiff/pkg-descr?revision=HEAD
- Then I manually corrected OCR errors back to
original plain text.
- To keep to plain ASCII text, which has no Pound Sterling
symbol, Pound symbols were replaced by the international
currency trading abbreviation GBP
(if the document is later converted to html, I will use the html
symbol &pound; (£)).
- Underlined Words manually converted to _Underlined_
_Words_
(if the document is later converted to html and underline is
not supported, I will use Bold).
- Page numbers: manually restored inside grey editor's brackets [ ].
- Paragraph indents: The original used 10 space indents,
but no blank lines. Either one of {short preceding
lines} and {10 space indents} was sufficient for the OCR to
detect paragraphs & insert a blank line. The OCR also
stripped the 10 blank spaces. I restored the 10 spaces for
authenticity, but left the blank lines in place as they make
it easier to read. (The original presumably avoided blank lines
to avoid a paper document with more pages, but that is no issue
here.)
- The OCR inserted some spurious blank lines elsewhere; these
were manually removed.
- Spelling: I did not notice spelling errors, beyond
the few the OCR created, but if I had, I would have left them
in place as in the original.
- HTML Format: Later maybe, to allow browsers on small
windows with narrow lines, eg mobile & PDA screens.
Before that I will add switches to
berklix2.mk macros of form:
    .if !exist .no_ispell            to inhibit spell checker fixing
&
    .if !exist .no_tidy_m_reformat   to inhibit reformatting,
until I have indent sections & tables marked up in HTML.
- I worked from the paper copy of the author (my father),
R. A. Stacey. After recovery was complete, I discovered
www.average-adjusters.com
but
www.average-adjusters.com/documents/category/wise-words-from-the-chair/
is blocked (with "You need to login first to view Documents").
- Copyright Julian H.
Stacey, Munich, 2017, copyright asserted to avoid it
being abused, eg by subsuming into private domain. I do not
contribute to closed source, only to Open Source.
Even the UK government publishes after 30 years. Ron Stacey's
1984 speech, by publication in 2017, was over 30 years old &
is public.
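
The plain-text conventions in the notes above (GBP for the Pound sign, _Underlined_ _Words_, restored 10 space indents) can be illustrated with a minimal Python sketch. The corrections described above were made by hand, so this is only an illustration and not the procedure actually used. The function names, the single space written after GBP, and the assumption that every blank-line-separated block is an ordinary paragraph are all hypothetical and do not come from the notes.

```python
import re

POUND = "\u00a3"      # the Pound Sterling sign, which is not in ASCII
INDENT = " " * 10     # the original typescript used 10 space indents


def pounds_to_gbp(text):
    """Replace each Pound sign with 'GBP ' so the text stays plain ASCII.

    Assumption: one space is written between GBP and the amount.
    """
    return re.sub(POUND + r"\s*", "GBP ", text)


def underline(phrase):
    """Mark every word of an underlined phrase as _Word_."""
    return " ".join("_" + word + "_" for word in phrase.split())


def restore_indents(text):
    """Put the 10 space indent back on the first line of each paragraph.

    Assumption: paragraphs are separated by one blank line (as the OCR
    left them) and every block is an ordinary indented paragraph.
    The blank lines themselves are kept.
    """
    paragraphs = text.split("\n\n")
    return "\n\n".join(
        INDENT + p.lstrip(" ") if p.strip() else p for p in paragraphs
    )
```

For example, `underline("Underlined Words")` gives `_Underlined_ _Words_`, the form used in the notes above. Headings, tables and other unindented blocks would need to be excluded by hand before calling `restore_indents`.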