HEATH Digest - 12 Jan 2001 to 13 Jan 2001 (#2001-14)

Gordon Brandly gbrandly at HOME.COM
Mon Jan 15 14:50:18 EST 2001


If you're using either Textbridge or OmniPage, it's not that simple,
unfortunately. Both of these packages will indeed insert a "~" character
wherever they *know* that a character or word is suspicious. Unfortunately,
they're never suspicious enough -- even on their highest 'suspicion'
settings, every scan I've done has produced mistakes that are obvious to a
human, but that the OCR packages didn't notice at all.

So, a human is still needed to read through the whole final text. Some time
ago I OCR'ed the Heathkit tube tester data lists with the idea of turning
these into a searchable database. The proof-reading has turned out to be a
major amount of work, unfortunately, and I'm still far from finished.

The raw output of the OCR process is usually fine for, say, magazine
articles. But when it comes to lists of numbers and letters that must be
*completely* accurate, a team of at least one computer and one human
proof-reader seems to be the best way to go.
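
For what it's worth, here is a rough sketch of how the computer half of that
team could narrow down where the human looks first. It flags lines containing
the "~" marker, plus any token that mixes digits with letters that OCR often
confuses with digits. The function name and the list of confusable letters
are my own assumptions for illustration; none of this comes from Textbridge
or OmniPage. It will over-flag legitimate tube types such as 6SN7 and will
still miss plenty of errors, so it only orders the proof-reading, it does not
replace it.

```python
# Letters that OCR commonly mistakes for digits (O/0, I/1, l/1, S/5, B/8, Z/2).
# This set is an illustrative assumption, not taken from any OCR package.
CONFUSABLE = set("OoIlSBZ")


def flag_suspect_lines(text, marker="~"):
    """Return (line_number, line, reasons) for lines worth a human look."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        reasons = []
        # The OCR package itself marked something on this line.
        if marker in line:
            reasons.append("ocr-marker")
        # Tokens mixing digits with confusable letters are worth a second look.
        for token in line.split():
            has_digit = any(ch.isdigit() for ch in token)
            has_confusable = any(ch in CONFUSABLE for ch in token)
            if has_digit and has_confusable:
                reasons.append("mixed:" + token)
        if reasons:
            flagged.append((number, line, reasons))
    return flagged


if __name__ == "__main__":
    sample = "6L6 ok\n12AX7 4~5\n1O.5 volts\nplain text"
    for number, line, reasons in flag_suspect_lines(sample):
        print(number, line, reasons)
```

Lines that come back unflagged are not known to be clean; they are just the
ones the script had no particular reason to complain about.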


-----Original Message-----
From: Heathkit Owners and Collectors List
[mailto:HEATH at LISTSERV.TEMPE.GOV]On Behalf Of Mike Morris
Sent: January 14, 2001 9:28 PM
To: HEATH at LISTSERV.TEMPE.GOV
Subject: Re: HEATH Digest - 12 Jan 2001 to 13 Jan 2001 (#2001-14)


<snip>
The OCR package I use inserts a ~ character in place of anything it has a
problem with, and all I have to do to see if there were problems is open the
output file with Notepad and scan for the ~ character.

<snip>

Listserver Subscription:listserv at listserv.tempe.gov - "subscribe heath
'name' 'call'"
Listserver Submissions: heath at listserv.tempe.gov
Listserver Unsubscribe: listserv at listserv.tempe.gov - -"signoff heath"




