One of the challenges, when storing or transmitting the image of a scanned multi-page document, is that it takes an awful lot of space. Unless, of course, you compress the data.

The kind of lossy compression used by the JPEG and MPEG standards is great for photos and movie frames, but not much good for text – it makes the edges blurry. And the hard-edged, often lossless, compression used by things like PNG and GIF is great for text but will do nasty things to any embedded photos or background textures. So how do you handle, say, a typical magazine page, with crisp text, embedded photos and graduated background colours?

In the late 90s, my friend Yann LeCun and others created the DjVu format, which cunningly works out how to split a document up and compress each part using the most appropriate system, then reassemble them for viewing later. It was particularly good for things like digitising historical manuscripts – it would separate the script from the parchment, deal with them separately and still produce a realistic-looking copy afterwards, but take a fraction of the amount of data that most other schemes would have used, which was especially important in those pre-broadband days. The same concepts are now in the JBIG2 standard, which is included in PDF and embedded in many devices, including Xerox copiers.

Another way to save space and time is that, once you’ve separated the text and other symbols from the background, it’s fairly easy to see if any symbols are re-used. So you don’t have to store the image of every ‘e’ in the document – you can store a representative sample of each size, font, etc. and simply insert an appropriate one wherever it is used in the original. All very cunning.

But this story on the BBC describes how some Xerox photocopiers may not have been getting it right, occasionally substituting incorrect digits in their copies. This can be something of a problem if you are, say, an accountant, or an architect. It’s not clear from this article whether this has ever caused anybody serious problems yet, or just been noticed in the lab, but you can imagine the potential lawsuits…
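To make the symbol re-use idea a bit more concrete, here is a toy sketch in Python. It is not the matching algorithm that JBIG2 or any real copier uses; the tiny 3×5 bitmaps, the `compress` and `decompress` helpers and the `max_diff` threshold are all made up for illustration. It simply assumes that two symbols count as "the same" when they differ in at most `max_diff` pixels, and shows how a tolerance loose enough to absorb scanner noise can also swallow the difference between a 6 and an 8.

```python
from typing import List, Tuple

Glyph = Tuple[str, ...]  # a tiny bitmap: one string per row, '#' means ink


def hamming(a: Glyph, b: Glyph) -> int:
    """Count the pixels at which two same-sized bitmaps differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def compress(glyphs: List[Glyph], max_diff: int) -> Tuple[List[Glyph], List[int]]:
    """Store each distinct-looking glyph once; the page becomes a list of indices.

    A glyph re-uses the first stored representative that differs from it
    by at most max_diff pixels (a hypothetical, deliberately crude rule).
    """
    dictionary: List[Glyph] = []
    indices: List[int] = []
    for g in glyphs:
        for i, rep in enumerate(dictionary):
            if hamming(g, rep) <= max_diff:
                indices.append(i)  # close enough: re-use the stored symbol
                break
        else:
            dictionary.append(g)  # nothing similar yet: store a new symbol
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decompress(dictionary: List[Glyph], indices: List[int]) -> List[Glyph]:
    """Rebuild the page by pasting the stored symbol at each position."""
    return [dictionary[i] for i in indices]


SIX = ("###", "#..", "###", "#.#", "###")
EIGHT = ("###", "#.#", "###", "#.#", "###")
NOISY_SIX = ("###", "#..", "###", "#.#", "##.")  # a 6 with one pixel of scanner noise

page = [SIX, EIGHT, NOISY_SIX, EIGHT]

# Exact matching: nothing is lost, but the noisy 6 needs its own entry.
exact_dict, exact_idx = compress(page, max_diff=0)

# Tolerating one pixel of noise merges the noisy 6 into the clean 6...
# but the 8 is also only one pixel away from the 6, so it gets replaced too.
loose_dict, loose_idx = compress(page, max_diff=1)
restored = decompress(loose_dict, loose_idx)
```

With `max_diff=0` the page round-trips exactly. With `max_diff=1` the dictionary shrinks to a single symbol and every 8 on the restored page comes back as a perfectly crisp 6, which is the sort of substitution you would not spot by looking for blurriness.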