Barry SCHWARTZ (Barijo ŜVARC) (chemoelectric) wrote,

DRM in Adobe Reader

The digital rights management system in Adobe Reader makes little sense to me. It is mostly a ‘voluntary’ system: flags in the PDF file tell the reader program what to allow. Here is the part that baffles me: why does the setting that disallows re-use of font data also disallow copying the text as Unicode? Shouldn’t I be able to set the PDF to allow extraction of the text as Unicode while disallowing re-use of the font data? The result is that if I use certain fonts in an e-text then, by the license terms, I have to set the document so that you can’t cut and paste the Unicode in Adobe Reader. You can still have Reader look up a dictionary definition or do contextual searches, but it won’t let you copy the text to the clipboard.

There is, in fact, a setting for letting ‘accessibility’ programs extract the Unicode data, but this doesn’t help with ordinary Adobe Reader.
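For the curious, these permissions live in the /P entry of the PDF encryption dictionary: a signed 32-bit integer whose bits grant individual operations. The bit assignments below are from the PDF specification (the accessibility permission is a separate bit from ordinary copying, which is exactly why the two can differ); the sample /P value is my own illustration, not taken from any particular document. A minimal sketch of decoding such a value:

```python
# Bit numbers follow the PDF spec's convention of counting from 1,
# so bit n corresponds to the integer value 1 << (n - 1).
PERMISSION_BITS = {
    3:  "print",
    4:  "modify contents",
    5:  "copy/extract text and graphics",
    6:  "add or modify annotations",
    9:  "fill in form fields",
    10: "extract text for accessibility",
    11: "assemble the document",
    12: "print at high resolution",
}

def decode_permissions(p: int) -> dict[str, bool]:
    """Return which operations a /P value allows."""
    p &= 0xFFFFFFFF  # /P is stored as a signed 32-bit integer
    return {name: bool(p & (1 << (bit - 1)))
            for bit, name in PERMISSION_BITS.items()}

# A hypothetical /P that sets bit 10 but clears bit 5: accessibility
# extraction is allowed while ordinary copying is forbidden -- the
# combination described above.
perms = decode_permissions(-3392)
```

Here `perms["extract text for accessibility"]` comes out true while `perms["copy/extract text and graphics"]` is false, which is the state an ‘accessibility’-only document is in.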

On Unix-like systems the ‘libre’ readers generally let you turn off the prohibition on extracting Unicode. Thus even if you prefer to read the text in Adobe Reader, you can open the same page in a different reader when you want to copy something. Of the non-Adobe readers I have tried, Evince was probably the best (despite being a featureless-by-design GNOME application); it was fast and rendered reasonably well, though still not as well as Adobe Reader with a well-hinted font.

Another alternative, especially doable for something like Paradise Lost where the line breaks are independent of the font, is to have multiple versions in different fonts, for different uses. This also has the advantage of being fun for me; I spend more time looking at my Paradise Lost in different fonts than actually working towards its completion! :)

Tangentially related: in the last several days I figured out how to get reader programs to extract the correct Unicode from a PDF, treating a small-caps A the same as a lowercase a, a ct ligature the same as c followed by t, and so on. While I was giving my software the ability to use auxiliary small-caps and ligature fonts, which is the normal arrangement for traditional ‘Type 1’ fonts (*.pfb, *.afm, *.pfm), I noticed that the reader was doing the right thing. But it still did the wrong thing with OpenType fonts (*.otf, some *.ttf). It turned out that the program I am using (at least for now) as a back-end, borrowed from the XeTeX project, which in turn borrowed it from users of enormous East Asian fonts, does the right thing with Type 1 fonts; for other fonts I have no idea what the programmers had in mind. Maybe it does the right thing with Korean or Japanese, but it doesn’t work too well with the Latin alphabet. So I wrote a small program that writes a to-Unicode map for OpenType fonts, with minimal user intervention, and made a small patch to the back-end to have it read that map from a mapping file placed alongside the font file.
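The mechanism that makes this work in PDF is the ToUnicode CMap: a small PostScript-flavored table attached to a font that maps each glyph code to the Unicode text it stands for, so a ligature glyph can map to two letters and a small-caps glyph to a lowercase one. This is not the author’s actual tool, and the glyph codes are invented for illustration, but a minimal sketch of generating such a table (for single-byte codes, per the CMap syntax in the PDF spec) might look like:

```python
def to_utf16be_hex(s: str) -> str:
    """Encode a replacement string as the UTF-16BE hex that CMaps use."""
    return s.encode("utf-16-be").hex().upper()

def make_tounicode_cmap(mapping: dict[int, str]) -> str:
    """Build a minimal ToUnicode CMap for single-byte glyph codes.

    `mapping` sends a glyph code to the Unicode text it represents,
    e.g. a ct-ligature code to the two letters 'c' and 't'.
    """
    entries = "\n".join(
        f"<{code:02X}> <{to_utf16be_hex(text)}>"
        for code, text in sorted(mapping.items())
    )
    return "\n".join([
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CMapName /Custom-ToUnicode def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<00> <FF>",
        "endcodespacerange",
        f"{len(mapping)} beginbfchar",
        entries,
        "endbfchar",
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ])

# Hypothetical glyph codes: 0x41 holds a small-caps A, 0xF0 a ct ligature.
cmap = make_tounicode_cmap({0x41: "a", 0xF0: "ct"})
```

With a table like this embedded alongside the font, a conforming reader copies ‘ct’ to the clipboard instead of a private-use ligature codepoint, which is the ‘right thing’ described above.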

(This back-end program is rather spaghetti-codish by accretion, and probably ought to be re-written, though I always will point out in cases like these that I have worked with code that was much, much, much worse. I have difficulty sometimes convincing people that the code they think is horrible is much better than what I used to work with on the job in the early ’90s.)

Another fact is that making repairs to broken fonts makes my thumb very achy. :(
