Hi there!
I just stumbled over a problem I'm not sure how to handle.
It's about handling Name objects with hexadecimal notation.
[...]Regular characters that are outside the range EXCLAMATION MARK(21h) (!) to TILDE (7Eh) (~) should be written using the hexadecimal notation.[...]
The escape and unescape functions are no problem, but I read further in the PDF specification and found this:
[...]Ordinarily, the bytes making up the name are never treated as text to be presented to a human user or to an application external to a conforming reader. However, occasionally the need arises to treat a name object as text, such as one that represents a font name [...], a colorant name in a separation or DeviceN colour space, or a structure type ([...]).
In such situations, the sequence of bytes (after expansion of NUMBER SIGN sequences, if any) should be interpreted according to UTF-8, a variable-length byte-encoded representation of Unicode in which the printable ASCII characters have the same representations as in ASCII. This enables a name object to represent text virtually in any natural language, [...]
So I understand that a name written in hexadecimal notation should, once the NUMBER SIGN sequences are expanded, be interpreted as a UTF-8 byte sequence.
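To make that reading concrete, here is a minimal sketch in Python of how I understand the two quoted rules. The function names are made up for illustration and do not come from any library, and the extra escaping of "#" and the delimiter characters is my own assumption about what a writer has to do in addition to the quoted range rule.

```python
import re

# Hypothetical helper names, for illustration only.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")

# Assumption: '#' and the PDF delimiter characters are escaped as well,
# even though they fall inside the 21h..7Eh range.
_ALWAYS_ESCAPED = b"#()<>[]{}/%"


def unescape_name(raw: bytes) -> bytes:
    """Expand #xx sequences in a name token (given without the leading slash)."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def name_to_text(raw: bytes) -> str:
    """Expand #xx sequences, then interpret the resulting bytes as UTF-8."""
    return unescape_name(raw).decode("utf-8")


def escape_name(text: str) -> bytes:
    """Encode text as UTF-8 and write bytes outside 21h..7Eh as #xx."""
    out = bytearray()
    for b in text.encode("utf-8"):
        if 0x21 <= b <= 0x7E and b not in _ALWAYS_ESCAPED:
            out.append(b)
        else:
            out += b"#%02x" % b
    return bytes(out)
```

With this, escape_name("€uro-Wert") yields the bytes #e2#82#acuro-Wert, and name_to_text() turns them back into the original string.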
I had already implemented this behaviour when I ran into a real-life situation that clashes with this definition: I created a checkbox form field and defined the export value as "€uro-Wert" (to try to force this name into UTF-8 notation). Bizarrely, Adobe Acrobat (9) encoded the Euro sign in PDFDocEncoding. The /AP entry looks like this:
<</D<</Off 42 0 R/#A0uro-Wert 43 0 R>>/N<</Off 39 0 R/#A0uro-Wert 40 0 R>>
I expected:
<</D<</Off 42 0 R/#e2#82#acuro-Wert 43 0 R>>/N<</Off 39 0 R/#e2#82#acuro-Wert 40 0 R>>
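The clash is easy to reproduce: the single byte A0h is not a valid UTF-8 sequence, so a strict UTF-8 reader rejects the name Acrobat wrote. Below is a rough sketch of a lenient reader that tries UTF-8 first and falls back otherwise. This fallback is only my guess at a workaround, not something the specification describes. The function name is hypothetical, the table contains only the Euro entry of PDFDocEncoding (A0h), and treating every byte below 80h as plain ASCII is a simplification.

```python
import re

# Assumption: only the one PDFDocEncoding entry relevant here (A0h = Euro sign).
PARTIAL_PDFDOC = {0xA0: "\u20ac"}


def decode_name_lenient(raw: bytes) -> tuple:
    """Return (text, encoding_used) for a name token without the leading slash."""
    # Expand #xx sequences into raw bytes first.
    data = re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        raw,
    )
    try:
        # What the specification asks for.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Hypothetical fallback: bytes below 80h are taken as ASCII (a
        # simplification), known table entries are mapped, and anything
        # else becomes U+FFFD.
        text = "".join(
            PARTIAL_PDFDOC.get(b, chr(b) if b < 0x80 else "\ufffd")
            for b in data
        )
        return text, "pdfdoc-fallback"
```

Both b"#A0uro-Wert" and b"#e2#82#acuro-Wert" then come out as "€uro-Wert", but the second element of the result tells which interpretation was used.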
So... is this a bug in Acrobat, or did I miss something? I mean, this name IS presented to a human through a dialog in Acrobat, and it is also visible in the form field tree in a panel. So according to the specification it has to be encoded in UTF-8, doesn't it?
Anybody?
Thanks!
Jan