Channel: Adobe Community : Popular Discussions - PDF Language and Specifications

Encoding of Name Objects


Hi there!

 

I just stumbled over a problem I'm not sure how to handle.

 

It's about handling Name objects with hexadecimal notation.

 

[...]Regular characters that are outside the range EXCLAMATION MARK(21h) (!) to TILDE (7Eh) (~) should be written using the hexadecimal notation.[...]

 

No problem with the escape or unescape function but I read further in the PDF specification and found this:

 

[...]Ordinarily, the bytes making up the name are never treated as text to be presented to a human user or to an application external to a conforming reader. However, occasionally the need arises to treat a name object as text, such as one that represents a font name [...], a colorant name in a separation or DeviceN colour space, or a structure type ([...]).

In such situations, the sequence of bytes (after expansion of NUMBER SIGN sequences, if any) should be interpreted according to UTF-8, a variable-length byte-encoded representation of Unicode in which the printable ASCII characters have the same representations as in ASCII. This enables a name object to represent text virtually in any natural language, [...]

 

So I understand that a name written in hexadecimal notation should be interpreted as a UTF-8 byte sequence once the NUMBER SIGN sequences have been expanded.
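To make that reading concrete, here is a minimal Python sketch of how I understand the two steps: first expand the NUMBER SIGN sequences, then interpret the resulting bytes as UTF-8. The function names `unescape_name` and `name_as_text` are my own and purely illustrative; they do not come from the specification or from any library.

```python
import re

# Matches a NUMBER SIGN followed by exactly two hexadecimal digits.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def unescape_name(raw: bytes) -> bytes:
    """Expand #xx sequences in a name token (given without the leading slash)."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)


def name_as_text(raw: bytes) -> str:
    """Interpret the expanded bytes as UTF-8 (strict: raises on invalid input)."""
    return unescape_name(raw).decode("utf-8")


print(name_as_text(b"#e2#82#acuro-Wert"))
```

With strict decoding, a name whose expanded bytes are not valid UTF-8 raises `UnicodeDecodeError`, so a real reader would probably need some fallback for such names.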

 

I had already implemented this behaviour when I ran into a real-life situation that clashes with this definition: I created a checkbox form field and defined the export value to be "€uro-Wert" (I was trying to force this name into UTF-8 notation). Bizarrely, Adobe Acrobat (9) encoded the Euro sign in PDFDocEncoding. The /AP entry looks like this:

 

<</D<</Off 42 0 R/#A0uro-Wert 43 0 R>>/N<</Off 39 0 R/#A0uro-Wert 40 0 R>>

 

I expected:

 

<</D<</Off 42 0 R/#e2#82#acuro-Wert 43 0 R>>/N<</Off 39 0 R/#e2#82#acuro-Wert 40 0 R>>
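The difference between the two forms can be reproduced with a small Python sketch of the escaping rule quoted above. `escape_name` is a hypothetical helper of mine, not Acrobat's algorithm. It assumes that the NUMBER SIGN itself and the PDF delimiter characters also have to be escaped, and it hard-codes the byte 0xA0 for the Euro sign because that is what Acrobat wrote in the output shown here.

```python
# Delimiter characters that must not appear literally inside a name.
_DELIMITERS = b"()<>[]{}/%"


def escape_name(data: bytes) -> str:
    """Write raw name bytes as a PDF name token, using #xx where needed."""
    out = []
    for b in data:
        regular = 0x21 <= b <= 0x7E and b != 0x23 and b not in _DELIMITERS
        out.append(chr(b) if regular else f"#{b:02x}")
    return "/" + "".join(out)


# UTF-8: the Euro sign becomes the three bytes E2 82 AC.
print(escape_name("€uro-Wert".encode("utf-8")))

# Single byte A0 for the Euro sign, as seen in the Acrobat output.
print(escape_name(b"\xa0uro-Wert"))
```

The sketch writes lowercase hex digits while Acrobat writes `#A0`; both should expand to the same byte.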

 

So... is this a bug in Acrobat, or did I miss something? I mean, this name IS presented to a human through a dialog in Acrobat, and it is also visible in the form field tree in a panel. So from the specification's point of view it has to be encoded in UTF-8, doesn't it?

 

Anybody?

 

Thanks!

Jan

