Channel: Adobe Community : Popular Discussions - PDF Language and Specifications

CJK CMaps

I have a font that does not have a ToUnicode map. Its encoding is /90ms-RKSJ-H.

Looking that up in the PDF Reference, I see these two entries:

90ms-RKSJ-H: Microsoft Code Page 932 (lfCharSet 0x80), JIS X 0208 character set with NEC and IBM® extensions

90ms-RKSJ-H/V: Adobe-Japan1-2 (the same character collection appears in all four columns of the table row)

That tells me that to extract the text content I need to use the Adobe-Japan1-2 CMap to convert character codes to CIDs, and then use the Adobe-Japan1-UCS2 CMap to convert CIDs to Unicode. (The first CMap has a Registry of Adobe and an Ordering of Japan1.)

Well, that makes sense. But then I look at Adobe-Japan1-2, which has this codespacerange:

1 begincodespacerange
<0000> <22FF>
endcodespacerange

So character codes are two bytes long and the first byte must be no greater than 0x22.
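
As a rough illustration of what that range allows, here is a minimal Python sketch of the codespace test. The helper name in_codespace is made up for this example. It assumes that a code matches when it has the same length as the range bounds and each of its bytes lies between the corresponding low and high bytes.

```python
def in_codespace(code: bytes, low: bytes, high: bytes) -> bool:
    """Return True if `code` falls inside the codespace range [low, high].

    Assumption: matching is done byte by byte, and the code must have
    exactly as many bytes as the range bounds.
    """
    if not (len(code) == len(low) == len(high)):
        return False
    return all(lo <= b <= hi for b, lo, hi in zip(code, low, high))


# The single range from the codespacerange block: <0000> <22FF>
LOW, HIGH = bytes.fromhex("0000"), bytes.fromhex("22FF")

print(in_codespace(bytes.fromhex("2200"), LOW, HIGH))  # True: first byte 0x22 is allowed
print(in_codespace(bytes.fromhex("2300"), LOW, HIGH))  # False: first byte 0x23 is too large
```

Under this reading the second byte is unrestricted (0x00 to 0xFF), so only the first byte decides whether a two-byte code fits.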

Here's the first string I get from a text-showing operator in the PDF:

<8DE092639640906C2091538D918E7392AC91BA90558BBB8BA689EF8AF1958D8D7388D 720>Tj

I believe that only one of the two-byte codes in that string actually fits the range.
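
That count can be checked mechanically. The sketch below splits the hex string into fixed two-byte codes and tests each first byte against the <0000> <22FF> range. For comparison it also decodes the same bytes with Python's built-in cp932 codec, on the assumption that this codec is a reasonable stand-in for the Code Page 932 encoding named in the PDF Reference entry. The variable names are illustrative, and the cp932 step is only a comparison, not the CMap-based route.

```python
# Hex string taken from the Tj operator above.
HEX = "8DE092639640906C2091538D918E7392AC91BA90558BBB8BA689EF8AF1958D8D7388D 720"

# White space inside a PDF hex string is ignored, so remove it before parsing
# (bytes.fromhex would reject white space that falls in the middle of a byte).
data = bytes.fromhex("".join(HEX.split()))

# Naive reading: every code is exactly two bytes long.
pairs = [data[i:i + 2] for i in range(0, len(data), 2)]

# Codes whose first byte lies in 0x00..0x22; the second byte is unrestricted.
fitting = [p.hex().upper() for p in pairs if len(p) == 2 and p[0] <= 0x22]

print(len(data), len(pairs))  # 36 18
print(fitting)                # ['2091']

# Comparison: Code Page 932 mixes one-byte and two-byte codes, so decoding
# with it yields fewer characters than bytes but more than 18.
text = data.decode("cp932")
print(len(text))
```

With the fixed two-byte split, 2091 is the only code in range. The cp932 decode instead treats the two 0x20 bytes as single-byte codes, which moves the boundaries of every code that follows the first one.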
