Channel: Adobe Community : Popular Discussions - PDF Language and Specifications

How to recover an incomplete CMap?


I printed a Microsoft Word file to PDF with Distiller, embedding the SakkalMajalla TrueType font. I want to extract the Unicode text from the PDF file, but I found that the ToUnicode CMap is missing part of the mapping. For example, CID 06B4 doesn't have any mapping. I guess 06B4 should be mapped to U+0644. There are some glyph substitutions in SakkalMajalla, so uni0644.medi (U+FEE0) is replaced by liga.0758.medi.alt1 (U+10354). Why can't Distiller deal with this situation? How can I recover the missing mappings from PDF objects other than ToUnicode? Thanks.

 

P.S. I also asked this question a couple of days ago; please see "Re: Is it a bug of Distiller?". I haven't received any answers, and I don't have the privileges to move or delete that discussion. Sorry for asking the same question in two communities.

 

 

/GS1 gs
BT
/TT1 1 Tf
24 0 0 24 513.84 764.1203 Tm
0 g
0 Tc
0 Tw
<0284>Tj
.495 .5925 TD
<0551>Tj
-.1675 -.5925 TD
<06b4>Tj
.4 .4225 TD
<0551>Tj
-.12 -.4225 TD
<024f>Tj
/TT2 1 Tf
12 0 0 12 506.58 764.1203 Tm
( )Tj
ET
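To see which character codes the page actually uses, the hex strings shown by `Tj` in the content stream above can be collected with a few lines of Python. This is only a minimal sketch: the function name `shown_codes` is made up for illustration, it assumes every code is two bytes wide (as the stream above suggests), and it ignores `TJ` arrays, literal strings, and font switches.

```python
import re

# Content stream copied from the post.
CONTENT_STREAM = """
/GS1 gs
BT
/TT1 1 Tf
24 0 0 24 513.84 764.1203 Tm
0 g
0 Tc
0 Tw
<0284>Tj
.495 .5925 TD
<0551>Tj
-.1675 -.5925 TD
<06b4>Tj
.4 .4225 TD
<0551>Tj
-.12 -.4225 TD
<024f>Tj
/TT2 1 Tf
12 0 0 12 506.58 764.1203 Tm
( )Tj
ET
"""


def shown_codes(stream: str, code_bytes: int = 2) -> list[str]:
    """Return the character codes of every hex string shown with Tj.

    Assumption: each code is `code_bytes` bytes wide (2 in this post).
    """
    codes = []
    for hex_string in re.findall(r"<([0-9A-Fa-f\s]*)>\s*Tj", stream):
        digits = re.sub(r"\s+", "", hex_string).lower()
        if len(digits) % 2:
            digits += "0"  # an odd-length PDF hex string is padded with 0
        step = code_bytes * 2
        codes.extend(digits[i:i + step] for i in range(0, len(digits), step))
    return codes


print(shown_codes(CONTENT_STREAM))
```

For the stream in this post, the result lists five codes in drawing order, with 0551 appearing twice and 06b4 in the middle.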

 

/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo << /Registry (JJEELB+TT1+0) /Ordering (T42UV) /Supplement 0 >> def
/CMapName /JJEELB+TT1+0 def
/CMapType 2 def
1 begincodespacerange
<024f> <0551>
endcodespacerange
3 beginbfchar
<024f> <0639>
<0284> <0649>
<0551> <064E>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end
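To make the gap concrete, the `bfchar` entries of the ToUnicode CMap above can be parsed and compared against the codes used on the page. The following is a rough sketch, not a full CMap parser: `parse_bfchar` and `unmapped` are hypothetical helper names, `bfrange` sections are not handled, and the list of used codes is typed in by hand from the content stream in this post.

```python
import re

# The mapping part of the ToUnicode CMap from the post.
TOUNICODE = """
1 begincodespacerange <024f> <0551> endcodespacerange
3 beginbfchar
<024f> <0639>
<0284> <0649>
<0551> <064E>
endbfchar
"""

# Codes shown with Tj under /TT1 in the content stream from the post.
USED_CODES = ["0284", "0551", "06b4", "0551", "024f"]


def parse_bfchar(cmap: str) -> dict[int, str]:
    """Map each source code to its Unicode string (bfchar sections only)."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap, re.S):
        pairs = re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block)
        for src, dst in pairs:
            # Destination values in a ToUnicode CMap are UTF-16BE.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def unmapped(codes: list[str], mapping: dict[int, str]) -> list[int]:
    """Return the codes with no ToUnicode entry, in first-seen order."""
    missing = []
    for code in codes:
        value = int(code, 16)
        if value not in mapping and value not in missing:
            missing.append(value)
    return missing


mapping = parse_bfchar(TOUNICODE)
for code in unmapped(USED_CODES, mapping):
    print(f"no ToUnicode entry for <{code:04x}>")
```

With the data from this post, the only code reported is 06b4. That code is also outside the declared codespace range, whose first byte only runs from 02 to 05.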

