My application successfully creates PDFs with embedded fonts. However, text extraction (e.g. copy and paste) does not work, because I do not know how to create the /ToUnicode object in the font dictionary correctly.
I have read the spec and the technical note "5411.ToUnicode.pdf", but I still do not understand how to construct the ToUnicode object.
The font objects in my PDF look like this:
4 0 obj
<<
/Type /Font
/Subtype /Type0
/BaseFont /BBBAAA+MS-Gothic
/Encoding /Identity-H
/DescendantFonts [ 5 0 R ]
/ToUnicode 6 0 R
>>
endobj
5 0 obj
<<
/Type /Font
/Subtype /CIDFontType2
/BaseFont /BBBAAA+MS-Gothic
/FontDescriptor 7 0 R
/CIDSystemInfo <<
/Registry (Adobe)
/Ordering (Identity)
/Supplement 0
>>
/DW 1000
>>
endobj
6 0 obj
<< /Filter /FlateDecode /Length 8 0 R >>
stream
????
What I am missing is how to create 6 0 obj for the /ToUnicode key.
What is the source or logic for specifying the CID-to-Unicode mappings in 6 0 obj?
Does it depend on the /Encoding /Identity-H entry above?
Thanks in advance for your help.
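
For illustration, the uncompressed content of a ToUnicode stream is a small CMap program that maps each character code used in the content stream to UTF-16BE text. With /Encoding /Identity-H the 2-byte codes are the CIDs themselves, and for a CIDFontType2 font with the default identity /CIDToGIDMap they are also the TrueType glyph indices, so the mapping can typically be derived by inverting the font's cmap table for the glyphs in use. The following is a minimal Python sketch under those assumptions; build_to_unicode_cmap and the sample glyph IDs are hypothetical, and the result would still need to be Flate-compressed to match the /Filter in 6 0 obj.

```python
def build_to_unicode_cmap(code_to_text):
    """Return the uncompressed body of a ToUnicode CMap stream.

    code_to_text maps 2-byte character codes (equal to CIDs under
    Identity-H) to the Unicode string each one represents.
    This helper is hypothetical, not part of any PDF library.
    """
    entries = sorted(code_to_text.items())
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    # A single beginbfchar block may hold at most 100 entries.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            # Destination is UTF-16BE; non-BMP text becomes a surrogate pair.
            dst = text.encode("utf-16-be").hex().upper()
            lines.append(f"<{code:04X}> <{dst}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


# Sample glyph IDs below are made up; real values come from the font's cmap table.
cmap_body = build_to_unicode_cmap({0x0003: " ", 0x0024: "A"})
print(cmap_body)
```

Passing the returned text through zlib.compress and using the compressed length as /Length would give a stream consistent with the dictionary shown for 6 0 obj.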