Identity-H, CMap and troubles choosing predefined encodings

I'm trying to parse a certain PDF document on Mac OS X. The pages have embedded CID fonts with Identity-H encoding. The font itself is a Type0 font with a CIDFontType2 descendant font. I'm able to extract text from any page by using 2-byte CIDs and mapping them to the characters defined in the ToUnicode stream. However, there are a few character mismatches which (IMHO) are caused by a wrongly chosen encoding (MacRomanEncoding instead of PDFDocEncoding).
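
The extraction step described above can be sketched roughly as follows. This is a minimal Python sketch under simplifying assumptions: the ToUnicode CMap is already decompressed to text, only its bfchar sections are read (bfrange is ignored), and the helper names parse_bfchar and decode_identity_h are hypothetical. With Identity-H each 2-byte big-endian code equals the CID, and the ToUnicode destination values are UTF-16BE, so no single-byte encoding is consulted on this path.

```python
import re

# One bfchar entry, e.g. "<0003> <00D8>": source code -> UTF-16BE destination.
BFCHAR_ENTRY = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def parse_bfchar(cmap_text: str) -> dict[int, str]:
    """Build a code -> Unicode map from the bfchar sections of a ToUnicode CMap.

    Simplification (assumption): bfrange sections are ignored.
    """
    mapping: dict[int, str] = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in BFCHAR_ENTRY.findall(section):
            # Destination strings in a ToUnicode CMap are UTF-16BE.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_identity_h(data: bytes, to_unicode: dict[int, str]) -> str:
    """Split a string operand into 2-byte big-endian codes (Identity-H: code == CID)."""
    if len(data) % 2:
        raise ValueError("Identity-H strings must have an even number of bytes")
    chars = []
    for i in range(0, len(data), 2):
        code = int.from_bytes(data[i:i + 2], "big")
        chars.append(to_unicode.get(code, "\ufffd"))  # U+FFFD for unmapped codes
    return "".join(chars)


cmap = "2 beginbfchar\n<0003> <00D8>\n<0004> <0041>\nendbfchar"
mapping = parse_bfchar(cmap)
print(decode_identity_h(b"\x00\x03\x00\x04", mapping))  # ØA
```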

One of the mismatched characters in the document is Ø (Latin capital letter O with stroke, also used as the empty set symbol), but the character I'm extracting is ÿ (Latin small letter y with diaeresis). According to the PDF 1.7 specification, Ø and ÿ have the same octal code, 330, but in different encodings: PDFDocEncoding and MacRomanEncoding respectively.
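
The clash between the two encodings can be reproduced on a single byte. Python ships a mac_roman codec but no PDFDocEncoding codec; as an assumption limited to this one code point, latin-1 stands in for PDFDocEncoding here, since both map 0xD8 to Ø.

```python
code = 0o330  # octal 330 == 0xD8 == 216
raw = bytes([code])

# MacRomanEncoding: 0xD8 is ÿ (U+00FF)
as_mac_roman = raw.decode("mac_roman")
# Stand-in for PDFDocEncoding at this code point only (assumption):
# Latin-1 0xD8 is Ø (U+00D8), matching PDFDocEncoding's 330.
as_pdfdoc = raw.decode("latin-1")

print(f"MacRoman: {as_mac_roman} U+{ord(as_mac_roman):04X}")  # MacRoman: ÿ U+00FF
print(f"PDFDoc:   {as_pdfdoc} U+{ord(as_pdfdoc):04X}")  # PDFDoc:   Ø U+00D8
```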

My question is: how can I be sure I select the correct encoding for the text? Is it PDFDocEncoding by default unless specified otherwise?
