Explain why Type 1 Times-Roman is unembedded when tags are added

September 16, 2010, 11:05 am

I've been plagued by a font issue I cannot resolve. I have PDFs that must be tagged and must have all fonts embedded. (They are being posted to a federal website and MUST conform to the webmaster's font requirements AND the 508 tagging guidelines. Absolute adherence.)

Anyway, I start with a PDF that passes Preflight with no missing fonts.

After tags are added and the Preflight check runs again, it reports fonts that are not embedded.

I try the Preflight fixup that embeds fonts (even if invisible), and the fonts will not embed. I have the fonts on my system, so that is not the issue. If I re-PostScript the file (print to file, then distill), the fonts are back in the document but the tags are gone. When tags get added, the fonts are no longer embedded.

Does Adobe use Times-Roman for tags?

I have 4 documents that I have to post. 3 are similar and have this problem. The fourth document, which is unique in its composition (less content, different fonts used), I've been able to process: I can add tags and all fonts stay embedded. When I compare the two cases, the PDF that processes fine lists Type 0 Times-Roman among its embedded fonts.

Type 0??? New one to me. My investigation into Type 0 fonts uncovered that they are:

A composite font, also called a Type 0 font, is one whose glyphs are obtained from a fontlike object called a CIDFont.

I also read that apparently Adobe retained most information regarding Type 0 and not much else is commonly known.

Why does one PDF try to embed Type 1 Times-Roman, without success, while the other PDF apparently gets tagged with Type 0 Times-Roman successfully?

Does anyone have a clue what I am talking about? Any solutions? Am I barking up the wrong tree and need to change direction?

HELP.

Post replies here or email me at rhasney@ets.org

↧

Pdf text extract problem with CID font and Identity-H

May 15, 2011, 12:37 am

Hi all,

I am facing a big problem with text extraction from a PDF file.

Currently I am using Cogniview's PDF2XL text extraction tool.

About 95% of the text extracts correctly, but a few characters show up as a box, some as ?, and some as a dotted circle mark.

Fonts used:

ArialUnicodeMS (Embedded Subset)
Type: TrueType (CID)
Encoding: Identity-H

TimesNewRomanPSMT
Type: TrueType
Encoding: ANSI
ActualFont: TimesNewRomanPSMT
ActualFontType: TrueType

Can anyone please help me overcome this?

Regards

Gilbert.X

↧

Finding errors in a PDF

November 12, 2012, 12:19 am

I am generating some PDFs from scratch. One of the files opens fine in Acrobat, but it clearly still has a structural problem: if I check the file with Preflight in Acrobat it says that the file is damaged and needs repair, and if I close the file without editing anything Acrobat asks if I want to save it. Acrobat does not say what the problem is, and I am unable to find it by inspection. It is quite a basic file and is easily inspectable using a text editor.

I have tried validating the pdf using different validators, but I have found no validators which will report on "structural" problems in the file, so I have tried to inspect the file using a text editor and also a binary-editor to check that offsets of the objects are at the correct place and that the stream lengths are right.

I am unable to find any problems by inspecting it manually so I don't understand what I am missing. And I don't know of any tools which can do this.
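The manual offset check described above can be automated. Below is a minimal Python sketch, assuming the file uses a single classic (uncompressed) xref table and no incremental updates; `check_xref` is a hypothetical helper name, not part of Acrobat or any library. It verifies that `startxref` points at the `xref` keyword and that every in-use entry points at a matching `N G obj` header.

```python
import re

def check_xref(pdf: bytes) -> list:
    """Return a list of problems found in the final classic xref table."""
    problems = []
    # The file should end with: startxref <offset> %%EOF
    m = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", pdf)
    if m is None:
        return ["no 'startxref <offset> %%EOF' at end of file"]
    xref_pos = int(m.group(1))
    if pdf[xref_pos:xref_pos + 4] != b"xref":
        return [f"startxref {xref_pos} does not point at the 'xref' keyword"]

    lines = pdf[xref_pos:].splitlines()
    i = 1  # lines[0] is the 'xref' keyword itself
    # Each subsection starts with "<first object number> <entry count>".
    while i < len(lines) and re.fullmatch(rb"\d+\s+\d+\s*", lines[i]):
        first, count = (int(x) for x in lines[i].split())
        for n, entry in enumerate(lines[i + 1:i + 1 + count]):
            num = first + n
            fields = entry.split()
            if len(fields) != 3 or fields[2] not in (b"n", b"f"):
                problems.append(f"object {num}: malformed entry {entry!r}")
                continue
            if fields[2] == b"n":
                offset, gen = int(fields[0]), int(fields[1])
                # An in-use entry must point at "<num> <gen> obj".
                header = rb"%d\s+%d\s+obj\b" % (num, gen)
                if not re.match(header, pdf[offset:offset + 40]):
                    problems.append(
                        f"object {num}: no '{num} {gen} obj' at offset {offset}")
        i += 1 + count
    return problems
```

Usage would be something like `check_xref(open("test.pdf", "rb").read())`; an empty list means the offsets are consistent. The same idea could be extended to compare each stream's `/Length` with the bytes actually found between `stream` and `endstream`.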

(I have got Acrobat Pro 8, but I am not an expert in using it so I don't know if it can be used to "debug" the file)

Anyway, if anyone can give me any hints on how to solve this, it would be much appreciated.

And, if anyone wants to take a look at my testfile, it can be downloaded from here: https://www.box.com/files/0/f/0/1/f_3946274006

Again, the file is easily readable in a text editor since none of the streams are compressed...

Thanks in advance for any help!!

↧

Signature field rotation

September 27, 2011, 7:45 am

Hello,

I found out that you can rotate signature fields the same way you rotate pages (using the MK dictionary of the widget annotation).

I'd like to know whether an arbitrary rotation (45°, for example) is possible, or whether we have to stick to multiples of 90°.

↧

small error in pdf spec

October 11, 2011, 7:29 am

I found a possible error in the PDF spec, PDF 32000-1:2008:

page 385, Table 165 – Annotation flags

Bit position 8 - Locked (PDF 1.4)

In PDF 1.4 there was no 'Locked' flag at bit position 8.

(There are also lots of spelling mistakes, but I don't want to nit-pick ;-)

↧

How to Manage Text In PDF

May 9, 2012, 6:17 am

Hi everyone.

I am pretty new to the PDF programming world, and to study PDF I am using the PDF specification.

I can manage printing text on my page, but when the text runs past the edge of the page it gets cropped.

So my requirement is how to continue that line of text on the next line.

Please suggest how to manage this problem.

Currently I am tackling this problem with

T*[()] TJ

but I have to do it manually..
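PDF itself does not wrap text, so the producer has to measure each line and break it. Here is a minimal Python sketch of a greedy word wrap that emits `T*` between lines. It assumes a caller-supplied `width_of(ch)` function returning glyph widths in 1/1000 of a text space unit (for example from the font's /Widths array); the function names, font name and coordinates are all hypothetical.

```python
def wrap_text(text, width_of, font_size, max_width):
    """Greedy word wrap. width_of(ch) gives a glyph width in 1/1000 units."""
    def measure(s):
        return sum(width_of(c) for c in s) * font_size / 1000.0

    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            lines.append(current)   # line is full, start a new one
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines

def escape(s):
    """Escape the characters that are special inside a PDF literal string."""
    return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def text_block(lines, font, size, x, y, leading):
    """Build a text object; T* moves to the next line using the TL leading."""
    ops = ["BT", f"/{font} {size} Tf", f"{leading} TL", f"{x} {y} Td"]
    for n, line in enumerate(lines):
        if n:
            ops.append("T*")
        ops.append(f"({escape(line)}) Tj")
    ops.append("ET")
    return "\n".join(ops)
```

For example, `text_block(wrap_text(long_text, width_of, 12, 451), "F1", 12, 72, 720, 14)` would produce a block that fits between 72-point margins on a 595-point-wide page. True justification would additionally need word spacing (`Tw`) computed per line, which this sketch does not attempt.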

thanks in advance.

↧

how to create multiple pages in the output file?

July 14, 2012, 1:19 pm

Hi, we have an application (C++) that generates PDF output. I've been asked to figure out how to create multiple copies of each page (uncollated), i.e. the output would be 4 copies of page 1 followed by 4 copies of page 2, etc. I'd like to do this without actually replicating each page if possible. Any ideas?
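One way to avoid duplicating the heavy data is to emit several small page dictionaries that all reference the same content stream and resources. The sketch below is in Python rather than C++ and only builds the object text; `replicate_pages`, the object numbers and the default A4 media box are assumptions for illustration.

```python
def replicate_pages(pages, copies, first_num, parent_num,
                    media_box="0 0 595 842"):
    """pages: list of (contents_obj_num, resources_obj_num) per source page.

    Returns (page_objects, pages_tree_object). Each copy is its own small
    page dictionary, but /Contents and /Resources point at shared objects.
    """
    objects, kids = [], []
    num = first_num
    for contents, resources in pages:
        for _ in range(copies):
            objects.append(
                f"{num} 0 obj\n"
                f"<< /Type /Page /Parent {parent_num} 0 R "
                f"/MediaBox [{media_box}] "
                f"/Contents {contents} 0 R /Resources {resources} 0 R >>\n"
                f"endobj\n")
            kids.append(f"{num} 0 R")
            num += 1
    pages_obj = (
        f"{parent_num} 0 obj\n"
        f"<< /Type /Pages /Kids [{' '.join(kids)}] /Count {len(kids)} >>\n"
        f"endobj\n")
    return objects, pages_obj
```

With `copies=4` the per-copy overhead is one short dictionary per page, while the content streams, fonts and images are written once. The xref offsets still have to be computed for the extra page objects when the file is assembled.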

↧

Generating PDF from a C program. How do I get dimensions of rendered text?

July 30, 2013, 1:31 am

Hello.

I am generating a PDF file from within a C program I am developing, and I am running into trouble measuring the dimensions of an arbitrary text string when rendered on the page. Even if I had the widths of the individual glyphs in the font I am using, I would still run into trouble with ligatures. My questions are the following:

When I generate a PDF file, I have to provide the /Widths array dictionary entry for the font I use. How do I get these values? Is it possible to get them out of the font file (OTF/Type-1) easily?
If I do have the widths of individual glyphs and the ascent/descent values for the whole font, I still need ascent/descent for individual glyphs so that I can typeset things perfectly. How about these? Can they be read from the formats mentioned above?

What confuses me is this. A PDF writer (Microsoft Word, say) obviously knows the font it is using to write text. When it saves the document as PDF, does it really inspect the font to dig out the glyph sizes?

I do not use any libraries whatsoever, I am generating the whole file and its PostScript streams "by hand".

Thanks!

David

↧

in browser- URL of PDF where it imports FDF data

January 24, 2014, 9:08 am

I want to bring up my browser - put the URL of a PDF on a server that has fillable fields - and have the PDF automatically bring in the contents of an FDF file to populate the fields defined in the FDF file.

There is Adobe documentation for doing something like this:
http://www.mydomain.com/abc.pdf#fdf=datafile.fdf

That's all I want to do. Simple.

In detail: in a PHP program I want to create the FDF file using just fopen and fwrite, with no additional packages. My JavaScript then opens an existing PDF file on the web, and the PDF, when brought up from the server, should automatically import the FDF my program created a few seconds earlier.

I do know about fdf_create in PHP but unfortunately the server I'm on is cpanel based and the hosting company cannot load a library for fdf_create related calls.
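Since an FDF file is plain text, it can be written without any FDF library. Here is a minimal sketch in Python (the same string building carries over to PHP's fopen/fwrite); `build_fdf` and `pdf_string` are hypothetical helper names, and the field names are made up. It assumes ASCII field names and values.

```python
def pdf_string(s):
    """Wrap text as a PDF literal string, escaping \\ ( and )."""
    return "(" + s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"

def build_fdf(fields, pdf_url=None):
    """Return FDF text that sets each field name in `fields` to its value.

    If pdf_url is given, it is written as the /F entry, which names the
    PDF form that the data belongs to.
    """
    lines = ["%FDF-1.2", "1 0 obj", "<< /FDF <<"]
    if pdf_url:
        lines.append(f"/F {pdf_string(pdf_url)}")
    lines.append("/Fields [")
    for name, value in fields.items():
        lines.append(f"<< /T {pdf_string(name)} /V {pdf_string(value)} >>")
    lines += ["]", ">> >>", "endobj", "trailer", "<< /Root 1 0 R >>", "%%EOF"]
    return "\n".join(lines) + "\n"
```

For example, `build_fdf({"first_name": "Jay"}, "http://www.mydomain.com/abc.pdf")` gives a file that can be saved as `datafile.fdf`. Whether a given browser plug-in honours the `#fdf=` fragment is a separate question that this does not settle; it only rules out a malformed FDF file as the cause.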

I cannot seem to get this working. I've called Adobe tech support. A few months ago I reached someone who actually said "You've reached the right person. Yes that is doable." and they confirmed what I wanted to do. But since then the Adobe people I call don't seem to understand what an FDF file is even though that is what Adobe uses to communicate and has it in their documentation as http://......abc.pdf#fdf=xxx.fdf

Now I've even eliminated the step/chance that I've not created my FDF file properly. I've gone into Acrobat Pro and done an extract to an .FDF file. Then I've tried http://www.myweb.com/abc.pdf#fdf=theAcrobatExtractedFile.fdf and it still ignores the contents of the FDF file.

I've also tried http://www.myweb.com/abc.pdf#fdf=http://www.myweb.com/extractedfile.fdf fully qualifying the location of the FDF file.

Help! This is important. Thank you.

Jay

↧

Form field info not showing up, unless field is clicked? (again)

March 3, 2014, 2:39 pm

We're using a PDF library called Syncfusion to automate some fillable fields in a PDF, but after the text has been placed it is not visible because of the form field highlighting.

So far we have tried the 3 things below but nothing has solved this issue.

changing the fore/background color of the fillable fields to selected colors
disabling multiline fields
applying "mac fix" scripts

Anyone have any other ideas that might fix this? We're going to try a different pdf library next.

thanks,

Bob

↧

XMP updated with exiftool is not read

May 19, 2014, 6:05 am

If this question is better suited in another forum, please let me know.

The following example PDF has been used in a workflow where the XMP metadata has been updated with new values (e.g. dc:identifier has the value Metropoles).

When I open this file and look at the metadata in, for example, Acrobat Pro, the updated information is not shown; Acrobat (pdfLib) reads the old XMP stream instead.

The update looks correct as far as I can see: object 54 has been updated, and the new xref table at the end has the correct start byte for the updated stream (byte 0022017676). So I cannot understand why the new XMP stream is not shown in Acrobat.

Does anyone see any error in the PDF, or is there any obvious reason why Acrobat wouldn't read the new XMP stream?

(Other software using exiftool extracts the new XMP correctly. I have also tested with a tool written in pure Java, and that also shows the updated XMP as the current metadata.)

Link to test PDF (appr 22 MB)

Dropbox - Test-20140518.pdf

↧

How to display cyrillic characters in a PDF

July 10, 2014, 1:52 pm

I am fairly green in terms of representing text in PDF documents and need some assistance. My main question is how do I represent Cyrillic characters in PDF files.

I know the basics of how to represent text in PDF files and the PostScript commands to use. I know that bytes written to the file in the range of 0 to 255 will print correctly when using the correct encoding (we are using the WinAnsiEncoding). What I cannot seem to figure out is how to represent extended character sets and different glyphs (such as those used in the Cyrillic alphabet) in a PDF file. Do I need to use CID fonts and CMaps?

Here is an example of the text I understand how to print:

stream
0.00000000 0.00000000 0.00000000 RG
0.00000000 0.00000000 0.00000000 rg
/Helvetica 14 Tf
7.2 768.96 Td
(Hello World!) Tj
endstream

I'm really not clear on how to represent any of the Chinese or Japanese fonts either, so really any help here is appreciated. Any examples are appreciated as well.
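As a rough illustration of where the single-byte approach stops working, here is a small Python sketch. It uses Python's `cp1252` codec as a stand-in for WinAnsiEncoding (an approximation, the two differ on a few codes) to test whether a string fits a simple font, and it shows how a string would be written as 2-byte codes for a composite (Type 0) font with Identity-H. The `glyph_ids` mapping is hypothetical: real values have to come from the embedded font's character-to-glyph table.

```python
def fits_winansi(text):
    """True if every character has a code in (approximately) WinAnsiEncoding."""
    try:
        text.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False

def identity_h_hex(text, glyph_ids):
    """Hex string operand for Tj when the font uses the Identity-H encoding.

    With Identity-H each character is written as a 2-byte code. glyph_ids
    maps each character to that code (hypothetical values in this sketch).
    """
    return "<" + "".join(f"{glyph_ids[ch]:04X}" for ch in text) + ">"
```

So `fits_winansi("Hello World!")` is true and the string can go in a `(…) Tj` as in the example above, while Cyrillic, Chinese or Japanese text fails the check and would need a Type 0 font with a CIDFont descendant, an embedded font program and, for text extraction, a ToUnicode CMap.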

Thanks!

↧

Is a zero length glyf table valid for a TTF subset?

October 23, 2014, 11:02 pm

I've posted related queries in a few forums (there is some background info at https://forums.adobe.com/thread/1611899) and it was suggested I ask the experts here.

Basically in some situations InDesign is generating a subset font with a zero length glyf table.

Does this 'exists but zero length' table fulfil the criteria from the PDF specification?

The following TrueType tables are always required: “head,” “hhea,” “loca,” “maxp,” “cvt_,” “prep,” “glyf,” “hmtx,” and “fpgm.”

A number of programs are not happy with such files, but Adobe products seem to handle them. However Acrobat in particular is fairly forgiving as we all know.

Microsoft's font validator says the subset is not valid. Programs like Ghostscript attempt to fix it by adding an empty table.
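To see which of the listed tables are present and which are empty in an extracted subset, the sfnt table directory can be read directly. This is a minimal Python sketch assuming a plain TrueType (sfnt) font program, not a TrueType collection; `sfnt_tables` and `check_required` are hypothetical helper names. Note that the tag printed as “cvt_” in the quoted text is `cvt` followed by a space in the file.

```python
import struct

# Tables named in the quoted requirement ("cvt " ends with a space).
REQUIRED = ["head", "hhea", "loca", "maxp", "cvt ", "prep", "glyf", "hmtx", "fpgm"]

def sfnt_tables(font: bytes) -> dict:
    """Map table tag -> (offset, length) from the sfnt table directory."""
    # Offset table: 4-byte version, then a 16-bit table count (big-endian).
    _version, num_tables = struct.unpack(">4sH", font[:6])
    tables = {}
    for i in range(num_tables):
        start = 12 + 16 * i  # records follow the 12-byte offset table
        tag, _checksum, offset, length = struct.unpack(
            ">4sLLL", font[start:start + 16])
        tables[tag.decode("latin-1")] = (offset, length)
    return tables

def check_required(font: bytes):
    """Return (missing_tags, zero_length_tags) among the required tables."""
    tables = sfnt_tables(font)
    missing = [t for t in REQUIRED if t not in tables]
    empty = [t for t in REQUIRED if t in tables and tables[t][1] == 0]
    return missing, empty
```

Running `check_required` on the font stream extracted from the PDF would separate the two cases, a table that is absent versus one that is present with length zero. It does not answer whether a zero-length `glyf` satisfies the wording of the specification.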

↧

How to disable Advanced tab on PDF Properties dialog

May 28, 2015, 9:08 am

I created a simple form on Adobe's FormsCentral and noticed that the Advanced tab on the PDF Properties dialog is disabled (the user cannot change things like Page Scaling, etc.). While the viewer preferences dictionary is described in the PDF spec, I have not found anything in the spec for disabling the Advanced tab on the PDF Properties dialog to prevent users from changing these values. But since the form I created on Adobe's FormsCentral web site does this, I am assuming there may be something in the PDF spec for it. Does anyone have an idea what this might be?

↧

What is a "Glyph Name"?

August 20, 2015, 5:21 am

What is the meaning of a "Glyph Name"? Is the "Glyph Name" a product of the font type? For example, I see a font type of F3 in an Adobe PDF file; can I determine the "Glyph Name" from that? How does it work?

Thanks

↧

Font Size in a PDF File?

August 20, 2015, 7:51 pm

I am trying to determine the font size of the current font used in a PDF stream, using the example below. It has been established in another post that F1 refers to a font in the Resources dictionary, so can anyone tell me what the number 1 sitting beside F1 means? The example below says it is used to set the font size, but as previously argued, that is not actually the case. I'm using a C++ open source library which, when I extract the font size, tells me that the font size is 12, not 1. Why is this? The number 1 doesn't seem to refer to anything in any dictionary, but I thought there might be some PDF standard I'm overlooking which indicates that 1 unit of the F3 font (Times-Roman) equates to a size 12 font (specifically for 1 unit of Times-Roman, a proxy for font size).

Is that assumption correct? I'm struggling to find an explanation of it in ISO 32000-1. Maybe someone can point me in the right direction?

stream
BT % Begin text object
/F1 1 Tf % Set text font and size
64 0 0 64 7.1771 2.4414 Tm % Set text matrix
0 Tc % Set character spacing
0 Tw % Set word spacing
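The size operand of `Tf` is only one factor: glyphs are also scaled by the text matrix (`Tm`) and by the current transformation matrix (CTM). Here is a minimal Python sketch of that arithmetic, assuming no text rise and taking the vertical scale of the combined matrix as the "effective" size; `effective_font_size` is a hypothetical helper, and the CTM value in the usage note is invented purely to show how a reported size of 12 could arise.

```python
import math

def multiply(m, n):
    """Multiply two PDF matrices given as [a b c d e f] (m applied first)."""
    a, b, c, d, e, f = m
    A, B, C, D, E, F = n
    return [a * A + b * C, a * B + b * D,
            c * A + d * C, c * B + d * D,
            e * A + f * C + E, e * B + f * D + F]

def effective_font_size(tf_size, tm, ctm=(1, 0, 0, 1, 0, 0)):
    """Tf size scaled by the vertical scale of Tm x CTM."""
    _, _, c, d, _, _ = multiply(tm, ctm)
    return tf_size * math.hypot(c, d)
```

For the stream above, `effective_font_size(1, [64, 0, 0, 64, 7.1771, 2.4414])` gives 64.0 with an identity CTM. If an earlier `cm` operator had set a CTM such as `0.1875 0 0 0.1875 0 0` (a made-up value), the result would be 12.0, so it may be worth checking for `cm` operators before the `BT`.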

↧

Understanding User Space scale relative to device space scale?

March 27, 2016, 9:03 pm

I can successfully parse data from a PDF file and position most of that data using operators like Tw, TD, etc.

The problem I'm now having is determining the link between the device space coordinates and my user space. Because I am parsing the PDF data and transferring what I extract to a less sophisticated medium such as a text file, I'm trying to determine how far the x and y displacements actually are in that text file. This way I can hopefully position text in the text file similarly to its placement in the PDF document's device space.

I don't have much of an idea how to make that transition and was wondering if someone could provide a hint as to how to go about this.

For example, using the formula in Section 9.4.4 of PDF 32000-1:2008, I get an X displacement (this is before I subtract it from the current position W0) of 165.097 or something like that. My understanding is that this is in 1/1000th of an inch (device space scale).

So is that transferred to a text file by using a standard user space unit of 1/72 of an inch? I'm not sure.

Another part of this problem is how do I determine how far ASCII 32 (Space) is in terms of device space?
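For reference, the Section 9.4.4 displacement is tx = ((w0 − Tj/1000) × Tfs + Tc + Tw) × Th, where w0 is the glyph width divided by 1000. Glyph widths are expressed in thousandths of a text space unit, not thousandths of an inch; only after applying the text matrix and CTM does the value land in user space, whose default unit is 1/72 inch. The Python sketch below (the post's code is C++) works through that arithmetic. The function names, the 6-point character cell and the width 250 for the space glyph are assumptions for illustration.

```python
def glyph_advance(width_1000, font_size, char_spacing=0.0, word_spacing=0.0,
                  horiz_scale=1.0, tj_adjust=0.0, is_space=False):
    """Horizontal displacement in text space after showing one glyph.

    width_1000 is the glyph's /Widths value. Word spacing (Tw) is applied
    only to the single-byte space character, code 32.
    """
    w0 = width_1000 / 1000.0
    tw = word_spacing if is_space else 0.0
    return ((w0 - tj_adjust / 1000.0) * font_size + char_spacing + tw) * horiz_scale

def points_to_inches(points):
    """Default user space: 72 units per inch."""
    return points / 72.0

def to_column(x_user, cell_width_pt=6.0):
    """Map a user-space x position to a text-file column.

    cell_width_pt is an assumed width of one monospace character cell.
    """
    return int(round(x_user / cell_width_pt))
```

With a space glyph of width 250 at font size 12, `glyph_advance(250, 12, is_space=True)` is 3.0 text space units; under an identity text matrix and CTM that is 3 points, or 1/24 inch. A run that starts at x = 72 would then map to column 12 with the assumed 6-point cell.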

I'm sure all you experts have been there. I'm completely lost as to how to make this link.

Would somebody be able to help? Thanks

Windows 7, C++, TDM-GCC

↧

How to handle inverted letters (Devanagari)?

September 6, 2016, 8:41 am

In Devanagari, the letter U+093F is displayed before the consonant, although in the Unicode string it appears after it.

This requires swapping the two letters in the string that is written to the PDF, which causes a problem when text is extracted from the PDF: the ToUnicode map is no longer correct, and extraction returns the two letters in swapped order.

For example:

incorrect: 093F 0935

correct: 0935 093F

Is there a solution for this?

Even LibreOffice PDF export does not handle this correctly.
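One mechanism the PDF specification provides for this kind of mismatch is the /ActualText entry on a marked-content sequence: the glyphs are shown in visual order, while /ActualText carries the logical Unicode order as a UTF-16BE text string. Here is a minimal Python sketch that builds such a sequence; `actual_text_span` is a hypothetical helper, the glyph codes in the usage note are placeholders, and whether a given extractor honours /ActualText is not guaranteed.

```python
def actual_text_span(logical_text, shown_hex_codes):
    """Wrap a shown glyph string in a /Span carrying the logical text.

    logical_text:    the characters in Unicode (logical) order
    shown_hex_codes: hex digits of the glyph codes in visual order
    """
    # PDF text strings in UTF-16BE start with the byte order mark FEFF.
    utf16 = logical_text.encode("utf-16-be").hex().upper()
    return (f"/Span << /ActualText <FEFF{utf16}> >> BDC\n"
            f"<{shown_hex_codes}> Tj\n"
            f"EMC")
```

For the example above, `actual_text_span("\u0935\u093F", "00210020")` yields a `/Span` whose /ActualText is `<FEFF0935093F>`, the correct order, regardless of the order in which the two glyphs are painted.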

↧

XObject - Alternates files for printing

November 19, 2007, 2:51 am

Hi all,

This is a second question, using the same example as my previous question, which is now solved. As a reminder, here is the example, which works well on Windows XP Professional SP2 and Mac OS X 10.3.9.

%PDF-1.7
1 0 obj [/PDF] endobj
4 0 obj << /Length 36>>
stream
q 100 0 0 100 100 400 cm /Im3 Do Q
endstream endobj
5 0 obj <</Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] /Contents [4 0 R ] /Resources <</ProcSet 1 0 R /Font <<>> /XObject <</Im3 3 0 R >> >> >> endobj
3 0 obj <</Type /XObject /Subtype /Image /Width 39 /Height 41 /ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /DCTDecode /Length 1087>>
stream
% binary data
endstream endobj
2 0 obj <</Type /Pages /Kids [5 0 R ] /Count 1>> endobj
6 0 obj <</Type /Catalog /Pages 2 0 R >> endobj
xref
0 7
0000000000 65535 f
0000000010 00000 n
0000001536 00000 n
0000000278 00000 n
0000000033 00000 n
0000000120 00000 n
0000001593 00000 n
trailer
<</Size 7 /Root 6 0 R >> startxref 1642
%%EOF

Now I add an alternate image for printing, as defined in the PDF 1.7 reference, page 348 (example). So I add two objects, and now I get error 109.

Here is the second file.

%PDF-1.7
1 0 obj [/PDF] endobj
4 0 obj [<</Image 5 0 R /DefaultForPrinting true >> ] endobj
6 0 obj << /Length 36>>
stream
q 100 0 0 100 100 400 cm /Im3 Do Q
endstream endobj
7 0 obj <</Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] /Contents [6 0 R ] /Resources <</ProcSet 1 0 R /Font <<>> /XObject <</Im3 3 0 R >> >> >> endobj
3 0 obj <</Type /XObject /Subtype /Image /Width 39 /Height 41 /ColorSpace /DeviceRGB /BitsPerComponent 8 /Alternates 4 0 R /Filter /DCTDecode /Length 1087>>
stream

% binary data for the base image

endstream endobj
5 0 obj <</Type /XObject /Subtype /Image /Width 50 /Height 64 /ColorSpace /DeviceRGB /BitsPerComponent 8 /Filter /DCTDecode /Length 1721>>
stream

% binary data for the alternate image

endstream endobj

2 0 obj <</Type /Pages /Kids [7 0 R ] /Count 1>> endobj
8 0 obj <</Type /Catalog /Pages 2 0 R >> endobj
xref
0 9
0000000000 65535 f
0000000010 00000 n
0000003508 00000 n
0000000340 00000 n
0000000033 00000 n
0000000095 00000 n
0000000095 00000 n
0000000182 00000 n
0000003565 00000 n
trailer
<</Size 9 /Root 8 0 R >> startxref 3614
%%EOF
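One thing that stands out in the second listing is that xref entries 5 and 6 both carry the offset 0000000095. Two objects cannot start at the same byte, so at least one of those entries looks stale; whether that is what triggers error 109 is only a guess. A small Python sketch that flags such duplicates, with the table copied from the listing above (`duplicate_offsets` is a hypothetical helper and handles a single xref subsection only):

```python
from collections import defaultdict

POSTED_XREF = """xref
0 9
0000000000 65535 f
0000000010 00000 n
0000003508 00000 n
0000000340 00000 n
0000000033 00000 n
0000000095 00000 n
0000000095 00000 n
0000000182 00000 n
0000003565 00000 n
"""

def duplicate_offsets(xref_text):
    """Return {offset: [object numbers]} for in-use entries sharing an offset."""
    lines = xref_text.strip().splitlines()
    first, count = (int(x) for x in lines[1].split())  # subsection header
    seen = defaultdict(list)
    for n, entry in enumerate(lines[2:2 + count]):
        offset, _gen, kind = entry.split()
        if kind == "n":
            seen[int(offset)].append(first + n)
    return {off: nums for off, nums in seen.items() if len(nums) > 1}

print(duplicate_offsets(POSTED_XREF))  # → {95: [5, 6]}
```

In the listing, object 6 (the content stream) is the one that follows object 4 near the start of the file, while object 5 (the alternate image) sits after object 3, so entry 5 is the one that would need a different offset.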

Thanks for your help.

François

↧

Programmatically search for a word in PDF file

February 25, 2009, 6:29 am

In the program I am developing, I open a PDF file from the application. Is there any way to programmatically search for a particular word in the PDF file and move to the page containing the first occurrence of that word? I am using VC++ to develop the application.

Any guidance is appreciated and thanks in advance.

↧