can I generate a swf from PDF?

July 15, 2009, 11:35 am

Hello everybody, how can I generate a SWF from a PDF (text and images)? For example, I have these files:
mytext.txt and some pictures, image1.jpg and image2.jpg.
These files were parsed from a PDF file, and I need to insert them into a personalized SWF (or FLA). Yes, I need a free solution, like iPaper "http://www.scribd.com/ipaper" or FlippingBook "http://page-flip.com/". Any idea how these projects work? iPaper can read text and select it, so it's not an image! Maybe they have OCR, but in AS3? In general, both projects have amazing features.
I know this is complicated, but I really need a free solution and not an "unpersonalized" project that we can't change.
Now, coming back to the first question, one possibility is a SWF compiler (I could be wrong). How can I drive Adobe's SWF compiler (by CLI or code)? Yes, I need advanced topics, but if you know something, please help.

Any ideas, please? Thanks to all.

Fred

↧

Opt array without unicode values and NeedAppearance buggy in Acrobat/Reader?

May 7, 2015, 7:13 am

I just encountered a very strange behaviour of Acrobat (XI) / Reader (DC)... I guess that other versions are affected, too.

We could narrow it down to a radio button group of 2 buttons while their export values are placed in the Opt array:

7 0 obj
<</Opt[(Yesas)(þÿ ´ Å ¿)] ... /V/0>>

...

39 0 obj

<<.../Fields[ 7 0 R]/NeedAppearances true>>

(Opt-with-unicodes-values-and-NeedAppearances.pdf)

The Opt array was created by Acrobat because we used some Unicode export values. Additionally, the NeedAppearances flag is set to true. The document works as expected: Acrobat recreates the appearance when opening the document and asks for a re-save when we close it.
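
As a side note, the odd-looking second Opt entry above is what a UTF-16BE text string looks like when its bytes are displayed as Latin-1: the leading þÿ is the byte order mark FE FF. A minimal Python sketch of how a reader might tell the two kinds of text string apart follows. The function name is made up for illustration, and Latin-1 is only an approximation of PDFDocEncoding.

```python
def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string (hypothetical helper, not from any library)."""
    if raw.startswith(b"\xfe\xff"):
        # Byte order mark present: the rest is UTF-16BE.
        return raw[2:].decode("utf-16-be")
    # Assumption: Latin-1 stands in for PDFDocEncoding, which differs
    # from it in a number of code positions.
    return raw.decode("latin-1")


print(decode_pdf_text_string(b"Yes"))
print(decode_pdf_text_string(b"\xfe\xff\x00J\x00a"))
```

Both forms are legal export values, so the sketch only illustrates the encoding difference between the two files; it does not explain the Acrobat behaviour.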

Today we received a document with a similar structure, but its Opt array doesn't include Unicode values, only plain PDFDocEncoding values (ja/nein). In this situation NO radio button is checked in Acrobat/Reader:

7 0 obj
<</Opt[(Yes)(No)] ... /V/0>>

...

39 0 obj

<<.../Fields[ 7 0 R]/NeedAppearances true>>

(Opt-without-unicodes-values-and-NeedAppearances.pdf)

If we leave out NeedAppearances or set it to false, the checkbox is checked.

Can anybody confirm this and/or see a problem with the form structure at all? To me there is nothing unusual about it, and I guess it is a problem in Acrobat / Reader. Foxit, for example, works without problems...

Thanks!

Jan

↧

Signature field rotation

September 27, 2011, 7:45 am

Hello,

I found out that you can rotate signature fields the same way you rotate pages (using the MK dictionary of the widget annotation).

I'd like to know whether an arbitrary rotation (45°, for example) is possible, or whether we have to stick to multiples of 90°.

↧

Extract embedded xml from PDF/A-3b (also creation)

August 19, 2013, 4:19 am

Hello there,

In the context of a research project, we are currently trying to extract embedded XML from a PDF/A-3b document via code.

The project deals with establishing a new invoicing standard (Zugferd: ferd-net.de, German only). Invoices are expressed as XML, which is embedded in a PDF/A file.

What we are trying to achieve is extraction of the XML via Java code. For testing purposes, we are currently using a third-party SDK to extract the invoice XML by calling an .EXE file and then picking up the results in Java.

I currently have only one valid example file that can be processed via this SDK. To get more data, I used the trial version of Acrobat Pro to alter the embedded XML file. To be more specific, I deleted the embedded file, added a new XML file, and used Preflight to make the PDF conform to PDF/A-3b. Although the file seems to have the same properties as the original, it can no longer be processed via the extraction SDK. Since messing around with Acrobat does not seem to get me anywhere, I am now looking into extracting the data from the PDF myself.

Is there an existing implementation, library, or solution for extracting the data in a Java context? The few third-party tools I found are all based on a .NET/Windows-native environment. I have heard rumors about Adobe providing tools to extract embedded data from PDF/A?

What about the other way around? Is it possible to embed XML into a PDF via Java, given that there already is a PDF file we can attach it to?

Thanks for reading; I really appreciate any help or input!

Greetings,

Florian

↧

Online pdf's open in wrong language

January 26, 2015, 5:54 pm

I have tried to open several .pdf files online at a state government website with my main PC at work, and all of them open in a foreign language instead of English, except for one .pdf file, which opens correctly in English. To troubleshoot, I tried opening the same .pdf files with my laptop at work, and they all open correctly in English. I'm not sure whether the default programs on my main PC are set correctly, although the file extensions appear to be associated with the correct programs, Adobe Reader and/or Adobe Acrobat Reader. I have Adobe Acrobat XI Pro.

Could someone assist me with this matter? The state government website staff say that they can open the files fine in English. I suspect there is a problem with my default program settings for file extensions, or with my version of Adobe Reader and/or Adobe Acrobat XI Pro, even though I haven't changed anything.

Thank you,

Lisa

↧

CID to Unicode Value

May 27, 2007, 5:55 pm

Hi,
I've got a new question: how can I get the corresponding Unicode value if I only have the CID value?
Thanks.

↧

Get UTF-16 characters into PDF Document

May 17, 2008, 8:31 pm

I am trying to figure out how to get Unicode characters that don't begin with \u00 to render in a PDF.

The following PDF works fine, because the Unicode character I'm trying to render starts with \u00:

%PDF-1.4
1 0 obj
<< /Type /Catalog
/Outlines 2 0 R
/Pages 3 0 R
>>
endobj
2 0 obj
<< /Type /Outlines
/Count 0
>>
endobj

3 0 obj
<< /Type /Pages
/Kids [ 4 0 R ]
/Count 1
>>
endobj
4 0 obj
<< /Type /Page
/Parent 3 0 R
/MediaBox [ 0 0 612 792 ]
/Contents 5 0 R
/Resources << /ProcSet 6 0 R
/Font << /F1 7 0 R >>
>>
>>
endobj
5 0 obj
<< /Length 73 >>
stream
BT
36 264 Td
0 0 Td
/F1 12 Tf
(ÿ)Tj %\u00FF
0 0 Td
ET
endstream
endobj
6 0 obj
[ /PDF /Text ]
endobj
7 0 obj
<< /Type /Font
/Subtype /Type1
/Name /F1
/BaseFont /Helvetica
/Encoding /WinAnsiEncoding
>>
endobj

xref
0 8
0000000000 65535 f
0000000009 00000 n
0000000074 00000 n
0000000120 00000 n
0000000179 00000 n
0000000364 00000 n
0000000466 00000 n
0000000496 00000 n
trailer
<< /Size 8
/Root 1 0 R
>>
startxref
625
%%EOF

The following PDF DOES NOT work: it gives me a question mark, because the Unicode character I'm trying to render DOES NOT START WITH \u00:

%PDF-1.4
1 0 obj
<< /Type /Catalog
/Outlines 2 0 R
/Pages 3 0 R
>>
endobj
2 0 obj
<< /Type /Outlines
/Count 0
>>
endobj

3 0 obj
<< /Type /Pages
/Kids [ 4 0 R ]
/Count 1
>>
endobj
4 0 obj
<< /Type /Page
/Parent 3 0 R
/MediaBox [ 0 0 612 792 ]
/Contents 5 0 R
/Resources << /ProcSet 6 0 R
/Font << /F1 7 0 R >>
>>
>>
endobj
5 0 obj
<< /Length 73 >>
stream
BT
36 264 Td
0 0 Td
/F1 12 Tf
(Ā)Tj %\u0100
0 0 Td
ET
endstream
endobj
6 0 obj
[ /PDF /Text ]
endobj
7 0 obj
<< /Type /Font
/Subtype /Type1
/Name /F1
/BaseFont /Helvetica
/Encoding /WinAnsiEncoding
>>
endobj

xref
0 8
0000000000 65535 f
0000000009 00000 n
0000000074 00000 n
0000000120 00000 n
0000000179 00000 n
0000000364 00000 n
0000000466 00000 n
0000000496 00000 n
trailer
<< /Size 8
/Root 1 0 R
>>
startxref
625
%%EOF
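
A likely explanation, offered as an assumption rather than a diagnosis: both files use a simple font with /WinAnsiEncoding, where every character code in a string is a single byte, so U+0100 has no code at all. The Python sketch below uses the cp1252 codec as a stand-in for WinAnsiEncoding (the two are close but not identical) to check whether a character can be written as one byte. `fits_winansi` is a hypothetical helper.

```python
def fits_winansi(ch: str) -> bool:
    """Return True if ch maps to a single byte in cp1252.

    Assumption: cp1252 approximates WinAnsiEncoding closely enough
    for this illustration.
    """
    try:
        return len(ch.encode("cp1252")) == 1
    except UnicodeEncodeError:
        return False


for ch in ("\u00ff", "\u0100"):
    print(f"U+{ord(ch):04X}", fits_winansi(ch))
```

Characters outside the single-byte range generally need a composite (Type 0) font with a multi-byte encoding such as Identity-H and an embedded font program that actually contains the glyph.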

Thanks!

↧

Preflight errors for PDF/A-1a

December 4, 2008, 4:46 pm

We are encountering problems with Preflight when testing documents that use CID fonts with our conversion product. We get the following error messages:

CIDSystemInfo and CMap dict not compatible

Invalid WMode

Having examined the output carefully, there does not appear to be a problem in our output. The CIDSystemInfo for both the CIDFont object and the CMap object are exactly the same. Additionally, we are not writing a WMode into our fonts, since it is an optional entry, and we do not handle vertical writing mode anyway. Adding a WMode parameter in the appropriate places does not fix the error either.

What I suspect is happening is that Preflight is not happy with the Registry and Ordering that we are using in the CIDSystemInfo. We set the Registry to "CompanyName" and the Ordering to "Custom1". I have seen other tools complain when faced with unexpected values for the registry and Ordering, and have a feeling that is what is going on here, but our customers aren't going to be happy hearing that.

Here is some background on how our documents are created. We create our text stream using the character code from the original document. We then add each unique character code to a set of character codes. When document processing is complete, we subset the font to only the used characters. We can then create a CMap that maps the input character code to the glyph index in the subsetted font. The glyph index becomes the CID in our encoding. The CIDToGIDMap is "Identity". We also then create a ToUnicode map with the original character code mapped to Unicode.

Documents created by our product view just fine, and text is able to be extracted correctly. Our problem lies in the error messages reported by Preflight.

Any help with diagnosing this error would be greatly appreciated. I would be willing to send a sample document to anyone willing to help.

Thanks,
Michael Ryan

↧

How to open a pdf form with fdf data

July 20, 2009, 2:38 pm

Hi all,

I am working on a new project in which I have to load a PDF contract form with FDF data in an Internet Explorer window.

I don't know how to do it. I tried using this format for the URL (while loading the respective page):

http://www.example.org/pdf_file_name.pdf#FDF=http://www.example.org/fdf_file_name.fdf

But it opened as an empty PDF document. I need it filled in with the FDF data.

Does anyone know another way to do this?

Or is it not possible to open a PDF form with FDF data in a browser?

Thanks in advance

Annamalai

↧

PDF 1.4 - order of form fields (tab order, signing order)

November 25, 2009, 9:44 am

I have a question related to the order of form fields:

How can I influence the order of fields?

I use signature fields and want to explicitly set which signature is signed first, second etc.

The tab order would also be interesting. (Since PDF 1.5 there is a 'Tabs' key for that, but how is it done in PDF 1.4?)

Any help appreciated,

Regards,

ToM

↧

BaseFont and FontName must be equals?

May 6, 2010, 7:01 am

I have trouble with a PDF/A document and some readers we are using:

- Acrobat (Reader) 9.X works fine

- Acrobat (Reader) 6.X and our own inhouse reader have problems

I tracked it down to the following part:

10 0 obj
<</LastChar 255/BaseFont/TimesNewRomanPSMT/Type/Font/Encoding/WinAnsiEncoding/Subtype/TrueType/FirstChar 0/FontDescriptor 11 0 R/Widths[...<removed the numbers in here>...]>>
endobj

11 0 obj

<</CapHeight 666/FontBBox[-1164 -628 4096 2062]/Type/FontDescriptor/Descent -442/FontFile2 33 0 R/StemV 87/Flags 32/Ascent 1420/FontName/Times#20New#20Roman/ItalicAngle 0>>
endobj

The problem is the /FontName of the FontDescriptor contains "Times#20New#20Roman".

In the PDF spec I read:

BaseFont: "The PostScript name of the font. For Type 1 fonts, this is always the value of the FontName entry in the font program"

FontName: "The PostScript name of the font. This name shall be the same as the value of BaseFont in the font or CIDFont dictionary that refers to this font descriptor."

=> Thus the BaseFont and the FontName must have the same value?

=> Is that "#20" (= a blank?) even allowed inside the FontName?
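
On the second question: since PDF 1.2, a `#` followed by two hexadecimal digits inside a name is an escape for a single byte, so `#20` does stand for a space. A small Python sketch of the decoding step, useful for comparing the two names after unescaping, is shown here. `decode_pdf_name` is a hypothetical helper, and it assumes the name has already been stripped of its leading slash.

```python
import re


def decode_pdf_name(name: str) -> str:
    """Replace each #xx escape in a PDF name with the character it encodes."""
    return re.sub(
        r"#([0-9A-Fa-f]{2})",
        lambda m: chr(int(m.group(1), 16)),
        name,
    )


font_name = decode_pdf_name("Times#20New#20Roman")
base_font = decode_pdf_name("TimesNewRomanPSMT")
print(repr(font_name), repr(base_font), font_name == base_font)
```

Even after unescaping, the two names in the sample differ, which is consistent with the mismatch described above.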

Thanks for the help,

ToM

↧

Decrypt a PDF - Encryption Key Algorithm

August 22, 2010, 7:08 am

Hi to All,

I'm Italian, so I want to apologize if my English isn't perfect.

I'm a young developer and I have a problem with decrypting a PDF. I'll try to explain it by telling you what I'm trying to do.

Using these parameters:

<< /Filter /Standard
/V 1/R 2/Length 40/P -44
/O <2055C756C72E1AD702608E8196ACAD447AD32D17CFF583235F6DD15FED7DAB67>
/U <7C1EB4017D43EA47D4590D3A1EC87C61A95F3AB02DEB3E823668F7BFCA1FB313>
>>
Padding String: < 28 BF 4E 5E 4E 75 8A 41 64 00 4E 56 FF FA 01 08 2E 2E 00 B6 D0 68 3E 80 2F 0C A9 FE 64 53 69 7A >
<<
/Size 18
/Root 2 0 R
/Info 4 0 R
/Encrypt 5 0 R
/ID[<777E149019263FC69355C55EFBCE3F18><5138934AF72E21B909B0BC3DFB527745>]
>>

This is what I do in order to get the encryption key:

Create a HEX string by appending (in order)
- Padding string: 28BF4E5E4E758A4164004E56FFFA01082E2E00B6D0683E802F0CA9FE6453697A
- Owner Pwd: 2055C756C72E1AD702608E8196ACAD447AD32D17CFF583235F6DD15FED7DAB67
- P entry: 4DFFFFFF (I'm not sure that the conversion is Ok)
- First element of ID array: 777E149019263FC69355C55EFBCE3F18
Transform all of it to an ASCII string
MD5 this string
The first 5 bytes, so the first 5 characters, are the Encryption key

Is this correct?
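
For comparison, here is a minimal Python sketch of the key computation as the PDF Reference describes it for revision 2 of the standard security handler (Algorithm 3.2). It assumes an empty user password, so the padded password is the padding string itself; the function name is made up for illustration. Two details differ from the steps listed above: the hash is taken over the raw bytes rather than over a hex or ASCII rendering of them, and /P is written as a 4-byte little-endian value, which for -44 gives D4 FF FF FF.

```python
import hashlib
import struct

PADDING = bytes.fromhex(
    "28BF4E5E4E758A4164004E56FFFA0108"
    "2E2E00B6D0683E802F0CA9FE6453697A"
)


def compute_key_r2(user_password: bytes, o_entry: bytes, p: int, first_id: bytes) -> bytes:
    """Sketch of the revision 2 key derivation (40-bit key)."""
    # Truncate or pad the password to exactly 32 bytes.
    padded = (user_password + PADDING)[:32]
    md5 = hashlib.md5()
    md5.update(padded)
    md5.update(o_entry)
    # /P as a 32-bit value, low-order byte first.
    md5.update(struct.pack("<i", p))
    md5.update(first_id)
    # For /R 2 the key is the first 5 bytes of the digest.
    return md5.digest()[:5]


o = bytes.fromhex("2055C756C72E1AD702608E8196ACAD447AD32D17CFF583235F6DD15FED7DAB67")
doc_id = bytes.fromhex("777E149019263FC69355C55EFBCE3F18")

print(struct.pack("<i", -44).hex())
key = compute_key_r2(b"", o, -44, doc_id)
print(key.hex())
```

The resulting 5 bytes are the raw key bytes from the digest, not the first 5 characters of its hexadecimal representation.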

Thanks for your help and best regards

Mattia

↧

Convenient usage of UserUnit entry in 1.6

July 21, 2011, 12:38 am

Hello!

I've got a question about adjusting the user space unit, because I think I haven't understood it correctly yet. I have already searched the PDF specification, the internet and this forum, but I found no hints that answered it.

PDF assumes a default user space unit of 1/72 inch. I am mapping document elements from another coordinate system into a PDF page, but that system uses a different DPI setting: 96 DPI.

I thought that I could adjust the user space unit via the special page object entry "UserUnit". I set this value to:

72 / ownDpis = 72 / 96 = 0.75 // where ownDpis equals 96 in my case

The page dimensions in the PDF are set to:

/MediaBox [0 0 793.70081 1122.51965]

I expected this page height (1122.51965) to represent:

1122.51965 * 1 / 96 = 11.6929 inches = 297 millimeters

But Adobe Reader then displays a page height of 396 millimeters (= 15.59 inches).
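
The arithmetic can be checked with a short Python sketch. It relies on the definition of UserUnit as the size of one user space unit in multiples of 1/72 inch, so 0.75 corresponds to 1/96 inch. `page_height_mm` is a hypothetical helper.

```python
MM_PER_INCH = 25.4


def page_height_mm(media_box_height: float, user_unit: float = 1.0) -> float:
    """Physical height of a page, given its MediaBox height and UserUnit."""
    inches = media_box_height * user_unit / 72.0
    return inches * MM_PER_INCH


height = 1122.51965
print(round(page_height_mm(height), 1))        # UserUnit left at its default of 1.0
print(round(page_height_mm(height, 0.75), 1))  # UserUnit 0.75, i.e. 1/96 inch
```

The height Reader reports matches the default UserUnit of 1.0, which suggests (as an assumption, not something verified here) that the entry is either not being read from the page dictionary or not being honoured by the viewer.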

What's wrong with my assumptions? Is it possible to adjust the UserUnit so that I can use my 96 DPI coordinates without scaling them to 72 DPI in PDF? How do I use UserUnit? If it's not possible, what is the purpose of the UserUnit entry?

Best regards

PAX

↧

FDF no longer works in Reader 10.1.1. Worked in 9.x and previous

October 18, 2011, 1:36 pm

I have a web app that was developed many years ago to serve PDF "templates" with user data inserted into their forms via FDF. With Adobe Reader 10.1.1 (the version I am currently testing; earlier 10.x versions may be affected too), it has stopped working in many browsers. I just tried with Reader v9.4.0 and it works as expected.

Here's a sample of test output from the app:

serve_pdf.cfm

%FDF-1.2
1 0 obj <<
/FDF <<
/Fields
[
<</T(field1)/V(hello)>>
]
/F(https://www.thesite.com/the_document.pdf)
endobj
trailer
<</Root 1 0 R>>
%%EOF

Did the FDF syntax change in 10.1? Can someone help me out please?
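
One thing worth checking before suspecting a syntax change: in the sample as posted, the /FDF dictionary and the enclosing object dictionary are never closed with `>>`. That may simply be a copy-and-paste artifact, and whether it is what Reader 10.1.1 objects to is not established here. The Python sketch below builds the same FDF with balanced dictionaries; `build_fdf` and `escape_pdf_string` are hypothetical helpers, not part of the app.

```python
def escape_pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(fields: dict, pdf_url: str) -> str:
    """Return a minimal FDF document that fills `fields` in the PDF at `pdf_url`."""
    entries = "".join(
        f"<</T({escape_pdf_string(name)})/V({escape_pdf_string(value)})>>"
        for name, value in fields.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        f"<</FDF<</Fields[{entries}]/F({escape_pdf_string(pdf_url)})>>>>\n"
        "endobj\n"
        "trailer\n"
        "<</Root 1 0 R>>\n"
        "%%EOF\n"
    )


fdf = build_fdf({"field1": "hello"}, "https://www.thesite.com/the_document.pdf")
print(fdf)
```

Counting `<<` and `>>` in the generated text is a quick sanity check that every dictionary is closed.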

Thanks,

Dan

↧

PDF cross-reference streams

January 5, 2012, 10:40 am

Hello,

I noticed a behaviour I don't understand with some PDFs. They are hybrid-reference files in which both cross-reference sections carry the same information (there are 9 meaningful objects, and all 9 are referenced by both sections).

When I sign the document twice, the second signature invalidates the first one. This seems to be caused by the reference to the original cross-reference stream being copied into the update trailers, i.e.:

trailer
<</XRefStm 78305/ [...]>>

Is this a bad practice?

↧

Parse XREF table and stream

March 18, 2012, 5:34 am

I opened the PDF Reference 1.7 file with a binary viewer and I want to parse the xref data.

At the end of the file is:

startxref

129

%%EOF

At offset 129 is the linearization xref-table with 817 entries and this trailer:

trailer

<</Size 334093 /Prev 25807185 /XRefStm 186352 /Root 333277 0 R/Info 109959 0 R>>

startxref

%%EOF

The xref table starts at offset 25807185 and has 333276 entries, but more than half of them are free.

The XRef stream is at offset 186352 and has this data:

334091 0 obj<</Length 3663/Filter/FlateDecode/W[1 2 1]/Index[109960 223316]/DecodeParms<</Columns 4/Predictor 12>>/Size 333276/Type/XRef>> stream.....

1. Can you tell me which objects are in the xref stream and which are in the xref table? Are there entries that appear in both?

2. In what order should I parse the xrefs? Should a parser (>1.4) look only at the xref stream?

3. I don't think 2 bytes for the second field of type 1 entries ( W[1 2 1] ) are enough for an offset, since the file is about 30 MB, so how are the offsets recorded?
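
To make the layout concrete, here is a rough Python sketch of the two steps that follow inflating such a stream: undoing the PNG "Up" predictor (Predictor 12 with Columns 4 means each row is one filter-type byte plus 4 data bytes) and splitting each row according to /W. The helper names and the sample bytes are made up; real input would be the zlib-decompressed stream data. Note that in a type 2 entry the second field is the object number of an object stream rather than a byte offset, which may be why two bytes suffice here (not verified against the file).

```python
def undo_png_up(data: bytes, columns: int) -> list:
    """Reverse PNG row filtering for filter types 0 (None) and 2 (Up) only."""
    row_len = columns + 1            # one filter-type byte precedes every row
    prev = bytes(columns)            # the row above the first row is all zeros
    rows = []
    for i in range(0, len(data), row_len):
        ftype, body = data[i], data[i + 1:i + row_len]
        if ftype == 2:
            # Up: each byte was stored as a difference to the byte above it.
            body = bytes((b + p) & 0xFF for b, p in zip(body, prev))
        elif ftype != 0:
            raise ValueError(f"filter type {ftype} not handled in this sketch")
        rows.append(body)
        prev = body
    return rows


def split_fields(row: bytes, widths: list) -> tuple:
    """Split one cross-reference row into big-endian integers per /W."""
    out, pos = [], 0
    for w in widths:
        out.append(int.from_bytes(row[pos:pos + w], "big"))
        pos += w
    return tuple(out)


# Made-up sample: two rows, both filtered with "Up" (type 2).
sample = bytes([2, 2, 0, 10, 0,   2, 0, 0, 0, 1])
rows = undo_png_up(sample, columns=4)
print([split_fields(r, [1, 2, 1]) for r in rows])
```

Each resulting tuple is (type, field 2, field 3); for the sample both rows decode to type 2 entries pointing into object stream 10, at indexes 0 and 1.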

Thank You.

↧

PDF File with Run Length Encoding?

February 13, 2014, 9:29 am

Would anyone have a PDF file containing text streams compressed with run-length encoding? I need one to test a decompressor. I know they are probably rare and no longer used, which is why one is so difficult to find.

If you do have one, could you attach it in a reply or point me to it?
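
If no sample turns up, test data can be generated. The Python sketch below follows the RunLengthDecode rules in the PDF Reference: a length byte of 0 to 127 means "copy the next length + 1 bytes", 129 to 255 means "repeat the next byte 257 - length times", and 128 marks end of data. The encoder is deliberately naive and only serves to produce input for a decoder under test; both function names are made up.

```python
def run_length_decode(data: bytes) -> bytes:
    """Decode RunLengthDecode-filtered data."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 128:          # end-of-data marker
            break
        if length < 128:           # copy the next length + 1 bytes literally
            out += data[i:i + length + 1]
            i += length + 1
        else:                      # repeat the next byte 257 - length times
            out += data[i:i + 1] * (257 - length)
            i += 1
    return bytes(out)


def run_length_encode(data: bytes) -> bytes:
    """Naive encoder: emits repeat runs, and single-byte literal runs otherwise."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 128:
            run += 1
        if run >= 2:
            out += bytes([257 - run, data[i]])
        else:
            out += bytes([0, data[i]])   # literal run of exactly one byte
        i += run
    out.append(128)                      # end of data
    return bytes(out)


text = b"BT /F1 12 Tf (aaaaaaaabbbb) Tj ET"
encoded = run_length_encode(text)
print(run_length_decode(encoded) == text)
```

The encoded bytes could then be placed in a content stream whose dictionary carries /Filter /RunLengthDecode and a matching /Length, using any of the hand-written PDFs in this thread as a skeleton.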

Thanks

↧

change font in DA of FreeText Annotation

April 4, 2014, 10:36 am

I am trying to change the default appearance (the DA entry) of a FreeText annotation. To begin with, I can't find anywhere in Acrobat to change the font of a FreeText annotation. My understanding is that I need to create a font alias in the DR entry of the AcroForm, then reference it using the Tf operator in the DA entry.

Below is the smallest PDF code I created:

%PDF-1.6
%âãÏÓ
1 0 obj
<</Type/Catalog/Pages 2 0 R/AcroForm 3 0 R>>
endobj
2 0 obj
<</Type/Pages/Kids[4 0 R]/Count 1>>
endobj
3 0 obj
<</Fields[]/DR<</Font<</FNT1 5 0 R>>>>>>
endobj
4 0 obj
<</Type/Page/Parent 2 0 R/Resources<<>>/MediaBox[0 0 612 792]/Annots[6 0 R]>>
endobj
5 0 obj
<</Type/Font/Subtype/Type1/BaseFont/Times-Bold>>
endobj
6 0 obj
<</Type/Annot/Rect[10 400 160 570]/CreationDate(D:20140404131900-05'00')/Subtype/FreeText/Contents(OK, Let's just start a content.)/DA(/FNT1 9 Tf\n1 0 0 rg\n0 1 0 RG\n)>>
endobj
xref
0 7
0000000000 65535 f
0000000015 00000 n
0000000075 00000 n
0000000126 00000 n
0000000182 00000 n
0000000275 00000 n
0000000339 00000 n
trailer << /Size 7 /Root 1 0 R/ID[<da5bbc2c7a8acf9c73eb1fbda9cf31c6><da5bbc2c7a8acf9c73eb1fbda9cf31c6>]>>
startxref
525
%%EOF

Acrobat failed to display the FreeText annotation in the new font and background. The document shows that the FreeText annotation uses the Helvetica font, not Times-Bold.

What is wrong with my code?

↧

XMP updated with exiftool is not read

May 19, 2014, 6:05 am

If this question is better suited in another forum, please let me know.

The following example PDF has been used in a workflow where the XMP metadata was updated with new values (e.g. dc:identifier has the value Metropoles).

When opening this file and looking at the metadata in, for example, Acrobat Pro, the updated information is not shown; Acrobat (pdfLib) reads the old XMP stream instead.

The update looks correct as far as I can see: object 54 has been updated, and the new xref table at the end has the correct start byte for the updated stream (byte 0022017676). So I cannot understand why the new XMP stream is not shown in Acrobat.

Does anyone see any error in the PDF, or is there any obvious reason why Acrobat wouldn't read the new XMP stream?

(Other software using exiftool extracts the new XMP correctly. I have also tested with a tool written in pure Java, and it also shows the updated XMP as the current metadata.)

Link to test PDF (approx. 22 MB)

Dropbox - Test-20140518.pdf

↧

Can't construct a valid cross reference stream.

September 27, 2014, 5:38 am

I'm building a PDF generation library from scratch.

Currently I'm having trouble generating a valid cross-reference stream, and I'm totally lost as to why it is invalid.

%PDF-1.7
%µ
0 0 obj
<<
/Pages 1 0 R
/Type /Catalog
>>
endobj
1 0 obj
<<
/Type /Pages
/Kids [2 0 R]
/Count 1
>>
endobj
2 0 obj
<<
/Parent 1 0 R
/Type /Page
/MediaBox [0 0 612 792]
/Contents 3 0 R
>>
endobj
3 0 obj
<<
/Length 0
>>
stream
endstream
endobj
4 0 obj
<<
/Type /XRef
/W [1 2 0]
/Size 6
/Length 16
>>
stream
...
endstream
endobj
startxref
254
%%EOF

The full pdf file can be found here: https://www.dropbox.com/s/mvn0xptf0lasb28/test.pdf?dl=0

According to the spec, a PDF file can consist of only objects, with the exception of the first line (the 2nd is a comment) and the part from startxref onwards.

Any tips would be greatly appreciated.

For simplicity I've added the stream (extracted via a hex editor) below:

0A 01 00 0D 01 00 3E 01 00 77 01 00 CE 01 00 FE 0A

Note that the stream starts and ends with a newline character. There are 17 bytes, and the last line ending is not part of the stream length.

The remaining 16 bytes contain 15 bytes of data (the first line ending is ignored, right?):

01 00 0D

01 00 3E

01 00 77

01 00 CE

01 00 FE

As far as I can tell this PDF file's cross reference stream is valid. Any help would be greatly appreciated!
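
Decoding the 15 data bytes with /W [1 2 0] is straightforward, and doing so points at a few things to check. The Python sketch below is illustrative only: `parse_entries` is a made-up helper, and the default of 0 for the zero-width third field follows the rule for type 1 entries.

```python
raw = bytes.fromhex("01000D01003E0100770100CE0100FE")
WIDTHS = [1, 2, 0]
DECLARED_SIZE = 6   # the /Size value in the stream dictionary above


def parse_entries(data: bytes, widths: list) -> list:
    """Split cross-reference stream data into tuples of big-endian integers."""
    step = sum(widths)
    entries = []
    for pos in range(0, len(data) - step + 1, step):
        fields, offset = [], pos
        for w in widths:
            # A zero-width field is absent; 0 is used as its default here.
            fields.append(int.from_bytes(data[offset:offset + w], "big") if w else 0)
            offset += w
        entries.append(tuple(fields))
    return entries


entries = parse_entries(raw, WIDTHS)
for obj_num, (kind, offset, gen) in enumerate(entries):
    print(obj_num, kind, offset, gen)
print(len(entries), "entries decoded,", DECLARED_SIZE, "declared by /Size")
```

Possible causes suggested by the result, none of them confirmed against the full file: without an /Index entry the subsection defaults to [0 Size], so /Size 6 implies six entries (18 bytes) where only five are present; object 0 is described as in use, whereas it is normally the free head of the free list with generation 65535; and because the cross-reference stream dictionary doubles as the trailer, a reader will look for /Root there.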

↧