Quantcast
Channel: Adobe Community : Popular Discussions - PDF Language and Specifications
Viewing all articles
Browse latest Browse all 46145

Extract embedded xml from PDF/A-3b (also creation)

$
0
0

Hello there,

 

in the context of a research project, we are currently trying to extract embedded xml from a PDF/A-3b document via code.

The project deals with establishing a new invoicing standard (Zugferd: ferd-net.de, only german). Invoices are expressed via xml, which is embedded in PDF/A.

What we are trying to archive is extraction of the xml via java code. For testing purposes, we are currently using an third party skd to extract the invoice-xml, by calling a .EXE file and then picking up the results in java.

 

I currently have only one valid example file that can be processed via this sdk. To get more data, i used the test version of acrobat pro to alter the embedded xml file. To be more specific, i deleted the embedded file, added a new xml file, and used preflight to make the PDF conform to /A-3b. Although the file seems to have the same properties as the original, it can no more be processed via the extraction sdk. Since messing around with acrobat does not seem to get me anywhere, i am now looking into extracting data from the pdf my self.

 

Is there any present implementation/library/solution for extracting data in a java context? The few third party tools i found are all based of a .net/windows native environment. I have heard rumors about Adobe giving out tools to extract embedded data from PDF/A?

How is it the other way around? Is it possible to embedd xml into a PDF via Java? Given there allready is PDF file which we can attach to.

 

I really appreciate reading and thanks for any help or input!

Greetings,

Florian


Viewing all articles
Browse latest Browse all 46145

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>