Hi there,
While I'm able to successfully read a PDF file, the only way I can do it (based
on my understanding of the PDF 1.7 spec) is to go to the end of the stream and
work backwards from there (section 7.5.5).
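To make the end-first approach concrete, here is a minimal sketch in Python,
assuming a seekable binary stream. The function name and the 1024-byte tail
window are my own assumptions for illustration, not something taken from a
particular library.

```python
import io
import re


def find_last_startxref(stream, tail_size=1024):
    """Return the byte offset written after the last 'startxref' keyword.

    Hypothetical helper: 'tail_size' is an assumed window, and the stream
    must be seekable, which is exactly the limitation in question.
    """
    stream.seek(0, io.SEEK_END)
    size = stream.tell()  # this step needs the total stream size
    stream.seek(max(0, size - tail_size))
    tail = stream.read()

    # Look for "startxref <offset> %%EOF" and keep the last occurrence.
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref/%%EOF pair found near the end")
    return int(matches[-1])
```

The seek to the end is the part that cannot be done on a socket-like stream
without first receiving everything.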
But that requires the stream size to be known. While that is usually not a
problem (since the stream is usually a simple file), I would like to make my
library a little bit more generic, being able to handle any kind of stream, as
long as it is positioned at the start of a PDF document.
But unless I have missed something in the spec, when I'm dealing with a
stream of unknown size (one which might not even be fully loaded when I start
reading it, like a socket), there is no way to do it. I mean, I could read it,
but I would not be able to tell when I've reached the end of the document, not
without some kind of length information (since I don't know where the stream
ends). I mean, where would I stop? At a trailer? That would only work if the
document has never been updated.
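The problem with stopping at the first trailer can be shown with a small
Python sketch. It assumes an updated document is the original bytes followed
by an appended revision with its own trailer and %%EOF; the sample bytes and
the offsets inside them are made-up placeholders, and the function name is
hypothetical.

```python
import io


def first_eof_end(stream, chunk_size=4096):
    """Read forward and return the offset just past the first '%%EOF'.

    Returns None if the stream ends without the marker. Works on
    non-seekable streams because it only calls read().
    """
    marker = b"%%EOF"
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None
        buf += chunk
        pos = buf.find(marker)
        if pos != -1:
            return pos + len(marker)


# Placeholder content, not a valid PDF: one original revision plus one update.
original = b"%PDF-1.7\n(objects)\nxref\n(table)\ntrailer\n<< >>\nstartxref\n19\n%%EOF\n"
update = b"(new objects)\nxref\n(table)\ntrailer\n<< /Prev 19 >>\nstartxref\n80\n%%EOF\n"
document = original + update

stop = first_eof_end(io.BytesIO(document))
# 'stop' lands at the end of the original revision, so the update is missed.
```

A forward reader that stops there would see only the first revision, and
nothing in the bytes read so far says that more data follows.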
If right after the header (%PDF-) there was some kind of meta information about
the size of the document (after all updates, if any) I could move the stream to
that position (wait for the data to arrive, if needed) and work my way from
there (and to there, since I would know where the document ends). But from my
understanding, there is no way to retrieve this kind of information.
So, summing up, in order to successfully read a PDF document:
- I must always know the stream size (which I might not be able to determine)
- the stream must end with the PDF's data (which might also not always be true)
Am I correct?