5/30/2023

Pdfinfo cygwin

This question is for referencing and comparing. The solution is the accepted answer below.

Many hours have I searched for a fast and easy, but above all accurate, way to get the number of pages in a PDF document. Since I work for a graphic printing and reproduction company that works a lot with PDFs, the number of pages in a document must be precisely known before it is processed. The PDF documents come from many different clients, so they aren't generated with the same application and/or don't use the same compression method.

Here are some of the answers I found insufficient or simply NOT working:

Using Imagick (a PHP extension)

Imagick requires a lot of installation, Apache needs to be restarted, and when I finally had it working, it took amazingly long to process (2-3 minutes per document) and it always returned 1 page for every document (I haven't seen a working copy of Imagick so far), so I threw it away. That was with both the getNumberImages() and identifyImage() methods.

Using FPDI (a PHP library)

FPDI is easy to use and install (just extract the files and call a PHP script), BUT many compression techniques are not supported by FPDI. It then returns an error:

FPDF error: This document (test_1.pdf) probably uses a compression technique which is not supported by the free parser shipped with FPDI.

Opening a stream and searching with a regular expression:

This opens the PDF file in a stream and searches for some kind of string containing the page count or something similar. Only fragments of the code survive here:

```php
$content = fread($stream, filesize($f));
// ...
// Regular Expressions found by Googling (all linked to SO answers):
// ...
if (preg_match_all($regex, $content, $matches))
// ...
```

- `/\/Count\s+(\d+)/` (looks for /Count followed by a number) doesn't work because only a few documents have the parameter /Count inside, so most of the time it doesn't return anything.
- `/\/Page\W*(\d+)/` (looks for /Page followed by a number) doesn't get the number of pages; the match mostly contains some other data.
- `/\/N\s+(\d+)/` (looks for /N followed by a number) doesn't work either, as the documents can contain multiple values of /N, most, if not all, of which are not the page count.

So, what does work reliably and accurately?

A simple command line executable called pdfinfo. It is downloadable for Linux and Windows. You download a compressed file containing several little PDF-related programs. One of those files is pdfinfo (or pdfinfo.exe for Windows). An example of the data returned by running it on a PDF document (only part of the output survives here):

```
Title: test1.pdf
...
Producer: Acrobat Distiller 9.2.0 (Windows)
...
```

I haven't seen a PDF document where it returned a false page count (yet). It is also really fast: even with big documents of 200 MB, the response time is just a few seconds or less.

There is an easy way of extracting the page count from the output, here in PHP (again, only fragments survive):

```php
// Make a function for convenience
// ...
$cmd = "C:\\path\\to\\pdfinfo.exe"; // Windows
// ...
// Surround with double quotes if file name has spaces
// ...
```

Of course this command line tool can be used in other languages that can parse output from an external program, but I use it in PHP. I know it's not pure PHP, but external programs are way better at PDF handling (as seen in the question).

I hope this can help people, because I have spent a whole lot of time trying to find the solution to this, and I have seen a lot of questions about PDF page count in which I didn't find the answer I was looking for. That's why I made this question and answered it myself.

Security Notice: Use escapeshellarg on $document if the document name is being fed from user input or file uploads.

I created a wrapper class for pdfinfo in case it's useful to anyone, based on Richard's answer. Only fragments of the class survive here:

```php
/**
 * Wrapper for pdfinfo program, part of xpdf bundle
 * ...
 */
// ...
exec(self::PDFINFO_CMD. // ... (rest of the statement is missing)
// ...
 * this will put all pdfinfo output into keyed array, then make them accessible via getValue
// ...
// loop each line and split into key and value
// ...
// use strtolower to make case insensitive
// ...
 * string $key - key name, case insensitive
```
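Since the code in the answers above survives only as fragments, here is a minimal sketch of the same idea: run pdfinfo, split each output line into a key and a value, and read the page count from the `Pages` field that pdfinfo prints. It is not the original author's function or wrapper class. The names `parsePdfinfoOutput`, `getPdfPageCount`, and the `$pdfinfoCmd` parameter are hypothetical, and the sketch assumes that `pdfinfo` is on the PATH (or that `$pdfinfoCmd` is a trusted path without spaces).

```php
<?php

/**
 * Turn raw pdfinfo output lines into an array keyed by lower-cased field name.
 * (Hypothetical helper, not part of pdfinfo or of the original answers.)
 */
function parsePdfinfoOutput(array $lines): array
{
    $info = [];
    foreach ($lines as $line) {
        // Split on the first colon only: values such as dates contain colons too.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // skip anything that is not "Key: value"
        }
        // Lower-case the key so lookups are case insensitive.
        $info[strtolower(trim($parts[0]))] = trim($parts[1]);
    }
    return $info;
}

/**
 * Return the page count reported by pdfinfo, or 0 if it cannot be determined.
 * $pdfinfoCmd is assumed to come from trusted configuration, never from user input.
 */
function getPdfPageCount(string $document, string $pdfinfoCmd = 'pdfinfo'): int
{
    $output = [];
    $exitCode = 0;
    // escapeshellarg() quotes the file name, which also covers names with spaces.
    exec($pdfinfoCmd . ' ' . escapeshellarg($document), $output, $exitCode);
    if ($exitCode !== 0) {
        return 0;
    }
    $info = parsePdfinfoOutput($output);
    return isset($info['pages']) ? (int) $info['pages'] : 0;
}
```

Calling `getPdfPageCount('/path/to/test1.pdf')` would then return the page count as an integer. Returning 0 on failure is a choice made for this sketch; throwing an exception may suit stricter code better.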