Step away from the PDF (why translators don’t use the world’s most popular file format)
Everybody uses it, any computer or operating system can open it, and it's easy to send by email without clogging your recipient's inbox. We are talking about PDF, a file format so popular that it is very hard to avoid in today's digital business world. So why aren't translators big fans of PDF?
DTP specialists love PDF because it lets them compress high-quality files into a relatively small size. Salespeople who send contracts or proposals can be sure that PDF will present their documents to the recipient exactly as intended. And legal professionals use PDF as a secure format for exchanging sensitive information.
But translators? Not big PDF fans.
How to prepare a PDF document for translation
As a standard procedure, we import editable files such as MS Excel, MS Word or IDML (Adobe InDesign) into our translation software. The software lets us place the translated text into the original layout, in exactly the same position as the source text, without copying and pasting. This keeps post-translation DTP work to a minimum. A PDF, however, cannot be imported this way, because it is a static, non-editable format.
There are workarounds that give us access to the text in a PDF, but they always take additional effort, time and money. How much depends on how the PDF was created.
This is how we can prepare PDF documents for translation:
- Copy/paste: If the source document was not scanned and does not include (many) images or tables, it is relatively easy to select the text and copy/paste it into an editable format. However, you may need to adjust the document here and there so that the layout matches the original.
- Convert from the PDF editor: Depending on the PDF editor you use, some PDF files can be converted into .txt or .docx. A PDF created from an Adobe InDesign or Word document, for example, is easier to convert into something editable than a scanned letter. Here too, however, complex layouts with images or tables may garble the output or leave parts of the text unconverted, so a DTP specialist will have to put in extra work to finalize the layout.
- Optical Character Recognition (OCR): For scanned documents, OCR software might be able to convert the text in the image into an editable format. "Might", because the quality of the conversion depends on the quality and resolution of the scanned document.
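The rules of thumb in the list above can be summarized as a small decision helper. This is a minimal Python sketch, not our actual workflow: the function name `choose_extraction_method` and its two yes/no inputs are hypothetical simplifications, and real documents often need a mix of methods.

```python
from enum import Enum


class Method(Enum):
    """The three preparation methods described above (labels are illustrative)."""
    COPY_PASTE = "copy/paste into an editable format"
    CONVERT = "convert with the PDF editor, then DTP clean-up"
    OCR = "OCR; quality depends on scan quality and resolution"


def choose_extraction_method(scanned: bool, complex_layout: bool) -> Method:
    """Hypothetical helper encoding the rules of thumb from the list above."""
    if scanned:
        # A scanned PDF has no selectable text, only an image of it,
        # so the text has to be recognized with OCR.
        return Method.OCR
    if complex_layout:
        # Many images or tables: converting keeps more of the structure,
        # but expect extra DTP work to finalize the layout.
        return Method.CONVERT
    # Digitally created, mostly plain text: copy/paste is usually enough.
    return Method.COPY_PASTE


if __name__ == "__main__":
    print(choose_extraction_method(scanned=False, complex_layout=True).value)
```

For example, `choose_extraction_method(scanned=False, complex_layout=True)` returns `Method.CONVERT`. Checking `scanned` first reflects the point made above: no amount of converting helps if the PDF contains only an image of the text.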
PDF for translation: yes, but no
In short, PDFs are not editable and, depending on how they were created, extracting the text from a PDF document takes time and effort. Whenever possible, we ask our customers to send the source file along with the PDF. Have a look at this article to find out which file format is best suited for your translator. If you don't have the source file, let your translator know how the PDF was created. That will help us find the best way to extract the content we need for translation.
Want to know if your PDF can be translated? Just ask: we will help you look for the most time- and cost-efficient solution.
Posted by Yamagata Europe on 22 October