How to find out why text is not searchable in a PDF (and make it searchable)
--
Full question
https://superuser.com/questions/561589/h...
--
Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...
--
Tags
#pdf #search
ACCEPTED ANSWER
Score 7
There are several possible causes:
- The PDF may use a custom font encoding that assigns code points to characters in a way that is incompatible with established encodings such as ASCII or UTF-8/Unicode.
- It may render characters individually and out of sequence.
- Its characters may have been flattened to paths (vector outlines), leaving no text at all.
See the Stack Overflow questions "How do you debug PDF files?" and the now-deleted "PDF Font encoding - why can't I copy text from a PDF?"
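A rough first check for two of these causes can be scripted. The sketch below is only a heuristic under stated assumptions: it assumes the third-party pypdf package is installed for extraction, and the names `extract_page_texts` and `classify_extracted_text`, along with both thresholds, are hypothetical choices made for illustration. An empty result suggests there is no text layer (text flattened to paths, or a scanned image). A result full of control or private-use characters suggests a custom font encoding. It will not detect out-of-sequence rendering, nor a custom encoding that happens to map onto ordinary letters.

```python
import unicodedata

# Unicode categories that rarely appear in real prose:
# Cc = control, Co = private use, Cn = unassigned, Cs = surrogate.
SUSPICIOUS_CATEGORIES = {"Cc", "Co", "Cn", "Cs"}


def extract_page_texts(path):
    """Return the extracted text of every page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party; imported lazily on purpose

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


def classify_extracted_text(text, min_chars=20, max_bad_ratio=0.3):
    """Heuristically label the text extracted from one PDF page.

    min_chars and max_bad_ratio are arbitrary illustrative thresholds.
    """
    visible = [ch for ch in text if not ch.isspace()]
    if len(visible) < min_chars:
        # Nothing (or almost nothing) came out: likely paths or a scan.
        return "no-text-layer"
    bad = sum(
        1
        for ch in visible
        if unicodedata.category(ch) in SUSPICIOUS_CATEGORIES or ch == "\ufffd"
    )
    if bad / len(visible) > max_bad_ratio:
        # Text exists but maps to junk code points: likely a custom encoding.
        return "garbled-encoding"
    return "looks-ok"
```

For a real file, one might run `[classify_extracted_text(t) for t in extract_page_texts("input.pdf")]`, where `input.pdf` is a placeholder path, and compare the labels against what the pages visibly contain.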
To make the text searchable, the best way may be to go back to the original source (e.g. a Word document) and use a different process to produce the PDF. Alternatively, you could try rendering your current PDF as a bitmap and then running OCR on it, but this will be tedious and produce poor results.
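The render-then-OCR route can be automated. The answer does not name a tool, so the sketch below swaps in the open-source OCRmyPDF command-line program as an assumption. Its `--force-ocr` option rasterizes every page and runs OCR on the result, which corresponds to the bitmap-plus-OCR approach described above. The function names `build_ocr_command` and `ocr_pdf` are hypothetical, and the code assumes `ocrmypdf` is installed and on the PATH.

```python
import shutil
import subprocess
from typing import List


def build_ocr_command(src: str, dst: str, language: str = "eng") -> List[str]:
    """Build an OCRmyPDF command line that rasterizes and re-OCRs every page."""
    # --force-ocr discards any existing (possibly garbled) text layer.
    # -l selects the Tesseract language pack to use for recognition.
    return ["ocrmypdf", "--force-ocr", "-l", language, src, dst]


def ocr_pdf(src: str, dst: str, language: str = "eng") -> None:
    """Run OCRmyPDF on src and write a searchable copy to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf was not found on PATH")
    # check=True raises CalledProcessError if OCRmyPDF exits with an error.
    subprocess.run(build_ocr_command(src, dst, language), check=True)
```

Calling `ocr_pdf("input.pdf", "searchable.pdf")`, with both paths as placeholders, would write a copy with a fresh text layer. OCR quality depends on the rendering and the fonts, so the answer's caution about poor results still applies.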
ANSWER 2
Score 2
I found a way around this problem. I went to Tools -> Edit Document Text, then on each page I pressed Ctrl+A (select all), right-clicked, opened Properties, and changed the font to something else. After I did this, the text was searchable and I could copy it!
ANSWER 3
Score 0
This question might be old, but character encoding issues in compound-path PDFs are still a problem today. I solved it like this:
- Open the unsearchable file in Illustrator.
- Save a copy as PDF with the "Smallest File Size" preset.
- Open the resulting file in Acrobat.
- Run Scan & OCR > Recognize Text with your settings.
- Search with ⌘ + F; it should now work.
Test source
- A compound-path PDF with unsearchable text. If you try to copy and paste text from it, you get garbage.
Environment
- sw_vers: macOS 14.4.1 (23E224) x86_64
- Adobe Illustrator: 24.0.2
- Adobe Acrobat Pro DC: Continuous Release | Version 2021.007.20091