Long Beach Design Company

Optimizing PDF Documents for Search Engines

November 1, 2006

Okay, time for a hopefully helpful summary of the first batch of tests. An important note: this test involved only PDFs that were actually created in Acrobat version 6, meaning the text fields were created in Acrobat itself. The next batch of tests will be conducted against PDF files that started out as Word (.doc) files and were then converted to PDF format.

General findings first.
1. Setting a password of any type looks to be a very bad thing to do if you want anything at all to be indexed. Most of the password-protected files either never showed up in any search engine index, or were picked up showing no information other than the URL. All but one of my password-protected test files have since been dropped from the indexes; the one exception is still in Google's index.

When you password protect a PDF file, not even the Meta information will be picked up.

This same treatment has been seen with user passwords, master passwords or both.
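If you want to check a batch of files before publishing them, a rough test is possible without opening Acrobat. The sketch below is only a heuristic and rests on one assumption about the file format: an encrypted PDF (user password, master password, or both) carries an /Encrypt entry in its trailer. The function name is made up for illustration, and a false positive is possible if the same byte sequence happens to appear elsewhere in the file.

```python
from pathlib import Path


def looks_password_protected(pdf_bytes: bytes) -> bool:
    """Heuristic check: does the raw file mention an /Encrypt entry?

    Assumption: Acrobat's password security is written as an /Encrypt
    entry in the PDF trailer. This does not parse the file, so treat a
    True result as "probably encrypted", not as proof.
    """
    return b"/Encrypt" in pdf_bytes


def check_file(path: str) -> bool:
    """Convenience wrapper (hypothetical helper) that reads a file from disk."""
    return looks_password_protected(Path(path).read_bytes())
```

Running check_file over every PDF in a site folder and flagging the files that return True is one way to spot the documents most likely to end up as URL-only listings.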

2. Other than password protection, none of the other security settings one may choose appears to make any difference, and neither does the encryption level. I tested both of Acrobat's standard encryption levels, 40-bit and 128-bit, with all sorts of different restrictions, ranging from nothing being allowed to only printing and/or copying being disallowed. These individual settings appear to make no difference. All that matters is whether the file is password protected.

3. MSN is having issues indexing anything in these Acrobat-created pdfs. They're including the files in their index, but the listings only show the URL address. No meta information is picked up or searchable. Nor is any text field, whether the text field is set to be Read Only or Editable.

Basically, MSN is spidering and picking up the files. They know the files are there, but are seemingly unable to do anything with them. Bad news there. We'll have to see if they pick up the Word-created versions. My hunch is they will.

4. a. Google always uses the Meta Title value as the snippet title, if you provide one. Much like the <title> tag in html.

b. Google also is able to index the contents of Acrobat text fields. It doesn't matter if they're set to be Read Only or may be edited. Both are searchable.

c. Google ignores all other Meta fields where search is concerned. Only the Meta Title is used. The Meta Keywords are not indexed, which I mention specifically because of something you'll see below.

5. a. Yahoo is very similar in indexing Acrobat-created text fields. Both Read Only and Editable fields are indexed, appear in the SERP snippet and may be searched.

b. Yahoo apparently doesn't do anything with the Meta Title. You can't search for content in the Meta Title, nor does it appear as the title in the SERP snippet. Surprising one that!

Instead, Yahoo! is making a line of Editable text the SERP Title by default, skipping completely over the Meta information and even a Read Only text field that appeared first in the test PDF documents. Strange reaction indeed!

c. The only Meta information Yahoo! appears to pay attention to is the Meta Keywords in the test PDFs. They don't show up in the snippet or anything like that, but you can find the document by what's in the Meta Keywords field.
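Since Google leans on the Meta Title and Yahoo on the Meta Keywords, it can help to confirm what a file actually contains before it goes live. The following is a minimal sketch, assuming the Meta fields set in Acrobat's Document Properties are stored as plain /Title (...) and /Keywords (...) entries in the PDF's document information dictionary. The function name is hypothetical, and the pattern is deliberately simple: it returns the first match only (which could be a bookmark title), it does not handle nested parentheses, hex strings, or Unicode-encoded strings, and it will find nothing useful in a password-protected or compressed file.

```python
import re
from typing import Optional


def read_info_field(pdf_bytes: bytes, key: str) -> Optional[str]:
    """Return the first literal-string value for /<key>, or None if absent.

    Assumption: the value is written as a simple literal string such as
    /Title (My Document), with parentheses inside it backslash-escaped.
    """
    pattern = (
        rb"/" + re.escape(key.encode("ascii"))
        + rb"\s*\(((?:\\.|[^\\()])*)\)"
    )
    match = re.search(pattern, pdf_bytes)
    if match is None:
        return None
    # Undo the backslash escapes for parentheses and backslashes.
    raw = re.sub(rb"\\([()\\])", rb"\1", match.group(1))
    return raw.decode("latin-1")
```

Calling read_info_field(data, "Title") and read_info_field(data, "Keywords") on a file's bytes gives a quick look at the two fields that mattered in these tests.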

As a summary of the first test findings for Acrobat-created pdfs:

  • Don't use passwords if you want a pdf file to be indexable and searchable. This applies across all of the big three engines.
  • Don't worry much about any other security settings you may choose. They appear to have no effect one way or the other.
  • Text fields are picked up by both Google and Yahoo, and the content is searchable.
  • MSN has some serious issues with Acrobat-created pdfs. They've been unable to index anything in the test files.
  • Google will pick up the Meta Title field and use its content as the SERP Title if one exists. This is the only Meta field Google seems to index.
  • Yahoo is a strange bird! It's using an Editable Text field as the SERP Title, to the exclusion of everything else. They don't appear to index the Meta Title information at all. They do however index the Meta Keywords field, and this content is searchable.
  • In the test files that had anything indexed, Google picked up the Meta Title and all forms of text fields. Yahoo picked up the Meta Keywords and all forms of text fields.
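
For anyone who wants to carry these results into a checklist or script, the summary above can be written down as a small lookup table. This is only a restatement of the first-batch findings for Acrobat-created files without passwords; the names FINDINGS and engines_indexing are made up for this sketch, and the values may not hold for the Word-converted files still to be tested.

```python
# What each engine indexed in the first batch of tests
# (Acrobat-created PDFs with no password set).
FINDINGS = {
    "Google": {"meta_title": True, "meta_keywords": False, "text_fields": True},
    "Yahoo": {"meta_title": False, "meta_keywords": True, "text_fields": True},
    "MSN": {"meta_title": False, "meta_keywords": False, "text_fields": False},
}


def engines_indexing(feature: str) -> list:
    """Return the engines, in alphabetical order, that indexed the feature."""
    return sorted(
        engine for engine, indexed in FINDINGS.items() if indexed[feature]
    )
```

For example, engines_indexing("text_fields") lists the engines that made text field content searchable.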
Copyright © 2006 - Long Beach Design Company - All Rights Reserved