Sep. 12, 2007 at 9:59am Eastern by Galen DeYoung

Eleven Tips For Optimizing PDFs For Search Engines

Strictly Business - A Column From Search Engine Land

The SEO purist may ask why anyone would ever want to use PDF content on a website for search purposes. The reality, however, is that many businesses have a lot of PDF assets: sell sheets, brochures, white papers, technical briefs, and so on. The purist simply asks, why not convert these to html? In the real world, not everyone has the time, budget, and expertise to do that. There may also be other “marketing” reasons. Perhaps a company wants its prospects to experience the content along with all the other brand elements inherent in its print materials. Whatever the reason, there are lots of PDFs available on the web, and you can optimize them to rank highly in search results. Here are some tips on the right way to do it.

1. Make sure your PDFs are text based. Okay, this first one is pretty obvious. However, we still find companies whose materials were designed in an image-based program. When the PDF is made using these programs, the PDF is an image; there is no text for the search engines to read.

2. Complete the document properties. The vast majority of PDFs seem to have no document properties specified, the most important of which is the Title. The Title property, if present, almost invariably supplies the words that will be displayed as the heading of the search result. It’s the equivalent of the html title tag. If you don’t complete the Title property, the search engine is going to generate a title from the PDF’s content, and it may not be what you would choose. We’ve all seen some pretty goofy-looking titles on search results associated with PDFs. Not only do they look ridiculous, but they probably won’t get clicked. In the full version of Acrobat, go to File>Document Properties to specify the Title.

There are other document properties (metadata) you can supply, including Author, Subject, and Keywords, but presently these appear to have little search-related effect. It would be nice if Subject acted as the meta description to be displayed under the heading of the search result, but I haven’t seen this to be true. For now, however, I’d complete the Subject property as if it were a meta description. Perhaps in the future search engines will treat it as such.
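For a batch of existing PDFs, a rough scripted check can flag files whose Title was probably never filled in. The following is a minimal sketch, not a real PDF parser: it assumes the Title is stored uncompressed as a literal `(...)` string, and the function name `find_pdf_title` is hypothetical. Bookmarks also use a `/Title` key, and many PDFs compress or encode their metadata differently, so a missing or odd result only means the file deserves a look in Acrobat.

```python
import re

# Matches "/Title (some text)" where the text is a PDF literal string.
# Assumption: no unescaped nested parentheses and no compression.
TITLE_RE = re.compile(rb"/Title\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)


def find_pdf_title(data: bytes):
    """Return the first literal /Title string found in the raw bytes, or None."""
    match = TITLE_RE.search(data)
    if match is None:
        return None
    raw = match.group(1)
    # Undo only the escapes for parentheses and backslashes.
    raw = re.sub(rb"\\([()\\])", rb"\1", raw)
    title = raw.decode("latin-1").strip()
    return title or None


if __name__ == "__main__":
    # Hypothetical sample bytes standing in for a real file's contents.
    sample = b"%PDF-1.4\n1 0 obj\n<< /Title (Transit Seating Brochure) >>\nendobj\n"
    print(find_pdf_title(sample))
```

To check a real file, read it with `open(path, "rb").read()` and pass the bytes in; a result of `None` is a prompt to open File>Document Properties, not proof that the Title is missing.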

3. Optimize the copy. Copy in text-based PDFs is no different than web-page copy. Optimize it.

4. Build links into PDFs. Make sure you include links in your PDFs, and pay attention to the anchor text used. Search engines do recognize these links. It doesn’t happen often, but you will sometimes find backlinks that originate in PDFs. Their limited occurrence is likely related to the fact that most people don’t put links into PDFs; most people treat PDFs as static print documents. In addition to the search-related reasons for including links in PDFs, there’s also a good business reason. Often, PDFs are passed along to others via email. Accordingly, a reader may be viewing the PDF in isolation (i.e., not associated with your website). By placing links into PDFs, you give these readers an easy way to click back into your site, where you can further influence them.

5. Pay attention to the version. While search engines do “read” and index PDFs, search engines’ capabilities tend to lag new versions of Acrobat. Although Acrobat 8 is out, for now you should save your PDFs as version 1.6 (Acrobat 7) or lower to ensure search engines can index the content.

Not only is saving PDFs at a lower version good for the search engines, it’s also good for users. Not everyone has the latest version of Acrobat Reader. Accordingly, I’d go a step further and recommend saving PDFs as version 1.5 (Acrobat 6) or lower. That way they will work for both search engines and most readers.
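One way to audit the version of PDFs that are already posted is to read the header at the start of each file, which normally looks like `%PDF-1.5`. The sketch below assumes that header is accurate; a PDF can also declare a newer version inside the document itself, which this check does not see. The function names and the default ceiling of 1.5 are illustrative choices, not part of any tool.

```python
import re

# A PDF file normally starts with a header such as "%PDF-1.5".
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes):
    """Return (major, minor) from the file header, or None if none is found."""
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def version_is_ok(data: bytes, max_version=(1, 5)) -> bool:
    """True when the header version is at or below the chosen ceiling."""
    version = pdf_header_version(data)
    return version is not None and version <= max_version


if __name__ == "__main__":
    # Hypothetical sample bytes; in practice use open(path, "rb").read(1024).
    print(pdf_header_version(b"%PDF-1.6\n..."))
    print(version_is_ok(b"%PDF-1.6\n..."))
```

Running this over a folder of PDFs after every round of edits is a cheap way to catch a file that was accidentally re-saved at a higher version.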

6. Optimize the file size for search. Don’t post a huge PDF for download. Not only is this annoying and unnecessary for site visitors, it’s also burdensome for the search engines. If it’s too big, the search engines may abandon the PDF before even getting access to its content. Using the full version of Acrobat, select Advanced>PDF Optimizer to “right-size” the document.

You may also want to enable the "Optimize for Fast Web View" option in the Preferences>General Settings panel. This allows the PDF to be “loaded” a page at a time, rather than waiting for the whole PDF to download.
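Both points can be spot-checked with a short script. This is a minimal sketch under stated assumptions: the 2,000,000-byte budget is an arbitrary placeholder rather than a limit published by any search engine, and the Fast Web View test relies on the fact that such files (called "linearized" in the PDF format) carry a `/Linearized` entry near the start of the file. The function names are hypothetical.

```python
def is_linearized(data: bytes) -> bool:
    """Heuristic: linearized ("Fast Web View") PDFs declare /Linearized
    within the first 1024 bytes of the file."""
    return b"/Linearized" in data[:1024]


def size_report(data: bytes, max_bytes: int = 2_000_000) -> dict:
    """Summarize file size against a budget, plus the Fast Web View check.

    max_bytes is an assumed, adjustable budget, not an official threshold.
    """
    return {
        "bytes": len(data),
        "over_budget": len(data) > max_bytes,
        "fast_web_view": is_linearized(data),
    }


if __name__ == "__main__":
    # Hypothetical sample bytes standing in for open(path, "rb").read().
    sample = b"%PDF-1.4\n1 0 obj\n<< /Linearized 1 /L 1234 >>\nendobj\n"
    print(size_report(sample))
```

A file that reports `over_budget` is a candidate for Advanced>PDF Optimizer; one that reports `fast_web_view` as false may have been saved without the option enabled.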

7. Pay attention to placement. If you bury links to PDFs deep within your site’s file structure, they’re less likely to get indexed. If you want to use PDFs for high-ranking search results, links to those PDFs should be on web pages closer to the root level of the site’s file structure.
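To get a feel for how deep the pages linking to a PDF sit, you can count the directories in each linking page's URL. The sketch below is only an approximation of "closeness to the root": it measures URL path depth, not how many clicks a crawler needs, and the `example.com` URLs and function names are made up for illustration.

```python
from urllib.parse import urlparse


def directory_depth(url: str) -> int:
    """Count the directories between the site root and the page itself."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    # Treat a final segment with a dot (e.g. "page.html") as a file name.
    if segments and "." in segments[-1]:
        segments = segments[:-1]
    return len(segments)


def shallowest_linking_page(linking_pages):
    """Return the linking page closest to the root (first one wins ties)."""
    return min(linking_pages, key=directory_depth)


if __name__ == "__main__":
    # Hypothetical pages that link to the same PDF.
    pages = [
        "http://www.example.com/products/seating/specs/downloads.html",
        "http://www.example.com/resources.html",
    ]
    for page in pages:
        print(directory_depth(page), page)
    print(shallowest_linking_page(pages))
```

If every page linking to an important PDF sits several directories down, that is a hint to add a link from a page nearer the top of the site.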

8. Influence meta descriptions for PDFs. For web pages, the meta description is what is displayed under the title in a search result. With PDFs, the search engines search the copy of the PDF and select something to display. While with PDFs you have less control of what is displayed as the description to the search result, you can still influence this. The best way to do this is to make sure that you have a good, optimized sentence or two near the start of your PDF. If these sentences correspond to the search term used, it’s likely that these sentences are the ones that will be displayed as the description under the search result’s heading.

9. Specify the reading order. As noted above, search engines search the copy of the PDF and select something to display as a description under the search result’s heading. Depending on how the reading order of your PDF is specified, this may lead the search engine to select some pretty strange stuff to display.

In a previous column, Organic Landing Page: A Case Study, I noted a search result for “transit seating.” That search result is noted below:

Admittedly, this is not a very enticing description, and it’s not likely to get clicked even if it ranks highly in the search results. Why did Google select this text to display? Because it’s the first thing Google read in the PDF.

Every PDF has a reading order. Similar to properly optimized web pages, you want to make sure that valuable content is read first. How do you know the reading order? With the PDF open and while using the full version of Acrobat, select Advanced>Accessibility>Add Tags to Document. Then select Advanced>Accessibility>Touch Up Reading Order. Then the reading order of the PDF will be displayed.

You can see in the image above that the reading order of the transit seating PDF does not start with valuable content. Rather, many extraneous items are “read” before the valuable content. That’s why Google displayed what it did in the search result. If you want PDFs to be optimized for search, make sure you understand the reading order of the PDF and use the Touch Up Reading Order tool to manage what the search engine will read first.

10. Tag your PDFs. You can also add tags to your PDFs, similar to html tags. Again, with the PDF open and while using the full version of Acrobat, select Advanced>Accessibility>Add Tags to Document. Acrobat will give you a document report and recommend things you may want to consider changing. You’ll have the ability to tag headings, add alternate text for images, etc.

11. Pay attention. Every time you open a PDF, make even a small change, and save it again, major unseen things may change. The reading order may change automatically. You may inadvertently save it as a higher version. It may get saved using the default size setting instead of a properly optimized size. If you’re going to further optimize existing PDFs, make sure you check all of these things before posting a new version of the PDF.

Galen De Young is Managing Director of Francis SEO, a firm specializing in B2B search engine optimization, and Francis Marketing, one of the leading marketing consulting firms specializing in repositioning B2B companies and their brands. You can reach Galen at gdeyoung@francis-seo.com. The Strictly Business column appears Wednesdays at Search Engine Land.


Reader Comments

Do you have to use Acrobat to create the PDF? Or can I use Photoshop and save it as a PDF? Does the text in a Photoshop PDF count as text or as an image?

Comment by crimsongirl | September 12, 2007 2:50 PM

Great advice.
Learned some new tips here!

Thanks,
Rick Vidallon

Funny thing: I just optimized my PDF files. Unfortunately, I still have a question. I posted it on DigitalPoint Forums: http://forums.digitalpoint.com/showthread.php?t=471185 It has to do with caching and the HTML version of my documents. Maybe you can help.

Best regards - Jab

Great piece, Galen. Although they are generally well-skilled in Adobe Acrobat, technical writers and technical marketing writers often deliver PDF documents to clients without giving any thought to properly tagging them for search.

Thank you for an excellent PDF SEO tutorial. Many of these tips, like source-ordering, are invaluable.

On a different note, do you think many people still use Acrobat reader? Perhaps in the corporate world they do. I believe that Internet-savvy people are more likely to use FoxitReader, though - it is far quicker, and speed is what you need here. Editing-wise, for me Acrobat Full brings a new meaning to the word "agony". Alternative editing apps might be usefully investigated.

Crimsongirl:

You can use other programs to generate pdfs, and ideally, they should be text-based programs, such as Word, Quark, etc. Image-based programs generally generate image-based pdfs. A quick way to test things is to try to select the text in your pdfs. If you can select the text (not as a box, but as individual words and letters), then you're likely okay. However, as I noted in the article, just having it text-based won't help if you don't consider the other matters as well.

Jab:

While Google doesn't appear to provide a cache date for pdfs in "view as html" mode, it appears these are cached images. Google's language: "Google automatically generates html versions of documents as we crawl the web." I'm not aware of any option to stop that, short of blocking search engines from the pdf content, which would result in the pdf content not being indexed at all.

Thanks for the info. Many of our business clients have a lot of PDF content and helping them optimize it has been challenging. We'll certainly pass along these tips.
