How to tell if PDF content is being indexed?

zazo

I've searched extensively for this, but could not find a definitive answer.

We recently updated our website, which links to about 30 PDF data sheets. I want to determine whether the text in these PDFs is being indexed by search engines.

When I run this search, http://bit.ly/rRYJPe (Google: site:www.gamma-sci.com and filetype:pdf), I can see that the PDF URLs are being indexed, but does that mean their content is being indexed too?

I have read in other posts that if you can copy text from a PDF and paste it elsewhere, Google can index the content. When I try this with PDFs from our site, I cannot copy any text. However, I was told that these PDFs were all created from Word docs, so they should be indexable, correct?
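The copy-and-paste test above can be roughly approximated in code. The following is a minimal sketch, not a definitive check: it assumes the PDF's content streams are either uncompressed or Flate-compressed, and it only looks for text-showing operators (`Tj`/`TJ`) inside a `BT ... ET` text object. The function name `has_text_layer` is hypothetical, and the heuristic can miss text stored under other filters or misfire on unusual binary data. A PDF that is only scanned page images would normally fail this test.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A text object (BT ... ET) that contains a text-showing operator (Tj or TJ).
TEXT_OP_RE = re.compile(rb"BT\b.*?\b(?:Tj|TJ)\b.*?ET\b", re.DOTALL)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Heuristic (an assumption, not a spec-complete parser): return True
    if any stream in the PDF appears to draw text."""
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # Most content streams use the FlateDecode (zlib) filter.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not zlib data: treat the stream as uncompressed.
            data = raw
        if TEXT_OP_RE.search(data):
            return True
    return False


if __name__ == "__main__":
    import sys

    # Usage: python check_pdf.py datasheet.pdf  (file name is a placeholder)
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, has_text_layer(fh.read()))
```

If this returns False for a data sheet, the file may contain only page images, in which case a search engine would need OCR to read it.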

Since WordPress has you upload PDFs as if they were images, could this be causing the problem?

Would it make sense to take the time to extract all of the PDF content to HTML?

Thanks for any assistance, this has been driving me crazy.

zazo

Kyle,

Thanks for the quick response. The data is being displayed in the title and meta description fields. I also ran some searches for specific terms, restricted to our site with filetype:pdf, and the results show that the content is being indexed. They also show that the PDF titles and meta descriptions are not optimized, so I have some work to do there.
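The kind of search described above can be built programmatically. This is a small sketch, assuming the standard Google search URL with a `q` parameter; the helper name `pdf_index_query` and the phrase used are placeholders, and the phrase should be replaced with an exact sentence copied from one of the data sheets.

```python
from urllib.parse import urlencode


def pdf_index_query(site: str, phrase: str) -> str:
    """Build a Google search URL limited to PDFs on one site that
    contain an exact phrase (hypothetical helper for illustration)."""
    query = f'site:{site} filetype:pdf "{phrase}"'
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # "example phrase" is a placeholder, not text from the actual PDFs.
    print(pdf_index_query("www.gamma-sci.com", "example phrase"))
```

If a phrase that appears only inside a PDF returns that PDF as a result, its text content has been indexed.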

Thanks,

Anthony

kchandler

Is the text shown in the title and meta description in the SERP taken from the PDF content?

If so, then yes, they are being indexed/crawled.

Regards,

Kyle
