Project Description

Cloud Search now supports Optical Character Recognition (OCR) based text extraction for PDFs that contain images, such as:
  • Physical contract documents
  • Engineering documents that contain annotations or labels
  • Physical customer invoices, and more
This makes PDFs with images containing text, such as scanned documents, easily searchable by users and improves the discoverability of such PDFs.

Who’s impacted

Admins and end users

Why it’s important

Many critical business documents exist either in physical form or as scanned versions of those physical documents. With OCR support, admins can now easily index these documents for Cloud Search, making it easier for users to quickly find relevant scanned documents.
In addition, this feature eliminates the need to extract the text offline from PDFs containing images before indexing these documents on Cloud Search.

Getting started

  • Admins: The feature is ON by default. Use this guide to learn more about how to use enhanced search for PDFs containing images. Important note: PDFs must be submitted using the Asynchronous Indexing mode and must contain only images.
  • End Users: No user action is required.
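
For admins who index through the Cloud Search Indexing API, the asynchronous requirement above could look roughly like the sketch below. It only builds a request body for the items.index method with the mode set to ASYNCHRONOUS. The helper name build_index_request, the item name, and the title are hypothetical. ACLs are omitted for brevity, and inline content is assumed to suit only small files, so check the API reference for size limits and the upload-based alternative before relying on it.

```python
import base64
import time


def build_index_request(item_name: str, title: str, pdf_bytes: bytes) -> dict:
    """Build a request body for items.index that asks for asynchronous indexing.

    Hypothetical helper: field names follow the Cloud Search Indexing API's
    Item resource as understood here; verify them against the API reference.
    """
    # The version is an opaque byte string, sent base64-encoded in JSON.
    # A timestamp is used here purely as an illustrative choice.
    version = base64.b64encode(str(int(time.time())).encode()).decode("ascii")
    return {
        "item": {
            "name": item_name,
            "itemType": "CONTENT_ITEM",
            "version": version,
            "metadata": {"title": title, "mimeType": "application/pdf"},
            "content": {
                # RAW marks the content as a binary file rather than text or HTML.
                "contentFormat": "RAW",
                "inlineContent": base64.b64encode(pdf_bytes).decode("ascii"),
            },
        },
        # Per the note above, image-only PDFs must use asynchronous mode.
        "mode": "ASYNCHRONOUS",
    }
```

The resulting dictionary would be passed as the request body to whichever client is used to call the Indexing API. Building it separately keeps the mode setting easy to inspect.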

Rollout pace

Availability

  • Available to Google Workspace Enterprise Plus and Google Cloud Search customers
  • Not available to Google Workspace Essentials, Business Starter, Business Standard, Business Plus, Enterprise Essentials, Enterprise Standard, Education Fundamentals, Education Plus, Frontline, and Nonprofits, as well as Google Workspace Basic and Business customers