Working with a large number of PDF files can be a challenging task, especially when you need to search for specific text across multiple documents scattered in different folders. Whether you are an analyst, researcher, or anyone dealing with vast amounts of PDF files, having the ability to quickly locate specific information can save you a significant amount of time. Luckily, there’s a PowerShell script that can make this task much easier.
This script allows you to search for text across all PDF files within a specified directory and its subdirectories. And the best part? You can run this script directly from the internet without needing administrative rights, making it incredibly easy to use:
Open Command Prompt:
- Click on the Start menu (or press the Windows key on your keyboard).
- Type cmd into the search bar.
- Press Enter to open the Command Prompt.
Run the Command:
- Once the Command Prompt is open, simply copy and paste the following command and press Enter:
@powershell iex (irm https://yh.do/pdf-search)
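If you are curious what that one-liner does: irm is PowerShell's built-in alias for Invoke-RestMethod, which fetches the script as text, and iex is the alias for Invoke-Expression, which runs that text in the current session. The minimal sketch below separates the two steps. It uses a local string in place of the downloaded script so it can be run without touching the network; the variable names `$scriptText` and `$result` are illustrative only.

```powershell
# Step 1: irm (Invoke-RestMethod) would return the script as a string.
# A local string stands in for the download in this sketch.
$scriptText = 'Write-Output "Hello from the downloaded script"'

# Step 2: iex (Invoke-Expression) runs that string as PowerShell code.
$result = Invoke-Expression $scriptText
$result
```

Splitting the steps this way also lets you read the script text before deciding to run it.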
What Happens Next?
Script Execution:
- The script will ask you to provide the folder path where your PDF files are located. Just type or paste the path into the Command Prompt and press Enter.
- Then, it will ask you for the text you want to search for. Type in the text and press Enter again.
Automatic Search:
- The script will search through all the PDF files in the specified folder and any subfolders. If it finds the text you’re looking for, it will tell you which file contains the text and on which page.
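The page-by-page matching described above can be sketched in isolation. This is a simplified illustration, not the script itself: the function name `Find-PagesWithText` and the sample page texts are made up for the example, and plain strings stand in for text extracted from a real PDF.

```powershell
# Hypothetical helper: returns the 1-based numbers of the pages that contain the search text
function Find-PagesWithText {
    param(
        [string[]]$Pages,
        [Parameter(Mandatory = $true)][string]$SearchText
    )
    # Surround the text with wildcards so it can match anywhere on the page
    $pattern = "*" + $SearchText + "*"
    for ($i = 0; $i -lt $Pages.Count; $i++) {
        # -like is case-insensitive by default
        if ($Pages[$i] -like $pattern) {
            $i + 1
        }
    }
}

# Sample data standing in for the text of a three-page PDF
$samplePages = @("Quarterly report", "Revenue grew in Q3", "Appendix: revenue tables")
$hits = @(Find-PagesWithText -Pages $samplePages -SearchText "revenue")
$hits -join ", "
```

Because the comparison ignores case, searching for "revenue" finds both "Revenue" on page 2 and "revenue" on page 3.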
How the Script Works
Here is the complete script that automates the process of searching for text within multiple PDF files:
# Prompt the user for the main PDF path and the search text
$main_url = Read-Host "Please enter the main path where the PDF files are located"
# Check if the path ends with a backslash, and add it if it's missing
if (-Not $main_url.EndsWith("\")) {
    $main_url += "\"
}
$search_value = Read-Host "Please enter the text to search for"
# Check if C:\Temp\ folder exists, if not create it
$temp_folder = "C:\Temp\"
if (-Not (Test-Path -Path $temp_folder)) {
    Write-Host "C:\Temp\ folder not found. Creating the folder..."
    # Out-Null keeps the new folder's details out of the console output
    New-Item -ItemType Directory -Path $temp_folder | Out-Null
}
# Check if the itextsharp.dll exists, if not download it
# Join-Path avoids the doubled backslash that "$temp_folder\itextsharp.dll" would produce
$dll_path = Join-Path $temp_folder "itextsharp.dll"
$dll_url = "https://yh.do/files/itextsharp.dll"
if (-Not (Test-Path -Path $dll_path)) {
    Write-Host "itextsharp.dll not found. Downloading the DLL..."
    try {
        Invoke-WebRequest -Uri $dll_url -OutFile $dll_path
        Write-Host "DLL downloaded successfully."
    } catch {
        Write-Host "Failed to download the DLL. Please check the URL or network connection."
        exit
    }
}
# Function to convert a PDF to text, emitting one string per page
function convert-PDFtoText {
    param(
        [Parameter(Mandatory=$true)][string]$file
    )
    # Load the iTextSharp library
    try {
        Add-Type -Path $dll_path
    } catch {
        Write-Host "Failed to load the itextsharp.dll. Please ensure the DLL is correctly downloaded."
        exit
    }
    $pdf = New-Object iTextSharp.text.pdf.pdfreader -ArgumentList $file
    try {
        for ($page = 1; $page -le $pdf.NumberOfPages; $page++) {
            $text = [iTextSharp.text.pdf.parser.PdfTextExtractor]::GetTextFromPage($pdf, $page)
            Write-Output $text
        }
    } finally {
        # Release the file handle even if text extraction fails
        $pdf.Close()
    }
}
# Search for the text in the PDFs
# Note: -like is case-insensitive and treats * ? [ ] in the search text as wildcards
$search_value = "*"+$search_value+"*"
$all_pdfs = Get-ChildItem -Recurse -Path $main_url | Where-Object { $_.Extension -eq ".pdf" }
foreach ($pdf in $all_pdfs) {
    $file = $pdf.FullName
    # Wrap in @() so a single-page PDF still yields an array of pages, not a bare string
    $pages = @(convert-PDFtoText $file)
    # Check every page, including the last one
    for ($i = 0; $i -lt $pages.Count; $i++) {
        if ($pages[$i] -like $search_value) {
            $page = $i + 1
            Write-Host ($file + " > Page " + $page)
        }
    }
}
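One thing to keep in mind: because the script compares text with -like, the characters * ? [ ] in your search text act as wildcards rather than literal characters. If you need to search for text that contains them, one option is to escape the search text first with the static Escape method of PowerShell's WildcardPattern class. The sketch below is an optional variation, not part of the script above, and the sample page texts and variable names are made up for illustration.

```powershell
# Hypothetical page texts and search term, used only for illustration
$pages = @("Page 1 of 3", "Total [1]")
$searchText = "[1]"

# Unescaped: [1] is read as "any one character from the set {1}"
$rawPattern = "*" + $searchText + "*"
$rawHits = @($pages | Where-Object { $_ -like $rawPattern })

# Escaped: the brackets are matched literally
$escaped = [System.Management.Automation.WildcardPattern]::Escape($searchText)
$literalPattern = "*" + $escaped + "*"
$literalHits = @($pages | Where-Object { $_ -like $literalPattern })

"Unescaped matches: $($rawHits.Count); escaped matches: $($literalHits.Count)"
```

Without escaping, any page containing the digit 1 counts as a match; with escaping, only the page that actually contains "[1]" does.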
Why This Script is Useful
- No Technical Skills Needed: You don’t have to be an IT expert to use this script. Just follow the simple instructions, and the script will do the hard work for you.
- Time-Saving: Instead of opening and searching each PDF manually, the script checks every file for you in a single run.
- Accessible: Since you can run it directly from the internet, there’s no need to install software yourself or deal with complicated setups. The only thing the script downloads is the itextsharp.dll library, which it saves to C:\Temp\.
Conclusion
By following these steps, you can easily search through multiple PDF files without needing any advanced technical skills. This PowerShell script is designed to be simple and efficient, making your document management tasks much easier. Give it a try, and streamline your workflow today!
