You can view the full code here: https://pastebin.com/XgNqJqS7
Below is an example of an HTML page that the PowerShell code will modify. Copy the text below into an .html file and save it in the folder C:\Folder1.
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="ro">
<head>
<title>YOUR FIRST PAGE</title>
<link rel="canonical" href="https://MY-WEBSITE.COM" />
<meta name="description" content="I LOVE HTML and CSS"/>
<meta name="keywords" content="abordarea frontala a lucrurilor neelucidate"/>
<meta name="abstract" content="My laptop works just fine"/>
<meta name="Subject" content="I think I need a new car."/>
<meta property="og:url" content="https://otherwebsite.com"/>
<meta property="og:title" content="Nobody is here?" />
<meta property="og:description" content="Dance is my passion."/>
<!-- Schema Org Start -->
<script type="application/ld+json">
{
  "@context":"https://schema.org",
  "@type":"Article",
  "mainEntityOfPage": {
    "@type": "WebPage",
    "@id": "https://books-and-reading.com"
  },
  "headline": "Another glass",
  "keywords": "anything, words",
  "description": "My name is Prince.",
  "image": {
    "@type": "ImageObject",
    "url": "https://website.com/icon-facebook.jpg"
  }
}
</script>
The PowerShell code below copies the content of some HTML tags into the other tags, using regular-expression replacements. Only the source tags need to be filled in: <title>, <meta name="description" ... /> and <link rel="canonical" ... />. The script fills in the rest.
$sourcedir  = "C:\Folder1\"
$resultsdir = "C:\Folder1\"   # same folder as the source: the original files are overwritten

Get-ChildItem -Path $sourcedir -Filter *.html | ForEach-Object {
    # Read the whole file as one string; UTF-8 keeps diacritics intact
    $content = Get-Content -Path $_.FullName -Raw -Encoding UTF8

    # Copy the content of the tag <link rel="canonical" into the tag "og:url" and into "@id":
    # [^"]* stops at the first closing quote, so only the attribute value is captured/replaced
    $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<link rel="canonical" href=")[^"]*').Matches.Value
    $content = $content -replace '(?<=<meta property="og:url" content=")[^"]*', $replaceValue
    $content = $content -replace '(?<="@id": ")[^"]*', $replaceValue

    # Copy the content of the tag <title> into the tags ABSTRACT, SUBJECT, OG:TITLE, HEADLINE, KEYWORDS
    $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<title>).+(?=</title>)').Matches.Value
    $content = $content -replace '(?<=<meta property="og:title" content=")[^"]*', $replaceValue
    $content = $content -replace '(?<=<meta name="abstract" content=")[^"]*', $replaceValue
    $content = $content -replace '(?<=<meta name="keywords" content=")[^"]*', $replaceValue
    $content = $content -replace '(?<=<meta name="Subject" content=")[^"]*', $replaceValue
    $content = $content -replace '(?<="headline": ")[^"]*', $replaceValue
    $content = $content -replace '(?<="keywords": ")[^"]*', $replaceValue

    # Copy the content of the tag <meta name="description" into "og:description" and into the JSON-LD "description":
    $replaceValue = (Select-String -InputObject $content -Pattern '(?<=<meta name="description" content=")[^"]*').Matches.Value
    $content = $content -replace '(?<=<meta property="og:description" content=")[^"]*', $replaceValue
    $content = $content -replace '(?<="description": ")[^"]*', $replaceValue

    # Write the result; Join-Path avoids a doubled backslash in the output path
    Set-Content -Path (Join-Path $resultsdir $_.Name) -Value $content -Encoding UTF8
}
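For readers who prefer Python, the same copy step can be sketched with the standard `re` module. This is a minimal sketch that mirrors the PowerShell logic above. It is not the author's Python version. The names `RULES` and `sync_tags` are hypothetical, and the sketch assumes every attribute value is wrapped in double quotes, as in the example page.

```python
import re

# Hypothetical rule table: (pattern capturing the source value, prefixes of the targets to overwrite).
# Mirrors the PowerShell script; not the author's Python version.
RULES = [
    (r'<link rel="canonical" href="([^"]*)"',
     ['<meta property="og:url" content="', '"@id": "']),
    (r'<title>(.+?)</title>',
     ['<meta property="og:title" content="', '<meta name="abstract" content="',
      '<meta name="keywords" content="', '<meta name="Subject" content="',
      '"headline": "', '"keywords": "']),
    (r'<meta name="description" content="([^"]*)"',
     ['<meta property="og:description" content="', '"description": "']),
]


def sync_tags(html: str) -> str:
    """Copy each source value into every target attribute or JSON-LD field."""
    for source_pattern, target_prefixes in RULES:
        source = re.search(source_pattern, html, flags=re.IGNORECASE)
        if source is None:
            continue  # source tag missing: leave its targets unchanged
        value = source.group(1)
        for prefix in target_prefixes:
            # The fixed-width lookbehind keeps the prefix; [^"]* stops at the closing quote.
            target = '(?<=' + re.escape(prefix) + ')[^"]*'
            # A function replacement inserts `value` literally (no backslash escapes).
            html = re.sub(target, lambda _m: value, html, flags=re.IGNORECASE)
    return html
```

To process a whole folder, one could read each `*.html` file with `pathlib.Path.read_text(encoding="utf-8")`, pass the text to `sync_tags`, and write the result back with `write_text`. One design difference from the PowerShell script: if a source tag is missing, this sketch leaves its targets alone instead of blanking them.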
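Optional: here is a regular expression that modifies the "keywords" tag in the HTML page. It puts a comma between the words and lowercases the first letter of each word.
Use it in Notepad++: Ctrl+F -> Replace tab -> Search Mode: Regular expression -> Replace All (\K only works with Replace All).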
SEARCH: (?s)<title>.*?<\/title>.*?<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)
REPLACE WITH: ?1\l\1:,\x20\l\2
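The same keywords clean-up can also be done outside the editor. The sketch below uses Python's `re` module in place of Notepad++'s `\K`/`\G` syntax, and the function name `comma_separate_keywords` is hypothetical. It rewrites only the value of the first `<meta name="keywords" ...>` tag and, as a simplification, does not check that this tag comes after `<title>`. Every run of separators between two words becomes ", ". The first letter of each word is lowercased, as `\l` does. Because `\l` changes only the first character, an all-caps value such as YOUR FIRST PAGE becomes yOUR, fIRST, pAGE.

```python
import re

# Captures: the attribute prefix, the keywords value, and the closing quote.
KEYWORDS_ATTR = re.compile(r'(<meta name="keywords" content=")([^"]*)(")')


def comma_separate_keywords(html: str) -> str:
    """Turn the first keywords value into a comma-separated list (hypothetical helper)."""
    def fix_value(m: re.Match) -> str:
        value = m.group(2)
        # Between two words, replace any run of separators (spaces, punctuation) with ", "
        value = re.sub(r'(?<=\w)[^\w\r\n]+(?=\w)', ', ', value)
        # Lowercase only the first letter of each word, like \l in the Notepad++ replacement
        value = re.sub(r'\b\w', lambda w: w.group(0).lower(), value)
        return m.group(1) + value + m.group(3)

    # count=1: only the first keywords tag is changed
    return KEYWORDS_ATTR.sub(fix_value, html, count=1)
```

For example, `content="I think I need a new car."` becomes `content="i, think, i, need, a, new, car."`. The trailing period is kept because no word follows it.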
That's all folks.
If you like my code, please SHARE IT
You can also view the Python version of the code.
Also take a look at this translation code: BeautifulSoup Library, or at this code: DEEPL+API Key. There are also VERSION 2, VERSION 3, VERSION 4 and VERSION 5 of this code.