How to create a batch processor with Python and Regex to replace HTML tags (parsing) | Neculai Fantanaru


On June 16, 2021, in Leadership and Attitude, by Neculai Fantanaru

You can view the full code here: HTTPS://帕萨特斌.com/我num5q公5

Below is a sample of the HTML page that the Python code will modify. Copy the following text into an .html file and save it in C:\Folder1

   

<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="ro">
<head>
<title>How to create a batch processor with Python and Regex to replace HTML tags (parsing) | Neculai Fantanaru</title>
<link rel="canonical" href="https://MY-WEBSITE.COM" />
<meta name="description" content="I LOVE HTML and CSS"/>

<meta name="keywords" content="abordarea frontala a lucrurilor neelucidate"/>
<meta name="abstract" content="My laptop works just fine"/>
<meta name="Subject" content="I think I need a new car."/>
<meta property="og:url" content="https://otherwebsite.com"/>
<meta property="og:title" content="Nobody is here?" />
<meta property="og:description" content="Dance is my passion."/>
</head>
<body>
</body>
</html>





The Python code below parses each page and copies the content of some HTML tags into other tags. The <title> text goes into og:title, keywords, abstract, Subject and the JSON-LD "headline" and "keywords" fields; the canonical link goes into og:url and "@id"; the meta description goes into og:description and the JSON-LD "description" field. You only need to fill in the <meta name="description" ... /> tag.

    import os
    import re

    # Folder with the source pages (values are read from these files)
    english_folder1 = r"c:\Folder1"
    # Folder with the pages to update (values are written into these files)
    english_folder2 = r"c:\Folder1"
    extension_file = ".html"
    # True: save the results in a new "parsed" subfolder.
    # False: save them in the same folder with a "parsed_" prefix.
    use_parse_folder = True
    # True: also copy the blocks from <!-- ARTICOL START --> to <!-- ARTICOL FINAL --> and so on
    copy_comment_blocks = False

    COMMENT_BLOCKS = [
        '<!-- ARTICOL START -->.+<!-- ARTICOL FINAL -->',
        '<!-- FLAGS_1 -->.+<!-- FLAGS -->',
        '<!-- MENIU BARA SUS -->.+<!-- SFARSIT MENIU BARA SUS -->',
    ]


    def first_match(pattern, text, group=0, flags=0):
        """Return one group of the first match, or None when nothing matches."""
        match = re.search(pattern, text, flags)
        return match[group] if match else None


    def set_content(page, tag_pattern, value):
        """Set content="..." on the first tag that matches tag_pattern."""
        tag = first_match(tag_pattern, page)
        if tag is None or value is None:
            return page
        # The lambda keeps backslashes in value from being read as regex escapes.
        new_tag = re.sub(r'content=".*"', lambda _m: f'content="{value}"', tag)
        return page.replace(tag, new_tag)


    def set_json_value(page, key, value, comma=True):
        """Set the value on the first '"key": ...' line of the JSON-LD block."""
        line = first_match(f'"{key}":.+', page)
        if line is None or value is None:
            return page
        return page.replace(line, f'"{key}": "{value}"' + (',' if comma else ''))


    print('Going through english folder')
    for filename in os.listdir(english_folder1):
        print(filename)
        if filename in ('y_key_e479323ce281e459.html', 'TS_4fg4_tr78.html'):
            continue
        if not filename.endswith(extension_file):
            continue
        with open(os.path.join(english_folder1, filename), encoding='utf-8') as f:
            html = f.read()
        try:
            with open(os.path.join(english_folder2, filename), encoding='utf-8') as f:
                en_html = f.read()
        except FileNotFoundError:
            continue

        if copy_comment_blocks:
            for pattern in COMMENT_BLOCKS:
                block = first_match(pattern, html, flags=re.DOTALL)
                if block is not None:
                    en_html = re.sub(pattern, lambda _m: block, en_html, flags=re.DOTALL)

        # title to meta
        title = first_match('<title>.+?</title>', html)
        title_content = first_match('>(.+)<', title, 1) if title else None
        for tag_pattern in ('<meta property="og:title".*?/>',
                            '<meta name="keywords".*?/>',
                            '<meta name="abstract".*?/>',
                            '<meta name="Subject".*?/>'):
            en_html = set_content(en_html, tag_pattern, title_content)
        en_html = set_json_value(en_html, 'headline', title_content)
        en_html = set_json_value(en_html, 'keywords', title_content)

        # canonical to meta og:url and @id
        canonical_content = first_match('<link rel="canonical" href="(.+?)"', html, 1)
        en_html = set_content(en_html, '<meta property="og:url".*?/>', canonical_content)
        en_html = set_json_value(en_html, '@id', canonical_content, comma=False)

        # meta description to og:description and description
        meta = first_match('<meta name="description".*?/>', html)
        meta_description = first_match('<meta name="description" content="(.+?)"', html, 1)
        en_html = set_content(en_html, '<meta property="og:description".*?/>', meta_description)
        en_html = set_json_value(en_html, 'description', meta_description)
        if meta is not None:
            en_html = re.sub('<meta name="description".*?/>', lambda _m: meta, en_html)
        if title is not None:
            en_html = re.sub('<title>.+?</title>', lambda _m: title, en_html)

        print(f'{filename} parsed')
        if use_parse_folder:
            parsed_folder = os.path.join(english_folder2, 'parsed')
            os.makedirs(parsed_folder, exist_ok=True)
            out_path = os.path.join(parsed_folder, 'parsed_' + filename)
        else:
            out_path = os.path.join(english_folder2, 'parsed_' + filename)
        with open(out_path, 'w', encoding='utf-8') as new_html:
            new_html.write(en_html)
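To see one step in isolation, the title-to-og:title copy can be tried on a string instead of on files. This is only a minimal sketch: the function name `copy_title_to_og_title` and the sample string are assumptions made for illustration, not part of the published script. It passes a function to `re.sub` so that a backslash in the title is inserted literally.

```python
import re


def copy_title_to_og_title(html):
    """Copy the text of <title> into the content of <meta property="og:title">."""
    title_match = re.search(r'<title>(.*?)</title>', html, flags=re.DOTALL)
    if title_match is None:
        return html  # no title, nothing to copy
    title_text = title_match.group(1).strip()

    # A function replacement inserts title_text literally, so backslashes
    # in the title are not interpreted as regex escapes.
    def replace_content(match):
        return f'{match.group(1)}{title_text}{match.group(2)}'

    return re.sub(
        r'(<meta\s+property="og:title"\s+content=")[^"]*(")',
        replace_content,
        html,
    )


# Hypothetical sample input
sample = (
    '<title>I LOVE HTML and CSS</title>\n'
    '<meta property="og:title" content="Nobody is here?" />\n'
)
print(copy_title_to_og_title(sample))
```

The same pattern (find the source value once, then rewrite only the `content` attribute of the target tag) applies to the other tags handled by the script.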

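Optional. This is a regular expression that changes the "keywords" tag in the HTML page by adding a comma after each word.

In Notepad++, press Ctrl+F and check the "Regular expression" search mode.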

SEARCH: (?s)<title>.*?<\/title>.*?<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)  
REPLACE BY:  ?1\l\1:,\x20\l\2
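Python's `re` module does not support `\K` or `\G`, so the Notepad++ expression cannot be pasted into a script unchanged. A rough equivalent is sketched below, under some assumptions: the function name `comma_separate_keywords` is hypothetical, the tag is expected in the form `<meta name="keywords" content="..."`, and, unlike the Notepad++ search, a preceding `<title>` tag is not required.

```python
import re


def comma_separate_keywords(html):
    """Join the words of the keywords meta tag with ', ' and lower-case each first letter."""

    def fix(match):
        words = re.findall(r'\w+', match.group(2))
        words = [word[0].lower() + word[1:] for word in words]
        return match.group(1) + ', '.join(words) + match.group(3)

    return re.sub(r'(<meta\s+name="keywords"\s+content=")([^"]*)(")', fix, html)


tag = '<meta name="keywords" content="abordarea frontala a lucrurilor neelucidate"/>'
print(comma_separate_keywords(tag))
# <meta name="keywords" content="abordarea, frontala, a, lucrurilor, neelucidate"/>
```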

You can also try this more complex version of the code, which does more: it retrieves the data from the <title> tag and copies it, as a list of keywords, into the <meta name="keywords" content=" "/> tag. You can view the code here: https://pastebin.com/jM5zf2qS

    import os
    import re

    # Path to the english folder
    english_folder2 = r"c:\Folder1"
    extension_file = ".html"
    use_parse_folder = True

    # These connection words are ignored when copying data from the <title> tag to the <meta keywords> tag
    LISTA_CUVINTE_LEGATURA = [
        'in', 'la', 'unei', 'si', 'sa', 'se', 'de', 'prin', 'unde', 'care', 'a', 'al', 'prea', 'lui',
        'din', 'ai', 'unui', 'acei', 'un', 'doar', 'tine', 'ale', 'sau', 'dintre', 'intre', 'cu', 'ce',
        'va', 'fi', 'este', 'cand', 'o', 'cine', 'aceasta', 'ca', 'dar', 'II', 'III', 'IV', 'V', 'VI',
        'VII', 'VIII', 'to', 'was', 'your', 'you', 'is', 'are', 'iar', 'fara', 'aceasta', 'pe', 'tu',
        'nu', 'mai', 'ne', 'le', 'intr', 'cum', 'e', 'for', 'she', 'it', 'esti', 'this', 'that', 'how',
        'can', 't', 'must', 'be', 'the', 'and', 'do', 'so', 'or', 'ori', 'who', 'what', 'if', 'of',
        'on', 'i', 'we', 'they', 'them', 'but', 'where', 'by', 'an', 'on', '1', '2', '3', '4', '5',
        '6', '7', '8', '9', '0', 'made', 'make', 'my', 'me', '-', 'vom', 'voi', 'ei', 'cat', 'ar',
        'putea', 'poti', 'sunteti', 'inca', 'still', 'noi', 'l', 'ma', 's', 'dupa', 'after', 'under',
        'sub', 'niste', 'some', 'those', 'he'
    ]


    def creeaza_lista_keywords(titlu):
        # split the title in two at the vertical bar |
        prima_parte_titlu = titlu.split('|')[0]
        # extract all the words from the first part of the title
        keywords = re.findall(r'(?:\w|-*\!)+', prima_parte_titlu)
        # keep the keywords that are not in the list of connection words, in lower case
        keywords_OK = [keyword.lower() for keyword in keywords
                       if keyword not in LISTA_CUVINTE_LEGATURA]
        # return a string in which all the keywords are joined by ', '
        return ", ".join(keywords_OK)


    def first_match(pattern, text, group=0):
        """Return one group of the first match, or None when nothing matches."""
        match = re.search(pattern, text)
        return match[group] if match else None


    def set_content(page, tag_pattern, value):
        """Set content="..." on the first tag that matches tag_pattern."""
        tag = first_match(tag_pattern, page)
        if tag is None or value is None:
            return page
        # The lambda keeps backslashes in value from being read as regex escapes.
        new_tag = re.sub(r'content=".*"', lambda _m: f'content="{value}"', tag)
        return page.replace(tag, new_tag)


    def set_json_value(page, key, value, comma=True):
        """Set the value on the first '"key": ...' line of the JSON-LD block."""
        line = first_match(f'"{key}":.+', page)
        if line is None or value is None:
            return page
        return page.replace(line, f'"{key}": "{value}"' + (',' if comma else ''))


    print('Going through english folder')
    amount = 1
    for filename in os.listdir(english_folder2):
        print(filename)
        if filename in ('y_key_e479323ce281e459.html', 'directory.html'):
            continue
        if not filename.endswith(extension_file):
            continue
        # The same file is both the source and the target here
        with open(os.path.join(english_folder2, filename), encoding='utf-8') as f:
            html = f.read()
        en_html = html

        # title to meta
        title = first_match('<title>.+?</title>', html)
        title_content = first_match('>(.+)<', title, 1) if title else None
        en_html = set_content(en_html, '<meta property="og:title".*?/>', title_content)
        if title_content is not None:
            keywords = creeaza_lista_keywords(title_content)
            en_html = set_content(en_html, '<meta name="keywords".*?/>', keywords)
        en_html = set_content(en_html, '<meta name="abstract".*?/>', title_content)
        en_html = set_content(en_html, '<meta name="Subject".*?/>', title_content)
        en_html = set_json_value(en_html, 'headline', title_content)
        en_html = set_json_value(en_html, 'keywords', title_content)

        # canonical to meta og:url and @id
        canonical_content = first_match('<link rel="canonical" href="(.+?)"', html, 1)
        en_html = set_content(en_html, '<meta property="og:url".*?/>', canonical_content)
        en_html = set_json_value(en_html, '@id', canonical_content, comma=False)

        # meta description to og:description and description
        meta_description = first_match('<meta name="description" content="(.+?)"', html, 1)
        en_html = set_content(en_html, '<meta property="og:description".*?/>', meta_description)
        en_html = set_json_value(en_html, 'description', meta_description)

        print(f'{filename} parsed ({amount})')
        amount += 1
        if use_parse_folder:
            # The subfolder name and the file prefix are empty strings in this version,
            # so the original file is overwritten in place.
            out_path = os.path.join(english_folder2, filename)
        else:
            out_path = os.path.join(english_folder2, 'parsed_' + filename)
        with open(out_path, 'w', encoding='utf-8') as new_html:
            new_html.write(en_html)
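The keyword step can also be tried on its own. The sketch below is a simplified variant, not the function from the script: `build_keywords` and the short `STOP_WORDS` set are hypothetical names chosen for illustration, words are matched with a plain `\w+`, and each word is compared in lower case, so that a capitalized "How" or "The" is filtered as well.

```python
import re

# Hypothetical, shortened stop-word list for illustration only
STOP_WORDS = {'how', 'to', 'a', 'the', 'and', 'with', 'of'}


def build_keywords(title, stop_words=STOP_WORDS):
    """Turn the part of the title before '|' into a comma-separated keyword string."""
    first_part = title.split('|')[0]
    words = re.findall(r'\w+', first_part)
    # Compare in lower case so capitalized stop words are dropped too
    kept = [word.lower() for word in words if word.lower() not in stop_words]
    return ', '.join(kept)


print(build_keywords('How to Create a Batch Processor with Python | Neculai Fantanaru'))
# create, batch, processor, python
```

Whether the comparison should ignore case is a design choice: a case-insensitive check would also drop the Roman numerals listed in upper case ('II', 'III', ...) if they appear in lower case in a title.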

That's all folks.

If you like my code, please SHARE IT

You can also check the PowerShell version of the code, or the other Python versions: Version 3, Version 4, or Version 5.

