How to create a batch processor with Python and Regex to replace HTML tags (parsing)

On June 16, 2021, in Leadership and Attitude, by Neculai Fantanaru

You can view the full code here: https://pastebin.com/wnuM5Qg5

A sample HTML page that the Python code will modify. Copy the text below into an .html file and save it in C:\Folder1

   

<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="ro">
<head>
<title>How to create a batch processor with Python and Regex to replace HTML tags (parsing)</title>
<link rel="canonical" href="https://MY-WEBSITE.COM" />
<meta name="description" content="I LOVE HTML and CSS"/>

<meta name="keywords" content="abordarea frontala a lucrurilor neelucidate"/>
<meta name="abstract" content="My laptop works just fine"/>
<meta name="Subject" content="I think I need a new car."/>
<meta property="og:url" content="https://otherwebsite.com"/>
<meta property="og:title" content="Nobody is here?" />
<meta property="og:description" content="Dance is my passion."/>
</head>
<body>
</body>
</html>





The Python code below copies the contents of some HTML tags into other tags by parsing the pages. You only need to fill in the source tags: the title, the canonical link and the meta description, for example <meta name="description" content="How to create a batch processor with Python and Regex to replace HTML tags (parsing) | Neculai Fantanaru."/>. The tag patterns in the code follow the attribute layout of the sample page above; adjust them if your pages are written differently.

import os
import re

# Folder with the source pages (tags are read from here)
english_folder1 = r"c:\Folder1"
# Folder with the pages to update (tags are written here)
english_folder2 = r"c:\Folder1"

extension_file = ".html"

# True: write the results to a "parsed" subfolder;
# False: write them next to the originals in the same folder
use_parse_folder = True

# Files that must not be processed
SKIP_FILES = {'y_key_e479323ce281e459.html', 'TS_4fg4_tr78.html'}

# Set to True to also copy the blocks between these comment markers
COPY_BLOCKS = False
BLOCK_PATTERNS = [
    '<!-- ARTICOL START -->.+<!-- ARTICOL FINAL -->',
    '<!-- FLAGS_1 -->.+<!-- FLAGS -->',
    '<!-- MENIU BARA SUS -->.+<!-- SFARSIT MENIU BARA SUS -->',
]


def set_content(page, tag_pattern, value):
    """Put value into content="..." of the first tag matching tag_pattern."""
    match = re.search(tag_pattern, page)
    if match is None or value is None:
        return page  # tag or source value missing: leave the page unchanged
    new_tag = re.sub(r'content=".+"', lambda _: f'content="{value}"', match[0])
    return page.replace(match[0], new_tag)


def set_json_value(page, key, value, trailing_comma=True):
    """Rewrite a '"key": ...' line of the JSON-LD block with the new value."""
    match = re.search(f'"{re.escape(key)}":.+', page)
    if match is None or value is None:
        return page
    new_line = f'"{key}": "{value}"' + (',' if trailing_comma else '')
    return page.replace(match[0], new_line)


print('Going through english folder')
for filename in os.listdir(english_folder1):
    print(filename)
    if filename in SKIP_FILES or not filename.endswith(extension_file):
        continue

    with open(os.path.join(english_folder1, filename), encoding='utf-8') as f:
        html = f.read()
    try:
        with open(os.path.join(english_folder2, filename), encoding='utf-8') as f:
            en_html = f.read()
    except FileNotFoundError:
        continue

    if COPY_BLOCKS:
        for pattern in BLOCK_PATTERNS:
            block = re.search(pattern, html, flags=re.DOTALL)
            if block:
                en_html = re.sub(pattern, lambda _, b=block[0]: b, en_html, flags=re.DOTALL)

    # Source values: <title>, canonical link and meta description
    title = re.search(r'<title>.+</title>', html)
    title_content = re.search('>(.+)<', title[0])[1] if title else None
    canonical = re.search(r'<link rel="canonical" href="(.+?)"\s*/>', html)
    canonical_content = canonical[1] if canonical else None
    meta = re.search(r'<meta name="description" content="(.+?)"\s*/>', html)
    meta_description = meta[1] if meta else None

    # title to meta
    for tag in ('property="og:title"', 'name="keywords"', 'name="abstract"', 'name="Subject"'):
        en_html = set_content(en_html, rf'<meta {tag} content=".+?"\s*/>', title_content)
    en_html = set_json_value(en_html, 'headline', title_content)
    en_html = set_json_value(en_html, 'keywords', title_content)

    # canonical to meta og:url and @id
    en_html = set_content(en_html, r'<meta property="og:url" content=".+?"\s*/>', canonical_content)
    en_html = set_json_value(en_html, '@id', canonical_content, trailing_comma=False)

    # meta description to og:description and description
    en_html = set_content(en_html, r'<meta property="og:description" content=".+?"\s*/>', meta_description)
    en_html = set_json_value(en_html, 'description', meta_description)

    # finally copy the <meta description> and <title> tags themselves
    if meta:
        en_html = re.sub(r'<meta name="description" content=".+?"\s*/>', lambda _: meta[0], en_html)
    if title:
        en_html = re.sub(r'<title>.+</title>', lambda _: title[0], en_html)

    print(f'{filename} parsed')
    if use_parse_folder:
        out_dir = os.path.join(english_folder2, 'parsed')
        os.makedirs(out_dir, exist_ok=True)
    else:
        out_dir = english_folder2
    with open(os.path.join(out_dir, 'parsed_' + filename), 'w', encoding='utf-8') as new_html:
        new_html.write(en_html)
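
Before running the script over a whole folder, the title-to-meta step can be tried on a single string. The snippet below is only a minimal sketch: `copy_title_to_og_title` is a hypothetical helper name, and the patterns assume the tags are written as in the sample page.

```python
import re


def copy_title_to_og_title(page):
    """Copy the text of <title> into the og:title meta tag of the same page."""
    title = re.search(r'<title>(.+?)</title>', page)
    if title is None:
        return page  # nothing to copy
    # A function is used as the replacement so that backslashes in the
    # title are inserted literally instead of being read as regex escapes.
    return re.sub(
        r'(<meta property="og:title" content=")[^"]*(")',
        lambda m: m.group(1) + title.group(1) + m.group(2),
        page,
    )


sample = (
    '<title>My page</title>\n'
    '<meta property="og:title" content="Nobody is here?" />\n'
)
result = copy_title_to_og_title(sample)
print(result)
```

If the printed og:title tag carries the title text, the same pattern should behave the same way inside the folder loop.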

Optional. Here is a REGEX expression that will change the "KEYWORDS" tag in the html page, adding a comma after each word.

Use it with Notepad++ -> Ctrl + F -> Check: Regular expression

SEARCH: (?s)<title>.*?<\/title>.*?<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)  
REPLACE BY:  ?1\l\1:,\x20\l\2
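
For readers who prefer to stay in Python, a rough equivalent of this search and replace is sketched below. It is an approximation, not a translation of the Notepad++ expression: `comma_separate_keywords` is a hypothetical name, it does not require a `<title>` tag earlier in the page, and it assumes the keywords tag is written as in the sample page.

```python
import re


def comma_separate_keywords(page):
    """Rewrite the keywords meta content as words joined by ', '."""
    def fix(match):
        words = re.findall(r'\w+', match.group(2))
        # Lowercase only the first letter of each word, like \l does.
        words = [w[0].lower() + w[1:] for w in words]
        return match.group(1) + ', '.join(words) + match.group(3)

    return re.sub(r'(<meta name="keywords" content=")([^"]*)(")', fix, page)


tag = '<meta name="keywords" content="abordarea frontala a lucrurilor neelucidate"/>'
print(comma_separate_keywords(tag))
# -> <meta name="keywords" content="abordarea, frontala, a, lucrurilor, neelucidate"/>
```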

You can try this more complex version of the code, which does more: it extracts the data from the <title> tag and copies it into the keywords tag, for example <meta name="keywords" content="hvernig, á, að, búa, til, hópur, örgjörva, með, python, og, regex, til, að, skipta, um, html, tags, parsing"/>. You can see the code here:

https://pastebin.com/jM5zf2qS

import os
import re

# Folder with the pages; this version reads and writes the same folder
english_folder2 = r"c:\Folder1"

extension_file = ".html"

# True: overwrite each file in place; False: write a "parsed_" copy next to it
use_parse_folder = True

# Files that must not be processed
SKIP_FILES = {'y_key_e479323ce281e459.html', 'directory.html'}

# These linking words are ignored when copying words from the <title> tag
# to the <meta keywords> tag
LISTA_CUVINTE_LEGATURA = [
    'in', 'la', 'unei', 'si', 'sa', 'se', 'de', 'prin', 'unde', 'care', 'a', 'al',
    'prea', 'lui', 'din', 'ai', 'unui', 'acei', 'un', 'doar', 'tine', 'ale', 'sau',
    'dintre', 'intre', 'cu', 'ce', 'va', 'fi', 'este', 'cand', 'o', 'cine',
    'aceasta', 'ca', 'dar', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'to',
    'was', 'your', 'you', 'is', 'are', 'iar', 'fara', 'aceasta', 'pe', 'tu', 'nu',
    'mai', 'ne', 'le', 'intr', 'cum', 'e', 'for', 'she', 'it', 'esti', 'this',
    'that', 'how', 'can', 't', 'must', 'be', 'the', 'and', 'do', 'so', 'or', 'ori',
    'who', 'what', 'if', 'of', 'on', 'i', 'we', 'they', 'them', 'but', 'where',
    'by', 'an', 'on', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', 'made',
    'make', 'my', 'me', '-', 'vom', 'voi', 'ei', 'cat', 'ar', 'putea', 'poti',
    'sunteti', 'inca', 'still', 'noi', 'l', 'ma', 's', 'dupa', 'after', 'under',
    'sub', 'niste', 'some', 'those', 'he',
]
# Lowercased copy, so that capitalised title words are filtered as well
CUVINTE_LEGATURA = {cuvant.lower() for cuvant in LISTA_CUVINTE_LEGATURA}


def creeaza_lista_keywords(titlu):
    """Build the keywords string from the part of the title before the '|' bar."""
    prima_parte_titlu = titlu.split('|')[0]
    # extract all the words from the first part of the title
    keywords = re.findall(r'(?:\w|-*\!)+', prima_parte_titlu)
    # keep, in lowercase, the words that are not linking words
    keywords_OK = [k.lower() for k in keywords if k.lower() not in CUVINTE_LEGATURA]
    # return one string with all keywords joined by ', '
    return ", ".join(keywords_OK)


def set_content(page, tag_pattern, value):
    """Put value into content="..." of the first tag matching tag_pattern."""
    match = re.search(tag_pattern, page)
    if match is None or value is None:
        return page  # tag or source value missing: leave the page unchanged
    new_tag = re.sub(r'content=".+"', lambda _: f'content="{value}"', match[0])
    return page.replace(match[0], new_tag)


def set_json_value(page, key, value, trailing_comma=True):
    """Rewrite a '"key": ...' line of the JSON-LD block with the new value."""
    match = re.search(f'"{re.escape(key)}":.+', page)
    if match is None or value is None:
        return page
    new_line = f'"{key}": "{value}"' + (',' if trailing_comma else '')
    return page.replace(match[0], new_line)


print('Going through english folder')
amount = 1
for filename in os.listdir(english_folder2):
    print(filename)
    if filename in SKIP_FILES or not filename.endswith(extension_file):
        continue

    with open(os.path.join(english_folder2, filename), encoding='utf-8') as f:
        html = f.read()
    en_html = html  # source and target are the same page in this version

    # Source values: <title>, canonical link and meta description
    title = re.search(r'<title>.+</title>', html)
    title_content = re.search('>(.+)<', title[0])[1] if title else None
    canonical = re.search(r'<link rel="canonical" href="(.+?)"\s*/>', html)
    canonical_content = canonical[1] if canonical else None
    meta = re.search(r'<meta name="description" content="(.+?)"\s*/>', html)
    meta_description = meta[1] if meta else None

    # title to meta (the keywords tag gets the filtered word list)
    keywords = creeaza_lista_keywords(title_content) if title_content else None
    en_html = set_content(en_html, r'<meta property="og:title" content=".+?"\s*/>', title_content)
    en_html = set_content(en_html, r'<meta name="keywords" content=".+?"\s*/>', keywords)
    en_html = set_content(en_html, r'<meta name="abstract" content=".+?"\s*/>', title_content)
    en_html = set_content(en_html, r'<meta name="Subject" content=".+?"\s*/>', title_content)
    en_html = set_json_value(en_html, 'headline', title_content)
    en_html = set_json_value(en_html, 'keywords', title_content)

    # canonical to meta og:url and @id
    en_html = set_content(en_html, r'<meta property="og:url" content=".+?"\s*/>', canonical_content)
    en_html = set_json_value(en_html, '@id', canonical_content, trailing_comma=False)

    # meta description to og:description and description
    en_html = set_content(en_html, r'<meta property="og:description" content=".+?"\s*/>', meta_description)
    en_html = set_json_value(en_html, 'description', meta_description)

    print(f'{filename} parsed ({amount})')
    amount += 1
    out_name = filename if use_parse_folder else 'parsed_' + filename
    with open(os.path.join(english_folder2, out_name), 'w', encoding='utf-8') as new_html:
        new_html.write(en_html)
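
To see what the keyword step produces for one title, here is a small stand-alone sketch. It is a variant, not the code above: `STOP_WORDS` is a short hypothetical list used only for illustration, `keywords_from_title` is a made-up name, and the removal of repeated words is an extra assumption that the original script does not make.

```python
import re

# Short, hypothetical stop-word list for illustration only
STOP_WORDS = {'how', 'to', 'a', 'with', 'and', 'the'}


def keywords_from_title(title):
    """Return lowercase, de-duplicated keywords from the text before '|'."""
    first_part = title.split('|')[0]
    words = [w.lower() for w in re.findall(r'\w+', first_part)]
    kept = [w for w in words if w not in STOP_WORDS]
    # dict.fromkeys drops repeated words while keeping the original order
    return ', '.join(dict.fromkeys(kept))


print(keywords_from_title('Python tips and Python tricks | Neculai Fantanaru'))
# -> python, tips, tricks
```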

That's all folks.

If you like my code, please SHARE IT

You can also see the code in PowerShell, or in other Python versions: Version 3, Version 4 or Version 5.

