Python: Create multiple HTML files from text files and optimize the tags

On January 22, 2022, in Lideri şi Atitudine, by Neculai Fantanaru

You can view the full code here.

Install Python. Then install the following two libraries from the Command Prompt (CMD) in Windows 10:

py -m pip install unidecode
py -m pip install nltk
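A small standard-library check like the one below can confirm that both installs worked before you run the script. It is a sketch and not part of the original code. The names missing_modules and REQUIRED are hypothetical. The check only asks the current interpreter whether the modules unidecode and nltk can be found, without importing them.

The script itself also contains the commented-out line nltk.download('punkt'). This suggests the NLTK sentence-tokenizer data has to be downloaded once as well.

```python
import importlib.util

# The two third-party packages installed above with "py -m pip install ...".
REQUIRED = ["unidecode", "nltk"]


def missing_modules(names):
    """Return the top-level module names the current interpreter cannot find."""
    # find_spec() returns None for a top-level module that is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
        print("Install with: py -m pip install " + " ".join(missing))
    else:
        print("unidecode and nltk are installed.")
```

Run it with the same py launcher you use for the script. If several Python versions are installed, pip may have put the packages into a different one.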

You will need the following:

1. Create a folder named files_html (the text files will be saved here as HTML).

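2. Create a folder named LINKS. In it, create a file named links.txt and write in it, one under the other, the HTML links that will be inserted on matching keywords in the body of the new HTML pages. The script reads this file from C:\Folder1\LINKS\links.txt (the CALE_FISIER_LINKURI constant), so adjust that path to your own folder.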

3. You need one HTML file named oana.html, with this structure:

<title>Blah Blah Blah</title>

<meta name="description" content="Blah Blah Blah.">

<h3 class="font-weight-normal">TITLE OF THE ARTICLE</h3>

    

<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit, 
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. 
Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris
nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in 
reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
 pariatur. Excepteur sint occaecat cupidatat non proident, sunt in 
 culpa qui officia deserunt mollit anim id est laborum.</p>

    

4. Put all the text files and the oana.html file in the main folder.
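The setup steps above can also be scripted. The sketch below is not from the original article. It does two things:

- It creates files_html and LINKS/links.txt under a base folder of your choice.
- It turns links.txt into a keyword-to-URL dictionary of the kind the script builds. Each line is a file name, the name is split on "-", and linking words are dropped.

The base folder, the example file names and the shortened STOP_WORDS set are assumptions for illustration only. The real script uses its full LISTA_CUVINTE_LEGATURA list and reads C:\Folder1\LINKS\links.txt.

```python
import tempfile
from pathlib import Path

# Same base URL as the SITE constant in the script; replace it with your own site.
SITE = "https://neculaifantanaru.com/"
# Hypothetical, shortened stop-word set; the script uses its full LISTA_CUVINTE_LEGATURA list.
STOP_WORDS = {"in", "la", "si", "de", "the", "and", "of", "to"}


def prepare_layout(base_dir):
    """Create files_html/ and LINKS/links.txt under base_dir; return the links.txt path."""
    base = Path(base_dir)
    (base / "files_html").mkdir(parents=True, exist_ok=True)
    links_dir = base / "LINKS"
    links_dir.mkdir(parents=True, exist_ok=True)
    links_file = links_dir / "links.txt"
    links_file.touch(exist_ok=True)  # creates it empty; an existing file is left as is
    return links_file


def keywords_from_links(links_file):
    """Map every keyword in the link file names to the full URLs that contain it."""
    mapping = {}
    for line in Path(links_file).read_text(encoding="utf8").splitlines():
        name = line.strip()
        if not name:
            continue  # ignore blank lines
        slug = name.split(".")[0]  # "leadership-and-vision.html" -> "leadership-and-vision"
        for word in slug.split("-"):
            if word and word not in STOP_WORDS:
                urls = mapping.setdefault(word, [])
                if SITE + name not in urls:
                    urls.append(SITE + name)
    return mapping


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        links = prepare_layout(tmp)
        links.write_text("leadership-and-vision.html\nvision-of-the-leader.html\n", encoding="utf8")
        for word, urls in keywords_from_links(links).items():
            print(word, urls)
```

With the two example lines, "vision" maps to both URLs. "and", "of" and "the" are dropped as linking words.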

WHAT DOES THE CODE DO?

1. Takes the first 10 words of each text file and saves the file as an HTML page whose link (file name) is built from those 10 words.

2. Takes the first 10 words of each text file and copies them into the title tag and the h3 tag.</p> <p class="text_obisnuit2">3. Takes the first 20 words of each text file and copies them into the meta description tag.</p> <p class="text_obisnuit2">4. Copies the entire content of the text file into the section between the article start and article end comment markers of the HTML file (replacing the text currently there).</p> <p class="text_obisnuit2">5. Names the new HTML file after the first 10 words of the text file.</p> <p class="text_obisnuit2">6. Checks whether the keywords of the links listed in <span class="style16">links.txt</span> appear in the text. If they do, it picks one of those words at random in the body of the new HTML page and turns it into a link. 
(Linking words such as "and, who, what, when" are excluded because they are not keywords.)</p> </div> <p class="text_obisnuit"></p> <p class="text_obisnuit"><span class="titlu_text_dreapta">The code:</span> <span class="text_obisnuit2">Copy the code below and run it in any Python interpreter</span> (I use <a href="https://sourceforge.net/projects/pyscripter/" target="_new">PyScripter</a>).</p> <p class="text_obisnuit"></p> <div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .4em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">#-------------------------------------------------------------------------------
# Name: Create html files from text files
# Purpose:
#
# Author: Neculai Fantanaru
#
# Created: 22/01/2022
# Copyright: (c) Neculai Fantanaru 2022
#-------------------------------------------------------------------------------

import os
import re
import random
import unidecode
import nltk
from nltk import tokenize
# nltk.download('punkt')  # run once to download the sentence-tokenizer data

SITE = 'https://neculaifantanaru.com/'

# Linking words (Romanian and English), numerals and symbols that must never become links
LISTA_CUVINTE_LEGATURA = [
    'in', 'la', 'unei', 'si', 'sa', 'se', 'de', 'prin', 'unde', 'care', 'a', 'al',
    'prea', 'lui', 'din', 'ai', 'unui', 'acei', 'un', 'doar', 'tine', 'ale', 'sau',
    'dintre', 'intre', 'cu', 'ce', 'va', 'fi', 'este', 'cand', 'o', 'cine', 'aceasta',
    'ca', 'dar', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'to', 'was', 'your',
    'you', 'is', 'are', 'iar', 'fara', 'asta', 'pe', 'tu', 'nu', 'mai', 'ne', 'le',
    'intr', 'cum', 'e', 'for', 'she', 'it', 'esti', 'this', 'that', 'how', 'can', 't',
    'must', 'be', 'the', 'and', 'do', 'so', 'or', 'ori', 'who', 'what', 'if', 'of',
    'on', 'i', 'we', 'they', 'them', 'but', 'where', 'by', 'an', 'mi', '1', '2', '3',
    '4', '5', '6', '7', '8', '9', '0', 'made', 'my', 'me', '-', 'vom', 'voi', 'ei',
    'cat', 'ar', 'putea', 'poti', 'sunteti', 'inca', 'still', 'noi', 'l', 'ma', 's',
    'dupa', 'after', 'under', 'sub', 'niste', 'some', 'those', 'he', 'no', 'too',
    'fac', 'made', 'make', 'cei', 'most', 'face', 'pentru', 'cat', 'cate', 'much',
    'more', 'many', 'sale', 'tale', 'tau', 'has', 'sunt', 'his', 'yours', 'only',
    'as', 'toate', 'all', 'tot', 'incat', 'which', 'ti', 'asa', 'like', 'these',
    'because', 'unor', 'caci', 'ele', 'have', 'haven', 'te', 'cea', 'else', 'imi',
    'iti', 'should', 'could', 'not', 'even', 'chiar', 'when', 'ci', 'ne', 'ni', 'her',
    'our', 'alta', 'another', 'other', 'decat', 'acelasi', 'same', 'au', 'had',
    'haven', 'hasn', 'alte', 'alt', 'others', 'ceea', 'cel', 'cele', 'alte', 'despre',
    'about', 'acele', 'acel', 'acea', 'decit', 'with', '_', 'fata', 'towards',
    'against', 'cind', 'dinspre', 'fost', 'been', 'era'
]

PATTERN_LINK = "<a href=\"{}\" target=\"_new\">{}</a>"

# Structure of the keyword dictionary:
# {
#     "word1": [list_of_links1],
#     "word2": [list_of_links2]
# }
CALE_FISIER_LINKURI = "C:\\Folder1\\LINKS\\links.txt"


def preia_cuvinte_link(link):
    """Return the keywords of a link: the words of its file name, minus linking words."""
    cuvinte = link.split('.')[0]  # "word1-word2.html" -> "word1-word2"
    cuvinte = cuvinte.split('-')
    cuvinte_ok = list()
    for cuv in cuvinte:
        if cuv not in LISTA_CUVINTE_LEGATURA:
            cuvinte_ok.append(cuv)
    return cuvinte_ok


def preia_cuvinte_lista_linkuri(cale_fisier_linkuri):
    """Build the keyword list and the keyword -> links dictionary from links.txt."""
    lista_cuvinte_linkuri = list()
    dictionar_cuvinte_linkuri = dict()
    with open(cale_fisier_linkuri, encoding='utf8') as fp:
        lines = fp.readlines()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in links.txt
        link = SITE + line.strip()
        # preia_cuvinte_link returns the keywords, saved in cuvinte_link
        cuvinte_link = preia_cuvinte_link(line.strip())
        for cuv in cuvinte_link:
            if cuv in dictionar_cuvinte_linkuri:
                if link not in dictionar_cuvinte_linkuri[cuv]:
                    dictionar_cuvinte_linkuri[cuv].append(link)
            else:
                dictionar_cuvinte_linkuri[cuv] = [link]
        lista_cuvinte_linkuri.extend(cuvinte_link)
    lista_cuvinte_linkuri = list(set(lista_cuvinte_linkuri))
    return lista_cuvinte_linkuri, dictionar_cuvinte_linkuri


def citeste_fisier_linie_cu_linie(cale_fisier):
    """Print every line of a file together with its index (debugging helper)."""
    with open(cale_fisier, encoding='utf8') as fp:
        for count, line in enumerate(fp):
            print(count, line.strip())


def read_text_from_file(file_path):
    """
    Return the content of a file.
    file_path: path of the file you want to read
    """
    with open(file_path, encoding='utf8') as f:
        text = f.read()
    return text


def write_to_file(text, file_path):
    """
    Write a text into a file.
    text: the text you want to write
    file_path: path of the file you want to write to
    """
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))


def split_propozitii(text):
    """Split the text into sentences and wrap every 7 sentences in a paragraph tag."""
    # 01.02.2022: use a library (nltk) to extract the sentences
    propozitii = tokenize.sent_tokenize(text)
    # 01.02.2022: strip extra spaces at the start/end of each sentence and make the first
    # letter uppercase (capitalize() would also lowercase the rest of the sentence)
    propozitii = [prop.strip() for prop in propozitii]
    propozitii = [prop[:1].upper() + prop[1:] for prop in propozitii if prop]
    # 01.02.2022: remove the extra space before the final punctuation mark,
    # e.g. "ana are mere ?" => "ana are mere?"
    propozitii = [prop[:-1].strip() + prop[-1] for prop in propozitii]
    # 31.01.2022: changed the p tag and added CSS (4)
    tag = "<p class=\"mb-40px\">{}</p>"
    text_start_final = ""
    numar_propozitii_grup = 7  # sentences per paragraph
    numar_grupuri = len(propozitii) // numar_propozitii_grup
    start = 0
    LINK_INTRODUS = 0  # set to 1 once a link has been inserted (one link per page)
    for numar_grup in range(numar_grupuri):
        # 31.01.2022: fixed bug (1)
        text_tag = " ".join(propozitii[start:(start + numar_propozitii_grup)])
        # the link goes into a middle paragraph, never the first or the last full one
        if numar_grup != 0 and numar_grup != numar_grupuri - 1 and LINK_INTRODUS == 0:
            # candidate words: preceded by a space and followed by a space or comma
            cuvinte = re.findall(r' (?:\w|-*\!)+[ ,]', text_tag)
            cuvinte_linkuri, dictionar_linkuri = preia_cuvinte_lista_linkuri(CALE_FISIER_LINKURI)
            lista_cuvinte_gasite = list()
            for cuv in cuvinte:
                cuv_fara_semne = cuv.replace(' ', '').replace(',', '')
                if cuv_fara_semne in dictionar_linkuri:
                    lista_cuvinte_gasite.append(cuv)
            lista_cuvinte_gasite = list(set(lista_cuvinte_gasite))
            if lista_cuvinte_gasite:  # no keyword in this paragraph: try the next one
                cuvant_random = random.choice(lista_cuvinte_gasite)
                cuvant_random_fara_semne = cuvant_random.replace(' ', '').replace(',', '')
                link_random = random.choice(dictionar_linkuri[cuvant_random_fara_semne])
                # link a single word; replacing the whole token (with its surrounding
                # space/comma) avoids matching the word inside a longer word
                pattern = PATTERN_LINK.format(link_random, cuvant_random_fara_semne)
                token_cu_link = cuvant_random.replace(cuvant_random_fara_semne, pattern, 1)
                text_tag = text_tag.replace(cuvant_random, token_cu_link, 1)
                LINK_INTRODUS = 1
                # alternative (disabled): link two words
                # expresie_regulata = cuvant_random.strip() + r' *\w+'
                # urmatorul_cuvant = re.findall(expresie_regulata, text_tag)[0]
                # pattern = PATTERN_LINK.format(link_random, urmatorul_cuvant)
                # text_tag = text_tag.replace(urmatorul_cuvant, pattern, 1)
        text_start_final = text_start_final + '\n' + tag.format(text_tag)
        start = start + numar_propozitii_grup
    # the remaining sentences (fewer than 7) form the last paragraph
    if start < len(propozitii):
        text_tag = " ".join(propozitii[start:])
        text_start_final = text_start_final + '\n' + tag.format(text_tag)
    # 31.01.2022: checked, the paragraphs are displayed neatly one under another (5)
    return text_start_final


def copiaza_continut_txt_html(cale_fisier_txt, cale_fisier_html):
    # cale_fisier_txt, cale_fisier_html: the paths passed in when the function is called
    # read the text from the .txt file
    text_txt = read_text_from_file(cale_fisier_txt)
    # split on line breaks ('\n')
    lines = text_txt.splitlines()
    ok_lines = list()
    for line in lines:
        # skip empty lines and a line holding only the UTF-8 byte-order mark
        if line == '' or line == '\ufeff':
            continue
        else:
            ok_lines<span
style="color: #333333">.</span>append(line) <span style="color: #888888"># 02.02.2022: titlul e format din primele 10 cuvinte din text</span> <span style="color: #888888"># title_words = re.findall(r'(?:\w|-*\!)+', ok_lines[0])</span> title_words <span style="color: #333333">=</span> re<span style="color: #333333">.</span>findall(<span style="background-color: #fff0f0">r'(?:\w|-*\!)+'</span>, ok_lines[<span style="color: #0000DD; font-weight: bold">0</span>])[:<span style="color: #0000DD; font-weight: bold">10</span>] description_words <span style="color: #333333">=</span> re<span style="color: #333333">.</span>findall(<span style="background-color: #fff0f0">r'(?:\w|-*\!)+'</span>, ok_lines[<span style="color: #0000DD; font-weight: bold">0</span>]) description_words <span style="color: #333333">=</span> <span style="background-color: #fff0f0">u' '</span><span style="color: #333333">.</span>join(description_words[:<span style="color: #0000DD; font-weight: bold">20</span>]) <span style="color: #888888"># print("title: ", title_words)</span> <span style="color: #888888"># print("description: ", description_words)</span> text_html <span style="color: #333333">=</span> read_text_from_file(cale_fisier_html) <span style="color: #888888"># aici e pattern-ul pentru expresia regex; (.*?) 
inseamna ca preia tot ce este intre tag-uri</span> <span style="color: #888888"># modifici expresia regulata in functie de ce tag dai ca argument pentru functie</span> articol_pattern <span style="color: #333333">=</span> re<span style="color: #333333">.</span>compile(<span style="background-color: #fff0f0">'<!-- ARTICOL START -->([\s\S]*?)<!-- ARTICOL FINAL -->[\s\S]*?'</span>) text_articol <span style="color: #333333">=</span> re<span style="color: #333333">.</span>findall(articol_pattern, text_html) <span style="color: #008800; font-weight: bold">if</span> <span style="color: #007020">len</span>(text_articol) <span style="color: #333333">!=</span> <span style="color: #0000DD; font-weight: bold">0</span>: text_articol <span style="color: #333333">=</span> text_articol[<span style="color: #0000DD; font-weight: bold">0</span>] text_txt <span style="color: #333333">=</span> split_propozitii(text_txt) text_txt <span style="color: #333333">=</span> <span style="background-color: #fff0f0">'</span><span style="color: #666666; font-weight: bold; background-color: #fff0f0">\n\n</span><span style="background-color: #fff0f0">'</span> <span style="color: #333333">+</span> text_txt <span style="color: #333333">+</span> <span style="background-color: #fff0f0">'</span><span style="color: #666666; font-weight: bold; background-color: #fff0f0">\n\n</span><span style="background-color: #fff0f0">'</span> text_html <span style="color: #333333">=</span> text_html<span style="color: #333333">.</span>replace(text_articol, text_txt) <span style="color: #008800; font-weight: bold">else</span>: <span style="color: #007020">print</span>(<span style="background-color: #fff0f0">"Fisier html fara ARTICOL START/FINAL."</span>) title_pattern <span style="color: #333333">=</span> re<span style="color: #333333">.</span>compile(<span style="background-color: #fff0f0">'<title>Python: Búðu til margar HTML skrár úr textaskrám og tag hagræðingu') text_title = re.findall(title_pattern, text_html) # 
01.02.2022: inlocuire h3 cu text titlu (2) h3_pattern = re.compile('

\"font-weight-normal\">\"javascript:void\(0\)\" class=\"color-black\">(.*?)

') text_h3 = re.findall(h3_pattern, text_html) if len(text_title) != 0: text_title = text_title[0] # inlocuire semne expresii_regex = [r'\.', r'\,', r'\?', r'\!', r'\:', r'\;', r'\"'] for exp_reg in expresii_regex: title_words = [re.sub(exp_reg, '-', word) for word in title_words] # creare nume nou link new_filename = u'-'.join(title_words).lower() new_file_name_fara_spatiu = unidecode.unidecode(new_filename) new_file_name_fara_spatiu = new_file_name_fara_spatiu + '.html' # inlocuire text titlu cu primele 10 cuvinte text_html = text_html.replace(text_title, u' '.join(title_words)) # 01.02.2022: inlocuire h3 cu text titlu (2) if len(text_h3) != 0: text_h3 = text_h3[0] text_html = text_html.replace(text_h3, u' '.join(title_words)) else: print("Fisierul nu are tag-ul h3.") # 07.02.2022: inlocuire text canonical tag canonical_tag_pattern = re.compile('') canonical_tag = re.findall(canonical_tag_pattern, text_html) if len(canonical_tag) != 0: canonical_tag = canonical_tag[0] #text_html = text_html.replace(canonical_tag, new_file_name_fara_spatiu) # daca trebuie sa pui si "https://neculaifantanaru.com/" in fata, comentezi linia de mai sus si o decomentezi pe cea de jos text_html = text_html.replace(canonical_tag, "https://trinketbox.ro/" + new_file_name_fara_spatiu) else: print("Fisier fara tag canonical") else: print("Fisier html fara titlu.") description_pattern = re.compile(') text_description = re.findall(description_pattern, text_html) if len(text_description) != 0: text_description = text_description[0] # print("text description: ", text_description) text_html = text_html.replace(text_description, description_words) else: print("Fisier html fara description.") file_path = os.path.dirname(cale_fisier_txt) + "\\" + "fisiere_html" + "\\" + new_file_name_fara_spatiu write_to_file(text_html, file_path) # print("Fisier: ", new_file_name_fara_spatiu) print("Scriere efectuata cu succes.") def creare_fisiere_html(cale_folder_txt, cale_fisier_html): """ Functia itereaza 
printr-un folder care contine fisiere txt si creeaza fisiere html corespunzatoare """ count = 0 for f in os.listdir(cale_folder_txt): if f.endswith('.txt'): cale_fisier_txt = cale_folder_txt + "\\" + f copiaza_continut_txt_html(cale_fisier_txt, cale_fisier_html) count += 1 else: continue print("Numarul de fisiere modificate: ", count) def main(): creare_fisiere_html("C:\\Folder1", "C:\\Folder1\\index.html") # lista_cuvinte, dictionar_cuvinte = preia_cuvinte_lista_linkuri(CALE_FISIER_LINKURI) # print(len(lista_cuvinte)) # len - arata dmensiunea # print(dictionar_cuvinte) if __name__ == '__main__': main()
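The tail of `split_propozitii` above groups sentences into paragraphs and turns one keyword from `Linkuri.txt` into a link. The following is a minimal, self-contained sketch of that idea, not the script's own code: the paragraph template `TAG`, the link template `PATTERN_LINK`, the function name `group_sentences` and the `{keyword: [urls]}` shape assumed for `dictionar_linkuri` are all hypothetical, and a simple regex split stands in for the script's own sentence splitting.

```python
import random
import re

# Hypothetical templates; the real tag and PATTERN_LINK are defined in the part of the script not shown here.
TAG = '<p class="text_obisnuit">{}</p>'
PATTERN_LINK = '<a href="{}">{}</a>'


def group_sentences(text, links, group_size=3, rng=random):
    """Wrap every `group_size` sentences in TAG and turn at most one keyword into a link.

    `links` maps a keyword to a list of candidate URLs (an assumed shape for
    dictionar_linkuri).
    """
    # Naive split after ., ! or ? followed by whitespace (stand-in for a real sentence tokenizer).
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    paragraphs = []
    link_inserted = False
    for start in range(0, len(sentences), group_size):
        chunk = " ".join(sentences[start:start + group_size])
        if not link_inserted:
            # Keywords that occur in this chunk as whole words.
            found = sorted(k for k in links
                           if re.search(r'\b' + re.escape(k) + r'\b', chunk))
            if found:
                word = rng.choice(found)
                url = rng.choice(links[word])
                link = PATTERN_LINK.format(url, word)
                # Link only the first occurrence, like replace(..., 1) in the script.
                chunk = re.sub(r'\b' + re.escape(word) + r'\b',
                               lambda m: link, chunk, count=1)
                link_inserted = True
        paragraphs.append(TAG.format(chunk))
    return "\n".join(paragraphs)


html = group_sentences(
    "Leaders listen. They decide fast. Courage matters. Trust is earned.",
    {"Courage": ["https://example.com/courage.html"]},
)
print(html)
```

Unlike the script, which uses plain `str.replace`, this sketch matches whole words only, so a keyword such as `art` would not be linked inside `start`.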

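`copiaza_continut_txt_html` derives the page title, the meta description and the output file name from the first non-empty line of each text file. The sketch below isolates that step, reusing the script's word pattern and punctuation list. `title_description_filename` and `strip_accents` are hypothetical helpers, and `strip_accents` swaps the third-party `unidecode` for a standard-library approach (NFKD decomposition, then dropping combining marks). That handles Romanian diacritics such as ă, î, ș, ț, but unlike `unidecode` it leaves letters that have no decomposition (for example Icelandic ð or þ) unchanged.

```python
import re
import unicodedata

# Same word pattern as the script: runs of word characters, with "!" kept.
WORD_RE = re.compile(r'(?:\w|-*\!)+')


def strip_accents(s):
    """Stand-in for unidecode.unidecode: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize('NFKD', s)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))


def title_description_filename(first_line, title_len=10, desc_len=20):
    """Return (title, description, file name) built from the first line of a .txt file."""
    words = WORD_RE.findall(first_line)
    # Punctuation inside a title word becomes '-', as in the expresii_regex loop.
    title_words = [re.sub(r'[.,?!:;"]', '-', w) for w in words[:title_len]]
    title = ' '.join(title_words)
    description = ' '.join(words[:desc_len])
    filename = strip_accents('-'.join(title_words).lower()) + '.html'
    return title, description, filename


print(title_description_filename("Liderul își ascultă echipa, apoi decide!"))
```

For that sample line the file name comes out as `liderul-isi-asculta-echipa-apoi-decide-.html`; the trailing hyphen is the replaced `!`, matching what the script does.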
That's all folks.

See also: Code, Version 4, Version 5 or Version 6.


© Neculai Fântânaru - All rights reserved