Python: Find links that are repeated in other HTML pages in the same folder

On November 23, 2021, in Leadership and Attitude, by Neculai Fantanaru

You can view the full code here: https://passatin.com/v1mdx0. You need to have Python installed.

Each page contains many HTML links, all of them placed inside a dedicated section of the page.

All HTML files share the same structure; only the links differ. None of the links in the Flags section of one page should be repeated in the Flags section of another HTML page.
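
To make "only links in the Flags section count" concrete, here is a minimal sketch of extracting links from one delimited section of a page while ignoring links elsewhere (menus, footer). The markers `<!-- FLAGS_START -->` and `<!-- FLAGS_END -->` and the `example.com` URLs are hypothetical placeholders, since the article does not show the real markup.

```python
import re

# Hypothetical markers: the real ones delimiting the Flags section are not shown in the article.
START = '<!-- FLAGS_START -->'
END = '<!-- FLAGS_END -->'


def links_in_section(html, start=START, end=END):
    """Return the sorted, de-duplicated href values found between start and end."""
    match = re.search(re.escape(start) + r'([\s\S]*?)' + re.escape(end), html)
    if match is None:
        return []  # the page has no Flags section
    return sorted(set(re.findall(r'href="(.*?)"', match.group(1))))


page = '''
<a href="https://example.com/en/menu.html">Menu</a>
<!-- FLAGS_START -->
<a href="https://example.com/en/a.html">en</a>
<a href="https://example.com/fr/a.html">fr</a>
<a href="https://example.com/en/a.html">en again</a>
<!-- FLAGS_END -->
'''
print(links_in_section(page))
# ['https://example.com/en/a.html', 'https://example.com/fr/a.html']
```

The menu link outside the markers is ignored, and the duplicate link inside the section is reported only once.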

All links are absolute: they start with https:// followed by the site's .com/ address.
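
Because every link is an absolute URL, the language folder of a linked page can be read from its path segments; the script relies on this when it checks `limba in link.split('/')`. A quick sketch of what that split produces, using a hypothetical `example.com` URL and a hypothetical helper name `in_language_folder`:

```python
def in_language_folder(link, limba):
    """True if `limba` is one whole '/'-separated segment of `link`."""
    return limba in link.split('/')


link = 'https://example.com/en/page.html'
print(link.split('/'))
# ['https:', '', 'example.com', 'en', 'page.html']
print(in_language_folder(link, 'en'))  # True
print(in_language_folder(link, 'fr'))  # False
# The empty segment after "https:" means an empty limba matches every absolute link.
print(in_language_folder(link, ''))    # True
```

Matching whole segments rather than substrings means a folder such as `/english/` is not mistaken for `en`. The empty segment produced by `//` is also why setting the language to `""` effectively turns the language filter off.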

The code: find all HTML files (in the same folder) that have identical links in the Flags section. The code also displays the repeated links and the HTML pages that contain them.

import os
import re

# ASSUMPTION: the markers that delimit the Flags section were lost when this
# page was published. These two values are hypothetical placeholders; replace
# them with the markers your HTML files actually use.
FLAGS_START = '<!-- FLAGS_START -->'
FLAGS_END = '<!-- FLAGS_END -->'


def read_text_from_file(file_path):
    """
    Return the contents of a file.
    file_path: path of the file you want to read
    """
    with open(file_path, encoding='utf8') as f:
        return f.read()


def write_to_file(text, file_path):
    """
    Write a text to a file.
    text: the text you want to write
    file_path: path of the file you want to write to
    """
    # Characters that cannot be encoded as UTF-8 are dropped, as in the original.
    with open(file_path, 'w', encoding='utf8', errors='ignore') as f:
        f.write(text)


def extragere_linkuri(cale_fisier_html):
    """Return the unique href values found inside the Flags section of a file."""
    text_html = read_text_from_file(cale_fisier_html)
    flags_pattern = re.compile(re.escape(FLAGS_START) + r'([\s\S]*?)' + re.escape(FLAGS_END))
    text_flags = flags_pattern.findall(text_html)
    links = []  # stays empty when the file has no Flags section
    if text_flags:
        links = list(set(re.findall(r'href="(.*?)"', text_flags[0])))
    return links


def report_block(fisier_a, fisier_b, linkuri):
    """Build one report entry: the shared links, then the two files that share them."""
    bloc = "Common links:\n"
    for link in linkuri:
        bloc += link + '\n'
    bloc += 'File {} HAS LINKS IN COMMON WITH: {}\n\n'.format(fisier_a, fisier_b)
    return bloc


def verificare_fisiere(cale_folder_fisiere, cale_fisier_rezultat, limba="en"):
    """
    Compare the Flags-section links of every .html file in a folder and write
    each pair of files that shares links to cale_fisier_rezultat.
    limba: exact language folder to check (e.g. "en"); leave it as "" to
    search the Romanian pages ("" matches every absolute link).
    """
    cai_fisiere = []
    lista_linkuri = []
    for f in sorted(os.listdir(cale_folder_fisiere)):
        if f.endswith('.html'):
            cale_fisier_html = os.path.join(cale_folder_fisiere, f)
            cai_fisiere.append(cale_fisier_html)
            lista_linkuri.append(extragere_linkuri(cale_fisier_html))

    rezultate = ''
    # Pass 1: every pair of files that shares at least one link.
    for i in range(len(lista_linkuri)):
        for j in range(i + 1, len(lista_linkuri)):
            comune = set(lista_linkuri[i]) & set(lista_linkuri[j])
            if comune:
                bloc = report_block(cai_fisiere[i], cai_fisiere[j], sorted(comune))
                rezultate += bloc
                print(bloc)

    # Pass 2: the same comparison, keeping only links whose path contains
    # `limba` as a whole segment, e.g. .../en/page.html for "en".
    titlu = "==========={}============\n\n".format(limba.upper())
    rezultate += titlu
    print(titlu)
    for i in range(len(lista_linkuri)):
        for j in range(i + 1, len(lista_linkuri)):
            comune = set(lista_linkuri[i]) & set(lista_linkuri[j])
            linkuri_limba = sorted(link for link in comune if limba in link.split('/'))
            if linkuri_limba:
                bloc = report_block(cai_fisiere[i], cai_fisiere[j], linkuri_limba)
                rezultate += bloc
                print(bloc)

    write_to_file(rezultate, cale_fisier_rezultat)


if __name__ == "__main__":
    verificare_fisiere("c:\\Folder1", "c:\\Folder1\\rezultate.txt")
    # verificare_fisiere("e:\\Carte\\BB\\17 - Site Leadership\\Principal\\en", "c:\\Folder1\\rezultate.txt")
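
The script compares every pair of files, so a link shared by many pages is reported once for each pair. If you would rather see each repeated link once, together with every page that contains it, you can invert the data into a link-to-files mapping. This is a different technique from the pairwise loop above, shown only as a sketch: `find_repeated_links` is a hypothetical name, and its input is assumed to be the same kind of data the script builds (a list of file paths and a parallel list of link lists).

```python
from collections import defaultdict


def find_repeated_links(cai_fisiere, lista_linkuri):
    """Map each link to the sorted files containing it, keeping links found in 2+ files."""
    index = defaultdict(set)  # link -> set of files whose Flags section contains it
    for path, links in zip(cai_fisiere, lista_linkuri):
        for link in links:
            index[link].add(path)
    return {link: sorted(paths) for link, paths in sorted(index.items()) if len(paths) > 1}


files = ['a.html', 'b.html', 'c.html']
links = [
    ['https://example.com/en/x.html', 'https://example.com/en/y.html'],
    ['https://example.com/en/x.html'],
    ['https://example.com/en/x.html', 'https://example.com/en/z.html'],
]
for link, pages in find_repeated_links(files, links).items():
    print(link, '->', ', '.join(pages))
# https://example.com/en/x.html -> a.html, b.html, c.html
```

With N files the pairwise loop performs N(N-1)/2 set intersections, while the mapping is built in a single pass over all links.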

That's all folks.

If you like my code, then do me a favor: translate your website into Romanian, "ro".

See also the other Python versions of this code: Version 2, Version 3, Version 4, Version 5, Version 6.


© Neculai Fântânaru - All rights reserved