Neculai Fântânaru

Everything Depends On The Leader

Python: Find Those Links That Are Repeated In Other HTML Pages In The Same Folder

On November 23, 2021, in Python Scripts Examples by Neculai Fantanaru

You can view the full code here: https://pastebin.com/V1MDx0yd

Install Python.

Each HTML page contains several links, all placed in the section between <!-- FLAGS_1 --> and <!-- FLAGS -->.

All the HTML files share the structure below; only the links differ. None of the links in one page's FLAGS section should be repeated in the FLAGS section of another page.

All the links start with https://neculaifantanaru.com/

<!-- FLAGS_1 -->

<div class="cautareField">
  <div align="right">

  <a href="https://neculaifantanaru.com/stralucirea-nestematei.html">
  <a href="https://neculaifantanaru.com/fr/l-eclat-de-la-gemme.html">
  <a href="https://neculaifantanaru.com/en/brilliance-of-the-gem.html">
  <a href="https://neculaifantanaru.com/es/gema-stargaionss.html">
  <a href="https://neculaifantanaru.com/pt/brilho-da-gema.html">
  <a href="https://neculaifantanaru.com/ar/my-name-is-prince.html">
  <a href="https://neculaifantanaru.com/zh/books-and-magic.html">
  <a href="https://neculaifantanaru.com/hi/many-things.html">
  <a href="https://neculaifantanaru.com/de/horror-scenario.html">
  <a href="https://neculaifantanaru.com/ru/everything-is-here.html">
  
</div>
</div>

<!-- FLAGS -->
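Before comparing whole folders, it can help to check the extraction step on a single page. The following is a minimal sketch, not the script's own code: the `flags_links` helper and the `SAMPLE_HTML` string are hypothetical, and only the two markers and the `href` format are taken from the structure above.

```python
import re

# Hypothetical sample page; the first link sits outside the FLAGS section on purpose.
SAMPLE_HTML = """
<a href="https://neculaifantanaru.com/outside.html">
<!-- FLAGS_1 -->
<div class="cautareField">
  <a href="https://neculaifantanaru.com/stralucirea-nestematei.html">
  <a href="https://neculaifantanaru.com/en/brilliance-of-the-gem.html">
</div>
<!-- FLAGS -->
"""

# re.DOTALL lets "." match newlines, so the section can span several lines.
FLAGS_SECTION = re.compile(r"<!-- FLAGS_1 -->(.*?)<!-- FLAGS -->", re.DOTALL)
HREF = re.compile(r'href="(.*?)"')


def flags_links(html):
    """Return the href values found between the two FLAGS markers, in page order."""
    section = FLAGS_SECTION.search(html)
    if section is None:
        return []  # the page has no FLAGS section
    return HREF.findall(section.group(1))


print(flags_links(SAMPLE_HTML))
```

Only the two links inside the markers are returned; the link placed before <!-- FLAGS_1 --> is ignored.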

THE CODE: Find all HTML files in the same folder that share links in the <!-- FLAGS_1 --> section. The script prints each repeated link together with the pages where it appears, and saves the same report to a text file.

import re
import os


def read_text_from_file(file_path):
    """
    Return the contents of a file.
    file_path: path of the file to read
    """
    with open(file_path, encoding='utf8') as f:
        text = f.read()
        return text


def write_to_file(text, file_path):
    """
    Write a text to a file.
    text: the text to write
    file_path: path of the file to write to
    """
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))


def extragere_linkuri(cale_fisier_html):
    """Return the unique links found in the FLAGS section of an HTML file."""
    text_html = read_text_from_file(cale_fisier_html)
    # Raw strings keep \s and \S from being treated as string escapes
    flags_pattern = re.compile(r'<!-- FLAGS_1 -->([\s\S]*?)<!-- FLAGS -->')
    text_flags = re.findall(flags_pattern, text_html)
    links = []  # stays empty when the page has no FLAGS section
    if len(text_flags) != 0:
        text_flags = text_flags[0]
        link_pattern = r'href="(.*?)"'
        links = re.findall(link_pattern, text_flags)
        links = list(set(links))
    return links

def verificare_fisiere(cale_folder_fisiere, cale_fisier_rezultat):
    cai_fisiere = list()
    lista_linkuri = list()
    # Collect the FLAGS links of every .html file in the folder
    for f in os.listdir(cale_folder_fisiere):
        if f.endswith('.html'):
            cale_fisier_html = os.path.join(cale_folder_fisiere, f)
            links = extragere_linkuri(cale_fisier_html)
            cai_fisiere.append(cale_fisier_html)
            lista_linkuri.append(links)
    rezultate = ''
    # First pass: report every pair of files that share at least one link
    for i in range(0, len(lista_linkuri)):
        for j in range(i + 1, len(lista_linkuri)):
            linkuri_comune = set(lista_linkuri[i]).intersection(set(lista_linkuri[j]))
            if len(linkuri_comune) != 0:
                rezultate += "Common links: \n"
                print("Common links: ")
                for link in linkuri_comune:
                    rezultate += link
                    rezultate += '\n'
                    print(link, '\n')
                rezultate += 'File {} HAS LINKS IN COMMON WITH: {}'.format(cai_fisiere[i], cai_fisiere[j])
                rezultate += '\n\n'
                print('File {} HAS LINKS IN COMMON WITH: {}'.format(cai_fisiere[i], cai_fisiere[j]))
                print('\n\n')
    limba = "en"  # language folder to filter by; leave it as "" to search the Romanian pages
    rezultate += "==========={}============\n\n".format(limba.upper())
    print("==========={}============\n\n".format(limba.upper()))
    # Second pass: same comparison, restricted to links whose path contains the language folder
    for i in range(0, len(lista_linkuri)):
        for j in range(i + 1, len(lista_linkuri)):
            linkuri_comune = set(lista_linkuri[i]).intersection(set(lista_linkuri[j]))
            linkuri_limba = [link for link in linkuri_comune if limba in link.split('/')]
            if len(linkuri_limba) != 0:
                rezultate += "Common links: \n"
                print("Common links: ")
                for link in linkuri_limba:
                    rezultate += link
                    rezultate += '\n'
                    print(link, '\n')
                # Name the pair of files once, after all of its shared links
                rezultate += 'File {} HAS LINKS IN COMMON WITH: {}'.format(cai_fisiere[i], cai_fisiere[j])
                rezultate += '\n\n'
                print('File {} HAS LINKS IN COMMON WITH: {}'.format(cai_fisiere[i], cai_fisiere[j]))
                print('\n\n')

    write_to_file(rezultate, cale_fisier_rezultat)

if __name__ == "__main__":
    verificare_fisiere("c:\\Folder1", "c:\\Folder1\\rezultate.txt")
    # verificare_fisiere("e:\\Carte\\BB\\17 - Site Leadership\\Principal\\en", "c:\\Folder1\\rezultate.txt")
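
The script above compares every pair of files, which becomes slow when the folder holds many pages. One possible alternative is to build an index from each link to the files that contain it and keep only the links found in more than one file. This is a minimal sketch under that assumption, not part of the original script: `repeated_links` and the example file names are hypothetical, and the input dictionary stands in for the links already extracted from each file.

```python
from collections import defaultdict


def repeated_links(links_by_file):
    """Map each link that occurs in more than one file to the sorted list of those files."""
    files_by_link = defaultdict(set)
    for file_name, links in links_by_file.items():
        for link in links:
            files_by_link[link].add(file_name)
    # Keep only the links that appear in at least two different files
    return {link: sorted(files) for link, files in files_by_link.items() if len(files) > 1}


# Hypothetical input: file name -> links extracted from its FLAGS section
example = {
    "a.html": ["https://neculaifantanaru.com/en/x.html", "https://neculaifantanaru.com/en/y.html"],
    "b.html": ["https://neculaifantanaru.com/en/x.html"],
    "c.html": ["https://neculaifantanaru.com/en/z.html"],
}
print(repeated_links(example))
```

Each file is read once and each link is visited once, so the work grows with the total number of links rather than with the number of file pairs.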

That's all folks.

If you like my code, then do me a favor: translate your website into Romanian, "ro".

Also, see this VERSION 2 or VERSION 3 or VERSION 4 or VERSION 5 or VERSION 6 or VERSION 7
