Python: Find All Files That Contain Double Words (String Or Number)


On February 28, 2022, in Python Scripts Examples by Neculai Fantanaru

You can view the full code here: https://pastebin.com/YNCWi580

Install Python. What does the code below do?

In each HTML file I have a PHP sequence containing this variable: <!-- $item_id = NUMBER;

NUMBER can be any value in the range from 1 to 1600 (or up to whatever number you want). For example, one file can contain <!-- $item_id = 23; and another file can contain <!-- $item_id = 1340; and so on.
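
As a minimal sketch of how the number can be pulled out of such a line, the snippet below applies the same pattern the script uses to two sample strings. The helper name extract_item_id and the sample HTML are hypothetical, made up for illustration only.

```python
import re

# Same pattern as in the script, written as a raw string.
ID_PATTERN = re.compile(r'\$item_id = (.*?);')


def extract_item_id(html_text):
    """Return the first $item_id value found in html_text, or None if there is none."""
    match = ID_PATTERN.search(html_text)
    if match is None:
        return None
    # Strip stray spaces around the captured value.
    return match.group(1).strip()


# Hypothetical file contents, for illustration only.
sample_a = "<html><!-- $item_id = 23; --></html>"
sample_b = "<html><p>no id here</p></html>"

print(extract_item_id(sample_a))  # 23
print(extract_item_id(sample_b))  # None
```

The captured value is returned as a string, so "23" and "023" would count as different IDs unless they are converted to integers first.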

I want to find the files whose $item_id number is repeated. For example, one file can contain 23; and another file can contain the same 23;. The Python code saves to results_duplicates.txt the names of all files that share a number in this way. It also lists the files that have no ID at all and the numbers that are missing from the range.
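
The duplicate check itself comes down to grouping file names by ID and keeping the groups with two or more files. Here is a small sketch of that idea on in-memory data, so it runs without any files on disk. The function name find_duplicate_ids and the file names are assumptions for illustration; the full script builds the same kind of dictionary while reading the directory.

```python
from collections import defaultdict


def find_duplicate_ids(ids_by_file):
    """Map each ID shared by two or more files to the sorted list of those files."""
    files_by_id = defaultdict(list)
    for filename, item_id in ids_by_file.items():
        files_by_id[item_id].append(filename)
    # Keep only the IDs that appear in at least two files.
    return {item_id: sorted(files)
            for item_id, files in files_by_id.items()
            if len(files) >= 2}


# Hypothetical file names and IDs, for illustration only.
ids_by_file = {
    "about.html": "23",
    "contact.html": "1340",
    "index.html": "23",
}

print(find_duplicate_ids(ids_by_file))  # {'23': ['about.html', 'index.html']}
```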

Copy the code below and run it in any Python IDE or interpreter (I use PyScripter).

The code:

import os
import re

def read_text_from_file(file_path):
    """
    Return the contents of a file.
    file_path: path to the file you want to read
    """
    with open(file_path, encoding='utf8', errors='ignore') as f:
        text = f.read()
        return text


def write_to_file(text, file_path, encoding='utf8'):
    """
    Write text to a file.
    text: the text you want to write
    file_path: path to the file you want to write to
    encoding: encoding used for the output file
    """
    with open(file_path, 'wb') as f:
        f.write(text.encode(encoding, 'ignore'))


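def get_duplicates(directory_path, results_file, tag):
    """
    Find the .html files in directory_path that share the same $item_id
    and write a report to results_file.
    tag: 'ro' or 'en', selects where the expected ID range starts
    """
    duplicates = dict()  # maps each ID to the list of files that contain it
    fisiere_care_nu_au_id = ''
    fisiere_duplicat = ''
    # raw string, so that \$ is not treated as an invalid escape sequence
    id_pattern = re.compile(r'\$item_id = (.*?);')
    for f in os.listdir(directory_path):
        if f.endswith('.html') and f != 'termeni-si-conditii.html' and f != "parteneri.html":
            filepath = os.path.join(directory_path, f)
            file_text = read_text_from_file(filepath)
            number = re.findall(id_pattern, file_text)
            if len(number) != 0:
                # only the first ID found in the file is used
                number = number[0]
                number = number.strip()
                if number in duplicates.keys():
                    duplicates[number].append(f)
                else:
                    duplicates[number] = [f]
            else:
                # no $item_id in this file: remember its name
                fisiere_care_nu_au_id = fisiere_care_nu_au_id + f + '\n'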

    for key in duplicates.keys():
        if len(duplicates[key]) >= 2:
            for f in duplicates[key]:
                fisiere_duplicat = fisiere_duplicat + f + '\n'
            fisiere_duplicat += '\n\n'

    # take every number in the range from the first ID up to the maximum ID
    # convert the IDs to integers (IDs that are not plain numbers are skipped)
    numere_intregi = [int(i) for i in duplicates.keys() if i.isdecimal()]
    max_id = max(numere_intregi, default=0)  # 0 when no numeric ID was found
    interval = list()
    if tag == 'ro':
        interval = list(range(1, max_id + 1))
    elif tag == 'en':
        interval = list(range(5000, max_id + 1))

    numere_care_lipsesc = list()
    for number in interval:
        if number not in numere_intregi:
            numere_care_lipsesc.append(number)
    print("MAX: ", max_id)
    print("MISSING NUMBERS: ", numere_care_lipsesc)

    fisiere_care_lipsesc_id = ''
    for numar in numere_care_lipsesc:
        fisiere_care_lipsesc_id = fisiere_care_lipsesc_id + str(numar) + '\n'

    result = "FILES WITHOUT AN ID \n\n" + fisiere_care_nu_au_id + '\n' + "DUPLICATE FILES \n\n" + fisiere_duplicat  + '\n' + "MISSING NUMBERS \n\n" + fisiere_care_lipsesc_id
    write_to_file(result, results_file)

    print("Write completed successfully.")

if __name__ == '__main__':
    directory_path = "e:\\Carte\\BB\\17 - Site Leadership\\Principal\\en"   # CHANGE THE PATH HERE to ro or en
    results_file = "e:\\Carte\\BB\\17 - Site Leadership\\Principal\\ro\\results_duplicates.txt"  # THE FINAL RESULTS ARE WRITTEN HERE

    get_duplicates(directory_path, results_file, "en") # or "ro"; CHANGE THE TAG HERE to ro or en (CHANGE THE PATH ABOVE TOO)
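
The missing-number step in the script can also be looked at on its own. The sketch below is one possible way to write it with a set, assuming the IDs have already been converted to integers. The function name missing_ids and its start parameter are hypothetical; in the script the range starts at 1 for "ro" and at 5000 for "en".

```python
def missing_ids(found_ids, start):
    """Return the sorted integers from start to max(found_ids) that are not in found_ids."""
    found = set(found_ids)  # set lookups are fast even for thousands of IDs
    if not found:
        return []
    return [n for n in range(start, max(found) + 1) if n not in found]


# Hypothetical ID lists, for illustration only.
print(missing_ids([5000, 5001, 5004], start=5000))  # [5002, 5003]
print(missing_ids([1, 2, 5], start=1))              # [3, 4]
```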

That's all folks.

If you like my code, then do me a favor: translate your website into Romanian, "ro".

Also, see this VERSION 2 or VERSION 3 or VERSION 4 or VERSION 5 or VERSION 6 or VERSION 7

