Python: Delete double empty spaces in html tags


On November 23, 2021, in Python Scripts Examples by Neculai Fantanaru

You can view the full code here: https://pastebin.com/e2vY70di

Install Python.

The Python code below finds the HTML tags whose text contains runs of two or more blank spaces between words and leaves a single space between them.

It also deletes any blank space at the beginning and end of the text contained in those tags. Only the tag passed as tag_name (p in this example, with a class attribute) and any <em> nested inside it are taken into account. For example:

<p class="obisnuit"><em>    Honor your  moral and spiritual      obligations   .</em></p>
<p class="nint">  Bishop  knew how to say the    most meaningful    of things  speech. </p>

Will become:

<p class="obisnuit"><em>Honor your moral and spiritual obligations .</em></p>
<p class="nint">Bishop knew how to say the most meaningful of things speech.</p>
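
The core of this transformation can be tried on a single string before running anything on real files. The following is a minimal sketch, not the script itself: collapse_spaces is a hypothetical helper name used only for illustration, and it relies on the standard behavior of str.split() with no arguments.

```python
def collapse_spaces(s):
    # str.split() with no arguments splits on any run of whitespace and
    # ignores leading/trailing whitespace, so joining the pieces with " "
    # leaves exactly one space between words
    return " ".join(s.split())


sample = "  Bishop  knew how to say the    most meaningful    of things  speech. "
print(collapse_spaces(sample))
# prints: Bishop knew how to say the most meaningful of things speech.
```

Note that this idiom also turns tabs and line breaks into single spaces, which is one reason to apply it only to the text inside a tag rather than to a whole file.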

THE CODE: Copy the code below and run it in any Python IDE or interpreter (I use PyScripter). Don't forget to change the path on the "directory_name =" line.

import re
import os


def read_text_from_file(file_path):
    """
    Return the content of a file.
    file_path: path to the file you want to read from
    """
    with open(file_path, encoding='utf8') as f:
        return f.read()


def write_to_file(text, file_path):
    """
    Write a text to a file.
    text: the text you want to write
    file_path: path to the file you want to write to
    """
    with open(file_path, 'wb') as f:
        # characters that cannot be encoded as UTF-8 are silently dropped
        f.write(text.encode('utf8', 'ignore'))


def replace_white_spaces(tag_name, file_path):
    """
    Normalize the whitespace inside every <tag_name class="..."> element of a file.
    """
    # read the text from the file (it is already a string)
    text = read_text_from_file(file_path)
    # regex pattern: the middle (.*?) captures everything between the tags;
    # the pattern is built from the tag name passed as an argument
    pattern = re.compile(
        r'(<{0} class=".*?">)(.*?)(</{0}>)'.format(re.escape(tag_name)))

    def clean(match):
        # strip removes all spaces at the beginning and end of the tag text
        new_text = match.group(2).strip()
        # also strip the text inside every nested <em>...</em>
        new_text = re.sub(r'<em>(.*?)</em>',
                          lambda m: '<em>' + m.group(1).strip() + '</em>',
                          new_text)
        # split on whitespace, then join the words back with a single space
        new_text = " ".join(new_text.split())
        return match.group(1) + new_text + match.group(3)

    # substitute each match in place, so that a tag holding only spaces
    # cannot change text elsewhere in the file
    text = pattern.sub(clean, text)
    # finally, overwrite the original content of the file with the new content
    write_to_file(text, file_path)


def replace_white_spaces_only_html_php(tag_name, directory_name):
    for filename in os.listdir(directory_name):
        print(filename)
        # process only files that end with the .html or .php extension
        if filename.endswith((".html", ".php")):
            file_path = os.path.join(directory_name, filename)
            # remove the extra spaces in each file found
            replace_white_spaces(tag_name, file_path)


if __name__ == '__main__':
    # set the folder name
    # don't forget the double backslashes
    directory_name = "c:\\Folder2\\5"
    # set the tag name
    tag_name = 'p'
    # call the function that iterates through the directory
    replace_white_spaces_only_html_php(tag_name, directory_name)
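
Because the script overwrites every matching file in place, it may be worth keeping copies before the first run. The following is only a sketch under stated assumptions: backup_files and the ".bak" suffix are hypothetical choices that are not part of the original script, and the extensions mirror the .html and .php check used above.

```python
import os
import shutil


def backup_files(directory_name, extensions=(".html", ".php")):
    """Copy every matching file to '<name>.bak' and return the backup paths."""
    backups = []
    for filename in os.listdir(directory_name):
        # str.endswith accepts a tuple of suffixes (it must be a tuple, not a list)
        if filename.endswith(extensions):
            source = os.path.join(directory_name, filename)
            target = source + ".bak"
            # copy2 copies the content and, where possible, the file metadata
            shutil.copy2(source, target)
            backups.append(target)
    return backups
```

Calling backup_files(directory_name) before the main function would leave a page.html.bak next to each page.html. Since the copies end in ".bak", the whitespace script skips them on later runs.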

That's all folks.

If you like my code, then do me a favor: translate your website into Romanian, "ro".

Also, see this VERSION 2 or VERSION 3 or VERSION 4 or VERSION 5 or VERSION 6 or VERSION 7
