Neculai Fântânaru

Everything Depends On The Leader

Regex & Python: Translate with beautifulsoup and DeepL only those html tags that contain certain keywords

On May 05, 2021
, in
Python Scripts Examples by Neculai Fantanaru

You can view the full code here: https://pastebin.com/NkNM4Dix

Install Python. Then install the following two libraries using the Command Prompt (cmd) interpreter in Windows10:

< ! -- HTML generated using hilite.me -->
py- m pip install pydeepl
py -m pip install beautifulsoup4     

Python will automatically translate the following html tags with the googletrans library:

< ! -- HTML generated using hilite.me -->
<title>Your Text</title>
<meta name="description" content="Your Text"/>
<p class="text_obisnuit">Your Text</p>
<p class="text_obisnuit2">Your Text</p>

< ! -- HTML generated using hilite.me -->

CODE: Copy and run the code below in any interpreter program (I use pyScripter) . Don't forget to change the path on the line "files_from_folder". And don't forget to change the API CODE.

Find here the list of languages that can be translated: LANG.

Google will automatically find the language of the files. All you have to do is change the language you want to translate to: destination_language

< ! -- HTML generated using hilite.me -->
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter
import requests
import json
import re

class UnsortedAttributes(HTMLFormatter):
    def attributes(self, tag):
        for k, v in tag.attrs.items():
            yield k, v

files_from_folder = r"c:\Users\Castel\Videos"

use_translate_folder = False

destination_language = 'nl'

extension_file = ".html"
pattern1 = r'<p class="text_obisnuit">.*(( the | you | which | have | had | then | that | must | make | from | else | does | get | will | make | made | yours | can | your | doesn | their | could | from | at | of | my | an | by | with | are | his | him | she | he | it | may | seem | and | for | else | while | which | be | these | let | ask | has | as | won | keep | but | everything | without | thinking | about | just | to | doesn | if | each | try | I'm | them | one | more | much | on | all | even | over | seems ).*){3,}.*</p>'
pattern2 = r'<p class="text_obisnuit2">.*(( the | you | which | have | had | then | that | must | make | from | else | does | get | will | make | made | yours | can | your | doesn | their | could | from | at | of | my | an | by | with | are | his | him | she | he | it | may | seem | and | for | else | while | which | be | these | let | ask | has | as | won | keep | but | everything | without | thinking | about | just | to | doesn | if | each | try | I'm | them | one | more | much | on | all | even | over | seems ).*){3,}.*</p>'
pattern3 = r'<title>.*(( the | you | which | have | had | then | that | must | make | from | else | does | get | will | make | made | yours | can | your | doesn | their | could | from | at | of | my | an | by | with | are | his | him | she | he | it | may | seem | and | for | else | while | which | be | these | let | ask | has | as | won | keep | but | everything | without | thinking | about | just | to | doesn | if | each | try | I'm | them | one | more | much | on | all | even | over | seems ).*){3,}.*</title>'
pattern4 = r'<meta name="description" content=.*(( the | you | which | have | had | then | that | must | make | from | else | does | get | will | make | made | yours | can | your | doesn | their | could | from | at | of | my | an | by | with | are | his | him | she | he | it | may | seem | and | for | else | while | which | be | these | let | ask | has | as | won | keep | but | everything | without | thinking | about | just | to | doesn | if | each | try | I'm | them | one | more | much | on | all | even | over | seems ).*){3,}.*>'

patterns = [pattern1, pattern2, pattern3, pattern4]
import os

directory = os.fsencode(files_from_folder)

def recursively_translate(node):
    for x in range(len(node.contents)):
        if isinstance(node.contents[x], str):
            if node.contents[x].strip() != '':
                try:
                    newtext = requests.post('https://api-free.deepl.com/v2/translate',
                    data={'auth_key':'YOUR-CODE:fx',
                          'text':node.contents[x],
                          'target_lang':destination_language
                          }).content
                    node.contents[x].replaceWith(json.loads(newtext)['translations'][0]['text'])
                except:
                    pass
        elif node.contents[x] != None:
            recursively_translate(node.contents[x])

for file in os.listdir(directory):
    filename = os.fsdecode(file)
    print(filename)
    if filename == 'y_key_e479323ce281e459.html' or filename == 'TS_4fg4_tr78.html':
        continue
    if filename.endswith(extension_file):
        with open(os.path.join(files_from_folder, filename), encoding='utf-8') as html:
            page = html.read()
            updated = False
            for pattern in patterns:
                for x in re.finditer(pattern, page):
                    updated = True
                    new = x.group(0)
                    soup = BeautifulSoup(new, 'html.parser')
                    if pattern != pattern4:
                        recursively_translate(soup)
                    else:
                        meta = soup.find('meta')
                        newtext = requests.post('https://api-free.deepl.com/v2/translate',
                        data={'auth_key':'YOUR-CODE:fx',
                              'text':meta['content'],
                              'target_lang':destination_language
                              }).content
                        meta['content'] = json.loads(newtext)['translations'][0]['text']
                    soup = soup.encode(formatter=UnsortedAttributes()).decode('utf-8')
                    page = page.replace(new, soup)
        if updated:
            print(f'{filename} translated')
            new_filename = f'{filename.split(".")[0]}_{destination_language}.html'
            if use_translate_folder:
                try:
                    with open(os.path.join(files_from_folder+r'\translated', new_filename), 'w', encoding='utf-8') as new_html:
                        new_html.write(page)
                except:
                    os.mkdir(files_from_folder+r'\translated')
                    with open(os.path.join(files_from_folder+r'\translated', new_filename), 'w', encoding='utf-8') as new_html:
                        new_html.write(page)
            else:
                with open(os.path.join(files_from_folder, new_filename), 'w', encoding='utf-8') as html:
                    html.write(page)
       

That's all folks.

If you like my code, then make me a favor: translate your website into Romanian, "ro".

Also, see this VERSION 2 or VERSION 3 or VERSION 4 or VERSION 5 or VERSION 6 or VERSION 7

Alatura-te Comunitatii Neculai Fantanaru
The 63 Greatest Qualities of a Leader
Cele 63 de calităţi ale liderului

Why read this book? Because it is critical to optimizing your performance. Because it reveals the main coordinates after that are build the character and skills of the leaders, highlighting what it is important for them to increase their influence.

Leadership - Magic of Mastery
Atingerea maestrului

The essential characteristic of this book in comparison with others on the market in the same domain is that it describes through examples the ideal competences of a leader. I never claimed that it's easy to become a good leader, but if people will...

The Master Touch
Leadership - Magia măiestriei

For some leaders, "leading" resembles more to a chess game, a game of cleverness and perspicacity; for others it means a game of chance, a game they think they can win every time risking and betting everything on a single card.

Leadership Puzzle
Leadership Puzzle

I wrote this book that conjoins in a simple way personal development with leadership, just like a puzzle, where you have to match all the given pieces in order to recompose the general image.

Performance in Leading
Leadership - Pe înţelesul tuturor

The aim of this book is to offer you information through concrete examples and to show you how to obtain the capacity to make others see things from the same angle as you.

Leadership for Dummies
Leadership - Pe înţelesul tuturor

Without considering it a concord, the book is representing the try of an ordinary man - the author - who through simple words, facts and usual examples instills to the ordinary man courage and optimism in his own quest to be his own master and who knows... maybe even a leader.