Neculai Fantanaru

Everything Depends On The Leader

Example of Python code that translates the website into other languages, with the googletrans library (Version 2)

June 20, 2021, in Python Scripts Examples by Neculai Fantanaru

You can view the full code here: https://pastebin.com/i3Nd08TX

Install Python. Then install the following two libraries, googletrans and google_trans_new, from the Command Prompt (cmd) in Windows 10:

py -m pip install "googletrans"
py -m pip install googletrans==4.0.0rc1
py -m pip install "google_trans_new"

With the googletrans library, the Python code automatically translates the contents of the <title> tag and of the <meta name="description"> tag.

The Python code also translates the contents of the following tags (Your Text), but only if these tags are framed by the <!-- ARTICOL START --> and <!-- ARTICOL FINAL --> HTML comments. Of course, you will need to replace these tags with your own tags.

<!-- ARTICOL START -->

<h1 class="den_articol" itemprop="name">Your Text</h1>
<p class="text_obisnuit">Your Text</p>
<p class="text_obisnuit2">Your Text</p>
<span class="text_obisnuit2">Your Text</span>
<span class="text_obisnuit">Your Text</span>
<li class="text_obisnuit">Your Text</li>
<a class="linkMare" href="https://neculaifantanaru.com/en/">Your Text</a>
<h4 class="text_obisnuit2">Your Text</h4>
<h3 class="text_obisnuit2">Your Text</h3>
<h5 class="text_obisnuit2">Your Text</h5>

<!-- ARTICOL FINAL -->
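
The marker-based selection described above can be sketched roughly as follows. This is only an illustration, not the script's own code: the helper name split_at_markers and the sample page are hypothetical, and the sketch assumes that each marker appears at most once in the page. Unlike the full script, it leaves the closing marker in the "after" part.

```python
START_MARKER = "<!-- ARTICOL START -->"
END_MARKER = "<!-- ARTICOL FINAL -->"

def split_at_markers(page):
    """Return (before, article, after); article is '' when a marker is missing."""
    start = page.find(START_MARKER)
    end = page.find(END_MARKER)
    if start == -1 or end == -1 or end < start:
        return page, "", ""
    start += len(START_MARKER)
    return page[:start], page[start:end], page[end:]

# hypothetical sample page, used only for this demonstration
sample_page = (
    "<html><head><title>Demo</title></head><body>"
    "<!-- ARTICOL START -->"
    '<p class="text_obisnuit">Your Text</p>'
    "<!-- ARTICOL FINAL -->"
    "<footer>Menu</footer></body></html>"
)

before, article, after = split_at_markers(sample_page)
print(article)
```

Only the middle part would be searched for the article tags listed above; the other two parts keep their markup and are only searched for the <title> tag.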
              

THE CODE: Copy and run the code below in any Python IDE or interpreter (I use PyScripter). Don't forget to change the path in the "fisiere_din_folder" line. The language codes that can be used are listed in the comment above the "destination_language" line.

import os
import re
import textwrap
import html

#-------------------------------------------------------------------------------
# name of the library used for translation: either google_trans_new or googletrans
librarie_folosita = "googletrans"
# path to the folder with the documents to translate
fisiere_din_folder = r"d:\Downloads\Test"
source_language = 'en'
# language to translate into
# {'af': 'afrikaans', 'sq': 'albanian', 'am': 'amharic', 'ar': 'arabic', 'hy': 'armenian', 'az': 'azerbaijani', 'eu': 'basque', 'be': 'belarusian', 'bn': 'bengali', 'bs': 'bosnian', 'bg': 'bulgarian', 'ca': 'catalan', 'ceb': 'cebuano', 'ny': 'chichewa', 'zh-cn': 'chinese (simplified)', 'zh-tw': 'chinese (traditional)', 'co': 'corsican', 'hr': 'croatian', 'cs': 'czech', 'da': 'danish', 'nl': 'dutch', 'en': 'english', 'eo': 'esperanto', 'et': 'estonian', 'tl': 'filipino', 'fi': 'finnish', 'fr': 'french', 'fy': 'frisian', 'gl': 'galician', 'ka': 'georgian', 'de': 'german', 'el': 'greek', 'gu': 'gujarati', 'ht': 'haitian creole', 'ha': 'hausa', 'haw': 'hawaiian', 'iw': 'hebrew', 'hi': 'hindi', 'hmn': 'hmong', 'hu': 'hungarian', 'is': 'icelandic', 'ig': 'igbo', 'id': 'indonesian', 'ga': 'irish', 'it': 'italian', 'ja': 'japanese', 'jw': 'javanese', 'kn': 'kannada', 'kk': 'kazakh', 'km': 'khmer', 'ko': 'korean', 'ku': 'kurdish (kurmanji)', 'ky': 'kyrgyz', 'lo': 'lao', 'la': 'latin', 'lv': 'latvian', 'lt': 'lithuanian', 'lb': 'luxembourgish', 'mk': 'macedonian', 'mg': 'malagasy', 'ms': 'malay', 'ml': 'malayalam', 'mt': 'maltese', 'mi': 'maori', 'mr': 'marathi', 'mn': 'mongolian', 'my': 'myanmar (burmese)', 'ne': 'nepali', 'no': 'norwegian', 'ps': 'pashto', 'fa': 'persian', 'pl': 'polish', 'pt': 'portuguese', 'pa': 'punjabi', 'ro': 'romanian', 'ru': 'russian', 'sm': 'samoan', 'gd': 'scots gaelic', 'sr': 'serbian', 'st': 'sesotho', 'sn': 'shona', 'sd': 'sindhi', 'si': 'sinhala', 'sk': 'slovak', 'sl': 'slovenian', 'so': 'somali', 'es': 'spanish', 'su': 'sundanese', 'sw': 'swahili', 'sv': 'swedish', 'tg': 'tajik', 'ta': 'tamil', 'te': 'telugu', 'th': 'thai', 'tr': 'turkish', 'uk': 'ukrainian', 'ur': 'urdu', 'uz': 'uzbek', 'vi': 'vietnamese', 'cy': 'welsh', 'xh': 'xhosa', 'yi': 'yiddish', 'yo': 'yoruba', 'zu': 'zulu', 'fil': 'Filipino', 'he': 'Hebrew'}
destination_language = 'ru'
delimitatori_text_exterior_articol = [['<title','</title>']]
delimitatori_text_interior_articol = [['<h1 class="den_articol" itemprop="name', '</h1>'], ['<p class="text_obisnuit', '</p>'], ['<span class="text', '</span>'], ['<li class="text_obisnuit', '</li>'], ['<h3 class="text_obisnuit2', '</h3>'], ['<h4 class="text_obisnuit2', '</h4>'], ['<h2', '</h2>']]

extensie_fisier = ".html"
#-------------------------------------------------------------------------------

if (librarie_folosita == "google_trans_new"):
    from google_trans_new import google_translator
    translator = google_translator()
elif (librarie_folosita == "googletrans"):
    from googletrans import Translator
    translator = Translator()
elif (librarie_folosita == "translators"):
    import translators as ts

lista_cale_fisiere = []

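VAR, REPL = re.compile(r'(<.*?>)'), re.compile(r'_____(\d+)_____')
varlist = []

def replace(matchobj):
    # store the HTML tag and put a numbered placeholder in its place
    varlist.append(matchobj.group())
    return "_____%d_____" % (len(varlist) - 1)

def restore(matchobj):
    # put the stored HTML tag back; leave unknown placeholders unchanged
    try:
        return varlist[int(matchobj.group(1))]
    except (IndexError, ValueError):
        return matchobj.group(0)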

def traducere_text(text):
    parts = []
    # split into chunks of at most 4500 characters, breaking only at whitespace
    for txt in textwrap.wrap(text, 4500, break_long_words=False):
        # convert named and numeric character references (e.g. &gt;, &#62;, &#x3e;) to Unicode characters
        txt = html.unescape(txt)
        # hide the HTML tags behind numbered placeholders so that they are not translated
        txt = VAR.sub(replace, txt)
        if librarie_folosita == "google_trans_new":
            translation = translator.translate(txt, lang_tgt=destination_language)
        elif librarie_folosita == "googletrans":
            translation = translator.translate(txt, dest=destination_language).text
        elif librarie_folosita == "translators":
            # assumption: a translators release that provides translate_text(); older releases differ
            translation = ts.translate_text(txt, translator='google', to_language=destination_language)
        else:
            raise ValueError("Unknown translation library: " + librarie_folosita)
        # the translator may insert spaces inside the placeholders; remove them
        translation = re.sub(r'_____[0-9 ]+_____', lambda m: m.group(0).replace(' ', ''), translation)
        # put the original HTML tags back
        translation = REPL.sub(restore, translation)
        parts.append(translation)
    # wrap() drops the whitespace at each split point, so rejoin the chunks with a space
    return " ".join(parts)

def selectare_traducere_continut(cont, delimitatori):
    for start_delim, stop_delim in delimitatori:          # e.g. '<title', '</title>'
        position = 0
        while True:
            found = cont.find(start_delim, position)
            if found == -1:
                break
            tag_end = cont.find('>', found + len(start_delim))
            if tag_end == -1:
                break
            if cont[tag_end - 1] == '/':                  # self-closing tag: nothing to translate
                position = tag_end
                continue
            text_start = tag_end + 1
            text_end = cont.find(stop_delim, text_start)
            if text_end == -1:                            # no closing tag: stop scanning for this delimiter
                break
            translated_text = traducere_text(cont[text_start:text_end])
            cont = cont[:text_start] + translated_text + cont[text_end:]
            position = text_start + len(translated_text)
    return cont

def selectare_text():
    for file in os.listdir(fisiere_din_folder):
        if file.endswith(extensie_fisier):
            lista_cale_fisiere.append(os.path.join(fisiere_din_folder, file))

    for fisier in lista_cale_fisiere:
        # assumption: the source pages are UTF-8, like the translated files written below
        with open(fisier, 'r', encoding="utf-8") as f:
            contents = f.read()
        print("Now working on file:", fisier)

        # translate the meta description, if the page has one
        meta_start = '<meta name="description" content="'
        pozitia1 = contents.find(meta_start)
        if pozitia1 != -1:
            pozitia1 = pozitia1 + len(meta_start)
            pozitia2 = contents.find('"', pozitia1)       # closing quote of the content attribute
            if pozitia2 != -1:
                trad = traducere_text(contents[pozitia1:pozitia2])
                contents = contents[:pozitia1] + trad + contents[pozitia2:]

        # split the page into: before the article, the article, after the article
        continut = []
        poz1 = contents.find('<!-- ARTICOL START -->')
        poz2 = contents.find('<!-- ARTICOL FINAL -->')
        if poz1 != -1 and poz2 != -1:
            poz1 = poz1 + len('<!-- ARTICOL START -->')
            poz2 = poz2 + len('<!-- ARTICOL FINAL -->')
            continut.append(contents[:poz1])
            continut.append(contents[poz1:poz2])
            continut.append(contents[poz2:])
        else:
            continut.append(contents)

        continut_tradus = ''
        if len(continut) == 3:
            continut_tradus = continut_tradus + selectare_traducere_continut(continut[0], delimitatori_text_exterior_articol) # before the article
            continut_tradus = continut_tradus + selectare_traducere_continut(continut[1], delimitatori_text_interior_articol) # article
            continut_tradus = continut_tradus + selectare_traducere_continut(continut[2], delimitatori_text_exterior_articol) # after the article
        elif len(continut) == 1:
            continut_tradus = continut_tradus + selectare_traducere_continut(continut[0], delimitatori_text_exterior_articol)
        else:
            print("An error occurred while reading the article or the content around it!")
        print("Finished translating the file; writing the result.\n")

        with open(fisier[:len(fisier)-len(extensie_fisier)]+"_"+destination_language+extensie_fisier, 'w', encoding="utf-8") as f:
            f.write(continut_tradus)
    print("Translation finished!")

selectare_text()
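
The tag-protection step used in the script can be tried on its own, without any network access. The sketch below is an illustration under assumptions: fake_translate is a hypothetical stand-in for the real translator call (it only upper-cases the text), and protect_tags and restore_tags are illustrative names that are not part of googletrans or of the script above.

```python
import re

TAG = re.compile(r'<.*?>')
# tolerate spaces that a translator may insert around the number
PLACEHOLDER = re.compile(r'_____\s*(\d+)\s*_____')

def protect_tags(text):
    """Replace every HTML tag with a numbered placeholder; return (text, tags)."""
    tags = []
    def stash(match):
        tags.append(match.group(0))
        return "_____%d_____" % (len(tags) - 1)
    return TAG.sub(stash, text), tags

def restore_tags(text, tags):
    """Put the saved tags back; unknown placeholders are left untouched."""
    def unstash(match):
        index = int(match.group(1))
        return tags[index] if index < len(tags) else match.group(0)
    return PLACEHOLDER.sub(unstash, text)

def fake_translate(text):
    # hypothetical stand-in for translator.translate(...); no network needed
    return text.upper()

source = 'Read <a href="https://neculaifantanaru.com/en/">this page</a> now'
protected, saved = protect_tags(source)
translated = fake_translate(protected)
print(restore_tags(translated, saved))
```

Because the tags never reach the translator, attributes such as href and class come back exactly as they were written, and only the visible text changes.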

That's all folks.

If you like my code, then do me a favor: translate your website into Romanian, "ro".

Also, see this VERSION 2 or VERSION 3 or VERSION 4 or VERSION 5 or VERSION 6 or VERSION 7
