Regex & Python: Translate only the HTML tags that contain certain keywords, using BeautifulSoup and googletrans | Neculai Fantanaru (en)

Regex & Python: Translate only the HTML tags that contain certain keywords, using BeautifulSoup and googletrans

On May 05, 2021, in Leadership and Attitude, by Neculai Fantanaru

The full code is also available on Pastebin (pastebin.com).

Install Python. Then use the Command Prompt (CMD) in Windows 10 to install the following two libraries:

py -m pip install "googletrans"
py -m pip install googletrans==4.0.0rc1
py -m pip install beautifulsoup4
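
After installing, a quick check can confirm that both packages are visible to the interpreter. This is a minimal sketch using only the standard library (Python 3.8 or newer); the helper name `installed_version` is my own and not part of either library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the state of the two libraries the script depends on.
for name in ("googletrans", "beautifulsoup4"):
    found = installed_version(name)
    print(f"{name}: {found if found else 'not installed'}")
```

If either line prints "not installed", rerun the matching `pip install` command above.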
   

Python will use the googletrans library to automatically translate the following HTML tags:

<title>Your Text</title>
<meta name="description" content="Your Text"/>
<p class="text_obisnuit">Your Text</p>
<p class="text_obisnuit2">Your Text</p>
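
To see how a regular expression can pick just these tags out of a page, here is a small sketch with the standard `re` module. The sample page, the `<p>` element name, and the simplified patterns are assumptions made for illustration only.

```python
import re

# Hypothetical sample page; the <p> element name is an assumption.
page = """<html><head>
<title>Your Text</title>
<meta name="description" content="Your Text"/>
</head><body>
<p class="text_obisnuit">Your Text</p>
<p class="text_obisnuit2">Your Text</p>
<p class="other">Not selected</p>
</body></html>"""

# One simplified pattern per kind of tag; "." does not cross line breaks,
# so each match stays on a single line.
patterns = {
    "title": r"<title>.*</title>",
    "meta description": r'<meta name="description" content=".*"/>',
    "paragraphs": r'<p class="text_obisnuit2?">.*</p>',
}

matches = {label: re.findall(pattern, page) for label, pattern in patterns.items()}
for label, found in matches.items():
    print(label, found)
```

The paragraph with `class="other"` is left alone, which is the point: only tags that match one of the patterns are handed to the translator.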


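Code: copy the code below and run it in any Python interpreter (I use PyScripter). Don't forget to change the path on the "files_from_folder" line. The list of languages you can translate into is available here: Lang.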
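
Before running the script, it can help to confirm that the folder path is right and to see which files would be picked up. This is a minimal sketch using `pathlib`; the function names `list_html_files` and `translated_name` are hypothetical, and the output naming mirrors the `{name}_{destination_language}.html` scheme used in the script.

```python
import tempfile
from pathlib import Path


def list_html_files(folder, extension=".html"):
    """Return the sorted names of files in `folder` that end with `extension`."""
    return sorted(
        p.name for p in Path(folder).iterdir()
        if p.is_file() and p.name.endswith(extension)
    )


def translated_name(filename, destination_language):
    """Build the output name, e.g. index.html -> index_fr.html."""
    return f"{Path(filename).stem}_{destination_language}.html"


# Demonstration on a temporary folder so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as folder:
    for name in ("index.html", "about.html", "notes.txt"):
        Path(folder, name).write_text("<html></html>", encoding="utf-8")
    found = list_html_files(folder)

print(found)
print([translated_name(name, "fr") for name in found])
```

Note that `Path.stem` keeps everything before the last dot, whereas `filename.split(".")[0]` cuts at the first dot, so the two differ for names such as `a.b.html`.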

Google automatically detects the language of the file. All you have to do is set the language you want to translate into: destination_language.
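
The value of `destination_language` has to be a language code that googletrans knows. The library exposes these codes as the `googletrans.LANGUAGES` dictionary (code to name). The sketch below uses a small stand-in dictionary so that it runs without the library, and the helper name `check_destination` is hypothetical.

```python
# Stand-in for googletrans.LANGUAGES (a dict of language code -> name).
# Only a few entries are listed so the sketch runs without the library.
SAMPLE_LANGUAGES = {
    "en": "english",
    "fr": "french",
    "ro": "romanian",
    "zh-cn": "chinese (simplified)",
}


def check_destination(code, languages=SAMPLE_LANGUAGES):
    """Return the normalized language code, or raise ValueError if unknown."""
    normalized = code.strip().lower()
    if normalized not in languages:
        raise ValueError(f"Unsupported destination language: {code!r}")
    return normalized


destination_language = check_destination("fr")
print(destination_language, "->", SAMPLE_LANGUAGES[destination_language])
```

With googletrans installed, passing `languages=googletrans.LANGUAGES` should check against the full list instead of the stand-in.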

from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter
from googletrans import Translator
import requests
import re

translator = Translator()

class UnsortedAttributes(HTMLFormatter):
    def attributes(self, tag):
        for k, v in tag.attrs.items():
            yield k, v

files_from_folder = r"c:\Users\Castel\Videos\Captures"

use_translate_folder = False

destination_language = 'fr'  #translate into french

extension_file = ".html"
# Common English words; a paragraph is translated only if it contains
# at least three of them (the {3,} quantifier in pattern1 and pattern2).
keywords = (
    r"( the | you | which | have | had | then | that | must | make | from | else | does "
    r"| get | will | make | made | yours | can | your | doesn | their | could | from | at "
    r"| of | my | an | by | with | are | his | him | she | he | it | may | seem | and | for "
    r"| else | while | which | be | these | let | ask | has | as | won | keep | but "
    r"| everything | without | thinking | about | just | to | doesn | if | each | try "
    r"| I'm | them | one | more | much | on | all | even | over | seems )"
)

# Assumed tag shapes (<p class=...>, <title>, <meta name="description" .../>),
# matching the sample tags listed earlier.
pattern1 = r'<p class="text_obisnuit">.*(' + keywords + r'.*){3,}.*</p>'
pattern2 = r'<p class="text_obisnuit2">.*(' + keywords + r'.*){3,}.*</p>'
pattern3 = r'<title>.*</title>'
pattern4 = r'<meta name="description" content=".*"/>'

patterns = [pattern1, pattern2, pattern3, pattern4]

import os

directory = os.fsencode(files_from_folder)

def recursively_translate(node):
    # Walk the parsed tree and translate every non-empty text node in place.
    for x in range(len(node.contents)):
        if isinstance(node.contents[x], str):
            if node.contents[x].strip() != '':
                try:
                    translation = translator.translate(node.contents[x], dest=destination_language).text
                    node.contents[x].replace_with(translation)
                except Exception as e:
                    print(e)
        elif node.contents[x] is not None:
            recursively_translate(node.contents[x])

for file in os.listdir(directory):
    filename = os.fsdecode(file)
    print(filename)
    # Files that must never be translated.
    if filename == 'y_key_e479323ce281e459.html' or filename == 'TS_4fg4_tr78.html':
        continue
    if filename.endswith(extension_file):
        with open(os.path.join(files_from_folder, filename), encoding='utf-8') as html:
            page = html.read()
        updated = False
        for pattern in patterns:
            for x in re.finditer(pattern, page):
                updated = True
                new = x.group(0)
                soup = BeautifulSoup(new, 'html.parser')
                if pattern != pattern4:
                    recursively_translate(soup)
                else:
                    # The meta description lives in an attribute, not in a text node.
                    meta = soup.find('meta')
                    meta['content'] = translator.translate(meta['content'], dest=destination_language).text
                soup = soup.encode(formatter=UnsortedAttributes()).decode('utf-8')
                page = page.replace(new, soup)
        if updated:
            print(f'{filename} translated')
            new_filename = f'{filename.split(".")[0]}_{destination_language}.html'
            if use_translate_folder:
                # Create the "translated" subfolder on first use.
                translated_folder = os.path.join(files_from_folder, 'translated')
                os.makedirs(translated_folder, exist_ok=True)
                with open(os.path.join(translated_folder, new_filename), 'w', encoding='utf-8') as new_html:
                    new_html.write(page)
            else:
                with open(os.path.join(files_from_folder, new_filename), 'w', encoding='utf-8') as html:
                    html.write(page)
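
The keyword rule can be tried without calling Google at all. The sketch below assumes a shortened keyword list, an assumed `<p class="text_obisnuit">` tag, and a stand-in `fake_translate` function (all hypothetical). It uses `re.sub` with a callback instead of BeautifulSoup, so it only illustrates the selection rule: a paragraph is translated when it contains at least three of the keywords.

```python
import re

# Shortened, hypothetical keyword list; the script uses a much longer one.
KEYWORDS = r"( the | you | and | of | to | with )"
# Assumed tag shape; {3,} demands three or more keyword hits in the paragraph.
PATTERN = r'<p class="text_obisnuit">.*(' + KEYWORDS + r'.*){3,}.*</p>'


def fake_translate(text):
    """Stand-in for a real translator: marks the text instead of translating it."""
    return "[fr] " + text


def translate_matching_paragraphs(page):
    """Apply fake_translate to the text of every paragraph that passes the rule."""
    def replace(match):
        html = match.group(0)
        # Split off the opening tag, then drop the closing </p>.
        opening, rest = html.split(">", 1)
        text = rest[: -len("</p>")]
        return f"{opening}>{fake_translate(text)}</p>"

    return re.sub(PATTERN, replace, page)


page = (
    '<p class="text_obisnuit">You walk to the sea and the sky with me.</p>\n'
    '<p class="text_obisnuit">Eu merg la mare.</p>\n'
)
result = translate_matching_paragraphs(page)
print(result)
```

Only the first paragraph is marked, because the second one contains none of the English keywords. This is how the script avoids sending text that is not English to the translator.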

That's all folks.

If you like my code, then do me a favor: translate your website into Romanian ("ro").

There is also a Version 2 of this code, as well as Version 3, Version 4, Version 5, and Version 6.

 

