Python: Split all text files in a folder into smaller text files | Neculai Fantanaru (en)

Python: Split all text files in a folder into smaller text files

On May 05, 2021, in Leadership and Attitude, by Neculai Fantanaru

You can view the full code here: https://pastebin.com/可2我PU WB2

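Install Python, plus the nltk package that the code imports. The sent_tokenize call also needs NLTK's Punkt tokenizer data, which can be fetched once with nltk.download('punkt') (newer NLTK releases ask for 'punkt_tab' instead).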

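Create the output folder: files_spartite. (The split files will be stored here. Note that main() in the code passes c:\Folder1\fisiere_impartite as the output path, so the folder name and that path must match.)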
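
The output folder can also be created from Python instead of by hand. The following is only a minimal sketch: the helper name ensure_output_folder is made up for this example, and the demo runs inside a temporary directory rather than the real c:\Folder1 path.

```python
import os
import tempfile


def ensure_output_folder(parent_folder, folder_name):
    """Create folder_name under parent_folder if it is missing and return its path.

    Hypothetical helper, not part of the original script.
    """
    path = os.path.join(parent_folder, folder_name)
    # exist_ok=True means an already existing folder is not an error
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing on disk is touched permanently
    with tempfile.TemporaryDirectory() as tmp:
        out = ensure_output_folder(tmp, "files_spartite")
        print(os.path.isdir(out))
```

Calling the helper a second time with the same arguments is harmless, so it could run at the start of every split.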

Code:

import os
import nltk
from nltk import tokenize

def read_text_from_file(file_path):
    """
    Return the contents of a file.
    file_path: path of the file to read
    """
    with open(file_path, encoding='utf8') as f:
        text = f.read()
        return text


def write_to_file(text, file_path):
    """
    Write a text to a file.
    text: the text to write
    file_path: path of the file to write to
    """
    with open(file_path, 'wb') as f:
        f.write(text.encode('utf8', 'ignore'))

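def imparte_fisiere(cale_fisier_txt, cale_folder_fisiere_impartite):
    """
    Split one text file into chunks of about 5 KB, cutting only between sentences.
    """
    text = read_text_from_file(cale_fisier_txt)
    propozitii = tokenize.sent_tokenize(text)
    nume_fisier = os.path.splitext(os.path.basename(cale_fisier_txt))[0] # "30.txt" => splitext => ("30", ".txt") => [0] => "30"
    chunk = ''
    chunk_size = 5000 # about 5 KB, measured in UTF-8 bytes
    chunk_number = 1
    for propozitie in propozitii:
        if len(chunk.encode('utf-8')) < chunk_size:
            # keep adding sentences until the chunk reaches the size limit
            chunk = (chunk + " " + propozitie) if chunk else propozitie
        else:
            # write the finished chunk to its own file
            cale_fisier_rezultat = os.path.join(cale_folder_fisiere_impartite, nume_fisier + "_" + str(chunk_number) + ".txt") # => "30_1.txt"
            write_to_file(chunk, cale_fisier_rezultat)
            # print("File {} was written successfully.".format(nume_fisier + "_" + str(chunk_number) + ".txt"))
            chunk = propozitie
            chunk_number += 1
    if chunk:
        # write the last, partly filled chunk; the loop alone never saves it
        cale_fisier_rezultat = os.path.join(cale_folder_fisiere_impartite, nume_fisier + "_" + str(chunk_number) + ".txt")
        write_to_file(chunk, cale_fisier_rezultat)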

def creare_fisiere(cale_folder_txt, cale_folder_fisiere_impartite):
    """
    Iterate over a folder of .txt files and split each file into chunks of about 5 KB.
    """
    count = 0
    for f in os.listdir(cale_folder_txt):
        if f.endswith('.txt'):
            cale_fisier_txt = os.path.join(cale_folder_txt, f)
            imparte_fisiere(cale_fisier_txt, cale_folder_fisiere_impartite)
            count += 1
    print("Number of files processed: ", count)

# cale_folder_txt/30.txt => cale_folder_fisiere_impartite/30_1.txt
#                        => cale_folder_fisiere_impartite/30_2.txt

def main():
    creare_fisiere("c:\\Folder1", "c:\\Folder1\\fisiere_impartite")

if __name__ == '__main__':
    main()
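
The grouping step inside imparte_fisiere can be tried on its own, without NLTK or any files on disk. The sketch below is an illustration under assumptions, not the script itself: the function name group_sentences and the tiny chunk_size used in the demo are made up, and the sentences are passed in as a ready-made list instead of coming from sent_tokenize. It also appends whatever is left once the loop ends, so the final sentences are not dropped.

```python
def group_sentences(sentences, chunk_size=5000):
    """Group sentences into chunks, closing a chunk once it reaches chunk_size UTF-8 bytes.

    Hypothetical helper that mirrors the size check used in imparte_fisiere.
    """
    chunks = []
    chunk = ""
    for sentence in sentences:
        if len(chunk.encode("utf-8")) < chunk_size:
            # still below the limit: append the sentence to the current chunk
            chunk = (chunk + " " + sentence) if chunk else sentence
        else:
            # limit reached: store the chunk and start a new one
            chunks.append(chunk)
            chunk = sentence
    if chunk:
        # store the remaining, partly filled chunk
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    demo = ["One.", "Two.", "Three.", "Four."]
    # an 8-byte limit is used only to make the grouping visible on short input
    for number, part in enumerate(group_sentences(demo, chunk_size=8), start=1):
        print(number, repr(part))
```

Because the size is checked before a sentence is appended, a chunk can exceed chunk_size by up to one sentence. That keeps sentences whole, at the cost of slightly uneven file sizes.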
  

That's all folks.

See also Version 2, Version 3, Version 4, Version 5, Version 6, or Version 7.

