wb-08/frequency_analysis

Frequency analysis of the modern Russian language based on comments from VK

Code

  1. scraping_text.py - scrapes the data.
  2. data_cleaning.py - removes commas, emojis, and duplicate comments.
  3. plotting_graph.py - runs the frequency analysis and plots a graph with the letters on the x axis and each letter's frequency of occurrence on the y axis.
  4. encryption.py - encrypts text and decrypts it using frequency analysis.
  5. syllables_splitting.py - splits words into syllables using rusyllab/rusyllab.py, which comes from https://github.com/Koziev/rusyllab.
  6. check_syllable.py - splits each incorrect word (you need to select it manually) into syllables and searches for the most similar syllables for each syllable.
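
The frequency analysis and the frequency-based decryption described above can be sketched roughly as follows. This is a minimal illustration, not the code from plotting_graph.py or encryption.py: the function names (`letter_frequencies`, `guess_substitution_key`, `apply_key`) are hypothetical, and it assumes that only the 33 letters of the Russian alphabet are counted and that the cipher is a simple one-to-one letter substitution.

```python
from collections import Counter

# Assumption: only the 33 letters of the Russian alphabet are counted.
RUSSIAN_LETTERS = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"


def letter_frequencies(text):
    """Return {letter: relative frequency}, most frequent letter first."""
    counts = Counter(ch for ch in text.lower() if ch in RUSSIAN_LETTERS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {letter: n / total for letter, n in counts.most_common()}


def guess_substitution_key(ciphertext, reference_order):
    """Pair cipher letters with reference letters by frequency rank.

    reference_order lists plaintext letters from most to least frequent,
    for example taken from a frequency table built on a large corpus.
    """
    cipher_order = list(letter_frequencies(ciphertext))
    return dict(zip(cipher_order, reference_order))


def apply_key(text, key):
    """Replace every letter found in key; leave other characters as-is."""
    return "".join(key.get(ch, ch) for ch in text.lower())


if __name__ == "__main__":
    for letter, share in letter_frequencies("Мама мыла раму").items():
        print(letter, round(share, 3))
```

A rank-based guess like this is only reliable on long texts; on short ones, letters with close frequencies are easily swapped, so the resulting key usually needs manual correction.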

Results

Frequency of occurrence of symbols in the form of a graph:

frequency_graph

Frequency of occurrence of characters in the form of a table:

frequency_table

Frequency of syllable occurrence:

syllables
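
Counting syllables and looking up similar ones, as in the syllable steps listed under Code, might look something like the sketch below. It assumes the words have already been split into syllables (for example by rusyllab) and uses Python's standard `difflib` as a stand-in similarity measure; the repository may use a different one. The names `syllable_frequencies` and `similar_syllables` are hypothetical.

```python
import difflib
from collections import Counter


def syllable_frequencies(split_words):
    """Count syllables across words given as lists of syllables."""
    return Counter(syl for word in split_words for syl in word)


def similar_syllables(syllable, known_syllables, n=3, cutoff=0.5):
    """Return up to n known syllables that resemble the given one.

    Similarity is difflib's ratio; this is an assumption, not
    necessarily the measure used in check_syllable.py.
    """
    return difflib.get_close_matches(syllable, known_syllables, n=n, cutoff=cutoff)


if __name__ == "__main__":
    words = [["мо", "ло", "ко"], ["ко", "ро", "ва"]]
    freqs = syllable_frequencies(words)
    print(freqs.most_common(3))
    print(similar_syllables("ка", list(freqs)))
```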

Article

https://habr.com/ru/post/513926/
