Demeuk
This application is part of the CERBERUS project that has received funding from the European Union’s Internal Security Fund - Police under grant agreement No. 82201
Demeuk is a simple tool to clean up corpora (like dictionaries) or any dataset containing plain text strings. Example use cases include cleaning up language dictionaries, password sets (such as RockYou), or any other file containing plain text strings.
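To make "cleaning up" concrete, here is a minimal sketch of the kind of per-line processing such a tool performs. It is not Demeuk's implementation and does not call Demeuk's API. The function names `clean_line` and `clean_corpus`, the length limits, and the choice of steps are assumptions for illustration, loosely inspired by names in the API reference such as `clean_trim()`, `check_empty_line()`, `check_controlchar()` and `check_length()`.

```python
import unicodedata


def clean_line(line, min_length=1, max_length=64):
    """Return a cleaned line, or None if the line should be dropped.

    Hypothetical helper for illustration; not part of Demeuk.
    """
    # Trim surrounding whitespace, including the trailing newline.
    line = line.strip()
    # Drop empty lines.
    if not line:
        return None
    # Drop lines containing control characters (Unicode category "Cc").
    if any(unicodedata.category(ch) == "Cc" for ch in line):
        return None
    # Drop lines outside the allowed length range (assumed limits).
    if not min_length <= len(line) <= max_length:
        return None
    return line


def clean_corpus(lines):
    """Clean an iterable of lines, keeping the original order."""
    cleaned = []
    for line in lines:
        result = clean_line(line)
        if result is not None:
            cleaned.append(result)
    return cleaned


if __name__ == "__main__":
    sample = ["  password1 \n", "\n", "bad\x00entry\n", "letmein\n"]
    print(clean_corpus(sample))
```

In this sketch a line is either returned in cleaned form or dropped entirely. A real tool may also modify lines (for example, fixing encodings) or record why a line was dropped.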
Table of contents
- Install
- Usage
- Design
- API Reference
- Demeuk-api
add_latin_ligatures()
add_lower()
add_split()
add_without_punctuation()
check_case()
check_character()
check_controlchar()
check_email()
check_empty_line()
check_ending_with()
check_hash()
check_length()
check_mac_address()
check_non_ascii()
check_regex()
check_starting_with()
check_uuid()
chunkify()
clean_add_umlaut()
clean_cut()
clean_encode()
clean_googlengram()
clean_hex()
clean_html()
clean_html_named()
clean_lowercase()
clean_mojibake()
clean_newline()
clean_non_ascii()
clean_tab()
clean_title_case()
clean_trim()
clean_up()
main()
remove_email()
remove_punctuation()
remove_strip_punctuation()
try_encoding()