Demeuk
This application is part of the CERBERUS project that has received funding from the European Union’s Internal Security Fund - Police under grant agreement No. 82201
Demeuk is a simple tool to clean up corpora (like dictionaries) or any dataset containing plain text strings. Example use cases include cleaning up language dictionaries, password sets (for example RockYou), or any other file containing plain text strings.
Table of contents
- Install
- Usage
- Basic usage
- Standard Options
- Separating options
- Check modules
- check-min-length
- check-max-length
- check-case
- check-controlchar
- check-email
- check-hash
- check-mac-address
- check-uuid
- check-non-ascii
- check-replacement-character
- check-starting-with
- check-ending-with
- check-contains
- check-empty-line
- check-regex
- check-min-digits
- check-max-digits
- check-min-uppercase
- check-max-uppercase
- check-min-specials
- check-max-specials
- Modify modules
- Remove modules
- Add modules
- Macro modules
- Design
- API Reference
- Demeuk-api
- add_first_upper()
- add_latin_ligatures()
- add_lower()
- add_split()
- add_title_case()
- add_without_punctuation()
- check_case()
- check_character()
- check_contains()
- check_controlchar()
- check_email()
- check_empty_line()
- check_ending_with()
- check_hash()
- check_length()
- check_mac_address()
- check_non_ascii()
- check_regex()
- check_starting_with()
- check_uuid()
- chunkify()
- clean_add_umlaut()
- clean_cut()
- clean_encode()
- clean_googlengram()
- clean_hex()
- clean_html()
- clean_html_named()
- clean_lowercase()
- clean_mojibake()
- clean_newline()
- clean_non_ascii()
- clean_tab()
- clean_title_case()
- clean_transliterate()
- clean_trim()
- clean_up()
- contains_at_least()
- contains_at_most()
- init_worker()
- main()
- remove_email()
- remove_punctuation()
- remove_strip_punctuation()
- stderr_print()
- try_encoding()