Extract specified data types from the given text or JSON, supporting custom regex-based filtering for precise data retrieval.
| Parameter | Description |
|---|---|
| Text Or JSON | The input text or JSON from which patterns will be extracted. |
| Data Types | Select the data types to search for within the input text or JSON. Each selected type is used to identify relevant matches in the input. |
| RegExes | A list of custom regular expression patterns to match within the input text or JSON. Multiple regex patterns can be provided, separated by the OR operator. These patterns allow free-form searching beyond the predefined data types. |
| Remove Duplicates | When checked, repeated values in each array in the final result are removed. |
Note: Use two vertical bar symbols (||) as the OR operator when writing expressions in the RegExes list.
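To illustrate the note above, the following minimal sketch splits a RegExes entry on `||` and applies each sub-pattern in turn. How the action combines the patterns internally is not documented here; the helper name `match_any`, the whitespace trimming, and the sample text are assumptions made for illustration only.

```python
import re

def match_any(text, regexes):
    """Apply each '||'-separated pattern to text and collect the matches."""
    # Assumption: whitespace around each sub-pattern is not significant.
    patterns = [p.strip() for p in regexes.split("||") if p.strip()]
    matches = []
    for pattern in patterns:
        # re.findall returns whole matches when a pattern has no capturing groups.
        matches.extend(re.findall(pattern, text))
    return matches

sample = "Ticket INC-1042 references host web-01 and ticket INC-2077."
result = match_any(sample, r"INC-\d+ || web-\d+")
print(result)  # ['INC-1042', 'INC-2077', 'web-01']
```

Because the split is on the doubled bar, a single `|` inside a sub-pattern still works as ordinary regex alternation.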
Data Types
Each predefined data type uses a regular expression to find matching values in the input text or JSON:
- CVEs - Extracts Common Vulnerabilities and Exposures (CVE) identifiers in the format CVE-YYYY-NNNN to CVE-YYYY-NNNNNNN, where the year is four digits and the ID ranges from four to seven digits.
- Email Addresses - Extracts complete email addresses including the username and domain parts (e.g., user@example.com).
- Email Domains - Extracts only the domain portion from email addresses (e.g., example.com from user@example.com).
- IPV4 - Extracts valid IPv4 addresses (e.g., 192.168.0.1) within the standard range (0.0.0.0 to 255.255.255.255).
- IPV6 - Extracts full IPv6 addresses in standard colon-separated hexadecimal format (e.g., 2001:0db8:85a3:0000:0000:8a2e:0370:7334).
- MD5 - Extracts 32-character hexadecimal MD5 hash strings.
- SHA1 - Extracts 40-character hexadecimal SHA-1 hash strings.
- SHA256 - Extracts 64-character hexadecimal SHA-256 hash strings.
- URL Domains - Extracts domain names from URLs, including subdomains, but excluding the protocol and path (e.g., example.com from https://example.com/page).
- URLs - Extracts full URLs starting with a protocol (e.g., https://, ftp://, file:///), followed by a domain and optional path, query, or fragment (e.g., https://www.google.com/, https://example.com/images/avatar).
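The descriptions above can be approximated with ordinary Python regular expressions. The sketch below covers a few of the data types. The patterns, the `PATTERNS` dictionary, and the `extract` helper are illustrative assumptions written from the descriptions, not the expressions the action actually uses.

```python
import re

# Hypothetical patterns derived from the data type descriptions above.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"  # one IPv4 octet, 0 to 255
PATTERNS = {
    "CVEs": r"\bCVE-\d{4}-\d{4,7}\b",
    "IPV4": rf"\b(?:{OCTET}\.){{3}}{OCTET}\b",
    "MD5": r"\b[a-fA-F0-9]{32}\b",
    "SHA1": r"\b[a-fA-F0-9]{40}\b",
    "SHA256": r"\b[a-fA-F0-9]{64}\b",
}

def extract(text, data_types):
    """Return a dict mapping each selected data type to its list of matches."""
    return {name: re.findall(PATTERNS[name], text) for name in data_types}

sample = (
    "CVE-2021-44228 was exploited from 192.168.0.1; 999.1.1.1 is invalid. "
    "MD5 d41d8cd98f00b204e9800998ecf8427e"
)
found = extract(sample, ["CVEs", "IPV4", "MD5", "SHA1"])
print(found)
```

The word boundaries (`\b`) keep the 32-character MD5 pattern from matching inside a longer SHA-1 or SHA-256 string.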
Results without removing duplicates
Results after removing duplicates
Get all email addresses extracted from the provided input.
| Parameter | Description |
|---|---|
| Text or JSON | The input text or JSON object to extract email addresses from. |
| Remove Duplicates | When checked, repeated values in the final result are removed. |
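As a rough illustration of the Remove Duplicates option, the sketch below extracts email addresses and optionally drops repeats while keeping the first occurrence of each. The simplified pattern, the function name `extract_emails`, and the order-preserving behavior are assumptions; the action's own matching rules may differ.

```python
import re

# Simplified email pattern for illustration; real-world addresses can be more varied.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text, remove_duplicates=False):
    """Return email addresses in the order found, optionally de-duplicated."""
    emails = EMAIL_RE.findall(text)
    if remove_duplicates:
        # dict.fromkeys keeps the first occurrence of each value, in order.
        emails = list(dict.fromkeys(emails))
    return emails

text = "Contact alice@example.com or bob@example.org. Again: alice@example.com."
print(extract_emails(text))
print(extract_emails(text, remove_duplicates=True))
```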
Results without removing duplicates
Results after removing duplicates
Extract Email Domains
Get all email domains extracted from the provided input.
| Parameter | Description |
|---|---|
| Text or JSON | The input text or JSON object to extract email domains from. |
| Remove Duplicates | When checked, repeated values in the final result are removed. |
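One way to picture domain extraction is a pattern that matches the whole address but captures only the part after the `@`. This is a minimal sketch under that assumption; the pattern and the name `extract_email_domains` are hypothetical.

```python
import re

# The parentheses capture the domain; the local part before '@' is matched but not returned.
EMAIL_DOMAIN_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")

def extract_email_domains(text):
    """Return the domain of every email address found in text, in order."""
    # With exactly one capturing group, findall returns that group for each match.
    return EMAIL_DOMAIN_RE.findall(text)

domains = extract_email_domains("Reach user@example.com or admin@mail.example.org")
print(domains)  # ['example.com', 'mail.example.org']
```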
Extract URL parts (scheme, netloc, path, params, query, fragment, hostname, port).
| Parameter | Description |
|---|---|
| URL | The URL to extract parts from. |
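The part names listed for this action match the attributes exposed by `urllib.parse.urlparse` in Python's standard library. Whether the action is built on that function is an assumption, but the following sketch shows what each part looks like for a sample URL. The helper name `url_parts` and the sample URL are illustrative.

```python
from urllib.parse import urlparse

def url_parts(url):
    """Split a URL into the eight parts named in the action description."""
    parsed = urlparse(url)
    return {
        "scheme": parsed.scheme,
        "netloc": parsed.netloc,
        "path": parsed.path,
        "params": parsed.params,
        "query": parsed.query,
        "fragment": parsed.fragment,
        "hostname": parsed.hostname,  # netloc without port or credentials, lowercased
        "port": parsed.port,          # int, or None when the URL has no explicit port
    }

parts = url_parts("https://www.example.com:8443/images/avatar;v=1?size=large#top")
for name, value in parts.items():
    print(f"{name}: {value!r}")
```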
Get a list of URLs in the order they are found in the provided text or JSON object.
| Parameter | Description |
|---|---|
| Text or JSON | The input text or JSON object to extract URLs from. |
| Remove Duplicates | When checked, repeated values in the final result are removed. |
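The sketch below shows one possible way to handle both input kinds: a JSON object is serialized to text first, then URLs are collected in the order they appear. The serialization step, the simplified pattern, and the name `extract_urls` are assumptions for illustration, not a description of the action's internals.

```python
import json
import re

# A scheme, then '://', then everything up to whitespace, a quote, or an angle bracket.
URL_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s\"'<>]+")

def extract_urls(data):
    """Return URLs in order of appearance from a string or a JSON-like object."""
    # Assumption: non-string input is searched through its JSON serialization.
    text = data if isinstance(data, str) else json.dumps(data)
    return [match.group(0) for match in URL_RE.finditer(text)]

payload = {
    "links": ["https://example.com/a", "ftp://files.example.com/x"],
    "note": "see https://example.com/a",
}
urls = extract_urls(payload)
print(urls)
```

A pattern this simple also keeps trailing punctuation that directly follows a URL in running text, so a production pattern would need extra trimming.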
Results without removing duplicates
Results after removing duplicates
RegEx Match
Returns a list of RegEx matches in the order they are found when applied to a provided string. This action uses Python’s regular expression syntax.
| Parameter | Description |
|---|---|
| String | The string for which regex matches are to be returned. |
| RegEx | The regular expression pattern used to search the string. Provide only the pattern you want to match. To group part of the pattern, use a non-capturing group (?:...) rather than a capturing group (...). For example, for the string test1, test2, test3 and the RegEx test1, the returned list is ["test1"]. For the string aabbaa and the RegEx (?:aabb), the returned list is ["aabb"]. |
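The advice about non-capturing groups matches how Python's `re.findall` behaves: when a pattern contains a capturing group, the group's text is returned instead of the whole match. The sketch below reproduces the two examples from the table and adds a contrasting case. It assumes the action behaves like `re.findall`, which this section does not state explicitly.

```python
import re

# The two examples from the parameter description.
plain = re.findall(r"test1", "test1, test2, test3")
non_capturing = re.findall(r"(?:aabb)", "aabbaa")

# Contrast: a capturing group changes what is returned.
captured = re.findall(r"(aa)bb", "aabbaa")       # only the group text
whole_match = re.findall(r"(?:aa)bb", "aabbaa")  # the full match

print(plain)          # ['test1']
print(non_capturing)  # ['aabb']
print(captured)       # ['aa']
print(whole_match)    # ['aabb']
```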