Cliche

A regular expression engine for batch processing.
Main features:
Trie-like data structure
Cliche compiles each expression into a chain of nodes and then adds that chain to a tree.
Every node has its own key.
When a chain is added and a node with the same key already exists at that position,
no new node is created; the new expression is simply attached to the existing node.
This way, the tree stays as minimally branched as possible,
which is beneficial when scanning text.
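The key-based sharing described above can be sketched as follows. This is a minimal illustration, not Cliche's actual implementation: the `node` type, its fields, and the key strings are all hypothetical, and the chains are written by hand instead of being produced by a parser.

```go
package main

import "fmt"

// node is a hypothetical tree node identified by a normalized key.
type node struct {
	key         string
	expressions []string         // expressions whose chain ends at this node
	children    map[string]*node // child nodes indexed by their key
}

func newNode(key string) *node {
	return &node{key: key, children: map[string]*node{}}
}

// add walks the chain of keys from n, reusing any child whose key
// already exists and creating a node only when the key is new.
func (n *node) add(expression string, chain []string) {
	current := n
	for _, key := range chain {
		child, ok := current.children[key]
		if !ok {
			child = newNode(key)
			current.children[key] = child
		}
		current = child
	}
	current.expressions = append(current.expressions, expression)
}

// size counts the nodes below n.
func (n *node) size() int {
	total := 0
	for _, child := range n.children {
		total += 1 + child.size()
	}
	return total
}

func main() {
	root := newNode("")
	// The first two expressions are assumed to compile to the same chain.
	root.add("a[0123-9]+", []string{"[97]", "[48-57]+"})
	root.add("a[01-5[67-9]]{1,}", []string{"[97]", "[48-57]+"})
	root.add("ab", []string{"[97]", "[98]"})

	fmt.Println(root.size()) // prints 3: three expressions share three nodes
}
```

Indexing children by key makes the lookup during insertion a single map access, which is one plausible way to keep the tree from branching on equivalent expressions.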
Compaction / Unification
Cliche unifies expressions in several ways.
Character classes are stored as range tables.
All expressions below are equivalent and share the same node in the tree:
[a-z1-2]
[1-2a-z]
[12a-z]
[1a-z2]
[1-2[a-z]]
[[1-2][a-z]]
[12[a-z]]
[12a[b-z]]
A single character is stored as a character class too.
All expressions below are equivalent and share the same node in the tree:
a
[a]
[aaaa]
[a-a]
[a-aa]
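One way to see why all the character classes and single characters listed above collapse to one node is to normalize them into a sorted table of merged ranges. The sketch below assumes a hypothetical `runeRange` type and `key` format; Cliche's real range table and key layout may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// runeRange is a hypothetical inclusive range of code points.
type runeRange struct{ lo, hi rune }

// normalize sorts the ranges and merges overlapping or adjacent ones,
// so that equivalent classes produce identical tables.
func normalize(ranges []runeRange) []runeRange {
	if len(ranges) == 0 {
		return nil
	}
	sorted := append([]runeRange(nil), ranges...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].lo < sorted[j].lo })

	merged := []runeRange{sorted[0]}
	for _, r := range sorted[1:] {
		last := &merged[len(merged)-1]
		if r.lo <= last.hi+1 { // overlaps or touches the previous range
			if r.hi > last.hi {
				last.hi = r.hi
			}
			continue
		}
		merged = append(merged, r)
	}
	return merged
}

// key renders the normalized table as a string usable as a node key.
func key(ranges []runeRange) string {
	out := "["
	for i, r := range normalize(ranges) {
		if i > 0 {
			out += " "
		}
		if r.lo == r.hi {
			out += fmt.Sprintf("%d", r.lo)
		} else {
			out += fmt.Sprintf("%d-%d", r.lo, r.hi)
		}
	}
	return out + "]"
}

func main() {
	// [a-z1-2] and [12a[b-z]] written out as lists of ranges.
	fmt.Println(key([]runeRange{{'a', 'z'}, {'1', '2'}}))
	fmt.Println(key([]runeRange{{'1', '1'}, {'2', '2'}, {'a', 'a'}, {'b', 'z'}}))
	// a, [aaaa] and [a-aa] all reduce to a single one-element range.
	fmt.Println(key([]runeRange{{'a', 'a'}, {'a', 'a'}}))
}
```

The first two calls print the same key and the third prints `[97]`, so the order of items, nesting, and repetition inside the brackets no longer matter once the class is normalized.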
Quantifiers are unified too:
x+ equals x{1,}
x* equals x{0,}
x? equals x{0,1} and x{,1}
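The quantifier equivalences above can be illustrated by mapping every spelling to one pair of bounds. This is only a sketch under assumptions: the `quantifier` type and the `fromSymbol` and `fromBraces` helpers are hypothetical names, not part of Cliche's API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unbounded marks a quantifier without an upper limit.
const unbounded = -1

// quantifier is a hypothetical normalized form of any quantifier spelling.
type quantifier struct{ lower, upper int }

// String renders the bounds in brace notation.
func (q quantifier) String() string {
	if q.upper == unbounded {
		return fmt.Sprintf("{%d,}", q.lower)
	}
	return fmt.Sprintf("{%d,%d}", q.lower, q.upper)
}

// fromSymbol maps the shorthand quantifiers to explicit bounds.
func fromSymbol(symbol byte) (quantifier, bool) {
	switch symbol {
	case '+':
		return quantifier{1, unbounded}, true
	case '*':
		return quantifier{0, unbounded}, true
	case '?':
		return quantifier{0, 1}, true
	}
	return quantifier{}, false
}

// fromBraces parses {n}, {n,}, {,m} and {n,m} into explicit bounds.
func fromBraces(text string) (quantifier, error) {
	body := strings.TrimSuffix(strings.TrimPrefix(text, "{"), "}")
	lowerText, upperText, hasComma := strings.Cut(body, ",")

	q := quantifier{lower: 0, upper: unbounded}
	var err error
	if lowerText != "" {
		if q.lower, err = strconv.Atoi(lowerText); err != nil {
			return quantifier{}, err
		}
	}
	if !hasComma {
		q.upper = q.lower // {n} means exactly n repetitions
		return q, nil
	}
	if upperText != "" {
		if q.upper, err = strconv.Atoi(upperText); err != nil {
			return quantifier{}, err
		}
	}
	return q, nil
}

func main() {
	plus, _ := fromSymbol('+')
	braces, err := fromBraces("{1,}")
	if err != nil {
		panic(err)
	}
	fmt.Println(plus == braces) // prints true: both spellings share one form

	optional, _ := fromSymbol('?')
	short, _ := fromBraces("{,1}")
	fmt.Println(optional == short, optional)
}
```

Because the normalized value is a plain comparable struct, two quantifiers are interchangeable exactly when their bounds are equal, regardless of how they were written.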
Comments are removed in simple cases.
For example, x equals (?#123)x and is stored the same way.
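As a rough illustration of comment removal, the sketch below drops `(?#...)` groups from the expression text before it would be compiled. The `stripComments` helper is hypothetical, and it deliberately handles only simple cases: it skips escaped characters but does not recognize a comment-like sequence inside a character class. How Cliche itself removes comments is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// stripComments removes (?#...) groups from an expression.
// Escaped characters are copied through untouched; character classes
// are not treated specially, so this covers simple cases only.
func stripComments(expression string) string {
	var out strings.Builder
	for i := 0; i < len(expression); {
		// Copy an escape sequence as is, so that \( never opens a comment.
		if expression[i] == '\\' && i+1 < len(expression) {
			out.WriteString(expression[i : i+2])
			i += 2
			continue
		}
		if strings.HasPrefix(expression[i:], "(?#") {
			end := strings.IndexByte(expression[i:], ')')
			if end < 0 {
				// Unterminated comment: keep the rest unchanged.
				out.WriteString(expression[i:])
				break
			}
			i += end + 1
			continue
		}
		out.WriteByte(expression[i])
		i++
	}
	return out.String()
}

func main() {
	fmt.Println(stripComments("(?#123)x") == "x") // prints true
	fmt.Println(stripComments("a(?#note)b+"))     // prints ab+
}
```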
Group options are unified too.
All expressions below are equivalent and have the same topology in the tree:
(?i:y) equals (?i)(?:y)(?-i)
(?i-m:test) equals (?i-m)(test)(?m-i)
Duplicate variants within an alternation are removed.
All expressions below are equivalent and share the same node in the tree:
(a|b|c)
(a|b|c|c)
(a|b|b|c)
(a|a|b|c)
(a|[a]|b|c)
(a|[a]|[b]|[c])
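Removing duplicate variants can be sketched as an order-preserving deduplication over the normalized keys of the variants. The `uniqueVariants` function and the key strings are hypothetical; the sketch assumes each variant has already been reduced to a key such as `[97]` for both `a` and `[a]`.

```go
package main

import "fmt"

// uniqueVariants keeps the first occurrence of every variant key and
// preserves the original order, since alternation order can affect
// which variant matches first.
func uniqueVariants(keys []string) []string {
	seen := make(map[string]struct{}, len(keys))
	unique := make([]string, 0, len(keys))
	for _, key := range keys {
		if _, ok := seen[key]; ok {
			continue
		}
		seen[key] = struct{}{}
		unique = append(unique, key)
	}
	return unique
}

func main() {
	// (a|[a]|b|c): "a" and "[a]" are assumed to share the key "[97]".
	fmt.Println(uniqueVariants([]string{"[97]", "[97]", "[98]", "[99]"}))
	// (a|b|c) yields the same list of variants.
	fmt.Println(uniqueVariants([]string{"[97]", "[98]", "[99]"}))
}
```

Both calls print `[[97] [98] [99]]`, which is why the alternations listed above can share a single node.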
You can see more examples here.
The result of unification is that one path or branch is reused by more than one expression.
The scanner can match multiple expressions at once.
Unification does not change the behaviour of the tree or the scanner's results.
Installation
go get github.com/okneniz/cliche
Quick start
package main

import (
	"fmt"

	"github.com/okneniz/cliche"
)

func main() {
	tree := cliche.New(cliche.DefaultParser)

	tree.Add(
		"a[0123-9]+",
		"a[01-5[67-9]]{1,}",
	)

	fmt.Println("tree:")
	fmt.Println(tree.String())

	text := "Text with a1, b, c32."
	fmt.Println("scan text:", text)

	for _, match := range tree.Match(text) {
		fmt.Printf("text: %s\n", match.SubString())
		fmt.Printf("bounds: %v\n", match.Span())
		fmt.Println("regexps:")

		for _, regexp := range match.Expressions() {
			fmt.Printf("\t%v\n", regexp)
		}
	}
}
Output:
tree:
[
  {
    "key": "[97]",
    "type": "*node.class",
    "nested": [
      {
        "key": "[R16(48-57)]+",
        "type": "*node.quantifier",
        "expressions": [
          "a[0123-9]+",
          "a[01-5[67-9]]{1,}"
        ],
        "value": {
          "key": "[R16(48-57)]",
          "type": "*node.class"
        }
      }
    ]
  }
]
scan text: Text with a1, b, c32.
text: a1
bounds: [10-11]
regexps:
a[01-5[67-9]]{1,}
a[0123-9]+
Documentation
GoDoc documentation.
Parsing and predefined engines
By default, Cliche supports the capabilities common to most regular expression engines.
You can configure your own set or copy the behaviour of an existing engine.
Basic syntax
| alternation
(...) grouping: parentheses group parts of a regular expression so that quantifiers or other operations apply to the group as a whole
[...] character class
\ escape (enables or disables a metacharacter)
- postfix expressions such as quantifiers
Predefined engines
Roadmap
See the open issues for a list of proposed features (and known issues).
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- If you have suggestions for adding or removing projects, feel free to open an issue to discuss it, or directly create a pull request after you edit the README.md file with the necessary changes.
- Please make sure you check your spelling and grammar.
- Create an individual PR for each suggestion.
Creating A Pull Request
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request