cliche

v0.0.0-...-e4c065d Latest Latest
Published: Dec 7, 2025 License: MIT Imports: 8 Imported by: 0

README

Cliche

Regular expressions engine for batch processing.

Main features:

Trie like data structure

Cliche compiles each expression into a chain of nodes and then adds this chain to the tree. Every node has its own key. When a new chain is added and a node with the same key already exists, no new node is created; the new expression is simply attached to the existing node. This keeps the tree as minimally branched as possible, which is beneficial when scanning text.
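The key-sharing idea can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the node type and its add and size methods are hypothetical, the first two keys are copied from the Quick start output, and "[98]" for b is an assumption made by analogy with "[97]".

```go
package main

import "fmt"

// node is a hypothetical, simplified tree node: it is identified by a key
// and remembers which expressions end at it.
type node struct {
	key         string
	expressions []string
	nested      map[string]*node
}

func newNode(key string) *node {
	return &node{key: key, nested: map[string]*node{}}
}

// add walks the chain of keys from n, reusing a child whenever one with the
// same key already exists, and records expr on the last node of the chain.
func (n *node) add(expr string, chain []string) {
	cur := n
	for _, key := range chain {
		next, ok := cur.nested[key]
		if !ok {
			next = newNode(key)
			cur.nested[key] = next
		}
		cur = next
	}
	cur.expressions = append(cur.expressions, expr)
}

// size counts the nodes below n (n itself is not counted).
func (n *node) size() int {
	total := 0
	for _, child := range n.nested {
		total += 1 + child.size()
	}
	return total
}

func main() {
	root := newNode("")
	// Both expressions reduce to the same chain of keys, so they share nodes.
	root.add("a[0123-9]+", []string{"[97]", "[R16(48-57)]+"})
	root.add("a[01-5[67-9]]{1,}", []string{"[97]", "[R16(48-57)]+"})
	// A different expression shares only the first node.
	root.add("ab", []string{"[97]", "[98]"})

	fmt.Println(root.size()) // prints 3
}
```

Three expressions end up in three nodes, because equal keys are never duplicated along a shared prefix.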

Compaction / Unification

Cliche unifies expressions in several ways.

Character classes are stored as range tables. All expressions below are equivalent and share the same node in the tree:

  • [a-z1-2]
  • [1-2a-z]
  • [12a-z]
  • [1a-z2]
  • [1-2[a-z]]
  • [[1-2][a-z]]
  • [12[a-z]]
  • [12a[b-z]]

A single character is stored as a character class too. All expressions below are equivalent and share the same node in the tree:

  • a
  • [a]
  • [aaaa]
  • [a-a]
  • [a-aa]
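To see why the spellings in the two lists above collapse to one node, here is a minimal sketch that reduces a class to sorted, merged rune ranges and derives a key from them. The span type, the collect, normalize and key functions, and the key format are all hypothetical; the library prints its own keys (for example "[R16(48-57)]" in the Quick start output), and this toy parser understands only literal characters, a-b ranges and nested classes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// span is an inclusive range of runes.
type span struct{ lo, hi rune }

// collect reads a bracket expression starting at rs[i] == '[' and returns
// the spans it contains plus the index just past the closing ']'.
func collect(rs []rune, i int) ([]span, int) {
	var out []span
	i++ // skip '['
	for i < len(rs) && rs[i] != ']' {
		switch {
		case rs[i] == '[': // nested class: flatten it into the parent
			var inner []span
			inner, i = collect(rs, i)
			out = append(out, inner...)
		case i+2 < len(rs) && rs[i+1] == '-' && rs[i+2] != ']':
			out = append(out, span{rs[i], rs[i+2]})
			i += 3
		default:
			out = append(out, span{rs[i], rs[i]})
			i++
		}
	}
	return out, i + 1
}

// normalize sorts spans and merges overlapping or adjacent ones.
func normalize(in []span) []span {
	sort.Slice(in, func(a, b int) bool { return in[a].lo < in[b].lo })
	var out []span
	for _, s := range in {
		if n := len(out); n > 0 && s.lo <= out[n-1].hi+1 {
			if s.hi > out[n-1].hi {
				out[n-1].hi = s.hi
			}
			continue
		}
		out = append(out, s)
	}
	return out
}

// key returns a canonical string for a class or a single bare character.
func key(expr string) string {
	rs := []rune(expr)
	var spans []span
	if len(rs) > 0 && rs[0] == '[' {
		spans, _ = collect(rs, 0)
	} else if len(rs) == 1 {
		spans = []span{{rs[0], rs[0]}} // a bare character is a one-element class
	}
	parts := make([]string, 0, len(spans))
	for _, s := range normalize(spans) {
		parts = append(parts, fmt.Sprintf("%d-%d", s.lo, s.hi))
	}
	return "[" + strings.Join(parts, " ") + "]"
}

func main() {
	for _, e := range []string{"[a-z1-2]", "[12a[b-z]]", "a", "[a-aa]"} {
		fmt.Println(e, "->", key(e))
	}
}
```

The first two inputs both yield [49-50 97-122] and the last two both yield [97-97], so a tree keyed this way would store each group in a single node.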

Quantifiers are unified too:

  • x+ equals x{1,}
  • x* equals x{0,}
  • x? equals x{0,1} and x{,1}
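One way to picture this is to map every quantifier spelling onto a (min, max) pair and compare the pairs. The sketch below does that; the bounds type and parseQuantifier function are hypothetical names introduced for illustration and say nothing about how the library represents quantifiers internally.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bounds is a hypothetical canonical form of a quantifier: a minimum and a
// maximum repetition count, where max == -1 means "unbounded".
type bounds struct{ min, max int }

// parseQuantifier maps the shorthand and braced spellings onto bounds.
func parseQuantifier(q string) (bounds, error) {
	switch q {
	case "+":
		return bounds{1, -1}, nil
	case "*":
		return bounds{0, -1}, nil
	case "?":
		return bounds{0, 1}, nil
	}
	if !strings.HasPrefix(q, "{") || !strings.HasSuffix(q, "}") || len(q) < 3 {
		return bounds{}, fmt.Errorf("not a quantifier: %q", q)
	}
	lo, hi, hasComma := strings.Cut(q[1:len(q)-1], ",")
	b := bounds{0, -1} // defaults for an omitted minimum or maximum
	var err error
	if lo != "" {
		if b.min, err = strconv.Atoi(lo); err != nil {
			return bounds{}, err
		}
	}
	switch {
	case !hasComma:
		b.max = b.min // {n} means exactly n
	case hi != "":
		if b.max, err = strconv.Atoi(hi); err != nil {
			return bounds{}, err
		}
	}
	return b, nil
}

func main() {
	pairs := [][2]string{{"+", "{1,}"}, {"*", "{0,}"}, {"?", "{0,1}"}, {"?", "{,1}"}}
	for _, p := range pairs {
		a, _ := parseQuantifier(p[0])
		b, _ := parseQuantifier(p[1])
		fmt.Printf("x%s == x%s: %v\n", p[0], p[1], a == b)
	}
}
```

Every pair compares equal, which is the property a tree needs in order to place both spellings on the same quantifier node.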

Comments are removed in simple cases.

For example, x equals (?#123)x and is stored the same way.

Group options are unified too. All expressions below are equivalent and have the same topology in the tree:

  • (?i:y) equals (?i)(?:y)(?-i)
  • (?i-m:test) equals (?i-m)(test)(?m-i)

Duplicate variants within an alternation are removed. All expressions below are equivalent and share the same node in the tree:

  • (a|b|c)
  • (a|b|c|c)
  • (a|b|b|c)
  • (a|a|b|c)
  • (a|[a]|b|c)
  • (a|[a]|[b]|[c])

You can see more examples here.

The result of unification is that a single path or branch is reused by more than one expression, so the scanner can match multiple expressions at once. Unification does not change the behaviour of the tree or the scanner's results.

Installation

go get github.com/okneniz/cliche

Quick start

package main

import (
	"fmt"

	"github.com/okneniz/cliche"
)

func main() {
	tree := cliche.New(cliche.DefaultParser)

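	// Add returns an error (for example, when an expression fails to parse).
	if err := tree.Add(
		"a[0123-9]+",
		"a[01-5[67-9]]{1,}",
	); err != nil {
		panic(err)
	}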

	fmt.Println("tree:")
	fmt.Println(tree.String())

	text := "Text with a1, b, c32."
	fmt.Println("scan text:", text)

	for _, match := range tree.Match(text) {
		fmt.Printf("text: %s\n", match.SubString())
		fmt.Printf("bounds: %v\n", match.Span())
		fmt.Println("regexps:")
		for _, regexp := range match.Expressions() {
			fmt.Printf("\t%v\n", regexp)
		}
	}
}

Output:

tree:
[
 {
  "key": "[97]",
  "type": "*node.class",
  "nested": [
   {
    "key": "[R16(48-57)]+",
    "type": "*node.quantifier",
    "expressions": [
     "a[0123-9]+",
     "a[01-5[67-9]]{1,}"
    ],
    "value": {
     "key": "[R16(48-57)]",
     "type": "*node.class"
    }
   }
  ]
 }
]

scan text: Text with a1, b, c32.
text: a1
bounds: [10-11]
regexps:
	a[01-5[67-9]]{1,}
	a[0123-9]+

Documentation

GoDoc documentation.

Parsing and predefined engines

Cliche has default capabilities common to most regular expression engines. You can configure your own or copy the behaviour of an existing engine.

Basic syntax

  • | alternation
  • (...) group: parentheses group parts of a regular expression, so quantifiers or other operations apply to the group as a whole
  • [...] character class
  • \ escape (enable or disable a meta character)
  • postfix expressions as quantifiers

Predefined engines

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  • If you have suggestions for adding or removing projects, feel free to open an issue to discuss it, or directly create a pull request after you edit the README.md file with necessary changes.
  • Please make sure you check your spelling and grammar.
  • Create an individual PR for each suggestion.

Creating A Pull Request

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Documentation

Constants

This section is empty.

Variables

var (
	DefaultParser = onigmo.Parser
)

Functions

This section is empty.

Types

type Parser

type Parser interface {
	// Parse - parse regular expression and return node.Alternation as base type for it
	Parse(string) (node.Alternation, error)
}

type Tree

type Tree interface {
	// Add - add regular expressions to tree
	Add(...string) error
	// Size - return count of nodes in tree
	Size() int
	// String - dump tree to string
	String() string
	// Match - scan text and return matches
	Match(text string, options ...node.ScanOption) []*scanner.Match
}

func New

func New(parser Parser) Tree

New - return Tree

Directories

Path Synopsis
encoding
