Skip to content

MHowells/pattern_mining

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

83 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pattern-mining

This is code for running the grammatical inference algorithm ALERGIA for sequential pattern mining. It includes evaluation metrics and methods, as well as code to visualise prefix-tree acceptors (PTAs), deterministic finitie automata (PDAs), and their probabilistic counterparts (PPTAs and PDFAs respectively).

Input data

Given an initial list of sequences [ "0", "0", "0", "0", "0", "0", "0", "0", "01", "01", "01", "01", "01", "01", "01", "10", "10", "10", "10", "10", "11", "11", "11", "11", "11", "12", "12", "1", "1", "1"], we can derive the following PPTA from this data:

example_ppta

For this initial PTA, with three actions, or an alphabet, ['0', '1', '2'] and 7 states, [0, 1, 2, 3, 4, 5, 6], the example input would be:

states = ["*", 0, 1, 2, 3, 4, 5, 6]
alphabet = ['0', '1', '2']
pathway_matrix = np.array([
    [
        [0, 30, 0, 0, 0, 0, 0, 0],
        [0, 0, 15, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 5, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ],
    [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 15, 0, 0, 0],
        [0, 0, 0, 7, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 5, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ],
    [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 2],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],
    ]
])

The states includes an extra state "*" representing the starting state. The pathway_matrix is a 3D matrix with the first dimension representing the action, the second dimension representing the state, and the third dimension representing the next state. For example, pathway_matrix[0, 1, 4] is the number of times that action '0' was taken from state 0 to state 3.

Note that we have assumed that you go from the starting state "*" to state 0 by using action '0'. This is not true, but has no effect on the results of the algorithm.

There are two approaches to the ALERGIA algorithm implemented in this codebase, dictated by the method parameter in the alergia function. The first is the Carrasco approach, found in the original paper for the ALERGIA algorithmby Carrasco and Oncina (1994)[1]. The second is the Higuera approach, that uses a red-blue framework to solve the algorithm, as outlined in Higuera (2010)[2]. By default, it is set to Carrasco.

Running tests

$ python -m pytest tests.py

References

[1] Carrasco, R.C. and Oncina, J., (1994). Learning stochastic regular grammars by means of a state merging method. International Colloquium on Grammatical Inference, (pp. 139-152).

[2] De la Higuera, C., (2010). Grammatical inference: learning automata and grammars. Cambridge University Press.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages