GitHub - jsliu/OLIC: On Line Inference for multiple Changepoints problem in time series model.

jsliu / OLIC Public

Notifications You must be signed in to change notification settings
Fork 1
Star 0

On Line Inference for multiple Changepoints problem in time series model.

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
include		include
math		math
output		output
paperplot		paperplot
r		r
src		src
.Rhistory		.Rhistory
.cproject		.cproject
.project		.project
README		README

Repository files navigation

******************************************************************************
About the model:

The date we analysed is segmented by a series of changepoints, the number of 
which is unknown. We have considered two slightly different piecewise 
polynomial model for it. 

(i) The first model is constructed based upon the assumption that the 
observations before and after a changepoint are independent of each other, so 
the model in each segment is a q-th order linear regression model

                        Y = H x beta + eps              

where H is a s x q matrix with s = t_i+1 - t_i, the length between two
changepoints:

              | 1, x_((t_i)+1)-x_(t_i), ..., (x_((t_i)+1)-x(t_i))^(q-1) | 
              | ..        ..            ...,            ..              |
              | 1, x_(t_i+1)-x_(t_i),   ..., (x_(t_i+1)-x(t_i))^(q-1)   |,

and eps_i follows a zero mean normal distribution. Note that Y and eps are a
vector of observations and noises respectively. The model allows for an 
uncertainty, which means the order q in each segment is unknow. 

(ii) The second model sets an constraint on the first model so that we can
check if the lines are connected or not at each changepoint and then make a
proper the decision. To ease the question, we don't allow for the model choice
in each segment, e.g. q is fixed in this model. 
In each segment, we still have a linear regression model

                         Y = H x beta + eps

which is almost the same as the function in the first model. The only  
difference comes from that if the lines are continuous at a certain 
change point, the intercept is beta_0 ~ N(mu, sigma2), where mu and sigma can 
be estimated according to the information known in the previous segment, i.e.
  
                       mu = sum w_i*mu_i
                       sigma2 = sum w_i*sigma_i

where mu_i = H[lastline,]*beta,sigma_i = M_i(1,1) and w_i is the associated 
weight of each particle. The rest of beta_is comes from the same distributions 
as those in the first model. These are equivalent to transform the data as
 
                         y_i = y_i-mu

in that segment. We therefore estimate regression parameters in a longer 
segment between two consective discontinuous change points. The corresponding
basis matrix is 

           |  1   a    a^2     0      0                  ...  |
           | ..      ..                   ...,                |
       H = |  1  s*a (s*a)^2   b     b^2  0  0                |
           | ..                     ...,                  ... |
           |  1  s*a (s*a)^2  t*b  (t*b)^2  c  c^2  0  0  ... |
           |  ...            ...               ...            |

where s and t are the lengths between two successive change points (whatever 
it is), a an b are of form x_s-x_0 and x_t-x_s.
 
*****************************************************************************
About the program

This is a combination code of C++ and R. To deal with the matrix, a package
called Template Numerical Toolkit (tnt), developed by National Institute of
Standard and Technology (NIST), was adapted.

BCF.cpp implements the Particle Filtering,
rand.cpp gives the different simulations of random probabilities 
tnt_addon.hpp implements the inverse of sparse matrix

To generate the .so file, type 

  make -f Makefile.Rshare
 
in the terminal.


******************************************************************************
INSTRUCTIONS

To run the code: 
(i) Type

        source("simBCF.R")

     in the R prompt.
(ii) Then, choose the length of data, e.g nn <- 200
     load the fucntion you want to analyse,e.g.

      # heavisine function
      heavi.sine <- function(t)
      {
         f <- 10*sin(4*pi*t)-4*sign(t-.3)-4*sign(.72-t)+rnorm(nn)
         return(f)
      }
      t <- c(.3,.72,1)
      x <- c(1:nn)/nn
      y <- heavi.sine(x)
  
(iii) Run the function "sim.bcf" in R to obtain the simulation results;
      Run the function "y.mix" in R to obtain the fitted line.
      For example, type following commands in R prompt
      
      sim.out <- sim.bcf(y,x,continuity=T)
      y.mix <- para.sim(y,x,sim.out,continuity=T)

(iv) Run the function meanKSD.R in KSdistance archive to obtain the Kolmogorov
     Smirnov distance. 

P.S: The arguments of sim.bcf includes threshold, largest model order,prior 
     parameters. See simBCF.R for more details.

For 200 data, the code only takes 8 seconds for the result, while Denison's 
code need 1m8s. This is the advantage of Particle Filters.
     
For 2048 data, Denison's code has to run 145 min on our computer, while our 
algorithm spent a couple of minutes on the first model.  
     
              
*******************************************************************************
Compiling:

Compiling executable code, type "make" in the prompt
Compile share object, type "make -f Make.Rshare" in the prompt


*******************************************************************************
Update:

Modified the way to calculate incremental weight, P(y_t|cpt,q), which is more 
efficient