GPR #119

Open
wants to merge 9 commits into
base: main
Changes from 7 commits
3 changes: 3 additions & 0 deletions Project.toml
@@ -4,10 +4,13 @@ authors = ["xKDR Forum, Sourish Das"]
version = "0.1.0"

[deps]
Clustering = "aaaa29a8-35af-508c-8bc3-b662a17a0fe5"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Distances = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
GaussianProcesses = "891a1506-143c-57d2-908e-e1f8e92e6de9"
HypothesisTests = "09f84164-cd44-5f33-b23f-e6b0d136a0d5"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
NLSolversBase = "d41bc354-129a-5804-8e4c-c37616107c6c"
35 changes: 33 additions & 2 deletions src/CRRao.jl
@@ -1,11 +1,15 @@
module CRRao


using DataFrames, GLM, Turing, StatsModels, StatsBase
using Distributions, LinearAlgebra
using Optim, NLSolversBase, Random, HypothesisTests
using GaussianProcesses, Distances

import StatsBase: coef, coeftable, r2, adjr2, loglikelihood, aic, bic, predict, residuals, cooksdistance, fit
import HypothesisTests: pvalue


"""
```julia
LinearRegression
@@ -94,12 +98,37 @@ where
"""
struct PoissonRegression end

"""
```julia
GaussianProcessesRegression
```
Type representing the Gaussian Processes Regression model class.

```math
y = f(X) + \\varepsilon,
```
where
```math
\\varepsilon \\sim N(0,\\sigma^2),
```
+ ``y`` is the response vector of size ``n``, representing the observed target values,
+ ``X`` is the matrix of input variables of size ``n \\times p``, where ``n`` is the sample size and ``p`` is the number of input variables,
+ ``f(X)`` is the latent function that represents the underlying unknown relationship between ``X`` and ``y``,
+ ``\\varepsilon`` is the noise term that follows a Gaussian distribution with zero mean and variance ``\\sigma^2``,
+ ``\\sigma`` is the standard deviation of the noise ``\\varepsilon``.

The latent function ``f(X)`` is assumed to follow a Gaussian process, which is completely specified by its mean function and its covariance function. The mean function gives the prior expectation of the latent function, and the covariance function captures the dependence between its values at different input points.
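
Written out, with ``m`` denoting the mean function and ``k`` the covariance function (symbols introduced here for illustration only), the prior on the latent values at the observed inputs is
```math
f(X) \\sim N(m(X), K(X, X)),
```
where ``K(X, X)`` is the ``n \\times n`` matrix with entries ``k(x_i, x_j)``. Combined with the noise model above, this implies
```math
y \\sim N(m(X), K(X, X) + \\sigma^2 I).
```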

"""
struct GaussianProcessesRegression end

"""
```julia
Boot_Residual
```
Type representing Residual Bootstrap.
"""

struct Boot_Residual end

"""
@@ -307,6 +336,8 @@ y_i \\sim D(\\mu_i,\\sigma), i=1,2,\\cdots,n
"""
struct Prior_HorseShoe end



"""
```julia
CRRaoLink
@@ -392,7 +423,7 @@ end

Cauchit() = Cauchit(Cauchit_Link)

-export LinearRegression, LogisticRegression, PoissonRegression, NegBinomRegression, Boot_Residual
+export LinearRegression, LogisticRegression, PoissonRegression, NegBinomRegression, Boot_Residual, GaussianProcessesRegression
export Prior_Ridge, Prior_Laplace, Prior_Cauchy, Prior_TDist, Prior_HorseShoe, Prior_Gauss
export CRRaoLink, Logit, Probit, Cloglog, Cauchit, fit
export coef, coeftable, r2, adjr2, loglikelihood, aic, bic, sigma, predict, residuals, cooksdistance, BPTest, pvalue
@@ -401,5 +432,5 @@ export FrequentistRegression, BayesianRegression
include("random_number_generator.jl")
include("general_stats.jl")
include("fitmodel.jl")

include("bayesian/gaussian_processes_regression.jl")
end
132 changes: 132 additions & 0 deletions src/bayesian/gaussian_processes_regression.jl
@@ -0,0 +1,132 @@

"""
```julia

fit(formula, data::DataFrame, modelClass::GaussianProcessesRegression, IndexVar, mean, kern::Kernel,
DistanceClass::Euclidean)
```

Fit a Gaussian Process Regression model on the input data with a Gaussian Process prior and user-specified mean and kernel functions.

# Example

```julia-repl

julia> using CRRao, RDatasets, StatsModels, StatsPlots, GaussianProcesses, Distances
julia> df = dataset("datasets", "mtcars")
32×12 DataFrame
 Row │ Model          MPG      Cyl    Disp     HP     DRat     WT       QSec     VS     AM     Gear   Carb
     │ String31       Float64  Int64  Float64  Int64  Float64  Float64  Float64  Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Mazda RX4         21.0      6    160.0    110      3.9     2.62    16.46      0      1      4      4
   2 │ Mazda RX4 Wag     21.0      6    160.0    110      3.9    2.875    17.02      0      1      4      4
  ⋮  │       ⋮            ⋮       ⋮      ⋮       ⋮       ⋮        ⋮        ⋮       ⋮      ⋮      ⋮      ⋮
  31 │ Maserati Bora     15.0      8    301.0    335     3.54     3.57     14.6      0      1      5      8
  32 │ Volvo 142E        21.4      4    121.0    109     4.11     2.78     18.6      1      1      4      2
                                                                                              28 rows omitted

julia> container = fit(@formula(MPG ~ 0 + HP), df, GaussianProcessesRegression(), [:MPG, :HP], MeanZero(), SE(0.0, 0.0), Euclidean())

Formula: MPG ~ 0 + HP
Link: CRRao.Identity(CRRao.Identity_Link)
Chain: GP Exact object:
Dim = 1
Number of observations = 32
Mean function:
Type: MeanZero, Params: Float64[]
Kernel:
Type: SEIso{Float64}, Params: [5.464908573213355, 3.3936838718120708]
Input observations =
[110.0 110.0 … 335.0 109.0]
Output observations = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2 … 15.2, 13.3, 19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4]
Variance of observation noise = 9.667961411202336
Marginal Log-Likelihood = -89.745
julia> plot(container.chain)
```
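
The fitted GaussianProcesses.jl object can also be used for prediction. The following is a minimal, self-contained sketch that does not call CRRao: it builds the GP with the same constructor arguments as this method, but on synthetic data (an assumption made purely for illustration), and then calls `predict_y` from GaussianProcesses.jl.

```julia
using GaussianProcesses, Random

Random.seed!(42)

# Synthetic one-dimensional data (illustrative only, not taken from mtcars).
x = collect(range(0.0, 2pi; length = 25))
y = sin.(x) .+ 0.1 .* randn(length(x))

# GaussianProcesses.jl expects the inputs as a d × n matrix.
X = reshape(x, 1, :)

# Zero mean, squared exponential kernel, initial log-noise of -1.0.
gp = GP(X, y, MeanZero(), SE(0.0, 0.0), -1.0)
optimize!(gp)

# Predictive mean and variance of new observations at two test inputs.
Xtest = reshape([1.0, 4.0], 1, :)
mu, sigma2 = predict_y(gp, Xtest)
```

Here `mu` and `sigma2` each hold one value per column of `Xtest`.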
"""
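function fit(formula,
             data::DataFrame,
             modelClass::GaussianProcessesRegression,
             IndexVar,
             mean,
             kern::Kernel,
             DistanceClass::Euclidean)

    formula = apply_schema(formula, schema(formula, data), RegressionModel)
    # Work on a copy restricted to IndexVar so the caller's DataFrame is not mutated.
    data = select(data, IndexVar)
    y, X = modelcols(formula, data)
    # Initial log standard deviation of the observation noise; tuned by optimize!.
    logObsNoise = -1.0
    # GaussianProcesses.jl expects inputs as a d × n matrix, hence the transpose.
    gp = GP(X', y, mean, kern, logObsNoise)
    optimize!(gp)
    return BayesianRegression(:GaussianProcessesRegression, gp, formula)

end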

"""

```julia

fit(formula, data::DataFrame, modelClass::GaussianProcessesRegression, IndexVar,
DistanceClass::Euclidean)
```

Fit a Gaussian Process Regression model on the input data with a Gaussian Process prior. A zero mean function and a squared exponential kernel are used by default.
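
For reference, the isotropic squared exponential kernel has the form below, where the length scale ``\\ell`` and the signal standard deviation ``\\sigma_f`` are symbols introduced here for illustration:
```math
k(x, x') = \\sigma_f^2 \\exp\\left(-\\frac{\\lVert x - x' \\rVert^2}{2\\ell^2}\\right).
```
In GaussianProcesses.jl, `SE` takes both hyperparameters on the log scale, so the default `SE(0.0, 0.0)` starts from ``\\ell = 1`` and ``\\sigma_f = 1`` before the hyperparameters are optimised.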

# Example

```julia-repl
julia> using CRRao, RDatasets, StatsModels, StatsPlots, GaussianProcesses, Distances

julia> df = dataset("datasets", "mtcars")
32×12 DataFrame
 Row │ Model          MPG      Cyl    Disp     HP     DRat     WT       QSec     VS     AM     Gear   Carb
     │ String31       Float64  Int64  Float64  Int64  Float64  Float64  Float64  Int64  Int64  Int64  Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Mazda RX4         21.0      6    160.0    110      3.9     2.62    16.46      0      1      4      4
   2 │ Mazda RX4 Wag     21.0      6    160.0    110      3.9    2.875    17.02      0      1      4      4
  ⋮  │       ⋮            ⋮       ⋮      ⋮       ⋮       ⋮        ⋮        ⋮       ⋮      ⋮      ⋮      ⋮
  31 │ Maserati Bora     15.0      8    301.0    335     3.54     3.57     14.6      0      1      5      8
  32 │ Volvo 142E        21.4      4    121.0    109     4.11     2.78     18.6      1      1      4      2
                                                                                              28 rows omitted

julia> container = fit(@formula(MPG ~ 0 + HP), df, GaussianProcessesRegression(), [:MPG, :HP], Euclidean())

Formula: MPG ~ 0 + HP
Link: CRRao.Identity(CRRao.Identity_Link)
Chain: GP Exact object:
Dim = 1
Number of observations = 32
Mean function:
Type: MeanZero, Params: Float64[]
Kernel:
Type: SEIso{Float64}, Params: [5.464908573213355, 3.3936838718120708]
Input observations =
[110.0 110.0 … 335.0 109.0]
Output observations = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2 … 15.2, 13.3, 19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4]
Variance of observation noise = 9.667961411202336
Marginal Log-Likelihood = -89.745
julia> plot(container.chain)
```
"""
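function fit(formula,
             data::DataFrame,
             modelClass::GaussianProcessesRegression,
             IndexVar,
             DistanceClass::Euclidean)

    formula = apply_schema(formula, schema(formula, data), RegressionModel)
    # Work on a copy restricted to IndexVar so the caller's DataFrame is not mutated.
    data = select(data, IndexVar)
    y, X = modelcols(formula, data)
    # Initial log standard deviation of the observation noise; tuned by optimize!.
    logObsNoise = -1.0
    # Defaults: zero mean function and squared exponential kernel with
    # both log-scale hyperparameters initialised to 0.0.
    mean = MeanZero()
    kern = SE(0.0, 0.0)
    # GaussianProcesses.jl expects inputs as a d × n matrix, hence the transpose.
    gp = GP(X', y, mean, kern, logObsNoise)
    optimize!(gp)
    return BayesianRegression(:GaussianProcessesRegression, gp, formula)

end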