Here's what I want to do specifically.
For example, let's say I have monthly trading data for all tickers in the stock market.
I want to rank the predicted returns of all stocks within each year-month.
Then, I want to calculate a statistic for only the top 10% of stocks by predicted return for each month of the year.
The specific statistic is flexible; for example, it could be the RMSE of the top 10%.
The goal is to tune the hyperparameters so that the actual returns of the top 10% predicted stocks are higher.
In other words, I want to find hyperparameters that tend to get the top 10% right, even if they get the bottom 90% wrong, rather than getting the whole universe right.
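To make the target concrete, here is a minimal sketch of the per-month statistic on toy data. The column names `.pred`, `return_pct`, and `yearmonth`, the random data, and the helper `top_decile_return()` are assumptions for illustration only, not part of any package.

```r
library(dplyr)

# Toy data: 2 months x 20 hypothetical tickers (all values are made up).
set.seed(1)
toy <- tibble(
  yearmonth  = rep(c("2020-01", "2020-02"), each = 20),
  ticker     = rep(sprintf("T%02d", 1:20), times = 2),
  .pred      = rnorm(40),  # predicted monthly return (%)
  return_pct = rnorm(40)   # realized monthly return (%)
)

# Hypothetical helper: mean realized return of the stocks that fall in the
# requested prediction tile, computed separately for every month.
top_decile_return <- function(data, n_tiles = 10, target_tile = 10) {
  data |>
    group_by(yearmonth) |>
    mutate(tile = ntile(.pred, n_tiles)) |>  # tile 10 = highest predictions
    filter(tile == target_tile) |>
    summarise(mean_return = mean(return_pct), n_stocks = n(), .groups = "drop")
}

res <- top_decile_return(toy)
res
```

The result has one row per month, so any further statistic (a mean, an RMSE, a compounded return) can be computed on top of it.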
I tried to define a custom metric, but it was limited. I wanted to put group_by(yearmonth) in the process, but I didn't really know how to do it, so I wrote some makeshift code.
# customize_metric --------------------------------------------------------
# irr = The return of a portfolio of stocks with a predicted top 10% return, calculated monthly.
# return_pct = monthly cumulative return ratio, e.g. the cumulative return ratio over 5.1 ~ 5.31.
library(yardstick)
library(dplyr)

irr_vec <- function(truth,
                    estimate,
                    n_tiles = 10,
                    purpose_tile = 10, # predicted top 10% return = 10, bottom 10% = 1
                    na_rm = TRUE,
                    case_weights = NULL,
                    ...) {
  irr_impl <- function(truth, estimate, ..., case_weights = NULL) {
    # Find the fold whose row count matches the length of `truth`.
    fold_index <<- NULL
    for (i in 1:dim(valid_years_splited)[1]) {
      if (length(truth) == nrow(valid_years_splited$data[[i]])) {
        fold_index <<- i
      }
    }
    valid_years_splited$data[[fold_index]] |>
      mutate(estimate = estimate) |>
      group_by(yearmonth) |>
      # use n_tiles instead of a hard-coded 10 so the argument takes effect
      mutate(top_n_pct = ntile(estimate, n_tiles)) |>
      filter(top_n_pct == purpose_tile) |>
      # mean_y := portfolio composed of the stocks with top 10% predicted return
      summarise(mean_y = mean(return_pct)) |>
      # irr := the portfolio's 1-year cumulative return ratio
      mutate(irr = cumprod(mean_y / 100 + 1)) |>
      slice_tail(n = 1) |>
      pull(irr) -> irr
    # If two folds have the same length, this code is unusable.
    # Cross-validation summarizes by mean by default, so returning log(irr)
    # lets me recover the geometric mean.
    return(log(irr))
  }
  metric_vec_template(
    metric_impl = irr_impl,
    truth = truth,
    estimate = estimate,
    na_rm = na_rm,
    case_weights = case_weights,
    cls = "numeric"
  )
}
irr <- function(data, ...) {
  UseMethod("irr")
}
irr <- new_numeric_metric(
  irr,
  direction = "maximize"
)
irr.data.frame <- function(data,
                           truth,
                           estimate,
                           na_rm = TRUE,
                           case_weights = NULL,
                           ...) {
  metric_summarizer(
    metric_nm = "irr",
    metric_fn = irr_vec,
    data = data,
    truth = !!enquo(truth),
    estimate = !!enquo(estimate),
    na_rm = na_rm,
    case_weights = !!enquo(case_weights)
  )
}
In the custom metric code, valid_years_splited is the result of organizing the time-series cross-validation into 3 folds of one year each. It is also defined as a global variable via <<-.
This results in three rows, each containing one year's worth of monthly stock trading data for all sectors. This is what I did to calculate the metric per fold.
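The reason the metric returns `log(irr)` rather than `irr` can be checked with a small worked example: averaging the logs across folds and exponentiating gives the geometric mean of the fold-level return ratios. The three ratios below are made-up numbers, not results.

```r
# Hypothetical per-fold cumulative return ratios (1.10 means +10% over the fold).
irr_by_fold <- c(1.10, 0.95, 1.20)

# Each fold reports log(irr); the default aggregation then takes the mean.
mean_log_irr <- mean(log(irr_by_fold))

# Back-transforming the mean of logs yields the geometric mean of the ratios.
geo_mean <- exp(mean_log_irr)

# Direct geometric mean for comparison.
geo_mean_direct <- prod(irr_by_fold)^(1 / length(irr_by_fold))
```

So the tuned value is on the log scale, and `exp()` of the averaged metric is the average compounded return per fold.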
However, I realize that this is not a perfect solution.
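One possible way around the global lookup, sketched under assumptions: compute the statistic after resampling from the saved out-of-sample predictions instead of inside the metric. The sketch assumes predictions that carry a `.row` index into the original data and a fold `id`, which is the shape `tune::collect_predictions()` returns when predictions are saved with `control_grid(save_pred = TRUE)`. Here `stock_data`, `preds`, and `fold_log_irr()` are hypothetical stand-ins filled with random numbers.

```r
library(dplyr)

# Hypothetical stand-ins for the full data set and the collected predictions.
set.seed(42)
stock_data <- tibble(
  yearmonth  = rep(c("2019-01", "2019-02", "2020-01", "2020-02"), each = 20),
  return_pct = rnorm(80, mean = 1, sd = 5)  # realized monthly return (%)
)
preds <- tibble(
  id    = rep(c("Slice1", "Slice2"), each = 40),  # fold identifier
  .row  = 1:80,                                   # row index into stock_data
  .pred = rnorm(80)                               # predicted monthly return
)

# Hypothetical helper: log of the compounded top-tile portfolio return per fold.
fold_log_irr <- function(preds, data, n_tiles = 10, target_tile = 10) {
  data |>
    mutate(.row = row_number()) |>
    inner_join(preds, by = ".row") |>  # recover yearmonth for each prediction
    group_by(id, yearmonth) |>
    mutate(tile = ntile(.pred, n_tiles)) |>
    filter(tile == target_tile) |>
    summarise(mean_y = mean(return_pct), .groups = "drop_last") |>
    # sum of log(1 + r) equals log of the cumulative product over the months
    summarise(log_irr = sum(log1p(mean_y / 100)), .groups = "drop")
}

out <- fold_log_irr(preds, stock_data)
out
```

Because the fold is identified by `id` rather than by row count, two folds of equal length no longer collide. With real tuning results, the hyperparameter columns (or `.config`) would also need to be added to the grouping so that candidates can be compared; that step is not shown here.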
I like the idea of what you are trying to do. Would you be able to show some example input data and what you would want the output to look like? I wanna make sure I completely understand what you are trying to accomplish before giving feedback 😄