Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Abstract type for summary statistics (Mean) #277

Closed
wants to merge 4 commits into from

Conversation

nadiaenh
Copy link
Collaborator

@nadiaenh nadiaenh commented Mar 3, 2023

Resolves #275, #184, #274
Partial implementation of @smishr's confidence interval function
Implemented Mean structure as an abstract SummaryStat

Remaining to do on this PR :

  • Implement "t" and "margin" confidence intervals
  • Add more attributes to Mean structure (standard error of estimate, ...)
  • Add support for ReplicateDesign
  • Write test/SummaryStats.jl
  • Make sure documentation is in order
  • Delete mean.jl once all functionality is implemented in Mean struct ?

Can implement Total, Quantile, and Ratio later

@nadiaenh nadiaenh self-assigned this Mar 3, 2023
@smishr smishr self-requested a review March 3, 2023 10:00
Copy link
Contributor

@smishr smishr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main goal is to have an easy type dispatching over all the SummaryStats Functions. We need to check how Function types heirarchy works in Julia. There should be AbstractFunction and mean, total etc can be sub types of (<:) SummaryStats::AbstractFunction


"""

abstract type SummaryStats end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this abstract type looks good

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am wondering if SummaryStats should be an AbstractFunction instead?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't know about this, will look into it !


"""

function confint(estimate::Float64, std_dev::Float64; type::String="normal", alpha::Float64=0.05)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: After reviewing I think you can just copy the entire of _ci from 190 and rename it here.

design::SurveyDesign
end

function Mean(x::Symbol, design::SurveyDesign)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This should be an addition to the existing mean.jl file. You can add with multiple dispatch you can define mean(x::Symbol, design::ReplicateDesign, ci_type=normal ...) see WIP: Add general CI and integration with mean #190 line 169-170 of what the defintion of function can be
  • build for ReplicateDesign first, since SurveyDesign doesnt have std error yet

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • to your first point : see my reply to your comment on line 60
  • to your 2nd point : yes that makes sense


"""

struct Mean <: SummaryStats
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This should be done as part of edits inside mean.jl
  • Im not sure whether the struct would just need two things x and design
  • I have gut feeling that Mean should be a Function, not a struct. Lets ask @ayushpatnaikgit this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with the idea that it's easier to have all summary statistics as a struct in a single file (SummaryStats.jl) since it's easier to manage, if we want to add another statistic later on you would just add it to this one file.

The reason for having Mean as a struct (as well as a Function) is the same : if we want to add other statistics, and we want all statistics to have a set of basic elements (e.g. an estimate), you would write this into SummaryStats which is the "parent abstract type" and it would be automatically passed down to all the other statistics (since Mean <: SummaryStats). At least this is my understanding.

Also mean.jl (and quantile, ratio, total) has a lot of dispatched functions, so I thought it'd be more efficient and maintainable to have only 1 file which implements all summary stats, instead of 1 file per stat (mean.jl, quantile.jl etc) which has multiple dispatched functions inside.

I agree with having confint.jl separately though.

Thoughts ?

@ayushpatnaikgit
Copy link
Member

Have you considered how this would work with domain estimation? Where we have mean(:variable, :domain, design)?

@ayushpatnaikgit
Copy link
Member

Overall, I think it's a good idea to be able to do ans.estimate, ans.CI and ans.SE. Need to think through the implementation.

It will need a few show methods to make it beautiful.

@ayushpatnaikgit
Copy link
Member

I suggest having

struct Estimate
   value::Number
   SE::Number
end

Instead of SummaryStat and Mean

mean, quantile and total can be factory functions for the structure.

For domain estimation, we'll need a separate struct
perhaps

struct DomainEstimates
   values::Vector{Number}
   SE::Vector{Number}
end

After this, we can have a function confint(x::Estimate).

@smishr
Copy link
Contributor

smishr commented Mar 9, 2023

struct Estimate
    value::Number
    SE::Number
end

How does this broadcast to estimates which are vectors?

@asinghvi17
Copy link
Member

asinghvi17 commented Mar 25, 2023

Any reason not to use https://github.com/JuliaPhysics/Measurements.jl here? It's fundamentally the same data structure but probably supported more widely across the ecosystem. In this case a vector estimate would be a vector of Measurements.

@smishr smishr changed the base branch from main to v0.1.1 March 27, 2023 09:28
@codecov-commenter
Copy link

codecov-commenter commented Mar 27, 2023

Codecov Report

Merging #277 (bc220b2) into next_release (e9362f9) will decrease coverage by 4.48%.
The diff coverage is 0.00%.

@@               Coverage Diff                @@
##           next_release     #277      +/-   ##
================================================
- Coverage        100.00%   95.52%   -4.48%     
================================================
  Files                12       14       +2     
  Lines               200      268      +68     
================================================
+ Hits                200      256      +56     
- Misses                0       12      +12     
Impacted Files Coverage Δ
src/SummaryStats.jl 0.00% <0.00%> (ø)

... and 3 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@smishr
Copy link
Contributor

smishr commented Mar 28, 2023

Any reason not to use https://github.com/JuliaPhysics/Measurements.jl here? It's fundamentally the same data structure but probably supported more widely across the ecosystem. In this case a vector estimate would be a vector of Measurements.

Ayush and I looked at the Measurements.jl package, we see where you are coming from. Except for the pretty printing of uncertainty, their class would not be useful for other aspects of Survey.jl. But some ideas can be adapted from them

@smishr smishr deleted the branch xKDR:next_release April 10, 2023 06:02
@smishr smishr closed this Apr 10, 2023
@smishr smishr reopened this Apr 10, 2023
@smishr smishr changed the base branch from v0.1.1 to next_release April 10, 2023 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Abstract types for summary statistics function for code reuse
5 participants