Hoodie Community Dashboard #102

Open
gr2m opened this issue Dec 21, 2016 · 31 comments

@gr2m
Member

gr2m commented Dec 21, 2016

If you asked me:

How many active contributors does Hoodie have today?

I could not answer it. Nor could any maintainer of any other Open Source project that I have asked so far. And this is a problem, because Open Source burnout is real, and yet we don’t measure the underlying problems the way we measure code quality.

What we don’t measure, we cannot improve.

The question about active contributors is only one aspect. What I am really interested in is how well balanced the community is between active users, contributors and maintainers.

Goal

The goal for the Hoodie Community Dashboard is to be able to answer this question at all times, and make the underlying measurements transparent to everyone.

Out of scope:
In the future, I would also like to measure the success / impact of the Hoodie community, which would include things like the reach we have, the number of first-time open source contributors, diversity numbers etc.

Measurements

1. Work load

Measuring the number of users is hard for an Open Source project, for good reason. And while it would be nice to know how many active users we have in order to measure the success of Hoodie, here we are only interested in how much work load people produce that the Hoodie community has to take care of. Things that we can measure are

  • number of open issues
  • number of open pull requests
  • avg. time until response
  • ... what else?
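
As a rough illustration of the work load measurements listed above, here is a minimal Python sketch. The records and field names (`opened`, `first_response`, `closed`) are made up for the example and are not the GitHub API schema; real data would have to be mapped into this shape first.

```python
from datetime import datetime
from statistics import mean

# Hypothetical issue / pull request records (illustrative field names only).
items = [
    {"opened": datetime(2016, 12, 1, 9, 0),
     "first_response": datetime(2016, 12, 1, 15, 0), "closed": True},
    {"opened": datetime(2016, 12, 5, 9, 0),
     "first_response": datetime(2016, 12, 7, 9, 0), "closed": False},
    {"opened": datetime(2016, 12, 20, 9, 0),
     "first_response": None, "closed": False},
]


def open_count(records):
    """Number of records that are still open."""
    return sum(1 for r in records if not r["closed"])


def avg_hours_until_response(records):
    """Mean hours from opening to first response; unanswered records are skipped."""
    waits = [
        (r["first_response"] - r["opened"]).total_seconds() / 3600
        for r in records
        if r["first_response"] is not None
    ]
    return mean(waits) if waits else None


print(open_count(items), avg_hours_until_response(items))
```

Skipping unanswered items makes the average look better than it is, so a real dashboard would probably also want to show how many items have had no response at all.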

2. Active contributors

At Hoodie, we think contributions go beyond code and documentation. Equally important is the work of our editorial and design teams, and of the people helping answer questions in Slack or on GitHub. In contrast to the work load measurements, we are not interested in the amount of contributions, but in the number of different people who make them: we do not want a few people doing huge amounts of work, but a big group that balances the work load.

We can experiment with the details, but for a start I would define active as "contributed within the past month"

A contribution can be one of the following (from people who are part of the Contributors Team on GitHub)

  • comment
  • commit to pull request (all commits should go through pull requests)
  • reaction
  • donation on Open Collective
  • number of new contributors
  • number of contributors that became inactive
  • number of contributors that became active again
  • ... what else?
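
The "contributed within the past month" definition can be sketched as follows. This is only an assumption-laden illustration: the `(login, date)` event tuples, the 30-day window and the function names are hypothetical, and any of the contribution types above would first have to be collected into such events.

```python
from datetime import date, timedelta


def active_contributors(events, today, window_days=30):
    """Distinct logins with at least one contribution in (today - window, today]."""
    cutoff = today - timedelta(days=window_days)
    return {login for login, day in events if cutoff < day <= today}


def activity_changes(events, today, window_days=30):
    """Compare the current window with the one directly before it."""
    current = active_contributors(events, today, window_days)
    previous = active_contributors(
        events, today - timedelta(days=window_days), window_days
    )
    return {
        "active": current,
        "became_inactive": previous - current,
        # Either brand new or returning; telling them apart needs older history.
        "newly_active": current - previous,
    }


# Hypothetical contribution events: (login, day of contribution).
events = [
    ("alice", date(2016, 12, 20)),
    ("alice", date(2016, 12, 21)),
    ("bob", date(2016, 11, 10)),
    ("carol", date(2016, 12, 10)),
]
report = activity_changes(events, today=date(2016, 12, 21))
print(report)
```

Counting people as a set, not counting events, matches the intent above: many contributions by one person still count as one active contributor.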

3. Active maintainers

Traditionally, maintainers are seen as gatekeepers in Open Source projects, often referred to as "committers". At Hoodie, we see maintainers as being in charge of maintaining and growing the space in which people enjoy becoming and staying active contributors. Just like with contributors, we are less interested in the total amount of work done by maintainers, and more interested in the total number of active contributors.

Activities by maintainers are

Visualisations

To be done.

Basically I would love to see different charts, the main one showing the "community climate" indicator (or however we want to call it) over time.

I would like to add these visualisations to hoodie.camp (it currently is a simple prototype only showing open issues).

Besides having a website, I would like to be able to send out weekly and monthly reports via email.

Feedback

We are actively discussing all aspects of the Hoodie dashboard and are very interested in your thoughts, questions and insights into existing tools or your experiences with other Open Source communities.

@hzoo

hzoo commented Dec 22, 2016

https://libraries.io/ has a lot of awesome stuff by @andrew which you probably know about.

1. Load by users

Stats like # issues/PRs should be easy to track via https://www.githubarchive.org/ or just the GitHub API, since it's pretty straightforward. There's also BigQuery, and more.

Actually just looking at the winners/entries for the data challenge might give some more inspiration in general - https://github.com/blog/1864-third-annual-github-data-challenge. Not sure why there wasn't one last/this year though.

I know there was http://issuestats.com/ (might be down) that tracked avg time to close an issue/merge a PR + badges and a graph (won the GitHub data challenge previously).

[Image: screen shot 2016-12-21 at 8 30 51 pm]

You can also track activity of 3rd party stuff (tweets with @hoodiehq), stackoverflow questions, slack messages (probably hard to measure if we aren't paying).

Probably not very many data points, but there are other things like # of related blog posts, # of meetups/conferences/videos, # of talks.

2. Active contributors

Comments are really good - also active conversations via slack/twitter or other ways we do community engagement.

@jasonLaster

Thanks @hzoo, I think it would be nice to share some Big Query APIs

We're also working on gathering contributor information in a google sheet as well. The sheet helps us keep track of the names and backgrounds of our contributors so that we can answer several maintainer questions:

  • do we have a contributor who has a skill that can help (designer, right-to-left language speaker, linux user)
  • how many contributors do we have who have mastered some skills, have domain knowledge in a component, what do they enjoy doing
  • when was the last time we talked. what's going on in their life. burn out risk. opportunity to be mentored, grow...

We are also working on the right process for checking in on the community:

  • discussing it in weekly meetings
    • go over active contributors
    • discuss possible churn, opportunities for mentorship, promotion

@lasomethingsomething

@gr2m Saw a tweet directing readers to this issue -- very exciting. I work in Berlin at Zalando (huge, publicly traded company) as open source evangelist. Would love to chat, but in the meantime wanted to share these links to some dashboards (apologies if they're already known to you);

I'm aware of other initiatives falling along these lines; happy to talk more.

@dicortazar

Hi there, willing to help with metrics :)

@andrew

andrew commented Dec 22, 2016

Let me know if I can help pulling any statistics out of @librariesio for you

@dicortazar

dicortazar commented Dec 22, 2016

BTW, I ran a small analysis of ten HoodieHQ projects; you can have a look at http://cauldron.io/dashboards/hoodiehq . This is based on the tech we have at grimoirelab, as referenced by @LappleApple .

You can see aggregated info per data source (Git, GitHub Issues and GitHub Pull Requests) and each panel provides several charts and tables. You can drill down, filter or even export the data and use your own viz.

It already includes the info you mention, such as the number of open pull requests and issues, the people involved in them, the time to close those pull requests and issues, the time zone distribution of commits and developers, and others.

Hope this is useful!

@nayafia

nayafia commented Dec 22, 2016

So happy you're doing this! A few more ideas:

avg. time until response

Maybe average time til an issue/PR is closed, as well? Basically tracking how long it takes to resolve. I think support departments often track this.

number of open issues
number of open pull request

Also the number of opened issues/PRs, i.e. the rate at which they are being opened each week/month/whatever (which is more about growth).

number of new contributors

This is probably implied, but I'd also track the number of repeat contributors, the ratio of first-time to repeat, and how that changes over time.
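
A small sketch of the first-time versus repeat split, assuming the contributors of the current period and of all earlier periods are already available as lists of logins (the names and sample data here are hypothetical):

```python
def first_time_and_repeat(period_authors, earlier_authors):
    """Split this period's contributors into first-timers and repeat contributors."""
    period = set(period_authors)
    earlier = set(earlier_authors)
    return period - earlier, period & earlier


# Hypothetical logins: everyone seen before this month, and this month's authors.
earlier = ["ann", "ben"]
this_month = ["ann", "cy", "dee", "cy"]

first_time, repeat = first_time_and_repeat(this_month, earlier)
# Ratio of first-time to repeat contributors; undefined when nobody returned.
ratio = len(first_time) / len(repeat) if repeat else None
print(sorted(first_time), sorted(repeat), ratio)
```

Recomputing this per week or month would give the change over time.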

Visualisations

If you haven't seen it, icecrime's vossibility project might be useful here.

@lasomethingsomething

+1 to @nayafia's ideas here. Nadia, are you saying that vossibility addresses your point? From the README I can't tell, exactly.

@icecrime, do you have other docs? (If not, let me know if you need help on that; I tend to follow this README template I created for Zalando.)

@nolanlawson
Member

# of StackOverflow questions and average time until a StackOverflow response are also good things to track.

Dunno about Slack, but for IRC there's https://botbot.me/ which tracks logs and can be used to calculate messages per day (probably need to average it over number of users, though, because of bots).

@gr2m
Member Author

gr2m commented Dec 22, 2016

Cate mentioned foss-heartbeat by @sarahsharp

@nayafia

nayafia commented Dec 22, 2016

@LappleApple I meant that vossibility might be a useful tool for creating dashboards/visualizations of any GitHub data collected. vossibility-collector has a bit more info in its README.

@icecrime

icecrime commented Dec 22, 2016

Thanks for the ping!

Vossibility is a tool I created to help me manage the 🐳 Docker open source project. I do wish it was easier to consume or use in other projects, it's mostly a matter of documentation 😞

TL;DR: vossibility takes GitHub data, transforms it to extract the information you want and to enrich where necessary, sends all this to Elastic Search, and then you can use the wonderful Kibana as a frontend.

These are some examples of how I'm using vossibility today:

  • Visualize time to process pull requests:

image

  • Visualize how the activity is spread among the different repositories, and how this evolves over time:

image

  • Understand where contributions are coming from (Docker employed maintainers, external maintainers, or the broader community), and the respective merge ratio (merged/total ratio) for each:

image

  • Visualize what kind of issues are opened through pie charts that break down our different area/*, kind/* and version/* GitHub labels:

image

  • Build a "live changelog" by listing all merged pull requests that bear a special impact/changelog label.

  • Measuring activity in order to identify potential maintainers. For this, we're using the number of unique GitHub items (issues and pull requests combined) a user has interacted with (meaning: opened, commented on, reviewed, etc.). The goal is to identify people who care about more than their very own issues or contributions: opening a pull request counts just as much as commenting on someone else's, and commenting 20 times on your own issue won't contribute more than opening it in the first place.

  • Frankly useless, but fun idea of generating an automated "weekly digest" of what happened on the project.
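
The activity measure described above for spotting potential maintainers (unique issues and pull requests a user has interacted with) could be sketched like this. It is not vossibility's actual code; the `(user, item_id)` interaction tuples and the function name are assumptions made for the example.

```python
from collections import defaultdict


def unique_items_per_user(interactions):
    """Count distinct issues/PRs each user touched, however often they touched them."""
    items = defaultdict(set)
    for user, item_id in interactions:
        items[user].add(item_id)
    return {user: len(ids) for user, ids in items.items()}


# Hypothetical interactions: ann comments three times on one issue,
# ben opens one item and comments once on another.
interactions = [("ann", 1), ("ann", 1), ("ann", 1), ("ben", 1), ("ben", 2)]
scores = unique_items_per_user(interactions)
print(scores)
```

Because each item is stored in a set, repeated comments on the same issue add nothing, which is the property described in the bullet.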

Happy to discuss more if this kind of information is helpful 👍

@dicortazar

Hey just to mention (do not want to spam! ^^) that grimoirelab supports the following data sources: askbot, bugzilla, confluence, discourse, gerrit, git, github issues, github pull requests, mbox, jenkins, jira, mediawiki, meetup, phabricator, pipermail, redmine, rss, stackexchange (stackoverflow), supybot, telegram, kitsune and remo. There's some extra info of Perceval, the retrieval tool at: https://github.com/GrimoireLab/perceval

That means that, with all of that information in a database (mainly ElasticSearch), we can go for the metrics you're mentioning, such as people and the evolution of contributors across all of those data sources, activity for all of those data sources, etc. On top of that, we can build more advanced analyses, such as the demographics of the community.

btw, are any of you attending FOSDEM? That could be a great place to meet and discuss metrics. We're also holding a workshop to talk about metrics and show how to use the grimoire toolchain, just in case you're interested [http://grimoirelab.github.io/con/], and we also have a collaborative book, https://jgbarah.gitbooks.io/grimoirelab-training/content/, where that's also detailed.

@sagesharp

One of the things I would love to focus FOSS Heartbeat on is the people in open source communities. I think we often get caught up in metrics like "Is rate of merged pull requests increasing?" without focusing on the people behind those metrics. Examples of more people-focused questions I would love FOSS Heartbeat to answer are:

  • What is the average load of pull requests reviewed and merged for each core developer? How does that workload compare to other open source projects of a similar size or in the same category?
  • Can we identify maintainer burnout, characterized by an increased amount of work, working at odder hours than normal, and an increased negative sentiment in their responses?
  • Which maintainers are good at mentorship, and are there specific things maintainers can do to encourage growth in the developer user base?

I'd love to chat more about this. If you're looking to hire contractors to work on these sort of people-focused metrics, you can drop a line to [email protected]. I'll also be at FOSDEM.

@nayafia

nayafia commented Dec 22, 2016

@sarahsharp's excellent qs remind me...a lot of these metrics should be used to measure not just growth, but sustainability.

Ex. "average response time" can be used to measure how quickly maintainers respond to issues/PRs, but if responses are getting slower over time, that can also be a sign of exhaustion. So the response isn't just "answer them faster!" but might be "how do we get additional 👀s and ✋s to help out?"

@mikeal

mikeal commented Dec 22, 2016

A few things that we pull regular metrics on in the Node.js project that have been important.

  • % of contributions by top 10 contributors (we like to see this go down over time, distributing the work across more contributors).
  • PRs merged per month, broken down by non-committers and committers. This lets us quickly identify people to on-board as committers each month.
  • Adoption of each major release line. This is probably the best "top line" metric you can have. If people aren't adopting new releases then something is not working and even though you're making forward progress with releases you're accumulating a lot of long term maintenance burden.
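
The first metric in the list, the share of contributions made by the top contributors, can be computed roughly like this. The function name and the sample data are hypothetical, and "contribution" is assumed to mean one entry per commit or merged PR, keyed by author login.

```python
from collections import Counter


def top_share(authors, n=10):
    """Percentage of all contributions made by the n most active authors."""
    counts = Counter(authors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return 100 * top / total


# Hypothetical data: one list entry per contribution, keyed by author login.
authors = ["a"] * 6 + ["b"] * 3 + ["c"]
print(top_share(authors, n=1), top_share(authors, n=2))
```

A falling value over time would indicate that the work is spreading across more people.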

We used to track who was merging commits but that has gotten less useful over time because it doesn't really indicate who is reviewing commits as more PRs get reviewed by many people before being merged and it's common for someone to merge a bunch of already reviewed PRs. You can probably get better data out of the new review tools in GitHub if you're using that review system.

@lasomethingsomething

@dicortazar / @sarahsharp I'll be at FOSDEM too, as will some of my Zalando colleagues (@alexkops for sure, hopefully @hjacobs and our IP lawyer at minimum as well). Also cc'ing my colleagues @jbspeakr and @KathleenLD here so they can follow this thread; both are interested in/have exp in metrics and balance.

@nayafia Thank you for clarifying your point and for the extra link. @icecrime, would be up for adding some bits to your READMEs over the holidays.

@mikeal

mikeal commented Dec 22, 2016

One last thing I'll say: think carefully about what is important to the project before building the dashboard.

There's plenty of data out there and it's easy to get lost in creating amazing visualizations of it. I've done this myself a few times and the result was more of a distraction than a benefit. There's also a bunch of products out there that already do this and I feel the same way about most of them as well.

I've actually pared back the data that I regularly consider. For instance, I no longer try to track the total number of commits in the main repo in master. There's an inflection point where the project can't handle any more activity in one place and things are spun off more liberally. If we obsessed about that metric we'd end up overloading that branch/repo. Instead, more focus is put on "how" the work is getting done rather than just the volume of work.

If you want to distribute the work load, attract more contributors, increase diversity, etc, a lot of these metrics won't help and can become counter-productive.

@gr2m
Member Author

gr2m commented Dec 22, 2016

@mikeal I very much agree. The Hoodie way to avoid this problem is to start with the end result without thinking about technical limitations or what data and tooling is available today. Then we will probably create some kind of dummy dashboard that just looks and feels amazing, then we all get super excited about it, and then we make it work backwards :)

This is also the reason why I want @leighphan to lead this project, because she cares about the processes and the experience from the perspective of new and existing contributors as well as maintainers, and she has the skills for and interest in data visualisations.

Thanks for this great discussion y’all <3 keep it coming

@tracykteal

This is great & very important! @sarahsharp's point about understanding the perceptions of people in the open source community is important for understanding the health & sustainability of the project, beyond the quantitative metrics.

@kariljordan just pointed me at this great paper from Steinmacher et al. on self-efficacy towards OSS projects, "Increasing the Self-Efficacy of Newcomers to Open Source Software Projects". This study shows that self-efficacy (belief in one's ability to succeed in accomplishing a task) can increase with more guidance around initial commits. #win! The study doesn't then go on to show that people with more self-efficacy continue to contribute to the project, but more general studies in self-efficacy show that it's important for involvement in an activity.

It would be interesting to survey people new to and actively working on the projects, potentially with these survey questions, to understand why there might be a particular balance between active users, contributors and maintainers on a project and if people are transitioning from being newcomers to active contributors.

Steinmacher et al survey questions:

  1. I feel comfortable asking for help from the community using electronic communication means
  2. I can write my doubts and understand answers in English
  3. I am good in understanding code written by other people
  4. I have pretty good skills to write and change code
  5. I feel comfortable with the process of contributing to an Open Source project
  6. I think that contributing to an open source software project is an interesting activity
  7. I feel I can set up and run an application if a set of instructions is properly given
  8. I am pretty good on searching for solutions and understanding technical issues by myself
  9. I can choose an adequate task to fix if a list of tasks is given
  10. I can find the piece of code that need to be fixed given a bug report presenting the issue

@leighphan

Thanks everyone for your interest. I greatly appreciate all the tips on starting points! Looks like there are many different angles and ways we can extract and visualize the climate of Open Source communities.

@jasonLaster I'm curious - how do you get to know contributors better? Are there weekly chats/meetings open to everyone?

While project data can reveal the efficiency and growth of OS projects, I'm also very interested in the data that will help us connect with people first in Open Source communities. Thanks @sarahsharp and @nayafia for bringing up very perceptive questions and indicators about the sustainability of people (before projects).

I'm all for taking the Hoodie approach of starting with the interface look and feel, then working backward - people first. :)

gr2m added the Project label Dec 25, 2016

@lasomethingsomething

Hey all, where are we on this thread? Some of us are talking about FOSDEM right now and it reminded me, there was talk of a FOSDEM get-together. Should we plan?

@bkeepers

bkeepers commented Jan 2, 2017

Hey all, where are we on this thread? Some of us are talking about FOSDEM right now and it reminded me, there was talk of a FOSDEM get-together. Should we plan?

👍 GitHub would be happy to host a dinner conversation.

@lasomethingsomething

👍 @bkeepers. Or does the group here want to take some sort of action, based on @gr2m's original pitch and the ideas that have followed since? I don't know the answer to this question, it's for everyone :)

@jasonLaster

Thanks everyone for the wealth of information. I am inspired by all of the ongoing work.

Here's a quick doc that I started to summarize what I learned. Please feel free to improve it in any way.

Also, if you're interested in joining a hangout, add your name to the doc and we can discuss next steps.

@gr2m
Member Author

gr2m commented Jan 2, 2017

@jasonLaster this is great work, thanks for putting it together 👍

@nayafia

nayafia commented Jan 3, 2017

It would be interesting to survey people new to and actively working on the projects, potentially with these survey questions, to understand why there might be a particular balance between active users, contributors and maintainers on a project and if people are transitioning from being newcomers to active contributors.

@tracykteal you might be interested in http://opensourcesurvey.org/, which is being conducted by GitHub. While not specific to any one project, some of those Steinmacher et al questions will be asked of respondents, so might be helpful as a baseline. The survey questions are public, and results will be public too.

@clarkbw

clarkbw commented Jan 3, 2017

Just getting back into the swing of things but @jasonLaster pointed me to this over the holidays and I was really excited to see all the enthusiasm and work going on. I've been looking at this from a couple different angles with very similar goals to what I've seen here so I'll share what I've been thinking.

First I want to understand the contributor funnel (funnel being a standard marketing term, perhaps not the best way to describe people; onboarding?)

  • I want to understand the number of people who are viewing / using the project and on into how many people are getting started, being mentored, maintaining, and even exiting.
    • I'm worried about the health of our project for attracting new contributors or perhaps turning them away. I really like @sarahsharp's Heartbeat project sentiment analysis which looks like it could possibly answer questions about why people aren't progressing further.
    • I think tracking the maintainer level also ties into what @mikeal is tracking with % of contributions, I worry about people burning out by contributing too much and I'd like to know that others are on their way to becoming maintainers to spread the load.

Then I'd like to understand area of interest or strengths for contributors. (I think this is what @tracykteal is getting at)

  • Some people come to learn a specific thing (like React or Redux), or build a resume, or get a job, etc. The more I could understand what is driving people, the better I could help them achieve that. Often we'll file an issue and know that a certain contributor would love to tackle "this kind of problem", and I'd like to be better at spreading that kind of work fairly.
  • This also helps to understand why people sometimes disappear completely. We've had very active contributors over the years who just go dark one day. Later we've found that they were in school at the time and then got a job and didn't have time anymore.

Looking forward to more discussion! 🎉

@leighphan

I'm also curious how or if projects have an onboarding process (perhaps like Hoodie Camp) to get to know contributors' interests, such as building a portfolio or getting a job - this would help give a clearer idea of their trajectory. I started @codelaboc, a learning group in my community, and we conduct a survey for new members to get an idea of their goals and strengths, and shape our events/direction to help each other.

@nayafia Any chance the http://opensourcesurvey.org/ will touch on such questions?

@nayafia

nayafia commented Jan 4, 2017

@leighphan off the top of my head, I don't think so (it's geared more towards contributor behavior than project norms), but you can see the full set of questions here.

@sohini-roy
Collaborator

Hello,
Thanks for the guide. It was helpful.
I am acquainted with the GitHub API and am willing to be a part of this issue.

Please guide me through :)
