Write a blog post on a machine learning project highlighting Yellowbrick. (GSoC-2019) #691

Closed
wagner2010 opened this issue Jan 21, 2019 · 12 comments
Labels
level: novice (good for beginners or new contributors); priority: low (no particular rush in addressing)

Comments

@wagner2010
Contributor

Write a blogpost highlighting Yellowbrick, using a dataset of your choice. Please begin by reviewing our QuickStart guide (http://www.scikit-yb.org/en/latest/quickstart.html), completing the walkthrough (http://www.scikit-yb.org/en/latest/quickstart.html#walkthrough), working through our Model Selection tutorial (http://www.scikit-yb.org/en/latest/tutorial.html), and reviewing our Contributor section (http://www.scikit-yb.org/en/latest/contributing.html). Some good sites for data include Data.gov (data.gov), the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/index.php), and Kaggle (https://www.kaggle.com). Run through a machine learning project using Jupyter Notebooks (data ingestion, storage, wrangling, statistical computation, model selection, machine learning with Yellowbrick, and visualization with Yellowbrick). Use this work to formulate your blogpost, and work with the Yellowbrick team to get it reviewed and published in the forum of your choice.

@wagner2010 added the labels priority: low (no particular rush in addressing), level: novice (good for beginners or new contributors), and GSoC on Jan 21, 2019
@ndanielsen
Contributor

I've been seeing a lot of good posts on this site: https://towardsdatascience.com/

It might be a great place to submit?

@wagner2010
Contributor Author

Absolutely, @ndanielsen. Additionally, an accompanying YouTube video blog or even a podcast would be awesome as well!

@dnabanita7
Contributor

Can I be assigned this issue?

@bbengfort
Member

@naba7 that would be great - looking forward to seeing a draft!

@wagner2010
Contributor Author

@naba7 thanks for taking an interest. As @bbengfort said, we would love to see a draft when you have it ready. Out of curiosity, are you participating or applying to participate in Google's Summer of Code (GSoC) program? This issue is certainly not restricted to GSoC participants; I am just wondering whether you're looking to participate or simply have an interest in writing a blogpost for Yellowbrick. Cheers!

@dnabanita7
Contributor

dnabanita7 commented Jan 29, 2019 via email

@wagner2010
Contributor Author

wagner2010 commented Jan 29, 2019

Very cool. Thanks for reaching out. We don't have specific information on GSoC at this time but we definitely will after the process unfolds. Let's keep in touch and we're happy to check out a draft or hear about your ideas for a posting on a general basis.

@richardjgowers

@wagner2010 Cool project! I've done GSoC for a few years now, and I think the project has to be code-based rather than documentation. A code-based blogpost is obviously borderline, but I'd double-check this.

@dnabanita7
Contributor

I have made a draft and am linking it below. Please check it out, help me work through any errors, and let me know if I left anything out. Here is the link: https://github.com/Naba7/NYPD_Hunchlab

@wagner2010
Contributor Author

Hi Naba, thank you for your work. The team and I don't really have the bandwidth to take up blogpost drafts at the moment. My purpose in starting this issue was to include it on a list of issues for our GSoC proposal (for this summer), which is still going through the mentor organization application process. Blogposts are a low priority for us right now, as we're trying to push forward on a number of higher-priority issues that will propel us towards our next version bump (release). As you know, GSoC hasn't started yet, and as I encouraged you a few weeks ago, I encourage you to go through the GSoC student application process. As a mentor organization, we want a chance to go through that application process ourselves, as it is still unfolding. As for this draft, since we aren't able to review it for edits right now, you have two options: 1.) You can publish it yourself and let us know. We are most happy to Tweet/promote the blogpost. 2.) If you want it published on the District Data Labs blog site, you would need to get in touch with Tony Ojeda and work it out with him. And with that, I'm closing this issue for now.

@dnabanita7
Contributor

dnabanita7 commented Feb 13, 2019 via email

@Yogayu

Yogayu commented Apr 8, 2019

Dear Mentors @rebeccabilbro @bbengfort @lwgray @ndanielsen @pdamodaran @wagner2010,

I plan to work on "Allow ModelVisualizer to wrap pipeline objects" and this idea for GSoC, so I'd like to describe how I am going to approach it.

Idea: write blog posts about Data Science (including EDA and ML) by highlighting the usage of Yellowbrick.

Writing blog posts is a great way to help users make better use of Yellowbrick, extend the project's reach, and increase community activity. I also plan to translate the documentation into Chinese.

Process
I will write a blog highlighting Yellowbrick for a machine learning project. The basic process follows:

  1. Define the goal and audience for the post
  2. Design the machine learning task
  3. Choose the dataset
  4. Explore and design the data science pipeline: problem statement, hypothesis, ingestion, storage, wrangling, statistical exploration, model selection, machine learning, and visualization.
  5. Questions and reflection

Theme

The themes I am going to cover are based on the documentation:

  • Introduction and Quick Start
  • Feature Analysis
  • Target Visualizers
  • Regression
  • Classification
  • Clustering
  • Model Selection
  • Text Modeling

Platform

I'd like to discuss which platform we publish on. I already have a blog (http://data2art.com, with a WeChat Official Account), which is one choice, so we could publish on the District Data Labs blog site, Medium, or my blog. My blog is the most convenient for me, and customizing it is fast. It's also worth considering that Medium, like Google sites, can't be accessed in China without a VPN.

Translate the documents into Chinese
I also plan to translate the documentation into Chinese, which would greatly help increase the project's reach in the Chinese community. I have already created a PR, "Fix some mistakes in quickstart.rst file and add Chinese translation to the tutorial.rst", which has been merged.

I know there is little time left before GSoC proposals are due, and I understand that you may not be available to respond in detail, so I will put these ideas in my proposal and submit it. I hope that I will have a chance to discuss and work with you later.


Best wishes,
Xinyu You
