Intro to Data Science with Metis

Data Science, Documentation, Learning, Python

While searching through Meetup.com, I stumbled upon a free “One Day at Bootcamp” sponsored by Metis. Since I am unfamiliar with data science and love any opportunity to learn something new, I signed up. Within minutes, I had a welcome email from Metis outlining what I should expect to learn in the class. A few days later, I received a follow-up email reminding me to download Python 3 and Anaconda if I didn’t already have them. The emails from Metis were easy to follow, and they left me with the proper tools for the task ahead.

The day of the bootcamp, I wandered into the room and was greeted by a friendly person. We started a bit late because of technical difficulties, but the teacher, Roberto Reif, gave thorough explanations. This class would have been accessible to a person of any skill level. Throughout the course, Roberto was receptive to questions and interacted with the students. We opened the Jupyter notebooks that were provided by Metis and began to work. From what I understand, Jupyter Notebook is a powerful prototyping tool. It looks like a standard webpage or markdown file intermingled with mini-terminals for executing code.

First, we started with an intro to Python. I haven’t written much Python code, so I appreciated the intro. We went through data types, indexing, loops, and functions. I find it funny that Python has a data structure called a dictionary, which is analogous to a Ruby hash. Some new things I learned about were tuples and sets. In Python, whitespace is extremely important. I am used to languages that call for an ‘end’ to close a loop or function. Python instead uses indentation to mark which lines belong to a loop or function. Apparently, the Python documentation isn’t very helpful because the language is open source, but there are some powerful modules available that I’d like to take more time to research.
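To make those Python basics concrete, here is a minimal sketch of a dictionary, a tuple, a set, and indentation-delimited blocks. The names and values (`ages`, `point`, `unique_tags`, `describe`) are made up for illustration and are not from the Metis notebook.

```python
# A dictionary maps keys to values, much like a Ruby hash.
ages = {"ada": 36, "grace": 45}
ages["linus"] = 28

# A tuple is an ordered, immutable sequence.
point = (3, 4)

# A set keeps only unique items, so the duplicate "python" is dropped.
unique_tags = {"python", "data", "python"}


def describe(people):
    """Return one summary string per person."""
    lines = []
    for name, age in people.items():
        # Indentation alone marks this line as part of the loop body.
        lines.append(f"{name} is {age}")
    # Dedenting ends the loop body; no 'end' keyword is needed.
    return lines


summary = describe(ages)
print(summary)
print(point[0], len(unique_tags))
```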

After the intro, we made our way to the next notebook on linear regression. Linear regression fits a straight line to data, which helps us find trends. Roberto showed us how to interact with data and make mock data with Gaussian noise. He said that what we were doing in this segment would be familiar to someone who uses MATLAB.
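As a rough sketch of that idea (not the notebook's actual code), the snippet below generates mock data around a made-up line, y = 2x + 1, adds Gaussian noise with NumPy, and recovers the trend with a least-squares fit. The slope, intercept, noise level, and seed are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable

# Hypothetical "true" trend: y = 2x + 1
true_slope, true_intercept = 2.0, 1.0
x = np.linspace(0, 10, 200)
noise = rng.normal(loc=0.0, scale=1.0, size=x.shape)  # Gaussian noise
y = true_slope * x + true_intercept + noise

# Least-squares fit of a degree-1 polynomial (a straight line).
# np.polyfit returns coefficients from highest degree to lowest.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"estimated slope={slope:.2f}, intercept={intercept:.2f}")
```

The estimates should land close to the values used to generate the data, which is the whole point of the exercise: the fit sees through the noise to the underlying trend.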

The scikit-learn API was the next subject that we looked at. I will summarize scikit-learn by quoting the notebook: “Basically, it’s an extraordinarily convenient way to start into machine learning and data mining.” We use the scikit-learn API in three(ish) steps:

1. Import and initialize the regression model from scikit-learn

2. Call the model’s fit function (learn from the data)

3. Predict/transform the data (predict outcome)
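The three steps above can be sketched roughly as follows, using scikit-learn's `LinearRegression`. The data is invented (it follows y = 3x + 2 exactly) so the result is easy to check by hand; the notebook's own data and model settings may have differed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data that follows y = 3x + 2 exactly.
X = np.arange(10, dtype=float).reshape(-1, 1)  # scikit-learn expects a 2D feature array
y = 3.0 * X.ravel() + 2.0

# Step 1: import (above) and initialize the model.
model = LinearRegression()

# Step 2: fit, i.e. learn from the data.
model.fit(X, y)

# Step 3: predict outcomes for inputs the model has not seen.
X_new = np.array([[10.0], [11.0]])
predictions = model.predict(X_new)
print(predictions)
```

Most scikit-learn estimators follow this same initialize, `fit`, `predict` pattern, so swapping in a different model usually changes only the import and the line that creates it.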

As an example of this workflow, we saw a model predict which digit each handwritten number represented. The results were surprisingly accurate.

Our final segment of the day was a case study with scikit-learn and Pandas. Pandas is a Python module that helps you handle lots of data. Our first example had us manipulate data from a CSV of weather and use Pandas to learn about our data. In addition to data manipulation, we were able to visualize the data in a way that elucidated trends. For the icing on the cake, we built regression models in scikit-learn for housing in Ames, IA. This was an excellent example because anyone could see how this model could be useful for predicting values.
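For a flavor of that kind of data exploration, here is a small sketch that loads CSV data into Pandas, summarizes it, filters it, and groups it. The CSV contents and column names (`date`, `city`, `temp_f`, `rain_in`) are invented stand-ins, not the weather file used in class.

```python
import io

import pandas as pd

# Made-up CSV contents standing in for the weather file used in class.
csv_text = """date,city,temp_f,rain_in
2018-01-01,Seattle,45,0.30
2018-01-02,Seattle,47,0.00
2018-01-01,Chicago,20,0.00
2018-01-02,Chicago,24,0.10
"""

# read_csv accepts a file path or any file-like object.
weather = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Quick look at the data: summary statistics for the numeric columns.
print(weather[["temp_f", "rain_in"]].describe())

# Filter rows with a boolean mask, then aggregate by group.
rainy_days = weather[weather["rain_in"] > 0]
mean_temp_by_city = weather.groupby("city")["temp_f"].mean()
print(mean_temp_by_city)
```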

Overall, I would say that my experience with Metis was fantastic and I learned a lot that day.  The staff was extremely helpful and I enjoyed the passion that everyone had for data science.  I would definitely attend another event at Metis.