Intro to Data Science with Metis

Data Science, Documentation, Learning, Python

While searching through Meetup.com, I stumbled upon a free “One Day at Bootcamp” sponsored by Metis.  Since I am unfamiliar with data science and love any opportunity to learn something new, I signed up.  Within minutes, I had a welcome email from Metis outlining what I should expect to learn in the class.  A few days later, I received a follow-up email reminding me to download Python 3 and Anaconda if I didn’t already have them.  The emails from Metis were easy to follow, and I arrived with the proper tools for the task ahead.

The day of the bootcamp, I wandered into the room and was greeted by a friendly person.  We started a bit late because of technical difficulties, but the teacher, Roberto Reif, gave thorough explanations.  The class would have been accessible to a person of any skill level.  Throughout the course, Roberto was receptive to questions and interacted with the students.  We opened the Jupyter notebooks that were provided by Metis and began to work.  From what I understand, Jupyter Notebook is a powerful prototyping tool.  A notebook looks like a standard webpage or markdown file intermingled with mini-terminals for executing code.

First, we started with an intro to Python.  I haven’t written much Python code, so I appreciated the intro.  We went through data types, indexing, loops, and functions.  I find it funny that Python has a data structure called a dictionary, which is analogous to a Ruby hash.  Tuples and sets were new to me.  In Python, white space is extremely important.  I am used to languages that call for an ‘end’ to close a loop or function.  Python instead uses indentation to mark which lines belong to a loop or function.  Apparently, the Python documentation isn’t very helpful due to it being open source, but there are some powerful modules available that I’d like to take some more time to research.
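To make those basics concrete, here is a small sketch of a dictionary, a tuple, a set, and indentation-based blocks.  The names and values are made up for illustration and are not taken from the Metis notebook.

```python
# A dictionary maps keys to values, much like a Ruby hash.
ages = {"ada": 36, "grace": 45}

# A tuple is an ordered collection that cannot be changed after creation.
point = (3, 4)

# A set keeps only unique items, so the duplicate "python" is dropped.
unique_tags = {"python", "data", "python"}


def describe(people):
    # The indented lines below belong to the function; there is no 'end'.
    lines = []
    for name, age in people.items():
        # This deeper indentation marks the body of the loop.
        lines.append(f"{name} is {age}")
    return lines


summary = describe(ages)
print(summary)
print(point[0], len(unique_tags))
```

Removing the indentation under `def` or `for` changes what the code means (or makes it fail), which is why white space matters so much in Python.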

After the intro, we made our way to the next notebook on linear regression.  Linear regression fits a straight line to data, which helps us find trends.  Roberto showed us how to interact with data and make mock data with Gaussian noise.  He said that what we were doing in this segment would be familiar to someone who uses MATLAB.
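A minimal sketch of that idea, using only the Python standard library: build mock data along a known line, add Gaussian noise, then recover the line with ordinary least squares.  The slope, intercept, and noise level are assumptions chosen for this example, and the notebook may well have used other tools for the same job.

```python
import random

random.seed(0)  # fixed seed so the mock data is repeatable

TRUE_SLOPE = 2.0      # hypothetical values chosen for this example
TRUE_INTERCEPT = 1.0

# Mock data: points on a straight line, nudged by Gaussian noise.
xs = [i / 10 for i in range(100)]
ys = [TRUE_SLOPE * x + TRUE_INTERCEPT + random.gauss(0, 0.5) for x in xs]

# Ordinary least squares for a single feature.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
variance = sum((x - mean_x) ** 2 for x in xs)

slope = covariance / variance
intercept = mean_y - slope * mean_x

print(f"estimated slope: {slope:.2f}, intercept: {intercept:.2f}")
```

The estimates should land close to the true slope and intercept, and they drift further away as the noise grows.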

The scikit-learn API was the next subject that we looked at.  I will summarize scikit-learn by quoting the notebook:  “Basically, it’s an extraordinarily convenient way to start into machine learning and data mining.”  We use the scikit-learn API in three(ish) steps.

1. Import and initialize the regression model from scikit-learn

2. Call the fit function of the module (learn from the data)

3. Predict/transform the data (predict outcome)
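The three steps above can be sketched with scikit-learn's `LinearRegression`.  The tiny dataset is made up for illustration, and this is my own minimal version rather than the code from the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny made-up dataset that follows y = 2x + 1 exactly.
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # one feature per row
y = np.array([1.0, 3.0, 5.0, 7.0])

# Step 1: import (above) and initialize the model.
model = LinearRegression()

# Step 2: fit, i.e. learn from the data.
model.fit(X, y)

# Step 3: predict the outcome for an input the model has not seen.
prediction = model.predict(np.array([[4.0]]))
print(prediction)
```

Other scikit-learn estimators follow the same initialize, fit, predict pattern, which is a big part of what makes the library convenient.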

As an example of this workflow, we saw a model predict which digit each handwritten number represented.  The results were surprisingly accurate.
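For a rough idea of what such an example can look like, here is a sketch that uses the handwritten digits dataset bundled with scikit-learn.  The choice of a support vector classifier and its `gamma` setting are my assumptions; I don't know which model the notebook used.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

digits = load_digits()  # 8x8 grayscale images of handwritten digits 0-9

# Hold back a quarter of the images to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

classifier = SVC(gamma=0.001)  # assumed model choice for this sketch
classifier.fit(X_train, y_train)

# Fraction of held-out images whose digit was predicted correctly.
accuracy = classifier.score(X_test, y_test)
print(f"accuracy on held-out digits: {accuracy:.2%}")
```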

Our final segment of the day was a case study with scikit-learn and Pandas.  Pandas is a Python module for loading, cleaning, and analyzing tabular data.  Our first example had us manipulate data from a CSV of weather and use Pandas to learn about our data.  In addition to data manipulation, we were able to visualize the data in a way that elucidated trends.  For the icing on the cake, we built regression models in scikit-learn for housing data from Ames, IA.  This was an excellent example because anyone could see how such a model could be useful for predicting home values.
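Here is a small sketch of that kind of Pandas exploration.  The CSV contents and column names are invented stand-ins, not the weather file from the class.

```python
import io

import pandas as pd

# Made-up stand-in for a weather CSV; the column names are hypothetical.
csv_text = """date,temp_f,rain_in
2018-01-01,30,0.0
2018-01-02,34,0.2
2018-01-03,28,0.0
2018-01-04,41,0.5
"""

weather = pd.read_csv(io.StringIO(csv_text))

# Summary statistics for each numeric column.
print(weather.describe())

# Filter rows by a condition and compute a simple aggregate.
rainy_days = weather[weather["rain_in"] > 0]
average_temp = weather["temp_f"].mean()
print(len(rainy_days), average_temp)
```

With a real file, `pd.read_csv` takes the file path directly instead of the `io.StringIO` wrapper used here.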

Overall, I would say that my experience with Metis was fantastic and I learned a lot that day.  The staff was extremely helpful and I enjoyed the passion that everyone had for data science.  I would definitely attend another event at Metis.

The Importance of Documentation

Documentation

You need to know yourself. Always. Especially, know what you don’t know.  When you start the journey of coding, it can feel overwhelming.  Luckily, anything that seems confusing can be searched for, and code libraries are easy to find.

There is a phrase, “standing on the shoulders of giants.”  This means that everything you can do on the journey of coding has been built on the work of others.  It’s comforting to know that I can find definitions of modules fairly easily.

For the Udemy course Programming Foundations with Python, there is a project where you have to learn to draw a flower using the turtle module.  I am comfortable enough to admit that I didn’t know how to draw a flower with the turtle module as well as I would have liked.  I could probably have drawn something that abstractly resembled a flower, but that wasn’t good enough for me.

I decided that I needed to learn how to draw a flower in Python that could conceivably be considered a flower on paper.  In this pursuit, I turned to Google.  Among a list of lackluster flowers, I stumbled upon one that suited my idea of what a flower should look like.

I could read the Python script, but I didn’t quite understand it.  This is where the Python Standard Library documentation comes in handy.  After figuring out what the code meant, I modified it to fit my idea of a flower.  As it turns out, things get simpler when you break them down.
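To give a sense of what a turtle flower can look like in code, here is a minimal sketch.  It is not the script mentioned above; the function name `draw_flower` and its parameters are hypothetical, and it relies only on the `circle` and `left` methods described in the turtle documentation.

```python
def draw_flower(pen, petals=8, radius=100):
    """Draw a simple flower; pen is any object with turtle-style methods."""
    for _ in range(petals):
        # Two 60-degree arcs joined at sharp corners form one petal
        # and bring the pen back to where the petal started.
        for _ in range(2):
            pen.circle(radius, 60)
            pen.left(120)
        # Rotate so the next petal fans out around the center.
        pen.left(360 / petals)
```

To see it on screen, run `import turtle`, call `draw_flower(turtle.Turtle())`, and finish with `turtle.done()` to keep the window open.  Breaking the flower into one petal repeated around a circle is what makes it simple.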