What I’ve Learned in Data Science so Far…

Yetti Obasade
4 min read · Mar 6, 2021


It’s week 6 of the course and we are halfway through. A lot has happened in just a few weeks. It doesn’t even seem like that much time has passed! Nonetheless, I have been enjoying this program so far. A few of the topics we have learned about over the last month include linear regression, overfitting and underfitting, bias vs. variance, classification models, logistic regression, train/test split, confusion matrices, scaling data, KNN models, AdaBoost, bootstrapping, and a plethora of other data science subjects.

I will explain some of the topics above just to give a deeper overview:

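Linear regression is a type of supervised learning used for predictive analysis. It models the relationship between one or more features and a continuous target by fitting a straight line (or plane) that minimizes the error between the predicted and actual values. The “regression” part refers to this search for relationships among features or variables.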
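
To make that concrete, here is a minimal sketch of fitting a line with ordinary least squares in plain Python. The function name `fit_line` and the hours/scores numbers are made up for illustration; in the course we would normally reach for a library instead.

```python
def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: hours studied vs. quiz score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
predicted_score = slope * 6 + intercept  # prediction for 6 hours of study
print(round(slope, 2), round(intercept, 2), round(predicted_score, 2))
```

The slope says how much the score is expected to change for each extra hour, which is the "relationship among variables" the regression is searching for.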

The bias vs. variance tradeoff refers to the balance between two sources of error in a machine learning model. Bias is the difference between the model’s average prediction and the true value. In dart board terms, it is how far the darts land from the bullseye on average. Variance is how much the model’s predictions change when it is trained on different samples of the data. On the dart board, it is how widely the darts are scattered. A model that is too simple tends to have high bias (underfitting), while a model that is too flexible tends to have high variance (overfitting).
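
The dart board picture can be put into numbers. The sketch below uses two made-up sets of "throws" at a target value of 10; the helper `bias_and_variance` is a name I chose for illustration, not a library function.

```python
import statistics


def bias_and_variance(predictions, true_value):
    """Bias: average prediction minus the truth. Variance: spread of the predictions."""
    bias = statistics.mean(predictions) - true_value
    variance = statistics.pvariance(predictions)
    return bias, variance


target = 10.0
# Darts clustered tightly, but away from the bullseye: high bias, low variance.
tight_but_off = [12.9, 13.0, 13.1, 13.0]
# Darts centered on the bullseye on average, but scattered: low bias, high variance.
centered_but_scattered = [6.0, 14.0, 8.0, 12.0]

bias_a, var_a = bias_and_variance(tight_but_off, target)
bias_b, var_b = bias_and_variance(centered_but_scattered, target)
print(f"tight but off:          bias={bias_a:.2f}, variance={var_a:.3f}")
print(f"centered but scattered: bias={bias_b:.2f}, variance={var_b:.3f}")
```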

Train/test split is where data is split into separate sets so that a model can be trained on one set and then evaluated against the other. Because the test set was not seen during training, the score on it gives an honest estimate of how well the fitted model will predict new values.
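
As a rough sketch of the idea, the split can be written in a few lines of plain Python. The function `simple_split`, the 25% test fraction, and the toy data are my own assumptions for illustration; scikit-learn provides a `train_test_split` function that does this job in practice.

```python
import random


def simple_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows, then cut it into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = rows[:]         # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data set: (feature, target) pairs
data = [(x, 2 * x + 1) for x in range(20)]
train, test = simple_split(data)
print(len(train), "training rows,", len(test), "test rows")
```

Shuffling before cutting matters: if the rows were sorted, the test set would otherwise contain only the smallest values.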

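Bootstrapping is taking many random samples, with replacement, from your data set and computing a statistic or fitting a model on each one. Each resample is the same size as the original data, so the resamples are very similar to each other but not identical.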
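
Here is a small sketch of bootstrapping the mean of a sample, using only the standard library. The sample values, the 1,000 resamples, and the function name `bootstrap_means` are assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=1000, seed=0):
    """Resample the data with replacement and record the mean of each resample."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # random.choices draws WITH replacement, so values can repeat.
        resample = rng.choices(data, k=len(data))
        means.append(statistics.mean(resample))
    return means


sample = [4.1, 5.0, 5.5, 6.2, 4.8, 5.9, 5.3, 4.6]
means = sorted(bootstrap_means(sample))

# A rough 95% interval: cut off the lowest and highest 2.5% of resampled means.
low, high = means[25], means[974]
print(f"sample mean = {statistics.mean(sample):.3f}")
print(f"approximate 95% interval for the mean: ({low:.3f}, {high:.3f})")
```

The spread of the resampled means gives a feel for how much the estimate would move if we had collected a slightly different data set.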

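A random forest builds many decision trees, each trained on a bootstrapped sample of the data. It adds another layer of randomness by letting each split consider only a random subset of the features. This makes the trees less correlated with one another, which addresses a weakness of plain bagging, where the trees tend to look alike.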
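
The two sources of randomness can be sketched in plain Python. This is a heavily simplified toy, not a real random forest: each "tree" is a one-split stump that sees one randomly chosen feature, whereas real implementations grow full trees and pick a random feature subset at every split. All function names and the data are made up for illustration.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """Find the single threshold on one feature that misclassifies the fewest rows."""
    fallback = majority(labels)
    best = (len(labels) + 1, None, fallback, fallback)
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        if not right:
            continue  # everything fell on the left, so this is not a real split
        left_label, right_label = majority(left), majority(right)
        errors = sum(lab != left_label for lab in left)
        errors += sum(lab != right_label for lab in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return (feature,) + best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    if threshold is None or row[feature] <= threshold:
        return left_label
    return right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Randomness 1 (bagging): bootstrap the rows, drawing with replacement.
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        # Randomness 2: this stump only gets to look at one random feature.
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Every stump votes; the majority wins."""
    return majority([predict_stump(stump, row) for stump in forest])


# Made-up, well-separated data: class 0 has small values, class 1 has large ones.
rows = [(1, 2), (2, 1), (3, 3), (2, 4), (8, 7), (9, 9), (7, 8), (10, 7)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (9, 9)))
```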

There is a multitude of topics I could define further, but those are just a few. This past week we learned about decision trees, random forests, bagging, support vector machines, and generalized linear models. These topics are essentially the layers of data exploration and model optimization that go on top of what we learned in weeks 2-4. Week 5, however, was a little peek into the UX world, where we put our skills to the test in web scraping and natural language processing (NLP). NLP just might be my favorite topic covered so far. I won’t make that a solid statement just yet though (haha).

What exactly is natural language processing, you might ask? Essentially, it is the processing of human language by programs and applications. It is what helps a program or model evaluate the meaning and sentiment of human speech and text. NLP focuses on the context of words. You might notice NLP at work when you’re searching for a product and, a few hours later, a similar product pops up in your suggestions. Or if you’ve heard of the digital assistant Siri, you have also come across a form of NLP. The applications go even deeper into other areas. Spam filtering, search engines, and news feeds are all things we use day to day that include NLP. This is a very powerful type of analysis with a strong influence on how we interact with technology. Yet most people don’t know much about it.
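
To give a flavor of what "processing language" means in code, here is a deliberately tiny sketch: it splits text into word tokens and scores sentiment by counting words from two hand-made lists. The word lists and function names are my own assumptions for illustration, and this approach ignores context entirely, which is exactly what real NLP models try to capture.

```python
import re
from collections import Counter

# Hypothetical, hand-picked word lists; real sentiment lexicons are far larger.
POSITIVE = {"love", "great", "enjoying", "excited"}
NEGATIVE = {"hate", "boring", "daunting", "terrible"}


def tokenize(text):
    """Lowercase the text and pull out runs of letters (and apostrophes) as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text):
    """Count of positive words minus count of negative words."""
    counts = Counter(tokenize(text))
    positive_hits = sum(counts[word] for word in POSITIVE)
    negative_hits = sum(counts[word] for word in NEGATIVE)
    return positive_hits - negative_hits


print(tokenize("Hello, World!"))
print(sentiment_score("I love this course and I am excited!"))
print(sentiment_score("The lecture was boring."))
```

A sentence like "not great" would still score as positive here, which shows why context matters so much in NLP.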

With 6 weeks left in the program, there is still much left to learn. Next week is project week, so we will be delivering our data and analysis on predictive models and NLP. We will also be covering deep learning, which will introduce us to neural networks and Keras. Following that are unsupervised learning, time series analysis, SQL, Bayesian statistics, and big data. The learning train hasn’t slowed down one bit!

It is apparent that this course is keeping me extremely busy and focused. With so much to learn, you have to be committed to staying on top of things. The pace is very fast, and it is easy to fall behind, which I admittedly have in previous weeks. One of my biggest struggles in the course is cementing the conceptual content of the material. I feel it takes me almost double the time for it to sink in, which makes the follow-up material seem daunting. I haven’t quite discovered how to balance myself in that area yet. I am still learning what works best for me in terms of studying, and I am also trying to work on increasing my focus in class. It has definitely been a challenging journey so far, but I am excited to continue into week 7. The course is halfway over, and now I am one week closer to graduating from this life-changing program.

References:

Bias and Variance Tradeoff | Beginners Guide with Python Implementation (analyticsvidhya.com)

Gentle Start to Natural Language Processing using Python | by Raheel Shaikh | Towards Data Science
