[P5-DS] My Data Science Path 2019 sept-4th week

SanjayKhanSSK
3 min read · Sep 21, 2019

Subject: How to analyze a dataset from Kaggle & what about the IBM DS course

Photo by Joshua Sortino on Unsplash

Hello everyone, welcome back to my data science journey. It's my 5th week, which means my 5th post: how I moved from learning to practicing. I think what I have learned so far is enough to take a dataset all the way to model development; if you have read my older posts, you will know what I have covered.

1) Let's talk about where I faced difficulty

As we all know, real-world data is really messy, so I planned to visit Kaggle, take a dataset, and do some analysis. I went to kaggle.com and searched for India datasets, which showed a few results. I picked the Startup India dataset, downloaded it, and loaded it. That is where the real task began.

All of the columns were of object type. When I tried to convert df["Date"] to datetime, it failed; when I tried to convert df["Amount"] to float, it failed too. A voice in my head said "let's do this", so I went through each and every column. The data was not clean; it was still messy.

So I looked for the values that were stopping df["Date"] from converting:

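# Print every date that does not look like dd/mm/yyyy
for i in df["Date"].unique():
    if len(i) > 10:        # too long: extra characters or a doubled separator
        print(i)
    elif i[-5] != "/":     # the character before the 4-digit year is not "/"
        print(i)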
---------------------------------------

output:
05/072018
01/07/015
\\xc2\\xa010/7/2015
12/05.2015
13/04.2015
15/01.2015
22/01//2015

So these are the dates that caused me problems, and I fixed them.
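The post does not show how the dates were fixed, so here is one possible approach, as a minimal sketch and not necessarily what I did at the time. The helper name `clean_date` is hypothetical, and the sample values are the ones printed above. It assumes the dates are day-first (dd/mm/yyyy), which the values starting with 13, 15 and 22 suggest.

```python
import re
import pandas as pd

def clean_date(raw):
    """Normalise a messy dd/mm/yyyy string (hypothetical helper)."""
    s = str(raw)
    # Drop escaped byte debris such as "\\xc2\\xa0" and real non-breaking spaces
    s = re.sub(r"\\+x[0-9a-fA-F]{2}", "", s).replace("\xa0", "").strip()
    # Use "/" as the only separator and collapse repeats ("//" -> "/")
    s = re.sub(r"[./]+", "/", s)
    # Insert a missing slash between month and 4-digit year ("05/072018")
    s = re.sub(r"^(\d{1,2})/(\d{2})(\d{4})$", r"\1/\2/\3", s)
    return s

# The problem values printed by the loop above
raw = pd.Series([
    "05/072018", "01/07/015", r"\\xc2\\xa010/7/2015",
    "12/05.2015", "13/04.2015", "15/01.2015", "22/01//2015",
])

# errors="coerce" turns anything still unparseable into NaT instead of raising
parsed = pd.to_datetime(raw.map(clean_date), format="%d/%m/%Y", errors="coerce")
print(parsed)
```

A value like "01/07/015" is left alone on purpose: the year is ambiguous, so it ends up as NaT and needs a manual decision rather than a guess.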

Then I moved on to df["Amount"]:

I found lots of stray characters and symbols in the Amount column, so I wrote code that keeps only the digits [0-9] and the dot [.] with the help of the re library.

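import re
import numpy as np

def modifiy_int(amt):
    # Missing values stay missing
    if amt is np.nan:
        return np.nan
    # Match single digits and dots (no "|" inside the character class)
    x = re.compile(r"[\d.]")
    x = x.findall(amt)
    if len(x) > 0:
        # Join the kept characters back into one string
        return "".join(x)
    else:
        # Nothing numeric found, treat it as missing
        return np.nan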
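As a usage sketch, a cleaner like the one above can be applied to the whole column and the result converted to float. The sample values below are made up for illustration, and `keep_digits` is a hypothetical stand-in so the snippet runs on its own. Filtering characters can still leave strings that are not valid numbers (for example "1.2.3"), so this sketch uses `pd.to_numeric` with `errors="coerce"` rather than a plain `astype(float)`.

```python
import re
import numpy as np
import pandas as pd

def keep_digits(amt):
    """Keep only digits and dots (hypothetical stand-in for the cleaner above)."""
    if pd.isna(amt):
        return np.nan
    kept = re.findall(r"[\d.]", str(amt))
    return "".join(kept) if kept else np.nan

# Made-up sample values, not taken from the real dataset
df = pd.DataFrame({"Amount": ["1,300,000", "undisclosed", np.nan, "20.5"]})

# Clean each value, then convert; anything unparseable becomes NaN
df["Amount"] = pd.to_numeric(df["Amount"].apply(keep_digits), errors="coerce")
print(df["Amount"].tolist())
```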

That solved my problems. After that I filled all the NaN values with the mean or the mode, which finished the data cleaning, and I moved on to visualization. The dataset's source page also suggests some good questions to explore.
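The fill step can be sketched like this, assuming the mean is used for numeric columns and the mode for categorical ones. The rows are made up, and "City" is a hypothetical column name used only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; "City" is a hypothetical column name
df = pd.DataFrame({
    "Amount": [100.0, np.nan, 300.0, 200.0],
    "City": ["Bangalore", "Mumbai", np.nan, "Bangalore"],
})

# Numeric column: replace NaN with the column mean
df["Amount"] = df["Amount"].fillna(df["Amount"].mean())

# Categorical column: replace NaN with the most frequent value (the mode)
df["City"] = df["City"].fillna(df["City"].mode()[0])

print(df)
```

The mean is sensitive to a few very large values, so for heavily skewed columns the median may be worth trying as well.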

Possible questions which could be answered are:
How does the funding ecosystem change with time?
Do cities play a major role in funding?
Which industries are favored by investors for funding?
Who are the important investors in the Indian Ecosystem?
How much funds does startups generally get in India?

So I plotted a few graphs based on those questions.
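To give an idea of what such graphs can look like, here is a minimal sketch for the first two questions. The rows are made up and the column names ("Date", "City", "Amount") are assumptions; the real dataset's names may differ. The output file name is also just an example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows for illustration; real column names may differ
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-01-10", "2015-06-02", "2016-03-15", "2017-08-21"]),
    "City": ["Bangalore", "Mumbai", "Bangalore", "Delhi"],
    "Amount": [1.0e6, 2.5e6, 4.0e6, 3.0e6],
})

# Total funding per year: "How does the funding ecosystem change with time?"
per_year = df.groupby(df["Date"].dt.year)["Amount"].sum()

# Total funding per city: "Do cities play a major role in funding?"
per_city = df.groupby("City")["Amount"].sum().sort_values(ascending=False)

# One bar chart per question, side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
per_year.plot(kind="bar", ax=ax1, title="Funding per year")
per_city.plot(kind="bar", ax=ax2, title="Funding per city")
fig.tight_layout()
fig.savefig("funding_overview.png")
```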

I plan to take on more datasets for practice next week, until the end of September.

2) So what about IBM Data Science?

Photo by Carson Masterson on Unsplash

I'm not a judge who can give a verdict, so I'll just give my opinion. The course I took is Data Analysis with pandas, part of the IBM DS module. Trust me, this week I completed the first 4 weeks of the course in half a day. The 5th week is model creation, and I haven't watched those videos, because I have my own planned path for learning Machine Learning: I already know how the algorithms work theoretically (from 5–6 months ago), so now I need to learn the math behind them. I don't think they covered every pandas function or all of EDA, but they covered the most popular techniques.

3) Resources of the Week:

Photo by Jordan Sanchez on Unsplash

Kaggle: Trust me, we need lots of practice, terabytes of practice. We learn a lot while practicing. No matter what stage you are at now, try to analyze a dataset with what you have learned.
