Starwar Movies-data manipulation practice

In [102]:
import pandas as pd
starwars = pd.read_csv("/Users/33Phoebe/Documents/OneDrive/Data Scientist Path/Data Sets/starwars.csv", encoding = "ISO-8859-1")
In [103]:
starwars.head(10)
Out[103]:
RespondentID Have you seen any of the 6 films in the Star Wars franchise? Do you consider yourself to be a fan of the Star Wars film franchise? Which of the following Star Wars films have you seen? Please select all that apply. Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. ... Unnamed: 28 Which character shot first? Are you familiar with the Expanded Universe? Do you consider yourself to be a fan of the Expanded Universe?ξ Do you consider yourself to be a fan of the Star Trek franchise? Gender Age Household Income Education Location (Census Region)
0 NaN Response Response Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi Star Wars: Episode I The Phantom Menace ... Yoda Response Response Response Response Response Response Response Response Response
1 3.292880e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 3 ... Very favorably I don't understand this question Yes No No Male 18-29 NaN High school degree South Atlantic
2 3.292880e+09 No NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN Yes Male 18-29 $0 - $24,999 Bachelor degree West South Central
3 3.292765e+09 Yes No Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith NaN NaN NaN 1 ... Unfamiliar (N/A) I don't understand this question No NaN No Male 18-29 $0 - $24,999 High school degree West North Central
4 3.292763e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 5 ... Very favorably I don't understand this question No NaN Yes Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
5 3.292731e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 5 ... Somewhat favorably Greedo Yes No No Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
6 3.292719e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 1 ... Very favorably Han Yes No Yes Male 18-29 $25,000 - $49,999 Bachelor degree Middle Atlantic
7 3.292685e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 6 ... Very favorably Han Yes No No Male 18-29 NaN High school degree East North Central
8 3.292664e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 4 ... Very favorably Han No NaN Yes Male 18-29 NaN High school degree South Atlantic
9 3.292654e+09 Yes Yes Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 5 ... Somewhat favorably Han No NaN No Male 18-29 $0 - $24,999 Some college or Associate degree South Atlantic

10 rows × 38 columns

this dataset is the results from a survey conducted by FiveThirtyEight in Jun 2014 with 1186 respondents from SurveyMonkey Audience

In [104]:
starwars.columns
Out[104]:
Index(['RespondentID',
       'Have you seen any of the 6 films in the Star Wars franchise?',
       'Do you consider yourself to be a fan of the Star Wars film franchise?',
       'Which of the following Star Wars films have you seen? Please select all that apply.',
       'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8',
       'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.',
       'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13',
       'Unnamed: 14',
       'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.',
       'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19',
       'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23',
       'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27',
       'Unnamed: 28', 'Which character shot first?',
       'Are you familiar with the Expanded Universe?',
       'Do you consider yourself to be a fan of the Expanded Universe?ξ',
       'Do you consider yourself to be a fan of the Star Trek franchise?',
       'Gender', 'Age', 'Household Income', 'Education',
       'Location (Census Region)'],
      dtype='object')
In [105]:
# eliminate the null values
starwars = starwars[pd.notnull(starwars["RespondentID"])]
In [106]:
starwars.shape
Out[106]:
(1186, 38)

Converting Yes/No to Boolean with Series.Map() Method

In [107]:
def convert(series):
    yes_no = {"Yes": True, "No": False}
    return series.map(yes_no)
cols = ["Have you seen any of the 6 films in the Star Wars franchise?", "Do you consider yourself to be a fan of the Star Wars film franchise?"]
for col in cols:
    starwars[col] = convert(starwars[col])
starwars.head()
Out[107]:
RespondentID Have you seen any of the 6 films in the Star Wars franchise? Do you consider yourself to be a fan of the Star Wars film franchise? Which of the following Star Wars films have you seen? Please select all that apply. Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. ... Unnamed: 28 Which character shot first? Are you familiar with the Expanded Universe? Do you consider yourself to be a fan of the Expanded Universe?ξ Do you consider yourself to be a fan of the Star Trek franchise? Gender Age Household Income Education Location (Census Region)
1 3.292880e+09 True True Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 3 ... Very favorably I don't understand this question Yes No No Male 18-29 NaN High school degree South Atlantic
2 3.292880e+09 False NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN Yes Male 18-29 $0 - $24,999 Bachelor degree West South Central
3 3.292765e+09 True False Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith NaN NaN NaN 1 ... Unfamiliar (N/A) I don't understand this question No NaN No Male 18-29 $0 - $24,999 High school degree West North Central
4 3.292763e+09 True True Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 5 ... Very favorably I don't understand this question No NaN Yes Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
5 3.292731e+09 True True Star Wars: Episode I The Phantom Menace Star Wars: Episode II Attack of the Clones Star Wars: Episode III Revenge of the Sith Star Wars: Episode IV A New Hope Star Wars: Episode V The Empire Strikes Back Star Wars: Episode VI Return of the Jedi 5 ... Somewhat favorably Greedo Yes No No Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central

5 rows × 38 columns

In [108]:
starwar_movies = ["Star Wars: Episode I  The Phantom Menace", "Star Wars: Episode II  Attack of the Clones", "Star Wars: Episode III  Revenge of the Sith", "Star Wars: Episode IV  A New Hope", "Star Wars: Episode V The Empire Strikes Back", "Star Wars: Episode VI Return of the Jedi"]
movie_dict = {}
for movie in starwar_movies:
    movie_dict[movie] = True
#making sure the name in starwar_movies match the entries
for i in starwars.iloc[:, 3:9]:
    print(starwars[i].unique())
['Star Wars: Episode I  The Phantom Menace' nan]
['Star Wars: Episode II  Attack of the Clones' nan]
['Star Wars: Episode III  Revenge of the Sith' nan]
['Star Wars: Episode IV  A New Hope' nan]
['Star Wars: Episode V The Empire Strikes Back' nan]
['Star Wars: Episode VI Return of the Jedi' nan]
In [109]:
import numpy as np
movie_dict[np.nan] = False
print(movie_dict)
cols = starwars.columns[3:9].tolist()
print(cols)
rename_dict = {}
{'Star Wars: Episode I  The Phantom Menace': True, 'Star Wars: Episode II  Attack of the Clones': True, 'Star Wars: Episode III  Revenge of the Sith': True, 'Star Wars: Episode IV  A New Hope': True, 'Star Wars: Episode V The Empire Strikes Back': True, 'Star Wars: Episode VI Return of the Jedi': True, nan: False}
['Which of the following Star Wars films have you seen? Please select all that apply.', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8']
In [110]:
num = 1
for col in cols:
    rename_dict[col] = 'seen_' + str(num)
    num += 1
rename_dict
    
Out[110]:
{'Unnamed: 4': 'seen_2',
 'Unnamed: 5': 'seen_3',
 'Unnamed: 6': 'seen_4',
 'Unnamed: 7': 'seen_5',
 'Unnamed: 8': 'seen_6',
 'Which of the following Star Wars films have you seen? Please select all that apply.': 'seen_1'}
In [111]:
#rename col 3~9:
starwars = starwars.rename(columns = rename_dict)
In [112]:
newcol = starwars
for col in newcol.iloc[:, 3:9]:
    newcol[col] = newcol[col].map(movie_dict)
newcol.head()
starwars = newcol
starwars.head()
Out[112]:
RespondentID Have you seen any of the 6 films in the Star Wars franchise? Do you consider yourself to be a fan of the Star Wars film franchise? seen_1 seen_2 seen_3 seen_4 seen_5 seen_6 Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. ... Unnamed: 28 Which character shot first? Are you familiar with the Expanded Universe? Do you consider yourself to be a fan of the Expanded Universe?ξ Do you consider yourself to be a fan of the Star Trek franchise? Gender Age Household Income Education Location (Census Region)
1 3.292880e+09 True True True True True True True True 3 ... Very favorably I don't understand this question Yes No No Male 18-29 NaN High school degree South Atlantic
2 3.292880e+09 False NaN False False False False False False NaN ... NaN NaN NaN NaN Yes Male 18-29 $0 - $24,999 Bachelor degree West South Central
3 3.292765e+09 True False True True True False False False 1 ... Unfamiliar (N/A) I don't understand this question No NaN No Male 18-29 $0 - $24,999 High school degree West North Central
4 3.292763e+09 True True True True True True True True 5 ... Very favorably I don't understand this question No NaN Yes Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
5 3.292731e+09 True True True True True True True True 5 ... Somewhat favorably Greedo Yes No No Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central

5 rows × 38 columns

In [113]:
#convert data type
starwars[starwars.columns[9:15]] = starwars[starwars.columns[9:15]].astype(float)
In [118]:
rename_dict2 = {}
ranking_cols = starwars.columns[9:15].tolist()
num = 1
for col in ranking_cols:
    rename_dict2[col] = 'ranking_' + str(num)
    num += 1
rename_dict2
Out[118]:
{'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.': 'ranking_1',
 'Unnamed: 10': 'ranking_2',
 'Unnamed: 11': 'ranking_3',
 'Unnamed: 12': 'ranking_4',
 'Unnamed: 13': 'ranking_5',
 'Unnamed: 14': 'ranking_6'}
In [121]:
starwars = starwars.rename(columns = rename_dict2)
starwars.head()
Out[121]:
RespondentID Have you seen any of the 6 films in the Star Wars franchise? Do you consider yourself to be a fan of the Star Wars film franchise? seen_1 seen_2 seen_3 seen_4 seen_5 seen_6 ranking_1 ... Unnamed: 28 Which character shot first? Are you familiar with the Expanded Universe? Do you consider yourself to be a fan of the Expanded Universe?ξ Do you consider yourself to be a fan of the Star Trek franchise? Gender Age Household Income Education Location (Census Region)
1 3.292880e+09 True True True True True True True True 3.0 ... Very favorably I don't understand this question Yes No No Male 18-29 NaN High school degree South Atlantic
2 3.292880e+09 False NaN False False False False False False NaN ... NaN NaN NaN NaN Yes Male 18-29 $0 - $24,999 Bachelor degree West South Central
3 3.292765e+09 True False True True True False False False 1.0 ... Unfamiliar (N/A) I don't understand this question No NaN No Male 18-29 $0 - $24,999 High school degree West North Central
4 3.292763e+09 True True True True True True True True 5.0 ... Very favorably I don't understand this question No NaN Yes Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central
5 3.292731e+09 True True True True True True True True 5.0 ... Somewhat favorably Greedo Yes No No Male 18-29 $100,000 - $149,999 Some college or Associate degree West North Central

5 rows × 38 columns

In [125]:
ranking = starwars[starwars.columns[9:15]].mean()
In [123]:
import matplotlib.pyplot as plt
%matplotlib inline
In [126]:
ranking.plot.bar()
Out[126]:
<matplotlib.axes._subplots.AxesSubplot at 0x110928fd0>

From the barchart above, we can tell that movie 5 is ranked highest among the franchise.

In [127]:
seen = starwars[starwars.columns[3:9]].sum()
In [128]:
seen.plot.bar()
Out[128]:
<matplotlib.axes._subplots.AxesSubplot at 0x11591d668>

It appears most people have seen the most recent two movies as well as the very first one. Movie 2 and 3 are ranked the lowest and correspondingly the viewership is the lowest two. While the 4th movie is a rejuvenation for the series, the viewership goes up at the sametime two, and highest viewership occurs at the best movie, and makes great sense, also the viewership persists at a high level from the 5th to 6th due to audience expectation, presumably.

In [142]:
males = starwars[starwars["Gender"] == "Male"]
females = starwars[starwars["Gender"] == "Female"]
male_seen = males[males.columns[3:9]].sum()
female_seen = females[females.columns[3:9]].sum()
male_ranking = males[males.columns[9:15]].mean()
female_ranking = females[females.columns[9:15]].mean()
test = [male_seen, female_seen, male_ranking, female_ranking]

j = [0, 0, 1, 1]
g = [0, 1, 0, 1]
fig, axes = plt.subplots(2, 2, figsize = (9, 9))
for i in range(4):
    if i!=1:
        test[i].plot.bar(ax = axes[j[i], g[i]])
    else:
        test[i].plot.bar(ax = axes[j[i], g[i]], ylim = (0, 400))
plt.show()

There are more men watched series than female do, as we would expect. Except movie 3, which both groups equally dislike, movie 4, female views dislike more, overall females have a more positive reviews on the series.

social