In [102]:

import pandas as pd
starwars = pd.read_csv("/Users/33Phoebe/Documents/OneDrive/Data Scientist Path/Data Sets/starwars.csv", encoding = "ISO-8859-1")

In [103]:

starwars.head(10)

Out[103]:

	RespondentID	Have you seen any of the 6 films in the Star Wars franchise?	Do you consider yourself to be a fan of the Star Wars film franchise?	Which of the following Star Wars films have you seen? Please select all that apply.	Unnamed: 4	Unnamed: 5	Unnamed: 6	Unnamed: 7	Unnamed: 8	Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.	...	Unnamed: 28	Which character shot first?	Are you familiar with the Expanded Universe?	Do you consider yourself to be a fan of the Expanded Universe?æ	Do you consider yourself to be a fan of the Star Trek franchise?	Gender	Age	Household Income	Education	Location (Census Region)
0	NaN	Response	Response	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	Star Wars: Episode IV A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	Star Wars: Episode I The Phantom Menace	...	Yoda	Response	Response	Response	Response	Response	Response	Response	Response	Response
1	3.292880e+09	Yes	Yes	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	Star Wars: Episode IV A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	3	...	Very favorably	I don't understand this question	Yes	No	No	Male	18-29	NaN	High school degree	South Atlantic
2	3.292880e+09	No	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	Yes	Male	18-29	$0 - $24,999	Bachelor degree	West South Central
3	3.292765e+09	Yes	No	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	NaN	NaN	NaN	1	...	Unfamiliar (N/A)	I don't understand this question	No	NaN	No	Male	18-29	$0 - $24,999	High school degree	West North Central
4	3.292763e+09	Yes	Yes	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	Star Wars: Episode IV A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	5	...	Very favorably	I don't understand this question	No	NaN	Yes	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central
5	3.292731e+09	Yes	Yes	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	Star Wars: Episode IV A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	5	...	Somewhat favorably	Greedo	Yes	No	No	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central
6	3.292719e+09	Yes	Yes	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	Star Wars: Episode IV A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	1	...	Very favorably	Han	Yes	No	Yes	Male	18-29	$25,000 - $49,999	Bachelor degree	Middle Atlantic
7	3.292685e+09	Yes	Yes	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	Star Wars: Episode IV A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	6	...	Very favorably	Han	Yes	No	No	Male	18-29	NaN	High school degree	East North Central
8	3.292664e+09	Yes	Yes	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	Star Wars: Episode IV A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	4	...	Very favorably	Han	No	NaN	Yes	Male	18-29	NaN	High school degree	South Atlantic
9	3.292654e+09	Yes	Yes	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	Star Wars: Episode IV A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	5	...	Somewhat favorably	Han	No	NaN	No	Male	18-29	$0 - $24,999	Some college or Associate degree	South Atlantic

10 rows × 38 columns

this dataset is the results from a survey conducted by FiveThirtyEight in Jun 2014 with 1186 respondents from SurveyMonkey Audience

In [104]:

starwars.columns

Out[104]:

Index(['RespondentID',
       'Have you seen any of the 6 films in the Star Wars franchise?',
       'Do you consider yourself to be a fan of the Star Wars film franchise?',
       'Which of the following Star Wars films have you seen? Please select all that apply.',
       'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8',
       'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.',
       'Unnamed: 10', 'Unnamed: 11', 'Unnamed: 12', 'Unnamed: 13',
       'Unnamed: 14',
       'Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her.',
       'Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19',
       'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23',
       'Unnamed: 24', 'Unnamed: 25', 'Unnamed: 26', 'Unnamed: 27',
       'Unnamed: 28', 'Which character shot first?',
       'Are you familiar with the Expanded Universe?',
       'Do you consider yourself to be a fan of the Expanded Universe?æ',
       'Do you consider yourself to be a fan of the Star Trek franchise?',
       'Gender', 'Age', 'Household Income', 'Education',
       'Location (Census Region)'],
      dtype='object')

In [105]:

# eliminate the null values
starwars = starwars[pd.notnull(starwars["RespondentID"])]

In [106]:

starwars.shape

Out[106]:

(1186, 38)

Converting Yes/No to Boolean with Series.Map() Method¶

In [107]:

def convert(series):
    yes_no = {"Yes": True, "No": False}
    return series.map(yes_no)
cols = ["Have you seen any of the 6 films in the Star Wars franchise?", "Do you consider yourself to be a fan of the Star Wars film franchise?"]
for col in cols:
    starwars[col] = convert(starwars[col])
starwars.head()

Out[107]:

	RespondentID	Have you seen any of the 6 films in the Star Wars franchise?	Do you consider yourself to be a fan of the Star Wars film franchise?	Which of the following Star Wars films have you seen? Please select all that apply.	Unnamed: 4	Unnamed: 5	Unnamed: 6	Unnamed: 7	Unnamed: 8	Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.	...	Unnamed: 28	Which character shot first?	Are you familiar with the Expanded Universe?	Do you consider yourself to be a fan of the Expanded Universe?æ	Do you consider yourself to be a fan of the Star Trek franchise?	Gender	Age	Household Income	Education	Location (Census Region)
1	3.292880e+09	True	True	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	Star Wars: Episode IV A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	3	...	Very favorably	I don't understand this question	Yes	No	No	Male	18-29	NaN	High school degree	South Atlantic
2	3.292880e+09	False	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	Yes	Male	18-29	$0 - $24,999	Bachelor degree	West South Central
3	3.292765e+09	True	False	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	NaN	NaN	NaN	1	...	Unfamiliar (N/A)	I don't understand this question	No	NaN	No	Male	18-29	$0 - $24,999	High school degree	West North Central
4	3.292763e+09	True	True	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	Star Wars: Episode IV A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	5	...	Very favorably	I don't understand this question	No	NaN	Yes	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central
5	3.292731e+09	True	True	Star Wars: Episode I The Phantom Menace	Star Wars: Episode II Attack of the Clones	Star Wars: Episode III Revenge of the Sith	Star Wars: Episode IV A New Hope	Star Wars: Episode V The Empire Strikes Back	Star Wars: Episode VI Return of the Jedi	5	...	Somewhat favorably	Greedo	Yes	No	No	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central

5 rows × 38 columns

In [108]:

starwar_movies = ["Star Wars: Episode I  The Phantom Menace", "Star Wars: Episode II  Attack of the Clones", "Star Wars: Episode III  Revenge of the Sith", "Star Wars: Episode IV  A New Hope", "Star Wars: Episode V The Empire Strikes Back", "Star Wars: Episode VI Return of the Jedi"]
movie_dict = {}
for movie in starwar_movies:
    movie_dict[movie] = True
#making sure the name in starwar_movies match the entries
for i in starwars.iloc[:, 3:9]:
    print(starwars[i].unique())

['Star Wars: Episode I  The Phantom Menace' nan]
['Star Wars: Episode II  Attack of the Clones' nan]
['Star Wars: Episode III  Revenge of the Sith' nan]
['Star Wars: Episode IV  A New Hope' nan]
['Star Wars: Episode V The Empire Strikes Back' nan]
['Star Wars: Episode VI Return of the Jedi' nan]

In [109]:

import numpy as np
movie_dict[np.nan] = False
print(movie_dict)
cols = starwars.columns[3:9].tolist()
print(cols)
rename_dict = {}

{'Star Wars: Episode I  The Phantom Menace': True, 'Star Wars: Episode II  Attack of the Clones': True, 'Star Wars: Episode III  Revenge of the Sith': True, 'Star Wars: Episode IV  A New Hope': True, 'Star Wars: Episode V The Empire Strikes Back': True, 'Star Wars: Episode VI Return of the Jedi': True, nan: False}
['Which of the following Star Wars films have you seen? Please select all that apply.', 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8']

In [110]:

num = 1
for col in cols:
    rename_dict[col] = 'seen_' + str(num)
    num += 1
rename_dict

Out[110]:

{'Unnamed: 4': 'seen_2',
 'Unnamed: 5': 'seen_3',
 'Unnamed: 6': 'seen_4',
 'Unnamed: 7': 'seen_5',
 'Unnamed: 8': 'seen_6',
 'Which of the following Star Wars films have you seen? Please select all that apply.': 'seen_1'}

In [111]:

#rename col 3~9:
starwars = starwars.rename(columns = rename_dict)

In [112]:

newcol = starwars
for col in newcol.iloc[:, 3:9]:
    newcol[col] = newcol[col].map(movie_dict)
newcol.head()
starwars = newcol
starwars.head()

Out[112]:

	RespondentID	Have you seen any of the 6 films in the Star Wars franchise?	Do you consider yourself to be a fan of the Star Wars film franchise?	seen_1	seen_2	seen_3	seen_4	seen_5	seen_6	Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.	...	Unnamed: 28	Which character shot first?	Are you familiar with the Expanded Universe?	Do you consider yourself to be a fan of the Expanded Universe?æ	Do you consider yourself to be a fan of the Star Trek franchise?	Gender	Age	Household Income	Education	Location (Census Region)
1	3.292880e+09	True	True	True	True	True	True	True	True	3	...	Very favorably	I don't understand this question	Yes	No	No	Male	18-29	NaN	High school degree	South Atlantic
2	3.292880e+09	False	NaN	False	False	False	False	False	False	NaN	...	NaN	NaN	NaN	NaN	Yes	Male	18-29	$0 - $24,999	Bachelor degree	West South Central
3	3.292765e+09	True	False	True	True	True	False	False	False	1	...	Unfamiliar (N/A)	I don't understand this question	No	NaN	No	Male	18-29	$0 - $24,999	High school degree	West North Central
4	3.292763e+09	True	True	True	True	True	True	True	True	5	...	Very favorably	I don't understand this question	No	NaN	Yes	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central
5	3.292731e+09	True	True	True	True	True	True	True	True	5	...	Somewhat favorably	Greedo	Yes	No	No	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central

5 rows × 38 columns

In [113]:

#convert data type
starwars[starwars.columns[9:15]] = starwars[starwars.columns[9:15]].astype(float)

In [118]:

rename_dict2 = {}
ranking_cols = starwars.columns[9:15].tolist()
num = 1
for col in ranking_cols:
    rename_dict2[col] = 'ranking_' + str(num)
    num += 1
rename_dict2

Out[118]:

{'Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film.': 'ranking_1',
 'Unnamed: 10': 'ranking_2',
 'Unnamed: 11': 'ranking_3',
 'Unnamed: 12': 'ranking_4',
 'Unnamed: 13': 'ranking_5',
 'Unnamed: 14': 'ranking_6'}

In [121]:

starwars = starwars.rename(columns = rename_dict2)
starwars.head()

Out[121]:

	RespondentID	Have you seen any of the 6 films in the Star Wars franchise?	Do you consider yourself to be a fan of the Star Wars film franchise?	seen_1	seen_2	seen_3	seen_4	seen_5	seen_6	ranking_1	...	Unnamed: 28	Which character shot first?	Are you familiar with the Expanded Universe?	Do you consider yourself to be a fan of the Expanded Universe?æ	Do you consider yourself to be a fan of the Star Trek franchise?	Gender	Age	Household Income	Education	Location (Census Region)
1	3.292880e+09	True	True	True	True	True	True	True	True	3.0	...	Very favorably	I don't understand this question	Yes	No	No	Male	18-29	NaN	High school degree	South Atlantic
2	3.292880e+09	False	NaN	False	False	False	False	False	False	NaN	...	NaN	NaN	NaN	NaN	Yes	Male	18-29	$0 - $24,999	Bachelor degree	West South Central
3	3.292765e+09	True	False	True	True	True	False	False	False	1.0	...	Unfamiliar (N/A)	I don't understand this question	No	NaN	No	Male	18-29	$0 - $24,999	High school degree	West North Central
4	3.292763e+09	True	True	True	True	True	True	True	True	5.0	...	Very favorably	I don't understand this question	No	NaN	Yes	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central
5	3.292731e+09	True	True	True	True	True	True	True	True	5.0	...	Somewhat favorably	Greedo	Yes	No	No	Male	18-29	$100,000 - $149,999	Some college or Associate degree	West North Central

5 rows × 38 columns

In [125]:

ranking = starwars[starwars.columns[9:15]].mean()

In [123]:

import matplotlib.pyplot as plt
%matplotlib inline

In [126]:

ranking.plot.bar()

Out[126]:

<matplotlib.axes._subplots.AxesSubplot at 0x110928fd0>

From the barchart above, we can tell that movie 5 is ranked highest among the franchise.

In [127]:

seen = starwars[starwars.columns[3:9]].sum()

In [128]:

seen.plot.bar()

Out[128]:

<matplotlib.axes._subplots.AxesSubplot at 0x11591d668>

It appears most people have seen the most recent two movies as well as the very first one. Movie 2 and 3 are ranked the lowest and correspondingly the viewership is the lowest two. While the 4th movie is a rejuvenation for the series, the viewership goes up at the sametime two, and highest viewership occurs at the best movie, and makes great sense, also the viewership persists at a high level from the 5th to 6th due to audience expectation, presumably.

In [142]:

males = starwars[starwars["Gender"] == "Male"]
females = starwars[starwars["Gender"] == "Female"]
male_seen = males[males.columns[3:9]].sum()
female_seen = females[females.columns[3:9]].sum()
male_ranking = males[males.columns[9:15]].mean()
female_ranking = females[females.columns[9:15]].mean()
test = [male_seen, female_seen, male_ranking, female_ranking]

j = [0, 0, 1, 1]
g = [0, 1, 0, 1]
fig, axes = plt.subplots(2, 2, figsize = (9, 9))
for i in range(4):
    if i!=1:
        test[i].plot.bar(ax = axes[j[i], g[i]])
    else:
        test[i].plot.bar(ax = axes[j[i], g[i]], ylim = (0, 400))
plt.show()

There are more men watched series than female do, as we would expect. Except movie 3, which both groups equally dislike, movie 4, female views dislike more, overall females have a more positive reviews on the series.

Phoebe's blog

Starwar Movies-data manipulation practice

Converting Yes/No to Boolean with Series.Map() Method¶

Converting Yes/No to Boolean with Series.Map() Method¶

social