LYRICAL ANALYSIS OF BILLBOARD TOP 100 SONGS THROUGH THE YEARS

oct. 15, 2020
·
samuel mignot

Dataset:

A dataset of Billboard Top 100 songs since 1958 (the chart's first year) was collected, containing year, rank, title, artist, lyrics, and audio features. Lyrics were scraped from genius.com; audio features were added to the dataset using Spotify's API.
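The collection code isn't included in this notebook, but a minimal sketch of fetching audio features with the spotipy client (the credentials and track IDs below are placeholders) might look like this:

import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Placeholder credentials from the Spotify developer dashboard.
sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials(
    client_id='YOUR_CLIENT_ID', client_secret='YOUR_CLIENT_SECRET'))

# audio_features accepts up to 100 track IDs per call and returns dicts
# containing danceability, energy, valence, loudness, mode, tempo, etc.
features = sp.audio_features(['TRACK_ID_1', 'TRACK_ID_2'])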


Feature Engineering:

Lyric sentiment was extracted from each song's lyrics using TextBlob. TextBlob divides sentiment into two scores: 'polarity' and 'subjectivity'. Both were added to the dataset.


Exploratory Data Analysis:

Various features, averaged across songs, were plotted against year, including modality, energy, danceability, and explicitness. See the EDA sections below for the full list.


Data Preprocessing:


An sklearn preprocessing pipeline was created to standard-scale numeric features and count-vectorize each song's lyrics. This is a preparatory step for training a Random Forest model to predict release year from lyrics and audio features.

Modeling:

A Random Forest was trained, with hyperparameters tuned via randomized grid search (RandomizedSearchCV). I then looked at the model's feature importances to get a sense of the most date-indicative audio features and words.


Bonus Explorations:

I was also interested in looking for songs with opposite valence and lyric sentiment scores: depressing songs with happy lyrics, or happy songs with depressing lyrics. I decided to look into this by finding the songs with the greatest difference between lyric sentiment and valence.

I call these songs Sonic Chimeras.

Next Steps:

  • Look at how the racial and gender distribution of Billboard Top 100 artists has evolved.
  • Look at whether specific historical events are latently (or concretely) present in the lyrics or sentiment of songs.
  • Test other models, including RNN variants.

Imports/Constants/Data Instantiation

In [16]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import sklearn
import warnings

%matplotlib inline
%config InlineBackend.figure_formats = ['svg']


rc = {'figure.figsize':(8,8),
      'axes.facecolor':'white',
      'axes.grid' : True,
      'grid.color': '.8',
      'font.size' : 13}
plt.rcParams.update(rc)

sns.set_style({'axes.facecolor':'white', 'grid.color': '.8', 'font.family':'IBM Plex Mono'})
In [17]:
key_map = {
    0: 'C',
    1: 'C#/Db',
    2: 'D',
    3: 'D#/Eb',
    4: 'E',
    5: 'F',
    6: 'F#/Gb',
    7: 'G',
    8: 'G#/Ab',
    9: 'A',
    10: 'A#/Bb',
    11: 'B',
}
In [18]:
df = pd.read_csv('bb_top_100.csv', index_col=0)
In [34]:
cleaned_df = df.drop(columns=['id', 'title', 'artist', 'lyric_url', 'lyrics', 'spotify_id', 'audio_features', 'uri', 'track_href', 'type'])
cleaned_df['duration_mins'] = cleaned_df['duration_ms']/(1000*60)
cleaned_df.head()
Out[34]:
year rank c_artist c_title c_lyrics danceability energy key loudness mode ... valence tempo analysis_url duration_ms time_signature explicit lyric_sent polarity subjectivity duration_mins
0 1980 73 Carole King "One Fine Day" One fine day, you'll look at me\nAnd you will ... 0.397 0.809 5.0 -6.557 1.0 ... 0.735 180.804 https://api.spotify.com/v1/audio-analysis/0yFN... 150200.0 4.0 False Sentiment(polarity=0.30784832451499117, subjec... 0.307848 0.545811 2.503333
1 1963 77 Nat King Cole "That Sunday, That Summer" (If I had to choose just one day)\nIf I had to... 0.249 0.460 1.0 -9.914 1.0 ... 0.462 82.495 https://api.spotify.com/v1/audio-analysis/3SWI... 190667.0 4.0 False Sentiment(polarity=0.28585858585858587, subjec... 0.285859 0.439506 3.177783
2 1964 69 The Shangri-Las "Leader of the Pack" -Is she really going out with him?\n-Well, the... 0.417 0.546 0.0 -8.710 1.0 ... 0.310 126.224 https://api.spotify.com/v1/audio-analysis/6wzL... 173533.0 4.0 False Sentiment(polarity=-0.0621632996632997, subjec... -0.062163 0.548190 2.892217
3 1972 78 Bill Withers "Use Me" My friends feel it's their appointed duty\nThe... 0.759 0.586 11.0 -13.461 0.0 ... 0.948 154.624 https://api.spotify.com/v1/audio-analysis/4gRA... 228327.0 4.0 False Sentiment(polarity=0.1869352869352869, subject... 0.186935 0.639927 3.805450
4 1994 80 Eternal "Stay" "Stay"\nStay (x3)\nStay baby\nStay, come on da... 0.541 0.474 2.0 -6.273 1.0 ... 0.315 129.891 https://api.spotify.com/v1/audio-analysis/6ldR... 207999.0 4.0 False Sentiment(polarity=0.275, subjectivity=0.58500... 0.275000 0.585000 3.466650

5 rows × 24 columns

Extract Lyric Sentiment

Use TextBlob to extract sentiment of lyrics.

TextBlob divides sentiment into two scores: polarity and subjectivity. Polarity is a float within the range [-1.0, 1.0] which maps to negative and positive emotionality. Subjectivity is a float within the range [0.0, 1.0]: the lower the score, the more objective the text is.
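For a sense of the two scales, here is what the scores look like on a couple of toy strings (purely illustrative; these are not from the dataset):

from textblob import TextBlob

# polarity lies in [-1.0, 1.0]; subjectivity lies in [0.0, 1.0]
print(TextBlob("I love this beautiful, happy song").sentiment)
print(TextBlob("The single was released in 1964").sentiment)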

Provided that:

  1. the data has been properly processed and collected, and
  2. happy lyrics tend to accompany happy-sounding music and sad lyrics sad-sounding music,

then the extracted sentiment should correlate with Spotify's valence and modality audio features (which register the phonic positivity/negativity of a song).

In [20]:
from textblob import TextBlob
import math
In [22]:
cleaned_df['lyric_sent'] = df.c_lyrics.apply(lambda x: TextBlob(x).sentiment if pd.notnull(x) else None)
In [23]:
cleaned_df[['polarity','subjectivity']] = pd.DataFrame(cleaned_df.lyric_sent.tolist(), index= cleaned_df.index)

Correlation Matrix

In [38]:
corr = cleaned_df[['year', 'loudness', 'danceability', 'valence', 'polarity', 'subjectivity', 'instrumentalness', 'mode', 'acousticness', 'liveness', 'explicit', 'energy', 'duration_ms']].corr()
sns.heatmap(corr, mask=np.triu(np.ones_like(corr, dtype=bool)), linewidths=1.2)
plt.show()

Most Common Keys (Across All Years)

In [39]:
ax = df['key'].value_counts().plot(kind='bar', figsize=(8,8))
ax.set_ylabel('count')
# value_counts() sorts by frequency, so the tick labels must follow that order
ax.set_xticklabels([key_map[k] for k in df['key'].value_counts().index])
ax.set_title('Most Common Song Keys')
Out[39]:
Text(0.5, 1.0, 'Most Common Song Keys')

Major and Minor Tendencies

There has been a very noticeable decline, since the late 1950s, in the share of major-key songs (though it's hard not to go down from 90+%).

Of course, it's also important to note that a number of songs move between keys: this isn't captured at all in Spotify's characterizations.

In [40]:
maj = df.groupby('year')['mode'].mean()
ax = sns.regplot('year', 'mode',data=maj.to_frame().reset_index())
ax.set_title("Fraction of Billboard Top 100 Songs in Major Key for each Year")
Out[40]:
Text(0.5, 1.0, 'Fraction of Billboard Top 100 Songs in Major Key for each Year')

Average Valence

Valence is another Spotify metric that captures musical positiveness. The Spotify documentation defines it as follows:

A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

In [41]:
val = df.groupby('year')['valence'].mean()
ax = sns.regplot('year', 'valence', data=val.to_frame().reset_index())
ax.set_title("Average Valence of Billboard Top 100 Songs per Year")
plt.show()

Average Lyric Polarity

In [42]:
val = df.groupby('year')['polarity'].mean()
ax = sns.regplot('year', 'polarity', data=val.to_frame().reset_index())
ax.set_title("Average Polarity of Billboard Top 100 Songs per Year")
plt.show()

Average Lyric Subjectivity

There is no clear trend in the subjectivity of pop lyrics. It has consistently hovered around 50%, which makes sense: lyrics are neither deeply subjective nor especially objective.

In [43]:
val = df.groupby('year')['subjectivity'].mean()
ax = sns.scatterplot('year', 'subjectivity', data=val.to_frame().reset_index())
ax.set_title("Average Subjectivity of Billboard Top 100 Songs per Year")
plt.show()

Average Danceability

Danceability is another of Spotify's audio features. Per the documentation, it describes how suitable a track is for dancing, based on elements like tempo, rhythm stability, beat strength, and overall regularity, on a scale from 0.0 (least danceable) to 1.0 (most danceable).

In [44]:
val = df.groupby('year')['danceability'].mean()
ax = sns.regplot('year', 'danceability', data=val.to_frame().reset_index())
ax.set_title("Average Danceability of Billboard Top 100 Songs per Year")
plt.show()

Average Loudness

Loudness is another of Spotify's audio features.

Since the inception of the Billboard Top 100, the average loudness of songs has been rising steadily.

However, this trend could easily be a byproduct of evolving recording technology (which, of course, could be a problem for all of Spotify's metrics; I don't believe they are controlled for release year).

In [45]:
val = df.groupby('year')['loudness'].mean()
ax = sns.scatterplot('year', 'loudness', data=val.to_frame().reset_index())
ax.set_title("Average Loudness of Billboard Top 100 Songs per Year")
plt.show()

Duration

Duration has an interesting, definitely non-linear trend.

In [46]:
val = df.groupby('year')['duration_ms'].mean()
ax = sns.scatterplot('year', 'duration_ms', data=val.to_frame().reset_index())
ax.set_title("Average Duration (in ms) of Billboard Top 100 Songs per Year")
plt.show()

Speechiness

In [47]:
val = df.groupby('year')['speechiness'].mean()
ax = sns.regplot('year', 'speechiness', data=val.to_frame().reset_index())
ax.set_title("Average Speechiness. of Billboard Top 100 Songs per Year")
plt.show()

Acousticness

In [48]:
val = df.groupby('year')['acousticness'].mean()
ax = sns.regplot('year', 'acousticness', data=val.to_frame().reset_index())
ax.set_title("Average Acousticness of Billboard Top 100 Songs per Year")
plt.show()

Instrumentalness

In [49]:
val = df.groupby('year')['instrumentalness'].mean()
ax = sns.regplot('year', 'instrumentalness', data=val.to_frame().reset_index())
ax.set_title("Average Instrumentalness of Billboard Top 100 Songs per Year")
plt.show()

Energy

In [50]:
val = df.groupby('year')['energy'].mean()
ax = sns.regplot('year', 'energy', data=val.to_frame().reset_index())
ax.set_title("Average Energy of Billboard Top 100 Songs per Year")
plt.show()

Explicitness

The fraction of explicit songs per year is noticeably up. However, this could just as easily reflect how recordings are flagged as explicit rather than the actual content of the songs themselves.

In [51]:
ax = df.groupby('year')['explicit'].mean().plot()
ax.set_ylabel('Fraction of Explicit Songs per Year')
Out[51]:
Text(0, 0.5, 'Fraction of Explicit Songs per Year')
In [52]:
# CHECK CORRECTNESS BY LOOKING AT FIRST FEW 'EXPLICIT' SONGS

df[(df.explicit) & (df.year<1990)].sort_values('year')
Out[52]:
year rank title artist c_artist c_title lyric_url lyrics c_lyrics spotify_id ... id uri track_href analysis_url duration_ms time_signature explicit lyric_sent polarity subjectivity
5785 1975 29 "Fight the Power" The Isley Brothers The Isley Brothers "Fight the Power" https://www.genius.com/the-isley-brothers-figh... NaN NaN 5q5qmdfdJAVOv1mbSk7xxN ... 5q5qmdfdJAVOv1mbSk7xxN spotify:track:5q5qmdfdJAVOv1mbSk7xxN https://api.spotify.com/v1/tracks/5q5qmdfdJAVO... https://api.spotify.com/v1/audio-analysis/5q5q... 318733.0 4.0 True NaN NaN NaN
1520 1984 21 "Let's Go Crazy" Prince and The Revolution Prince and The Revolution "Let's Go Crazy" https://www.genius.com/prince-and-the-revoluti... \n\nsse\n\n\n[Spoken Intro]\n\n\nDearly belove... [Spoken Intro]\nDearly beloved\nWe are gathere... 0QeI79sp1vS8L3JgpEO7mD ... 0QeI79sp1vS8L3JgpEO7mD spotify:track:0QeI79sp1vS8L3JgpEO7mD https://api.spotify.com/v1/tracks/0QeI79sp1vS8... https://api.spotify.com/v1/audio-analysis/0QeI... 280000.0 4.0 True Sentiment(polarity=-0.13665660511363636, subje... -0.136657 0.585590
2999 1989 21 "Blame It on the Rain" Milli Vanilli Milli Vanilli "Blame It on the Rain" https://www.genius.com/milli-vanilli-blame-it-... \n\nsse\n\n\n[Verse 1]\n\nYou said you didn't ... [Verse 1]\nYou said you didn't need her\nYou t... 6onwlDmIVKE8bBgyBRSuS0 ... 6onwlDmIVKE8bBgyBRSuS0 spotify:track:6onwlDmIVKE8bBgyBRSuS0 https://api.spotify.com/v1/tracks/6onwlDmIVKE8... https://api.spotify.com/v1/audio-analysis/6onw... 257646.0 5.0 True Sentiment(polarity=0.013333333333333326, subje... 0.013333 0.558333

3 rows × 33 columns

Create Preprocessing Pipeline

In [53]:
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.feature_extraction.text import CountVectorizer
In [54]:
from sklearn.preprocessing import FunctionTransformer

text_vars = ['c_lyrics']
num_vars = ['polarity', 'subjectivity', 'duration_ms', 'time_signature', 'energy', 'key', 'loudness', 'mode', 'speechiness', 'acousticness', 'instrumentalness', 'liveness', 'tempo']

preprocess = ColumnTransformer([
    # Numeric features: mean-impute missing values, then standard-scale.
    ('nv', Pipeline([
        ('imp', SimpleImputer()),
        ('ss', StandardScaler())
    ]), num_vars),
    # Lyrics: fill missing lyrics with the string 'None', flatten the (n, 1)
    # column to 1-D (CountVectorizer expects an iterable of strings),
    # then count-vectorize.
    ('tv', Pipeline([
        ('imp', SimpleImputer(strategy='constant', fill_value='None')),
        ('oned', FunctionTransformer(np.reshape, kw_args={'newshape': -1})),
        ('count_v', CountVectorizer())
    ]), text_vars)
])

res = preprocess.fit_transform(df)
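As a sanity check (a sketch, not a cell from the original run), the fitted CountVectorizer, and with it the learned lyric vocabulary, can be pulled back out of the ColumnTransformer:

# Fitted sub-pipelines are exposed via named_transformers_.
cv = preprocess.named_transformers_['tv'].named_steps['count_v']
print(len(cv.vocabulary_))  # lyric-token columns appended after the scaled numeric features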

Train Random Forest to Predict Release Year

In [55]:
from sklearn.model_selection import train_test_split, RandomizedSearchCV
from sklearn.ensemble import RandomForestRegressor
In [56]:
X_train, X_test, y_train, y_test = train_test_split(res, df.year, random_state=14, stratify=df.year, test_size=.2)
In [57]:
from joblib import dump, load

MODEL_SAVE_FILE = 'rcv.joblib'

if MODEL_SAVE_FILE not in os.listdir():
    params = {
     'bootstrap': [True, False],
     'max_depth': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, None],
     'max_features': ['auto', 'sqrt'],
     'min_samples_leaf': [1, 2, 4],
     'min_samples_split': [2, 5, 10],
     'n_estimators': [200, 400, 600, 800, 1000, 1200, 1400]
    }

    rcv = RandomizedSearchCV(RandomForestRegressor(), params, cv=3, verbose=2)
    rcv.fit(X_train, y_train)
    dump(rcv, MODEL_SAVE_FILE)
else:
    rcv = load(MODEL_SAVE_FILE)
/Users/samuelmignot/.pyenv/versions/3.8.0/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator DecisionTreeRegressor from version 0.24.dev0 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
  warnings.warn(
/Users/samuelmignot/.pyenv/versions/3.8.0/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator RandomForestRegressor from version 0.24.dev0 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
  warnings.warn(
/Users/samuelmignot/.pyenv/versions/3.8.0/lib/python3.8/site-packages/sklearn/base.py:329: UserWarning: Trying to unpickle estimator RandomizedSearchCV from version 0.24.dev0 when using version 0.23.2. This might lead to breaking code or invalid results. Use at your own risk.
  warnings.warn(
In [58]:
rcv.best_estimator_
Out[58]:
RandomForestRegressor(max_depth=90, min_samples_leaf=2, min_samples_split=5,
                      n_estimators=1000)
In [59]:
rcv.best_score_
Out[59]:
0.72003986156259

Most Important Features for Year Prediction

The get_feature_names function below comes from Venkatachalam's answer on Stack Overflow.

In [60]:
def get_feature_names(column_transformer):
    """Get feature names from all transformers.
    Returns
    -------
    feature_names : list of strings
        Names of the features produced by transform.
    """
    def get_names(trans):
        if trans == 'drop' or (
                hasattr(column, '__len__') and not len(column)):
            return []
        if trans == 'passthrough':
            if hasattr(column_transformer, '_df_columns'):
                if ((not isinstance(column, slice))
                        and all(isinstance(col, str) for col in column)):
                    return column
                else:
                    return column_transformer._df_columns[column]
            else:
                indices = np.arange(column_transformer._n_features)
                return ['x%d' % i for i in indices[column]]
        if not hasattr(trans, 'get_feature_names'):
            warnings.warn("Transformer %s (type %s) does not "
                                 "provide get_feature_names. "
                                 "Will return input column names if available"
                                 % (str(name), type(trans).__name__))
            if column is None:
                return []
            else:
                return [name + "__" + f for f in column]

        return [name + "__" + f for f in trans.get_feature_names()]

    feature_names = []

    if type(column_transformer) == sklearn.pipeline.Pipeline:
        l_transformers = [(name, trans, None, None) for step, name, trans in column_transformer._iter()]
    else:
        # For column transformers, follow the original method
        l_transformers = list(column_transformer._iter(fitted=True))


    for name, trans, column, _ in l_transformers: 
        if type(trans) == sklearn.pipeline.Pipeline:
            # Recursive call on pipeline
            _names = get_feature_names(trans)
            # if pipeline has no transformer that returns names
            if len(_names)==0:
                _names = [name + "__" + f for f in column]
            feature_names.extend(_names)
        else:
            feature_names.extend(get_names(trans))

    return feature_names
In [61]:
sorted(zip(get_feature_names(preprocess), rcv.best_estimator_.feature_importances_), key=lambda x: x[1],reverse=True)[:50]
<ipython-input-60-7b1e9aaf1419>:23: UserWarning: Transformer imp (type SimpleImputer) does not provide get_feature_names. Will return input column names if available
  warnings.warn("Transformer %s (type %s) does not "
<ipython-input-60-7b1e9aaf1419>:23: UserWarning: Transformer ss (type StandardScaler) does not provide get_feature_names. Will return input column names if available
  warnings.warn("Transformer %s (type %s) does not "
<ipython-input-60-7b1e9aaf1419>:23: UserWarning: Transformer oned (type FunctionTransformer) does not provide get_feature_names. Will return input column names if available
  warnings.warn("Transformer %s (type %s) does not "
Out[61]:
[('nv__duration_ms', 0.37514994261569884),
 ('nv__loudness', 0.17692645403977073),
 ('count_v__verse', 0.03628326079355393),
 ('nv__instrumentalness', 0.031161983320100457),
 ('nv__acousticness', 0.029506097849462044),
 ('nv__speechiness', 0.029062386639604747),
 ('count_v__shit', 0.017558158480760394),
 ('nv__energy', 0.01615528184975608),
 ('count_v__chorus', 0.011036525214755094),
 ('count_v__nigga', 0.00987116855465679),
 ('count_v__pre', 0.009838460660814905),
 ('nv__tempo', 0.008620548299603117),
 ('count_v__gon', 0.0070301971660029295),
 ('count_v__you', 0.006355569714279636),
 ('nv__liveness', 0.005925670856101493),
 ('count_v__it', 0.0057311264787564715),
 ('count_v__like', 0.005527164245585076),
 ('count_v__me', 0.005445237174151811),
 ('nv__polarity', 0.00499163816774728),
 ('count_v__up', 0.004580401031943993),
 ('count_v__tryna', 0.004100513190881116),
 ('nv__subjectivity', 0.0036440844844594753),
 ('count_v__my', 0.003525393814831158),
 ('count_v__fuck', 0.0030335211694033134),
 ('count_v__to', 0.002777550689398734),
 ('count_v__the', 0.0025981172245813314),
 ('count_v__and', 0.0025209596243264314),
 ('count_v__this', 0.002451909335617855),
 ('nv__key', 0.002362802939969369),
 ('count_v__bitch', 0.002360813895448969),
 ('count_v__niggas', 0.0023135917619168883),
 ('count_v__that', 0.0023058744379155414),
 ('count_v__in', 0.0022319634290492987),
 ('count_v__ain', 0.0017246962944533226),
 ('count_v__we', 0.0015248510962111756),
 ('count_v__wanna', 0.0015133696863081588),
 ('count_v__on', 0.0014654268229335123),
 ('count_v__love', 0.0014601143266033477),
 ('count_v__so', 0.001442474044268944),
 ('count_v__don', 0.0014311181237328654),
 ('count_v__all', 0.0014234525150335015),
 ('count_v__be', 0.0013889668566714496),
 ('count_v__know', 0.0013759227850620305),
 ('count_v__with', 0.0012994141529612518),
 ('count_v__feel', 0.0012920334445470914),
 ('count_v__your', 0.0012648246822938875),
 ('count_v__can', 0.0012587225653774832),
 ('count_v__cause', 0.001229202649971493),
 ('count_v__ayy', 0.0012158171952289703),
 ('count_v__but', 0.0012087471955818715)]
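To see how much of the model's signal comes from lyrics versus audio features, the importances can be summed by transformer prefix ('nv' for the numeric features, 'count_v' for lyric tokens). A sketch:

from collections import defaultdict

# Sum importances by the transformer that produced each feature.
group_importance = defaultdict(float)
for name, imp in zip(get_feature_names(preprocess), rcv.best_estimator_.feature_importances_):
    group_importance[name.split('__')[0]] += imp
print(dict(group_importance))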
In [62]:
from sklearn.metrics import mean_squared_error
In [63]:
np.sqrt(mean_squared_error(y_train, rcv.best_estimator_.predict(X_train)))
Out[63]:
3.978288228342027
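Note that this RMSE is computed on the training set, so it is optimistic. The held-out test error could be computed the same way (a sketch; it isn't run above):

np.sqrt(mean_squared_error(y_test, rcv.best_estimator_.predict(X_test)))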

Identifying Sonic Chimeras

Since this dataset contains both lyrical and musical sentiment indicators (polarity for the lyrics; valence and modality for the music), it would be fun to find contradictory songs: ones that are musically optimistic and lyrically sad, or vice versa. There are a couple of potential ways of accomplishing this:

  1. Looking at minor/major songs with positive/negative polarity (respectively)
  2. Looking at songs with the greatest difference between polarity and valence (rough but workable, since the two scales are comparable)
In [66]:
NUMBER_OF_SONGS = 10
In [67]:
df['valp'] = abs(df['valence']-df['polarity'])

Minor Songs with the Happiest Lyrics

In [71]:
df.loc[(df['mode']==0.0)].sort_values('polarity', ascending=False).iloc[:NUMBER_OF_SONGS][['title', 'artist', 'valence', 'polarity', 'valp']]
Out[71]:
title artist valence polarity valp
2968 "Up, Up and Away" The 5th Dimension 0.515 0.805000 0.290000
5264 "Fly, Robin, Fly" Silver Convention 0.939 0.800000 0.139000
4618 "Best of You" Foo Fighters 0.369 0.779530 0.410530
4421 "We Are Family" Sister Sledge 0.819 0.700000 0.119000
1551 "Beautiful" Akon featuring Colby O'Donis and Kardinal Offi... 0.614 0.632569 0.018569
4308 "Daddy's Home" Jermaine Jackson 0.604 0.625000 0.021000
1385 "Mi Gente" J Balvin and Willy William featuring Beyoncé 0.469 0.625000 0.156000
1384 "Mi Gente" J Balvin and Willy William featuring Beyoncé 0.469 0.625000 0.156000
5155 "Beautiful Life" Ace of Base 0.749 0.621605 0.127395
3534 "I'll Be Good to You" The Brothers Johnson 0.930 0.619559 0.310441

Major Songs with the Saddest Lyrics

In [73]:
df.loc[(df['mode']==1.0)].sort_values('polarity', ascending=True).iloc[:NUMBER_OF_SONGS][['title', 'artist', 'valence', 'polarity', 'valp']]
Out[73]:
title artist valence polarity valp
1526 "Music" Madonna 0.871 -0.675000 1.546000
2551 "Bad Moon Rising" Creedence Clearwater Revival 0.942 -0.675000 1.617000
815 "Everything About You" Ugly Kid Joe 0.738 -0.633212 1.371212
432 "Insane in the Brain" Cypress Hill 0.767 -0.613137 1.380137
3409 "Bad Boys" (theme from Cops) Inner Circle 0.533 -0.594397 1.127397
4923 "Shake It Off" Taylor Swift 0.943 -0.507692 1.450692
1957 "Shake It Off" Taylor Swift 0.943 -0.480196 1.423196
3715 "Don't Call Us, We'll Call You" Sugarloaf 0.760 -0.475000 1.235000
2479 "Crazy" K-Ci & JoJo 0.448 -0.474432 0.922432
1919 "Jump" Van Halen 0.796 -0.451229 1.247229

Songs With the Greatest Difference Between Valence and Polarity

In [74]:
df.sort_values('valp', ascending=False).iloc[:NUMBER_OF_SONGS][['title', 'artist', 'valence', 'polarity', 'valp']]
Out[74]:
title artist valence polarity valp
2551 "Bad Moon Rising" Creedence Clearwater Revival 0.942 -0.675000 1.617000
1526 "Music" Madonna 0.871 -0.675000 1.546000
2438 "Cruel Summer" Bananarama 0.936 -0.569234 1.505234
2437 "Cruel Summer" Ace of Base 0.882 -0.586741 1.468741
4923 "Shake It Off" Taylor Swift 0.943 -0.507692 1.450692
1957 "Shake It Off" Taylor Swift 0.943 -0.480196 1.423196
432 "Insane in the Brain" Cypress Hill 0.767 -0.613137 1.380137
324 "Another Saturday Night" Sam Cooke 0.969 -0.402778 1.371778
1013 "Bad Boy" Miami Sound Machine 0.858 -0.513571 1.371571
815 "Everything About You" Ugly Kid Joe 0.738 -0.633212 1.371212