Python: Most frequent words in a string

This post is my practice on getting the most frequent words in a string in Python. Here is the code

#import the necessary packages
from collections import Counter
import pandas as pd
#open the text file. Here in this case, it is named "text"
with open('text.txt') as fin:
    counter = Counter(fin.read().strip().split())

numbers = sorted(counter.most_common(), key=lambda student: student[1], reverse=True)
top15 = numbers[0:15]

counter.most_common() is the function to get all the words and their respective count from the string. If you put a number, let’s say 10, in the brackets, it means that you want to get only the first 10 elements of the array. Here is how counter.most_common(10) looks:

[(‘to’, 27), (‘the’, 26), (‘of’, 20), (‘and’, 19), (‘in’, 16), (‘a’, 15), (‘is’, 12), (‘for’, 11), (‘it’, 9), (‘new’, 9)]

numbers = sorted(counter.most_common(), key=lambda student: student[1], reverse=True)
top15 = numbers[0:15]

The above code is to get all the words and sort them in the descending order according to the words’ frequency. top15 is to get the first 15 elements of the sorted array. Here is how the top15 looks:

[(‘to’, 27), (‘the’, 26), (‘of’, 20), (‘and’, 19), (‘in’, 16), (‘a’, 15), (‘is’, 12), (‘for’, 11), (‘it’, 9), (‘new’, 9), (‘are’, 9), (‘their’, 8), (‘video’, 7), (‘has’, 7), (‘by’, 7)]

After we get the top 15, we should put them into a data frame so that data processing can be easier. Here is how

text = [] #an array for the words
number = [] #an array for the frequency
for i in top15: #iterate through the top 15
   text.append(i[0])
   number.append(i[1])

#create the data frame
rawdata = {'words': text, 'frequency': number}
df = pd.DataFrame(rawdata, columns = ['words', 'frequency'])

This is the final data frame

    words  frequency
0      to         27
1     the         26
2      of         20
3     and         19
4      in         16
5       a         15
6      is         12
7     for         11
8      it          9
9     new          9
10    are          9
11  their          8
12  video          7
13    has          7
14     by          7

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.