10 most frequent words in a string (Python)


I need to display the 10 most frequent words in a text file, from the most frequent to the least, along with the number of times each has been used. I can't use a dictionary or the Counter function. So far I have this:

    import urllib
    cnt = 0
    i=0
    txtfile = urllib.urlopen("http://textfiles.com/etext/fiction/alice30.txt")
    uniques = []
    for line in txtfile:
        words = line.split()
        for word in words:
            if word not in uniques:
                uniques.append(word)
    for word in words:
        while i<len(uniques):
            i+=1
            if word in uniques:
                cnt += 1
    print cnt

Now I think I should take every word in the array 'uniques', see how many times it is repeated in the file, and add that to another array that counts the instances of each word. But this is where I am stuck. I don't know how to proceed.

Any help would be appreciated. Thank you.

You're on the right track. Note that this algorithm is quite slow because, for each unique word, it iterates over all of the words. A much faster approach without hashing would involve building a trie.

    # The following assumes that you already have alice30.txt on disk.
    # Start by splitting the file into lowercase words.
    words = open('alice30.txt').read().lower().split()

    # Get the set of unique words.
    uniques = []
    for word in words:
      if word not in uniques:
        uniques.append(word)

    # Make a list of (count, unique) tuples.
    counts = []
    for unique in uniques:
      count = 0              # Initialize the count to zero.
      for word in words:     # Iterate over the words.
        if word == unique:   # Is this word equal to the current unique?
          count += 1         # If so, increment the count.
      counts.append((count, unique))

    counts.sort()            # Sorting the list puts the lowest counts first.
    counts.reverse()         # Reverse it, putting the highest counts first.
    # Print the ten words with the highest counts.
    for i in range(min(10, len(counts))):
      count, word = counts[i]
      print('%s %d' % (word, count))
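For comparison, here is a rough sketch of what the trie approach mentioned above might look like. It is only one possible way to do it, not a definitive implementation: the names `TrieNode`, `count_words` and `top_words` are made up for illustration, and each node keeps its children in two parallel lists rather than a dictionary so that it stays within the "no dictionary, no Counter" restriction. Each word is walked once, character by character, so the cost grows with the total number of characters instead of (unique words) x (all words).

```python
class TrieNode(object):
    """One trie node. Children live in parallel lists, not a dict."""

    def __init__(self):
        self.count = 0   # how many words end exactly at this node
        self.chars = []  # characters labelling the outgoing edges
        self.nodes = []  # child nodes, parallel to self.chars

    def child(self, ch):
        """Return the child reached via ch, creating it if missing."""
        for i in range(len(self.chars)):
            if self.chars[i] == ch:
                return self.nodes[i]
        node = TrieNode()
        self.chars.append(ch)
        self.nodes.append(node)
        return node


def count_words(words):
    """Return a list of (count, word) tuples, one per distinct word."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.child(ch)
        node.count += 1

    # Walk the trie with an explicit stack to collect every stored word.
    counts = []
    stack = [(root, '')]
    while stack:
        node, prefix = stack.pop()
        if node.count > 0:
            counts.append((node.count, prefix))
        for i in range(len(node.chars)):
            stack.append((node.nodes[i], prefix + node.chars[i]))
    return counts


def top_words(words, n=10):
    """Return up to n (count, word) tuples, highest counts first."""
    counts = count_words(words)
    counts.sort()
    counts.reverse()
    return counts[:n]


# Small in-memory sample; for the real file you could pass
# open('alice30.txt').read().lower().split() instead.
sample = "the cat and the hat and the bat".split()
for count, word in top_words(sample, 2):
    print('%s %d' % (word, count))
```

As in the simpler version, words with equal counts come out in reverse alphabetical order, because the (count, word) tuples are sorted and then reversed.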
