10 most frequent words in a string in Python

- March 15, 2012

I need to display the 10 most frequent words in a text file, ordered from most frequent to least frequent, along with the number of times each one has been used. I can't use a dictionary or the Counter function. So far I have this:

    import urllib

    cnt = 0
    i = 0
    txtfile = urllib.urlopen("http://textfiles.com/etext/fiction/alice30.txt")
    uniques = []
    # Collect every distinct word in the file.
    for line in txtfile:
        words = line.split()
        for word in words:
            if word not in uniques:
                uniques.append(word)
    # Attempt at counting (this is where I'm stuck).
    for word in words:
        while i < len(uniques):
            i += 1
            if word in uniques:
                cnt += 1
    print cnt

Now I think I should go through every word in the array 'uniques', see how many times it is repeated in the file, and then add that to another array that counts the instances of each word. But I'm stuck and don't know how to proceed.
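The "second array of counts" idea described above can be sketched with two parallel lists, where `counts[k]` holds the number of occurrences of `uniques[k]`. This is a minimal illustration, not the asker's final code; the function name `top_words_parallel` and the sample sentence are hypothetical. It still does a linear search per word, so it has the same slowness as any list-based approach.

```python
def top_words_parallel(words, n=10):
    """Count words with two parallel lists instead of a dict.

    uniques[k] is a distinct word and counts[k] is how often it occurs.
    """
    uniques = []
    counts = []
    for word in words:
        if word in uniques:
            # Linear search: find the word's slot and bump its count.
            counts[uniques.index(word)] += 1
        else:
            uniques.append(word)
            counts.append(1)
    # Pair each count with its word and sort, highest count first.
    pairs = sorted(zip(counts, uniques), reverse=True)
    return [(word, count) for count, word in pairs[:n]]


sample = "the cat and the hat and the bat".split()
for word, count in top_words_parallel(sample, 3):
    print('%s %d' % (word, count))
```

Because the tuples are sorted as `(count, word)` in reverse, words with equal counts come out in reverse alphabetical order.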

Any help is appreciated. Thank you.

You're on the right track. Note that the algorithm is quite slow because, for each unique word, it iterates over all of the words. A much faster approach without hashing would involve building a trie.
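A trie-based count might look like the sketch below. It is only an illustration under some assumptions: each node keeps its children in a plain list of `(character, node)` pairs so that no dictionary or hashing is used for lookups, and the names `TrieNode`, `count_with_trie` and `top_words_trie` are hypothetical. Inserting a word costs time proportional to its length (times the small number of children per node) rather than to the number of unique words seen so far.

```python
class TrieNode:
    """One trie node; children live in a list of (char, node) pairs."""

    def __init__(self):
        self.children = []  # list of (character, TrieNode) pairs
        self.count = 0      # occurrences of the word that ends at this node

    def child(self, ch):
        """Return the child for ch, creating it if it does not exist yet."""
        for c, node in self.children:
            if c == ch:
                return node
        node = TrieNode()
        self.children.append((ch, node))
        return node


def count_with_trie(words):
    """Return a list of (count, word) pairs built from a trie."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.child(ch)
        node.count += 1  # one more occurrence of this complete word

    # Walk the trie depth-first, rebuilding each word from its path.
    results = []
    stack = [(root, '')]
    while stack:
        node, prefix = stack.pop()
        if node.count:
            results.append((node.count, prefix))
        for ch, kid in node.children:
            stack.append((kid, prefix + ch))
    return results


def top_words_trie(words, n=10):
    counts = count_with_trie(words)
    counts.sort(reverse=True)  # highest counts first
    return [(word, count) for count, word in counts[:n]]
```

Usage mirrors the list version: `top_words_trie(open('alice30.txt').read().lower().split())` returns the ten `(word, count)` pairs with the highest counts.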

    # The following assumes you have alice30.txt on disk.
    # Start by splitting the file into lowercase words.
    with open('alice30.txt') as f:
        words = f.read().lower().split()

    # Build the list of unique words.
    uniques = []
    for word in words:
        if word not in uniques:
            uniques.append(word)

    # Make a list of (count, unique) tuples.
    counts = []
    for unique in uniques:
        count = 0              # Initialize the count to zero.
        for word in words:     # Iterate over all the words.
            if word == unique: # Is this word equal to the current unique?
                count += 1     # If so, increment the count.
        counts.append((count, unique))

    counts.sort()              # Sorting the list puts the lowest counts first.
    counts.reverse()           # Reverse it, putting the highest counts first.
    # Print the ten words with the highest counts.
    for i in range(min(10, len(counts))):
        count, word = counts[i]
        print('%s %d' % (word, count))
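Another way to avoid the nested loop above, still without a dictionary or Counter, is to sort the words first so that identical words become adjacent, then measure each run. This is a different technique from the trie mentioned earlier (sorting, roughly O(n log n)); the sketch below uses the standard-library `itertools.groupby` and `heapq.nlargest`, and the function name `top_words_sorted` is hypothetical.

```python
import heapq
from itertools import groupby


def top_words_sorted(words, n=10):
    """Count words by sorting them, then measuring runs of equal words."""
    # groupby() groups *consecutive* equal items, so the input must be sorted.
    counts = [(len(list(group)), word) for word, group in groupby(sorted(words))]
    # nlargest() returns the n biggest (count, word) tuples, highest first.
    return [(word, count) for count, word in heapq.nlargest(n, counts)]
```

For example, `top_words_sorted("the cat and the hat and the bat".split(), 3)` gives the same result as the answer's loop, but each word is compared only with its neighbours after sorting.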
