Java: Read a .txt file and return a list of words with their frequency in the file

I have this so far, which prints the .txt file to the screen:
```java
import java.io.*;

public class ReadFile {
    public static void main(String[] args) throws IOException {
        String wordList;
        int frequency;
        File file = new File("file1.txt");
        // try-with-resources closes the reader when the loop finishes
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(file)))) {
            String line = null;
            while ((line = br.readLine()) != null) {
                String[] tokens = line.split("\\s+");
                System.out.println(line);
            }
        }
    }
}
```
Can anyone help me print the list of words and the frequency of each word?

You can do it like this. I'm assuming that only commas or periods occur in the file; otherwise you'll have to remove other punctuation characters as well. I'm using a TreeMap so that the words in the map are stored in their natural (alphabetical) order.
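To see the two ideas in isolation (stripping non-letters, then counting lowercase words in a TreeMap), here is a minimal sketch that works on an in-memory string instead of a file. The class name `WordCountSketch` and the method `countWords` are hypothetical, and the sketch uses `Map.merge` (Java 8+) in place of the `containsKey`/`put` pair; both approaches produce the same counts.

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Hypothetical helper: counts words in a string, ignoring case and non-letters.
    static TreeMap<String, Integer> countWords(String text) {
        TreeMap<String, Integer> counts = new TreeMap<>();
        for (String token : text.split("\\s+")) {
            // Keep only ASCII letters, then normalize case
            String word = token.replaceAll("[^a-zA-Z]", "").toLowerCase();
            if (word.isEmpty()) {
                continue; // e.g. the empty token from leading spaces, or a token like "--"
            }
            // Insert 1 for a new word, otherwise add 1 to the existing count
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("  The cat, the dog. The END -- end");
        // TreeMap iterates its keys in alphabetical order
        System.out.println(counts); // prints {cat=1, dog=1, end=2, the=3}
    }
}
```

Note the `isEmpty()` check: splitting a line that starts with whitespace yields an empty first token, and a token made only of punctuation becomes empty after stripping, so without the check the empty string would be counted as a "word".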
```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.TreeMap;

public static TreeMap<String, Integer> generateFrequencyList() throws IOException {
    // TreeMap keeps the words sorted in natural (alphabetical) order
    TreeMap<String, Integer> wordsFrequencyMap = new TreeMap<String, Integer>();
    String file = "/tmp/lorem.txt";
    try (BufferedReader br = new BufferedReader(new FileReader(file))) {
        String line;
        while ((line = br.readLine()) != null) {
            String[] tokens = line.split("\\s+");
            for (String token : tokens) {
                token = removePunctuation(token).toLowerCase();
                if (token.isEmpty()) {
                    continue; // skip blanks from leading spaces or pure punctuation
                }
                if (!wordsFrequencyMap.containsKey(token)) {
                    wordsFrequencyMap.put(token, 1);
                } else {
                    int count = wordsFrequencyMap.get(token);
                    wordsFrequencyMap.put(token, count + 1);
                }
            }
        }
    }
    return wordsFrequencyMap;
}

private static String removePunctuation(String token) {
    // Remove every character that is not an ASCII letter
    return token.replaceAll("[^a-zA-Z]", "");
}
```
The main method for testing is shown below. To get the percentages, I first compute the total word count by iterating through the map and adding up the values, then make a second pass to compute each word's percentage. By the way, if this is part of a larger piece of work, take a look at the Apache Commons Math library for calculating frequency distributions. If you use its Frequency class, you can keep adding words and get descriptive statistics at the end.
```java
public static void main(String[] args) {
    try {
        int totalWords = 0;
        TreeMap<String, Integer> freqMap = generateFrequencyList();
        // First pass: total number of words in the file
        for (String key : freqMap.keySet()) {
            totalWords += freqMap.get(key);
        }
        System.out.println("Word\tCount\tPercentage");
        // Second pass: each word's count and its share of the total
        for (String key : freqMap.keySet()) {
            System.out.println(key + "\t" + freqMap.get(key) + "\t"
                    + ((double) freqMap.get(key) * 100.0 / (double) totalWords));
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}
```
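As a worked check of the two-pass percentage calculation, here is a sketch that runs the same logic on a small hard-coded map instead of a file. The class `PercentageSketch` and the method `percentages` are hypothetical names. With the counts cat=1, dog=1, the=2, the total is 4, so "the" gets 2 × 100 / 4 = 50% and each of the other two words gets 25%.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class PercentageSketch {

    // Hypothetical helper: the first pass sums all counts, the second turns each count into a percentage.
    static Map<String, Double> percentages(Map<String, Integer> freq) {
        int totalWords = 0;
        for (int count : freq.values()) {
            totalWords += count;
        }
        // LinkedHashMap preserves the input map's iteration order (alphabetical for a TreeMap)
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            // Multiplying by 100.0 (a double) first avoids integer division
            result.put(e.getKey(), e.getValue() * 100.0 / totalWords);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = new TreeMap<>();
        freq.put("the", 2);
        freq.put("cat", 1);
        freq.put("dog", 1);
        for (Map.Entry<String, Double> e : percentages(freq).entrySet()) {
            // Locale.ROOT keeps "." as the decimal separator; %.2f trims long floating-point tails
            System.out.printf(Locale.ROOT, "%s\t%d\t%.2f%n", e.getKey(), freq.get(e.getKey()), e.getValue());
        }
    }
}
```

This prints `cat 1 25.00`, `dog 1 25.00` and `the 2 50.00` (tab-separated). Formatting with `printf` is optional; the plain string concatenation in the answer prints the same values with more decimal places when they don't divide evenly.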