Wanted: A Weighted Words List

Does anyone know how to create a weighted words list? I have the code below, which I found by following the trail from Bookish, but the instructions on the website are extremely minimal for a geek wannabe like me. I’ve tinkered with it (put it in an index template as instructed) but alas, no success. I’m curious to see how mine would look.

<?php

// Build the list of words and convert everything to lowercase.
$string = strtolower('<MTEntries lastn="10000"><MTEntryCategory remove_html="1" encode_php="q"> <MTEntryTitle remove_html="1" encode_php="q"> <MTEntryBody remove_html="1" encode_php="q"> <MTEntryMore remove_html="1" encode_php="q"> </MTEntries><MTOtherBlog blog_id="7"><MTEntries lastn="10000"><MTEntryCategory remove_html="1" encode_php="q"> <MTEntryTitle remove_html="1" encode_php="q"> <MTEntryBody remove_html="1" encode_php="q"> </MTEntries></MTOtherBlog>');

// Remove punctuation.
$wordlist = preg_split('/\s*[\s+\.|\?|,|(|)|\-+|\'|\"|=|;|×|\$|\/|:|{|}]\s*/i', $string);

// Build an array of the unique words and number of times they occur.
$a = array_count_values( $wordlist );

// Remove words that don't matter: "stop words."
$overusedwords = array( '', 'a', 'an', 'the', 'and', 'of', 'i', 'to', 'is', 'in', 'with', 'for', 'as', 'that', 'on', 'at', 'this', 'my', 'was', 'our', 'it', 'you', 'we', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '10', 'about', 'after', 'all', 'almost', 'along', 'also', 'amp', 'another', 'any', 'are', 'area', 'around', 'available', 'back', 'be', 'because', 'been', 'being', 'best', 'better', 'big', 'bit', 'both', 'but', 'by', 'c', 'came', 'can', 'capable', 'control', 'could', 'course', 'd', 'dan', 'day', 'decided', 'did', 'didn', 'different', 'div', 'do', 'doesn', 'don', 'down', 'drive', 'e', 'each', 'easily', 'easy', 'edition', 'end', 'enough', 'even', 'every', 'example', 'few', 'find', 'first', 'found', 'from', 'get', 'go', 'going', 'good', 'got', 'gt', 'had', 'hard', 'has', 'have', 'he', 'her', 'here', 'how', 'if', 'into', 'isn', 'just', 'know', 'last', 'left', 'li', 'like', 'little', 'll', 'long', 'look', 'lot', 'lt', 'm', 'made', 'make', 'many', 'mb', 'me', 'menu', 'might', 'mm', 'more', 'most', 'much', 'name', 'nbsp', 'need', 'new', 'no', 'not', 'now', 'number', 'off', 'old', 'one', 'only', 'or', 'original', 'other', 'out', 'over', 'part', 'place', 'point', 'pretty', 'probably', 'problem', 'put', 'quite', 'quot', 'r', 're', 'really', 'results', 'right', 's', 'same', 'saw', 'see', 'set', 'several', 'she', 'sherree', 'should', 'since', 'size', 'small', 'so', 'some', 'something', 'special', 'still', 'stuff', 'such', 'sure', 'system', 't', 'take', 'than', 'their', 'them', 'then', 'there', 'these', 'they', 'thing', 'things', 'think', 'those', 'though', 'through', 'time', 'today', 'together', 'too', 'took', 'two', 'up', 'us', 'use', 'used', 'using', 've', 'very', 'want', 'way', 'well', 'went', 'were', 'what', 'when', 'where', 'which', 'while', 'white', 'who', 'will', 'would', 'your');

// Remove the stop words from the list.
foreach ($overusedwords as $word) {
    unset($a[$word]);
}

// Sort the keys alphabetically.
ksort( $a );

// Print the data.
echo '<p class="noindent">';

// Assign a font-size to the word based on frequency of use.
foreach ($a as $word => $count) {
if ($count <= 35) { $size = 75;
} elseif ($count <= 50) { $size = 100;
} elseif ($count <= 65) { $size = 125;
} elseif ($count <= 80) { $size = 150;
} elseif ($count <= 95) { $size = 175;
} elseif ($count <= 110) { $size = 200;
} elseif ($count <= 125) { $size = 225;
} elseif ($count <= 140) { $size = 250;
} elseif ($count <= 155) { $size = 275;
} elseif ($count <= 170) { $size = 300;
} else { $size = 340; } // Counts above 170 all get the largest size, so $size is always set.

// The keyword needs to be referenced 30 or more times to register.
if ($count >= 30) {
echo ' <span style="font-size: ' . $size . '%;"><acronym title="This keyword occurs ' . $count . ' times.">' . $word . '</acronym></span> '; }
}

echo '</p>';

?>
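
To check the counting and sizing logic separately from Movable Type, here is a minimal sketch that runs on a plain string instead of template tags. The helper names `count_keywords` and `font_size_for`, the sample text, the short stop-word list, and the threshold of 2 are all made up for illustration. It also swaps in a simple linear scale for the font size rather than the `elseif` ladder above, and splits on any non-alphanumeric character rather than using the original regex.

```php
<?php
// Hypothetical helper: count the words in plain text, minus the stop words.
function count_keywords(string $text, array $stopWords): array
{
    // Split on anything that is not a lowercase letter or digit
    // (a simpler rule than the punctuation regex in the original code).
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    // Unique words mapped to the number of times each occurs.
    $counts = array_count_values($words);

    foreach ($stopWords as $stop) {
        unset($counts[$stop]);
    }

    // Sort the keys alphabetically, as the original does.
    ksort($counts);
    return $counts;
}

// Hypothetical helper: map a count to a font-size percentage between $min and $max.
function font_size_for(int $count, int $lowest, int $highest, int $min = 75, int $max = 340): int
{
    if ($highest <= $lowest) {
        return $min; // Every keyword has the same count, so use the smallest size.
    }
    $ratio = ($count - $lowest) / ($highest - $lowest);
    $ratio = max(0.0, min(1.0, $ratio)); // Clamp so out-of-range counts stay within bounds.
    return (int) round($min + $ratio * ($max - $min));
}

// Made-up sample text standing in for the blog entries.
$text = 'The cat sat on the mat. The cat saw a bird; the bird saw the cat.';
$counts = count_keywords($text, ['the', 'a', 'on']);

// Only keep words used at least $threshold times (the original uses 30).
$threshold = 2;
$keywords = array_filter($counts, fn($c) => $c >= $threshold);

echo '<p class="noindent">';
foreach ($keywords as $word => $count) {
    $size = font_size_for($count, min($keywords), max($keywords));
    echo ' <span style="font-size: ' . $size . '%;" title="This keyword occurs ' . $count . ' times.">'
        . htmlspecialchars((string) $word) . '</span> ';
}
echo '</p>';
```

If this prints sized words when run on its own, the PHP side is fine and the remaining problem is more likely in how the template tags fill `$string`.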


2 Comments on “Wanted: A Weighted Words List”

  1. Dan Wolfgang Says:

    What sort of problem are you having with this? I’m happy to help, if I can.

  2. AC Says:

    You would need to customize the string assignment in the original code for your Movable Type blog ID.

    , etc., and pasting the updated code into an index template should work.

    To get more detailed feedback, do things step by step, keep the page’s output somewhere and note down what did not work.

    Amit
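
One way to follow the step-by-step advice above is to check first whether Movable Type actually expanded the template tags before PHP ran. The sketch below assumes that the tags are left behind as literal text when the file is not published through Movable Type; the function name `looks_unprocessed` and both sample strings are hypothetical, and it is meant to be run as a standalone PHP file rather than pasted into a template.

```php
<?php
// Hypothetical check: does the string still contain raw Movable Type tags?
function looks_unprocessed(string $string): bool
{
    // The original code lowercases everything, so match case-insensitively.
    return (bool) preg_match('/<\/?mt[a-z]+/i', $string);
}

// Stand-in values; in the real template $string comes from strtolower('<MTEntries ...>').
$raw = strtolower('<MTEntries lastn="10"><MTEntryTitle remove_html="1" encode_php="q"></MTEntries>');
$published = strtolower('First post title Second post title');

var_dump(looks_unprocessed($raw));       // bool(true)
var_dump(looks_unprocessed($published)); // bool(false)
```

If the check comes back true for the real `$string`, the word counting never sees any entry text, which would explain an empty result.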