Underminer: Analysis for prideandprejudice.txt

Analysis for prideandprejudice.txt

1. Most frequent words in prideandprejudice.txt

WORD	OCCURRENCES
elizabeth	635
could	527
will	424
darcy	418
bennet	333
bingley	306
jane	295
sister	294
lady	265
time	224
well	224
good	201
wickham	194
great	187
collin	180
day	178
young	177
dear	175
lydia	171
hope	169
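
A table like the one above can be produced with a simple word count. The following is only a minimal sketch, not Underminer's actual code: the tiny stop-word list and the function name most_frequent_words are assumptions made for illustration (the table contains no words such as "the", which suggests some filtering of this kind).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "is", "was"}

def most_frequent_words(text, top_n=20, stop_words=STOP_WORDS):
    """Return the top_n (word, count) pairs, ignoring case and stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(top_n)

sample = "Elizabeth saw Darcy. Darcy saw Elizabeth, and Elizabeth smiled."
for word, count in most_frequent_words(sample, top_n=3):
    print(f"{word}\t{count}")
```

Running the same function over the full text with top_n=20 would give a table of the same shape, although the exact counts depend on the stop-word list and on any stemming applied.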

2. Typical word occurrences per chapter

The chart shows the occurrences of words typical of the text (x) for each chapter (y).

Use the input field to add more words. This is not a free-text search, but every word present in the text appears in the autofill menu.

You can also remove a word by double-clicking its entry in the legend. A single click highlights the word.
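
The data behind such a chart is one count per tracked word per chapter. Here is a rough sketch of how it could be computed; it assumes that chapters begin with a line such as "Chapter 1", which may not match the actual file, and the function names are made up for this example.

```python
import re
from collections import Counter

def split_chapters(text):
    """Split on lines that start with 'Chapter <number>' (an assumed heading format)."""
    parts = re.split(r"(?im)^chapter[ \t]+\d+.*$", text)
    # parts[0] is whatever precedes the first heading (title page, etc.).
    return [p for p in parts[1:] if p.strip()]

def occurrences_per_chapter(text, tracked_words):
    """Return one dict per chapter mapping each tracked word to its count."""
    rows = []
    for chapter in split_chapters(text):
        counts = Counter(re.findall(r"[a-z]+", chapter.lower()))
        rows.append({w: counts[w] for w in tracked_words})
    return rows

sample = """Chapter 1
Elizabeth met Darcy. Darcy frowned.
Chapter 2
Jane wrote to Elizabeth.
"""
rows = occurrences_per_chapter(sample, ["elizabeth", "darcy"])
for number, row in enumerate(rows, start=1):
    print(number, row)
```

Adding a word through the input field would then correspond to adding one more entry to tracked_words.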


3. Part of speech tagging

Part of speech tagging is an interesting breed: almost all longer texts split into a fairly constant mix of nouns, verbs, etc. No surprise here!

What's more interesting is combining part of speech tagging with other forms of analysis. Would the occurrences of adjectives alone tell us more about the mood of a certain part of the text, such as a chapter? Certainly so! What about verbs? Do they show traces of action and events?

Part of speech tagging becomes especially helpful when playing with n-grams and sentiment analysis, so for now just take our word for it: the application has 100,300 English words ready for tagging, and there cannot be many more than that!

These features will be coming soon to Underminer. Until then, part of speech tagging is displayed in the form of a good old boring pie chart.
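
To make the pie chart concrete: its slices are simply each tag's share of all tagged words. The sketch below assumes a plain word-to-tag lookup with a miniature, hypothetical lexicon; how Underminer really tags words is not described here, and a lookup like this ignores context (a word such as "read" always gets the same tag).

```python
import re
from collections import Counter

# Hypothetical miniature lexicon, for illustration only.
LEXICON = {
    "elizabeth": "Noun", "letter": "Noun",
    "she": "Pronoun", "it": "Pronoun",
    "read": "Verb", "slowly": "Adverb", "and": "Conjunction",
}

def tag_words(text, lexicon=LEXICON):
    """Tag each word by lookup; words missing from the lexicon become 'Unknown'."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(w, lexicon.get(w, "Unknown")) for w in words]

def tag_shares(text, lexicon=LEXICON):
    """Return each tag's share of all words, as fractions that sum to 1."""
    counts = Counter(tag for _, tag in tag_words(text, lexicon))
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}

shares = tag_shares("Elizabeth read the letter and she read it slowly")
for tag, share in sorted(shares.items()):
    print(f"{tag}: {share:.0%}")
```

Restricting tag_words to a single tag, say adjectives, and counting per chapter would be one way to approach the mood question raised above.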

4. Typical sentence length

The chart shows the average number of words per sentence (x) for each chapter (y).

Colors indicate the most common part of speech at the given word position.

Only sentences at least as long as the chapter's average sentence length are counted.
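
One possible reading of these rules, written out as a sketch: compute the chapter's average sentence length, keep the sentences that reach it, and pick the most common tag at each word position. The input format (each sentence as a list of tags), the function name and the choice to stop at the whole-number part of the average are all assumptions.

```python
from collections import Counter

def sentence_profile(tagged_sentences):
    """tagged_sentences: list of sentences, each a list of part of speech tags.

    Returns (average_length, tags), where tags[i] is the most common tag at
    word position i among sentences at least as long as the average.
    """
    if not tagged_sentences:
        return 0.0, []
    average = sum(len(s) for s in tagged_sentences) / len(tagged_sentences)
    long_enough = [s for s in tagged_sentences if len(s) >= average]
    tags = []
    # Every kept sentence has at least int(average) words, so s[i] is safe.
    for i in range(int(average)):
        column = Counter(s[i] for s in long_enough)
        tags.append(column.most_common(1)[0][0])
    return average, tags

chapter = [
    ["Pronoun", "Verb", "Noun"],
    ["Pronoun", "Verb", "Adjective", "Noun"],
    ["Pronoun", "Auxiliary-verb", "Verb", "Adjective", "Noun"],
    ["Pronoun", "Verb", "Adjective", "Noun", "Adverb"],
    ["Interjection"],
]
average, tags = sentence_profile(chapter)
print(average, tags)
```

In this toy chapter the three-word and one-word sentences fall below the average and are left out of the position counts.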

Part of speech

Noun

Pronoun

Adjective

Verb

Auxiliary-verb

Adverb

Preposition

Conjunction

Interjection

Unknown

5. Sentiment analysis

The primal statement of every novel (we're thinking of linear plots) is probably about consequences (that cursed human condition).

Are things generally going in the right direction, or do they just keep getting worse? Using TextBlob's sentiment analysis, this chart makes a guess at the direction of the plot: by assigning each sentence of the text (x) a value (y) between +1 (extremely good) and -1 (extremely unfortunate), we can see how the plot progresses.

Please note: only sentences in which a sentiment could be detected are counted; sentences with a value of zero are omitted from the chart.
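
The filtering step can be sketched as follows. To keep the example runnable without extra downloads, a toy word list stands in for the real scorer; TOY_SCORES, toy_polarity and sentiment_series are hypothetical names. With TextBlob installed, passing polarity=lambda s: TextBlob(s).sentiment.polarity should give scores in the same -1 to +1 range.

```python
import re

# Hypothetical toy lexicon standing in for a real sentiment model.
TOY_SCORES = {"happy": 0.8, "delighted": 1.0, "wretched": -1.0, "sorry": -0.5}

def toy_polarity(sentence):
    """Average the scores of known words; 0.0 when none are found."""
    words = re.findall(r"[a-z]+", sentence.lower())
    hits = [TOY_SCORES[w] for w in words if w in TOY_SCORES]
    return sum(hits) / len(hits) if hits else 0.0

def sentiment_series(sentences, polarity=toy_polarity):
    """Return (sentence_index, score) pairs, skipping sentences that score zero."""
    series = []
    for index, sentence in enumerate(sentences):
        score = polarity(sentence)
        if score != 0:
            series.append((index, score))
    return series

sentences = [
    "Elizabeth was delighted.",
    "The carriage arrived at noon.",
    "Lydia was wretched and sorry.",
]
print(sentiment_series(sentences))
```

The second sentence scores zero and is dropped, which is why the plotted points do not have to be evenly spaced along the x axis.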

6. Named entities

The chart shows every named entity recognized by Stanford University's Named Entity Tagger.

Entities are linked only if they occur within 500 words of each other in the text.

The width of a link shows how frequently the two entities are mentioned together.

Please note: this chart is still in development and is available only for the demo text; it will not appear in the analysis of your own text.
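
The linking rule described above can be sketched as a windowed co-occurrence count. This is an illustration under assumptions, not the chart's actual code: it takes entity mentions as (word position, name) pairs that have already been extracted, and it treats "mentioned together" as the number of mention pairs no more than 500 words apart.

```python
from collections import Counter

def cooccurrence_links(mentions, max_distance=500):
    """mentions: list of (word_position, entity_name), sorted by position.

    Returns a Counter mapping an alphabetically ordered entity pair to the
    number of mention pairs that lie within max_distance words of each other.
    """
    links = Counter()
    for i, (pos_a, name_a) in enumerate(mentions):
        for pos_b, name_b in mentions[i + 1:]:
            if pos_b - pos_a > max_distance:
                break  # later mentions are even further away
            if name_a != name_b:
                links[tuple(sorted((name_a, name_b)))] += 1
    return links

mentions = [
    (10, "Elizabeth"), (40, "Darcy"), (450, "Elizabeth"),
    (600, "Pemberley"), (1500, "Darcy"),
]
for pair, weight in cooccurrence_links(mentions).items():
    print(pair, weight)
```

The resulting weight of each pair would map to the width of the link; pairs that never fall within the window, such as Darcy and Pemberley here, get no link at all.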

Colors

Person

Location

Organization

Date

Time

Money

