Underminer: Analysis for donquixote.txt

Analysis for donquixote.txt

1. Most frequent words in donquixote.txt

WORD	OCCURRENCES
don	3021
quixote	2327
sancho	2206
will	1681
knight	897
good	890
great	822
time	773
thee	764
well	714
could	681
master	635
day	554
worship	549
thy	537
hand	535
god	532
lady	524
reply	516
senor	511
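
A table like the one above can be approximated with a simple word counter. This is only a minimal sketch: the page does not say how words are normalized or which stop words are dropped, so the tokenizer pattern and the tiny `STOP_WORDS` set below are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be far longer.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "that", "he", "i", "his", "it"}


def most_frequent_words(text, top_n=20, stop_words=STOP_WORDS):
    """Return the top_n (word, count) pairs, ignoring case and stop words."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(top_n)
```

Calling `most_frequent_words(open("donquixote.txt").read())` would produce a list of pairs in the same WORD / OCCURRENCES shape, though the exact numbers depend on the normalization used.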

2. Typical word occurrences per chapter

The chart shows the occurrences of words typical to the text (x) for each chapter (y).

Use the input field to add more words - this is not a free text search, though all words present in the text are shown in the autofill menu.

You can also remove a word by double-clicking its entry in the legend. A single click highlights the given word.
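
The data behind such a chart is one count per tracked word per chapter. A rough sketch, assuming the text has already been split into a list of chapter strings (how chapters are detected is not described on this page):

```python
import re
from collections import Counter


def occurrences_per_chapter(chapters, tracked_words):
    """Map each tracked word to a list of counts, one entry per chapter."""
    tracked = [w.lower() for w in tracked_words]
    series = {w: [] for w in tracked}
    for chapter in chapters:
        # Count every word of the chapter once, then look up the tracked ones.
        counts = Counter(re.findall(r"[a-z']+", chapter.lower()))
        for w in tracked:
            series[w].append(counts[w])
    return series
```

Each list in the result is one line of the chart; adding a word through the input field corresponds to adding one more key to `tracked_words`.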


3. Part of speech tagging

Part of speech tagging is an interesting breed: almost all longer texts split into a fairly constant mix of nouns, verbs, etc. - no surprise here!

It gets more interesting when you combine part of speech tagging with other forms of analysis. Would the occurrences of adjectives alone tell us more about the mood of a certain part of the text, like a chapter? Certainly so! What about verbs? Do they show traces of action and events?

Part of speech tagging becomes especially helpful when playing with n-grams and sentiment analysis, so for now just take our word for it: the application comes with 100,300 English words ready for tagging, and there cannot be many more than that!

These features will be coming to Underminer soon. Until then, part of speech tagging is displayed in the form of a good old boring pie chart.
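
The pie chart boils down to the share of each part of speech among all words. The sketch below assumes a plain dictionary lookup, which the mention of a fixed word list hints at but does not confirm; the miniature `LEXICON` is hypothetical and only there to make the example run.

```python
import re
from collections import Counter

# Hypothetical miniature lexicon; the real word list is not shown on this page.
LEXICON = {
    "knight": "Noun",
    "lady": "Noun",
    "good": "Adjective",
    "great": "Adjective",
    "thee": "Pronoun",
    "well": "Adverb",
}


def pos_distribution(text, lexicon=LEXICON):
    """Return the share of each part of speech among all words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Words missing from the lexicon fall into the "Unknown" slice.
    tags = Counter(lexicon.get(w, "Unknown") for w in words)
    total = sum(tags.values())
    return {tag: n / total for tag, n in tags.items()} if total else {}
```

Restricting the counter to a single tag, for example adjectives per chapter, would be one way to explore the mood question raised above.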

4. Typical sentence length

The chart shows the average number of words per sentence (x) for each chapter (y).

Colors indicate the most common part of speech at the given word position.

Only sentences that are at least as long as the chapter's average sentence length are counted.
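
The filtering rule above can be sketched as follows. The sentence splitter is a naive assumption (it breaks on `.`, `!` and `?` only) and is not the tokenizer the application actually uses.

```python
import re


def split_sentences(chapter):
    """Naively split a chapter into sentences, each a list of words."""
    parts = re.split(r"[.!?]+", chapter)
    return [p.split() for p in parts if p.strip()]


def long_sentences(chapter):
    """Return (average_length, sentences at least as long as the average)."""
    sentences = split_sentences(chapter)
    if not sentences:
        return 0.0, []
    average = sum(len(s) for s in sentences) / len(sentences)
    # Keep only sentences whose word count reaches the chapter average.
    return average, [s for s in sentences if len(s) >= average]
```

The coloring step would then tag the word at each position of the kept sentences and pick the most common tag per position.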

Part of speech

Noun

Pronoun

Adjective

Verb

Auxiliary-verb

Adverb

Preposition

Conjunction

Interjection

Unknown

5. Sentiment analysis

Every novel's primal statement (we're thinking linear plots) is probably about consequences (that cursed human condition).

Are things generally going in the right direction, or do they just keep getting worse? Using TextBlob's sentiment analysis, this chart aims to make a guess at the direction of the plot: by associating a value (y) between +1 (extremely good) and -1 (extremely unfortunate) with each sentence of the text (x), we are able to see how the plot progresses.

Please note: only sentences where a sentiment could be detected are counted - sentences with a value of zero are omitted from the chart.
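
A minimal sketch of how such a series could be built. `TextBlob(...).sentiment.polarity` is TextBlob's polarity score in the range -1 to +1; the function names and the pluggable `polarity` parameter are assumptions for illustration, not the application's actual code.

```python
def textblob_polarity(sentence):
    """Polarity in [-1, 1] from TextBlob; requires the textblob package."""
    from textblob import TextBlob  # imported lazily so the sketch loads without it
    return TextBlob(sentence).sentiment.polarity


def sentiment_series(sentences, polarity=textblob_polarity):
    """Return (sentence_index, polarity) pairs, dropping sentences that score zero."""
    series = []
    for index, sentence in enumerate(sentences):
        score = polarity(sentence)
        if score != 0:  # zero means no sentiment was detected
            series.append((index, score))
    return series
```

Keeping the original sentence index as x preserves each sentence's position in the text, so the omitted sentences appear as gaps in the chart.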

6. Named entities

The chart shows every named entity recognized by Stanford University's Named Entity Tagger.

Entities are connected by a link only if they appear within 500 words of each other in the text.

The width of a connecting link shows how frequently the two entities are mentioned together.
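
The linking rule can be sketched as a co-occurrence count. The input format here is an assumption: a list of `(word_position, entity_name)` pairs as some tagger might produce them. The count per pair stands in for the link width.

```python
from collections import Counter


def entity_links(mentions, max_distance=500):
    """Count how often two different entities occur within max_distance words.

    mentions: iterable of (word_position, entity_name) pairs.
    Returns a Counter keyed by alphabetically sorted (name_a, name_b) tuples.
    """
    mentions = sorted(mentions)
    links = Counter()
    for i, (pos_a, name_a) in enumerate(mentions):
        for pos_b, name_b in mentions[i + 1:]:
            if pos_b - pos_a > max_distance:
                break  # mentions are sorted, so later ones are even further away
            if name_a != name_b:
                links[tuple(sorted((name_a, name_b)))] += 1
    return links
```

Entities that never fall within the window of another entity simply get no link.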

Please note: this chart is still in development and is only available for the demo text - it will not be present in your custom text analysis.

Colors

Person

Location

Organization

Date

Time

Money
