Text analysis

Word Clouds

To examine the textual differences between the two publications, we chose to look at word clouds and sentiment. The word clouds are generated by removing stop words and punctuation and calculating the tf-idf score for each word. For the sentiment analysis, we used a BERT transformer model from the Hugging Face library. See the explainer notebook for more information.
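As a rough illustration of the word-cloud preprocessing described above, here is a minimal sketch using only the Python standard library. The stop-word list, the `tokenize` and `tf_idf` helpers, the toy section texts, and the exact tf-idf variant (term frequency times log(N / document frequency)) are all assumptions made for illustration; the explainer notebook may do this differently.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the real list is not given here).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}

def tokenize(text):
    """Lowercase the text, keep alphabetic words only, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def tf_idf(docs):
    """Return one {word: tf-idf score} dict per document.

    docs maps a document name (here: a section) to its raw text.
    """
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: in how many documents does each word occur?
    df = Counter(word for words in tokens.values() for word in set(words))
    scores = {}
    for name, words in tokens.items():
        counts = Counter(words)
        total = len(words)
        scores[name] = {
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        }
    return scores

# Hypothetical toy texts standing in for whole sections.
sections = {
    "politics": "The election campaign and the senate vote on the election.",
    "sports": "The team won the match in the final minute of the match.",
    "business": "Markets rose and the bank reported the profit for the quarter.",
}
scores = tf_idf(sections)
top_politics = max(scores["politics"], key=scores["politics"].get)
```

The highest-scoring words per section would then be drawn largest in the cloud. Words that occur in every section get a score of zero under this variant, which is why tf-idf highlights what is distinctive about a section rather than what is merely frequent.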

We are interested in examining how the two publications portray the same subject. Here, we will use the sections to look for general tendencies in their reporting. First, we need to determine which sections cover the same subjects, which we do by looking at word clouds generated with tf-idf for some of the sections.

[Figure: section word clouds for Reuters and the NYT]

Clearly, both Reuters' politics section and the NYT’s us section emphasize the 2016 United States presidential election. This makes a good basis for comparison, as US politics is a highly polarizing topic.

Let’s plot these two word clouds at a higher resolution.

Comparing the word clouds for the Reuters politics section and the NYT us section, we see that they are almost identical, and it is difficult to find any differences at all. We therefore move on to the sentiment analysis.

Sentiment Analysis

We ran sentiment analysis on all the articles from each publication to look for general trends. The table below shows how many articles were classified with a given sentiment for each publication.

Sentiment | Reuters | NYT
Negative | 47,651 (83%) | 17,376 (73%)
Neutral | 5,812 (10%) | 3,384 (14%)
Positive | 3,666 (7%) | 3,036 (13%)

Clearly, both publications share the same trend: the vast majority of articles have a negative sentiment. However, Reuters is the more negative of the two, with 83% of its articles classified as negative and only 7% as positive. In percentage terms, the NYT has almost twice as many positive articles as Reuters, but its share is still low at 13%. While we did not expect these results, perhaps we should have: it is a well-known phenomenon that readers are drawn to negative news, and ultimately publications print what sells.
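Counts and percentages like those in the table above can be tallied from per-article labels along these lines. This is a minimal sketch: the `sentiment_shares` helper and the example labels are hypothetical and not taken from the dataset.

```python
from collections import Counter

def sentiment_shares(labels):
    """Return {label: (count, rounded percent)} for a list of sentiment labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total)) for label, n in counts.items()}

# Hypothetical labels for ten articles, one label per article.
labels = ["negative"] * 8 + ["neutral"] + ["positive"]
shares = sentiment_shares(labels)
```

Because each percentage is rounded independently, the rounded shares of a column need not sum to exactly 100.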

We also wanted to know whether the negative sentiment is caused by a few prolific authors or whether most authors write with this sentiment. We therefore found all the articles that each author wrote in the dataset and determined each author's most frequent sentiment. In this way, we can classify each author as mainly writing with a negative, neutral, or positive sentiment. The bar plots below show these findings.

From the plots it is evident that the vast majority of authors write in a negative sentiment but that it is even more common at the NYT than at Reuters.
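The per-author classification described above can be sketched as follows. The `dominant_sentiment_per_author` helper, the (author, sentiment) input format, and the tie-breaking rule are assumptions for illustration; how co-authored articles and ties were actually handled is not stated here.

```python
from collections import Counter, defaultdict

def dominant_sentiment_per_author(articles):
    """Map each author to the sentiment label their articles receive most often.

    articles is an iterable of (author, sentiment) pairs, one per article.
    Ties go to the label seen first for that author (an assumption).
    """
    per_author = defaultdict(Counter)
    for author, sentiment in articles:
        per_author[author][sentiment] += 1
    return {a: c.most_common(1)[0][0] for a, c in per_author.items()}

# Hypothetical articles, not taken from the dataset.
articles = [
    ("A", "negative"), ("A", "negative"), ("A", "positive"),
    ("B", "neutral"),
    ("C", "negative"),
]
dominant = dominant_sentiment_per_author(articles)
# Number of authors per dominant sentiment, i.e. the bar heights in such a plot.
author_counts = Counter(dominant.values())
```

Counting authors rather than articles gives every author equal weight, so a handful of very productive writers cannot dominate the result.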

Let us now return to comparing how the two publications portray US politics. We will compare the sentiments for Reuters' politics section and the NYT’s us section.

Sentiment | Reuters politics section | NYT us section
Negative | 2,472 (81%) | 5,938 (76%)
Neutral | 353 (12%) | 1,118 (14%)
Positive | 212 (7%) | 771 (10%)

These statistics are very close to those for all the articles of each publication. One possible explanation is that each publication has its own style or tone that affects the sentiment, and that it keeps this style across sections without letting opinions seep into its political writing. If opinions did seep through, we would expect the share of neutral articles in the politics sections to be lower than for the publication as a whole.