'Beat the Street' - Earnings Forecasting

By: Ena Dzemila

An earnings announcement is a public statement issued by a corporation to show its profitability during a specific time period. Typically, an earnings report describes profit for a quarter or a year. Because the report influences the behavior of stockholders, it is released only when the stock market is closed. A common phrase in financial news is “beat the Street”, which refers to a company’s results “beating” Wall Street’s earnings forecasts.

The ability to predict the overall sentiment of these reports beforehand would give anyone investing in the stock market a great advantage. The Netflix case from April 2022 shows the impact a single report can have. In its report, Netflix announced a loss of 200,000 subscribers in the first quarter, its first drop in paying customers in more than a decade, and warned of worse difficulties ahead. After the announcement, the company’s shares fell more than 25%, and the impact extended to other streaming stocks, including Roku, Spotify, and Disney, which also fell in after-hours trading following Netflix’s dismal news. [1]

Despite this interest, estimating revenues and returns presents considerable hurdles to the forecaster. Prior research has identified several sources of difficulty in projecting financial data, including the unpredictability of financial data, a poor signal-to-noise ratio in the accessible variables, and model uncertainty.

About the data

Given how significant earnings prediction is, we decided to take on the problem and develop our own approach to forecasting the overall trend for a given company, i.e., whether its earnings will go up or down. We first researched state-of-the-art solutions to the problem and identified several of the most popular approaches. One uses Bloomberg’s estimated earnings per share (EPS) to predict the earnings surprise, under the assumption that the EPS surprise can be perfectly predicted before the market closes on the earnings announcement date. [3]

Another popular approach predicts whether earnings will go up or down based on news sentiment about the company over an unspecified time period. [4][5]

Most of these approaches relied on a far wider range of data that is not easily obtainable. This includes, but is not limited to, liabilities, balance sheets, income statements, cash flow statements, hiring rates, annual profit margins, and other similar company-specific financial information. [6][7]

For our research, we collected data from 2017 to 2022, and the information we had access to includes the following:

The date and time of the earnings announcement, for tickers whose currency is USD
Their prior EPS and revenue
The ticker’s open-to-open returns on the days leading up to the announcement
The industry sector the ticker belongs to
The sentiment of the news that the ticker appears in, which we derived from the news headlines

Since the reports arrive outside the stock market’s trading hours, the day of trade has to be chosen based on the announcement time of the report. The trade must be placed before the report is released, so that the forecast is acted on before the actual figures are available. If the earnings are announced before the market opens (i.e., before 9:30 a.m.), the trade has to be made on the trading day preceding the announcement date. Otherwise, the day of trade is the announcement date itself, since the report will arrive after the market closes on that same day.
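The trade-date rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming announcement timestamps are already in US Eastern time and approximating the “previous trading day” as the previous weekday (exchange holidays are ignored); `trade_date` and `previous_trading_day` are hypothetical helper names, not part of the original pipeline.

```python
from datetime import date, datetime, time, timedelta

MARKET_OPEN = time(9, 30)  # regular US session opens at 09:30 Eastern

def previous_trading_day(d: date) -> date:
    """Step back to the previous weekday. Simplification: holidays are ignored."""
    d -= timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d

def trade_date(announced_at: datetime) -> date:
    """Return the day on which to trade so the position is open before the report."""
    if announced_at.time() < MARKET_OPEN:
        # Pre-market report: the last session before it is the previous trading day.
        return previous_trading_day(announced_at.date())
    # Otherwise the report is assumed to arrive after the close: trade the same day.
    return announced_at.date()

print(trade_date(datetime(2022, 4, 19, 16, 5)))  # after-close report -> 2022-04-19
print(trade_date(datetime(2022, 4, 25, 7, 0)))   # Monday pre-market -> 2022-04-22
```

A production version would swap the weekday approximation for a real exchange calendar.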

To work with returns, we look at the last three days relevant to the report, i.e., t, t-1, and t-2, where t is the announcement date. Training the model requires an output return (the target) and an input return (a feature). To make sure no future information leaks into the inputs, we use return t-1 as the output when the day of trade precedes the announcement date, and return t when the two dates are the same. For the input return we use return t-1 when the dates are the same, and return t-2 when the day of trade precedes the announcement date.
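A small sketch of this selection, assuming the three open-to-open returns for one announcement are stored in a dict keyed `"t"`, `"t-1"`, and `"t-2"` (a hypothetical layout chosen for illustration). In both cases the output is the return of the trade day itself and the input is the return of the day before it.

```python
from datetime import date

def select_returns(returns: dict[str, float], trade_day: date, announce_day: date) -> tuple[float, float]:
    """Return (input_return, output_return) without using information after the trade day.

    `returns` maps "t", "t-1", "t-2" to open-to-open returns, where t is the
    announcement date (hypothetical layout for this sketch).
    """
    if trade_day < announce_day:
        # Pre-market report: trade on t-1, predict return t-1 from return t-2.
        return returns["t-2"], returns["t-1"]
    # Report after the close on t: trade on t, predict return t from return t-1.
    return returns["t-1"], returns["t"]

r = {"t": 0.03, "t-1": -0.01, "t-2": 0.02}
print(select_returns(r, date(2022, 4, 18), date(2022, 4, 19)))  # (0.02, -0.01)
```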

Both the output and input returns were normalized to their sign, i.e., mapped to 1, 0, or -1.
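Mapping returns to their sign is a one-liner with NumPy; `sign_label` is a hypothetical helper name. Here only an exact zero maps to 0, since no tolerance band for returns is mentioned.

```python
import numpy as np

def sign_label(returns):
    """Map each return to its sign: 1 (up), 0 (unchanged), -1 (down)."""
    return np.sign(np.asarray(returns, dtype=float)).astype(int)

print(sign_label([0.021, 0.0, -0.004]))  # [ 1  0 -1]
```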

For the news sentiment, we use the average sentiment of all news a ticker appears in within the two days prior to the day of trade. This average sentiment was also normalized to -1, 0, or 1 to represent its sign, where 0 (neutral sentiment) was assigned if the average was between -0.01 and 0.01.
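A sketch of the sentiment feature, assuming each headline already carries a numeric sentiment score and that “the two days prior” means the two calendar days before the day of trade, excluding the trade day itself; both the window boundaries and treating ±0.01 as inclusive are assumptions, and `sentiment_sign` is a hypothetical name.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Optional

NEUTRAL_BAND = 0.01  # assumed inclusive: averages within [-0.01, 0.01] count as neutral

def sentiment_sign(headlines: list[tuple[date, float]], trade_day: date) -> Optional[int]:
    """Average headline sentiment from the two days before trade_day, mapped to -1/0/1.

    headlines: (publication date, sentiment score) pairs for a single ticker.
    Returns None when no headline falls inside the window.
    """
    window_start = trade_day - timedelta(days=2)
    scores = [score for day, score in headlines if window_start <= day < trade_day]
    if not scores:
        return None  # missing value; such rows can be dropped before training
    avg = mean(scores)
    if abs(avg) <= NEUTRAL_BAND:
        return 0
    return 1 if avg > 0 else -1
```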

Lastly, because the industry sector might be informative, we assigned each ticker one Boolean value per industry, set to true only for the industry the ticker belongs to.
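This is one-hot encoding, which pandas provides through `pd.get_dummies`. The tickers and sector labels below are illustrative sample data, not the article’s dataset.

```python
import pandas as pd

# Hypothetical sample rows; tickers and sector labels are illustrative only.
df = pd.DataFrame({
    "ticker": ["AAPL", "NFLX", "JPM"],
    "sector": ["Technology", "Communication Services", "Financials"],
})

# One Boolean column per sector, True only for the sector the ticker belongs to.
sector_flags = pd.get_dummies(df["sector"], prefix="sector", dtype=bool)
df = pd.concat([df.drop(columns="sector"), sector_flags], axis=1)
print(df.columns.tolist())
# ['ticker', 'sector_Communication Services', 'sector_Financials', 'sector_Technology']
```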

Data Analysis & Feature Selection

To assess the correlation between attributes, we created a heatmap. As Fig 1 shows, the EPS and revenue estimates are correlated, which was to be expected. Weighted sentiment showed a very slight correlation with the EPS estimate, while the industry sector attributes did not show any significant correlation.
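The values behind such a heatmap are a pairwise correlation matrix, which `DataFrame.corr()` computes (Pearson by default). The sketch below uses synthetic data with hypothetical column names, built so that the EPS and revenue estimates correlate; it is not the article’s data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500
revenue = rng.normal(size=n)
features = pd.DataFrame({
    "revenue_estimate": revenue,
    "eps_estimate": 0.8 * revenue + 0.2 * rng.normal(size=n),  # correlated by construction
    "avg_sentiment": rng.normal(size=n),                       # independent noise
    "input_return_sign": rng.choice([-1, 0, 1], size=n),
})

corr = features.corr()  # Pearson correlation: the values a heatmap colours
print(corr.round(2))
# To draw it (requires seaborn): seaborn.heatmap(corr, annot=True, cmap="coolwarm")
```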

Since the heatmap was not conclusive, we decided to train and test the predictive model both on data containing all of these attributes and, separately, on the same data without the industry sectors, and to compare the results.

Fig 1. Attribute Heatmap Depicting Attribute Correlation Used for Feature Selection

Methodology & Algorithm

The model selected for the classification was the Random Forest Classifier. When splitting a node, instead of searching all features for the most important one, it looks for the best feature within a random subset of features. Averaging many such decorrelated trees lowers the overfitting typical of single decision trees and reduces variance, thus improving the overall accuracy.
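In scikit-learn, the random feature subset per split is controlled by `max_features`. The sketch below trains a forest on synthetic data from `make_classification`; the hyperparameter values are illustrative assumptions, since the article does not state which ones were used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real feature matrix and sign labels.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

clf = RandomForestClassifier(
    n_estimators=200,     # number of trees whose votes are combined
    max_features="sqrt",  # each split considers a random subset of sqrt(n_features) features
    bootstrap=True,       # each tree is trained on a bootstrap sample of the rows
    random_state=0,
)
clf.fit(X, y)
print(clf.predict(X[:5]))
```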

The training and test data were cleaned by dropping any record with missing values in the fields used as input. Training used data from 2017-01-01 to 2021-09-17, and testing used data from that date up until November 2022. As mentioned above, we tested two versions of the input: the first with the input return sign, the revenue and EPS estimates, the average sentiment, and all of the industry sectors as features, with the output return sign as the target; the second was identical but without the industry sectors.
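A sketch of the cleaning and chronological split, with both input versions. The column names are hypothetical (the article does not publish its schema), and treating 2021-09-17 as the first test day is an assumption, since the boundary date is mentioned for both periods.

```python
import pandas as pd

# Hypothetical column names; the article does not publish its schema.
BASE_FEATURES = ["input_return_sign", "eps_estimate", "revenue_estimate", "avg_sentiment"]
TARGET = "output_return_sign"
SPLIT_DATE = pd.Timestamp("2021-09-17")  # assumed: train strictly before, test from this date

def make_split(df: pd.DataFrame, features: list[str]):
    """Drop rows missing any used field, then split by trade date (no shuffling)."""
    clean = df.dropna(subset=features + [TARGET])
    train = clean[clean["trade_date"] < SPLIT_DATE]
    test = clean[clean["trade_date"] >= SPLIT_DATE]
    return train[features], train[TARGET], test[features], test[TARGET]

# Tiny illustrative frame; the second row has a missing EPS estimate and is dropped.
demo = pd.DataFrame({
    "trade_date": pd.to_datetime(["2020-05-01", "2021-09-16", "2021-09-17", "2022-02-01"]),
    "input_return_sign": [1, -1, 0, 1],
    "eps_estimate": [1.2, None, 0.8, 2.1],
    "revenue_estimate": [10.0, 12.0, 9.5, 20.0],
    "avg_sentiment": [1, 0, -1, 1],
    "output_return_sign": [-1, 1, 1, -1],
    "sector_Technology": [True, False, True, False],
})
sector_cols = [c for c in demo.columns if c.startswith("sector_")]

v1 = make_split(demo, BASE_FEATURES + sector_cols)  # version 1: with industry sectors
v2 = make_split(demo, BASE_FEATURES)                # version 2: without industry sectors
```

Splitting by date rather than at random keeps later observations out of the training set, which matches the train-then-test periods described above.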

Results

Unsurprisingly, given the correlation results in the heatmap, there was little to no difference between the results with the v1 and v2 inputs. However, the second version, which excludes the industry sector Booleans, yielded better results by approximately 0.1 to 0.2 percentage points.

We additionally compared three versions of the input:

Unnormalized input
Input normalized with the MinMax transformation, which scales each feature to the default range of 0 to 1
Input normalized with the StandardScaler transformation, which standardizes each feature by subtracting its mean and then scaling it to unit variance
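The three variants can be compared by wrapping each scaler and the forest in a scikit-learn pipeline, so that the scaler is fit on the training rows only. This is a sketch on synthetic data with assumed hyperparameters, not the article’s experiment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in data; the first 450 rows play the role of the earlier
# (training) period and the last 150 rows the later (test) period.
X, y = make_classification(n_samples=600, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = X[:450], X[450:], y[:450], y[450:]

variants = {
    "unnormalized": None,
    "minmax": MinMaxScaler(),      # rescales each feature to [0, 1] by default
    "standard": StandardScaler(),  # subtracts the mean, divides by the standard deviation
}

scores = {}
for name, scaler in variants.items():
    forest = RandomForestClassifier(n_estimators=200, random_state=0)
    # The scaler, if any, is fit on the training rows only, so no test statistics leak in.
    model = forest if scaler is None else make_pipeline(scaler, forest)
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

print(scores)
```

Because tree splits depend only on the ordering of each feature’s values, per-feature rescaling is generally expected to change a random forest’s predictions very little.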

The differences between the three were small; however, MinMax normalization improved the accuracy slightly (by about 0.3 percentage points), bringing it to 50.5%. See Fig 2. With data such as this, even a slight change in accuracy can make a great difference, so a model with an accuracy slightly above 50% may still be considered a good one.

Fig 2. Confusion Matrix and Classification Report for the Predictions of the Random Forest Classifier for All Three Versions of Input
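A confusion matrix and classification report like those in Fig 2 can be produced with `sklearn.metrics`. The labels and predictions below are made-up values over the three sign classes, used only to show the calls.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true and predicted signs (-1 down, 0 flat, 1 up).
y_true = [1, -1, 1, 0, -1, 1, -1, 1]
y_pred = [1, 1, 1, -1, -1, -1, -1, 1]

labels = [-1, 0, 1]
cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows: true class, columns: predicted
print(cm)
print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
print(f"accuracy = {accuracy_score(y_true, y_pred):.3f}")  # accuracy = 0.625
```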

Given the volatility of the stock market, the days leading up to an earnings report are critical for investors and analysts, and even a slight insight can mean a large gain or a large loss.

We conclude that our earnings prediction has shown promising results and has the potential to improve the accuracy and efficiency of financial forecasting. With more data becoming available every day, predictive modeling in finance is likely to become more prevalent. However, the unpredictability of a market such as this entails unavoidable risks, so such predictions should be combined with conventional techniques and professional research to produce the most thorough and precise forecasts.

References

[1] S. Whitten, “Netflix shares crater 25% after company reports it lost subscribers for the first time in more than 10 years,” CNBC, 20-Apr-2022. [Online]. Available: https://www.cnbc.com/2022/04/19/netflix-nflx-earnings-q1-2022.html. [Accessed: 12-Dec-2022].

[2] J. Green and W. Zhao, “Forecasting earnings and returns: A review of recent advancements,” The Journal of Finance and Data Science, vol. 8, pp. 120–137, 2022.

[3] Q. Liu, L. Ouyang, and G. Xu, “Prediction of Earning Surprise Using Deep Learning Technique,” Jun. 2022. [Online]. Available: https://assets.bbhub.io/professional/sites/10/earning_surprise_prediction_china.pdf. [Accessed: 12-Dec-2022].

[4] M. Sorto, C. Aasheim, and H. Wimmer, “Feeling The Stock Market: A Study in the Prediction of Financial Markets Based on News Sentiment,” in SAIS 2017 Proceedings, 2017, pp. 30. [Online]. Available: http://aisel.aisnet.org/sais2017/30. [Accessed: 12-Dec-2022].

[5] S. Whitten, “Netflix shares crater 25% after company reports it lost subscribers for the first time in more than 10 years,” CNBC, 20-Apr-2022. [Online]. Available: https://www.cnbc.com/2022/04/19/netflix-nflx-earnings-q1-2022.html. [Accessed: 12-Dec-2022].

[6] Q. He, Z. Cai, and W. Liu, “Prediction of listed companies’ revenue based on model-fused,” 2019 International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI), 2019.

[7] J. Green and W. Zhao, “Forecasting earnings and returns: A review of recent advancements,” The Journal of Finance and Data Science, vol. 8, pp. 120–137, 2022.


Entropy387
