Stock Movement Prediction from Tweets and Historical Prices (Paper Summary)
24 May 2018

This paper (Xu and Cohen, 2018) proposes a way of using historical prices and text data together for financial time series prediction. The authors call their model StockNet. There are two major contributions: (a) encoding market data and text data jointly, and (b) a VAE (Variational Autoencoder) inspired generative model.
TLDR
An RNN-based variational autoencoder, combined with temporal attention, is used to predict whether a stock's price will go up or down.
Dataset
- 88 stocks
- From 2014-01-01 to 2016-01-01. Training: 2014-01-01 to 2015-08-01 (20,339 samples). Validation: 2015-08-01 to 2015-10-01 (2,555 samples). Test: 2015-10-01 to 2016-01-01 (3,720 samples).
- Samples with price_change <= -0.5% are labeled 0 (down); samples with price_change > 0.55% are labeled 1 (up). Samples falling between the two thresholds are discarded (see the sketch below).
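A minimal sketch of this labeling rule (the function name and the use of fractional changes are illustrative, not from the paper's released code):

```python
def movement_label(price_change: float):
    """Label a sample by its price-movement fraction.
    Thresholds follow the paper: <= -0.5% -> 0 (down), > 0.55% -> 1 (up)."""
    if price_change <= -0.005:
        return 0
    if price_change > 0.0055:
        return 1
    return None  # between the two thresholds: the sample is discarded
```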
Model
There are 3 main components here:
- Market Information Encoder (MIE) - Encodes tweets and prices to X.
- Variational Movement Decoder (VMD) - Infers the latent variable \(Z\) from \(X\) and \(y\), and decodes stock movements \(y\) from \(X\) and \(Z\).
- Attentive Temporal Auxiliary (ATA) - Integrates temporal loss through an attention mechanism for model training.
Market Information Encoder (MIE)
This component is relatively straightforward. Tweets for the given day are combined into the vector \(c_t\), and historical prices are normalized and stored in the vector \(p_t\). The output of the MIE is the concatenation \(x_t = [c_t, p_t]\).
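A rough sketch of this step, assuming the per-day tweet embedding \(c_t\) is already computed (the exact normalization here is an assumption; the paper normalizes by the last adjusted closing price):

```python
import numpy as np

def encode_market_info(tweet_embedding: np.ndarray,
                       prices: np.ndarray,
                       last_adj_close: float) -> np.ndarray:
    """Sketch of the MIE output: x_t = [c_t, p_t]."""
    # Normalize the raw price features by the last adjusted closing price.
    p_t = prices / last_adj_close - 1.0
    return np.concatenate([tweet_embedding, p_t])
```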
Variational Movement Decoder (VMD)
VMD uses the market information \(X\) received from the previous component to infer a latent variable \(Z\). \(X\) and \(Z\) are then decoded into the movement prediction \(y_t\) using an RNN decoder with GRU cells.
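A toy sketch of the variational step, assuming an encoder has already produced a mean and log-variance for \(z_t\) (the linear read-out standing in for the paper's GRU decoder is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu: np.ndarray, log_var: np.ndarray) -> np.ndarray:
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def decode_movement(x_t: np.ndarray, z_t: np.ndarray,
                    W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stand-in for the GRU decoder: predict P(up) from [x_t, z_t]."""
    h = np.tanh(np.concatenate([x_t, z_t]) @ W + b)
    return 1.0 / (1.0 + np.exp(-h))  # sigmoid over the read-out
```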
Attentive Temporal Auxiliary (ATA)
Attention is applied to the outputs from the previous component. Both VAE and Attention components are combined to construct the final loss function \(F\).
\[F(\theta, \phi; X, y) = \frac{1}{N}\sum_{n=1}^{N} v^{(n)} \cdot f^{(n)}\]

Here, \(v^{(n)}\) is the vector of temporal attention weights for sample \(n\) and \(f^{(n)}\) is the vector of per-time-step objectives from the variational autoencoder component.
\[f = \log p_{\theta} - \lambda\, D_{KL}\left[q_{\phi} \,\Vert\, p_{\theta}\right]\]

\(\log p_{\theta}\) is the log-likelihood term, \(D_{KL}[q_{\phi} \Vert p_{\theta}]\) is the KL divergence, and \(\lambda\) is the KL loss weight. \(\lambda\) is increased over the course of training, a technique known as the KL annealing trick (Bowman et al., 2016).
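A minimal sketch of this objective with linear KL annealing (the linear schedule is an assumption; Bowman et al. used a sigmoid schedule):

```python
import numpy as np

def kl_weight(step: int, total_steps: int, max_lambda: float = 1.0) -> float:
    """Anneal the KL weight lambda from 0 up to max_lambda during training."""
    return max_lambda * min(1.0, step / total_steps)

def weighted_objective(log_likelihood: np.ndarray,  # log p_theta per time step
                       kl: np.ndarray,              # D_KL[q_phi || p_theta] per step
                       v: np.ndarray,               # temporal attention weights
                       lam: float) -> float:
    """One sample's contribution to F: the attention-weighted sum of
    per-step variational objectives f_t = log p_t - lambda * KL_t."""
    f = log_likelihood - lam * kl
    return float(v @ f)
```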
Training and Hyperparameters
- A 5-day lag window is used to construct the dataset (see the sketch after this list).
- Batch size is 32. Each batch contains randomly picked data points.
- Initial learning rate of Adam - 0.001
- Dropout rate - 0.3 for the hidden layer
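A sketch of how the 5-day lag windows could be built (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def lag_windows(features: np.ndarray, labels: np.ndarray, lag: int = 5):
    """Build (window, label) pairs; `features` is [num_days, feature_dim]."""
    xs, ys = [], []
    for t in range(lag, len(features)):
        xs.append(features[t - lag:t])  # the `lag` days preceding day t
        ys.append(labels[t])            # movement label for day t
    return np.stack(xs), np.array(ys)
```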
Metrics and Results
MCC (Matthews Correlation Coefficient) is used as a metric because it is less sensitive to class imbalance than accuracy. MCC is defined below in terms of tp (true positives), tn (true negatives), fp (false positives) and fn (false negatives).
\[MCC = \frac{tp \times tn - fp \times fn}{\sqrt{(tp + fp)(tp + fn)(tn + fp)(tn + fn)}}\]
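The metric is straightforward to compute from a binary confusion matrix; a minimal sketch:

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient; returns 0 when the denominator is 0."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```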
Baselines:
- RAND: Random up-or-down guess.
- ARIMA: Autoregressive Integrated Moving Average model, using price signals only.
- RANDFOREST: Random Forest classifier using Word2vec text representation.
- TSLDA: Generative topic model jointly learning topics and sentiment from Nguyen and Shirai, 2015.
- HAN: Discriminative deep neural network with hierarchical attention from Hu et al., 2018.
TECHNICALANALYST, FUNDAMENTALANALYST, INDEPENDENTANALYST, DISCRIMINATIVEANALYST and HEDGEFUNDANALYST are simply different variants of the StockNet model, with HEDGEFUNDANALYST being the full model described above.