StockPot

Stockpot is an automatic stock trader. More than a way to make money, it is a playground for data science and software development.

Introduction & Overview

Because I am collaborating with others on this project, the code remains private.

It's every brogrammer's dream: gaming the stock market. I decided to take on this project because it's a great source of fun and a great opportunity to learn about data science. More than anything, it's a playground to experiment with Python and libraries like NumPy, scikit-learn, pandas, and more.

Raspberry Pi Zero W

Stockpot is an automatic stock trader. The project is built on a Raspberry Pi Zero W that runs Linux. Every morning a cron job (a scheduled, time-based task) runs the program, which pulls the most recent price data and stock news. Based on this, Stockpot's algorithm picks and executes trades for that day. It then sends the user a text message with the portfolio summary, recent transactions, and profits/losses.

Simulating the Market

How do we know the bot is making the correct moves? How can we be sure that the bot works well across different time periods? The first step to answer these questions is a simulation. Stockpot begins with exactly that - a suite of classes that work together to simulate the market.

New York Stock Exchange

DataScraper.py - A script used to scrape data from websites. This gives Stockpot all the historical data it needs. It is also used in the "live" version, which runs on a daily basis.

Company.py - A class used to represent a company. Attributes include the ticker, price data, and more. Getters and setters are used to access data easily.

MarketMaker.py - Handles the operations that a market maker would perform in real life. It makes sure that transactions are valid, calculates price jumps, and more. It uses Company objects to abstract away these operations.

Portfolio.py - This class is a mock portfolio. It pings the Market Maker when trades are made.

Simulator.py - This is the main class. The simulator takes a location for the historical data, as well as a "Decision Maker." The Decision Maker is the user's algorithm which decides which stocks to trade that day.

The code makes extensive use of pandas and NumPy to speed up calculations. The simulator steps through every trading day in order, as it would happen in real life. Some work remains here, especially around fees and rounding errors: it's not clear how amounts may change (if only by a few cents) in the real world, and because those differences compound over many trades, they can significantly affect the outcome of the simulation.
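The way the simulator, a Decision Maker, and the validity checks fit together can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's private code: the names `simulate` and `buy_once`, the one-dict-per-day price layout, the made-up ticker "AAA", and the absence of fees are all hypothetical.

```python
from typing import Callable, Dict, List

Prices = Dict[str, float]   # ticker -> price for one day (assumed layout)
Orders = Dict[str, int]     # ticker -> shares to buy (+) or sell (-)
DecisionMaker = Callable[[List[Prices], Dict[str, int], float], Orders]


def simulate(history: List[Prices], decide: DecisionMaker, cash: float) -> float:
    """Replay `history` day by day and return the final portfolio value."""
    holdings: Dict[str, int] = {}
    for day in range(len(history)):
        today = history[day]
        # The decision maker only sees data up to and including today.
        orders = decide(history[: day + 1], dict(holdings), cash)
        for ticker, shares in orders.items():
            cost = shares * today[ticker]
            owned = holdings.get(ticker, 0)
            # Reject invalid trades: no overdrafts and no short selling.
            if cost > cash or owned + shares < 0:
                continue
            cash -= cost
            holdings[ticker] = owned + shares
    last = history[-1]
    return cash + sum(n * last[t] for t, n in holdings.items())


def buy_once(seen: List[Prices], holdings: Dict[str, int], cash: float) -> Orders:
    """Toy strategy: buy as many AAA shares as possible on the first day."""
    if len(seen) == 1:
        return {"AAA": int(cash // seen[-1]["AAA"])}
    return {}


# Made-up three-day price history for a single hypothetical ticker.
history = [{"AAA": 10.0}, {"AAA": 11.0}, {"AAA": 12.0}]
final_value = simulate(history, buy_once, cash=100.0)
print(final_value)
```

Passing the strategy in as a plain function keeps the replay loop independent of any particular algorithm, so different Decision Makers can be compared on the same historical data.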

Data Sandbox

Once the simulation was built, I needed a way to experiment and "filter" for the right trades. I make extensive use of Jupyter Notebooks as a playground for experimentation; they are a powerful tool for making quick observations from data. We can use pandas to filter the data and plot just about anything. Here, for example, is the distribution of overnight price jumps. This distribution includes AAPL, NVDA, AMZN, TSLA, FB, and MSFT.

'0c_1o' denotes the price change from the close on day 0 to the open on day +1, that is, the after-hours price jump for each day. In this case, I have plotted every day since 2015. Also of note is the geometric mean.
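As a rough illustration of the '0c_1o' quantity, the sketch below computes the jump from each day's close to the next day's open. It assumes the jump is expressed as a ratio (next open divided by today's close); the function name `overnight_ratios` and the prices are made up for the example.

```python
from typing import List


def overnight_ratios(opens: List[float], closes: List[float]) -> List[float]:
    """Return open[t+1] / close[t] for each day t except the last."""
    if len(opens) != len(closes):
        raise ValueError("opens and closes must cover the same days")
    return [opens[t + 1] / closes[t] for t in range(len(closes) - 1)]


# Made-up prices for three consecutive trading days.
opens = [100.0, 102.0, 99.0]
closes = [101.0, 100.0, 98.0]

ratios = overnight_ratios(opens, closes)
# A ratio above 1 is an overnight gain, below 1 an overnight loss.
print(ratios)
```

With a pandas DataFrame holding hypothetical `open` and `close` columns, the same idea could be written as `df["open"].shift(-1) / df["close"]`.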

Geometric Mean - "mean or average which indicates a central tendency of a finite set of real numbers by using the product of their values."

The geometric mean, not the arithmetic mean, is the important quantity here. Buying and selling stocks corresponds to multiplying an initial portfolio by a product of ratios, where each ratio is the "gain" or "loss" from performing a certain trade. Here we see that the geometric mean of all "after hours" gains is positive. The events also appear to follow a roughly normal distribution, centered tightly around the mean (low standard deviation).
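A small worked example shows why the two means differ in practice. The ratios and starting balance below are made up: a 10% gain followed by a 10% loss averages out to "no change" arithmetically, yet the portfolio ends up smaller.

```python
import math
from statistics import geometric_mean, mean

ratios = [1.10, 0.90]   # hypothetical trades: +10%, then -10%
start = 1000.0          # hypothetical starting balance

# The portfolio is multiplied by each ratio in turn.
final = start * math.prod(ratios)

arith = mean(ratios)            # suggests the trades broke even
geo = geometric_mean(ratios)    # below 1, so the portfolio shrank

# Compounding the geometric mean once per trade reproduces the final value.
rebuilt = start * geo ** len(ratios)

print(final, arith, geo, rebuilt)
```

The geometric mean is the single per-trade ratio that, applied repeatedly, gives the same end result as the actual sequence of trades, which is why it is the better summary for compounding returns.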

One strategy I tried for filtering is k-means clustering, which groups similar stocks together.

K-Means Classifier - "an unsupervised classification algorithm, also called clusterization, that groups objects into k groups based on their characteristics."

If we take historical price data, we can form a vector where each entry corresponds to one day's price jump. We can use a 100-dimensional vector, for example, to represent the price jumps of a stock over the past 100 days. K-means groups vectors together by Euclidean distance; in the case of the stock history, this is the Euclidean distance in a 100-dimensional space.
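To make the idea concrete, here is a small from-scratch sketch of k-means (Lloyd's algorithm) on short price-jump vectors. It is an illustration only, not the code used in the project: the tickers and jump values are invented, the vectors have 4 entries rather than 100, and the centroids are seeded with the first k points instead of a random draw so the result is repeatable. In practice a library implementation such as scikit-learn's `KMeans` would likely be preferable.

```python
import math
from typing import Dict, List

Vector = List[float]


def k_means(points: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Return a cluster label for each point using Lloyd's algorithm."""
    # Simplification: seed centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Invented jump ratios: two calm stocks and two volatile ones.
jumps: Dict[str, Vector] = {
    "AAA": [1.01, 0.99, 1.02, 1.00],
    "CCC": [0.90, 1.10, 0.85, 1.15],
    "BBB": [1.02, 0.98, 1.01, 1.01],
    "DDD": [0.88, 1.12, 0.87, 1.13],
}

labels = k_means(list(jumps.values()), k=2)
clusters = dict(zip(jumps, labels))
print(clusters)
```

Stocks whose jump histories move together end up with the same label, which is the grouping behavior described above.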

K-means classifier

The theory is that grouping stocks with similar price jumps may give a sense of which "categories" of stocks are currently performing well. Below, I use k-means to group stocks into 11 categories. The first three categories are:

PCH.csv, WY.csv

DXC.csv, FOSL.csv, CHRS.csv, CLF.csv
BBBY.csv, SSP.csv, ASND.csv, ENDP.csv
RHI.csv, TA.csv

SLM.csv, ARC.csv, NC.csv, AN.csv 
C.csv, CFG.csv, BAC.csv, LNC.csv 
NAVI.csv, F.csv, MET.csv, SNV.csv 
BBI.csv, BLK.csv, GME.csv, CMA.csv 
APTV.csv, GM.csv, GL.csv, UPS.csv 
RYI.csv, FHN.csv, NDAQ.csv, PNC.csv 
KMX.csv, HIG.csv, JEF.csv, SCHW.csv 
DFS.csv, FRC.csv, AMP.csv, BEN.csv 
BKR.csv, GNW.csv, RF.csv, MBI.csv 
USB.csv, WFC.csv, GS.csv, GE.csv 
MTG.csv, ALL.csv, BHF.csv, FITB.csv 
NTRS.csv, JPM.csv, COF.csv, MTB.csv 
IVZ.csv, SIVB.csv, STT.csv, MS.csv 
BK.csv, SYF.csv, TFC.csv, VRTS.csv 
PFG.csv, ZION.csv, RJF.csv, HBAN.csv 
PRU.csv, PBCT.csv, AIG.csv, KEY.csv 
UNM.csv, TRV.csv, L.csv, BWA.csv

As it turns out, the two stocks in the first category are directly linked; both belong to the lumber industry. The second category appears to consist of volatile stocks that have experienced a significant downturn. The third category seems to be stocks in the finance sector. Sometimes the lines separating these categories can be a bit blurry.

Live Deploy

To trade live, the bot is linked to Robinhood through the RobinStocks API. Linux is installed on the Pi along with a copy of the GitHub repository. A bash script runs the software at a specific time every morning. Since the Pi is connected over Wi-Fi, the bot can run every weekday. The bot is currently undergoing a redesign and will be deployed soon.