SPXW Dataset Demo

This notebook demonstrates the simplified spxw loader from volkit.datasets.

With the new API there are only two modes:

  • Compact (default): most relevant columns.

  • Full (full=True): everything, including dates, bid/ask sizes, and open interest.

We also show filters on:

  • Days-to-expiry (D)

  • Minimum volume (min_volume)

  • Custom CSV path (data_path)

from volkit.datasets import spxw
import pandas as pd
import matplotlib.pyplot as plt

pd.set_option("display.max_columns", 50)

1. Load defaults (compact schema)

Compact mode includes:

  • K, D, T

  • F_bid, F_ask

  • C_bid, C_ask, P_bid, P_ask

  • C_vol, P_vol

df = spxw()
print("Rows:", len(df))
print("Columns:", list(df.columns))
df.head(8)
Rows: 5192
Columns: ['K', 'D', 'T', 'F_bid', 'F_ask', 'C_bid', 'C_ask', 'P_bid', 'P_ask', 'C_vol', 'P_vol']
K D T F_bid F_ask C_bid C_ask P_bid P_ask C_vol P_vol
0 1700 0 0.0 2917.8 2918.42 1209.1 1229.2 0.0 0.05 0 0
1 1800 0 0.0 2917.8 2918.42 1109.1 1129.2 0.0 0.05 0 0
2 1850 0 0.0 2917.8 2918.42 1059.1 1079.2 0.0 0.05 0 0
3 1900 0 0.0 2917.8 2918.42 1009.1 1029.2 0.0 0.05 0 0
4 1950 0 0.0 2917.8 2918.42 959.1 979.2 0.0 0.05 0 0
5 2000 0 0.0 2917.8 2918.42 915.5 920.8 0.0 0.05 0 0
6 2050 0 0.0 2917.8 2918.42 865.6 870.7 0.0 0.05 0 0
7 2100 0 0.0 2917.8 2918.42 815.5 820.8 0.0 0.05 0 0

2. Full schema (full=True)

Full mode additionally includes:

  • quote_date, expiration_date

  • bid/ask sizes for C/P

  • open interest for C/P

df_full = spxw(full=True)
print("Rows:", len(df_full))
print("Columns:", list(df_full.columns))
df_full.head(8)
Rows: 5192
Columns: ['K', 'D', 'T', 'F_bid', 'F_ask', 'C_bid', 'C_ask', 'P_bid', 'P_ask', 'C_vol', 'P_vol', 'quote_date', 'expiration_date', 'C_bid_size', 'C_ask_size', 'P_bid_size', 'P_ask_size', 'C_oi', 'P_oi']
K D T F_bid F_ask C_bid C_ask P_bid P_ask C_vol P_vol quote_date expiration_date C_bid_size C_ask_size P_bid_size P_ask_size C_oi P_oi
0 1700 0 0.0 2917.8 2918.42 1209.1 1229.2 0.0 0.05 0 0 2019-06-26 2019-06-26 78 78 0 1699 0 24
1 1800 0 0.0 2917.8 2918.42 1109.1 1129.2 0.0 0.05 0 0 2019-06-26 2019-06-26 78 78 0 5956 0 4333
2 1850 0 0.0 2917.8 2918.42 1059.1 1079.2 0.0 0.05 0 0 2019-06-26 2019-06-26 78 78 0 7956 0 11328
3 1900 0 0.0 2917.8 2918.42 1009.1 1029.2 0.0 0.05 0 0 2019-06-26 2019-06-26 79 79 0 5067 20 3240
4 1950 0 0.0 2917.8 2918.42 959.1 979.2 0.0 0.05 0 0 2019-06-26 2019-06-26 78 78 0 2084 0 382
5 2000 0 0.0 2917.8 2918.42 915.5 920.8 0.0 0.05 0 0 2019-06-26 2019-06-26 10 10 0 2451 7 705
6 2050 0 0.0 2917.8 2918.42 865.6 870.7 0.0 0.05 0 0 2019-06-26 2019-06-26 10 10 0 1782 0 63
7 2100 0 0.0 2917.8 2918.42 815.5 820.8 0.0 0.05 0 0 2019-06-26 2019-06-26 10 10 0 1767 0 55

3. Filter by days-to-expiry D

First list available D values, then load a single slice.

df_all = spxw()
unique_D = sorted(df_all["D"].unique().tolist())
print("Available D values:", unique_D[:10], "..." if len(unique_D) > 10 else "")
some_D = unique_D[0]
df_D = spxw(D=some_D)
print(f"Rows with D={some_D}:", len(df_D))
df_D.head(8)
Available D values: [0, 2, 5, 7, 9, 12, 14, 16, 19, 21] ...
Rows with D=0: 161
K D T F_bid F_ask C_bid C_ask P_bid P_ask C_vol P_vol
0 1700 0 0.0 2917.8 2918.42 1209.1 1229.2 0.0 0.05 0 0
1 1800 0 0.0 2917.8 2918.42 1109.1 1129.2 0.0 0.05 0 0
2 1850 0 0.0 2917.8 2918.42 1059.1 1079.2 0.0 0.05 0 0
3 1900 0 0.0 2917.8 2918.42 1009.1 1029.2 0.0 0.05 0 0
4 1950 0 0.0 2917.8 2918.42 959.1 979.2 0.0 0.05 0 0
5 2000 0 0.0 2917.8 2918.42 915.5 920.8 0.0 0.05 0 0
6 2050 0 0.0 2917.8 2918.42 865.6 870.7 0.0 0.05 0 0
7 2100 0 0.0 2917.8 2918.42 815.5 820.8 0.0 0.05 0 0
# Tiny visualization: strike counts by D
counts = df_all.groupby('D')['K'].count().sort_index()
counts.plot(kind='bar', figsize=(8,3))
plt.title("Strike count by D")
plt.xlabel("D (calendar days to expiry)")
plt.ylabel("Count")
plt.tight_layout()
plt.show()
../_images/2e9c95e12c7a3a72cdb372955dd8f354b84e91f27099c0c97cbef23e9524cdf5.png

4. Volume filter (min_volume)

Keep only strikes where both call and put volumes exceed the threshold.

df_v0 = spxw(min_volume=0)
df_v100 = spxw(min_volume=100)
print("Rows (no volume filter):", len(df_v0))
print("Rows (min_volume=100):  ", len(df_v100))
df_v100.head(6)
Rows (no volume filter): 5192
Rows (min_volume=100):   66
K D T F_bid F_ask C_bid C_ask P_bid P_ask C_vol P_vol
0 2875 0 0.0 2917.8 2918.42 40.8 45.5 0.0 0.05 2943 4374
1 2880 0 0.0 2917.8 2918.42 36.0 40.4 0.0 0.05 127 2145
2 2885 0 0.0 2917.8 2918.42 30.8 35.5 0.0 0.05 300 4283
3 2890 0 0.0 2917.8 2918.42 25.8 30.5 0.0 0.05 121 4415
4 2895 0 0.0 2917.8 2918.42 20.8 25.5 0.0 0.05 230 5566
5 2900 0 0.0 2917.8 2918.42 15.8 20.5 0.0 0.05 180 12184