Structural Limits of OHLCV-Based Intraday Signals in MNQ Futures: A Systematic Falsification Study

Authors: Mathias Mesfin (2026)
Source: arXiv:2605.04004 (preprint, published 05/05/2026)
Tag: moi:2026-05-16 #mnq-futures #ohlcv #falsification #walk-forward #preprint

Core idea

This is an extremely valuable "anti-signal" paper: the author takes 947 trading days of 5-minute data on the Micro E-mini Nasdaq-100 futures contract (MNQ) over 2021-2025 and tests 14 classic OHLCV-based intraday signal families, including opening range breakout, gap trading, volume-based, cross-session momentum, liquidity grab, volatility-conditioned classifier, and news-driven signals. Every signal must clear four institutional gates: walk-forward out-of-sample evaluation, T-statistic ≥ 2.0, at least 30 trades, and a positive net return after a round-trip cost of 2 points (about $4 per contract at MNQ's $2-per-point multiplier). The result: no signal family clears all four gates at once. The achievable gross edge lies in the range 0.07-1.50 points per trade, not enough to cover transaction costs. One gap-continuation variant reaches T = 3.23 with +14.52 points per trade but has only N = 22 samples, too few for statistical reliability.
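
The gate logic described above can be sketched as a small check. `SignalStats` and `failed_gates` are hypothetical names, not taken from the paper. The walk-forward gate is assumed to be satisfied by feeding in out-of-sample statistics only, and the +14.52 figure is treated as a net return purely for illustration, since the summary above does not say whether it is gross or net.

```python
from dataclasses import dataclass


@dataclass
class SignalStats:
    """Out-of-sample summary for one signal variant (hypothetical container)."""
    t_stat: float           # T-statistic of the mean trade return
    n_trades: int           # number of out-of-sample trades
    mean_net_points: float  # mean net return per trade, in index points


def failed_gates(s: SignalStats, t_min: float = 2.0, n_min: int = 30) -> list[str]:
    """Return the names of the gates the signal fails (empty list means pass)."""
    failed = []
    if s.t_stat < t_min:
        failed.append("t_stat")
    if s.n_trades < n_min:
        failed.append("n_trades")
    if s.mean_net_points <= 0:
        failed.append("net_return")
    return failed


# Gap-continuation variant quoted above: T = 3.23, N = 22, +14.52 points/trade.
# Assumption: +14.52 is used as the net figure here.
gap_continuation = SignalStats(t_stat=3.23, n_trades=22, mean_net_points=14.52)
print(failed_gates(gap_continuation))  # ['n_trades']
```

The variant fails on sample size alone, which is why a high T-statistic by itself is not accepted as evidence.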

The main contribution is not a new alpha but a reproducible falsification framework and a carefully documented null result. The author even runs two signals already validated in a separate research program as positive controls, to show that the framework can detect a real edge when one exists. This kind of research cultivates a rare scientific mindset: instead of "I found alpha", the paper says "here is everything I tried, and none of it survived after paying fees".

The strategic lesson for retail and proprietary intraday traders: if you use only OHLCV bars on a single instrument during a regular US session, you are fighting on a frontier that has already been explored to the bottom. The remaining real edge belongs to (1) higher-frequency data (order book, tick), (2) cross-asset features (correlation breakdown, basis spread), (3) regime conditioning on data sources beyond OHLCV, or (4) execution alpha (cutting costs rather than finding new signals).

Main trading application

Use this paper as a sanity check for every intraday strategy you are developing. The proposed framework:

Define the signal rule as a deterministic function of previous OHLCV bars (no leakage from the same bar).
Entry: at the open of bar t+1, after the signal fires at the close of bar t.
Exit: time-based (end of session, N bars) or a symmetric rule-based exit.
Walk-forward: split the data into rolling train/test blocks, for example train 250 days → test 50 days, stepping forward 50 days.
Pass criteria: T-stat ≥ 2.0 out-of-sample, N ≥ 30 trades, net return > 0 after a 2-point round-trip cost (for MNQ), and multi-year stability (no year with a loss larger than 1.5×|stdev|).
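
The rolling split in the walk-forward step can be sketched as a window generator. The function name `walk_forward_windows` is hypothetical, and the 250/50/50 sizes are the example values from the list above, not necessarily the paper's own settings; 947 is the number of trading days in the paper's sample.

```python
def walk_forward_windows(n_days: int, train: int = 250, test: int = 50, step: int = 50):
    """Yield (train_start, train_end, test_start, test_end) as half-open day indices."""
    start = 0
    while start + train + test <= n_days:
        yield (start, start + train, start + train, start + train + test)
        start += step


windows = list(walk_forward_windows(947))
print(len(windows))   # 13
print(windows[0])     # (0, 250, 250, 300)
print(windows[-1])    # (600, 850, 850, 900)
```

With these sizes the test blocks tile days 250-899 without overlap, and the last 47 days of a 947-day sample are never tested. Choosing `step` equal to `test` is what keeps the out-of-sample trades from being counted twice.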

A safe T-statistic formula for the mean trade return:

T = (mean(r_trade) - 0) / (std(r_trade) / sqrt(N))

Here r_trade is the net PnL of each trade (after assumed commission and slippage), std is the sample standard deviation, and N is the number of trades. If T < 2.0, do NOT deploy, no matter how good the backtest curve looks.
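
A minimal sketch of the formula above, using the sample standard deviation (`ddof=1`). The function name `trade_t_stat` and the example returns are made up for illustration.

```python
import numpy as np


def trade_t_stat(net_returns) -> float:
    """One-sample T-statistic of the mean net trade return against zero."""
    r = np.asarray(net_returns, dtype=float)
    n = r.size
    if n < 2:
        return float("nan")  # the sample standard deviation is undefined
    sd = r.std(ddof=1)
    if sd == 0:
        return float("nan")  # all trades identical: the ratio is undefined
    return float(r.mean() / (sd / np.sqrt(n)))


# Hypothetical net PnL per trade, in points (already net of costs)
example = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
t = trade_t_stat(example)
print(f"T = {t:.3f}, deploy = {t >= 2.0}")
```

For these six trades the mean is 0.5 points but T stays well below 2.0, so the rule says do not deploy.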

Multi-market application

VN30F (Vietnam index futures)

VN30F1M has a contract multiplier of 100,000 VND per index point, a typical spread of 0.1-0.3 points, and a round-trip cost (commission plus tax) of usually 1-2 points, equivalent to 0.05-0.1% of notional. The cost-to-edge ratio of VN30F is more favorable than MNQ's because VN30F's intraday range is large (1-2% in active sessions) while relative cost is lower. Still, do not be optimistic too soon; the same falsification framework must be applied:

Suggested parameters: train 200 days → test 30 days, rolling forward 30 days.
Assumed cost: 2 points round-trip (~200,000 VND per contract) to cover commission, tax, and 1 tick of slippage.
T-stat threshold: keep 2.0 but require N ≥ 50, because VN30F is noisier owing to fragmented liquidity.
Warning: the ATC session (the periodic call auction at the close) creates a microstructure shock that does not exist in MNQ, so every gap strategy must isolate the ATC bar and handle it separately.
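
The ATC warning above can be handled by splitting the bars before any gap calculation. This is a rough sketch: `split_atc` is a hypothetical helper, and the 14:30 start of the closing auction is an assumption that should be checked against the exchange's current trading schedule.

```python
import numpy as np
import pandas as pd

# Assumption: the closing auction (ATC) starts at 14:30 local exchange time.
ATC_START_MINUTE = 14 * 60 + 30


def split_atc(bars: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split intraday bars into (continuous-session bars, ATC bars) by bar start time."""
    minute_of_day = bars.index.hour * 60 + bars.index.minute
    is_atc = np.asarray(minute_of_day >= ATC_START_MINUTE)
    return bars[~is_atc], bars[is_atc]


# Tiny made-up frame of 5-minute closes around the auction
demo = pd.DataFrame(
    {"close": [1300.0, 1301.0, 1299.5, 1302.0]},
    index=pd.to_datetime(
        ["2025-01-02 14:20", "2025-01-02 14:25", "2025-01-02 14:30", "2025-01-02 14:40"]
    ),
)
continuous, atc = split_atc(demo)
print(len(continuous), len(atc))  # 2 2
```

A gap strategy can then be tested twice, once on the continuous session alone and once with the auction bars included, so that any effect of the auction shows up explicitly.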

In particular, opening range breakout (ORB) in the VN30F morning session is a crowd-favorite strategy, but it needs checking: over 2022-2024, ORB on the first 30 minutes of the session weakened markedly with the arrival of HFT arbitrage between VN30 spot and VN30F (see Bui & Nguyen 2024). Do not put ORB into production until it has been walk-forward tested through 2024-2026.

US equity futures (ES, NQ, RTY, YM, MNQ)

The paper tests MNQ directly, so its result can be treated as an upper bound for this family: if MNQ has no edge after costs, then NQ (comparable per-tick cost) has none either. ES and YM have greater liquidity but lower beta, so their gross edge is even smaller. RTY is the only exception: small caps have naturally higher volatility, so the signal-to-noise ratio may be marginally better:

Recommendation: rerun the paper's 14 signal families on ES and RTY before concluding that "intraday OHLCV is dead across all US futures markets".
Cost parameters: ES round-trip ~0.5-1.0 points ($25-50 per contract), RTY ~0.5 points ($25), YM ~3 points ($15).
Cross-asset extension: instead of using only MNQ's own OHLCV, add features from ES/RTY (lead-lag, basis). Mesfin's paper has not done this, and it is the obvious next step.
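
The cost parameters in the list above convert between points and dollars through each contract's point value. The sketch below uses the standard CME multipliers ($50 for ES and RTY, $20 for NQ, $5 for YM, $2 for MNQ); the function names are hypothetical.

```python
# Dollar value of one index point per contract (CME contract multipliers)
POINT_VALUE_USD = {"ES": 50.0, "NQ": 20.0, "RTY": 50.0, "YM": 5.0, "MNQ": 2.0}


def round_trip_cost_usd(symbol: str, cost_points: float) -> float:
    """Round-trip cost in dollars per contract for a cost quoted in points."""
    return cost_points * POINT_VALUE_USD[symbol]


def net_edge_points(gross_edge_points: float, cost_points: float) -> float:
    """Edge per trade left after costs, in points."""
    return gross_edge_points - cost_points


for symbol, points in [("ES", 0.5), ("ES", 1.0), ("RTY", 0.5), ("YM", 3.0)]:
    print(symbol, points, round_trip_cost_usd(symbol, points))

# Best gross edge reported for MNQ (1.50 points) against the 2-point cost
print(net_edge_points(1.50, 2.0))  # -0.5
```

Quoting both edge and cost in points keeps the comparison independent of contract size, which is why the paper's MNQ result carries over to NQ only if the cost in points is comparable.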

Crypto spot (BTC, ETH, altcoins)

Crypto trades 24/7 and has no natural "opening range", so many of the paper's signals cannot be ported directly. The volume-based and liquidity grab signals remain relevant, however. Crypto spot has lower relative cost than equity futures (~0.1% taker fee on Binance, ~0.05% maker), so a gross edge of 0.5-1% can survive costs. Warnings:

BTC/ETH spot has a very large market cap → low noise, but low signal as well.
Mid-cap altcoins (top 20-100) offer a better signal-to-noise ratio, but listing/delisting risk and pump-and-dump regimes are a concern.
"Daily session boundaries" can be set at 00:00 UTC (Asia open) or 13:30 UTC (US cash open); run an A/B test rather than assuming.

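The two candidate session boundaries mentioned in the crypto spot notes above can be compared by relabeling the same bars under each boundary. This is a sketch only; `session_labels` is a hypothetical helper and the timestamps are made up.

```python
import pandas as pd


def session_labels(index: pd.DatetimeIndex, hour: int, minute: int = 0) -> pd.DatetimeIndex:
    """Label each UTC timestamp with the start date of the session containing it.

    A session runs from hour:minute UTC to the same time on the next day.
    """
    offset = pd.Timedelta(hours=hour, minutes=minute)
    return (index - offset).floor("D")


idx = pd.to_datetime(
    ["2025-03-10 13:29", "2025-03-10 13:30", "2025-03-11 00:00"], utc=True
)
asia = session_labels(idx, 0)          # boundary at 00:00 UTC
us_cash = session_labels(idx, 13, 30)  # boundary at 13:30 UTC
print(list(asia.strftime("%Y-%m-%d")))     # ['2025-03-10', '2025-03-10', '2025-03-11']
print(list(us_cash.strftime("%Y-%m-%d")))  # ['2025-03-09', '2025-03-10', '2025-03-10']
```

Grouping bars by each label set gives two versions of every session-based signal, which can then be put through the same walk-forward gates side by side.
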
Crypto perpetual futures

Crypto perps add a funding-rate dimension that does not exist in MNQ. A long-side signal that meets persistently negative funding earns positive carry; conversely, positive funding drags its PnL down. When applying Mesfin's framework to BTC-PERP/ETH-PERP:

Round-trip cost: 0.08-0.10% (taker), 0.04-0.06% (mixed maker). Funding accrual must be computed separately over the holding period.
Cross-session momentum signal: replace with cross-funding-cycle momentum (every 8h or 4h depending on the exchange).
Liquidity grab signal: on perps, measure it with liquidation flow from Coinglass/CryptoQuant rather than a pure volume spike; this is a real edge because retail leverage is very high.
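
The separate funding accrual mentioned above can be sketched as follows. This is a simplification under stated assumptions: funding is charged on a constant entry notional (exchanges typically use mark price times position size), and `funding_pnl` and `net_pnl` are hypothetical names.

```python
def funding_pnl(side: int, notional: float, funding_rates) -> float:
    """Funding cash flow for a position held across the given funding timestamps.

    side: +1 for long, -1 for short. A positive rate means longs pay shorts.
    """
    return -side * notional * sum(funding_rates)


def net_pnl(price_pnl: float, side: int, notional: float,
            funding_rates, fee_rate_round_trip: float) -> float:
    """Price PnL minus round-trip fees plus funding received (or minus funding paid)."""
    fees = notional * fee_rate_round_trip
    return price_pnl - fees + funding_pnl(side, notional, funding_rates)


# Made-up example: long 10,000 USD notional held across two funding
# timestamps at +0.01% each, with a 0.10% round-trip taker cost.
result = net_pnl(price_pnl=30.0, side=1, notional=10_000.0,
                 funding_rates=[0.0001, 0.0001], fee_rate_round_trip=0.001)
print(round(result, 6))  # 18.0
```

The same trade taken short would receive the 2 USD of funding instead of paying it, so a signal's long and short legs should be evaluated with their own funding series.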

General cross-market considerations

Apply one identical set of pass criteria (T ≥ 2, N ≥ 30, net > 0, multi-year) to every market; do not loosen the rules when a backtest looks good.
Where relative cost is lower (crypto spot, RTY), Mesfin's framework may produce fewer nulls, but walk-forward is still required.
A positive control is always necessary: run the framework on a signal you KNOW has an edge, to confirm the pipeline is not bugged, before publishing a null result.
Do not trust "ORB works in small markets because retail dominates"; prove it, do not assume it.
The news-driven signal in the paper is rejected. That does not mean news does not matter, only that a naive event window has no edge after costs. It needs to be combined with tone analysis (NLP) or a cross-asset reaction model.
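
The positive-control idea in the list above can be illustrated on synthetic trades: plant a known edge in pure noise and confirm that the gates detect it while rejecting the noise alone. Everything here is an assumption for illustration (the planted 8-point edge, the 10-point noise level, the 400 trades, and the helper `passes_gates`); it is not the paper's own control.

```python
import numpy as np


def passes_gates(net, t_min: float = 2.0, n_min: int = 30) -> bool:
    """Apply the T-stat, trade-count and net-return gates to net trade PnL."""
    net = np.asarray(net, dtype=float)
    n = net.size
    if n < n_min or net.mean() <= 0:
        return False
    t = net.mean() / (net.std(ddof=1) / np.sqrt(n))
    return bool(t >= t_min)


rng = np.random.default_rng(0)
cost_points = 2.0
noise = rng.normal(0.0, 10.0, size=400)   # gross PnL of a signal with no edge
null_net = noise - cost_points            # no edge, costs still paid
control_net = noise + 8.0 - cost_points   # same noise plus a planted 8-point edge

print("null passes:", passes_gates(null_net))
print("control passes:", passes_gates(control_net))
```

If the control failed here, the null results from the same pipeline could not be trusted, because a bug would be as likely an explanation as the absence of edge.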

Python illustration

The code below implements Mesfin's falsification framework: it runs a walk-forward evaluation of one signal family on 5-minute MNQ data and outputs a table of T-stat, trade count, and net return after costs.

python

# Falsification framework for intraday OHLCV signals
# After Mesfin (2026), arXiv:2605.04004
# Requires: pandas, numpy

import numpy as np
import pandas as pd


def opening_range_breakout(bars: pd.DataFrame, or_minutes: int = 30) -> pd.Series:
    """
    Generate Opening Range Breakout signals.
    bars: multi-day DataFrame of 5-minute bars with columns
          [open, high, low, close, volume] and a DatetimeIndex in the
          exchange timezone (e.g. America/New_York).
    Returns: Series of signals {+1, -1, 0} aligned with bars.index. A signal
             fires at the close of bar t; entry is at the open of bar t+1.
    """
    n_or = or_minutes // 5  # number of 5-minute bars in the opening range
    sig = pd.Series(0, index=bars.index, dtype=int)
    # Group by day and compute the range of the first or_minutes
    grouped = bars.groupby(bars.index.date)
    for date, day in grouped:
        if len(day) < n_or + 2:
            continue
        # Opening window: the first n_or bars of the day
        open_window = day.iloc[:n_or]
        or_high = open_window["high"].max()
        or_low = open_window["low"].min()
        # Remainder of the session
        rest = day.iloc[n_or:]
        # Breakout up: close above or_high -> long on the next bar
        long_break = rest[rest["close"] > or_high]
        # Breakout down: close below or_low -> short on the next bar
        short_break = rest[rest["close"] < or_low]
        sig.loc[long_break.index] = 1
        sig.loc[short_break.index] = -1
    return sig


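def simulate_trades(
    bars: pd.DataFrame,
    signal: pd.Series,
    hold_bars: int = 12,
    cost_points: float = 2.0,
) -> pd.DataFrame:
    """
    Simulate entering at the open of the bar after the signal and exiting
    hold_bars later or at the session's last bar, whichever comes first.
    Only one position is open at a time, and trades never span two sessions.
    Returns a DataFrame of trades with columns
    [entry_time, exit_time, side, pnl_points, net].
    """
    columns = ["entry_time", "exit_time", "side", "pnl_points", "net"]
    sig_values = signal.reindex(bars.index).fillna(0).to_numpy()
    opens = bars["open"].to_numpy()
    closes = bars["close"].to_numpy()
    dates = bars.index.date
    trades = []
    flat_from = 0  # first bar position at which a new entry is allowed
    for i in range(len(bars) - 1):
        side = int(sig_values[i])
        entry_idx = i + 1
        # Skip: no signal, still in a position, or the next bar is in a new session
        if side == 0 or entry_idx < flat_from or dates[entry_idx] != dates[i]:
            continue
        # Exit at the close hold_bars after entry, capped at the session's last bar
        exit_idx = min(entry_idx + hold_bars, len(bars) - 1)
        while dates[exit_idx] != dates[entry_idx]:
            exit_idx -= 1
        pnl = (closes[exit_idx] - opens[entry_idx]) * side
        trades.append(
            {
                "entry_time": bars.index[entry_idx],
                "exit_time": bars.index[exit_idx],
                "side": side,
                "pnl_points": pnl,
                "net": pnl - cost_points,  # round-trip cost charged once per trade
            }
        )
        flat_from = exit_idx + 1
    return pd.DataFrame(trades, columns=columns)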


def walk_forward_eval(
    bars: pd.DataFrame,
    signal_fn,
    train_days: int = 250,
    test_days: int = 50,
    cost_points: float = 2.0,
) -> pd.DataFrame:
    """
    Run a walk-forward evaluation: train_days are reserved for fitting
    parameters, test_days are out-of-sample.
    The signal here has no parameters, so it is only applied to the test windows.
    """
    all_dates = sorted(set(bars.index.date))
    bar_dates = pd.Series(bars.index.date, index=bars.index)
    results = []
    start = 0
    while start + train_days + test_days <= len(all_dates):
        test_window_dates = all_dates[start + train_days : start + train_days + test_days]
        bars_test = bars[bar_dates.isin(test_window_dates)]
        sig_test = signal_fn(bars_test)
        trades = simulate_trades(bars_test, sig_test, cost_points=cost_points)
        if len(trades) > 0:
            n = len(trades)
            mean_net = trades["net"].mean()
            std_net = trades["net"].std(ddof=1)  # NaN when the window has a single trade
            results.append(
                {
                    "window_start": test_window_dates[0],
                    "window_end": test_window_dates[-1],
                    "n_trades": n,
                    "mean_net": mean_net,
                    "std_net": std_net,
                    "t_stat": mean_net / (std_net / np.sqrt(n)) if std_net > 0 else np.nan,
                }
            )
        # Step forward by one test block so test windows never overlap
        start += test_days
    return pd.DataFrame(results)


def institutional_pass(
    results: pd.DataFrame, t_threshold: float = 2.0, min_trades: int = 30
) -> dict:
    """
    Check the pooled out-of-sample results against the pass criteria:
    1. T-stat over ALL trades >= t_threshold
    2. Total number of trades >= min_trades
    3. Mean net return > 0
    4. Stability: no test window's mean lies more than 1.5 overall standard
       deviations below the overall mean (a per-window proxy for the
       multi-year check)
    """
    if results.empty:
        return {"pass": False, "reason": "no trades"}
    n = results["n_trades"]
    n_total = n.sum()
    if n_total < 2:
        return {"pass": False, "reason": "fewer than 2 trades"}
    mean_total = (results["mean_net"] * n).sum() / n_total
    # Variance of all trades rebuilt from per-window statistics:
    # within-window plus between-window sums of squares.
    # std_net is NaN for single-trade windows, which contribute no within term.
    ss_within = ((results["std_net"].fillna(0.0) ** 2) * (n - 1)).sum()
    ss_between = (n * (results["mean_net"] - mean_total) ** 2).sum()
    var_total = (ss_within + ss_between) / (n_total - 1)
    t_total = mean_total / np.sqrt(var_total / n_total) if var_total > 0 else np.nan
    stab_floor = mean_total - 1.5 * np.sqrt(var_total)
    multi_year_ok = (results["mean_net"] > stab_floor).all()
    pass_ok = (
        (t_total >= t_threshold)  # False when t_total is NaN
        and (n_total >= min_trades)
        and (mean_total > 0)
        and multi_year_ok
    )
    return {
        "pass": bool(pass_ok),
        "t_stat_total": float(t_total),
        "n_trades_total": int(n_total),
        "mean_net": float(mean_total),
        "multi_year_stable": bool(multi_year_ok),
    }


if __name__ == "__main__":
    # Simulated 5-minute MNQ data for the demo (replace with your own real data)
    rng = np.random.default_rng(42)
    idx = pd.date_range(
        "2024-01-02 09:30", "2024-12-31 16:00", freq="5min", tz="America/New_York"
    )
    # Keep weekday bars inside the 09:30-16:00 cash session (78 bars per day);
    # exchange holidays are not removed in this toy calendar
    in_session = (idx.time >= pd.Timestamp("09:30").time()) & (
        idx.time < pd.Timestamp("16:00").time()
    )
    idx = idx[in_session & (idx.dayofweek < 5)]
    close = 15000 + np.cumsum(rng.normal(0, 5, len(idx)))
    # Each bar opens at the previous close, so high/low always bracket open and close
    open_ = np.concatenate(([15000.0], close[:-1]))
    bars = pd.DataFrame(
        {
            "open": open_,
            "high": np.maximum(open_, close) + np.abs(rng.normal(2, 1, len(idx))),
            "low": np.minimum(open_, close) - np.abs(rng.normal(2, 1, len(idx))),
            "close": close,
            "volume": rng.integers(100, 2000, len(idx)),
        },
        index=idx,
    )

    results = walk_forward_eval(
        bars, opening_range_breakout, train_days=100, test_days=30, cost_points=2.0
    )
    verdict = institutional_pass(results)
    print("Walk-forward windows:")
    print(results.head())
    print("\nInstitutional verdict:")
    for k, v in verdict.items():
        print(f"  {k}: {v}")
    # On pure random noise the expected verdict is pass = False, in the spirit of the paper.
