Maximum variance is not maximum information
PCA picks directions with the most spread. But if the class boundary runs parallel to the high-variance direction, the separating signal lies in a low-variance direction, and PCA projects it away. You lose the signal that matters for your task.
Variance is not the same as usefulness. If your goal is classification, the most variable direction may carry noise rather than signal. nomoselect instead finds the subspace that captures the most task-relevant structure, using exact observer geometry to measure what matters and what gets lost.
nomoselect treats dimensionality reduction as an observer design problem. Given labelled data and a task (classification, minority detection, equal-weight discrimination), it finds the observer that captures the most task-relevant information, with exact diagnostics on what is kept and what is hidden.
from nomoselect import GeometricSubspaceSelector
# Fits like sklearn
sel = GeometricSubspaceSelector(n_components=2, task="fisher")
sel.fit(X_train, y_train)
# Transforms like sklearn
X_reduced = sel.transform(X_test)
# But also reports what PCA cannot
report = sel.report()
print(report.summary())
# Shows: visible fraction, advantage over PCA,
# hidden load, regularisation audit, conservation check
Task observer captures 32% more class-relevant information than PCA at 2 components.
PCA captures near-zero task info. The task observer captures all of it. Maximum possible advantage.
Consistent gain even on well-separated data where PCA already does reasonably well.
64 features, 10 classes. The observer finds the 5-dimensional subspace that captures the most label structure.
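The benchmark harness itself is not shown here. As a rough, library-free stand-in on the same kind of 64-feature, 10-class data (scikit-learn's digits set), one can compare a variance-driven projection (PCA) with a label-aware one; LDA is used below purely as an illustration of a task-aware baseline, not as nomoselect's method:

```python
# Stand-in comparison (not nomoselect itself): variance-driven PCA vs a
# label-aware projection (LDA) on the 64-feature, 10-class digits data.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # 64 features, 10 classes
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

for name, reducer in [("PCA", PCA(n_components=5)),
                      ("LDA", LinearDiscriminantAnalysis(n_components=5))]:
    Z_tr = reducer.fit(X_tr, y_tr).transform(X_tr)  # PCA ignores y; LDA uses it
    Z_te = reducer.transform(X_te)
    acc = KNeighborsClassifier().fit(Z_tr, y_tr).score(Z_te, y_te)
    print(f"{name}: downstream accuracy in 5 dims = {acc:.3f}")
```

The label-aware projection typically retains more class structure at the same width, which is the effect the caption above describes.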
pip install nomoselect
Requires: nomogeo >= 0.4.0, numpy, scikit-learn >= 1.2.
cd nomoselect && python -m pytest tests/ -q
Covers selection, reporting, auditing, misuse detection, and all four task types.
The biggest gains appear when variance and task structure point in different directions. On well-separated data where PCA already aligns with the class boundary, both methods agree. The diagnostic report always tells you the exact advantage.
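That failure mode can be reproduced in a few lines with plain numpy and scikit-learn (again with LDA standing in for a generic label-aware method; nomoselect is not needed for the illustration). The labels live entirely on a low-variance axis, so PCA's first component ignores them:

```python
# Failure-mode demo: the class signal lies in a low-variance direction,
# so PCA's top component points almost entirely at label-free noise.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)
X = np.column_stack([
    rng.normal(scale=10.0, size=n),            # axis 0: high variance, no label info
    (2 * y - 1) + rng.normal(scale=0.5, size=n)  # axis 1: low variance, carries the labels
])

pc1 = PCA(n_components=1).fit(X).components_[0]  # aligns with the noisy axis
w = LinearDiscriminantAnalysis().fit(X, y).coef_[0]
w = w / np.linalg.norm(w)                        # label-aware direction

print("PCA weight on the signal axis:", abs(pc1[1]))  # near 0
print("LDA weight on the signal axis:", abs(w[1]))    # near 1
```

Projecting onto the first principal component here discards essentially all class information, while the label-aware direction keeps essentially all of it.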
When the number of features greatly exceeds sample size (e.g., genomic data with 7000+ features), pass the data through PCA first to reduce to a manageable size, then apply nomoselect. Direct application to very high dimensions is slow.
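A minimal sketch of that two-stage pipeline, on random data of genomic scale; only the PCA prefilter uses stock scikit-learn, and the selector call is left as a comment since it follows the API shown in the example above:

```python
# Two-stage pipeline for very wide data: PCA prefilter down to a
# manageable width, then the task-aware selector on the reduced data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 7000))   # n_samples << n_features, genomic-scale width
y = rng.integers(0, 2, size=120)

prefilter = PCA(n_components=100).fit(X)  # 7000 -> 100 features
X_small = prefilter.transform(X)
print(X_small.shape)  # (120, 100)

# Then apply the selector to the prefiltered data, as in the example above:
# sel = GeometricSubspaceSelector(n_components=2, task="fisher")
# sel.fit(X_small, y)
```

Note that with 120 samples, PCA can keep at most 120 components, so 100 is near the ceiling; any width well below the sample count works as a prefilter target.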