Eyes Wide Open: How Karpathy's Autoresearch Framework Could Democratize Glaucoma Research — A Blueprint for Patient-Led, AI-Driven Discovery in Vision Restoration

Glaucoma, Vision & Longevity: Supplements & Science

Discover the latest science on glaucoma, vision, and longevity. Each episode explores evidence-based supplements for eye health, healthy aging, and lifespan extension. Original articles backed by real scientific research. All source links available at visualfieldtest.com, where you can also take a free visual field test online. Subscribe for weekly insights on glaucoma treatment, glaucoma prevention, vision supplements, and longevity research that could protect your sight and extend your healthspan.

MEDICAL DISCLAIMER:

This podcast is for educational and informational purposes only. It is not intended as medical advice, diagnosis, or treatment. The content presented should not replace professional medical consultation.

Glaucoma is a serious condition that can lead to permanent vision loss. Never stop or modify prescribed treatments without consulting your ophthalmologist or healthcare provider.

The supplements and research discussed are for informational purposes only. Individual results may vary, and supplements are not FDA-approved to treat, cure, or prevent any disease.

Always consult a qualified healthcare professional before starting any new supplement regimen, especially if you have existing eye conditions or are taking medications.

The visual field test available at visualfieldtest.com is a screening tool only and does not replace comprehensive eye exams by a licensed professional.




March 13, 2026 • Visual Field Test

0:00 | 50:37

This audio article is from VisualFieldTest.com.

Read the full article here: https://visualfieldtest.com/en/eyes-wide-open-how-karpathy-s-autoresearch-framework-could-democratize-glaucoma-research-a-blueprint-for-patient-led-ai-driven-discovery-in-vision-restoration

Test your visual field online: https://visualfieldtest.com

Support the show so new episodes keep coming: https://www.buzzsprout.com/2563091/support

Excerpt:

Eyes Wide Open: How Karpathy’s Autoresearch Framework Could Democratize Glaucoma Research

Introduction

Glaucoma is a chronic optic neuropathy that progressively destroys the retinal ganglion cells (RGCs) and leads to irreversible vision loss. It affects millions worldwide – an estimated 64.3 million people in 2013, projected to rise above 110 million by 2040. Worryingly, about half of all cases remain undiagnosed until vision loss has already begun. Traditional glaucoma care is focused on lowering intraocular pressure (IOP) through medications or surgery, but these treatments cannot reverse damage or fully prevent blindness. As a result, there is an urgent need for new discovery in areas like neuroprotection, RGC/optic nerve regeneration, and innovative gene and cell therapies. However, academic and pharma research on these frontiers remains under-resourced, partly because they are long-term, high-risk efforts. Meanwhile, advances in machine learning (ML) and artificial intelligence (AI) are empowering new approaches to data analysis and generative design. Recent work (for example, Andrej Karpathy’s “autoresearch” project) suggests that AI agents can autonomously run hundreds of small experiments on a single GPU based only on simple high-level instructions. In this paradigm, a human writes a short program.md describing the research goal, and an AI agent iteratively tweaks the model or hyperparameters, running 5-minute training runs, keeping successful changes, and discarding others. Overnight, this loop can perform on the order of 100 experiments, exploring architecture and parameter space without manual coding. This article explores how Karpathy’s autoresearch framework could be applied to glaucoma research by motivated patients, caregivers, citizen scientists, and open-source developers. We will survey under-explored glaucoma research areas (neuroprotection, regeneration, etc.) and identify machine-learning tasks in each domain where small-model experimentation could plausibly help. For each task we suggest specific public datasets, baseline models/architectures, evaluation metrics, and outline what the agent’s program.md instructions might look like. We then discuss practical steps for a community to set up and share such experiments, including hardware considerations, data preparation, and collaboration platforms. We examine the specific context of vision restoration therapies and whether autoresearch-style loops might speed up optimization of neural prostheses or other interventions. Finally, we address how citizen-generated hypotheses could be validated and escalated to clinicians, and lay out a concrete 90-day roadmap for launching a patient-led autoresearch initiative – including how to avoid pitfalls of “research theater” and ensure real impact. Throughout, we cite current sources on glaucoma research and AI in vision, aiming for a balanced, realistic, and accessible guide.

The Glaucoma Research Landscape & Unmet Needs

Glaucoma research spans multiple fronts – from understanding disease mechanisms to developing new therapies for neuroprotection and vision restoration. Many promising areas are under-resourc


SPEAKER_00 0:00

Eyes Wide Open: How Karpathy's Autoresearch Framework Could Democratize Glaucoma Research.

Introduction.

Glaucoma is a chronic optic neuropathy that progressively destroys the retinal ganglion cells (RGCs) and leads to irreversible vision loss. It affects millions worldwide: an estimated 64.3 million people in 2013, projected to rise above 110 million by 2040. Worryingly, about half of all cases remain undiagnosed until vision loss has already begun. Traditional glaucoma care is focused on lowering intraocular pressure (IOP) through medications or surgery, but these treatments cannot reverse damage or fully prevent blindness. As a result, there is an urgent need for new discovery in areas like neuroprotection, RGC and optic nerve regeneration, and innovative gene and cell therapies. However, academic and pharma research on these frontiers remains under-resourced, partly because they are long-term, high-risk efforts.

Meanwhile, advances in machine learning (ML) and artificial intelligence (AI) are empowering new approaches to data analysis and generative design. Recent work, for example Andrej Karpathy's autoresearch project, suggests that AI agents can autonomously run hundreds of small experiments on a single GPU based only on simple high-level instructions. In this paradigm, a human writes a short program.md describing the research goal, and an AI agent iteratively tweaks the model or hyperparameters, running five-minute training runs, keeping successful changes and discarding the others. Overnight, this loop can perform on the order of 100 experiments, exploring architecture and parameter space without manual coding.

This article explores how Karpathy's autoresearch framework could be applied to glaucoma research by motivated patients, caregivers, citizen scientists, and open-source developers.
We will survey under-explored glaucoma research areas (neuroprotection, regeneration, etc.) and identify machine-learning tasks in each domain where small-model experimentation could plausibly help. For each task, we suggest specific public datasets, baseline models and architectures, and evaluation metrics, and outline what the agent's program.md instructions might look like. We then discuss practical steps for a community to set up and share such experiments, including hardware considerations, data preparation, and collaboration platforms. We examine the specific context of vision restoration therapies and whether autoresearch-style loops might speed up optimization of neural prostheses or other interventions. Finally, we address how citizen-generated hypotheses could be validated and escalated to clinicians, and lay out a concrete 90-day roadmap for launching a patient-led autoresearch initiative, including how to avoid the pitfalls of "research theater" and ensure real impact. Throughout, we cite current sources on glaucoma research and AI in vision, aiming for a balanced, realistic, and accessible guide.

The Glaucoma Research Landscape and Unmet Needs.

Glaucoma research spans multiple fronts, from understanding disease mechanisms to developing new therapies for neuroprotection and vision restoration. Many promising areas are under-resourced.

Neuroprotection: interventions that protect RGCs from dying, independent of IOP. Examples include neurotrophic factors and metabolic support. For instance, implants releasing ciliary neurotrophic factor (CNTF) have shown potential in early trials, and other molecules like nerve growth factor and citicoline are being investigated. However, these are not yet standard care, and more work is needed to translate them to patients. A 2025 review warns that neuroprotective glaucoma therapies are a future treatment needing further trials, reflecting an unmet need.

RGC regeneration and optic nerve regeneration.
Once RGCs and their axons die, current medicine has no way to reverse that. Some animal studies use gene therapies to reprogram RGCs or stimulate regrowth. For example, CRISPR-based repression of PTEN, a negative growth regulator, has promoted axon regrowth in rat neural cells, and experiments co-deleting PTEN and SOCS3 drove sustained optic nerve regeneration in mice. However, these breakthroughs remain in lab models. The underlying biology (e.g., how to recapitulate retinal development or bypass growth inhibitors) is complex. There is a huge demand for modalities (small molecules, genes, biomaterials) that could stimulate RGC survival or axon regrowth, but progress to human trials is slow.

Gene and cell therapies. New technologies like CRISPR, viral vectors, and stem cell-derived RGCs hold promise for glaucoma. Strategies include gene editing to reduce IOP (e.g., targeting aqueous humor production) or to modulate neurodegenerative pathways. Stem cells could, theoretically, replace lost trabecular meshwork cells or RGCs and secrete protective factors. Early work has shown that certain transcription factors (e.g., Oct4, Sox2, Klf4) can reprogram non-RGCs into RGC-like neurons in mice, restoring vision after optic nerve injury. Yet these approaches face safety and delivery challenges before reaching patients. Several recent reviews highlight gene therapy as an exciting but not yet clinical frontier for glaucoma. In sum, molecular and cell innovations are advancing, but resources and trial data are limited, creating an opportunity for computational exploration, e.g., designing optimal viral constructs or predicting effective gene edits.

Electrical and optogenetic stimulation for vision restoration. For patients with advanced glaucoma or combined diseases like retinitis pigmentosa, artificial vision prostheses or optogenetic therapies aim to bypass damaged RGCs. Retinal implants (epiretinal or subretinal electrode arrays) and cortical implants have generated artificial percepts (phosphenes), but resolution is low and results vary widely. A 2025 review on AI in visual prostheses notes that AI algorithms show promise in optimizing prosthetic vision, particularly through enhanced image saliency extraction and stimulation strategies, though so far most studies are simulations. In other words, machine learning can help transform camera images into the patterns of stimulation that are most informative given the device's limits. Optogenetics (making surviving retinal cells light-sensitive) and transcorneal electrical stimulation (TES) pulses are also being trialed for glaucoma-related vision loss. All these areas need extensive parameter tuning (e.g., spatiotemporal patterns of stimulation, gene expression vectors), tasks potentially suitable for autonomous ML search.

IOP-independent mechanisms. Many people continue to lose vision despite well-controlled IOP. Factors like impaired ocular blood flow, neurovascular dysfunction, or metabolic stress in the optic nerve head are recognized but not fully understood. Genetic studies suggest significant IOP-independent components of glaucoma risk. Biomarkers of these processes beyond pressure are urgently needed. Also, half of glaucoma patients have normal-tension disease, highlighting that high IOP is not the only culprit. Research into vascular factors or other damage pathways is ongoing but fragmented. Computational modeling or mining of large datasets (e.g., genome-wide association studies) could help identify novel mechanisms or therapeutic targets in this domain.

Biomarker discovery via imaging and fields.
Early detection and monitoring of glaucoma often rely on imaging (fundus photos, OCT) and functional tests (visual fields). Advanced algorithms could uncover subtle biomarkers that human clinicians miss. For example, deep learning has begun to detect pre-perimetric visual field loss, meaning changes invisible to standard field analysis. Similarly, AI has been used to analyze OCT layer thickness profiles to predict glaucoma before overt damage. However, there are not yet widely accepted AI biomarkers that are used clinically for screening or risk stratification. Computational bottlenecks here include the need for large, well-labeled datasets and robust validation protocols. Public challenges (REFUGE, AIROGS, etc.) have begun to standardize data, but coverage of early-stage disease is thin. Further machine-driven discovery of multimodal biomarkers, combining OCT, fields, genetics, and so on, remains an open frontier.

Where can small-model ML help? Many of the items above describe high-level problems. The bottlenecks are often data scarcity, many interacting variables, and slowly moving biology. Where an autoresearch agent shines is in automating small-scale experiments on available data. For example, if there is a modest dataset of OCT scans with and without early glaucoma, a citizen scientist can set up a rapid model-testing loop to find which architecture best distinguishes them. Likewise, small transformers on genomics or literature could suggest novel gene or drug candidates. The key is focusing on narrow tasks with defined metrics (classification accuracy, AUC, loss) and iterating quickly. Areas with limited public data, e.g., TES parameters or novel gene cocktails, might rely on synthetic data or proxies. In the next section, we map specific ML tasks in glaucoma to the autoresearch approach.

Mapping autoresearch to glaucoma problems.

Karpathy's autoresearch framework is domain-agnostic.
It can run experiments on any ML task, given a prepare.py and a train.py with a well-defined evaluation metric. We identify several concrete glaucoma-related tasks and specify how an agent could tackle each. Each use case below includes a publicly available dataset (if possible), a starting model or architecture, an evaluation metric, and a sketch of program.md instructions.

2.1 OCT image analysis: structural detection and segmentation.

Task: early glaucoma detection from OCT scans. OCT imaging provides cross-sectional views of retinal layers. Thinning of the retinal nerve fiber layer (RNFL) and ganglion cell complex (GCC) can precede visual field loss. We can treat this as a classification task (glaucoma vs. healthy) or as regression (e.g., output RNFL thickness).

Dataset: a recent release, SYN-OCT, is a synthetic dataset of 200,000 circumpapillary OCT images (100K glaucoma, 100K normal) generated by GANs. Each image has associated RNFL thickness and segmentation masks. These are publicly available on Zenodo. Though synthetic, they are statistically validated to mimic real OCT. Alternatively, one could use the OCTDL dataset (2,064 images of various retinal diseases) or smaller clinical OCT collections.

Model: start with a small convolutional neural network (CNN). For classification, a model with 3 to 5 convolutional layers (e.g., analogous to a truncated ResNet-18, or a custom small CNN) can work. For segmentation of RNFL/GCC, an encoder-decoder like a tiny U-Net with depth 3 to 4 is suitable. The initial train.py could implement a simple CNN and training loop with default hyperparameters.

Metric: for glaucoma classification on OCT, use AUC (area under the ROC curve) or accuracy on a validation split. For segmentation, use the Dice coefficient or IoU on RNFL layer masks (SYN-OCT provides masks).

Example program.md. Goal: maximize validation AUC for detecting glaucoma from OCT images.
Allowed modifications: number of conv layers, filter counts, kernel sizes, activation functions, learning rate, optimizer choice, batch size, etc. After each five-minute training run, evaluate AUC on the held-out set. If AUC improves, keep the change; otherwise revert. The agent will thus try variations (e.g., adding layers, adjusting width, switching from Adam to RMSprop) to improve AUC.

Task: RNFL/GCC layer segmentation. Precisely measuring RNFL thickness is crucial. Using synthetic OCT scans with provided segmentations, or any real OCT with annotated layers, one can frame this as a segmentation task.

Dataset: SYN-OCT again provides RNFL segmentation masks. As another source, some academic groups have labeled OCT B-scans, though these are often proprietary. If needed, one might use generic OCT segmentation datasets like Duke Retina OCT Fluid Challenge as proxies.

Model: a small U-Net-like CNN, perhaps even channel-trimmed from a baseline (e.g., three down/up blocks starting with 16 filters). The agent is allowed to change depth and width.

Metric: Dice score or mean IoU of the predicted RNFL mask versus ground truth.

Example program.md. Goal: maximize the Dice score for RNFL layer segmentation on OCT. The base model is a three-block U-Net. The agent may vary the number of filters, add dropout, or change the learning rate. Train for five minutes each trial and compute Dice on validation. Keep modifications that increase Dice.

Task: progression prediction via serial OCT. Using sequential OCT, predict future thinning. If longitudinal OCT data exist (e.g., UK Biobank or private clinic data), the goal could be to predict RNFL change or a binary "fast progressor" label.

Dataset: public longitudinal OCT data specific to glaucoma are scarce. However, one could repurpose SROCT challenge data, or SYN-OCT images with simulated progression, to approximate this task. Alternatively, use UK Biobank OCT images, though these are not glaucoma-specific and not easily accessible to citizen scientists.
For illustration, assume a dataset of OCT scans at time zero and time one with labels.

Model: a Siamese or concatenated CNN taking pairs of OCT images and outputting the probability of progression. Start by feeding in time zero and predicting the time-one cutoff.

Metric: AUC for binary progression classification, or MSE if trying to predict thickness change.

Example program.md. Goal: identify eyes that will have rapid RNFL loss. Input: baseline OCT. Label: 5 µm thinning after one year. We use a CNN classifier. Allowed changes include network depth, learning rate, and augmentation. Use validation AUC as the metric.

2.2 Visual field (VF) analysis.

Task: predict future visual field loss. Given one or more past Humphrey visual field tests (pointwise sensitivity values), forecast future sensitivity or rate of progression. This is a classic glaucoma management problem.

Dataset: the GRAPE dataset (2023) provides longitudinal follow-up of 263 eyes (1,115 records) with VF and fundus/OCT, including annotated progression. Another resource is the University of Washington Humphrey Visual Field (UWHVF) longitudinal database, with 28,943 fields for many patients. However, GRAPE is well curated and public, with both VF and outcomes.

Model: a simple approach is a fully connected feedforward network on the 54-point VF data, or on data compressed to global indices. For progression prediction, a smaller MLP or 1D CNN can handle the 54 or 30 input features. Another idea: treat the 8×9 grid as a tiny image and use a small CNN (e.g., 3×3 kernels).

Metric: if predicting future mean deviation or point values, use MSE (lower is better). If classifying fast progressors versus not, use AUC.

Example program.md. Goal: minimize MSE of the predicted visual field (alternatively, maximize AUC for classifying rapid loss). Base model: a two-layer perceptron on 54 VF values. The agent can adjust hidden size and activation, or add dropout. After each five-minute training run, compute the metric on the validation set.

Task: identify fast progressors.
Using a series of past VFs, classify which eyes will lose vision quickly.

Dataset: use the annotated progression status in GRAPE (they marked eyes as progressed), or take UWHVF and label the top decile of MD loss as fast.

Model: one could concatenate features from two or three consecutive fields, or their differences, into a small network, possibly including baseline IOP and age if available.

Metric: AUC for distinguishing fast versus slow progressors.

Example program.md. Goal: maximize AUC for predicting rapid field progression. Input features: second-order differences of VF1 and VF2, plus IOP. Use a small fully connected network. The agent may tune layer widths, learning rate, and batch size.

2.3 Drug/compound screening: in silico candidate discovery.

Task: predict candidate neuroprotective or regenerative compounds. Use ML to find small molecules that might protect RGCs or encourage regeneration. For example, many known compounds (nicotinamide, valproate) show neuroprotective effects. We can train models to recognize chemotypes correlated with known efficacy and then search chemical space.

Dataset: this is challenging due to the lack of a dedicated glaucoma drug database. As a proxy, one could use MoleculeNet datasets (e.g., HIV inhibition, BBB permeability) or any bioactivity dataset. Alternatively, compile a labeled list of compounds tested in optic nerve injury models by mining the literature. In practice, one might start with a more generic property, e.g., blood-brain barrier penetration data from MoleculeNet.

Model: a small transformer or graph neural network on SMILES strings. A GPT-2-style transformer with few layers, or a simple graph convolutional net (e.g., 3 GCN layers), can be implemented in train.py.

Metric: if we treat this as classification (active vs. inactive), use AUROC. If predicting affinity or logP, use RMSE.

Example program.md. Goal: maximize classification ROC AUC for identifying neuroprotective-like compounds. Base model: a small transformer on SMILES.
The agent may adjust the number of transformer layers, dropout, or learning rate, or use alternative featurizations (e.g., fingerprint input). After each five-minute run, evaluate AUC on the validation molecules. Note: because public data for actual neuroprotection is scarce, this task is more illustrative. In practice, citizen scientists could create a custom dataset of known neuroprotective compounds versus controls and follow this pattern.

2.4 Gene regulatory network modeling (single-cell RGC).

Task: identify regenerative TF combinations. Use single-cell RNA-seq data from RGCs to learn transcriptional patterns of regenerative growth. For example, some RGC subtypes regenerate better than others. An ML model might predict a "regenerative state" label, and one could then inspect which transcription factors are important.

Dataset: a 2018 study provides RGC single-cell transcriptomes (GEO accession GSE15404), identifying distinct RGC subtypes. We can use this dataset, or a subset where cells are labeled by subtype or by experimental condition (e.g., pre- versus post-injury).

Model: a small transformer or MLP operating on gene expression vectors, where each cell has thousands of gene abundances. Practically, one would pre-select the top 500 genes (e.g., highly variable genes). The train.py might implement a mini transformer (e.g., four layers, embedding dimension 256) or a simple two-layer perceptron.

Metric: for unsupervised analysis, one could use silhouette score. More simply, if cells are labeled as regenerating versus non-regenerating, use classification accuracy or AUC.

Example program.md. Goal: build a model distinguishing regenerating versus non-regenerating RGC gene expression profiles. Start with a three-layer transformer. The agent can change embedding dimension, depth, or learning rate, or add batch norm. Optimize validation accuracy. After the runs, the best model's attention weights or learned features might highlight key transcription factors for experimentation.
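
The gene pre-selection step mentioned for task 2.4 can be illustrated with a minimal sketch. It ranks genes by plain variance across cells and keeps the top k. This is a simplification: real single-cell pipelines usually normalize and log-transform counts first and use a mean-adjusted dispersion measure. The function names and the toy gene labels are hypothetical and do not come from the article or from the autoresearch repo.

```python
from statistics import pvariance


def top_variable_genes(expr, gene_names, k):
    """Return the names of the k genes with the highest variance across cells.

    expr is a list of cells; each cell is a list of per-gene expression
    values in the same order as gene_names.
    """
    scored = []
    for g, name in enumerate(gene_names):
        column = [cell[g] for cell in expr]  # one gene across all cells
        scored.append((pvariance(column), name))
    # Highest variance first; ties keep their original gene order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


def subset_genes(expr, gene_names, keep):
    """Reduce every cell to the selected genes, in the order given by keep."""
    idx = [gene_names.index(name) for name in keep]
    return [[cell[i] for i in idx] for cell in expr]


if __name__ == "__main__":
    # Toy matrix: 3 cells x 3 placeholder genes (not real data).
    genes = ["GeneA", "GeneB", "GeneC"]
    cells = [[1, 5, 0], [1, 9, 3], [1, 1, 6]]
    keep = top_variable_genes(cells, genes, k=2)
    print(keep, subset_genes(cells, genes, keep))
```

In a prepare.py-style script, the reduced matrix would be what the small MLP or transformer receives, so k (500 in the text above) becomes one more knob the human fixes before the agent starts iterating.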

2.5 Electrophysiology signal analysis.

Task: detect subclinical RGC dysfunction via ERG. The pattern electroretinogram (PERG) or other electrophysiological signals can reveal RGC health. For example, delayed or reduced ERG responses may precede visual field defects. We can attempt to classify signals as normal versus glaucoma suspect.

Dataset: ERG datasets in glaucoma are rare. One could use a surrogate: a dataset from animals, retinal degeneration, or synthetic signals. If none is available, even generic 1D electrophysiology datasets (e.g., ECG) could illustrate the pipeline.

Model: a 1D CNN (e.g., two conv layers followed by a fully connected layer) on the time-series data. Alternatively, an LSTM can be used if sequences are longer.

Metric: accuracy or AUC in classifying subtle dysfunction versus normal, possibly F1 if classes are imbalanced.

Example program.md. Goal: maximize validation accuracy for classifying ERG traces (healthy versus early glaucoma pattern). Use a 1D CNN. The agent may adjust filter sizes or stride, or add a recurrent layer. Keep any changes that improve accuracy.

2.6 Literature mining and hypothesis generation.

Task: fine-tune a small language model to surface novel insights. With thousands of glaucoma research papers in PubMed, an ML agent could look for connections or repurposing candidates, for instance linking neuroprotective pathways to existing drugs. We can treat this as a language modeling problem or as a retrieval problem.

Dataset: compile a corpus of glaucoma-related abstracts (e.g., use PubMed searches for "glaucoma gene therapy" and similar terms). One can download 10,000 abstracts via NCBI APIs. For a simpler start, use PMC open-access glaucoma articles.

Model: a small transformer language model, e.g., a six-layer GPT-2 or even a fine-tuned BERT. For autoresearch purposes, we would likely fine-tune a causal model (GPT) on the text.

Metric: the standard choice is to optimize validation loss (perplexity). If doing classification (e.g., given an abstract, predict a label for a drug or pathway), use accuracy or AUC.

Example program.md. Goal: minimize validation perplexity of a small GPT-2 on the glaucoma literature corpus. Use five-minute fine-tuning runs. The agent can vary the number of layers, hidden size, learning rate, and context length. Keep changes that reduce perplexity. Once trained, one can prompt this model to generate hypotheses, e.g., top candidate repurposable drugs for neuroprotection in glaucoma.

In each of these domains, the key is that a single GPU and brief runs allow many trials. We are not expecting the agent to code new algorithms from scratch, but to tweak an existing training script. The human role is writing program.md to guide the agent's search towards a glaucoma-specific goal, like maximizing AUC on a fundus dataset or predicting RNFL thickness. The examples above illustrate how train.py could be set up initially and how program.md prompts the agent to improve a chosen metric.

Practical citizen science implementation guide.

How can motivated individuals with limited resources (e.g., a single RTX 3060 or a MacBook with Apple Silicon) actually apply autoresearch to glaucoma problems? The good news is that Karpathy's repo is small and has guidance for scaling down. Here are key steps and tips.

Environment setup. Clone the Karpathy autoresearch repo. You'll need a modern Python and, ideally, LLM access. The agent itself is typically a pre-trained LLM like GPT-4 or Claude that edits the code. For GPUs, install PyTorch with proper CUDA/Metal support. For Apple Silicon, use one of the forks (e.g., MLX) or a PyTorch build for M1/M2; see the repo's docs. On Windows/Linux with a 3060 or 4070, normal PyTorch CUDA works.

Configuring for a small GPU. The default autoresearch uses a 50M-parameter GPT-like model and sequences of length 1024, which may be heavy. For an RTX 3060 (12 GB), you should reduce model size and sequence length. In train.py, set the maximum sequence length to 512 or even 256. Drop the number of layers and the width. The medium GPT is 8 layers; try four layers, 256 width.
The instructions in the community mention lowering depth, width, etc. You can also reduce the optimizer's memory by using smaller batch sizes, even 16 or 8. The agent can still mutate these parameters, but giving it a smaller starting point ensures runs fit within five minutes. The autoresearch GitHub README and issue discussions also note that Mac M1 chips can handle shorter sequences (e.g., 256 tokens) due to limited memory. Similar scaling applies to any GPU.

Preparing glaucoma data. Each task's data must be loaded and split. Public glaucoma datasets include the following. Fundus datasets: ORIGA-light (650 labeled images), RIM-ONE DL (485 images with cup/disc segmentations), REFUGE (1,200-plus images with training/test splits), and the new Hillel Yaffe Glaucoma Dataset (HYGD), with about 1,200 fundus images and high-quality labels. EyePACS AIROGS (tens of thousands of retinal images) is also publicly accessible via registration (e.g., Kaggle). OCT datasets: SYN-OCT (200K synthetic B-scans with RNFL masks), OCTDL (2,064 images of various retinal diseases), and others from public challenges. Visual field data: GRAPE (263 eyes, longitudinal VF plus images); UWHVF (28K VF tests) is open if you download it from the University of Washington repository. Some Kaggle challenges include VF data. Electrophysiology: no large open glaucoma ERG dataset is known, but one could start with any accessible normal-versus-glaucoma signal data. Chemical and gene data: standard datasets like MoleculeNet for compounds or GEO for genes can be repurposed (e.g., download GSE15404 raw counts via GEOquery and preprocess them into expression matrices).

For each, you need a prepare.py that loads data and defines a train set, a val set, and an evaluation function. Karpathy's template expects prepare.py to output training data and an evaluation routine that returns a loss or metric. For example, a prepare.py for RIM-ONE might load images labeled as glaucoma or normal, split them into train/val folders, and define a function computing validation AUC.
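
As a rough illustration of the two pieces a prepare.py-style script needs, the sketch below shows a fixed, seeded train/val split and a validation AUC function, using only the standard library. The function names are hypothetical and are not the interface of the autoresearch repo; the pairwise AUC is the simple O(positives × negatives) form, which is fine for a few hundred validation examples but slow for very large sets.

```python
import random


def split_train_val(items, val_fraction=0.2, seed=0):
    """Shuffle once with a fixed seed so every run sees the same split."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_val = int(round(len(items) * val_fraction))
    return items[n_val:], items[:n_val]  # (train, val)


def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    labels are 0 (normal) or 1 (glaucoma); ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Placeholder file names standing in for a real image list.
    files = [f"image_{i:03d}.png" for i in range(10)]
    train_files, val_files = split_train_val(files)
    print(len(train_files), len(val_files))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Fixing the seed matters more than the exact split ratio: the agent's keep-or-revert decisions are only meaningful if every five-minute run is scored on the same validation examples.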
Refer to the RIM-ONE documentation for how that dataset is structured.

Adjusting data for small scale. If datasets are large, like EyePACS or SYN-OCT, you can subsample to create a tiny dataset of a few hundred examples. The model can still learn something valuable on a small corpus. The autoresearch repo even mentions using TinyStories-style tiny datasets to run on tiny hardware. For example, pick 500 images from ORIGA (balanced), or 1,000 VF fields from GRAPE. Likewise, for language, one could use a 5,000-abstract subset of PubMed glaucoma papers. The key is a fixed dataset that the agent iterates over. Be sure to pre-shuffle and split 80/20 so each five-minute run sees the same train/val split.

Writing program.md strategies. The community should share different program.md prompts, like recipes, in version control. Each file could encode a research strategy. For instance, one strategy might say "increase network depth if depth is below 6, else reduce learning rate," while another might say "focus on data augmentation changes." Over time, groups can compare which strategies yielded better metrics on leaderboards. A good program.md includes a goal (e.g., maximize AUC or minimize validation loss) and hints at allowable mutations (layers, filters, learning rate). The agent's LLM uses these instructions to propose code edits. Keep metrics standardized (e.g., always report AUC for glaucoma classification tasks) so experiments are comparable.

Community collaboration. To make this effort scalable, a citizen science community should organize the following. Shared experiment logs: post each experiment's results, e.g., "run 27 of program v1 achieved val AUC = 0.82 with width = 4, depth = 3." Standardized metrics: define metrics for each task, e.g., OCT glaucoma AUC, VF progression AUC, attribute AUC, etc. A shared leaderboard, akin to autoresearch's val_bpb, can track top scores. For example, a Slack channel or a GitHub Action might collect each agent's best AUC weekly. Version-controlled program.md: host all program.md files in a GitHub repo.
Members can fork and propose new strategies via pull requests while keeping historical versions. This way multiple approaches can be tested in parallel, e.g. programmer2vec.md versus programtransformer.md. Data and code sharing. Use public repos or notebooks for data prep scripts, and share the train.py modifications found by the agent so others can reproduce them in standard ML frameworks. Linking to the original dataset sources (Kaggle, PhysioNet, Zenodo) ensures others can download the same data.

By lowering technical barriers (the agent edits code, the user edits instructions in Markdown) and by coordinating efforts (shared logs, leaderboards), citizen scientists can collectively explore hyperparameter and model choices for these glaucoma ML problems. In essence, they invest human creativity in defining goals and let the agent run the grind of 100 experiments overnight per goal.

Vision restoration specifically. Vision restoration, meaning regaining sight after damage, is a particularly exciting target for AI-driven optimization. Current AI-assisted vision restoration research includes retinal implants, cortical prostheses and optogenetics. Here's how an autoresearch loop could fit in.

Optimizing visual prosthesis encoding. Modern prostheses (retinal implants, or cameras linked to electrode arrays) try to translate a camera image into electrical stimulation patterns that the brain interprets as sight. The challenge is that the bandwidth of the electrodes is very limited, often just tens to a few hundred points. An ML model (a small CNN or transformer) can be trained to map input images to ideal stimulation maps, but the best hyperparameters or architectures for this translation are unknown. An autoresearch agent could run 100 variations of a neural encoder model in hours.
For example, set up a dataset of image-stimulation pairs (either simulated phosphenes or patient data) and have the agent optimize the encoder network to minimize a reconstruction loss or maximize a utility metric (contrast intactness, recognition accuracy). The agent might try adding attention layers, changing convolution sizes, or tuning learning rates. Over many runs, one could find small networks that deliver more salient prosthetic outputs. Some recent work already uses AI to extract visual saliency for prostheses. Autoresearch could automate the tuning of such pipelines.

Optogenetic stimulation patterns. In optogenetic therapy, surviving RGCs or other retinal cells are made light-sensitive via introduced genes. The inputs from a camera must then be encoded into light pulses. Here again an ML model can control the patterns. One could frame a toy task: a small network transforms a camera image into a light-intensity map (same dimensions as the cells). The agent's objective could be to maximize some metric of effective stimulation, e.g. maximize activation of target cells in a simulated retina. Each trial might run a quick simulation of the response. Over iterations the agent might explore pulse durations or spatial filters. For instance, adjusting the aggressiveness of a high-pass filter on the camera input might be beneficial for some patterns. The point is that many analog parameters (filter kernels, nonlinearities) can be swept automatically.

Pulse pattern optimization (TES and implants). Even non-machine-learning domains can benefit from quick search. For example, a recent study (SHIE et al.) found that shorter pulse durations and the insertion of interphase intervals significantly improved cortical activation for retinal implants. This suggests the parameter space of electrical stimulation has strong, non-intuitive effects.
An autoresearch agent could treat the stimulation protocol parameters (phase duration, frequency, interval) as network parameters and run many small experiments, each simulated or empirical, to maximize cortical response. For instance, set up a simplified electrical model or use recorded evoked potential data in prepare.py, and let the agent tweak train.py parameters like pulse timing to maximize a defined response amplitude. This is akin to automating what aficionado neuroscientists do manually.

Viral vector design and scaffold geometry. In more exploratory therapy development, the agent's looping approach could also tackle biomedical optimizations. For example, the design of AAV viral capsids or promoters to target RGCs could be guided by small predictive models, e.g. logistic regression on sequence features. Autoresearch could repeatedly try modifying a model that predicts tropism or expression (trained on, e.g., small viral libraries) to improve that prediction. Similarly, if someone has simulation code for growth in nerve scaffolds for optic nerve repair, the agent could tweak geometric parameters to maximize axon extension. These are advanced, but they fit conceptually: the agent as experimenter could adjust model or simulation parameters for improved outcomes.

In summary, any aspect of vision prosthesis or restoration that relies on parameterized algorithms could be improved via rapid iterations. Importantly, the limitation is that we generally only have simulated data for many of these tasks. Actual patient testing of hundreds of variants isn't possible, but autoresearch can operate in silico to propose the best candidates for later clinical testing. As the prosthesis review noted, ensuring phosphenes are reliably generated at precise locations is an important challenge, and AI-driven models have shown potential in this area. Autoresearch could significantly accelerate finding those AI models' best configurations.
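To make the pulse-parameter idea concrete, here is a minimal sketch of the kind of sweep an agent would be automating. Everything in it is an assumption for illustration: the function `toy_response` is an invented stand-in for a simulator (it is not a physiological model and is not taken from any study), and the parameter names and grid values are hypothetical. In a real autoresearch setup the agent would edit train.py between runs rather than call a fixed sweep function, so this shows only the search idea, not the framework itself.

```python
import itertools
import math


def toy_response(phase_us, freq_hz, gap_us):
    """Hypothetical stand-in for a simulator. NOT a physiological model."""
    # Invented shape: rewards short phases, some interphase gap, mid-range frequency.
    phase_term = math.exp(-phase_us / 500.0)
    gap_term = 1.0 - math.exp(-gap_us / 50.0)
    freq_term = math.exp(-(((freq_hz - 20.0) / 15.0) ** 2))
    return phase_term * (0.5 + 0.5 * gap_term) * freq_term


def sweep(response_fn, grid):
    """Evaluate every parameter combination; return (best_score, best_params)."""
    names = list(grid)
    best_score, best_params = float("-inf"), None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = response_fn(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


# Hypothetical search grid: phase duration (microseconds), pulse frequency (Hz),
# and interphase gap (microseconds).
GRID = {
    "phase_us": [100, 250, 500, 1000],
    "freq_hz": [5, 10, 20, 40],
    "gap_us": [0, 50, 100],
}

best_score, best_params = sweep(toy_response, GRID)
print(best_score, best_params)
```

Swapping `toy_response` for a call into a real simulator or a recorded-response lookup is the step that would need domain expertise. The sweep logic stays the same, and any candidate it surfaces is still only a lead for later validation.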
Bridging to clinical impact. Computational results must ultimately connect back to real glaucoma research and care. How can ideas generated by patient-led autoresearch be validated and advanced?

Collaboration with research groups. Citizen scientists should reach out to established glaucoma research consortia. Examples include the International Glaucoma Genetics Consortium and the NEIGHBORHOOD consortium, which pool genetic and clinical data. Findings from autoresearch, e.g. a novel candidate gene or a drug repurposing hypothesis, could be shared with such groups for experimental follow-up. Tissue culture labs, e.g. at major universities or sleep researchers, might test compounds on RGC survival. Academic clinicians can correlate any biomarker or image classifier with their patient data under IRB approval. Starting dialogues between hackathon-style groups and formal labs is key.

Engaging patient advocacy organizations. Groups like the Glaucoma Research Foundation or the Cure Glaucoma Foundation often fund patient-centered innovation. They could sponsor proof-of-concept projects or citizen competitions using autoresearch. These organizations have clinician networks and could help route promising model leads to the clinic. For example, if an agent flags an existing FDA-approved drug as neuroprotective, an advocacy group could assist in setting up a small trial under proper protocols. Highlighting successes will require framing outputs as hypotheses, not medical advice, and ensuring transparency.

Ethical and safety guardrails. Citizen scientists must use only de-identified public data or fully synthetic data. Any use of actual patient records requires an IRB-approved protocol and likely patient consent. Output from autoresearch loops should be clearly labeled as hypothesis-generating. For instance: "This model suggests drug X may protect RGCs; experimental validation needed." Critical medical decisions must remain with doctors.
Risks include inadvertently distributing models that predict personal outcomes, such as glaucoma progression. Explicit disclaimers are necessary so that these are not treated as diagnostic tools. Data privacy best practices, e.g. using aggregated or anonymized fields, are a must.

Precedents in citizen science. It is not unprecedented for amateurs to contribute to medical and neuroscience research. The EyeWire project (MIT's crowdsourced neuron-mapping game) mobilized volunteers to reconstruct retinal neural circuits. In ophthalmology, non-experts have helped annotate images in open AI-funded challenges, e.g. labeled datasets for eye disease. Outside eye care, games like Foldit (protein-folding puzzles) and Galaxy Zoo (classifying galaxies) show that citizen participation can solve hard scientific problems. These successes encourage the idea that many hands, and now AIs, can indeed aid complex research. The autoresearch approach is like giving each person an AI-powered lab assistant. Previous crowdsourced efforts only used humans to analyze fixed tasks, whereas here the human sets the goal and the AI does the iteration.

By being transparent, cautious and collaborative, a citizen science autoresearch initiative can earn trust. It should emphasize generating leads, not prescriptions. If the community documents methods and shares code openly, professional researchers can reproduce findings. For example, if someone finds a new combination of RGC-protective factors, they could publish it in a preprint or alert a lab. Citation-style references, as we use here, help build that bridge, e.g. "we treated your list of candidate drugs in the context of known pathways". Ultimately this is a form of open science: patient-driven but scientifically rigorous. If ethical standards are maintained, such grassroots innovation has great potential to spark new collaborations and ultimately feed into peer-reviewed ophthalmology research.
A concrete 90-day roadmap. A focused, time-boxed plan can rally a community of 10 to 50 people, with at least one GPU or Apple Silicon machine each, to launch an autoresearch-for-glaucoma effort. Here is a suggested phased plan.

Weeks 1-2: formation and setup. Recruitment and kickoff. Create a communication channel, e.g. Slack or Discord, and a GitHub repo for the project. Publicize to glaucoma patient forums, biohacker groups and AI meetups. Hardware check. Ensure everyone can install PyTorch and clone Karpathy's repo, or the Maple Fork. Hold a setup session where each member runs a sample autoresearch loop on a toy dataset, e.g. a CIFAR-10 subset, to verify the environment. Dataset selection. Decide on one to three initial tasks, e.g. OCT classification, VF progression. For each, assign a small team to prepare data, e.g. one team downloads RIM-ONE images, another retrieves GRAPE fields, another collects literature abstracts. Teams should split data 80/20 and create prepare.py stubs. Baseline models. For each task, finalize a simple train.py, e.g. a tiny CNN for RIM-ONE and an MLP for VFs. Choose evaluation metrics (AUC, Dice, MSE). Initial program.md drafting. Each team writes an initial instruction file stating the goal and allowed changes, e.g. for RIM-ONE, maximize glaucoma detection AUC; for GRAPE, minimize VF MSE.

Weeks 3-6: first experiment cycles. Run autoresearch loops. Each subgroup runs the agent on their task overnight, roughly 100 five-minute runs. Use a single program.md to start, then let participants add variations, e.g. program temp1.md. Collect results. Each morning, teams examine the logs (the repo auto-logs each run). Record the best metric achieved, the model parameters at that time, and any notable changes the agent found. For transparency, push these results to the shared GitHub repo, perhaps as CSV or JSON. Iteration and feedback. Compare runs. Did any strategy beat the baseline significantly? If a sub-team sees little progress, they should tweak program.md, e.g.
being more aggressive with learning rate changes. Each weekend, synthesize findings in a community meeting. Tools. Use Git for version control on program.md and on the code templates. Consider a shared Google Sheet or wiki table for leaderboards, e.g. "OCT AUC best = 0.85 by Alice, VF RMSE best = 2.1 by Bob". This motivates healthy competition and transparency.

Weeks 7-12: refinement and outreach. Refine experiments. Based on early results, refine promising tasks. For example, perhaps the RIM-ONE classifier topped 0.9 AUC; now try adding data augmentation or a slightly deeper net. Encourage branching. Some can try different architectures, e.g. a tiny vision transformer instead of a CNN. Agents can run multiple program.md variants in parallel. Result synthesis. Create short reports on each domain (OCT, VF, etc.) summarizing what worked. For instance: "We improved GCC segmentation Dice from 0.60 to 0.75 by switching from ReLU to GLU activation." Use lay language so non-experts can follow, with a glossary for ML terms. Community presentation. By week 10, write a blog post or slide deck summarizing the initiative so far. Highlight any non-trivial findings; even null results are useful to share. Invite feedback from online forums. Perhaps contact a researcher asking for comments: "We found X neural network tweaks help classify early glaucoma. Any ideas if this aligns with physiology?" Plan outreach. Identify one or two ophthalmology labs or clinicians interested in collaborating. Reach out with the initial results. For example, connect with the authors of the HIGD dataset or the GRAPE team on Twitter or LinkedIn and mention your citizen findings. Explore possibilities for co-validation, e.g. send them the trained model weights to test on their data.

Beyond 12 weeks: next steps. Continue looping on the most promising tasks and on new ones. For example, if RIM-ONE yields good results, next tackle REFUGE. Perhaps build composite models (an ensemble of CNNs).
Formalize a project page or preprint describing the effort. Consider organizing a hackathon to bring in more minds, possibly in partnership with a glaucoma charity. By structuring the work this way, the community can make steady progress, learn together, and start bridging to experts by the end of 90 days.

Risks, limitations and honest assessment. The autoresearch-for-glaucoma idea is ambitious, so it requires honesty about potential pitfalls.

Risk of overfitting and spurious patterns. Small models on small, noisy datasets often latch onto coincidences. An agent might find a tweak that improves validation AUC simply by overfitting to idiosyncrasies. For example, if a subset of images had a subtle annotation mark, the network might use that instead of true glaucoma features. This leads to "gradient descent foolery". To mitigate, always use held-out test sets, completely separate from any tuning, for final evaluation. Limit complexity: keep models modest, and watch whether the agent excessively deepens or widens the net beyond reason. If a model achieves a near-perfect score too quickly, question it. Use sanity checks, e.g. scramble the labels and see if AUC drops to random. If not, there is leakage.

Bias in data quality. Public glaucoma datasets often come from narrow populations, e.g. ORIGA from Singapore. A model tuned to those may not generalize. Citizen experiments should note this limitation. Ideally, multiple datasets from different cohorts are used to check whether findings are robust.

False leads and research theater. Running tons of experiments feels productive, but if every improvement is only on synthetic or trivial datasets, it might not benefit patients. To avoid this, focus on tasks with clinical relevance, e.g. early detection from routine OCT. Tie outcomes to real measures when possible, e.g. AUC for progression, not just a tiny loss delta. Prioritize interpretability.
If the agent finds a new biomarker, try to ensure it makes sense, e.g. is it focusing on known anatomical changes?

No clinical guarantee. It must be crystal clear: output from these loops is hypothesis generation, not medical advice. A model suggesting a new drug must be vetted in the lab before any patient use. Overclaiming is dangerous. Label all shared results with disclaimers: "This is an AI exploration and not a peer-reviewed finding."

Small-model limitation. Very small networks have limited capacity and may miss complex patterns. In contrast, big models often see breakthroughs but require huge data. Here we accept a limited scope. The hope is that even small improvements can guide research, but we should not expect these models to replace deep learning on massive data. They're best at quickly trying obvious ideas.

Agent trustworthiness. The agent, e.g. GPT-4, might hallucinate or deviate. It's important that results are reproducible. After an agent run, a human should check what changes were kept and rerun training to confirm the metric. Keep the agent honest by including statements in program.md like "only accept actual improvements in the evaluation metric".

Despite these challenges, the key safeguard is transparency and critical follow-up. Document everything. When a model shows a pattern, verify it. If many citizen scientists see the same anomaly, e.g. all high-AUC models for an OCT task emphasize the nasal retina region, that strengthens the case. The goal is accelerating the idea generation phase, not avoiding careful science afterwards.

Conclusion. Glaucoma is a complex, silent, blinding disease with many unmet research needs, from protecting neurons to restoring vision. At the same time, AI has democratized experimentation. One person with a GPU and some determination can run automated hyperparameter searches that would take teams weeks to do manually. Karpathy's autoresearch framework essentially hands each citizen an AI lab assistant.
By writing clear, high-level goals in Markdown, community researchers can let an agent churn through experiments and cut straight to promising leads. We have outlined how this can be done in practice: identifying glaucoma ML tasks, selecting data (fundus and OCT images, visual fields, molecular datasets), defining models and metrics, and using program instructions to guide the search. We sketched a 90-day community roadmap and noted bridges to clinicians to ensure that valuable output can inform actual glaucoma science. The approach is very much citizen science, opening up scientific discovery tools in an accessible way while still relying on expert oversight where it matters.

Citations. We have referenced the latest resources in both glaucoma research and AI. Key facts (disease prevalence, with about half of cases undiagnosed; promising therapies such as CNTF implants and gene editing; and pitfalls of AI in imaging) are grounded in current literature. Autoresearch itself is described in Karpathy's walkthrough and review. These should lend credibility to the vision outlined here.

By the end of it all, we hope the reader feels empowered. If you're a patient, caregiver, or passionate hobbyist, you could be part of driving glaucoma research forward. The tools and data exist, the problems are clear, and with coordination and an AI agent, we can accelerate learning. As with any research, the journey will have false starts, but even failures teach us something, often steering human minds toward the right approaches. With eyes wide open to both the possibilities and the pitfalls, citizen-led autoresearch could become a powerful complement to traditional glaucoma science.

Start here. The easiest way to dip your toes into autoresearch for glaucoma today: run a tiny classification on ORIGA fundus images. Get the data. Download the ORIGA-light dataset (650 retinal fundus images labeled normal versus glaucoma). Split 80% train, 20% validation.
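As a rough illustration of the fixed 80/20 split described above, here is a minimal sketch using only the Python standard library. The function name `stratified_split`, the file names, and the class counts in the placeholder list are all invented for illustration and are not taken from the ORIGA-light dataset or from the autoresearch repo. The split is stratified by label and seeded, so every run sees the same train and validation images.

```python
import random
from collections import defaultdict


def stratified_split(items, train_frac=0.8, seed=0):
    """Split (path, label) pairs so each class keeps the same train/val ratio."""
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # fixed seed: the split is identical on every call
    train, val = [], []
    for label in sorted(by_label):
        group = sorted(by_label[label])  # sort first so input order does not matter
        rng.shuffle(group)
        cut = int(round(train_frac * len(group)))
        train.extend(group[:cut])
        val.extend(group[cut:])
    return train, val


# Placeholder records: file names and class counts are invented for illustration.
records = [
    (f"img_{i:03d}.jpg", "glaucoma" if i < 150 else "normal") for i in range(650)
]

train_set, val_set = stratified_split(records)
print(len(train_set), len(val_set))
```

Writing the two lists to disk once (for example as text files that prepare.py reads) is one way to guarantee that the agent's experiments are always scored on the same validation images.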
Initial model. Use or adapt the sample script from Karpathy's autoresearch for image classification. For example, a bit of code to load ORIGA images and train a small CNN (2-3 conv layers) to distinguish glaucoma from healthy eyes. Write program.md. In text, set the goal to maximize validation AUC for glaucoma detection, and instruct the agent that it may tweak model depth, learning rate, etc. Run the loop. Launch autoresearch. Point it to your prepare.py, train.py, and program.md. Let it run for several hours or overnight on your RTX 3060. It will perform roughly 100 experiments automatically. Check results. Examine the console or log to see the best validation AUC achieved. It should begin around 0.8 if all goes well. You now have a model and training script that the AI agent refined.

This simple weekend experiment already gives you firsthand experience with building an ML pipeline without writing new code by hand. Document what you tried and share your program.md and results with the community. Each small success (AUC bumps, interesting network changes) is a building block. You're literally instructing an AI to do research on your glaucoma problem of choice. In doing so, you learn glaucoma data science and have a chance to make a difference in understanding or treating vision loss. Good luck. Keep questions and findings open source, and remember: these are research toy tools, not medical advice. Check your runs carefully and enjoy the process of discovery.

All links to sources are available in the text version of this article. You can find the full article at visualfieldtest.com. Thanks for listening. To check your visual field, click the link at the bottom of this article or visit visualfieldtest.com.