Surgical versus Nonsurgical Therapy for Lumbar Spinal Stenosis

James N. Weinstein, D.O., M.S., Tor D. Tosteson, Sc.D., Jon D. Lurie, M.D., M.S., Anna N.A. Tosteson, Sc.D., Emily Blood, M.S., Brett Hanscom, M.S., Harry Herkowitz, M.D., Frank Cammisa, M.D., Todd Albert, M.D., Scott D. Boden, M.D., Alan Hilibrand, M.D., Harley Goldberg, D.O., Sigurd Berven, M.D., and Howard An, M.D. for the SPORT Investigators

N Engl J Med 2008; 358:794-810. February 21, 2008. DOI: 10.1056/NEJMoa0707136

Spinal stenosis is a narrowing of the spinal canal with encroachment on the neural structures by surrounding bone and soft tissue. Patients typically present with radicular leg pain or with neurogenic claudication (pain in the buttocks or legs on walking or standing that resolves with sitting down or lumbar flexion). Spinal stenosis is the most common reason for lumbar spine surgery in adults over the age of 65 years.1,2 Indications for surgery appear to vary widely, and rates of procedures vary by at least a factor of 5 across geographic areas.3,4 Radiographic evidence of stenosis is frequently asymptomatic; thus, careful clinical correlation between symptoms and imaging is critical.5,6

A 2005 Cochrane review found that the paucity and heterogeneity of evidence limited conclusions regarding surgical efficacy for spinal stenosis. The trials comparing surgical with nonsurgical treatment were generally small and involved patients both with and without degenerative spondylolisthesis.7-12 We know of no randomized trials of isolated spinal stenosis without degenerative spondylolisthesis.

Here we report the 2-year outcomes from the Spine Patient Outcomes Research Trial (SPORT) for patients with spinal stenosis without degenerative spondylolisthesis, analyzing the relative efficacy of surgical versus nonsurgical treatment.

METHODS

Study Design

SPORT was an investigator-initiated study conducted in 11 states at 13 U.S. medical centers with multidisciplinary spine practices. The study included both a randomized cohort and a concurrent observational cohort of patients who declined to undergo randomization.13-16 This design allowed for improved generalizability of the findings.17 The ethics committee at each participating institution approved a standardized protocol. An independent data and safety monitoring board evaluated interim safety and efficacy outcomes at 6-month intervals.13-16,18 Stopping rules were provided on the basis of the alpha spending function of DeMets and Lan.19

Patient Population

All patients had a history of neurogenic claudication or radicular leg symptoms for at least 12 weeks and confirmatory cross-sectional imaging showing lumbar spinal stenosis at one or more levels; all patients were judged to be surgical candidates. Patients with degenerative spondylolisthesis were studied separately.16 Patients with lumbar instability (which was defined as translation of more than 4 mm or 10 degrees of angular motion between flexion and extension on upright lateral radiographs) were excluded. The type of nonsurgical care before enrollment was not prespecified but included physical therapy (68% of patients), epidural injections (56%), chiropractic (28%), the use of antiinflammatory drugs (55%), and the use of opioid analgesics (27%).

Research nurses at each site verified eligibility. Patients were offered enrollment in either cohort. To aid in obtaining written informed consent, patients viewed evidence-based videotapes with standardized information regarding alternative treatments.20,21 Patients in the randomized cohort received treatment assignments with the use of randomly permuted blocks with variable block sizes stratified according to center. Patients in the observational cohort chose their treatment at enrollment with their physician. Enrollment began in March 2000 and ended in March 2005.
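The assignment scheme described above (randomly permuted blocks, variable block sizes, stratification by center) can be sketched as follows. This is a minimal illustration and not the trial's actual randomization code: the block sizes (4 and 6), the center labels, the seed, and the function names are all assumptions introduced here.

```python
import random

ARMS = ("surgical", "nonsurgical")

def permuted_block_schedule(n_patients, rng, block_sizes=(4, 6)):
    """Build an assignment list from shuffled blocks of randomly chosen size.

    Block sizes are hypothetical; each must be a multiple of the number of arms
    so that every complete block is balanced.
    """
    schedule = []
    while len(schedule) < n_patients:
        size = rng.choice(block_sizes)          # variable block size
        block = list(ARMS) * (size // len(ARMS))
        rng.shuffle(block)                      # random permutation within the block
        schedule.extend(block)
    return schedule[:n_patients]

def stratified_schedules(centers, n_per_center, seed=0):
    """Generate one independent schedule per center (the stratification factor)."""
    rng = random.Random(seed)
    return {c: permuted_block_schedule(n_per_center, rng) for c in centers}

if __name__ == "__main__":
    for center, plan in stratified_schedules(["center_A", "center_B"], 10).items():
        print(center, plan)
```

Because every complete block is balanced, the imbalance between arms within a center can never exceed half of the largest block size, and varying the block size makes the next assignment harder to predict.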

Study Interventions

The protocol surgery was standard posterior decompressive laminectomy.13 The nonsurgical protocol was “usual care,” which was recommended to include at least active physical therapy, education or counseling with home exercise instruction, and the administration of nonsteroidal antiinflammatory drugs, if tolerated.13,18

Study Measures

Primary outcomes were measures of bodily pain and physical function on the Medical Outcomes Study 36-item Short-Form General Health Survey (SF-36)22-25 and on the modified Oswestry Disability Index (American Academy of Orthopaedic Surgeons–MODEMS [Musculoskeletal Outcomes Data Evaluation and Management Systems] version),26 measured at 6 weeks, 3 months, 6 months, and 1 and 2 years. (SF-36 scores range from 0 to 100, with higher scores indicating less severe symptoms. The Oswestry Disability Index ranges from 0 to 100, with lower scores indicating less severe symptoms.)

If surgery was delayed beyond 6 weeks, additional follow-up data were obtained at 6 weeks and at 3 months after surgery. Secondary outcomes included patient-reported improvement, satisfaction with current symptoms and care,27 and the bothersomeness of both stenosis7,28 and low back pain.7 The effect of treatment was defined as the difference in the mean change from baseline between the surgical group and the nonsurgical group.
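The definition of the treatment effect given above amounts to a difference of two mean changes. The short sketch below shows the unadjusted arithmetic only; the scores are made-up illustrative values, not trial data, and the reported estimates came from adjusted longitudinal models rather than this raw calculation.

```python
from statistics import mean

def mean_change(baseline, follow_up):
    """Mean within-patient change from baseline (follow-up minus baseline)."""
    return mean(f - b for b, f in zip(baseline, follow_up))

def treatment_effect(surg_base, surg_follow, non_base, non_follow):
    """Difference in mean change from baseline: surgical minus nonsurgical."""
    return mean_change(surg_base, surg_follow) - mean_change(non_base, non_follow)

if __name__ == "__main__":
    # Hypothetical SF-36 bodily pain scores (0 to 100, higher is better).
    effect = treatment_effect([30, 40], [60, 60], [30, 40], [40, 50])
    print(effect)
```

On the SF-36 scales a positive effect favors surgery; on the Oswestry Disability Index, where lower scores are better, a negative effect favors surgery.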

Statistical Analysis

For the randomized cohort, we determined that a sample size of 185 per group was needed to detect a 10-point difference in bodily pain and physical function on the SF-36 or a similar effect on the Oswestry Disability Index13 on the basis of a t-test, with a two-sided significance level of 0.05 and a power of 85%. Standard deviations for changes from baseline were derived from pilot data on repeated visits. The sample-size calculation allowed for 20% missing data but did not account for any specific levels of nonadherence.
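A calculation of this kind can be sketched with the usual normal-approximation formula for comparing two means, n = 2(z_{1-α/2} + z_{power})² σ² / δ² per group. The standard deviation is not reported in this section, so the value used below (30 points) is purely an assumption for illustration, and dividing by 0.8 is only one common way to allow for 20% missing data; neither is claimed to reproduce the authors' figure.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.85):
    """Per-group sample size for a two-sided, two-sample comparison of means
    (normal approximation to the t-test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 * sd ** 2 / delta ** 2)

def inflate_for_missing(n, missing_fraction=0.20):
    """Enlarge n so that the expected number of complete cases is still n."""
    return ceil(n / (1 - missing_fraction))

if __name__ == "__main__":
    sd_assumed = 30.0  # hypothetical SD of change scores, not taken from the paper
    n = n_per_group(delta=10.0, sd=sd_assumed)
    print(n, inflate_for_missing(n))
```

The required sample size grows with the square of the assumed standard deviation, which is why the pilot estimates of variability matter so much to the result.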

Initial analyses compared the baseline characteristics of patients in the randomized cohort with those in the observational cohort and between study groups in the combined cohorts. The extent of missing data and the percentage of patients undergoing surgery were calculated according to study group for each scheduled follow-up. Baseline predictors of the time until surgical treatment (including treatment crossovers) in both cohorts were determined through a stepwise proportional-hazards regression model with an inclusion criterion of P<0.1 to enter and P>0.05 to exit. Predictors of missing follow-up visits at 1 year were determined through stepwise logistic regression.

Primary analyses compared surgical and nonsurgical treatments with the use of changes from baseline at each follow-up visit, with a mixed-effects model of longitudinal regression that included a random individual effect to account for correlation between repeated measurements. The randomized cohort was initially analyzed on an intention-to-treat basis. Because of crossover, subsequent analyses were based on treatments actually received. In the as-treated analyses, the treatment indicator was a time-varying covariate, allowing for variable times of surgery. For the intention-to-treat analyses, all times are from enrollment. For the as-treated analysis, the times are from the beginning of treatment (i.e., the time of surgery for the surgical group and the time of enrollment for the nonsurgical group). Therefore, all changes from baseline before surgery were included in the estimates of the nonsurgical treatment effect. After surgery, changes were assigned to the surgical group, with follow-up measured from the date of surgery. Repeated measures of outcomes were used as the dependent variables, and treatment received was included as a time-varying covariate. Adjustments were made for the time of surgery with respect to the original enrollment date so as to approximate the designated follow-up times.
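The bookkeeping behind the as-treated analysis can be illustrated with a small sketch: each follow-up observation is attributed to the treatment the patient had actually received by that date, and post-surgical follow-up time is counted from the date of surgery. This shows only the data arrangement, under simplified assumptions; the `Visit` type and `as_treated_rows` function are hypothetical names, and the mixed-effects regression and the adjustment to designated follow-up times are not reproduced.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Visit:
    days_from_enrollment: int
    change_from_baseline: float

def as_treated_rows(visits: List[Visit],
                    surgery_day: Optional[int]) -> List[Tuple[str, int, float]]:
    """Label each visit with the treatment received so far and re-time it.

    Visits before surgery (or all visits, if surgery never occurred) count
    toward the nonsurgical group, timed from enrollment. Visits after surgery
    count toward the surgical group, timed from the day of surgery.
    """
    rows = []
    for v in visits:
        if surgery_day is not None and v.days_from_enrollment >= surgery_day:
            rows.append(("surgical",
                         v.days_from_enrollment - surgery_day,
                         v.change_from_baseline))
        else:
            rows.append(("nonsurgical",
                         v.days_from_enrollment,
                         v.change_from_baseline))
    return rows

if __name__ == "__main__":
    # Hypothetical patient who crossed over to surgery 60 days after enrollment.
    visits = [Visit(42, 5.0), Visit(90, 12.0), Visit(180, 20.0)]
    for row in as_treated_rows(visits, surgery_day=60):
        print(row)
```

A single patient can therefore contribute observations to both groups, which is what treating the treatment indicator as a time-varying covariate means in practice.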

The randomized and observational cohorts were each analyzed to produce separate as-treated estimates of treatment effect. These results were compared with the use of a Wald test to simultaneously test all follow-up visit times for differences in estimated treatment effects between the two cohorts.29 Subsequent analyses combined the two cohorts.

To adjust for potential confounding, baseline variables that were associated with missing data or treatment received were included as adjusting covariates in longitudinal regression models.29 Computations were performed with the use of the PROC MIXED procedure for continuous data and the PROC GENMOD procedure for binary and non-normal secondary outcomes in SAS software, version 9.1 (SAS Institute). Statistical significance was defined as P<0.05 on the basis of a two-sided hypothesis test with no adjustments made for multiple comparisons. Data for these analyses were collected through March 2, 2007.

RESULTS

Patients

A total of 654 patients were enrolled out of 1091 who were eligible for enrollment: 289 in the randomized cohort and 365 in the observational cohort (Figure 1: Enrollment, Randomization, and Follow-up). In the randomized cohort, 138 patients were assigned to the surgical group, and 151 were assigned to the nonsurgical group. In the surgery group, 63% had undergone surgery at 1 year and 67% at 2 years. In the nonsurgical group, 42% had undergone surgery at 1 year and 43% at 2 years. In the observational cohort, 219 patients initially chose surgery and 146 patients initially chose nonsurgical care. Of those who initially chose surgery, 95% had undergone surgery at 1 year and 96% at 2 years. Of those who initially chose nonsurgical treatment, 17% had undergone surgery at 1 year and 22% at 2 years. In the two cohorts combined, 400 patients received surgery at some point during the first 2 years, and 254 received nonsurgical treatment.
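As a quick consistency check, the enrollment counts quoted in the paragraph above can be tallied directly. The numbers are taken from the text; the variable names are introduced here for illustration only.

```python
# Counts as stated in the text.
eligible, enrolled = 1091, 654
randomized = {"assigned_surgical": 138, "assigned_nonsurgical": 151}
observational = {"chose_surgical": 219, "chose_nonsurgical": 146}
received_by_2_years = {"surgery": 400, "nonsurgical": 254}

def total(counts):
    """Sum the patients across the categories of one breakdown."""
    return sum(counts.values())

checks = {
    "randomized cohort is 289": total(randomized) == 289,
    "observational cohort is 365": total(observational) == 365,
    "cohorts sum to enrolled": total(randomized) + total(observational) == enrolled,
    "treatment received sums to enrolled": total(received_by_2_years) == enrolled,
}
enrollment_rate = enrolled / eligible  # share of eligible patients who enrolled

if __name__ == "__main__":
    for name, ok in checks.items():
        print(name, ok)
    print(f"{enrollment_rate:.1%} of eligible patients enrolled")
```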

The proportion of enrollees who supplied data at each follow-up interval ranged from 83 to 89%, with losses due to dropouts, missed visits, or deaths. A total of 634 patients, each with at least one follow-up through 2 years, were included in the analysis, including 278 patients (96%) in the randomized cohort and 356 patients (98%) in the observational cohort.

Characteristics of the Patients

Characteristics of the patients at baseline in the two cohorts are compared in Table 1 (Demographic Characteristics, Coexisting Illnesses, and Measures of Health Status of the Patients). Overall, the cohorts were similar. However, patients in the observational cohort had more signs of nerve-root tension and less lateral recess stenosis and expressed stronger treatment preferences than did patients in the randomized cohort.

Summary statistics for the combined cohorts are also shown in Table 1, according to treatment received. The study population had a mean age of 65 years; a majority were white men who had attended college. Of these patients, 80% had classic neurogenic claudication, and 79% had associated dermatomal pain radiation; 91% had stenosis at L4 or L5, and 61% had more than one level of stenosis. For most patients, the overall stenosis was graded as severe.

At baseline, the group undergoing surgery was younger and more likely to be working than was the group that did not undergo surgery. Patients in the surgical group had more pain, a lower level of function, more psychological distress, and more self-reported disability than did patients in the nonsurgical group. In addition, patients in the surgical group had symptoms that were more bothersome and radiographic evidence of more severe stenosis. Patients in the surgical group were also more often dissatisfied with their symptoms and more often rated their symptoms as worsening than were patients in the nonsurgical group.

The final models, combining both cohorts, were adjusted for age, sex, coexisting disorders of the stomach or joints, the presence or absence of pain on straight-leg raising or femoral-nerve tension signs, smoking status, patient-assessed health trend, income, other compensation, body-mass index, baseline score for the outcome variable, and center.

Nonsurgical Treatments

At 2 years, nonsurgical treatments were similar in the two cohorts. However, more patients in the randomized group than in the observational group reported visits to a surgeon (45% vs. 32%, P=0.02) and receiving injections (52% vs. 39%, P=0.02), whereas more patients in the observational group reported the use of “other” medications, such as gabapentin (60% vs. 73%, P=0.01).

Surgical Treatments and Complications

Overall, surgical treatments and complications were similar in the two cohorts (Table 2: Surgical Treatments, Complications, and Events). Among patients in the surgical group, 89% underwent decompression only. Instrumented fusion was performed in only 6% of patients. The median surgical time was 120 minutes, with a mean blood loss of 314 ml; 10% of patients required transfusions intraoperatively and 5% postoperatively. The most common surgical complication was dural tear, in 9% of patients. At 2 years, reoperation had occurred in 8% of patients; fewer than half of these operations were for recurrent stenosis.

At 2 years, there were seven deaths in the nonsurgical group and six in the surgical group, one of which occurred within 3 months after surgery. The deaths were reviewed and 12 were judged not to be treatment-related. The one death of unknown cause occurred 501 days after surgery.

Crossover

Nonadherence to treatment assignment affected both study cohorts: some patients in the surgical group chose to delay or decline surgery, and some in the nonsurgical group crossed over to undergo surgery (Figure 1). The characteristics of crossover patients that differed significantly from patients who did not cross over are shown in Table 3 (Significant Predictors of Treatment Received within 2 Years among Patients in the Randomized Cohort). Patients in the nonsurgical group who crossed over to undergo surgery had more self-rated disability, more psychological distress, worse symptoms, and a stronger treatment preference for surgery at baseline than did patients who did not opt for surgery. Patients in the surgical group who crossed over to receive nonsurgical care were more often not white, had less bothersome symptoms, less often rated their symptoms as worsening at enrollment, and had a stronger treatment preference for nonsurgical care at baseline.

Main Treatment Effects

In the intention-to-treat analysis, a significant treatment effect favoring surgery was seen at 2 years, with a mean difference in change from baseline of 7.8 (95% confidence interval [CI], 1.5 to 14.1) on the SF-36 scale for bodily pain; at earlier times, there was a smaller nonsignificant effect in favor of surgery. However, at 2 years, there were no significant differences between the surgical group and the nonsurgical group on the SF-36 scale for physical function (0.1; 95% CI, −6.4 to 6.5) or on the Oswestry Disability Index (−3.5; 95% CI, −8.7 to 1.7) (Table 4: Intention-to-Treat Analysis for the Randomized Cohort and Adjusted Analyses, According to Treatment Received, for the Randomized and Observational Cohorts Combined).
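The link between the reported confidence intervals and statistical significance can be illustrated with a rough back-calculation. Under a normal approximation (an assumption made here, not the paper's mixed-model computation), the standard error is the CI width divided by twice the 97.5th percentile of the standard normal distribution, and an approximate two-sided P value follows from the ratio of estimate to standard error. The function names are introduced for illustration.

```python
from statistics import NormalDist

def se_from_ci(lower, upper, level=0.95):
    """Approximate standard error implied by a symmetric confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (upper - lower) / (2 * z)

def approx_two_sided_p(estimate, lower, upper, level=0.95):
    """Approximate two-sided P value for the null hypothesis of zero effect."""
    z = abs(estimate) / se_from_ci(lower, upper, level)
    return 2 * (1 - NormalDist().cdf(z))

if __name__ == "__main__":
    # Intention-to-treat estimates at 2 years, as reported in the text.
    reported = {
        "SF-36 bodily pain": (7.8, 1.5, 14.1),
        "SF-36 physical function": (0.1, -6.4, 6.5),
        "Oswestry Disability Index": (-3.5, -8.7, 1.7),
    }
    for name, (est, lo, hi) in reported.items():
        print(name, round(approx_two_sided_p(est, lo, hi), 3))
```

An interval that excludes zero, as for bodily pain, corresponds to an approximate P value below 0.05; intervals that include zero, as for physical function and the Oswestry Disability Index, do not.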

In the as-treated analysis, the mean differences in change from baseline in the randomized and observational cohorts were similar at 2 years: bodily pain, 11.7 (95% CI, 6.2 to 17.2) in the randomized group versus 15.3 (95% CI, 10.4 to 20.2) in the observational group; physical function, 8.1 (95% CI, 2.8 to 13.5) in the randomized group versus 13.6 (95% CI, 8.7 to 18.4) in the observational group; and Oswestry Disability Index, −8.7 (95% CI, −13.3 to −4.0) in the randomized group versus −13.1 (95% CI, −16.9 to −9.2) in the observational group (Figure 2: Primary Outcomes in the Randomized and Observational Cohorts during 2 Years of Follow-up).

The global hypothesis test comparing the as-treated effects in the randomized group and the observational group over all time periods showed no difference between the two cohorts (P=0.93 for bodily pain, P=0.67 for physical function, and P=0.60 for the Oswestry Disability Index).

Results from the intention-to-treat analysis and the as-treated analysis of the two cohorts are compared in Figure 2. The effects shown in the as-treated analysis significantly favored surgery in both cohorts. In the combined analysis, treatment effects were significant in favor of surgery for all primary and secondary outcome measures at each time point during the 2 years (Table 4).

DISCUSSION

In patients with imaging-confirmed spinal stenosis without spondylolisthesis and leg symptoms persisting for at least 12 weeks, surgery was superior to nonsurgical treatment in relieving symptoms and improving function. In the as-treated analysis, the treatment effect for surgery was seen as early as 6 weeks, appeared to reach a maximum at 6 months, and persisted for 2 years; it is notable that the condition of patients in the nonsurgical group improved only moderately during the 2-year period. The intention-to-treat results must be viewed in the context of the substantial rates of nonadherence to assigned treatment. The pattern of nonadherence was striking because both the surgical and the nonsurgical groups were affected, unlike the results of many studies involving surgical procedures.30 The mixing of treatments owing to crossover can be expected to create a bias toward the null.31 The large effects seen in the as-treated analysis and the characteristics of the crossover patients suggest that the intention-to-treat analysis underestimated the true effect of surgery.

This study provides an opportunity to compare results involving patients who were willing to participate in a randomized study (randomized cohort) and those who were unwilling to participate in such a study (observational cohort).13-16 These two cohorts were remarkably similar at baseline. Other than treatment preference, the only significant differences were small ones in signs of nerve-root tension and the location of stenosis. The two cohorts also had similar outcomes, without significant differences in the as-treated analyses. Given these similarities, the combined analyses are well justified. Although these analyses are not based on randomized treatment assignments, the results are strengthened by the use of specific inclusion and exclusion criteria, the sample size, and adjustment for potentially confounding baseline differences.32

The characteristics of the patients were similar to those in previous studies, even though the latter involved mixed-cohort patients (i.e., those with or without spondylolisthesis). In our study, the functional status of the patients at baseline was similar to that of patients in the Maine Lumbar Spine Study7,8 (SF-36 score, 34.8 and 35.0, respectively) but worse than that in the study by Malmivaara et al.10,11 (Oswestry Disability Index, 42.4 and 35.0, respectively).

In the as-treated analysis, the functional improvement in the surgical group at 1 year was very similar to that in the Maine Lumbar Spine Study (26.5 and 27.0, respectively) but greater than in the study by Malmivaara et al. (Oswestry Disability Index, −21.4 and −11.3, respectively). Functional improvement in the nonsurgical group was greater in our study than in the previous studies, with a change of 10.5 in the SF-36 physical function score at 1 year, as compared with 1.0 in the Maine Lumbar Spine Study, and a change of 9.3 in the Oswestry Disability Index at 2 years, as compared with 4.5 in the study by Malmivaara et al. The greater improvements in our study, compared with those in the study by Malmivaara et al., may be related to differences in the selection of patients. In the study by Malmivaara et al., patients with moderate spinal stenosis were specifically selected, whereas in our study, we attempted to enroll patients with spinal stenosis who were surgical candidates.

In the as-treated analysis, we can directly compare the estimates of treatment effect with those of the previous studies. The estimated 1-year treatment effects for surgery were smaller in our study than in the Maine Lumbar Spine Study (changes in bodily pain of 14.6 and 30.4, respectively, and in physical function of 15.9 and 25.5, respectively). However, in the Maine Lumbar Spine Study, treatment effects were not adjusted for baseline differences between the study groups, which probably explains these discrepancies. At 1 year, the estimated treatment effects were similar in our study and the study by Malmivaara et al.: Oswestry Disability Index, −12.5 and −11.3, respectively; leg pain, 17% (on a 7-point scale) and 15% (on an 11-point scale); and back pain, 14% (on a 7-point scale) and 21% (on an 11-point scale).

It is interesting that among patients who underwent surgery, the magnitude of the mean changes in patients with spinal stenosis was nearly identical to that in the patients with degenerative spondylolisthesis at 2 years: bodily pain, 26.9 and 29.9, respectively; physical function, 23.0 and 26.6; Oswestry Disability Index, −20.5 and −24.2; and bothersomeness of symptoms, −7.8 and −8.9.16 The treatment effects in these studies of spinal stenosis were larger than those in the observational study of patients with intervertebral disk herniation because of strong improvements in the nonsurgical group of patients with intervertebral disk herniation that were not seen in either stenosis group.14-16

There was little evidence of harm from either treatment. Often patients fear they will get worse without surgery, but this was not the case for the majority of patients in the nonsurgical group, who, on average, showed small improvements in all outcomes. The 1-year rate of reoperation for recurrent stenosis was 1.3%, a rate similar to those reported by Malmivaara et al. (2%) and by the Maine Lumbar Spine Study (1.2%). At 2 years, mortality was nearly the same in the two study groups and was lower than actuarial projections. The postoperative death rate of 0.3% and the overall postoperative complication rate of 12% were slightly better than the reported Medicare rates in patients with spinal stenosis who did not undergo spinal fusion (death rate, 0.8%; rate of complications, 14%).1 However, higher rates of complications have been reported with increasing age and coexisting medical conditions.33

The primary limitation of our study was the marked degree of nonadherence to randomized treatment. This factor reduced the power of the intention-to-treat analysis to show treatment effects, though there was still a significant treatment effect for the measure of bodily pain at 2 years. The as-treated analyses do not share the strong protection from confounding that exists for the intention-to-treat analyses. However, these analyses were carefully adjusted for important baseline covariates and yielded results similar to those of previous studies. The characteristics of the crossover patients were as one might expect: those with severe symptoms and a preference for surgery crossed over into the surgical group, and vice versa.

Another limitation was the heterogeneity of the nonsurgical treatments. Given the limited evidence regarding efficacy of most nonsurgical treatments for spinal stenosis and individual variability in response, the creation of a limited, fixed protocol for nonsurgical treatment was neither clinically feasible nor generalizable. The flexible treatment protocols allowed for individualization of nonsurgical treatment plans, reflect current practice among multidisciplinary spine practices, and were consistent with published guidelines.34,35 However, we did not assess the effect of surgery versus any specific nonsurgical treatment.

In conclusion, in the as-treated analysis, if we combine the randomized and observational cohorts, carefully adjusting for potentially confounding baseline factors, patients with spinal stenosis without degenerative spondylolisthesis who underwent surgery showed significantly greater improvement in pain, function, satisfaction, and self-rated progress than did patients who were treated nonsurgically.

Supported by a grant (U01-AR45444-01A1) from the National Institute of Arthritis and Musculoskeletal and Skin Diseases (NIAMS), the National Institutes of Health Office of Research on Women's Health, the National Institute of Occupational Safety and Health of the Centers for Disease Control and Prevention, a grant (P60-AR048094-01A1) to the Multidisciplinary Clinical Research Center in Musculoskeletal Diseases from NIAMS, and a Research Career Award (1-K23-AR-048138-01, to Dr. Lurie) from NIAMS.

Dr. Lurie reports receiving grant support from St. Francis Medical Technologies and the American Board of Orthopaedic Surgery and consulting fees from Merck, Ortho-McNeil, Pfizer, Centocor, Myexpertdoctor.com, Pacific Business Group on Health, and the Foundation for Informed Medical Decision Making; Dr. A.N.A. Tosteson, receiving grant support from St. Francis Medical Technologies and Zimmer; Dr. Cammisa, having an equity interest in K2M, Spinal Kinetics, and HealthPoint Capital Partners; Dr. Albert, receiving consulting fees and royalties from DePuy Spine and having an equity interest in K2M; Dr. Boden, receiving consulting fees from Medtronic and lecture fees from Osteotech; and Dr. Berven, receiving grant support from Medtronic. No other potential conflict of interest relevant to this article was reported.

We thank Tamara S. Morgan, Department of Orthopaedic Surgery, Dartmouth Medical School, for graphic design and assistance with the manuscript and the following members of the data and safety monitoring board: Ron Thisted, Ph.D. (chair), University of Chicago, Chicago; Tim Carey, M.D., M.P.H., University of North Carolina at Chapel Hill, Chapel Hill; Peter C. Gerszten, M.D., Presbyterian University Hospital, Pittsburgh; Ed Hanley, M.D., Carolina Health Care, Charlotte, NC; and Bjorn Ryedvik, M.D., Ph.D., Sahlgrenska University Hospital, Gothenburg, Sweden. This study is dedicated to the memory of Brieanna Weinstein.
