ABSTRACT
PURPOSE
Deep learning reconstruction (DLR) to improve imaging quality has already been introduced, but no studies have evaluated the effect of DLR on diffusion-weighted imaging (DWI) or intravoxel incoherent motion (IVIM) in in vitro or in vivo studies. The purpose of this study was to determine the effect of DLR for magnetic resonance imaging (MRI) in terms of image quality improvement, apparent diffusion coefficient (ADC) assessment, and IVIM index evaluation on DWI through in vitro and in vivo studies.
METHODS
For the in vitro study, a phantom recommended by the Quantitative Imaging Biomarkers Alliance was scanned and reconstructed with and without DLR. The in vivo study included 15 patients with brain tumors and normal-appearing gray and white matter who were examined using IVIM, with images reconstructed with and without DLR. The ADCs of all phantoms, as well as the coefficient of variation percentage (CV%), ADCs, and IVIM indexes for each participant, were evaluated on DWI with and without DLR by means of region-of-interest measurements. For the in vitro study, a t-test was used to compare the mean ADCs of all phantoms between DWI with and without DLR. For the in vivo study, a Wilcoxon signed-rank test was used to compare the CV% between the two types of DWI. In addition, the Wilcoxon signed-rank test was used to compare the ADC, true diffusion coefficient (D), pseudodiffusion coefficient (D*), and percentage of water molecules in micro perfusion within 1 voxel (f) with and without DLR; the limits of agreement of each parameter were determined through a Bland–Altman analysis.
RESULTS
The in vitro study identified no significant differences between the ADC values for DWI with and without DLR (P > 0.05). In the in vivo study, the CV% was significantly different for DWI with and without DLR (P < 0.05) when b values ≥250 s/mm² were used, and D* and f with and without DLR were significantly different (P < 0.001). The limits of agreement of the ADC, D, and D* values for DWI with and without DLR were determined as 0.00 ± 0.51 × 10⁻³, 0.00 ± 0.06 × 10⁻³, and 1.13 ± 4.04 × 10⁻³ mm²/s, respectively. The limits of agreement of the f values for DWI with and without DLR were determined as −0.01 ± 0.07.
CONCLUSION
Deep learning reconstruction for MRI has the potential to significantly improve DWI quality at higher b values. It has some effect on D* and f values in the IVIM index evaluation, but ADC and D values are less affected by DLR.
Main points
• The in vitro study identified no significant differences in apparent diffusion coefficient values for diffusion-weighted imaging (DWI) with and without deep learning reconstruction (DLR) (P > 0.05).
• There were significant differences in coefficient of variation percentages between the DWI with and without DLR (P < 0.05) when b values of 250, 500, 750, 1000, and 1500 s/mm² were used.
• There were significant differences in the pseudodiffusion coefficient and in the percentage of water molecules in micro perfusion within 1 voxel between the DWI with and without DLR (P < 0.001).
The clinical application of artificial intelligence is expanding to a variety of targets, including not only detection and diagnostics but also image noise reduction for computed tomography (CT) and magnetic resonance imaging (MRI).1,2,3,4,5,6,7,8,9,10 In recent years, some MRI suppliers have introduced deep learning reconstruction (DLR) for denoising and improving imaging quality, which has been tested for MRI of the central nervous system as well as of the body.1,2,3,4,5,6,7,8,9,10 Separately, in the late 1980s, Le Bihan et al.11 developed intravoxel incoherent motion (IVIM), a non-invasive approach that measures perfusion-related parameters using diffusion-weighted imaging (DWI) for MR examinations.12,13,14,15,16,17,18,19,20,21,22 This method exploits the fact that the signal acquired using a DWI sequence is affected by the incoherent motion of water resulting not only from thermal energy but also from blood circulation in the microvasculature.11,12,13,14,15,16,17,18,19,20,21,22 To date, however, no studies have evaluated the efficacy of DLR for apparent diffusion coefficient (ADC) evaluation or IVIM index assessments in in vivo or in vitro studies.
We hypothesized that DLR may affect IVIM index measurements, possibly without influencing ADC measurements, by denoising and improving DWI quality. The purpose of this study was therefore to determine the efficacy of DLR for MRI on image quality improvement, ADC assessment, and IVIM index evaluation in DWI through in vitro and in vivo studies using a 3 Tesla (T) MR system.
Methods
Research ethics standards compliance
The in vivo study was a retrospective study and was approved by the Institutional Review Board (IRB) of Fujita Health University, Japan (research registration: HM22-328; IRB-approval number: CI22-647); it is compliant with the Health Insurance Portability and Accountability Act of Japan. Written informed consent was waived for each participant enrolled in this study. This study was also technically and financially supported by the Canon Medical Systems Corporation. Two of the authors are employees of the Canon Medical Systems Corporation (KY and MY) but did not have control over any of the data used in this study.
Quantitative diffusion phantom for in vitro study
The in vitro study quantitatively assessed ADC evaluation on DWI obtained with and without the DLR method. ADC measurement accuracy was evaluated using a commercially available quantitative diffusion phantom (High Precision Devices, Boulder, CO, USA) developed by the National Institute of Standards and Technology and the Quantitative Imaging Biomarkers Alliance (QIBA) of the Radiological Society of North America. The phantom consists of 13 vials filled with varying concentrations of polyvinylpyrrolidone (PVP) in an aqueous solution.23,24 It was specifically designed to quantitatively map the isotropic Gaussian diffusion of water molecules and generate physiologically relevant ADC values.25 The distribution of PVP concentrations in the phantom was as follows: 0% (vials 1–3), 10% (vials 4 and 5), 20% (vials 6 and 7), 30% (vials 8 and 9), 40% (vials 10 and 11), and 50% (vials 12 and 13) in an aqueous solution.24 The vials were stored in an ice-water bath at 0°C to eliminate thermal variability across scanner locations and timepoints for the ADC measurements.23
Participants in the in vivo study
The in vivo study involved 314 consecutive patients (146 men, 168 women; mean age: 59.8 years; age range: 18–91 years) who had been diagnosed with a suspected brain tumor at nearby hospitals. The participants visited the outpatient clinic in our department of neurosurgery between March and August 2019, where they were examined using brain MRI with IVIM. The exclusion criteria were 1) mass effect of a brain tumor or peritumoral edema on a slice at the basal-ganglia level, 2) contraindications for MR examination, 3) contraindication for gadolinium (Gd) contrast media because of asthma, 4) renal dysfunction, and 5) severe motion artifact. Of the 314 patients originally included in this study, 299 were excluded because of the mass effect of a brain tumor or peritumoral edema on a slice at the basal-ganglia level (n = 287), contraindications for MR examination because of claustrophobia (n = 2) or a cardiac pacemaker device (n = 1), contraindication for Gd contrast media because of asthma (n = 2), renal dysfunction (n = 4), and severe motion artifact (n = 3). The remaining 15 patients with brain tumors and normal-appearing gray and white matter (8 men, 7 women; mean age: 49.6 years; age range: 31–82 years) were included in this study. The patient selection chart is presented in Figure 1, and details of patient characteristics are summarized in Table 1.
Magnetic resonance examinations
All MR examinations for the in vitro and in vivo studies were performed using a 3T clinical MR scanner (Vantage Galan 3T/ZGO, Canon Medical Systems Corporation, Otawara, Tochigi, Japan) with a 32-channel phased-array surface coil (32 ch Head SPEEDER, Canon Medical Systems). The maximal gradient specifications were 100 mT/m for amplitude and 200 mT/m/msec for slew rate.
In vitro study
For the in vitro study, DWI was acquired in the axial planes using a two-dimensional spin-echo (SE)-type echo-planar imaging (EPI) sequence with a parallel imaging technique (SPEEDER, Canon Medical Systems) and the following parameters: repetition time (TR)/echo time (TE), 4500/66 ms; field of view (FOV), 220 × 220 mm; acquisition matrix, 144 × 144; slice thickness, 4 mm; reduction factor (SPEEDER factor), 3; number of acquisitions (NAQ), 1; b values, 0, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 1500, 2000, and 3000 s/mm². All MRI data were then reconstructed with and without the DLR method (Advanced Intelligent Clear-IQ Engine, Canon Medical Systems), consistent with other studies,3,5,10 on the MRI system (version 6 SP0003, Canon Medical Systems).
In vivo study
For the in vivo study, DWI was acquired in the axial planes using the SE-EPI sequence with SPEEDER and the following parameters: TR/TE, 4500/72 ms; echo train spacing, 0.9 ms; number of slices, 30; slice thickness, 5 mm; FOV, 220 × 220 mm; acquisition matrix, 160 × 160; NAQ, 1; flip angle, 90/180; SPEEDER factor, 3; b values, 0, 5, 10, 20, 30, 50, 75, 100, 250, 500, 750, 1000, and 1500 s/mm². All MRI data were then reconstructed with and without DLR, as in the in vitro study.3,5,10 The examination time, including reconstruction time, with and without DLR was recorded for each participant.
Deep learning reconstruction method for brain diffusion-weighted imaging
The DLR method used in this study is based on a convolutional neural network (CNN), and the details have been published in the literature.3 Figure 2 provides a diagram of the DLR method. A study has proposed CNN denoising using soft shrinkage, which adapts to the amount of noise by introducing a variable threshold of an inactive section, as an activation function;26 noise-adaptive soft shrinkage is also applied to the neural network in the DLR method.1 The present study used the same trained DLR network described in the literature.3 The network was trained and validated using conventional contrast images (T2 weighted, T1 weighted, etc.) of the brain and knees of several human volunteers.3 The quality of the different contrast images reconstructed using the DLR method was clinically evaluated in several body regions, such as the brain,3 pelvis,5 and abdominal arteries.27 The DLR details, including information on training and validation data sets, are provided in the literature.3,5,10
Image analysis
In vitro study
First, an ADC map was generated from the DWI for all b values. Signal intensity data obtained from each voxel on the DWI for all b values were fitted to a mono-exponential model to calculate the ADC using a built-in Tensor application (System software version 6.0, Canon Medical Systems). The ADC for each phantom was then measured by a neuroradiologist (SH) with 3 years' experience using ImageJ version 1.52p (https://imagej.nih.gov/ij/). Five circular regions of interest (ROIs) with a diameter of 10 mm were placed on each phantom on the center slice and on two additional slices, one obtained 1 cm before and the other 1 cm after the center slice, after which the mean ADC value within each phantom was calculated.
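The mono-exponential fit described above can be illustrated with a minimal sketch. It assumes a log-linear least-squares fit of S(b) = S0 · exp(−b · ADC); the vendor's Tensor application may fit differently, and the function name `fit_adc` and the synthetic ADC value are hypothetical. Only the in vitro b values are taken from the protocol.

```python
import math

def fit_adc(b_values, signals):
    """Log-linear least-squares fit of S(b) = S0 * exp(-b * ADC).

    b_values are in s/mm^2; signals must all be positive.
    Returns (adc in mm^2/s, s0).
    """
    if len(b_values) != len(signals) or len(b_values) < 2:
        raise ValueError("need at least two matching (b, signal) pairs")
    log_s = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_y = sum(log_s) / n
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    sxy = sum((b - mean_b) * (y - mean_y) for b, y in zip(b_values, log_s))
    slope = sxy / sxx                      # slope of ln S versus b equals -ADC
    intercept = mean_y - slope * mean_b    # intercept equals ln S0
    return -slope, math.exp(intercept)

# b values of the in vitro protocol (s/mm^2)
B_VALUES = [0, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 1500, 2000, 3000]

# Hypothetical noiseless voxel: ADC and S0 are illustrative, not measured values
TRUE_ADC = 1.1e-3
signals = [1000.0 * math.exp(-b * TRUE_ADC) for b in B_VALUES]
adc, s0 = fit_adc(B_VALUES, signals)
```

On noiseless synthetic data the fit recovers the input ADC; with real data, low signal at high b values would bias a log-linear fit, which is one reason a different fitting scheme might be preferred in practice.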
In vivo study
An ADC map was generated for each patient from the DWI reconstructed with and without DLR, using b = 0 s/mm² together with the other b values (b = 5, 10, 20, 30, 50, 75, 100, 250, 500, 750, 1000, and 1500 s/mm²), by means of commercially available software (IVIM) using a mono-exponential model on a Vitrea workstation (version 7.4, Canon Medical Systems). In addition, IVIM parameters were determined using commercially available software (IVIM) on the same workstation and based on the theory described in other studies.11,12,13,14,15 Based on a bi-exponential model derived from DWI with different b values, the true diffusion coefficient (D), pseudodiffusion coefficient (D*), and percentage of water molecules in micro perfusion within 1 voxel (f) were determined using the following previously published formula:11,12,13,14,15
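$$S(b)/S_0 = f_{\mathrm{IVIM}} \exp[-b\,(D^* + D_{\mathrm{blood}})] + (1 - f_{\mathrm{IVIM}}) \exp(-b\,D_{\mathrm{tissue}}) \qquad [1]$$
where $S(b)$ is the signal intensity for each b value and $S_0$ is the signal intensity at a b value of zero. Here, $f_{\mathrm{IVIM}}$ and $D_{\mathrm{tissue}}$ correspond to f and D, respectively, and $D_{\mathrm{blood}}$ denotes the diffusion coefficient of water in blood.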
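As an illustration of equation [1], the following minimal sketch evaluates the bi-exponential signal ratio at the in vivo b values. The function name `ivim_signal_ratio`, the default of zero for the blood diffusion term, and the parameter values are assumptions chosen for illustration; they are not values from this study or from the IVIM software.

```python
import math

def ivim_signal_ratio(b, f, d_star, d_tissue, d_blood=0.0):
    """Return S(b)/S0 from equation [1].

    b in s/mm^2; d_star, d_tissue, d_blood in mm^2/s; f is a fraction in [0, 1].
    d_blood defaults to 0.0 as a simplifying assumption of this sketch.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    perfusion = f * math.exp(-b * (d_star + d_blood))   # fast-decaying compartment
    diffusion = (1.0 - f) * math.exp(-b * d_tissue)     # slow-decaying compartment
    return perfusion + diffusion

# b values of the in vivo protocol (s/mm^2)
B_VALUES = [0, 5, 10, 20, 30, 50, 75, 100, 250, 500, 750, 1000, 1500]

# Hypothetical parameters, for illustration only
ratios = [ivim_signal_ratio(b, f=0.10, d_star=10e-3, d_tissue=0.8e-3)
          for b in B_VALUES]
```

With these illustrative parameters the perfusion term has essentially vanished by b = 250 s/mm², so D* and f are determined mainly by the low-b-value images, whereas D is governed by the higher b values.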
To quantitatively evaluate the influence of DLR on the DWI obtained at each b value for all patients, ROIs were measured using the Vitrea workstation. A center line was first placed manually on each slice. Subsequently, ROIs with a diameter of 10 mm were automatically placed on the normal cortex and white matter of a slice at basal-ganglia level obtained from each brain hemisphere (total of 10 ROIs = 5 ROIs × right and left hemisphere) to determine the mean signal intensity and standard deviation for ROIs on each slice. An example of ROI placements is provided in Supplementary Figure 1. For a quantitative image quality comparison of DWI obtained for each b value and reconstructed with and without DLR, the coefficient of variation percentages (CVs%) of the DWI for each b value with and without DLR were calculated by means of the following previously published formula:10,28,29,30
CV% = (standard deviation within ROI / mean signal intensity within ROI) × 100% [2]
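Equation [2] can be written as a short function. This is a minimal sketch: the name `cv_percent` is hypothetical, the input is assumed to be the list of voxel signal intensities inside one ROI, and the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator the workstation uses.

```python
import statistics

def cv_percent(roi_values):
    """Coefficient of variation (%) of the signal intensities within one ROI."""
    mean = statistics.mean(roi_values)
    if mean == 0:
        raise ValueError("mean signal intensity is zero; CV% is undefined")
    # Sample standard deviation (n - 1 denominator); an assumption of this sketch
    return statistics.stdev(roi_values) / mean * 100.0

# Hypothetical ROI with three voxels
example_cv = cv_percent([90.0, 100.0, 110.0])
```

A lower CV% at a given b value indicates less signal variation within the ROI relative to its mean signal.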
To determine the influence of DLR on all DWI parameter evaluations, ADC, D, D*, and f values from the automatically copied ROIs were measured on the same slices on ADC, D, D*, and f maps for each patient.
Statistical analysis
In vitro study
To determine the influence of DLR on ADC measurements, mean ADCs measured within each phantom on DWIs with and without DLR were correlated with standard references and with each other using Spearman’s correlation coefficient. To determine the effect of DLR on ADC evaluation, mean ADCs for each phantom were then compared in DWI with and without DLR by means of a t-test. A Bland–Altman analysis was then performed to determine the limits of agreement between the DWI with and without DLR.31,32
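A Bland–Altman analysis of paired measurements can be sketched as follows. The sketch assumes the conventional definition of the limits of agreement as the mean difference ± 1.96 standard deviations of the differences; the text does not spell out the exact computation, and the function name and sample values are hypothetical.

```python
import statistics

def bland_altman(with_dlr, without_dlr):
    """Return (bias, lower limit, upper limit) for paired measurements.

    Limits of agreement are taken as bias +/- 1.96 * SD of the paired
    differences, which is the conventional definition assumed here.
    """
    if len(with_dlr) != len(without_dlr) or len(with_dlr) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(with_dlr, without_dlr)]
    bias = statistics.mean(diffs)
    half_width = 1.96 * statistics.stdev(diffs)
    return bias, bias - half_width, bias + half_width

# Hypothetical paired ADC values (x 10^-3 mm^2/s), for illustration only
bias, lower, upper = bland_altman([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```

Under this convention, a result reported as "a ± b" corresponds to a bias of a and a half-width of b.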
In vivo study
To compare the IVIM examination time, including reconstruction time, with and without DLR, the mean IVIM examination times with DLR and without DLR were compared using the Wilcoxon signed-rank test.
To determine the utility of DLR for image quality improvement on the DWI at each b value, the Wilcoxon signed-rank test was used to compare CV% in the DWI with and without DLR. To determine the influence of DLR on ADC and IVIM index evaluations, ADC, D*, D, and f values were compared in the DWI with and without DLR by means of the Wilcoxon signed-rank test. Finally, the Bland–Altman analysis was used to evaluate the limits of agreement of the ADC and each IVIM index for DWI with and without DLR.31,32
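The paired comparison used above can be sketched in plain Python. This is a simplified version of the Wilcoxon signed-rank test that drops zero differences, assigns average ranks to ties, and uses a normal approximation without tie or continuity correction for the two-sided P value; statistical packages typically use exact or corrected P values for small samples such as 15 patients, so their results may differ. The function name and the sample data are hypothetical.

```python
import math

def wilcoxon_signed_rank(x, y):
    """Return (W, p): W = min(W+, W-) and a two-sided normal-approximation p."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w, p

# Hypothetical paired CV% values: every value with DLR is lower than without
without_dlr = [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
with_dlr = [10] * 10
w_stat, p_value = wilcoxon_signed_rank(with_dlr, without_dlr)
```

In this illustrative data set all differences share the same sign, so W is zero and the approximate P value falls below 0.01.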
Results
In vitro study
The correlations of ADC values for DWI with and without DLR with nominal ADC values as standard references are presented in Figure 3. The ADC values for DWI with and without DLR significantly and strongly correlated with those for standard references (with DLR: r = 0.99, P < 0.0001; without DLR: r = 0.99, P < 0.0001) and between DWI with and without DLR (r = 0.99, P < 0.0001).
Table 2 provides a comparison of ADC values for DWI with and without DLR for the in vitro study. The ADC values for DWI with and without DLR in the in vitro study were not significantly different (P > 0.05).
The results of the Bland–Altman analysis are presented in Figure 4. The limit of agreement between the ADC values for DWI with and without DLR and standard reference values was determined as −0.03 ± 0.04 × 10⁻³ mm²/s. In addition, the limit of agreement of ADC values for DWI with and without DLR was determined as −0.00 ± 0.01 × 10⁻³ mm²/s.
In vivo study
An example case is presented in Figure 5.
The mean examination time, including reconstruction time, with DLR (256 ± 4 s, range: 247–261 s) was significantly longer than that without DLR (208 ± 4 s, range: 199–213 s; P < 0.001).
The results of the comparison of CV% at each b value for DWI with and without DLR are summarized in Table 3. The CV% was significantly different for DWI with and without DLR (P < 0.05) when b values equal to or higher than 250 s/mm² were used.
The results of the comparisons of ADC, D, D*, and f for DWI with and without DLR are presented in Table 4, and the limits of agreement for ADC, D, D*, and f for DWI with and without DLR are illustrated in Figure 6. D* and f were significantly different for DWI with and without DLR (P < 0.001). The limits of agreement for DWI with and without DLR were determined as 0.00 ± 0.51 × 10⁻³ mm²/s for ADC, 0.00 ± 0.06 × 10⁻³ mm²/s for D, 1.13 ± 4.04 × 10⁻³ mm²/s for D*, and −0.01 ± 0.07 for f.
Discussion
This study combined in vitro and in vivo experiments to determine the effect of DLR on ADC and IVIM parameter evaluations. It was the first to demonstrate that DLR had no significant effect on ADC evaluations in a QIBA-recommended diffusion phantom. In addition, the in vivo study was the first to determine that DLR had little effect on ADC and D evaluations using brain DWI or brain IVIM examinations when b values were equal to or less than 1500 s/mm². However, D* and f values were significantly different for IVIM with and without DLR when IVIM examinations applied the same b values. To the best of our knowledge, no other study has assessed the influence of DLR on ADC and IVIM parameter evaluations in in vitro or in vivo studies.
When the examination time, including reconstruction time, for IVIM with and without DLR was compared, IVIM with DLR exhibited a significantly longer mean examination time than that for IVIM without DLR; however, the acquisition time for IVIM was the same. Therefore, the prolongation of the mean examination time in IVIM with DLR was considered to mainly result from the significantly longer reconstruction time when compared with that of IVIM without DLR.
Our in vitro study demonstrated that correlations between ADC values assessed through DWI with and without DLR were significant and strong, whereas the differences between them were non-significant. Moreover, the limit of agreement for DWI with or without DLR compared with that for standard reference values can be considered negligible and small enough for clinical purposes. Therefore, DLR was determined to have little or no influence on ADC evaluations in this setting.
In the in vivo study, DLR significantly improved the CV% of DWI at b values of 250, 500, 750, 1000, and 1500 s/mm². These results suggest that DWI at b values equal to or more than 250 s/mm² may carry a higher relative image noise level, which DLR can reduce. In the equation for CV%, the standard deviation within ROIs may predominantly reflect Gaussian noise but also includes spatial variation resulting from the anatomy. Moreover, DWI with higher b values tends to provide lower signal intensity in brain tissues relative to the level of Gaussian noise; therefore, DLR can improve CV% more effectively when b values are higher. Thus, DLR is a viable choice for improving DWI with higher b values, such as ≥250 s/mm², which are often used in routine clinical practice. For lower b values, such as <250 s/mm², it is widely known that the signal of fluid components, such as cerebrospinal fluid and blood, remains. These components could provide additional spatial variation within ROIs, where reconstruction parameters with and without DLR affect the spatial variation of DWI differently. This situation may impact the statistical significance of the CV% with and without DLR. Our results therefore indicate that DLR should be used for obtaining DWI at b values equal to or more than 250 s/mm² to improve image quality in routine clinical practice. Moreover, since 2021, DLR for denoising MRI has been tested not only for image quality improvement but also for a reduction in acquisition time using compressed sensing or other k-space data acquisition methods for various clinical aims on DWI as well as other MR sequences.5,8,10,33,34,35,36,37,38 Therefore, DLR could be applied clinically not only for denoising but also, in combination with such techniques, to reduce the examination time, although this study did not apply any techniques to reduce acquisition time.
A comparison of ADC and IVIM indexes for brain DWI with and without DLR in the in vivo study revealed that the ADCs of DWI for b values equal to or less than 1500 s/mm² were not significantly different. This result is compatible with that of our in vitro study. Moreover, we identified no significant difference in D for DWI with and without DLR when routine b values equal to or less than 1500 s/mm² were used. However, D* and f were significantly influenced by DLR in the same IVIM examination. The reasons for these results can be surmised from the similarity of the mechanisms underlying the previously described models.11,12,13,14,15,16,17,18,19,20,21,22 Our results for the in vitro and in vivo studies demonstrate that ADC and D measurements for DWI with b values equal to or less than 1500 s/mm² can be assumed to be unaffected by DLR in this setting. However, DLR should be used carefully with b values of more than 1500 s/mm² because it might have some effect on the results at these values, considering the results of DWI for detecting prostate cancer, for which b values equal to or more than 3000 s/mm² and DLR are used.10
In contrast to the ADC and D measurements obtained from DWI and IVIM examinations with and without DLR, we determined that D* and f were significantly affected by DLR, even though there were no significant differences in the CV% of DWI with and without DLR at b values lower than 250 s/mm². In this study, the DWI with and without DLR was generated from the same data, acquired with the same sequence and differing only in reconstruction. Moreover, all IVIM indexes were measured by means of a commercially available IVIM model. These facts and findings lead us to consider that the differences in D* and f in DWI with and without DLR might be the result of some interaction in the in vivo study between signal intensity and image noise within each voxel. Therefore, DLR may be useful for improving the quality of DWI as well as each DWI in the IVIM examinations and have limited influence on quantitative ADC and IVIM parameter evaluations in routine clinical practice.
This study has several limitations. First, we did not have an IVIM phantom or perform animal studies for IVIM in the in vitro study. Moreover, the scan parameters for the in vitro and in vivo studies were not fully matched because of different quantitative DWI index evaluations, and this study is a retrospective study, with no healthy volunteers included. Second, we applied commercially available software, using a mono-exponential model for ADC and a bi-exponential model for IVIM index calculations, and assessed the influence of the reconstruction method on these calculations; other models, such as stretched exponential or tri-exponential models, were not tested in this study. Moreover, no comparison between DLR and non-DLR methods for denoising DWI or IVIM images was performed, and no standard reference was determined, with only the differences in the ADC or each IVIM index between DWI and IVIM with and without DLR evaluated. To the best of our knowledge, no commercially available MRI phantom exists locally that contains multiple diffusion and circulation compartments and is suitable for IVIM quantification. This type of standardized phantom could be useful for clinicians to validate new advanced techniques in acquisition, reconstruction, and post-processing; therefore, we are now planning to study and develop this type of phantom in the near future. Further investigations are also warranted to determine the effect of DLR on IVIM evaluations using different models for in vitro and in vivo studies. Third, the study population was too small to allow for evaluations of patients with a variety of brain diseases; the tumor types of the 15 patients involved in our study were highly heterogeneous, which may affect the study results. Further investigations are therefore warranted to determine the influence of DLR on IVIM parameters as well as on clinical outcomes.
Fourth, the b values used in this study were equal to or less than 1500 s/mm² even though b values of more than 1500 s/mm² are now frequently used for brain DWI examinations for various purposes.10,39,40,41 Further investigations are therefore warranted that use DLR for brain DWI examinations with b values higher than 1500 s/mm². Fifth, this study made no comparisons between the D* and f values of the DWI with and without DLR and perfusion parameters from other perfusion MR and CT techniques or nuclear medicine studies. Sixth, this study used DLR provided by a single supplier for IVIM calculations in the in vivo study; however, the clinical relevance of IVIM examinations is currently evaluated primarily for academic purposes rather than clinical aims. Finally, the IVIM sequence and software used here have not yet been standardized. Multi-center studies using DLR and IVIM software provided by different suppliers are thus warranted for standardization and the determination of clinical relevance for the brain and other organs. Large prospective cohort studies using a variety of MR scanners, DLR algorithms, and IVIM software provided by different suppliers are also warranted, and we plan to conduct such studies in the near future.
In conclusion, DLR for MRI has the potential to significantly improve the quality of DWI with higher b values. It also has some effect on D* and f values for IVIM examination, whereas ADC and D values are less affected by DLR.