Medical PACS Imaging Storage Solution

Demand Analysis

PACS (Picture Archiving and Communication System) has become fundamental infrastructure in modern radiology, playing a critical role in clinical diagnosis and medical research.

Hospitals now deploy numerous imaging devices (X-ray, CT, MRI, ultrasound), and physicians increasingly rely on medical imaging for diagnosis. As imaging technology advances, hospital PACS data volumes grow roughly 15% annually (doubling about every five years), and the growth rate continues to accelerate.

  • Class A tertiary hospitals (general or specialty hospitals like pulmonary/orthopedic centers) generate 50TB–60TB of new imaging data per year

  • National electronic medical record regulations mandate ≥15-year retention periods

Both operational needs and compliance requirements drive demand for advanced PACS storage solutions.

Most hospitals currently use traditional FC SAN/NAS storage with three-tier architecture:

  1. Online storage (high-performance)

  2. Nearline storage (mid-tier)

  3. Offline storage (archive)


Current Challenges:

■ Limited Scalability

  • PACS files are predominantly small:

    • MR average size: 60KB

    • CT average size: 300KB

  • Small-file I/O causes performance degradation at scale

  • Current systems deliver only ~80 images/sec retrieval speeds

  • Typical MR exam (3,000–5,000 images) requires >30-second load times (see the quick check after this list)

  • Concurrent access by hundreds of physicians during peak hours causes severe latency
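
As a quick sanity check of the figures above (an illustrative calculation, not vendor benchmark data), the cited retrieval rate implies the following load times:

```python
# Back-of-the-envelope check: load time = exam size / retrieval rate.
retrieval_rate = 80                      # images/sec on legacy storage (cited above)
for exam_images in (3_000, 5_000):       # typical MR exam size range
    load_time = exam_images / retrieval_rate
    print(f"{exam_images} images -> {load_time:.1f} s")
# 3,000 images -> 37.5 s; 5,000 -> 62.5 s, consistent with the >30 s claim.
```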

■ Architectural Complexity & Data Fragmentation

  • Data silos across three storage systems create:

    • Delays retrieving historical exams (e.g., prior-year studies in nearline storage require manual migration)

    • Barriers to AI diagnostics, research analytics, and big data initiatives

    • Complex data migration and management overhead

■ High TCO

  • Expensive proprietary hardware procurement

  • Unpredictable expansion costs

  • Labor-intensive data migration

  • Multi-system maintenance burdens


Solution: BOSS Distributed Storage

Bihai Distributed Storage System (BOSS) overcomes these limitations through:

  • Linear scalability

  • 10-billion-file management capability

  • High-throughput architecture

  • Simplified operations

  • Optimized TCO

Ideal for medical consortia and large hospitals requiring:

  • Massive-scale PACS storage

  • High-concurrency access

  • Agile infrastructure


Key Advantages:

■ Performance Enhancement

  • Proprietary optimizations for PACS workflows:

    • Small-file consolidation technology

    • High-concurrency FTP gateways

  • Sustains rapid retrieval speeds (>300 images/sec) even during peak loads (10,000+ concurrent FTP connections)
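
BOSS's internals are not published here, so the following is only a minimal sketch of the general small-file consolidation idea: pack many small DICOM images into one large container extent and keep a per-image offset index, turning millions of small-file metadata lookups into single seeks within large files. The class and method names are hypothetical, not BOSS APIs.

```python
import io

class SmallFilePacker:
    """Illustrative small-file consolidation: pack many small images
    into one large container blob plus an offset index."""

    def __init__(self):
        self.container = io.BytesIO()   # stands in for a large on-disk extent
        self.index = {}                 # image_id -> (offset, length)

    def put(self, image_id: str, data: bytes) -> None:
        offset = self.container.tell()
        self.container.write(data)
        self.index[image_id] = (offset, len(data))

    def get(self, image_id: str) -> bytes:
        offset, length = self.index[image_id]
        self.container.seek(offset)
        return self.container.read(length)

packer = SmallFilePacker()
packer.put("mr-00001.dcm", b"\x00" * 60_000)    # ~60 KB MR image
packer.put("ct-00001.dcm", b"\x00" * 300_000)   # ~300 KB CT image
assert len(packer.get("mr-00001.dcm")) == 60_000
```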


■ Architecture Simplification

  • NoSQL-based distributed metadata management

  • Consolidates three storage tiers into one unified platform

  • Immediate access to all historical studies (0-15 years)

  • Eliminates data silos for AI/analytics readiness
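
The metadata schema is likewise not public; as a hedged illustration, NoSQL-style distributed metadata usually means each file's attributes live in a key-value record sharded across metadata nodes, so lookups stay fast even at billions of entries. Every name and field below is an assumption for illustration only:

```python
import hashlib

# Hypothetical KV-style metadata layout: path -> record, sharded by hash
# so metadata load spreads evenly across metadata nodes.
NUM_METADATA_SHARDS = 16

def shard_for(path: str) -> int:
    digest = hashlib.md5(path.encode()).digest()
    return digest[0] % NUM_METADATA_SHARDS

record = {
    "path": "/pacs/2021/study-123/mr-00001.dcm",
    "size": 61_440,
    "container": "extent-000042",    # consolidated container holding the bytes
    "offset": 1_048_576,
    "tier": "unified",               # one platform, no online/nearline/offline split
}
print(f"record for {record['path']} lives on metadata shard "
      f"{shard_for(record['path'])}")
```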


■ Operational Efficiency

  • Online scaling: Add nodes without downtime

  • Automated data rebalancing during hardware refresh

  • Zero manual migration
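
Whether BOSS uses consistent hashing is not stated, but it is a common way to achieve online scaling with automatic rebalancing: adding one node remaps only a small fraction of objects. A generic sketch:

```python
import bisect
import hashlib

def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Generic consistent-hash ring: adding a node remaps only a small
    slice of keys, which is what makes online scaling cheap."""
    def __init__(self, nodes):
        # 100 virtual points per node smooth out the distribution.
        self.ring = sorted((h(f"{n}#{v}"), n) for n in nodes for v in range(100))

    def owner(self, key: str) -> str:
        i = bisect.bisect(self.ring, (h(key),)) % len(self.ring)
        return self.ring[i][1]

keys = [f"img-{i}" for i in range(10_000)]
before = Ring(["node1", "node2", "node3"])
after = Ring(["node1", "node2", "node3", "node4"])
moved = sum(before.owner(k) != after.owner(k) for k in keys)
print(f"{moved / len(keys):.0%} of objects move when a 4th node joins")  # ~25%
```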

■ Cost Optimization

  • 40% lower TCO vs. traditional arrays

  • Predictable scaling costs amid 15% annual data growth

  • Future-proof infrastructure for long-term retention


Medical Pathology Scenario Solution

Demand Analysis

As a cornerstone of precision diagnostics, digital pathology enables:

  • Diagnostic specimen analysis

  • Pathology data management

  • Remote/electronic slide review

This technology enhances diagnostic accuracy and accelerates clinical workflows, empowering pathologists to identify disease root causes at the cellular level for targeted treatments. However, Class A tertiary hospitals face critical challenges during digital transformation:

  • Performance bottlenecks: Single slide files (1–3GB) cause lag during retrieval, disrupting diagnostic workflows

  • Exponential data growth: Daily production ranges from hundreds to tens of thousands of slides

  • Massive storage demands: Data volumes 10× larger than PACS; top-tier hospitals generate 1–2PB annually; regulatory mandates require 15–30 year retention

Key Challenges


■ Performance Limitations

  • Legacy systems struggle with GB-scale slide files, causing:

    • Loading delays during review

    • Diagnostic workflow interruptions

    • Compromised clinical efficiency


■ Scalability Constraints

  • Traditional storage cannot accommodate:

    • Unpredictable annual growth (1–2PB+/year)

    • Performance-neutral capacity expansion

    • Decade-spanning archival requirements


■ Data Accessibility Issues

  • Large file sizes create:

    • Network transmission bottlenecks

    • AI/model training limitations

    • Cross-platform sharing barriers


Solution: BOSS Digital Pathology Storage

Xiaoyun's BOSS-FutureStor powered solution delivers:

  • High-density storage on x86/XinChuang servers

  • Unified file/object architecture

  • Enterprise-grade reliability

[Architecture diagram]

Competitive Advantages


■ Breakthrough Performance

  • Intelligent file segmentation enables parallel I/O across nodes (sketched below)

  • 25G/100G network optimization supports smooth, low-latency review

  • Eliminates:

    • Loading stutter

    • Image fragmentation

    • AI workflow bottlenecks
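
As a generic illustration of segmentation plus parallel I/O (chunk size, worker count, and the fetch function are assumptions, not BOSS specifics): a 2GB slide striped into 64MB chunks across nodes can be read by many workers at once, so load time approaches per-chunk latency rather than whole-file transfer time.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_MB = 64                        # assumed stripe size
SLIDE_MB = 2_048                     # a ~2 GB digital pathology slide
num_chunks = SLIDE_MB // CHUNK_MB    # 32 stripes spread across storage nodes

def fetch_chunk(chunk_id: int) -> bytes:
    # Stand-in for a network read of one stripe from the node that holds it;
    # a tiny dummy payload substitutes for the real 64 MB of pixel data.
    return bytes([chunk_id % 256]) * 1_024

with ThreadPoolExecutor(max_workers=16) as pool:
    chunks = list(pool.map(fetch_chunk, range(num_chunks)))

slide = b"".join(chunks)             # viewer reassembles stripes in order
print(f"fetched {num_chunks} stripes in parallel, {len(slide)} placeholder bytes")
```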


■ Elastic Scalability

  • Online expansion to 4,096 nodes

  • Capacity/performance scaling without downtime

  • Future-proof infrastructure for 30-year compliance


■ Hardware Heterogeneity

  • Proprietary architecture (not Ceph-based)

  • Mixed hardware generations within clusters:

    • Cross-vendor server compatibility

    • Dynamic component integration

  • 30% lower refresh costs vs. vendor-locked proprietary arrays

Gene Sequencing Scenario Solution

Demand Analysis

Genetic testing uses specialized equipment to examine the DNA in a subject's cells, analyzing which gene types and defects are present and whether their expression is normal, thereby supporting disease screening, diagnosis, recurrence monitoring, targeted-medication guidance, and assessment of treatment efficacy and prognosis.

High-throughput sequencing, led by second- and third-generation technologies, has advanced rapidly over the past 20 years, and the basic applications, scientific research, and clinical uses built on it have grown sharply. With the rapid development of "precision medicine", clinical demand for high-throughput sequencing continues to grow, and fields such as pathogen diagnostics, genetic-disease testing, and precision diagnosis of tumors place ever-higher requirements on the technology.

Gene sequencing generates data at TB scale. A single MGI DNBSEQ-T7 sequencer, for example, outputs 4.5Tb per 24h run (6Tb per 30h run); at full load it produces roughly 1.7PB of data per year. Bioinformatics analysis then typically generates intermediate files and results around five times the raw volume, so supporting one year of DNBSEQ-T7 output requires about 8.5PB of effective storage. On top of this, industry regulations require medical data to be retained for 15–30 years. The storage system behind a sequencing business therefore faces severe challenges in capacity, performance, scalability, and reliability.
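
A rough reproduction of that sizing (treating, as the text implicitly does, one sequenced base as roughly one stored byte; the figures are the brochure's, the arithmetic is just a check):

```python
# Sizing check for one DNBSEQ-T7 at full load, per the figures above.
raw_per_day_tb = 4.5                              # ~4.5 Tb per 24 h, read as ~4.5 TB/day
raw_per_year_pb = raw_per_day_tb * 365 / 1024     # ≈ 1.6 PB, quoted as ~1.7 PB
analysis_factor = 5                               # intermediates + results ≈ 5x raw
effective_pb = raw_per_year_pb * analysis_factor  # ≈ 8 PB, quoted as ~8.5 PB
print(f"raw ≈ {raw_per_year_pb:.1f} PB/year, effective ≈ {effective_pb:.1f} PB")
```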

■ Storage Scalability

  Sequencer throughput keeps climbing, with high-throughput instruments producing TB-scale data daily. A BGI MGI DNBSEQ-T7 running four flow cells back-to-back, for example, can complete 60 personal whole-genome sequences in 24 hours and generate up to 6TB per day (around 2PB per year), while bioinformatics analysis adds intermediate files and results several times the raw volume. The storage system must therefore deliver low-cost, long-term retention of massive genomic data together with lifecycle management such as online analysis and archiving. In addition, the raw files coming off the sequencer typically run from several GB to tens of GB each, and users need to import them quickly before analysis and interpretation can begin. The system must offer very large capacity, support large single files, and provide exceptional elastic scalability.

■ Storage Performance

  Genomic analysis, depending on the application and the specialized software involved, requires compute and storage resources that can sustain mixed workloads. Within the sequencing pipeline, stages such as sequence alignment and results analysis are extremely time-consuming and rely on many specialized bioinformatics tools, so compute power, storage I/O performance, and solution-level optimization are critical to bioinformatics R&D efficiency. The underlying storage system must support complex, highly concurrent reads and writes to meet the demands of these analytical workloads.

■ Storage Reliability

  A complete sequencing analysis involves many stages, enormous data volumes, very large numbers of intermediate results, and complex reference knowledge bases, while the platform must support many users running online analyses at once. Analysis pipelines therefore demand high real-time performance and stability: if the storage or compute system fails, analysis is interrupted and the entire pipeline may have to be rerun. Gene sequencing thus requires a storage system that can sustain continuous 7×24 high-pressure operation and remain stable over long periods to guarantee business continuity.

Solution

To address these challenges, Xiaoyun released a Bihai distributed storage solution tailored to gene sequencing. Built on Xiaoyun's self-developed BOSS-FutureStor distributed storage software running on commodity x86 or mainstream XinChuang servers, it provides large-capacity, high-performance, highly reliable, and easily scalable distributed file/object storage. The system supports an EB-scale single namespace with on-demand linear scaling of capacity and performance, and its high reliability, availability, and concurrency help users build a unified genomic data resource pool that serves as an integrated storage foundation for upstream sequencing applications, keeping the sequencing business running stably 7×24.

Solution Advantages

■ High Performance

  Sequencing workloads demand strong storage performance for reading, writing, and transferring data. Bihai's multi-threaded concurrent I/O, balanced performance allocation across clients, and non-degrading performance under massive file counts match the high-performance needs of large-scale analysis at every stage of genetic testing, sustaining the capacity and performance growth of the business and greatly accelerating bulk data distribution and genomic computation.

■ On-Demand Scaling

  Sequencing data is vast and grows quickly, placing extreme demands on scalability. Bihai's easy expansion avoids large one-time investments and lengthy procurement cycles: capacity and performance can be scaled linearly on demand, keeping annual storage costs quantifiable and economical. A Bihai cluster scales to 4,096 storage nodes, and new nodes can join the existing pool online at any time to expand capacity and compute power as sequencing data accumulates.

■ Hardware Heterogeneity Support

  Sequencing data is not only massive but retained for a long time, while storage hardware generations turn over quickly, so the distributed storage system must tolerate heterogeneous hardware. Bihai's software is fully self-developed rather than based on the open-source Ceph architecture, and it supports hardware heterogeneity at the storage-pool or cluster level: pools and clusters can be built from servers and components of different brands, models, and specifications while keeping performance consistent and operation stable and reliable. This brings great compatibility and convenience to future expansion and makes scale-out costs lower and more controllable.

■ Intelligent Data Management

  Bihai supports intelligent data movement and hot/cold tiering to optimize storage costs. Data can be migrated automatically to low-cost, high-capacity pools, satisfying both fast genomic computation and cost optimization for massive genomic datasets; this helps control storage spend and improves data utilization (an illustrative tiering rule is sketched below).
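
The policy engine itself is not described; as a hedged sketch, an age-based rule that moves sequencing runs untouched for 90 days from a hot pool to a low-cost capacity pool could look like this (names and threshold are illustrative, not Bihai defaults):

```python
from dataclasses import dataclass

HOT_POOL, COLD_POOL = "nvme-pool", "capacity-pool"
COLD_AFTER_DAYS = 90            # assumed policy threshold, for illustration

@dataclass
class SequencingRun:
    run_id: str
    days_since_access: int
    pool: str = HOT_POOL

def apply_tiering(runs):
    """Move runs idle past the threshold to the low-cost capacity pool."""
    for run in runs:
        if run.pool == HOT_POOL and run.days_since_access >= COLD_AFTER_DAYS:
            run.pool = COLD_POOL    # a real system would migrate the data online
    return runs

runs = [SequencingRun("T7-2023-001", 120), SequencingRun("T7-2023-045", 7)]
for run in apply_tiering(runs):
    print(run.run_id, "->", run.pool)
```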

In summary, the Bihai distributed storage solution delivers high-performance, highly reliable, and scalable storage for gene sequencing data, effectively addressing the field's storage and analysis challenges and providing strong support for research and clinical applications.

Typical Customer Cases
