Construction Practice of Comprehensive Operation and Maintenance Platform for Futures Enterprises
Customer Profile
The case client is a futures company in Hainan with a registered capital of nearly RMB 300 million. It currently has 9 branch offices, in Shanghai, Shenzhen, Dalian, Zhengzhou, Xi'an, Zhejiang, Shandong, Guangdong, and elsewhere.
Pain point analysis
As business volume grew and the cost of operation and maintenance services rose, the client came under increasing pressure to keep basic services running in its two main data centers in Shanghai and Haikou. The core business systems had a high failure rate, server hard disks in particular, yet faulty servers could not be detected promptly, which put the business at risk.
The client therefore wanted to introduce an efficient and stable operation and maintenance monitoring system that integrates its existing operation and maintenance systems, in order to: gain a comprehensive and accurate view of the status of the company's business systems; focus monitoring on critical components such as server hard drives; provide performance monitoring for key business applications; and manage core business system security and computer room asset information in a unified way. The goals were as follows:
(1) Use the basic operation and maintenance platform to keep the systems healthy and put the business systems into a stable, virtuous cycle;
(2) Unify the monitoring and onboarding of equipment in the two main computer rooms, moving equipment supervision from dispersed to centralized;
(3) Create a unified portal that manages platform entry points centrally and reduces the number of entry points to maintain across scenarios.
Solution
Based on the client's operational pain points and project goals, the LeWei solution team analyzed the project and drew up a specific construction plan: a comprehensive operation and maintenance monitoring solution with the operation and maintenance portal, unified monitoring, and centralized alarm management at its core, supplemented by asset management, visualization, and other functions.
1. Deployment architecture
The monitored objects in this project include network equipment, servers, and virtualization platforms, fewer than 300 in total. Based on the number, type, and monitoring frequency of these objects, the system is deployed as follows:

Architecture Description:
Monitoring server:
1. Responsible for data collection in the Shanghai and Haikou data centers (a proxy may be added in the Haikou data center in the future).
2. Access to the public network is prohibited.
Proxy server:
Responsible for receiving alarm information from the monitoring server and forwarding it to the public Enterprise WeChat servers and Tencent Enterprise Email servers.
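The forwarding role of the proxy server can be illustrated with a minimal Python sketch. It is not the LeWei implementation: the `Alarm` fields, the `forward_alarm` function, and the channel names are hypothetical, and each channel is modeled as a plain callable standing in for a real Enterprise WeChat or email sender.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Alarm:
    host: str      # device that raised the alarm (hypothetical field)
    severity: str  # e.g. "warning" or "critical"
    message: str   # human-readable description


# A channel is any callable that takes the formatted text and delivers it.
Channel = Callable[[str], None]


def format_alarm(alarm: Alarm) -> str:
    """Render an alarm as one line of text suitable for chat or email."""
    return f"[{alarm.severity.upper()}] {alarm.host}: {alarm.message}"


def forward_alarm(alarm: Alarm, channels: Dict[str, Channel]) -> Dict[str, bool]:
    """Send the alarm to every channel and report per-channel success.

    A failure in one channel is recorded but does not block the others.
    """
    text = format_alarm(alarm)
    results: Dict[str, bool] = {}
    for name, send in channels.items():
        try:
            send(text)
            results[name] = True
        except Exception:
            results[name] = False
    return results


if __name__ == "__main__":
    outbox: List[str] = []
    channels = {
        "wechat": outbox.append,  # stand-in for an Enterprise WeChat sender
        "email": outbox.append,   # stand-in for an enterprise email sender
    }
    alarm = Alarm("sh-db-01", "critical", "disk 2 predicted failure")
    print(forward_alarm(alarm, channels))
```

Keeping the channels behind a simple callable interface means the monitoring server never needs public network access itself; only the proxy holds the real senders.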
2. Platform technical requirements
The platform adopts a distributed (server + proxy) architecture, and its database supports a primary/backup mechanism. It provides dynamic monitoring and visualization scenes, detects system failures promptly, and delivers multi-channel alarms that are tiered by level and scoped by permission.
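Hierarchical alarming, in which an unhandled alarm is passed on to further recipients over time, can be sketched as follows. This is only an illustration under stated assumptions: the 30- and 60-minute thresholds, the recipient group names, and the `recipients_for` function are hypothetical and do not come from the platform.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (minutes the alarm has gone unacknowledged, recipient group).
# Hypothetical policy values chosen only for illustration.
ESCALATION_POLICY: List[Tuple[int, str]] = [
    (0, "on-duty engineer"),
    (30, "backup engineer"),
    (60, "department lead"),
]


def recipients_for(raised_at: datetime, now: datetime, acknowledged: bool) -> List[str]:
    """Return every group that should have been notified by `now`.

    Once an alarm is acknowledged, escalation stops and the list is empty.
    """
    if acknowledged:
        return []
    waited = now - raised_at
    return [
        group
        for minutes, group in ESCALATION_POLICY
        if waited >= timedelta(minutes=minutes)
    ]


if __name__ == "__main__":
    raised = datetime(2024, 1, 1, 9, 0)
    print(recipients_for(raised, raised + timedelta(minutes=45), acknowledged=False))
```

Expressing the policy as data makes it straightforward to add further escalation levels without changing the logic.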
3. Platform architecture requirements
1 | System architecture requirements | The monitoring system supports distributed deployment, providing unified monitoring and management across different network areas |
2 | Data backup requirements | The monitoring system database supports a primary/backup mechanism and can use a distributed database to ensure high data availability |
3 | Ready to use out of the box, supports customization | The monitoring system must ship with a rich set of monitoring templates that encode best practices for monitoring items, monitoring thresholds, and alarm methods. Users can also customize the monitoring templates |
4 | Alarm convergence | The monitoring system must aggregate alarms, supporting aggregation and convergence at the device, monitoring item, and business system levels |
5 | Alarm escalation | Supports alarm escalation management. When a device alarm has gone unprocessed for a long time, the system automatically sends the alarm content to backup personnel or department leaders. Multiple escalation levels are supported |
6 | Custom grouping | The monitoring system supports grouping and management from both the device and the business system perspectives. Devices are grouped, displayed, and managed by type, such as servers, networks, storage, and security. For business systems, users can define and manage their own device groups |
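7 | Data analysis | The monitoring system must support retrospective analysis of historical data for monitoring items, together with chart display of that data, from the perspective of different device types such as servers, networks, and storage |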
8 | Permission management | The monitoring system supports role-based authorization of monitoring permissions. Permissions can be scoped by device group, function menu, and other dimensions |
10 | Support millisecond level detection | The monitoring system has practical monitoring scenarios for the futures industry, including Webservice monitoring, millisecond-level Ping monitoring (for networks used in high-frequency trading), real-time monitoring of trading indicator data through integration with the futures Comprehensive Transaction Platform (CTP), and more |
11 | License scale of the monitoring platform software | Provide 300 monitoring nodes |
12 | Monitoring visualization implementation service | Based on the unified monitoring platform, deliver an implementation service for dynamic monitoring visualization scenes. Provide dynamic monitoring of IT infrastructure SLAs, including a visual display of the health level (SLA) of grouped infrastructure devices and visual monitoring of key indicators such as the interconnection status between core devices across multiple network environments, device status, and core links |
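The alarm convergence requirement in the table above (aggregation by device, monitoring item, or business system) can be sketched in a few lines of Python. This is an illustrative sketch rather than the platform's actual logic: the `Alarm` fields and the `aggregate` and `summarize` helpers are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Alarm(NamedTuple):
    device: str           # e.g. "srv-1"
    item: str             # monitoring item, e.g. "disk"
    business_system: str  # e.g. "trading"
    message: str


def aggregate(alarms: Iterable[Alarm], key: str) -> Dict[str, List[Alarm]]:
    """Group alarms by 'device', 'item', or 'business_system'."""
    if key not in ("device", "item", "business_system"):
        raise ValueError(f"unsupported aggregation key: {key}")
    groups: Dict[str, List[Alarm]] = defaultdict(list)
    for alarm in alarms:
        groups[getattr(alarm, key)].append(alarm)
    return dict(groups)


def summarize(groups: Dict[str, List[Alarm]]) -> List[str]:
    """Collapse each group into one notification line, sorted by group name."""
    return [f"{name}: {len(items)} alarm(s)" for name, items in sorted(groups.items())]


if __name__ == "__main__":
    alarms = [
        Alarm("srv-1", "disk", "trading", "disk 0 failed"),
        Alarm("srv-1", "cpu", "trading", "cpu usage high"),
        Alarm("sw-1", "port", "crm", "port down"),
    ]
    for line in summarize(aggregate(alarms, "business_system")):
        print(line)
```

Sending one summary line per group instead of one message per alarm is what keeps a single faulty device from flooding the notification channels.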
4. Core functions
4.1. Operation and maintenance portal
The solution introduces an operation and maintenance portal that provides a single, centrally maintained entry point to systems such as CRM, Boyi, and Wenhua, eliminating the need to switch between multiple systems.
4.2. Central Monitoring
Building on the full-stack monitoring capability of LeWei Monitoring, the platform monitors indicators such as availability and performance from the IT infrastructure up to the business systems. After the client's resources were sorted out, the LeWei intelligent monitoring platform provides centralized monitoring of the client's software and hardware resources, as follows:
Hardware
Host: x86 servers from DELL, HP, ACE, and others
Network equipment: Huawei, Shanshi
Software
Virtualization: vCenter
The equipment in the client's two main data centers in Shanghai and Haikou is connected to one unified monitoring platform, achieving full monitoring coverage of information infrastructure resources and automated monitoring management of business-critical equipment. Adjustments can be made through configuration, reducing labor costs.
4.3. Centralized display of monitoring objects
Monitored objects are classified and displayed automatically, with counts, health status, and alarm numbers for each object type. Users can view all current IT resource objects as a whole, along with rankings such as top CPU usage, top memory usage, and top server temperature. They can also see at a glance whether the current IT status is normal, as well as the daily number of alarms and their recovery status.
This gives operation and maintenance personnel a more accurate and intuitive view of overall conditions, without logging in to each system or device separately for tedious inspection work.
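The rankings and status counts on such an overview page amount to simple aggregations over the latest metric values. The following Python sketch shows the idea; the host names, metric values, and the `top_n` and `health_summary` helpers are hypothetical and are not taken from the platform.

```python
import heapq
from collections import Counter
from typing import Dict, List, Tuple


def top_n(metrics: Dict[str, float], n: int = 5) -> List[Tuple[str, float]]:
    """Return the n hosts with the highest metric value, highest first."""
    return heapq.nlargest(n, metrics.items(), key=lambda kv: kv[1])


def health_summary(statuses: Dict[str, str]) -> Dict[str, int]:
    """Count how many monitored objects are in each health status."""
    return dict(Counter(statuses.values()))


if __name__ == "__main__":
    cpu_usage = {"srv-1": 10.0, "srv-2": 95.5, "srv-3": 60.0}
    statuses = {"srv-1": "ok", "srv-2": "alarm", "srv-3": "ok"}
    print(top_n(cpu_usage, 2))
    print(health_summary(statuses))
```

The same `top_n` helper works for memory usage or server temperature by passing a different metric dictionary.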

4.4. Asset management
Because its asset base is small, the client wanted only basic asset management capabilities to make asset maintenance easier. The LeWei solution therefore provides a simple but practical asset management module.
The asset management module includes functions such as a resource list and a directory view. Assets can be organized into directories by business, clearly showing the servers, network equipment, and other devices used by each business system, and custom device fields can record information such as the computer room where a device is located and its purpose. The monitoring server also collects device serial number (SN) information, so users can quickly locate a device during troubleshooting and notify its manufacturer.
In addition, asset-related alarms help the team notice anomalies promptly and respond to faults quickly.
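A directory view by business, custom device fields, and lookup by serial number can be modeled with a very small data structure. The sketch below is an assumption-laden illustration, not the module's real data model: the `Asset` and `AssetRegistry` classes and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Asset:
    name: str
    serial_number: str
    business: str  # business system the device belongs to
    custom: Dict[str, str] = field(default_factory=dict)  # e.g. room, purpose


class AssetRegistry:
    """In-memory asset list indexed by serial number."""

    def __init__(self) -> None:
        self._by_sn: Dict[str, Asset] = {}

    def add(self, asset: Asset) -> None:
        self._by_sn[asset.serial_number] = asset

    def find_by_sn(self, serial_number: str) -> Optional[Asset]:
        """Locate a device quickly during troubleshooting."""
        return self._by_sn.get(serial_number)

    def directory(self) -> Dict[str, List[str]]:
        """Directory view: device names grouped by business system."""
        tree: Dict[str, List[str]] = {}
        for asset in self._by_sn.values():
            tree.setdefault(asset.business, []).append(asset.name)
        return {business: sorted(names) for business, names in sorted(tree.items())}


if __name__ == "__main__":
    registry = AssetRegistry()
    registry.add(Asset("srv-1", "SN001", "trading", {"room": "Shanghai", "purpose": "database"}))
    registry.add(Asset("sw-1", "SN002", "crm", {"room": "Haikou"}))
    print(registry.directory())
    print(registry.find_by_sn("SN001"))
```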

4.5. Visual View
To meet the client's visualization needs, the solution also provides a series of visualization modules, including automatically discovered network topology, business maps, projection views, graphical views, and overview views.
The network topology supports automatic discovery and generation, which helps operation and maintenance personnel quickly sort out resources and their relationships. Linking fault alarms to the topology makes it easier for them to diagnose and locate faults and to analyze their impact range.
Business maps and overview views provide a global display of the business overview and the monitored resource overview. Projection views and graphical views can also be customized to display various statistical charts, providing support for operation and maintenance decisions.
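The impact-range analysis mentioned for the network topology can be understood as a graph traversal: starting from the failed device, collect everything that depends on it. The Python sketch below assumes a simplified topology in which each device maps to the devices directly downstream of it; the `impact_range` function and the device names are hypothetical, and real networks with redundant links would need a more careful model.

```python
from collections import deque
from typing import Dict, List, Set


def impact_range(topology: Dict[str, List[str]], failed: str) -> Set[str]:
    """Return all devices downstream of `failed`, excluding `failed` itself.

    `topology` maps each device to the devices that directly depend on it.
    A breadth-first search visits every reachable dependent exactly once.
    """
    affected: Set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in topology.get(node, []):
            if child != failed and child not in affected:
                affected.add(child)
                queue.append(child)
    return affected


if __name__ == "__main__":
    topology = {
        "core-sw": ["access-sw-1", "access-sw-2"],
        "access-sw-1": ["srv-1", "srv-2"],
        "access-sw-2": ["srv-3"],
    }
    print(sorted(impact_range(topology, "access-sw-1")))
```

Attaching this set to the fault alarm lets the person on duty see which servers are affected without walking the topology by hand.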
4.6. Diverse reports
Supports custom, multi-dimensional, multi-indicator report statistics. Large screen display: centralized monitoring on a large screen with customizable display pages. Alert notification: users are notified through different alert channels such as Enterprise WeChat and Tencent Enterprise Email.
Customer benefits
(1) By comprehensively sorting out IT assets, monitoring the full stack, and providing real-time alerts, the client has established a comprehensive, flexible, and mature operation and maintenance system. It has moved away from traditional "firefighting" operations, effectively improving operation and maintenance efficiency and reducing operation and maintenance costs.
(2) Equipment monitoring and asset management now work together: problems are detected through monitoring, and the affected equipment is quickly located through asset management, which improves fault response speed and optimizes maintenance processes.
(3) Personalized platform access management: by breaking down the barriers between platforms, streamlining and integrating them, minimizing repetitive operations, and managing them through a unified visual interface, the value of the platforms is maximized.
