Case Study | Building a Comprehensive Operation and Maintenance Monitoring Platform for a Large Home Furnishing Enterprise - Lewei Software
Project background
Customer profile:
The case client is a large home furnishing enterprise listed on the A-share market, focusing on the research, development, production, and sale of customized home furnishing products for dining rooms, bedrooms, and whole houses. It owns several brand series and has strategic partnerships with American and Italian home furnishing brands. Its business covers more than 120 countries and regions worldwide, and it operates over 6,000 brand specialty stores.
Pain point analysis:
The client's IT estate currently comprises close to 1,300 resources. In addition to traditional operating systems, network equipment, servers, databases, and storage, the client has introduced cloud platforms, containers, virtualization platforms, network links, and more. The original operation and maintenance system was gradually becoming unable to support these information systems or meet their maintenance requirements. The main symptoms were:
- Resources are numerous and scattered, old and new systems are mixed, and the IT environment is heterogeneous, making it difficult to gain a complete picture of system health and to maintain the systems;
- The monitoring systems are decentralized, so no centralized, unified alarm response center could be established, making it difficult to respond quickly to anomalies and to locate and troubleshoot faults;
- The link between business and operation and maintenance is weak. Most business system anomalies are discovered only through feedback from front-line business staff, and the operation and maintenance systems and personnel have little visibility into the state of the business;
- Operation and maintenance management methods are outdated, rights and responsibilities are unclear, and buck-passing occurs from time to time.
Solution:
Lerwee tailored a comprehensive operation and maintenance monitoring solution to the customer's pain points and specific needs, building an intelligent monitoring platform. The platform integrates functional modules such as unified monitoring, centralized alarm, report management, permission management, business service management, and an operation and maintenance cockpit, providing a one-stop operation and maintenance monitoring experience.
Unified monitoring:
Unified monitoring is the core of the entire solution. The solution integrates and restructures the customer's existing monitoring systems, merging the previously scattered systems into a single monitoring platform.
Given the client's internal network environment, the solution uses a distributed deployment that provides one-stop monitoring of the client's IT resources without affecting the normal operation of the business systems. The indicators of each piece of IT infrastructure are analyzed and managed individually to keep the business running efficiently and stably.

Lewei Monitoring supports dozens of protocols, and its monitoring capabilities cover IT resources from the vast majority of manufacturers and brands on the market. With automatic discovery and onboarding, it quickly brought nearly 1,300 monitoring objects under management, including operating systems, network devices, servers, databases, web services, middleware, storage, virtualization platforms, links, cloud platforms, and containers.

Lewei Monitoring also provides an operation and maintenance cockpit with a global view. As part of the Lewei Monitoring visualization system, the cockpit displays in one place the types and quantities of monitored resources, an alarm overview, and various TOPN data. It is particularly suited to operation and maintenance managers who need to keep track of the overall running status of the enterprise's information systems.

Centralized alarm:
Before introducing Lewei Monitoring, the customer had already built two main alarm management systems: the alarm system provided by the resource vendor and an alarm platform based on Zabbix. There was also some scattered alarm information outside these two systems.
After the introduction of Lewei Monitoring, its Alarm Center module integrated the alarm information from the original Zabbix platform and from the vendor alarm system, and took direct control of the remaining scattered alarm information. One platform now manages all three alarm sources and displays them in a unified way, improving efficiency.
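The idea of bringing several alarm sources into one view can be illustrated with a minimal sketch. It is not Lewei Monitoring's implementation: the `Alarm` record, the event field names, the vendor severity labels, and the way Zabbix's 0 to 5 trigger severities are collapsed onto a three-level scale are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Shared severity scale (an assumption made for this sketch).
SEVERITY_RANK = {"critical": 0, "warning": 1, "info": 2}

# Zabbix trigger severities run from 0 (not classified) to 5 (disaster);
# how they collapse onto the shared scale is a choice made here.
ZABBIX_SEVERITY = {0: "info", 1: "info", 2: "warning",
                   3: "warning", 4: "critical", 5: "critical"}


@dataclass(frozen=True)
class Alarm:
    source: str          # upstream system that raised the alarm
    host: str
    severity: str        # one of the keys of SEVERITY_RANK
    message: str
    raised_at: datetime  # timezone-aware


def from_zabbix(event: dict) -> Alarm:
    """Convert a Zabbix-style event dict (hypothetical field names)."""
    return Alarm(
        source="zabbix",
        host=event["host"],
        severity=ZABBIX_SEVERITY.get(int(event["severity"]), "info"),
        message=event["name"],
        raised_at=datetime.fromtimestamp(int(event["clock"]), tz=timezone.utc),
    )


def from_vendor(event: dict) -> Alarm:
    """Convert a vendor alarm dict (hypothetical field names and levels)."""
    level = event["level"].lower()
    return Alarm(
        source="vendor",
        host=event["device"],
        severity=level if level in SEVERITY_RANK else "info",
        message=event["text"],
        raised_at=datetime.fromisoformat(event["time"]),
    )


def unified_view(alarms: list) -> list:
    """Order alarms from all sources by severity, then by time raised."""
    return sorted(alarms, key=lambda a: (SEVERITY_RANK[a.severity], a.raised_at))


alarms = unified_view([
    from_zabbix({"host": "db-01", "severity": "2",
                 "name": "Disk usage high", "clock": "1685577600"}),
    from_vendor({"device": "san-01", "level": "CRITICAL",
                 "text": "Controller failure",
                 "time": "2023-06-01T00:05:00+00:00"}),
])
for alarm in alarms:
    print(alarm.source, alarm.host, alarm.severity, alarm.message)
```

Converting every source into one record type at the boundary means sorting, filtering, and notification logic only has to be written once, whichever system the alarm came from.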

Report Management:
Before introducing Lewei Monitoring, the client company had accumulated a large amount of operation and maintenance data. However, because the operation and maintenance systems were scattered and the data sat in silos, there were no suitable analysis tools, and the value of this data was not being effectively explored or used.
To address this, Lewei Monitoring includes a report management module that provides real-time reports, TOPN reports, traffic reports, daily and weekly reports, custom reports, and inspection reports. These track the real-time status and trends of monitored resources and support operation and maintenance decisions.
Taking the outbound Internet traffic that the customer is particularly concerned about as an example (see the accompanying figure), operation and maintenance personnel can use real-time reports to see which business resources are currently consuming outbound Internet bandwidth. They can also view the port inbound and outbound bandwidth utilization and the port sending rate, and use these three indicators to quickly assess the business situation at a given time.
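How such indicators are commonly derived can be sketched as follows. This is a generic illustration, not the product's report logic: it assumes the platform samples a port's cumulative byte counter twice and knows the link speed, and the function names and sample figures are hypothetical. Counter wrap-around is deliberately ignored.

```python
import heapq


def utilization_pct(octets_start: int, octets_end: int,
                    interval_s: float, link_speed_bps: float) -> float:
    """Bandwidth utilization over one sampling interval, as a percentage.

    Uses the usual counter-delta formula:
        100 * (bytes transferred * 8) / (interval in seconds * link speed in bit/s)
    """
    if interval_s <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    bits = (octets_end - octets_start) * 8
    return 100.0 * bits / (interval_s * link_speed_bps)


def top_n(usage_by_resource: dict, n: int) -> list:
    """Return the n resources with the highest usage, largest first."""
    return heapq.nlargest(n, usage_by_resource.items(), key=lambda kv: kv[1])


# A 1 Gbit/s port that sent 3.75 GB in 60 seconds ran at half capacity.
outbound = utilization_pct(0, 3_750_000_000, 60, 1_000_000_000)

# Hypothetical outbound traffic per business resource, in Mbit/s.
traffic_mbps = {"erp": 120.0, "mes": 300.5, "oa": 42.0}
heaviest = top_n(traffic_mbps, 2)
print(outbound, heaviest)
```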

Permission Management:
Because it previously lacked a unified monitoring and management system, the client enterprise could not configure resource permissions in a uniform way, and the responsibilities of operation and maintenance personnel were unclear. This not only caused confusion in resource management but also held back fault response speed and maintenance efficiency, which in turn affected the normal operation of the business systems.
Building on unified monitoring, Lewei Monitoring established a unified permission management mechanism in which permissions are allocated and distributed centrally. Management permissions can be assigned by role and by user, so rights and responsibilities are clear and do not conflict.
Lewei Monitoring divided up the permissions for the 700+ hosts of the business systems in the customer's environment. Each member of the operation and maintenance staff can see only the systems, alarms, alarm notifications, and functions they are responsible for, achieving unified control over both data permissions and functional permissions.
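The split between data permissions (which systems and alarms a user sees) and functional permissions (which features a user may use) can be sketched with a simple role lookup. The role names, system names, and function names below are hypothetical, and this is only one of several ways to model role-based access; it does not describe the product's internals.

```python
from dataclasses import dataclass

# Hypothetical role definitions: the business systems (data permissions)
# and platform functions (functional permissions) each role may access.
ROLE_SYSTEMS = {
    "erp-ops": {"erp"},
    "store-ops": {"pos", "crm"},
}
ROLE_FUNCTIONS = {
    "erp-ops": {"view_alarms", "acknowledge_alarms"},
    "store-ops": {"view_alarms"},
}


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset


def visible_systems(user: User) -> set:
    """Union of the business systems granted by all of the user's roles."""
    systems = set()
    for role in user.roles:
        systems |= ROLE_SYSTEMS.get(role, set())
    return systems


def visible_alarms(user: User, alarms: list) -> list:
    """Data permission: keep only alarms from systems the user may see."""
    allowed = visible_systems(user)
    return [alarm for alarm in alarms if alarm["system"] in allowed]


def can_use(user: User, function: str) -> bool:
    """Functional permission: true if any role grants the function."""
    return any(function in ROLE_FUNCTIONS.get(role, set()) for role in user.roles)


alarms = [
    {"system": "erp", "message": "Database connection pool exhausted"},
    {"system": "pos", "message": "Store gateway unreachable"},
]
alice = User("alice", frozenset({"erp-ops"}))
print(visible_alarms(alice, alarms))
print(can_use(alice, "acknowledge_alarms"))
```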

Business Services:
Numerous, diverse business systems and disconnected operation and maintenance systems are a common pain point in the operation and maintenance of large enterprises. In practice, this means that operation and maintenance personnel see only isolated node failures and have no intuitive view of a failure's cause or impact. This easily leads to treating symptoms rather than root causes, so that the essence of the problem is sometimes missed, the same work is repeated, and efficiency suffers.
Because the customer environment contains many business systems, Lewei Monitoring provides a range of business service management capabilities from a business perspective, including the business tree, business topology, and business large screen.
For large enterprise groups with complex organizational structures, the business tree identifies and separates the business resources managed by organizations at different levels. Operation and maintenance managers can use the business tree to assess operation and maintenance efficiency at each level.
Intelligent business topology automatically discovers business resources by scanning IP addresses and generates a business topology, giving an intuitive view of business system types, device information, and more. Operations personnel can use the topology diagram to identify and focus on important business resource nodes, and to determine how far a faulty node's impact extends across the business systems.
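Determining the impact scope of a faulty node amounts to a graph traversal over the topology. The sketch below assumes the topology is available as a mapping from each node to the nodes that depend on it; the node names and the `impact_scope` helper are hypothetical, and the product may compute this differently.

```python
from collections import deque


def impact_scope(dependents: dict, faulty: str) -> set:
    """Return every node that directly or indirectly depends on `faulty`.

    `dependents` maps a node to the nodes that rely on it. A breadth-first
    search from the faulty node collects everything reachable downstream.
    """
    affected = set()
    queue = deque([faulty])
    while queue:
        node = queue.popleft()
        for downstream in dependents.get(node, ()):
            if downstream != faulty and downstream not in affected:
                affected.add(downstream)
                queue.append(downstream)
    return affected


# Hypothetical topology: a switch carries a database, which serves two
# application servers, which in turn back two business systems.
topology = {
    "switch-01": ["db-01"],
    "db-01": ["app-01", "app-02"],
    "app-01": ["order-system"],
    "app-02": ["order-system", "crm"],
}
print(sorted(impact_scope(topology, "db-01")))
```

Running the same function from `switch-01` additionally returns `db-01`, which shows how a fault lower in the stack widens the affected set.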

The business large screen is also part of the Lewei Monitoring visualization system and displays an overview of all business systems. Color coding makes the health status of each business system clear at a glance.
Customer benefits:
After one year of construction, the comprehensive operation and maintenance monitoring platform completed its first phase and passed acceptance in mid-2023. With the help of this platform, the customer's response speed and overall operation and maintenance support capabilities have improved substantially, raising the overall quality of its information services and the stability and timeliness of its IT response.
The value the monitoring system brings to the customer's operation and maintenance is reflected in:
1. Real-time monitoring and timely alerts. Timely alerting on routine resource usage, the data center environment, equipment components, and so on has improved operation and maintenance response speed;
2. Decision support and proactive operation and maintenance. The reporting system is used to forecast resource and performance consumption so that the team can plan ahead and avoid possible anomalies. For example, if a system inspection report shows that capacity is about to be exhausted, expansion can be carried out in advance;
3. From system operation and maintenance to business operation. Intelligent business topology, the business tree, and related capabilities give a more intuitive view of the structure and health of business systems, providing more systematic and comprehensive support for them;
4. Overall improvement of operation and maintenance management capabilities. Unified permission management resolves the earlier problems of unclear rights and responsibilities and chaotic management, while the operation and maintenance cockpit, the reporting system, and other tools support operation and maintenance decisions.
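The capacity example in point 2 can be made concrete with a small forecast. The sketch below fits a straight line to daily usage samples by ordinary least squares and estimates how many days remain before capacity is reached. The linear-growth assumption, the `days_until_full` name, and the sample figures are illustrative only and do not describe how the platform's inspection reports work.

```python
from typing import Optional


def days_until_full(daily_used_gb: list, capacity_gb: float) -> Optional[float]:
    """Estimate days until capacity is exhausted from a linear usage trend.

    Fits usage = intercept + slope * day by least squares over the samples
    (one per day, oldest first) and divides the remaining headroom by the
    slope. Returns None when usage is flat or shrinking.
    """
    n = len(daily_used_gb)
    if n < 2:
        raise ValueError("need at least two samples to estimate a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_used_gb) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(daily_used_gb))
    slope = sxy / sxx  # growth in GB per day
    if slope <= 0:
        return None
    remaining = max(capacity_gb - daily_used_gb[-1], 0.0)
    return remaining / slope


# Hypothetical volume growing by 10 GB per day with 70 GB of headroom left.
estimate = days_until_full([100, 110, 120, 130], capacity_gb=200)
print(estimate)
```

An estimate like this is only as good as the trend it assumes, so it is best treated as an early-warning signal that prompts a closer look rather than as a precise deadline.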
