When Socrates is said to have declared that “the unexamined life is not worth living,” the Greek philosopher could not have imagined the modern internet with its seemingly limitless capacity to absorb data. Every mouse click, page view, and event seems destined to end up somewhere in a log file. The sheer volume makes juggling all this information a challenge, and this is where a log management database really shines.
Gathering information is one thing; analyzing it is much more difficult. But many business models depend on studying patterns and understanding click flow to gain an edge and justify their margins. The log database should collect data and calculate important statistics. Modern systems are usually closely associated with presentation software that distills the data into visual dashboards and infographics.
What is a log management database?
Log management databases are special cases of time series databases. Information arrives in a constant stream of ordered events, and the log files record it. While many web applications focus on web events, like page views or mouse clicks, there is no reason the databases should be limited to this area. Any sequence of events can be analyzed, such as events from assembly lines, industrial plants, and other manufacturing processes.
For example, a set of log files might follow an assembly line, tracking an article as it reaches different stages of the pipeline. The record can be as simple as noting the end of a step, or it can include additional data about the personalization that occurred at that step, such as paint color or size. If the line works well, many events will be routine and forgettable. But if something goes wrong, the logs can help diagnose which step failed. If products need to be discarded or examined for defects, the logs can narrow this work.
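A minimal sketch of the assembly-line scenario above, assuming a hypothetical newline-delimited JSON log format and made-up item and stage names: each event notes the step an item passed through, and replaying the log in order reveals where an item last stood when something went wrong.

```python
import json
from io import StringIO

# Hypothetical assembly-line events, one JSON object per line, ordered by time.
EVENTS = [
    {"ts": "2021-11-01T09:00:00", "item": "A17", "stage": "cut", "status": "ok"},
    {"ts": "2021-11-01T09:05:00", "item": "A17", "stage": "paint", "status": "ok", "color": "red"},
    {"ts": "2021-11-01T09:09:00", "item": "A17", "stage": "inspect", "status": "fail"},
]

def last_stage(lines, item):
    """Scan log lines in order; return the most recent (stage, status) for an item."""
    result = None
    for line in lines:
        event = json.loads(line)
        if event["item"] == item:
            result = (event["stage"], event["status"])
    return result

log = StringIO("\n".join(json.dumps(e) for e in EVENTS))
print(last_stage(log, "A17"))  # → ('inspect', 'fail')
```

Note how the routine "ok" events carry little weight on their own; their value appears only when a failure needs to be traced back through the pipeline.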
Specialized log processing tools began to appear decades ago, and many focused simply on creating reports that aggregate data to provide statistical insight. They counted events by day, week, or month, then generated statistics on averages, maxima, and minima. Newer tools add the ability to quickly search and generate reports on individual fields, such as an IP address or account name. They can locate particular words or phrases in fields and search for numeric values.
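Both generations of tooling described above can be sketched in a few lines, using hypothetical pre-parsed web-log records: the classic daily aggregates (count, min, max, average) alongside the newer per-field search, here an exact match on an IP address field.

```python
from collections import defaultdict

# Hypothetical parsed web-log records: (date, ip, response_ms)
RECORDS = [
    ("2021-11-01", "10.0.0.5", 120),
    ("2021-11-01", "10.0.0.9", 340),
    ("2021-11-02", "10.0.0.5", 95),
]

def daily_stats(records):
    """Group events by day and compute count, min, max, and mean latency."""
    by_day = defaultdict(list)
    for date, _ip, ms in records:
        by_day[date].append(ms)
    return {d: {"count": len(v), "min": min(v), "max": max(v), "avg": sum(v) / len(v)}
            for d, v in by_day.items()}

def match_ip(records, ip):
    """Simple field search: return records whose ip field equals the query."""
    return [r for r in records if r[1] == ip]

print(daily_stats(RECORDS)["2021-11-01"])  # → {'count': 2, 'min': 120, 'max': 340, 'avg': 230.0}
print(len(match_ip(RECORDS, "10.0.0.5")))  # → 2
```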
What are the challenges of creating a log database?
Log data is often referred to as “high cardinality,” which means that the fields can contain many different values; the value of a timestamp, for instance, is constantly changing. Log databases use algorithms to build indexes that can locate particular entries and to optimize those indexes for a wide variety of values.
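A toy illustration of the cardinality problem, using a plain inverted index over hypothetical event fields: a lower-cardinality field like a user name yields useful posting lists, while a timestamp field produces one posting per event, which is why log databases need indexing strategies tuned for high-cardinality data rather than this naive structure.

```python
from collections import defaultdict

# Hypothetical events: a near-unique "ts" field plus lower-cardinality fields.
EVENTS = [
    {"ts": 1000, "user": "ana", "action": "login"},
    {"ts": 1001, "user": "bo", "action": "click"},
    {"ts": 1002, "user": "ana", "action": "click"},
]

def build_index(events, field):
    """Naive inverted index: field value → positions of matching events."""
    index = defaultdict(list)
    for pos, event in enumerate(events):
        index[event[field]].append(pos)
    return index

user_index = build_index(EVENTS, "user")
print(user_index["ana"])  # → [0, 2]

# Indexing "ts" yields one posting per event: the high-cardinality case where
# a plain inverted index buys little over a time-ordered scan.
ts_index = build_index(EVENTS, "ts")
print(all(len(p) == 1 for p in ts_index.values()))  # → True
```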
Good log databases can manage archives to keep some data while discarding other data. They can also enforce a retention policy designed by compliance officers to answer any legal questions, then destroy the data when it is no longer needed to save money. Some log analysis systems may keep statistical summaries or aggregated metrics for older data.
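A minimal sketch of such a retention pass, with made-up event tuples and a hypothetical seven-day policy: raw events newer than the cutoff are kept, while older ones are destroyed but rolled into an aggregated summary first.

```python
# Hypothetical events: (day_number, bytes_served)
EVENTS = [(1, 500), (1, 700), (2, 300), (9, 400), (10, 250)]

def apply_retention(events, now, keep_days):
    """Keep raw events newer than the cutoff; roll expired ones into a summary."""
    cutoff = now - keep_days
    kept = [e for e in events if e[0] > cutoff]
    expired = [e for e in events if e[0] <= cutoff]
    summary = {"events": len(expired), "total_bytes": sum(b for _, b in expired)}
    return kept, summary

kept, summary = apply_retention(EVENTS, now=10, keep_days=7)
print(kept)     # → [(9, 400), (10, 250)]
print(summary)  # → {'events': 3, 'total_bytes': 1500}
```

The summary dict stands in for the aggregated metrics a real system would preserve; the raw detail is gone, but aggregate questions about the old period can still be answered.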
How do existing databases approach the market?
Traditional database companies have generally not focused on providing tools for storing logs, because traditional relational databases were not suited to this kind of high-cardinality data, which is written far more often than it is read. The cost of building the indexes that are the core offering of a relational database is often not worth it for large log collections, as there simply won’t be enough JOINs to justify it. Time series and log databases tend to avoid using regular relational databases to store the raw information, but they may use them to store some of the statistical summaries generated along the way.
IBM’s QRadar, for example, is a product designed to help identify suspicious behavior in log files. The database inside focuses on finding statistical anomalies. Its User Behavior Analytics (UBA) builds patterns of normal behavior and monitors for departures from them.
Oracle offers a service called Oracle Cloud Infrastructure Logging Analytics that can absorb log files from multiple cloud sources, index them, and apply certain machine learning algorithms. It will find issues ranging from poor performance to security holes. When the log files are analyzed, the data can also be classified according to compliance rules and stored for the future if necessary.
Microsoft’s Azure Monitor will also collect log files and telemetry from across the Azure cloud, and the company offers a wide range of analytics. One example is an SQL API suited to the needs of database administrators who monitor Microsoft’s SQL Server log files.
Who are the emerging companies?
Several log databases are built on Lucene, a popular open source project for building full-text search engines. While it was originally designed to search for particular words or phrases in large blocks of text, it can also split values into different fields, allowing it to function much like a database.
Elastic is a company that offers a tool that runs multiple instances of Lucene on different machines so that it scales automatically as the load increases. The company bundles it with two other open source projects, Logstash and Kibana, to create what it calls the “ELK stack.” Logstash ingests the data from the raw log files into the Elastic database, while Kibana visualizes the results.
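To make the ingestion step concrete, here is a sketch of building a payload for Elasticsearch’s bulk indexing API, which expects newline-delimited JSON: an action line naming the target index, then the document itself, for each event. The `web-logs` index name and the document fields are made up for illustration; a real pipeline would typically let Logstash handle this.

```python
import json

def bulk_body(index, docs):
    """Build an Elasticsearch _bulk request body: an action line followed by
    a source line per document, newline-delimited."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline

docs = [
    {"ts": "2021-11-01T09:00:00", "ip": "10.0.0.5", "status": 200},
    {"ts": "2021-11-01T09:00:02", "ip": "10.0.0.9", "status": 500},
]
body = bulk_body("web-logs", docs)  # "web-logs" is a hypothetical index name
print(body.count("\n"))  # → 4 (two action lines + two source lines)
```

This body would be POSTed to the cluster’s `/_bulk` endpoint with an NDJSON content type; batching writes this way is what keeps ingestion fast under heavy event streams.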
Amazon’s log analytics offering also relies on the open source tools Elasticsearch, Kibana, and Logstash, and specializes in deploying and supporting them on AWS cloud machines. AWS and Elastic recently parted ways, so differences may appear in future releases.
Loggly and LogDNA are two other tools built on Lucene. They integrate with most log file formats and track usage over time to identify performance issues and potential security vulnerabilities.
Not all businesses rely on Lucene, in part because the tool includes many features for full-text search that aren’t as important for log processing, and those features add overhead. Sumo Logic, another performance monitoring company, ingests logs and offers its own version of SQL for querying the database.
Splunk has built its own database to store log information. Customers typically do not work directly with the database but with applications designed to automate monitoring tasks, such as looking for overloaded servers or unusual access patterns that may indicate a breach. Splunk’s database is designed to organize indexes and archive them slowly over time.
EraDB offers another database with a different kernel but the same API as Elastic. It promises faster ingestion and analysis because its engine has been specially designed for high cardinality log files without any overhead that could be useful for text search.
Is there something a log database cannot do?
Log databases are ideal for endless streams of events filled with different values. But not all data sources are populated with high-cardinality fields. Data with frequently repeated values may be stored more compactly in a more traditional tabular structure, which can save space.
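One way to see why repeated values favor tabular storage is dictionary encoding, a common columnar technique sketched here with a made-up status column: each distinct value is stored once, and the column itself becomes a list of small integer codes. The savings evaporate when every value is distinct, which is the high-cardinality case log databases are built for.

```python
def dictionary_encode(values):
    """Encode a column as (dictionary, codes); pays off only when values
    repeat often, i.e. when cardinality is low."""
    dictionary = []
    positions = {}
    codes = []
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        codes.append(positions[v])
    return dictionary, codes

# Low cardinality: three distinct statuses across many rows compress well.
statuses = ["ok", "ok", "fail", "ok", "retry", "ok"]
dictionary, codes = dictionary_encode(statuses)
print(dictionary)  # → ['ok', 'fail', 'retry']
print(codes)       # → [0, 0, 1, 0, 2, 0]
```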
Log systems built on text search engines such as Lucene may also provide functionality that many applications don’t need. In a hypothetical assembly line, for example, there is no need to search for arbitrary strings or words. Supporting arbitrary text search requires more sophisticated indexes that take time to compute and disk space to store.
This article is one of a series on Technology Trends for Enterprise Databases.