Apache IoTDB tutorial series 7: the time series data file format TsFile

Iron head Joe's blog 2021-09-15 10:16:09

The big data ecosystem has many file formats, such as Parquet, ORC and Avro, all designed for nested data. These formats generally have a predefined schema of attributes; data is written row by row, organized by attribute, and stored in columns. However, they generally cannot meet the needs of time series data management. For example, in some time series scenarios each series is written independently and timestamps are not aligned, and query results also need to be sorted by timestamp. TsFile (Time series File) is the file format we designed for time series data. Today I mainly introduce how to use it, targeting version 0.10.

Usage scenarios

The file format is lightweight and well suited to serve as a compressed data package at the edge. The edge can be inside a device, or it can be an industrial PC or the plant level. Data generated on a device can be persisted to a file at any time. A device here might be a wind turbine carrying several measuring points, such as a wind speed sensor and a temperature sensor. The data collected by each sensor is one time series. Lenovo's IoT platform has used TsFile to store time series data since 2017.

So the target scenario of TsFile is managing the time series data of one or more devices.

Device-measurement model

Device (DeviceId): similar to the concept of a table.

Measurement (MeasurementId): a device can have multiple measurements, similar to the columns of a table.

Time series path (Path): a device and a measurement together define a Path(device ID, measurement ID).

Measurement description (MeasurementSchema): each time series has one description, covering data type, encoding and compression.

Each time series has two columns: a time column and a value column.
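The model above can be sketched in plain Java. The classes and method names below are hypothetical stand-ins written for this illustration, not the TsFile classes themselves, and the "SNAPPY" compression string is just an example value. The sketch only shows how a device ID and a measurement ID combine into one series path, and how each series carries its own description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; not the TsFile library classes.
class SeriesModelSketch {

  // Description of one series: data type, encoding, compression (plain strings here).
  static class SchemaSketch {
    final String dataType;
    final String encoding;
    final String compression;

    SchemaSketch(String dataType, String encoding, String compression) {
      this.dataType = dataType;
      this.encoding = encoding;
      this.compression = compression;
    }
  }

  // A series path is the pair (device ID, measurement ID), joined with a dot here.
  static String path(String deviceId, String measurementId) {
    return deviceId + "." + measurementId;
  }

  // Builds a small registry in which each full path maps to its own description.
  static Map<String, SchemaSketch> exampleRegistry() {
    Map<String, SchemaSketch> registry = new LinkedHashMap<>();
    registry.put(path("device_1", "measurement_1"), new SchemaSketch("INT64", "RLE", "SNAPPY"));
    registry.put(path("device_1", "measurement_2"), new SchemaSketch("DOUBLE", "GORILLA", "SNAPPY"));
    // Another device may have a different set of measurements.
    registry.put(path("device_2", "measurement_3"), new SchemaSketch("FLOAT", "RLE", "SNAPPY"));
    return registry;
  }

  public static void main(String[] args) {
    for (Map.Entry<String, SchemaSketch> e : exampleRegistry().entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue().dataType);
    }
  }
}
```

Keying the registry by the full path rather than by measurement name alone is what lets different devices carry different measurements.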

I have been into drawing lately, so here is a picture. It looks roughly like this; different devices can have different measurements.

[Figure: the device-measurement model]

Register metadata

The first step in using TsFile is registering metadata.

Register a time series: Path + MeasurementSchema

You can register time series one by one this way. Registering a time series requires a Path and a MeasurementSchema:

// resolve the target file through TsFile's file system factory
String path = "test.tsfile";
File f = FSFactoryProducer.getFSFactory().getFile(path);
TsFileWriter tsFileWriter = new TsFileWriter(f);
// register the series device_1.measurement_1: INT64 values, RLE encoding
tsFileWriter.registerTimeseries(
    new Path("device_1", "measurement_1"),
    new MeasurementSchema("measurement_1", TSDataType.INT64, TSEncoding.RLE));

Before 0.10, all devices shared one measurement table, and measurements with the same name had to have the same schema (this is where IoTDB's restriction comes from: measurements with the same name in a storage group must have the same type). From 0.10 on, each time series is truly independent, and series do not interfere with one another.

Register devices by template: device template + device

Registering series one at a time is tedious, so TsFile provides device templates. Each template defines a set of MeasurementSchema, for example 10 measurements. When a device is associated with the template, 10 series are registered automatically.

First build the device template, then register it.

Map<String, MeasurementSchema> template = new HashMap<>();
template.put("measurement_1", new MeasurementSchema("measurement_1", TSDataType.INT64, TSEncoding.RLE));
template.put("measurement_2", new MeasurementSchema("measurement_2", TSDataType.DOUBLE, TSEncoding.GORILLA));
tsFileWriter.registerTemplate("template_1", template);

Next, register the devices, associating each with the template by template name:

tsFileWriter.registerDevice("device_1", "template_1");
tsFileWriter.registerDevice("device_2", "template_1");

This registers 2 devices, each with 2 measurements.

Register one template and write data directly

This is a further simplified approach. When only one device template is registered, you can skip registering devices and write data directly. During writing, if a series being written for a device is not yet registered, the writer looks up the MeasurementSchema with the same name in the template and registers it. This carries on the fine tradition of versions before 0.9 (before 0.9, TsFile could register only one template, after which you could write data).
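The lookup described above can be sketched as follows. This is a minimal illustration in plain Java under stated assumptions: `TemplateFallbackSketch` and its methods are hypothetical names invented here, schemas are reduced to a data type string, and nothing below is the actual TsFileWriter logic.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "look up the template on first write"; not TsFileWriter itself.
class TemplateFallbackSketch {
  // The single registered template: measurement name -> data type (simplified to a string).
  private final Map<String, String> template = new HashMap<>();
  // Series registered so far: "device.measurement" -> data type.
  private final Map<String, String> registered = new HashMap<>();

  void registerTemplate(Map<String, String> measurements) {
    template.putAll(measurements);
  }

  // Returns the data type to use for this series, registering it from the template if needed.
  String resolve(String deviceId, String measurementId) {
    String path = deviceId + "." + measurementId;
    String type = registered.get(path);
    if (type == null) {
      type = template.get(measurementId);
      if (type == null) {
        throw new IllegalArgumentException("No schema for " + path);
      }
      registered.put(path, type); // registered lazily, on the first write
    }
    return type;
  }

  boolean isRegistered(String deviceId, String measurementId) {
    return registered.containsKey(deviceId + "." + measurementId);
  }
}
```

A device that was never registered explicitly becomes known the first time one of its measurements is resolved, which is the convenience this mode is after.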

Writing data

TsFile has one restriction on writing: each column must be written in increasing time order, otherwise correctness is not guaranteed.
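Since the file format leaves this ordering to the caller, one option is a small guard on the application side before data reaches the writer. The class below is a hypothetical helper, not part of TsFile, and it assumes that "increasing" means strictly increasing within each series.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical guard, not part of TsFile: rejects points whose timestamp does not
// strictly increase within the same series (strictness is an assumption here).
class TimeOrderGuard {
  private final Map<String, Long> lastTime = new HashMap<>();

  // Returns true and records the timestamp if it is greater than the last accepted
  // timestamp of this series; returns false otherwise.
  boolean accept(String seriesPath, long timestamp) {
    Long last = lastTime.get(seriesPath);
    if (last != null && timestamp <= last) {
      return false;
    }
    lastTime.put(seriesPath, timestamp);
    return true;
  }
}
```

The guard tracks each series separately, so an old timestamp on one series does not block another series of the same device.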

Write one row of data by device: TSRecord

A TSRecord is one device, one timestamp, and the values of multiple measurements, similar to one row of a table.

Write a batch of data by device :Tablet

Ha, Tablet again. Yes, this structure runs through both TsFile and the IoTDB Session. It represents one device and the values of multiple measurements at multiple timestamps, similar to a sub-table. This sub-table cannot contain null values.

Again, this write interface is fast; it can reach write speeds of tens of millions of points per second.
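To make the "sub-table without nulls" idea concrete, here is a minimal columnar batch in plain Java. `TabletSketch` is a hypothetical stand-in, not the TsFile Tablet class, and it assumes every measurement holds double values to keep the example short.

```java
// Hypothetical columnar batch for one device; illustrates the idea only and is
// not the TsFile Tablet class. All values are doubles for simplicity.
class TabletSketch {
  final String deviceId;
  final String[] measurements;
  final long[] timestamps;   // one entry per row
  final double[][] values;   // values[column][row], one column per measurement
  int rowCount = 0;

  TabletSketch(String deviceId, String[] measurements, int maxRows) {
    this.deviceId = deviceId;
    this.measurements = measurements;
    this.timestamps = new long[maxRows];
    this.values = new double[measurements.length][maxRows];
  }

  // Appends one complete row; every measurement must get a value (no nulls).
  void addRow(long timestamp, double... rowValues) {
    if (rowValues.length != measurements.length) {
      throw new IllegalArgumentException("expected " + measurements.length + " values");
    }
    if (rowCount == timestamps.length) {
      throw new IllegalStateException("tablet is full");
    }
    timestamps[rowCount] = timestamp;
    for (int c = 0; c < measurements.length; c++) {
      values[c][rowCount] = rowValues[c];
    }
    rowCount++;
  }
}
```

Storing the batch column by column in primitive arrays avoids creating one object per point, which is one plausible reason a batch interface of this shape can be fast.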

Reading data

The query interface takes a batch of paths and an expression (which can filter on time and value); these in fact correspond to the select and where clauses.

At query time, TsFile's default table structure is a wide table: time, d1.m1, d1.m2, d2.m1, d2.m2. By default, this structure aligns the given query Paths by time and applies the filter conditions.
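The time alignment can be illustrated with a small stand-alone sketch. `WideTableSketch.align` is a hypothetical helper, not the TsFile reader. It assumes that a series with no point at a given timestamp yields null in that cell, and it leaves out the filter expression entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of aligning series by timestamp into a wide table; not the TsFile reader.
class WideTableSketch {
  // series: path -> (timestamp -> value). Returns rows of [time, v1, v2, ...] in
  // ascending time order, with null where a series has no point at that time.
  static List<Object[]> align(List<String> paths, Map<String, TreeMap<Long, Double>> series) {
    TreeMap<Long, Object[]> rows = new TreeMap<>();
    for (int i = 0; i < paths.size(); i++) {
      TreeMap<Long, Double> points = series.get(paths.get(i));
      if (points == null) {
        continue; // unknown path: its column stays null in every row
      }
      for (Map.Entry<Long, Double> p : points.entrySet()) {
        Object[] row = rows.get(p.getKey());
        if (row == null) {
          row = new Object[paths.size() + 1];
          row[0] = p.getKey();
          rows.put(p.getKey(), row);
        }
        row[i + 1] = p.getValue();
      }
    }
    return new ArrayList<>(rows.values());
  }
}
```

Because the rows are collected in a TreeMap keyed by timestamp, the result comes back sorted by time, matching the requirement that query results be ordered by timestamp.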


Today I introduced the data model of the time series file format TsFile, metadata registration, and the write and read processes. That's all.

Please include a link to the original when reposting. Thanks.