
Concepts

This document explains Parseable design and terminology at a high level.

Authentication

The Parseable API requires HTTP basic authentication. All HTTP clients can generate the basic auth header from a username and password. Note that the username and password are set when you start the Parseable server (via the environment variables P_USERNAME and P_PASSWORD).

If you want to generate the basic auth value yourself, use the following command.

echo -n '<username>:<password>' | base64

Then add the following header to your HTTP request.

Authorization: Basic <output-from-previous-command>
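
Alternatively, curl can construct the header for you when you pass the credentials with the -u flag. The URL below is a placeholder for your Parseable server address, and /api/v1/logstream is used here only as an example of an authenticated endpoint.

# placeholder server address; any authenticated Parseable API endpoint works the same way
curl -u '<username>:<password>' http://localhost:8000/api/v1/logstream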

Ingestion

Log ingestion in Parseable is done via an HTTP POST request with a JSON payload. The payload can contain a single log event as a JSON object, or multiple log events as a JSON array.

You can use the HTTP output plugin of any popular logging agent (FluentBit, Vector, LogStash, among others) to send log events to Parseable. You can also integrate Parseable directly with your application via REST API calls.
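
As a sketch of what a direct REST API call looks like, the following curl command posts a JSON array of two events to a stream. The server address, port, and endpoint path here are assumptions for illustration; refer to the ingestion API reference for the exact URL for your version.

# placeholder URL and assumed endpoint path; check the ingestion API reference
curl --location --request POST 'http://localhost:8000/api/v1/logstream/<stream-name>' \
--header 'Authorization: Basic <output-from-previous-command>' \
--header 'Content-Type: application/json' \
--data-raw '[
    {"level": "info", "message": "server started"},
    {"level": "error", "message": "connection refused"}
]'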

Schema

With Parseable you don't need to explicitly define a schema for your log events. When you send the first log event to a Parseable stream, the server automatically detects the schema and enforces it for subsequent log events sent to that stream. You can fetch this schema using the Get schema API.
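
A sketch of fetching the schema with curl is below. The endpoint path is an assumption; see the Get schema API reference for the exact URL.

# assumed endpoint path; see the Get schema API reference
curl --location --request GET 'http://localhost:8000/api/v1/logstream/<stream-name>/schema' \
--header 'Authorization: Basic <output-from-previous-command>'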

Schema evolution

We're working on a fluid schema approach that will allow schema evolution over time. This means you will be able to add new fields to your log events as they evolve, without breaking the schema.

Flattening

Nested JSON objects are automatically flattened. For example, the following JSON object

{
    "foo": {
        "bar": "baz"
    }
}

will be flattened to

{
    "foo.bar": "baz"
}

before it gets stored. When querying, refer to this field as foo.bar, for example: select foo.bar from <stream-name>. The flattened field is available in the schema as well.

Storage

Once the JSON payload reaches the server, it is validated and parsed into the columnar Apache Arrow format in memory. Subsequent events are appended to the Arrow record batch in memory, and a copy is kept on disk (to prevent data loss). Finally, after a configurable duration, the Arrow record batch is converted to Parquet and pushed to an S3 (or compatible) bucket.

Parquet files on object storage are organized into prefixes based on stream name, date, and time. This lets the server fetch a very specific dataset based on the query time range (more on queries in the next section). We're working on a compaction approach that will further compress and optimize storage while keeping the data queryable at all times.
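
As an illustration only (the exact prefix layout may differ across versions), data for a stream could be laid out on the bucket along these lines:

<bucket>/<stream-name>/date=2022-11-17/hour=07/minute=03/<file>.parquet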

Storage Stats

To fetch the ingested data size and the actual compressed (stored) data size for a stream, use the Get Stats API. Sample response:

{
    "ingestion": {
        "format": "json",
        "size": "12800 Bytes"
    },
    "storage": {
        "format": "parquet",
        "size": "15517 Bytes"
    },
    "stream": "reactapplogs",
    "time": "2022-11-17T07:03:13.134992Z"
}
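
A sketch of calling the stats API with curl (the endpoint path is an assumption; see the Get Stats API reference for the exact URL):

# assumed endpoint path; see the Get Stats API reference
curl --location --request GET 'http://localhost:8000/api/v1/logstream/<stream-name>/stats' \
--header 'Authorization: Basic <output-from-previous-command>'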

Query

The Parseable query API works with standard SQL. In addition to the SQL query, you need to specify the time range for which the query should be executed, using the startTime and endTime parameters. The response is inclusive of both timestamps.
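
For illustration, a query request could look like the following. The endpoint path and timestamp format shown here are assumptions; the Postman collection linked below has the exact request.

# assumed endpoint path and timestamp format; see the Query API reference
curl --location --request POST 'http://localhost:8000/api/v1/query' \
--header 'Authorization: Basic <output-from-previous-command>' \
--header 'Content-Type: application/json' \
--data-raw '{
    "query": "select foo.bar from <stream-name>",
    "startTime": "2022-11-17T00:00:00+00:00",
    "endTime": "2022-11-17T23:59:59+00:00"
}'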

Check out the Query API in Postman.

Parseable uses DataFusion, the Apache Arrow native query engine, in conjunction with an efficient Parquet reader to execute queries.