Other Concepts
Authentication
Parseable API calls require Basic Auth. You can find the username and password for your Parseable server in the environment variables P_USERNAME and P_PASSWORD. If these are not set, the default username and password are both admin. Most HTTP clients can generate the Basic Auth header from the username and password for you.
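To add the Basic Auth header manually, Base64-encode the credentials with the following command (the -n flag stops echo from appending a trailing newline, which would otherwise be encoded as well).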
echo -n '<username>:<password>' | base64
Then add the following HTTP header to the API call.
Authorization: Basic <output-from-previous-command>
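If you build requests in code rather than with the shell command, you can compute the header the same way. The following is a minimal Python sketch using only the standard library. The function name basic_auth_header is hypothetical, and the admin/admin values are just the defaults mentioned above.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build an HTTP Basic Auth header (hypothetical helper, not part of Parseable)."""
    # Equivalent to: echo -n '<username>:<password>' | base64
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("admin", "admin"))
# {'Authorization': 'Basic YWRtaW46YWRtaW4='}
```

The returned dict can be passed as the headers argument of an HTTP client, for example urllib.request.Request.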
You can also use OAuth2 tokens for authentication. Refer to the OIDC section for more details.
High Availability
Parseable can be deployed in a high availability configuration. In this configuration, Parseable runs as a cluster of multiple ingestor nodes and a query node, which also serves as the main node. The ingestors can be placed behind a load balancer using a standard round robin configuration. If an ingestor fails, the load balancer automatically routes requests to the remaining healthy ingestors.
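Round robin distribution with failover is normally handled by an external load balancer, not by Parseable itself. To illustrate the behaviour described above, here is a minimal client-side sketch in Python. The class name, the ingestor hostnames, and the manual mark_down/mark_up health signals are all hypothetical; a real load balancer would drive them from periodic health checks.

```python
class RoundRobinBalancer:
    """Hypothetical round robin over ingestor nodes that skips unhealthy ones."""

    def __init__(self, ingestors: list[str]) -> None:
        self.ingestors = list(ingestors)
        self.healthy = set(self.ingestors)
        self._next = 0  # index of the next candidate in the rotation

    def mark_down(self, node: str) -> None:
        """Record that a node failed (a real balancer learns this from health checks)."""
        self.healthy.discard(node)

    def mark_up(self, node: str) -> None:
        """Return a recovered node to the rotation."""
        if node in self.ingestors:
            self.healthy.add(node)

    def pick(self) -> str:
        """Return the next healthy node in round robin order."""
        for _ in range(len(self.ingestors)):
            node = self.ingestors[self._next]
            self._next = (self._next + 1) % len(self.ingestors)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy ingestors available")


lb = RoundRobinBalancer(["ingestor-1", "ingestor-2", "ingestor-3"])
print([lb.pick() for _ in range(3)])  # ['ingestor-1', 'ingestor-2', 'ingestor-3']
lb.mark_down("ingestor-2")
print([lb.pick() for _ in range(3)])  # ['ingestor-1', 'ingestor-3', 'ingestor-1']
```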
The query node executes queries and fetches the data from storage. It can be scaled vertically to handle more queries. Refer to the High Availability section for more details.
Storage
Once the JSON payload reaches the server, it is validated and parsed into the columnar Apache Arrow format in memory. Subsequent events are appended to the in-memory Arrow record batch, and a copy is kept on disk to prevent data loss. After a configurable duration, the Arrow record batch is converted to Parquet and pushed to an S3 (or S3-compatible) bucket.
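The ingestion path can be pictured as a row-to-column buffer with a time-based flush. The Python sketch below is a simplified illustration under stated assumptions: it uses plain lists instead of Apache Arrow, omits the on-disk copy and the Parquet conversion and upload, and the ColumnarBuffer name and flush_after_s parameter are hypothetical rather than Parseable's actual configuration.

```python
import json
import time
from typing import Any, Callable


class ColumnarBuffer:
    """Hypothetical stand-in for an in-memory Arrow record batch: one list per column."""

    def __init__(self, flush_after_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.flush_after_s = flush_after_s
        self.clock = clock
        self.columns: dict[str, list[Any]] = {}
        self.rows = 0
        self.started = clock()

    def append(self, payload: str) -> None:
        """Validate one JSON event and append its fields column by column."""
        event = json.loads(payload)
        if not isinstance(event, dict):
            raise ValueError("expected a JSON object")
        for name in event:
            if name not in self.columns:
                # A new field becomes a new column, null-filled for earlier rows
                self.columns[name] = [None] * self.rows
        for name, values in self.columns.items():
            values.append(event.get(name))  # fields missing from this event become None
        self.rows += 1

    def should_flush(self) -> bool:
        """True once the configured duration has elapsed since the last flush."""
        return self.clock() - self.started >= self.flush_after_s

    def flush(self) -> dict[str, list[Any]]:
        """Hand off the current batch (where Parquet conversion would happen) and reset."""
        batch = self.columns
        self.columns, self.rows, self.started = {}, 0, self.clock()
        return batch


now = [0.0]  # fake clock so the example is deterministic
buf = ColumnarBuffer(flush_after_s=60.0, clock=lambda: now[0])
buf.append('{"level": "info", "msg": "started"}')
buf.append('{"level": "error", "msg": "failed", "code": 500}')
print(buf.should_flush())  # False
now[0] = 61.0
print(buf.should_flush())  # True
print(buf.flush())
# {'level': ['info', 'error'], 'msg': ['started', 'failed'], 'code': [None, 500]}
```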
Parquet files on object storage are organized into prefixes based on stream name, date, and time. This lets the server fetch only the files that fall within the query's time range. We're working on a compaction approach that will further compress and optimize storage while keeping data queryable at all times.
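Time-based prefixes let the server skip everything outside a query's range. The sketch below lists the hour-level prefixes a query would need to read. The layout stream/date=YYYY-MM-DD/hour=HH/ is an assumption for illustration only; check the actual object keys in your bucket before relying on a specific format.

```python
from datetime import datetime, timedelta


def prefixes_for_range(stream: str, start: datetime, end: datetime) -> list[str]:
    """Hour-level prefixes overlapping [start, end). The key layout is an assumption."""
    hour = start.replace(minute=0, second=0, microsecond=0)  # align to the start of the hour
    prefixes = []
    while hour < end:
        prefixes.append(f"{stream}/date={hour:%Y-%m-%d}/hour={hour:%H}/")
        hour += timedelta(hours=1)
    return prefixes


for prefix in prefixes_for_range(
    "reactapplogs", datetime(2022, 11, 17, 22, 30), datetime(2022, 11, 18, 1, 15)
):
    print(prefix)
# reactapplogs/date=2022-11-17/hour=22/
# reactapplogs/date=2022-11-17/hour=23/
# reactapplogs/date=2022-11-18/hour=00/
# reactapplogs/date=2022-11-18/hour=01/
```

Only objects under these prefixes need to be listed and read; everything else in the bucket is skipped for this query.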
Modes
Parseable can use a drive (mount point) or an S3 (or S3-compatible) bucket as backend storage. The storage mode is selected when starting the Parseable server, with the subcommand local-store for drive storage or s3-store for object storage.
Depending on the storage mode, the server requires certain environment variables to be set. Refer to the environment variables section for more details.
We recommend the local-store mode only for development and testing. Note that once the server is started, the storage mode can't be changed.
Stats
To fetch the size of the data ingested into a stream and the compressed size actually stored, use the Get Stats API. Sample response:
{
  "ingestion": {
    "format": "json",
    "size": "12800 Bytes"
  },
  "storage": {
    "format": "parquet",
    "size": "15517 Bytes"
  },
  "stream": "reactapplogs",
  "time": "2022-11-17T07:03:13.134992Z"
}
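To compare the two sizes programmatically, parse the response and divide the stored size by the ingested size. The Python sketch below assumes sizes are always reported as "<integer> Bytes", as in the sample response; the helper names parse_size and storage_ratio are hypothetical.

```python
import json

SAMPLE = """{
  "ingestion": {"format": "json", "size": "12800 Bytes"},
  "storage": {"format": "parquet", "size": "15517 Bytes"},
  "stream": "reactapplogs",
  "time": "2022-11-17T07:03:13.134992Z"
}"""


def parse_size(text: str) -> int:
    """Parse a size string such as '12800 Bytes' (assumes the unit is always Bytes)."""
    value, unit = text.split()
    if unit != "Bytes":
        raise ValueError(f"unexpected unit: {unit!r}")
    return int(value)


def storage_ratio(stats_json: str) -> float:
    """Stored (Parquet) size divided by ingested (JSON) size."""
    stats = json.loads(stats_json)
    ingested = parse_size(stats["ingestion"]["size"])
    stored = parse_size(stats["storage"]["size"])
    return stored / ingested


print(round(storage_ratio(SAMPLE), 3))  # 1.212
```

For the sample response this gives 15517 / 12800, about 1.212. A ratio above 1 means the stored Parquet data is larger than the ingested JSON; a ratio below 1 means storage is smaller than the raw input.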
Storage manifest
For each day, Parseable creates a manifest file that contains metadata about the Parquet files ingested that day. The manifest file is stored in the same S3 bucket as the Parquet files. The query server uses the manifest to select only the Parquet files relevant to the query's filters and time range.
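The pruning step can be sketched as follows. The manifest entry structure here (a path, a time range, and per-column min/max values) is a hypothetical simplification, not Parseable's actual manifest schema. The sketch keeps a file only if its time range overlaps the query range and its column statistics do not rule out an equality filter.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class FileEntry:
    """Hypothetical manifest entry describing one Parquet file."""
    path: str
    min_time: datetime
    max_time: datetime
    # column name -> (min value, max value) seen in this file
    column_ranges: dict[str, tuple[int, int]] = field(default_factory=dict)


def select_files(
    entries: list[FileEntry],
    start: datetime,
    end: datetime,
    column: Optional[str] = None,
    value: Optional[int] = None,
) -> list[str]:
    """Paths of files that may contain rows in [start, end) where column == value."""
    selected = []
    for entry in entries:
        # Skip files whose time range does not overlap [start, end)
        if entry.max_time < start or entry.min_time >= end:
            continue
        if column is not None:
            bounds = entry.column_ranges.get(column)
            # Without stats for the column, keep the file because it might match
            if bounds is not None and not bounds[0] <= value <= bounds[1]:
                continue
        selected.append(entry.path)
    return selected


day = datetime(2022, 11, 17)
entries = [
    FileEntry("a.parquet", day.replace(hour=10), day.replace(hour=10, minute=59), {"status": (200, 204)}),
    FileEntry("b.parquet", day.replace(hour=11), day.replace(hour=11, minute=59), {"status": (200, 503)}),
    FileEntry("c.parquet", day.replace(hour=12), day.replace(hour=12, minute=59), {"status": (500, 504)}),
]
print(select_files(entries, day.replace(hour=10, minute=30), day.replace(hour=12, minute=30), "status", 503))
# ['b.parquet', 'c.parquet']
```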
Access control
Parseable supports multiple users and granular access control over resources, based on the roles assigned to each user. Refer to the Access Control section for more details.
Multitenancy
Parseable supports multitenancy through the Parseable Kubernetes Operator. Using the operator, you can create multiple Parseable instances in a single Kubernetes cluster. Each instance is isolated from the others and has its own storage, users, and access control. Refer to the Operator section for more details.