This document explains Parseable design and terminology at a high level.
Parseable APIs require HTTP Basic authentication. Any HTTP client can generate the Basic auth header from a username and password. Note that the username and password are set when you start the Parseable server (via environment variables).
To generate the Basic auth value yourself, use the following command.
echo -n '<username>:<password>' | base64
Then add the following header to your HTTP request.
Authorization: Basic <output-from-previous-command>
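If you are calling the API from application code rather than a shell, the same header can be built programmatically. The following is a minimal sketch using only the Python standard library; the function name basic_auth_header and the admin/admin credentials are placeholders chosen for illustration, not values defined by Parseable.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Return an Authorization header for HTTP Basic authentication."""
    # Same operation as: echo -n '<username>:<password>' | base64
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials; use the ones your server was started with.
headers = basic_auth_header("admin", "admin")
print(headers["Authorization"])
```

Most HTTP client libraries can also build this header for you when given a username and password, so the helper is only needed when you set headers by hand.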
Log ingestion for Parseable is done via an HTTP POST request with a JSON payload. The payload can contain a single log event as a JSON object or multiple log events as a JSON array.
You can use the HTTP output plugin of any popular logging agent (Fluent Bit, Vector, and Logstash, among others) to send log events to Parseable. You can also integrate Parseable directly with your application via REST API calls.
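To illustrate the two payload shapes described above, here is a hedged sketch that builds (but does not send) ingestion requests with the Python standard library. The base URL, the /api/v1/logstream/<stream> path, the stream name demo, and the helper name build_ingest_request are assumptions for illustration; check the API reference for the actual endpoint.

```python
import json
import urllib.request


def build_ingest_request(base_url, stream, events, auth_header):
    """Build an HTTP POST request carrying one event (dict) or many (list)."""
    # Hypothetical endpoint path, used here only for illustration.
    url = f"{base_url}/api/v1/logstream/{stream}"
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": auth_header},
    )


# A single log event is sent as a JSON object.
single = build_ingest_request(
    "http://localhost:8000", "demo",
    {"level": "info", "message": "service started"},
    "Basic <output-from-previous-command>",
)

# Multiple log events are sent as a JSON array.
batch = build_ingest_request(
    "http://localhost:8000", "demo",
    [{"level": "info", "message": "a"}, {"level": "warn", "message": "b"}],
    "Basic <output-from-previous-command>",
)

# urllib.request.urlopen(single) would send the request to a running server.
```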
With Parseable, you don't need to explicitly define a schema for your log events. When you send the first log event to a Parseable stream, the server automatically detects the schema and enforces it for subsequent log events sent to that stream. You can fetch this schema using the Get schema API.
We're working on a fluid schema approach that will allow the schema to evolve over time. This means you will be able to add new fields to your log events as your logs evolve, without breaking the schema.
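The idea of detecting a schema from the first event and enforcing it afterwards can be sketched as follows. This is a simplified illustration, not Parseable's implementation: the Stream class, the infer_schema helper, and the use of Python type names as field types are all assumptions made for the example.

```python
def infer_schema(event: dict) -> dict:
    """Map each field name to the Python type name of its value."""
    return {key: type(value).__name__ for key, value in event.items()}


class Stream:
    """Toy stream: the first event fixes the schema for later events."""

    def __init__(self):
        self.schema = None
        self.events = []

    def ingest(self, event: dict) -> None:
        schema = infer_schema(event)
        if self.schema is None:
            self.schema = schema  # first event defines the schema
        elif schema != self.schema:
            raise ValueError(f"event does not match stream schema {self.schema}")
        self.events.append(event)


stream = Stream()
stream.ingest({"level": "info", "status": 200})   # schema is detected here
stream.ingest({"level": "warn", "status": 500})   # matches, accepted

try:
    stream.ingest({"level": "info", "extra": 1})  # different fields, rejected
except ValueError as err:
    print("rejected:", err)
```

A fluid schema, in these terms, would relax the equality check so that events with additional fields are accepted and the stored schema is widened instead.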
Nested JSON objects are automatically flattened before they are stored. For example, a field bar nested inside an object foo is flattened to a single field named foo.bar. While querying, this field should be referred to as foo.bar. For example, select foo.bar from <stream-name>. The flattened field will be available in the schema as well.
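The flattening described above can be sketched with a small recursive function. This is an illustration of the technique only; the function name flatten, the sample values, and the decision to leave lists untouched are assumptions, and the server's own handling of edge cases may differ.

```python
def flatten(obj: dict, parent: str = "", sep: str = ".") -> dict:
    """Flatten nested dicts into one level, joining keys with a dot."""
    flat = {}
    for key, value in obj.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key path.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# Sample event with made-up values.
nested = {"foo": {"bar": "baz"}, "level": "info"}
flat = flatten(nested)
print(flat)  # {'foo.bar': 'baz', 'level': 'info'}
```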
Once the JSON payload reaches the server, it is validated and parsed into the columnar Apache Arrow format in memory. Subsequent events are appended to the in-memory Arrow record batch, and a copy is kept on disk to prevent data loss. Finally, after a configurable duration, the Arrow record batch is converted to Parquet and pushed to an S3 (or compatible) bucket.
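The buffer-then-flush flow above can be illustrated with a plain-Python stand-in. This sketch deliberately swaps in simpler pieces: a Python list instead of an Arrow record batch, a JSON-lines file instead of the on-disk copy, and a callback instead of the Parquet conversion and S3 upload. The StagingBuffer class and its method names are hypothetical.

```python
import json
import os
import tempfile
import time


class StagingBuffer:
    """Keep events in memory, mirror them to disk, flush after an interval."""

    def __init__(self, staging_path: str, flush_interval_s: float):
        self.staging_path = staging_path
        self.flush_interval_s = flush_interval_s
        self.batch = []
        self.last_flush = time.monotonic()

    def append(self, event: dict) -> None:
        self.batch.append(event)  # in-memory batch
        with open(self.staging_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")  # on-disk copy against data loss

    def flush_due(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        return now - self.last_flush >= self.flush_interval_s

    def flush(self, upload) -> None:
        upload(self.batch)  # stand-in for "convert to Parquet and push to bucket"
        self.batch = []
        if os.path.exists(self.staging_path):
            os.remove(self.staging_path)  # staged copy is no longer needed
        self.last_flush = time.monotonic()


with tempfile.TemporaryDirectory() as tmp:
    buf = StagingBuffer(os.path.join(tmp, "staging.jsonl"), flush_interval_s=60)
    buf.append({"level": "info", "message": "hello"})
    uploaded = []
    buf.flush(uploaded.extend)
```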
Parquet files on object storage are organized into prefixes based on stream name, date, and time. This lets the server fetch only the specific dataset that matches the query time range. More on querying in the next section. We're working on a compaction approach that will further compress and optimize storage while keeping data queryable at all times.
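To show why time-based prefixes narrow down what a query has to read, here is a hedged sketch that maps a time range to the prefixes it overlaps. The exact layout (stream/date=.../hour=.../minute=.../) and the one-minute granularity are assumptions for illustration; the text above only states that prefixes are based on stream name, date, and time.

```python
from datetime import datetime, timedelta, timezone


def prefix_for(stream: str, ts: datetime) -> str:
    """Build a hypothetical object storage prefix for one minute of data."""
    return f"{stream}/date={ts:%Y-%m-%d}/hour={ts:%H}/minute={ts:%M}/"


def prefixes_for_range(stream: str, start: datetime, end: datetime) -> list:
    """List the minute-level prefixes that overlap [start, end], inclusive."""
    current = start.replace(second=0, microsecond=0)
    prefixes = []
    while current <= end:
        prefixes.append(prefix_for(stream, current))
        current += timedelta(minutes=1)
    return prefixes


start = datetime(2022, 11, 17, 7, 3, 30, tzinfo=timezone.utc)
end = datetime(2022, 11, 17, 7, 5, 0, tzinfo=timezone.utc)
prefixes = prefixes_for_range("demo", start, end)
for p in prefixes:
    print(p)
```

Only objects under the listed prefixes need to be fetched for that query; everything else in the bucket can be skipped.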
To fetch the ingested data size and the actual compressed data size for a stream, use the Get Stats API. Sample response:
"size": "12800 Bytes"
"size": "15517 Bytes"
The Parseable query API works with standard SQL. In addition to the SQL query, users need to specify the time range over which the query should be executed. The time range is specified using the startTime and endTime parameters. The response is inclusive of both timestamps.
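As an illustration of a query with a time range, the sketch below builds a request body and shows what an inclusive range means. Treating query, startTime, and endTime as JSON body keys, the ISO 8601 timestamp format, the stream name demo, and the helper names are all assumptions for this example; consult the Query API reference for the exact request shape.

```python
import json
from datetime import datetime, timezone


def build_query_payload(sql: str, start: datetime, end: datetime) -> dict:
    """Build a query body with a SQL string and a time range."""
    if start > end:
        raise ValueError("start of the time range must not be after its end")
    return {
        "query": sql,
        "startTime": start.isoformat(),
        "endTime": end.isoformat(),
    }


def in_range(ts: datetime, start: datetime, end: datetime) -> bool:
    """Inclusive range check: both boundary timestamps count as matches."""
    return start <= ts <= end


start = datetime(2022, 11, 17, 7, 0, tzinfo=timezone.utc)
end = datetime(2022, 11, 17, 8, 0, tzinfo=timezone.utc)
payload = build_query_payload("select * from demo", start, end)
print(json.dumps(payload, indent=2))
```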
Check out the Query API in Postman.
Parseable uses DataFusion, the Apache Arrow native query engine, together with an efficient Parquet reader to execute queries.