🔮 Instill Core automatically loads a .env file that contains key/value pairs defining the required environment variables. You can customize the file based on your configuration.
The services also use the Koanf library for configuration. It supports loading configuration from multiple sources and makes it available to the service. To override the default configuration file, set the corresponding environment variable in the compose file in core. All configuration environment variables for each service are prefixed with CFG_.
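The prefix-based override described above can be sketched as follows. This is a minimal illustration, not the services' actual loader: the mapping from variable name to config key (lower-casing and turning underscores into dots) is an assumption, and the keys `server.port` and `database.host` are hypothetical. Check each service's default configuration file for the real key names.

```python
from typing import Dict, Mapping

PREFIX = "CFG_"  # prefix shared by all configuration environment variables


def overrides_from_env(env: Mapping[str, str], prefix: str = PREFIX) -> Dict[str, str]:
    """Collect prefixed variables and map them to dotted, lower-case config keys."""
    overrides = {}
    for name, value in env.items():
        if not name.startswith(prefix):
            continue  # unrelated variables such as PATH are ignored
        # ASSUMPTION: underscores separate nesting levels,
        # e.g. CFG_SERVER_PORT -> server.port
        key = name[len(prefix):].lower().replace("_", ".")
        overrides[key] = value
    return overrides


def apply_overrides(defaults: Mapping[str, str], overrides: Mapping[str, str]) -> Dict[str, str]:
    """Return the defaults with every overridden key replaced."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# Hypothetical defaults and environment, for illustration only.
defaults = {"server.port": "8081", "database.host": "pg-sql"}
env = {"CFG_SERVER_PORT": "9000", "PATH": "/usr/bin"}
config = apply_overrides(defaults, overrides_from_env(env))
print(config)
```

Only variables carrying the prefix take part in the override, so the rest of the environment leaves the service configuration untouched.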
#Configuring 🔮 Instill Core Services
Read the default configuration files for a full overview of all supported configuration options of each service:
- mgmt-backend (localhost:8084): a service to handle user management, token management, and metrics
- console (localhost:3000): a web-based UI app to provide a unified, clean, and intuitive user experience of Instill VDP and Model
#Configuring Service Version
Set the environment variable for a specific service to determine which version to use in Core.
The combination of default versions in the .env file is fully tested for compatibility.
Unless you are debugging or testing, updating the default versions in the .env file is discouraged.
#Configuring the Console
To access Core via Console, set the host by overriding the environment variables:
By default they are set to localhost.
#Configuring the Observability Stack
Observability is critical for distributed microservice architecture. Through OpenTelemetry, we can generate, collect and export metrics, logs and traces to help analyze the performance and behaviour of VDP services.
The observability stack is disabled by default. You can enable the stack by setting OBSERVE_ENABLED=true in the .env file.
The following telemetry tools are supported now:
- Jaeger (localhost:16686): OpenTelemetry allows us to export spans to Jaeger. Use Jaeger when you want to debug the complete flow of a request through the VDP services.
- InfluxDB (localhost:8086, username: admin, password: password): detailed metrics are sent to InfluxDB for monitoring and are imported into the Grafana dashboard.
- Grafana (localhost:3002, username: admin, password: admin): the Grafana dashboard visualises the metrics to monitor the performance and anomalies of VDP services.
- Prometheus (localhost:9090): VDP exports metrics like vdp_pipeline_sync_trigger_counter_total (total number of triggers from SYNC pipelines) and vdp_pipeline_async_trigger_counter_total (total number of triggers from ASYNC pipelines) to Prometheus.
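Once the stack is enabled, the two trigger counters can be read back through Prometheus's standard HTTP API (the `/api/v1/query` instant-query endpoint). The sketch below assumes Prometheus is reachable at the address listed above; the helper names `build_query_url` and `fetch_metric` are illustrative and not part of Instill Core.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://localhost:9090"  # address listed for Prometheus above


def build_query_url(metric: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': metric})}"


def fetch_metric(metric: str, base_url: str = PROMETHEUS_URL) -> list:
    """Return the result list for a metric (needs the observability stack running)."""
    with urlopen(build_query_url(metric, base_url), timeout=5) as resp:
        payload = json.load(resp)
    return payload["data"]["result"]


if __name__ == "__main__":
    # Only the URLs are printed here, so the example runs without a live stack.
    for name in (
        "vdp_pipeline_sync_trigger_counter_total",
        "vdp_pipeline_async_trigger_counter_total",
    ):
        print(build_query_url(name))
```

Calling `fetch_metric("vdp_pipeline_sync_trigger_counter_total")` against a running stack returns the current samples for the SYNC trigger counter.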
Set the environment variable for a specific telemetry tool to determine which version to use in Core.
#Configuring 💧 Instill VDP Services
Read the default configuration files for a full overview of all supported configuration options of each service:
- pipeline-backend (localhost:8081): a service to build and manage unstructured data pipelines
#Configuring VDP Service Version
Set the environment variable for a specific service to determine which version to use in VDP.
#Configuring ⚗️ Instill Model Services
Read the default configuration files for a full overview of all supported configuration options of each service:
- model-backend (localhost:8083): a service to import and serve ML models
#Configuring Model Service Version
Set the environment variable for a specific service to determine which version to use in Model.
The combination of default versions in the .env file is fully tested for compatibility.
Unless you are debugging or testing, updating the default versions in the .env file is discouraged.
#Configuring 💾 Instill Artifact Services
Read the default configuration files for a full overview of all supported configuration options of each service:
- artifact-backend (localhost:8082): a service for managing all stateful resources
#Configuring Artifact Service Version
#Configuring the Embedding Feature
To enable the embedding feature in Artifact, you must set up the OPENAI_SECRET_KEY environment variable. This key is necessary for the Process Files API, which uses embedding models to encode text data. For now, the OpenAI API is the only supported embedding option, but in the future we plan to offer additional options, including local embedding solutions.
- Open the .env file:
  - Locate and open the .env file in your project directory.
- Add the OpenAI secret key:
  - Insert the line OPENAI_SECRET_KEY=sk-XXX into the .env file, replacing sk-XXX with your actual OpenAI secret key.
- Restart Instill Core:
  - After setting the environment variable, restart the instill-core service to apply the changes.
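The first two steps can also be scripted. The sketch below is one possible approach, not an official tool: `set_env_var` is a hypothetical helper that edits the text of a .env file, replacing an existing assignment or appending a new one. It operates on a string so it can be tried without touching a real file. Restarting the service afterwards is still required.

```python
def set_env_var(env_text: str, key: str, value: str) -> str:
    """Return env_text with key set to value, replacing or appending the assignment."""
    new_line = f"{key}={value}"
    lines = env_text.splitlines()
    for i, line in enumerate(lines):
        # Compare the exact name left of "=", so commented-out lines and
        # keys that merely share a prefix are left alone.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        # No existing assignment was found: append a new one.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Sample .env content for illustration; sk-XXX stands in for a real key.
sample = "OBSERVE_ENABLED=false\n"
updated = set_env_var(sample, "OPENAI_SECRET_KEY", "sk-XXX")
print(updated)
```

To apply it to a real file, read the .env file into a string, pass it through `set_env_var`, and write the result back.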
#Anonymised Usage Collection
To help us better understand how VDP and Model are used and can be improved, VDP and Model collect and report anonymised usage statistics.
#What Data is Collected
We value your privacy, so we collect only anonymous data: a selected set of details from your VDP instance that gives us insight into how to improve VDP and Model without being invasive.
When a new VDP and Model instance is running, the usage client in each service, including pipeline-backend, model-backend, and mgmt-backend, asks for a new session.
Our usage server returns a token used for future reporting.
For each session, we collect Session data, including some basic information about the service and the system the service is running on:
- name of the service to collect data from, e.g., SERVICE_PIPELINE for pipeline-backend
- edition of the service to identify the deployment, e.g., local-ce for local community edition deployment
- version of the service, e.g., 0.5.0-alpha
- architecture of the system the service is running on, e.g., amd64
- operating system the service is running on, e.g., Linux
- uptime in seconds to identify the rough life span of the service
Each session is assigned a random UUID for tracking and identification.
Then, each session collects and sends its own SessionReport data every 10 minutes:
- MgmtUsageData reports data for the mgmt-backend session
  - UUID of the onboarded User
  - a list of user metadata
- PipelineUsageData reports data for the pipeline-backend session of the onboarded User
  - UUID of the onboarded User
  - a list of pipeline trigger metadata
- ModelUsageData reports data for the model-backend session of the onboarded User
  - UUID of the onboarded User
  - a list of model trigger metadata
You can check the full usage data structs in protobufs. These data do not allow us to track Personal Data but enable us to measure session counts and usage statistics.
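To make the shape of the Session data concrete, here is a rough sketch of the fields listed above as a Python dataclass. The class, field, and function names are hypothetical and chosen for readability; the authoritative definitions are the protobuf structs, which may differ in naming and types.

```python
import platform
import uuid
from dataclasses import asdict, dataclass, field

REPORT_INTERVAL_SECONDS = 10 * 60  # each session reports every 10 minutes


@dataclass
class Session:
    """Illustrative stand-in for the Session data; field names are assumptions."""
    service: str          # e.g. SERVICE_PIPELINE for pipeline-backend
    edition: str          # e.g. local-ce
    version: str          # e.g. 0.5.0-alpha
    arch: str             # e.g. amd64
    os: str               # e.g. Linux
    uptime_seconds: int   # rough life span of the service
    # Random UUID used for tracking and identification of the session.
    session_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))


def new_session(service: str, edition: str, version: str, uptime_seconds: int = 0) -> Session:
    """Build a Session, filling in system details from the local machine."""
    return Session(
        service=service,
        edition=edition,
        version=version,
        arch=platform.machine(),
        os=platform.system(),
        uptime_seconds=uptime_seconds,
    )


session = new_session("SERVICE_PIPELINE", "local-ce", "0.5.0-alpha")
print(asdict(session))
```

Nothing in this record identifies a person: it carries only service and system details plus a random identifier.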
#Implementation
The anonymous usage report client library is in usage-client.
To limit risk exposure, we keep the usage server implementation private for now.
In summary, the Session data and SessionReport sent from each session get updated in the usage server.
Additionally, the frontend Console sends event data to Amplitude.
#Opting out
VDP usage collection helps the entire community. We'd appreciate it if you can leave it on.
However, if you want to opt out, you can disable it by overriding the .env file in Core:
This will disable the VDP and Model usage collection for the entire project.
#Acknowledgements
Our anonymised usage collection was inspired by KrakenD's "How we built our telemetry service", and we would love to acknowledge that their design helped us bootstrap our usage collection project.