Configuration

VDP automatically loads a .env file that contains key/value pairs defining required environment variables. You can customize the file based on your configuration.
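As a rough illustration of what "key/value pairs" means here, the sketch below parses the plain KEY=VALUE form used in the examples on this page. It is not how VDP or Docker Compose reads the file: the function name `parse_env` is hypothetical, and quoting, variable interpolation, and `export` prefixes are deliberately ignored.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments.

    Simplified sketch: no quoting, interpolation, or multi-line values.
    """
    pairs: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line, ignore it
        pairs[key.strip()] = value.strip()
    return pairs


sample = """
# Example entries taken from this page
DOMAIN=localhost
DISABLEUSAGE=true
"""
env = parse_env(sample)
```

With the sample above, `env` holds the two variables `DOMAIN` and `DISABLEUSAGE` as strings.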

#Configuring services

VDP services use the Koanf library for configuration. Koanf loads configuration from multiple sources and makes it available to the service. To override a value from a service's default configuration file, set the corresponding environment variable in the Compose file in VDP. All configuration environment variables for each service are prefixed with CFG_.
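To make the override idea concrete, here is a minimal sketch of layering CFG_-prefixed environment variables over a set of defaults. It is written in Python for illustration and does not use Koanf. The mapping rule (strip the prefix, lowercase, treat `_` as a nesting separator) and the `database` keys are assumptions, so check each service's default configuration file for the real key names.

```python
def overrides_from_env(environ, prefix="CFG_"):
    """Turn CFG_-prefixed variables into a nested dict.

    Assumed mapping (not confirmed by the docs): CFG_DATABASE_HOST -> database.host
    """
    nested = {}
    for name, value in environ.items():
        if not name.startswith(prefix) or name == prefix:
            continue
        path = name[len(prefix):].lower().split("_")
        node = nested
        for part in path[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                child = node[part] = {}
            node = child
        node[path[-1]] = value
    return nested


def merge(defaults, overrides):
    """Return defaults with overrides applied recursively (inputs are not mutated)."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical defaults and environment, for illustration only.
defaults = {"database": {"host": "localhost", "port": "5432"}}
environ = {"CFG_DATABASE_HOST": "pg-sql", "PATH": "/usr/bin"}
config = merge(defaults, overrides_from_env(environ))
```

Only the variable carrying the CFG_ prefix changes the result; unrelated variables such as `PATH` are ignored, and keys that are not overridden keep their defaults.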

Read the default configuration files for a full overview of all supported configuration options of each service: pipeline-backend, connector-backend, model-backend and mgmt-backend.

#Configuring VDP service version

Set the environment variable for a specific service to determine which version to use in VDP.

.env

# Set the version of each VDP service
CONSOLE_VERSION=<console-version>
MGMT_BACKEND_VERSION=<mgmt-backend-version>
CONNECTOR_BACKEND_VERSION=<connector-backend-version>
MODEL_BACKEND_VERSION=<model-backend-version>
PIPELINE_BACKEND_VERSION=<pipeline-backend-version>
# Set the version of third-party tools
TRITON_SERVER_VERSION=<triton-server-version>
REDIS_VERSION=<redis-version>
POSTGRESQL_VERSION=<postgresql-version>
TEMPORAL_VERSION=<temporal-version>
TEMPORAL_UI_VERSION=<temporal-ui-version>

The combination of default versions in the .env file is fully tested for compatibility. Unless you are debugging or testing, changing the default versions in the .env file is discouraged.

#Configuring the VDP domain

Set the domain name to access VDP by overriding the environment variable DOMAIN:

.env

DOMAIN=<domain-name-to-access-vdp>

If DOMAIN is not set, the domain defaults to localhost.

#Anonymised Usage Collection

To help us better understand how VDP is used and can be improved, VDP collects and reports anonymised usage statistics.

#What data is collected

INFO

We value your privacy, so we collect only anonymous data: a selected set of details from your VDP instance that gives us insight into how to improve VDP without being invasive.

When a new VDP instance starts, the usage client in each service (pipeline-backend, connector-backend, model-backend, and mgmt-backend) requests a new session. Our usage server returns a token used for future reporting. For each session, we collect Session data, which includes basic information about the service and the system it is running on:

  • name of the service to collect data from, e.g., SERVICE_CONNECTOR for connector-backend
  • edition of the service to identify the deployment, e.g., local-ce for local community edition deployment
  • version of the service, e.g., 0.5.0-alpha
  • architecture of the system the service is running on, e.g., amd64
  • operating system the service is running on, e.g., Linux
  • uptime in seconds to identify the rough life span of the service
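The six fields above can be pictured as a small record. The sketch below is illustrative only: the class and field names are hypothetical, and the authoritative definitions are the usage data structs in the protobufs. Note also that Python's `platform.machine()` may report `x86_64` where the services report `amd64`.

```python
import platform
import time
from dataclasses import asdict, dataclass


@dataclass
class Session:
    # Field names are hypothetical; see the protobufs for the real schema.
    name: str              # e.g. SERVICE_CONNECTOR
    edition: str           # e.g. local-ce
    version: str           # e.g. 0.5.0-alpha
    arch: str              # CPU architecture of the host
    operating_system: str  # e.g. Linux
    uptime: int            # seconds since the service started


def new_session(name: str, edition: str, version: str, started_at: float) -> Session:
    """Build a Session from static service info plus detected system details."""
    return Session(
        name=name,
        edition=edition,
        version=version,
        arch=platform.machine(),
        operating_system=platform.system(),
        uptime=int(time.monotonic() - started_at),
    )


session = new_session("SERVICE_CONNECTOR", "local-ce", "0.5.0-alpha", time.monotonic())
payload = asdict(session)
```

Nothing in this record identifies a person: it describes the service build and the host platform only.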

Each session is assigned a random UUID for tracking and identification. Each session then collects and sends its own SessionReport data every 10 minutes:

  • MgmtUsageData reports data for mgmt-backend session
  • ConnectorUsageData reports data for connector-backend session of the onboarded User
    • UUID of the onboarded User
    • number of connected or disconnected Sources
    • number of connected or disconnected Destinations
    • an array of SourceConnectorDefinition of the Sources
    • an array of DestinationConnectorDefinition of the Destinations
  • ModelUsageData reports data for model-backend session of the onboarded User
    • UUID of the onboarded User
    • number of online and offline Models
    • an array of ModelDefinition of the Models
    • an array of AI tasks of the Models
    • number of processed images for testing models
  • PipelineUsageData reports data for pipeline-backend session of the onboarded User
    • UUID of the onboarded User
    • number of active and inactive Pipelines
    • number of SYNC and ASYNC Pipelines
    • number of images processed by the Pipelines

You can check the full usage data structs in protobufs. These data do not allow us to track Personal Data but enable us to measure session counts and usage statistics.

#Implementation

The anonymous usage report client library is in usage-client. To limit risk exposure, we keep the usage server implementation private for now. In summary, the Session data and SessionReport sent from each session get updated in the usage server.

Additionally, the frontend Console sends event data to Amplitude.

#Opting out

VDP usage collection helps the entire community, and we'd appreciate it if you could leave it on. However, if you want to opt out, you can disable it by overriding the following variable in the .env file:

.env

DISABLEUSAGE=true

This will disable the VDP usage collection for the entire project.
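For readers wondering how such a flag is typically consumed, here is a minimal sketch of an opt-out check. It is an assumption, not the services' actual code: the helper name `usage_disabled` is hypothetical, and treating only the case-insensitive string `true` as opting out (with collection on when the variable is absent) is a guess based on the description above.

```python
def usage_disabled(environ) -> bool:
    """Return True only when DISABLEUSAGE is explicitly set to 'true'."""
    return environ.get("DISABLEUSAGE", "false").strip().lower() == "true"


def maybe_start_reporting(environ, start):
    """Call start() unless the user opted out; returns whether it was started."""
    if usage_disabled(environ):
        return False
    start()
    return True


started = []
opted_out = maybe_start_reporting({"DISABLEUSAGE": "true"}, lambda: started.append("a"))
default_on = maybe_start_reporting({}, lambda: started.append("b"))
```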

#Acknowledgements

Our anonymised usage collection was inspired by KrakenD's "How we built our telemetry service", and we would love to acknowledge that their design helped us bootstrap our usage collection project.

Last updated: 5/29/2023, 12:50:07 AM