This repository builds a Docker image for LinTO's NLP service: Topic Modeling, based on https://github.com/linto-ai/linto-platform-nlp-core. The service can be deployed along with https://github.com/linto-ai/linto-platform-stack or in a standalone way (see the Develop section below).
LinTO's NLP services adopt the basic design concepts of spaCy: components and pipelines. Components (located under the folder components/) are decoupled from the service and can easily be reused in other spaCy projects. Components are organised into pipelines that carry out specific NLP tasks.
This service can be launched in two ways, as a REST API or as a Celery task, each with or without GPU support.
See documentation : [***]
With our proposed stack https://github.com/linto-ai/linto-platform-stack
1 Download models into ./assets on the host machine (they can be stored elsewhere). Make sure that git-lfs (https://git-lfs.github.com/) is installed and available at /usr/local/bin/git-lfs.
```bash
cd linto-platform-nlp-topic-modeling/
bash scripts/download_models.sh
```
2 Configure the runtime environment variables
```bash
cp .envdefault .env
```
| Environment Variable | Description | Default Value |
|---|---|---|
| APP_LANG | A space-separated list of supported languages for the application | fr en |
| ASSETS_PATH_ON_HOST | The path to the assets folder on the host machine | ./assets |
| ASSETS_PATH_IN_CONTAINER | The volume mount point of models in container | /app/assets |
| LM_MAP | A JSON string that maps each supported language to its corresponding language model | {"fr":"sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2","en":"sentence-transformers/all-MiniLM-L6-v2"} |
| SERVICE_MODE | The mode in which the service is served, either "http" (REST API) or "task" (Celery task) | "http" |
| CONCURRENCY | The maximum number of requests that can be handled concurrently | 1 |
| USE_GPU | A flag indicating whether to use GPU for computation or not, either "True" or "False" | True |
| SERVICE_NAME | The name of the micro-service | topic |
| SERVICES_BROKER | The URL of the broker server used for communication between micro-services | "redis://localhost:6379" |
| BROKER_PASS | The password for accessing the broker server | None |
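
To illustrate how the variables in the table fit together, here is a minimal Python sketch that reads a few of them and checks that every language in APP_LANG has a model in LM_MAP. The helper name `load_settings` and the layout of the returned dictionary are hypothetical and are not taken from the service's code; the default values are copied from the table.

```python
import json
import os

# Default value copied from the LM_MAP row of the table.
DEFAULT_LM_MAP = json.dumps({
    "fr": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    "en": "sentence-transformers/all-MiniLM-L6-v2",
})


def load_settings(env=None):
    """Hypothetical helper: parse the documented variables from a mapping."""
    env = os.environ if env is None else env

    # APP_LANG is a space-separated list, LM_MAP is a JSON string.
    languages = env.get("APP_LANG", "fr en").split()
    lm_map = json.loads(env.get("LM_MAP", DEFAULT_LM_MAP))
    missing = [lang for lang in languages if lang not in lm_map]
    if missing:
        raise ValueError(f"LM_MAP has no model for: {', '.join(missing)}")

    # Values in a .env file may or may not be wrapped in double quotes.
    service_mode = env.get("SERVICE_MODE", "http").strip('"')
    if service_mode not in ("http", "task"):
        raise ValueError(f"Unsupported SERVICE_MODE: {service_mode}")

    return {
        "languages": languages,
        "models": {lang: lm_map[lang] for lang in languages},
        "service_mode": service_mode,
        "use_gpu": env.get("USE_GPU", "True").strip('"').lower() == "true",
        "concurrency": int(env.get("CONCURRENCY", "1")),
    }
```

Calling `load_settings()` without arguments reads the current process environment; passing a plain dictionary makes the checks easy to try without touching the `.env` file.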
4 Build image
```bash
sudo docker build --tag lintoai/linto-platform-nlp-topic-modeling:latest .
```
or
```bash
sudo docker-compose build
```
5 Run the container with GPU support. Make sure that the NVIDIA Container Toolkit and the GPU driver are installed.
```bash
sudo docker run --gpus all \
--rm -p 80:80 \
-v $PWD/assets:/app/assets:ro \
--env-file .env \
lintoai/linto-platform-nlp-topic-modeling:latest
```
To run on CPU only, remove `--gpus all` from the first command and set `USE_GPU=False` in the `.env`.

or
```bash
sudo docker-compose up
```
To run on CPU only, remove `runtime: nvidia` from the `docker-compose.yml` file and set `USE_GPU=False` in the `.env`.

6 If running under `SERVICE_MODE=http`, navigate to http://localhost/docs or http://localhost/redoc in your browser to explore the REST API interactively. See the examples for how to query the API. If running under `SERVICE_MODE=task`, please refer to the dedicated section at the end of this README.
http://localhost/topic/{lang}

| {lang} | Model | Size |
|---|---|---|
| en | sentence-transformers/all-MiniLM-L6-v2 | 80 MB |
| fr | sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 | 418 MB |
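
As a rough sketch of how a client might call this endpoint from Python, the snippet below builds a request for `http://localhost/topic/{lang}` with the `articles` payload shape used in this README's request example. The POST method, the `Content-Type` header, and the helper names `build_topic_request` and `query_topics` are assumptions, not details confirmed by the service; the interactive documentation at http://localhost/docs shows the actual contract.

```python
import json
import urllib.request


def build_topic_request(lang, text, base_url="http://localhost"):
    """Hypothetical helper: prepare a request for /topic/{lang}."""
    # Payload shape taken from the request example in this README.
    payload = json.dumps({"articles": [{"text": text}]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/topic/{lang}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the endpoint accepts POST with a JSON body
    )


def query_topics(lang, text, base_url="http://localhost", timeout=60):
    """Send the request and decode the JSON response."""
    request = build_topic_request(lang, text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the container from step 5 running, `query_topics("en", "first segment | second segment")` would send the two segments to the English model.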
Please use " | " (with a white-space on the left and right side) to separate the segments (e.g., sentences, paragraphs, documents, etc.), which will be considered as the units for topic modeling.
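
When the segments are already available as a list, the separator rule can be applied programmatically. The following is a minimal sketch; the function name `join_segments` is hypothetical, and collapsing whitespace and rejecting segments that already contain the separator are choices made here to keep the segment boundaries unambiguous, not behaviour documented by the service.

```python
SEPARATOR = " | "


def join_segments(segments):
    """Hypothetical helper: build the "text" value from a list of segments."""
    cleaned = []
    for segment in segments:
        # Collapse newlines and repeated spaces so each segment is one line.
        normalized = " ".join(segment.split())
        if not normalized:
            continue  # skip empty segments
        if SEPARATOR in normalized:
            raise ValueError("A segment must not contain the ' | ' separator")
        cleaned.append(normalized)
    return SEPARATOR.join(cleaned)
```

For instance, `join_segments(["Google LLC is ...", "Amazon.com, Inc. is ..."])` produces a single string in which each list item is one unit for topic modeling.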
The example below contains two topics. Its segments are the first paragraphs about GAFAM and about Supervised/Unsupervised/Semi-supervised/Reinforcement/Deep Learning, extracted from ***.
```json
{
  "articles": [
    {
      "text": "Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware. It is considered one of the Big Five companies in the American information technology industry, along with Amazon, Apple, Meta (Facebook) and Microsoft. | Amazon.com, Inc. is an American multinational technology company which focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. It is one of the Big Five companies in the U.S. information technology industry, along with Google (Alphabet), Apple, Meta (Facebook), and Microsoft. The company has been referred to as one of the most influential economic and cultural forces in the world, as well as the world's most valuable brand. | Meta Platforms, Inc., doing business as Meta and formerly known as Facebook, Inc., is a multinational technology conglomerate based in Menlo Park, California. The company is the parent organization of Facebook, Instagram, and WhatsApp, among other subsidiaries. Meta is one of the world's most valuable companies and is considered one of the Big Tech companies in U.S. information technology, alongside Amazon, Google, Apple, and Microsoft. The company generates a substantial share of its revenue from the sale of advertisement placements to marketers. | Apple Inc. is an American multinational technology company that specializes in consumer electronics, computer software and online services. Apple is the largest information technology company by revenue (totaling $274.5 billion in 2020) and, since January 2021, the world's most valuable company. As of 2021, Apple is the fourth-largest PC vendor by unit sales and fourth-largest smartphone manufacturer. It is one of the Big Five American information technology companies, alongside Amazon, Google (Alphabet), Facebook (Meta), and Microsoft. | Microsoft Corporation is an American multinational technology corporation which produces computer software, consumer electronics, personal computers, and related services. Its best-known software products are the Microsoft Windows line of operating systems, the Microsoft Office suite, and the Internet Explorer and Edge web browsers. Its flagship hardware products are the Xbox video game consoles and the Microsoft Surface lineup of touchscreen personal computers. Microsoft ranked No. 21 in the 2020 Fortune 500 rankings of the largest United States corporations by total revenue; it was the world's largest software maker by revenue as of 2016. It is considered one of the Big Five companies in the U.S. information technology industry, along with Amazon, Google (Alphabet), Apple, and Facebook (Meta). | Supervised learning (SL) is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a reasonable way (see inductive bias). This statistical quality of an algorithm is measured through the so-called generalization error. | Unsupervised learning is a type of machine learning in which the algorithm is not provided with any pre-assigned labels or scores for the training data.[1][2] As a result, unsupervised learning algorithms must first self-discover any naturally occurring patterns in that training data set. Common examples include clustering, where the algorithm automatically groups its training examples into categories with similar features, and principal component analysis, where the algorithm finds ways to compress the training data set by identifying which features are most useful for discriminating between different training examples, and discarding the rest. This contrasts with supervised learning in which the training data include pre-assigned category labels (often by a human, or from the output of non-learning classification algorithm). Other intermediate levels in the supervision spectrum include reinforcement learning, where only numerical scores are available for each training example instead of detailed tags, and semi-supervised learning where only a portion of the training data have been tagged. | Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data). It is a special instance of weak supervision. | Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. | Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised."
    }
  ]
}
```
In the response, a topic with topic_id -1 is sometimes present. It is the noise topic, which corresponds to outlier input segments, and it can typically be ignored.
"count" is the topic frequency (the number of segments attached to the topic), and "phrases" is a list of representative phrases of the topic with their associated c-TF-IDF scores.
"topic_assignments" lists the segments, each with its assigned topic and its probabilities over all topics.
json{ "topic": [ { "text": "Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware. It is ***ed one of the Big Five companies in the American information technology industry, along with Amazon, Apple, Meta (***) and Microsoft. | Amazon.com, Inc. is an American multinational technology company which focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. It is one of the Big Five companies in the U.S. information technology industry, along with Google (Alphabet), Apple, Meta (***), and Microsoft. The company has been referred to as one of the most influential economic and cultural forces in the world, as well as the world's most valuable brand. | Meta Platforms, Inc., doing business as Meta and formerly known as ***, Inc., is a multinational technology conglomerate based in Menlo Park, California. The company is the parent organization of ***, Instagram, and ***, among other subsidiaries. Meta is one of the world's most valuable companies and is ***ed one of the Big Tech companies in U.S. information technology, alongside Amazon, Google, Apple, and Microsoft. The company generates a substantial share of its revenue from the sale of advertisement placements to marketers. | Apple Inc. is an American multinational technology company that specializes in consumer electronics, computer software and online services. Apple is the largest information technology company by revenue (totaling $274.5 billion in 2020) and, since January 2021, the world's most valuable company. As of 2021, Apple is the fourth-largest PC vendor by unit sales and fourth-largest smartphone manufacturer. It is one of the Big Five American information technology companies, alongside Amazon, Google (Alphabet), *** (Meta), and Microsoft. 
| Microsoft Corporation is an American multinational technology corporation which produces computer software, consumer electronics, personal computers, and related services. Its best-known software products are the Microsoft Windows line of operating systems, the Microsoft Office suite, and the Internet Explorer and Edge web browsers. Its flagship hardware products are the Xbox video game consoles and the Microsoft Surface lineup of touchscreen personal computers. Microsoft ranked No. 21 in the 2020 Fortune 500 rankings of the largest United States corporations by total revenue; it was the world's largest software maker by revenue as of 2016. It is ***ed one of the Big Five companies in the U.S. information technology industry, along with Amazon, Google (Alphabet), Apple, and *** (Meta). | Supervised learning (SL) is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a reasonable way (see inductive bias). This statistical quality of an algorithm is measured through the so-called generalization error. | Unsupervised learning is a type of machine learning in which the algorithm is not provided with any pre-assigned labels or scores for the training data.[1][2] As a result, unsupervised learning algorithms must first self-discover any naturally occurring patterns in that training data set. 
Common examples include clustering, where the algorithm automatically groups its training examples into categories with similar features, and principal component analysis, where the algorithm finds ways to compress the training data set by identifying which features are most useful for discriminating between different training examples, and discarding the rest. This contrasts with supervised learning in which the training data include pre-assigned category labels (often by a human, or from the output of non-learning classification algorithm). Other intermediate levels in the supervision spectrum include reinforcement learning, where only numerical scores are available for each training example instead of detailed tags, and semi-supervised learning where only a portion of the training data have been tagged. | Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Semi-supervised learning falls between unsupervised learning (with no labeled training data) and supervised learning (with only labeled training data). It is a special instance of weak supervision. | Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning. | Deep learning (also known as deep structured learning) is part of a broader family of machine learning methods based on artificial neural networks with representation learning. 
Learning can be supervised, semi-supervised or unsupervised.", "topics": [ { "topic_id": 0, "count": 5, "phrases": [ { "text": "technology", "score": 0.1007437481087233 }, { "text": "microsoft", "score": 0.08701055137154726 }, { "text": "company", "score": 0.07973424604917556 }, { "text": "apple", "score": 0.07213985044118004 }, { "text": "companies", "score": 0.06418162980264484 }, { "text": "amazon", "score": 0.05579841684222883 }, { "text": "multinational", "score": 0.05579841684222883 }, { "text": "software", "score": 0.05579841684222883 }, { "text": "revenue", "score": 0.04690415023839433 }, { "text": "inc", "score": 0.04690415023839433 } ] }, { "topic_id": 1, "count": 5, "phrases": [ { "text": "learning", "score": 0.18309249415991027 }, { "text": "training", "score": 0.11836015401314076 }, { "text": "data", "score": 0.10109415817647079 }, { "text": "supervised", "score": 0.09495941807377267 }, { "text": "algorithm", "score": 0.0751562032138162 }, { "text": "machine", "score": 0.06049656014890447 }, { "text": "unsupervised", "score": 0.05259467999004345 }, { "text": "labeled", "score": 0.04421108898068748 }, { "text": "labels", "score": 0.03522302060478677 }, { "text": "input", "score": 0.03522302060478677 } ] } ], "topic_assignments": [ { "text": "Google LLC is an American multinational technology company that specializes in Internet-related services and products, which include online advertising technologies, a search engine, cloud computing, software, and hardware. It is ***ed one of the Big Five companies in the American information technology industry, along with Amazon, Apple, Meta (***) and Microsoft.", "assigned_id": 0, "probabilities": [ 1, 4.085054396619016e-309 ] }, { "text": "Amazon.com, Inc. is an American multinational technology company which focuses on e-commerce, cloud computing, digital streaming, and artificial intelligence. It is one of the Big Five companies in the U.S. 
information technology industry, along with Google (Alphabet), Apple, Meta (***), and Microsoft. The company has been referred to as one of the most influential economic and cultural forces in the world, as well as the world's most valuable brand.", "assigned_id": 0, "probabilities": [ 1, 4.60356554840347e-309 ] }, { "text": "Meta Platforms, Inc., doing business as Meta and formerly known as ***, Inc., is a multinational technology conglomerate based in Menlo Park, California. The company is the parent organization of ***, Instagram, and ***, among other subsidiaries. Meta is one of the world's most valuable companies and is ***ed one of the Big Tech companies in U.S. information technology, alongside Amazon, Google, Apple, and Microsoft. The company generates a substantial share of its revenue from the sale of advertisement placements to marketers.", "assigned_id": 0, "probabilities": [ 1, 4.070943934486963e-309 ] }, { "text": "Apple Inc. is an American multinational technology company that specializes in consumer electronics, computer software and online services. Apple is the largest information technology company by revenue (totaling $274.5 billion in 2020) and, since January 2021, the world's most valuable company. As of 2021, Apple is the fourth-largest PC vendor by unit sales and fourth-largest smartphone manufacturer. It is one of the Big Five American information technology companies, alongside Amazon, Google (Alphabet), *** (Meta), and Microsoft.", "assigned_id": 0, "probabilities": [ 0.6053796529377782, 0.1948665243301664 ] }, { "text": "Microsoft Corporation is an American multinational technology corporation which produces computer software, consumer electronics, personal computers, and related services. Its best-known software products are the Microsoft Windows line of operating systems, the Microsoft Office suite, and the Internet Explorer and Edge web browsers. 
Its flagship hardware products are the Xbox video game consoles and the Microsoft Surface lineup of touchscreen personal computers. Microsoft ranked No. 21 in the 2020 Fortune 500 rankings of the largest United States corporations by total revenue; it was the world's largest software maker by revenue as of 2016. It is ***ed one of the Big Five companies in the U.S. information technology industry, along with Amazon, Google (Alphabet), Apple, and *** (Meta).", "assigned_id": 0, "probabilities": [ 1, 3.565365632376766e-309 ] }, { "text": "Supervised learning (SL) is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a reasonable way (see inductive bias). This statistical quality of an algorithm is measured through the so-called generalization error.", "assigned_id": 1, "probabilities": [ 0.15511752359976377, 0.6804084692327057 ] }, { "text": "Unsupervised learning is a type of machine learning in which the algorithm is not provided with any pre-assigned labels or scores for the training data.[1][2] As a result, unsupervised learning algorithms must first self-discover any naturally occurring patterns in that training data set. 
Common examples include clustering, where the algorithm automatically groups its training examples into categories with similar features, and principal component analysis, where the algorithm finds ways to compress the training data set by identifying which features are most useful for discriminating between different training examples, and discarding the rest. This contrasts with supervised learning in which the training data include pre-assigned category labels (often by a human, or from the output of non-learning classification algorithm). Other intermediate levels in the supervision spectrum include reinforcement learning, where only numerical scores are available for each training example instead of detailed tags, and semi-supervised learning where only a portion of the training data have been tagged.", "assigned_id": 1, "probabilities": [ 4.048049922344117e-309, 1 ] }, { "text": "Semi-supervised learning is an approach to machine learning that combines a small amount of labeled data with a large amount of unlabeled data during training. Sem