Update DEVELOPMENT_PLAN.md with internal API findings and pytchat focus

2025-10-30 18:06:17 +01:00
parent a00c58989c
commit 956d5d7c6f


@@ -16,7 +16,7 @@ To identify and, if feasible, implement a sustainable, quota-friendly, compliant
4. **Tooling:** Consider using tools like `mitmproxy` for more in-depth network traffic interception and analysis on a Linux system.
* **Expected Outcome:** A detailed understanding of YouTube's internal live chat data acquisition methods.
-### Findings: Internal `POST /youtubei/v1/live_chat/get_live_chat` API
+### Findings: Internal `POST /youtubei/v1/live_chat/get_live_chat` API and `pytchat`
Our network analysis revealed that YouTube's web client uses a `POST` request to `https://www.youtube.com/youtubei/v1/live_chat/get_live_chat` for receiving live chat messages. This is a continuation-based polling mechanism, not a gRPC stream or a simple public API call.
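The continuation-based polling described above can be sketched as a loop that sends the current token and receives messages plus the next token. This is a minimal illustration, not the web client's actual logic: `fetch_page`, the `(messages, next_token)` page shape, and the fake pages are all hypothetical stand-ins for the real HTTP request and JSON response.

```python
from typing import Callable, Iterator, Optional

# Hypothetical shape of one parsed response: (messages, next continuation token).
Page = tuple[list[dict], Optional[str]]


def poll_chat(
    fetch_page: Callable[[str], Page],
    continuation: str,
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield messages page by page, following each returned continuation token."""
    token: Optional[str] = continuation
    for _ in range(max_pages):
        if token is None:  # no further token: the chat has ended
            break
        messages, token = fetch_page(token)
        yield from messages


# Stand-in for the real POST request, so the loop can be run offline.
FAKE_PAGES: dict[str, Page] = {
    "c0": ([{"text": "hello"}], "c1"),
    "c1": ([{"text": "world"}, {"text": "!"}], "c2"),
    "c2": ([], None),
}


def fake_fetch(token: str) -> Page:
    return FAKE_PAGES[token]


received = [m["text"] for m in poll_chat(fake_fetch, "c0")]
print(received)
```

A real client would presumably also pause between requests rather than polling in a tight loop; that delay is omitted here to keep the sketch focused on token handling.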
@@ -34,16 +34,25 @@ Our network analysis revealed that YouTube's web client uses a `POST` request to
* `timestampUsec`: Timestamp of the message.
* `authorExternalChannelId`: The author's channel ID.
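As a small illustration of consuming these fields, the sketch below converts a `timestampUsec` value into a timezone-aware `datetime`. It assumes, based on the field name, that the value is a decimal string of microseconds since the Unix epoch; the sample dictionary and its channel ID are made up for the example.

```python
from datetime import datetime, timezone


def parse_timestamp_usec(timestamp_usec: str) -> datetime:
    """Convert a microseconds-since-epoch string to an aware UTC datetime."""
    seconds, micros = divmod(int(timestamp_usec), 1_000_000)
    # Split seconds and microseconds to avoid float rounding on large values.
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)


# Hypothetical message fragment; only the two field names come from the findings.
sample = {
    "timestampUsec": "1761843977123456",
    "authorExternalChannelId": "UC_hypothetical_id",
}

parsed = parse_timestamp_usec(sample["timestampUsec"])
print(parsed.isoformat(), sample["authorExternalChannelId"])
```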
-**Implications:**
+**Implications of using this Internal API (or libraries like `pytchat` that leverage it):**
* **Undocumented API:** This is an internal, undocumented API endpoint, subject to change without notice. This introduces significant fragility and maintenance risk.
-* **Compliance Risk:** Using undocumented internal APIs is generally against YouTube's Terms of Service.
+* **Compliance Risk:** Using undocumented internal APIs is generally against YouTube's Terms of Service. This is a **critical risk** that could lead to account-level sanctions or blocking.
* **Authentication Challenge:** The requests include `Authorization` (e.g., `SAPISIDHASH`) and `Cookie` headers, indicating that mimicking this client will require managing a logged-in YouTube session, which is complex and prone to breaking.
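To make the authentication hurdle concrete, here is a sketch of how the `SAPISIDHASH` header value is commonly reported to be built: a timestamp, then the SHA-1 of the timestamp, the `SAPISID` cookie value, and the origin joined by spaces. This scheme comes from community reverse engineering, not from the findings above or any official documentation, so treat it as an assumption that may be wrong or may change. The function name and the cookie value are hypothetical.

```python
import hashlib
import time
from typing import Optional


def sapisidhash(
    sapisid: str,
    origin: str = "https://www.youtube.com",
    now: Optional[int] = None,
) -> str:
    """Build an Authorization value in the community-reported SAPISIDHASH format."""
    ts = int(time.time()) if now is None else now
    # Assumed input format: "<timestamp> <SAPISID cookie> <origin>".
    digest = hashlib.sha1(f"{ts} {sapisid} {origin}".encode("utf-8")).hexdigest()
    return f"SAPISIDHASH {ts}_{digest}"


# Placeholder cookie value and fixed timestamp, for a reproducible example.
header = sapisidhash("hypothetical-cookie-value", now=1700000000)
print(header)
```

Because the value depends on a cookie from a logged-in session, any client using it would still have to obtain and refresh that session, which is the fragile part noted in the bullet above.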
-### Next Research Steps for Phase 1:
-1. **Authentication Research:** Investigate methods to obtain and maintain valid `Authorization` and `Cookie` headers programmatically for a long-running application. This is a major hurdle for internal APIs.
-2. **Mimicking Client Context:** Determine which parts of the `context` object are essential and how to generate/maintain them to avoid being blocked.
-3. **Python Implementation Strategy:** Outline how to construct these `POST` requests, parse the JSON responses, and extract chat messages and the next `continuation` token.
+**`pytchat` as a direct (but risky) solution:**
+The `pytchat` library directly leverages this internal API. It handles the underlying HTTP requests and session management internally, making it simpler to use for reading public live chats without explicit API keys or OAuth. However, it inherits all the risks associated with using an undocumented internal API (ToS violation, fragility, unmaintained main repository).
+
+### Next Research Steps for Phase 1 (Focus on `pytchat` for experimental implementation):
+1. **`pytchat` Installation and Basic Usage:**
+* **Action:** Install the `pytchat` library.
+* **Action:** Create a basic Python script to fetch and display chat using `pytchat` for a given live stream ID.
+2. **`pytchat` Internal Mechanism Analysis:**
+* **Action:** Investigate how `pytchat` manages session/authentication internally (e.g., does it require a logged-in browser session, or does it generate necessary headers?).
+* **Action:** Understand how `pytchat` handles the `continuation` token and polling.
+3. **Integration with `rich` Display:**
+* **Action:** Adapt the existing `rich` display logic from `main.py` to consume messages received from `pytchat`.
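A possible starting point for step 1 is sketched below, following the usage pattern shown in the `pytchat` README (`pytchat.create`, `is_alive`, `get().sync_items()`). It has not been run against a live stream as part of this plan. The helper names `format_chat_line` and `stream_chat` are hypothetical, and the formatting is kept separate so that the `rich` display logic from `main.py` could later replace the plain `print`.

```python
import sys


def format_chat_line(timestamp: str, author: str, message: str) -> str:
    """Render one chat message as a single display line."""
    return f"{timestamp} [{author}] {message}"


def stream_chat(video_id: str) -> None:
    """Print chat messages for a live stream until the chat ends."""
    import pytchat  # third-party; install with: pip install pytchat

    chat = pytchat.create(video_id=video_id)
    try:
        while chat.is_alive():
            # pytchat handles the continuation token and polling internally.
            for item in chat.get().sync_items():
                print(format_chat_line(item.datetime, item.author.name, item.message))
    finally:
        chat.terminate()


if __name__ == "__main__":
    # Usage: python chat_sketch.py <live stream video ID>
    if len(sys.argv) == 2:
        stream_chat(sys.argv[1])
```

Importing `pytchat` inside `stream_chat` keeps the formatting helper usable, and testable, on machines where the library is not installed.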
## Phase 2: Re-exploration of YouTube Data API v3 (Creative Use)
@@ -72,4 +81,4 @@ Our network analysis revealed that YouTube's web client uses a `POST` request to
**Prioritization:** All research will prioritize **open-source and Linux-compatible solutions**. Compliance with YouTube's Terms of Service remains a critical factor.
**Next Steps:** The findings from this revised research will be compiled into a structured document to inform the design and implementation of a robust YouTube Live Chat monitoring solution.