From a00c58989c35dae895d8dd3573014ee82f9f8dfd Mon Sep 17 00:00:00 2001
From: Ramforth
Date: Thu, 30 Oct 2025 18:00:56 +0100
Subject: [PATCH] Update DEVELOPMENT_PLAN.md with internal API findings

---
 DEVELOPMENT_PLAN.md | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/DEVELOPMENT_PLAN.md b/DEVELOPMENT_PLAN.md
index 4882b03..0fb2b7e 100644
--- a/DEVELOPMENT_PLAN.md
+++ b/DEVELOPMENT_PLAN.md
@@ -16,6 +16,35 @@ To identify and, if feasible, implement a sustainable, quota-friendly, compliant
     4. **Tooling:** Consider using tools like `mitmproxy` for more in-depth network traffic interception and analysis on a Linux system.
 * **Expected Outcome:** A detailed understanding of YouTube's internal live chat data acquisition methods.
 
+### Findings: Internal `POST /youtubei/v1/live_chat/get_live_chat` API
+
+Our network analysis revealed that YouTube's web client uses a `POST` request to `https://www.youtube.com/youtubei/v1/live_chat/get_live_chat` for receiving live chat messages. This is a continuation-based polling mechanism, not a gRPC stream or a simple public API call.
+
+**Request Payload Key Elements:**
+* `context`: Contains extensive client-specific information (e.g., `userAgent`, `clientName`, `clientVersion`, `visitorData`, `timeZone`, `browserName`, `browserVersion`). Mimicking this accurately is crucial.
+* `continuation`: A string token that dictates which set of messages to fetch next. This token is received in the previous response.
+
+**Response Payload Key Elements:**
+* `continuationContents.liveChatContinuation.continuations`: Contains an `invalidationContinuationData` object with the *new* `continuation` token for the subsequent request.
+* `actions`: An array containing `addChatItemAction` objects, each representing a chat message.
+    * `addChatItemAction.item.liveChatTextMessageRenderer`: Contains the message details.
+        * `message.runs[0].text`: The actual chat message content.
+        * `authorName.simpleText`: The author's display name.
+        * `authorPhoto.thumbnails`: URLs for the author's profile picture.
+        * `timestampUsec`: Timestamp of the message.
+        * `authorExternalChannelId`: The author's channel ID.
+
+**Implications:**
+* **Undocumented API:** This is an internal, undocumented API endpoint, subject to change without notice. This introduces significant fragility and maintenance risk.
+* **Compliance Risk:** Using undocumented internal APIs is generally against YouTube's Terms of Service.
+* **Authentication Challenge:** The requests include `Authorization` (e.g., `SAPISIDHASH`) and `Cookie` headers, indicating that mimicking this client will require managing a logged-in YouTube session, which is complex and prone to breaking.
+
+### Next Research Steps for Phase 1:
+
+1. **Authentication Research:** Investigate methods to obtain and maintain valid `Authorization` and `Cookie` headers programmatically for a long-running application. This is a major hurdle for internal APIs.
+2. **Mimicking Client Context:** Determine which parts of the `context` object are essential and how to generate/maintain them to avoid being blocked.
+3. **Python Implementation Strategy:** Outline how to construct these `POST` requests, parse the JSON responses, and extract chat messages and the next `continuation` token.
+
 ## Phase 2: Re-exploration of YouTube Data API v3 (Creative Use)
 
 * **Objective:** Explore creative, highly optimized uses of the existing REST API that might offer better sustainability, even if not truly event-driven.
@@ -43,4 +72,4 @@ To identify and, if feasible, implement a sustainable, quota-friendly, compliant
 
 **Prioritization:** All research will prioritize **open-source and Linux-compatible solutions**. Compliance with YouTube's Terms of Service remains a critical factor.
 
-**Next Steps:** The findings from this revised research will be compiled into a structured document to inform the design and implementation of a robust YouTube Live Chat monitoring solution.
\ No newline at end of file
+**Next Steps:** The findings from this revised research will be compiled into a structured document to inform the design and implementation of a robust YouTube Live Chat monitoring solution.
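
Note on the "Python Implementation Strategy" step added by this patch: the response-parsing half could be sketched roughly as follows. This is only an illustration built from the field names listed under "Response Payload Key Elements"; it makes no network request. The function name `parse_live_chat_response` and the output dictionary keys are hypothetical. The findings list `actions` without a full path, so the sketch assumes it sits either inside `liveChatContinuation` or at the top level and checks both. It also joins every text run rather than reading only `runs[0]`, which is a small generalization of what the findings describe.

```python
from typing import Any, Dict, List, Optional, Tuple


def parse_live_chat_response(
    payload: Dict[str, Any],
) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Return (text messages, next continuation token) from one decoded response."""
    chat = payload.get("continuationContents", {}).get("liveChatContinuation", {})

    # Take the first continuation entry that carries invalidationContinuationData.
    next_token = None
    for entry in chat.get("continuations", []):
        data = entry.get("invalidationContinuationData")
        if data and "continuation" in data:
            next_token = data["continuation"]
            break

    # Assumption: `actions` is nested next to `continuations`; fall back to top level.
    actions = chat.get("actions") or payload.get("actions", [])

    messages = []
    for action in actions:
        renderer = (
            action.get("addChatItemAction", {})
            .get("item", {})
            .get("liveChatTextMessageRenderer")
        )
        if renderer is None:
            continue  # skip anything that is not a plain text message
        runs = renderer.get("message", {}).get("runs", [])
        messages.append(
            {
                # Runs without a "text" key contribute nothing to the joined string.
                "text": "".join(run.get("text", "") for run in runs),
                "author": renderer.get("authorName", {}).get("simpleText"),
                "channel_id": renderer.get("authorExternalChannelId"),
                "timestamp_usec": renderer.get("timestampUsec"),
            }
        )
    return messages, next_token
```

Feeding it a captured response body decoded with `json.loads` would yield the list of messages plus the token to place in the `continuation` field of the next request. An empty or unexpected payload yields `([], None)`, which a polling loop could treat as a signal to stop or back off.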