================================================================================
j63_recipe Data Migration Path Report
Generated: 2025-12-09 23:15:14
Environment: cs-oci-ord
================================================================================

【Minimal Transfer Directories】(deduplicated and merged; can be passed directly to rsync)
--------------------------------------------------------------------------------
1. [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/ShareGPT4V
2. [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground
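
One way to arrive at a deduplicated list like the one above is to drop every directory that is already contained in another listed directory. The following is a minimal sketch under that assumption, not the tool that produced this report; `minimal_dirs` is a hypothetical helper name, and the input is assumed to have one path per line with no trailing slash.

```shell
#!/bin/sh
# Hypothetical helper: reads directory paths on stdin (one per line) and
# prints only those that are not nested inside another listed directory.
minimal_dirs() {
    # Sorting in the C locale puts a parent directory before its children.
    LC_ALL=C sort -u | awk '
        {
            # Skip this path if a directory kept earlier is its ancestor.
            for (i = 1; i <= n; i++) {
                if (index($0, kept[i] "/") == 1) next
            }
            kept[++n] = $0
            print
        }'
}
```

Fed the parent directory of every data_path plus every media_dir from this report, this should leave only the two directories listed above.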

【rsync Commands】
--------------------------------------------------------------------------------
rsync -avP /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/ShareGPT4V/ "$SHARE_OUTPUT/data/ShareGPT4V/"
rsync -avP /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/ "$SHARE_OUTPUT/data/playground/"
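
Both commands expand `$SHARE_OUTPUT`, so if the variable is unset the destination becomes `/data/...` at the filesystem root. The wrapper below is a minimal sketch that guards against this and allows a dry run first. `sync_dir`, `SRC_ROOT`, and the `RSYNC` override are hypothetical names introduced for illustration; `-n` is rsync's dry-run option.

```shell
#!/bin/sh
# Hypothetical wrapper around the two transfers listed above.
# SRC_ROOT and RSYNC are illustrative names, not part of the report.
SRC_ROOT=/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft

sync_dir() {
    # $1 = source directory, $2 = directory name under $SHARE_OUTPUT/data,
    # remaining arguments = extra rsync options (for example -n).
    src=$1
    name=$2
    shift 2
    # Abort if SHARE_OUTPUT is unset or empty.
    : "${SHARE_OUTPUT:?SHARE_OUTPUT must be set}"
    "${RSYNC:-rsync}" -avP "$@" "$src/" "$SHARE_OUTPUT/data/$name/"
}

# Preview what would be copied, then rerun without -n to transfer:
# sync_dir "$SRC_ROOT/ShareGPT4V" ShareGPT4V -n
# sync_dir "$SRC_ROOT/internvl_chat/playground" playground -n
```

The trailing slash on the source makes rsync copy the directory's contents rather than the directory itself, which matches the commands above.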

【Shared Directories】(used by more than one dataset)
--------------------------------------------------------------------------------
  /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/ShareGPT4V/data
    -> shared by the following datasets: sharegpt4v_gpt4_100k, sharegpt4v_sft
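
A shared directory can be found by grouping datasets by their media_dir and keeping the groups with more than one member. The sketch below assumes a tab-separated input of `dataset<TAB>media_dir` pairs, which is not a format this report defines; `shared_dirs` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads "dataset<TAB>media_dir" lines on stdin and
# prints each media_dir used by more than one dataset, with its users.
shared_dirs() {
    LC_ALL=C sort -u | awk -F '\t' '
        {
            count[$2]++
            # Append the dataset name to the comma-separated user list.
            users[$2] = (users[$2] == "") ? $1 : (users[$2] ", " $1)
        }
        END {
            for (dir in count)
                if (count[dir] > 1)
                    print dir " -> " users[dir]
        }' | LC_ALL=C sort
}
```

With the pairs from the detailed listing in this report, only the ShareGPT4V/data directory has two users.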

【Detailed Listing】(per-dataset)
--------------------------------------------------------------------------------
1. ai2d_train_12k
   data_path: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/ai2d_train_12k.jsonl
   media_dir: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/ai2d

2. chartqa_train_18k
   data_path: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/chartqa_train_18k.jsonl
   media_dir: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/chartqa

3. docvqa_train_10k
   data_path: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/docvqa_train_10k.jsonl
   media_dir: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/docvqa

4. dvqa_train_200k
   data_path: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/dvqa_train_200k.jsonl
   media_dir: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/dvqa

5. geoqa
   data_path: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/geoqa+.jsonl
   media_dir: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/geoqa+

6. llava_instruct
   data_path: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/llava_instruct_150k_zh.jsonl
   media_dir: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/coco

7. sharegpt4v_gpt4_100k
   data_path: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/ShareGPT4V/jason-filter-sharegpt4v_instruct_gpt4-vision_cap100k.json
   media_dir: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/ShareGPT4V/data [shared with: sharegpt4v_sft]

8. sharegpt4v_sft
   data_path: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/ShareGPT4V/jason-filter-sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.json
   media_dir: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/ShareGPT4V/data [shared with: sharegpt4v_gpt4_100k]

9. synthdog_en
   data_path: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/synthdog_en.jsonl
   media_dir: [✓] /lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/synthdog-en
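
The `[✓]` markers in the listing above presumably indicate that the path was found on the source filesystem; the report does not say so explicitly. Under that assumption, a marker of this kind could be produced as sketched below. `mark_path` is a hypothetical helper, and `[✗]` for a missing path is an illustrative choice that does not appear in this report.

```shell
#!/bin/sh
# Hypothetical helper: prints a path prefixed with an existence marker.
# "[✓]" follows the report; "[✗]" for a missing path is an assumption.
mark_path() {
    if [ -e "$1" ]; then
        printf '[✓] %s\n' "$1"
    else
        printf '[✗] %s\n' "$1"
    fi
}
```

Running it over every data_path and media_dir before the transfer gives a quick check that nothing has moved since the report was generated.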

【Raw Path List】(for script processing)
--------------------------------------------------------------------------------
# Minimal transfer directories:
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/ShareGPT4V
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground

# All data_path files:
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/ShareGPT4V/jason-filter-sharegpt4v_instruct_gpt4-vision_cap100k.json
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/ShareGPT4V/jason-filter-sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.json
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/ai2d_train_12k.jsonl
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/chartqa_train_18k.jsonl
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/docvqa_train_10k.jsonl
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/dvqa_train_200k.jsonl
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/geoqa+.jsonl
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/llava_instruct_150k_zh.jsonl
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/synthdog_en.jsonl

# All media_dir directories:
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/ShareGPT4V/data
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/ai2d
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/chartqa
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/coco
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/docvqa
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/dvqa
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/geoqa+
/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft/internvl_chat/playground/data/synthdog-en
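
After the transfer, each path in the lists above has a counterpart under `$SHARE_OUTPUT/data`, following the destinations in the rsync commands of this report. The sketch below rewrites a source path to that destination; `remap_path` and `SRC_ROOT` are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: maps a source path from the lists above to where it
# would land after the two rsync transfers in this report.
SRC_ROOT=/lustre/fsw/portfolios/nvr/projects/nvr_elm_llm/dataset/vila-sft

remap_path() {
    # Abort if SHARE_OUTPUT is unset or empty.
    : "${SHARE_OUTPUT:?SHARE_OUTPUT must be set}"
    case $1 in
        "$SRC_ROOT"/ShareGPT4V|"$SRC_ROOT"/ShareGPT4V/*)
            rest=${1#"$SRC_ROOT/ShareGPT4V"}
            printf '%s\n' "$SHARE_OUTPUT/data/ShareGPT4V$rest" ;;
        "$SRC_ROOT"/internvl_chat/playground|"$SRC_ROOT"/internvl_chat/playground/*)
            rest=${1#"$SRC_ROOT/internvl_chat/playground"}
            printf '%s\n' "$SHARE_OUTPUT/data/playground$rest" ;;
        *)
            # Paths outside the two transfer directories have no mapping.
            echo "unmapped path: $1" >&2
            return 1 ;;
    esac
}
```

This can be used to rewrite the data_path and media_dir entries of a dataset config once the copy has finished.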