Commit

Tags
No Tags

Commit MetaInfo

Revision 755b11e0bd678429fa40a6870432fea49cfde86f (tree)
Time 2025-01-03 03:35:54
Author Lorenzo Isella <lorenzo.isella@gmai...>
Committer Lorenzo Isella

Log Message

I can now choose whether to run the left_join with arrow or duckplyr.

Change Summary

Diff

diff -r 76b06075d3a3 -r 755b11e0bd67 R-codes/duckplyr_test.R
--- a/R-codes/duckplyr_test.R Wed Jan 01 23:57:40 2025 +0100
+++ b/R-codes/duckplyr_test.R Thu Jan 02 19:35:54 2025 +0100
@@ -1,9 +1,13 @@
 library(tidyverse)
 
+choose_arrow <- 1 ## choose whether to use arrow or duckplyr
+
+if (choose_arrow==1){
+
 library(arrow)
 
 
-# Uncomment and run this only once
+## Uncomment and run this only once
 ## dd <- tibble(x=1:100000000, y=rep(LETTERS[1:20], 5000000))
 
 
@@ -32,27 +36,29 @@
 
 df_out2|>glimpse()
 
-
-## uncomment to run --this takes a lot of memory on my system
-
-## library(duckplyr)
+} else {
 
-## duck_exec("set memory_limit='1GB'")
 
-## df <- duck_csv("test.csv")
+library(duckplyr)
 
-## system.time({
-## df_stat <- df |>
-## summarise(total=sum(x), .by = y)
+duck_exec("set memory_limit='1GB'")
+
+df <- duck_csv("test.csv")
+
+system.time({
+df_stat <- df |>
+ summarise(total=sum(x), .by = y)
 
 
 
-## df_out <- df |>
-## left_join(y=df_stat, by=c("y")) |>
-## collect()
+df_out <- df |>
+ left_join(y=df_stat, by=c("y")) |>
+ as_tibble()
 
-## })
+})
 
+df_out |> glimpse()
+}
 
 sessionInfo()
 
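
The operation timed in the duckplyr branch of the diff is a per-group sum (`summarise(total=sum(x), .by = y)`) joined back onto every row with `left_join`. The following is a minimal sketch of that same pattern using plain dplyr on a tiny in-memory tibble. The table size and the names `dd_stat` and `dd_out` are illustrative assumptions, not part of the commit, and the sketch does not involve arrow, duckplyr, or `test.csv`.

```r
library(dplyr)

# Tiny stand-in for the large table in the script (sizes are hypothetical,
# chosen only so the example runs instantly).
dd <- tibble(x = 1:8, y = rep(LETTERS[1:4], 2))

# Per-group totals, mirroring the summarise() step in the diff.
dd_stat <- dd |>
  summarise(total = sum(x), .by = y)

# Attach each group's total back onto every row of that group.
dd_out <- dd |>
  left_join(y = dd_stat, by = "y")

glimpse(dd_out)
```

Every row keeps its original `x` and `y` and gains a `total` column holding the sum for its group, so the output has the same number of rows as the input. This is why the join is the memory-heavy step on the full-size data.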