TODO: Write a description here
- Add the dependency to your shard.yml:

      dependencies:
        tinypandas:
          github: orangeSi/tinypandas

- Run shards install
1. Supports tab-separated, CSV, and VCF files.
The test code is in example/test.cr and looks like this:
require "tinypandas"
pd = Tinypandas.new
## support tab-separated files
# ifile is the path to the input file (demo.xls in the run shown further down).
# Signature of read_table, for reference:
#   def read_table(filepath_or_buffer : String, sep = "\t", delimiter : String = "\n",
#                  header : HeaderType = 0, index_col : IndexColType = 0,
#                  comment : String|Regex = "#", skiprows : SkiprowsType = false,
#                  skip_blank_lines : Bool = true)
df = pd.read_table(ifile, sep: "\t")
puts "df is #{df}\n"
puts "df.to_str is\n#{df.to_str}\n"
puts "df[A2][B3] is #{df["A2"]["B3"]}\n"
puts "df[df[A2]>=5].to_str is"
puts df[df["A2"]>=5].to_str
puts "df[df[A3]==9][A2].to_str is "
puts df[df["A3"]==9]["A2"].to_str
puts "df[df[A3]>=3][A2].to_str is "
puts df[df["A3"]>=3]["A2"].to_str
t = df["A2"]
puts "t = df[A2]is #{t}"
puts "t>2 is #{t>2}"
puts "df.t.to_str is\n#{df.t.to_str}"
puts "df.t[B3][A1] is "
puts df.t["B3"]["A1"]
## support vcf format file
df = pd.load_vcf("demo.vcf")
puts "df.head(1).to_s is\n"
puts df.head(1).to_s
puts "\n"
## support csv format file
df = pd.load_csv("sample.csv")
puts "df is #{df}\n"
puts "df.to_str is\n#{df.to_str}\n"
puts "df[col2][2] is #{df["col2"]["2"]}\n"
## convert Array(Array) to DataFrame
data = [[1,2,3],[4,5,6],[6,7,8]]
df = DataFrame.new(data, columns: ["c1","c2","c3"]) # read_array_by_row: true
puts "\nArray(Array()):#{data} to DataFrame:\n#{df.to_s}"
## read Hash(String, Array()) as DataFrame
data = {"c1"=>[1,2,3], "c2"=>[4,5,6], "c3"=>[6,7,8]}
df = DataFrame.new(data)
puts "\nHash(String, Array()):#{data} to DataFrame:\n#{df.to_s}"
Then go to the example directory and build the test program:

    cd example; crystal build test.cr --release

The input file demo.xls is tab-separated:

    $ cat demo.xls
    # note
    A1 A3 A2
    B1 1 3 2
    B2 7 2 8
    B3 4 9 5

Then run ./test demo.xls (or ./test demo.xls.gz), which prints:
## support seprate by tab format file
intpu file demo.xls
df is DataFrame(@dict={"A1" => Series(@dict={"B1" => 1, "B2" => 7, "B3" => 4}), "A3" => Series(@dict={"B1" => 3, "B2" => 2, "B3" => 9}), "A2" => Series(@dict={"B1" => 2, "B2" => 8, "B3" => 5})}, @index=["B1", "B2", "B3"], @columns=["A1", "A3", "A2"])
df.to_str is
A1 A3 A2
B1 1 3 2
B2 7 2 8
B3 4 9 5
df[A2][B3] is 5
df[df[A2]>=5].to_str is
A1 A3 A2
B2 7 2 8
B3 4 9 5
df[df[A3]==9][A2].to_str is
B3 5
df[df[A3]>=3][A2].to_str is
B1 2
B3 5
t = df[A2]is Series(@dict={"B1" => 2, "B2" => 8, "B3" => 5})
t>2 is Series(@dict={"B2" => 8, "B3" => 5})
df.t.to_str is
B1 B2 B3
A1 1 7 4
A3 3 2 9
A2 2 8 5
df.t[B3][A1] is
4
## support vcf format file
df.head(1).to_s is
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00097 HG00099
0 MT 10 . T C 100 fa VT=S;AC=3 GT 0 0 0
## support csv format file
df is DataFrame(@dict={"date" => Series(@dict={"0" => "2020-02-01 12:00:02", "1" => "2020-02-01 12:00:07", "2" => "2020-02-01 12:00:12", "3" => "2020-02-01 12:00:17", "4" => "2020-02-01 12:00:22", "5" => "2020-02-01 12:00:27", "6" => "2020-02-01 12:00:32", "7" => "2020-02-01 12:00:37"}), "col1" => Series(@dict={"0" => 66808, "1" => 66873, "2" => 66875, "3" => 66874, "4" => 66881, "5" => 66858, "6" => 66905, "7" => 66885}), "col2" => Series(@dict={"0" => 0.68, "1" => 0.67, "2" => 0.65, "3" => 0.67, "4" => 0.67, "5" => 0.66, "6" => 0.64, "7" => 0.66}), "col3" => Series(@dict={"0" => "TRUE", "1" => "FALSE", "2" => "TRUE", "3" => "FALSE", "4" => "TRUE", "5" => "FALSE", "6" => "TRUE", "7" => "FALSE"}), "col4" => Series(@dict={"0" => "str1", "1" => "str2", "2" => "str3", "3" => "str4", "4" => "str5", "5" => "str6", "6" => "str7", "7" => "str8"})}, @index=["0", "1", "2", "3", "4", "5", "6", "7"], @columns=["date", "col1", "col2", "col3", "col4"])
df.to_str is
date col1 col2 col3 col4
0 2020-02-01 12:00:02 66808 0.68 TRUE str1
1 2020-02-01 12:00:07 66873 0.67 FALSE str2
2 2020-02-01 12:00:12 66875 0.65 TRUE str3
3 2020-02-01 12:00:17 66874 0.67 FALSE str4
4 2020-02-01 12:00:22 66881 0.67 TRUE str5
5 2020-02-01 12:00:27 66858 0.66 FALSE str6
6 2020-02-01 12:00:32 66905 0.64 TRUE str7
7 2020-02-01 12:00:37 66885 0.66 FALSE str8
df[col2][2] is 0.65
Array(Array()):[[1, 2, 3], [4, 5, 6], [6, 7, 8]] to DataFrame:
c1 c2 c3
0 1 2 3
1 4 5 6
2 6 7 8
Hash(String, Array()):{"c1" => [1, 2, 3], "c2" => [4, 5, 6], "c3" => [6, 7, 8]} to DataFrame:
c1 c2 c3
0 1 4 6
1 2 5 7
2 3 6 8
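The filtering lines in the output above (`t>2` printing only the matching rows, and `df[df[A2]>=5]` keeping rows B2 and B3) can be mimicked with plain hashes. The following is a minimal sketch of that behaviour, not tinypandas' actual implementation: `at_least` and `filter_rows` are hypothetical helper names, and the nested hash is only a stand-in for a DataFrame.

```crystal
# Hypothetical helpers for illustration only; they are not part of tinypandas.

# Keep only the entries of a column whose value is >= threshold,
# similar to how `t>2` printed only the matching rows.
def at_least(column, threshold)
  column.select { |_label, value| value >= threshold }
end

# Keep, in every column, only the row labels that appear in `mask`.
def filter_rows(table, mask)
  table.transform_values do |column|
    column.select { |label, _value| mask.has_key?(label) }
  end
end

# Same data as demo.xls, stored column by column.
table = {
  "A1" => {"B1" => 1, "B2" => 7, "B3" => 4},
  "A3" => {"B1" => 3, "B2" => 2, "B3" => 9},
  "A2" => {"B1" => 2, "B2" => 8, "B3" => 5},
}

mask = at_least(table["A2"], 5)
filtered = filter_rows(table, mask)

puts mask.keys.join(" ") # B2 B3
filtered.each do |name, column|
  puts "#{name}: #{column.values.join(" ")}"
end
```

Read column by column, the printed values (A1: 7 4, A3: 2 9, A2: 8 5) are the same cells as the two rows shown under `df[df[A2]>=5].to_str` above.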
TODO: Write usage instructions here
TODO: Write development instructions here
- Fork it (https://github.com/orangeSi/tinypandas/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
- orangeSi - creator and maintainer